Tuesday, February 5, 2008

Spectral Searching on PUBCHEM-Structures

About two years ago, a spectral search system based on 16M PUBCHEM structures (approx. 5M unique) was built and made available at http://nmrpredict.orc.univie.ac.at/identify . It went online in May 2006.

Some background information:
  • C-NMR spectra were calculated using CSEARCH-NN technology
  • The spectral search technique is based on SAHO as implemented in CSEARCH
  • The main intention of this system is to get some feeling for the compound class of an unknown. It must be clearly stated that a database of 5M unique structures is definitely too small to cover known organic chemistry (approx. 33M as of 02/2007). When taking into account the possible structures for a given molecular formula, 5M structures represent a negligible part of possible organic chemistry!
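To put the coverage argument in numbers, here is a small back-of-the-envelope calculation using only the figures quoted above (16M deposited structures, 5M unique, approx. 33M known organic compounds). The variable names are made up for this illustration and are not part of CSEARCH.

```python
# Structure counts as quoted in the post (approximate values).
deposited = 16_000_000      # PUBCHEM structures used for the system
unique_in_db = 5_000_000    # unique structures among them
known_organic = 33_000_000  # known organic compounds as of 02/2007

# Share of the deposited structures that are unique.
unique_share = unique_in_db / deposited

# Share of known organic chemistry covered by the unique structures.
coverage = unique_in_db / known_organic

print(f"unique share of deposited structures: {unique_share:.1%}")
print(f"coverage of known organic chemistry:  {coverage:.1%}")
```

So roughly 31% of the deposited structures are unique, and they cover only about 15% of known organic chemistry. The fraction of all possible structures is far smaller still.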

In the meantime there have been massive updates to PUBCHEM. This was the reason for rerunning the predictions and implementing another (much faster) search technique. The principle is still based on Wolfgang Bremser's SAHO technique, but the speed has been increased to allow searching 1 billion (10**9) CNMR spectra in less than 3 seconds on a single CPU. At the moment the system is only partly installed and allows searching of 405,704,611 spectral patterns (usually in 1.2-1.6 seconds).
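The timing figures above can be turned into throughput numbers with simple arithmetic. The sketch below only divides the quoted pattern counts by the quoted times. The helper name spectra_per_second is invented for this illustration and says nothing about how the search itself works.

```python
def spectra_per_second(n_spectra: int, seconds: float) -> float:
    """Throughput implied by searching n_spectra patterns in the given time."""
    return n_spectra / seconds

# Design target: 1 billion spectra in (less than) 3 seconds.
target = spectra_per_second(10**9, 3.0)

# Partial installation: 405,704,611 patterns in 1.2 to 1.6 seconds.
installed = 405_704_611
slowest = spectra_per_second(installed, 1.6)
fastest = spectra_per_second(installed, 1.2)

print(f"target:  at least {target / 1e6:.0f}M spectra/s")
print(f"current: {slowest / 1e6:.0f}M to {fastest / 1e6:.0f}M spectra/s")
```

The design target corresponds to at least about 333M spectra per second. The partial installation currently runs at roughly 254M to 338M spectra per second, so the observed timings are in the same range as the target.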

Key features:

  • CNMR spectra PREDICTED using CSEARCH-NN technology for approx. 23M unique structures downloaded from PUBCHEM
  • Structures deposited by CHEMSPIDER are already included
  • The intention is again to give some flavour of the possible compound classes for an unknown

A detailed description of the search technique will be given soon - stay tuned!

Another nice feature of this system: whenever an experimental set of NMR data is available within CSEARCH / SPECINFO / NMRPRedict / NMRShiftDB / CHEMGATE, this information is automatically included in the final result table of structures!

Feel free to test it ! The URL is

http://nmrpredict.orc.univie.ac.at/case/propose.php

Your feedback is highly appreciated - use the comment section !

3 comments:

ChemSpiderman said...

Wolfgang... looks very interesting. I tried to access it today and obtained "Access denied (unauthorized client IP address: 68.33.151.242)". Help.

Wolfgang Robien said...

Sorry - my fault ! Already corrected - please retry.

Please keep in mind that the installation is ongoing - only about 30-40% of the complete underlying database has been installed. The figure of 400M spectral patterns is also outdated and will be updated tomorrow, when the next installation job finishes. The 'newer' PUBCHEM structures are still missing.

Wolfgang Robien said...

Some testing on the growing database gives the following performance figures:

Average performance: 600,000,000 searches per second
Peak performance: 1,500,000,000 searches per second