Thursday, April 17, 2008

Spectral Similarity Search Engine using PUBCHEM-Structures

The final version of my 'Spectral Similarity Search Engine' is now online. It holds approx. 3 billion structure-spectrum pairs derived from 70,000,000 PUBCHEM structures and compounds (including the Chemspider structures). The average search time is between 1 and 2 seconds across all 3 billion pairs, which is not bad for approx. 250 GB of data!

Feel free to test it:

Technical description:

A C-NMR spectrum is described by a 15-character 'Spectral Hashkey', developed for this project, which allows spectral patterns to be indexed in the same way that InChIKeys index structures. Spectral patterns can therefore be searched with ordinary text-based search engines such as Google. This makes it possible to put the correlation between a spectral pattern and its associated chemical structure on the web AND to search for it. You might be surprised how many authors 'sell' the same spectrum to the community with a different structural proposal WITHOUT referencing their wrong original proposal! The technique gives us a tool to answer the question 'Has somebody already seen this spectrum, and which structure was assigned to it?' It can also be used to obtain structure proposals for a query peak table.
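
To illustrate the general idea of a text-searchable key for a peak list, here is a minimal sketch in Python. It is not the actual 'Spectral Hashkey' algorithm, which is not described in this post. The binning step, the bin width of 1 ppm, the SHA-256/base32 encoding and all names (spectral_hashkey, build_index, the structure identifiers) are assumptions made purely for illustration.

```python
import base64
import hashlib
from collections import defaultdict


def spectral_hashkey(shifts_ppm, bin_width=1.0, length=15):
    """Hypothetical fixed-length text key for a list of chemical shifts (ppm).

    Assumption: shifts are quantised into bins of `bin_width` ppm and sorted,
    so peak order and small measurement differences do not change the key.
    """
    bins = sorted(round(shift / bin_width) for shift in shifts_ppm)
    canonical = ",".join(str(b) for b in bins)
    digest = hashlib.sha256(canonical.encode("ascii")).digest()
    # Base32 yields only A-Z and 2-7, which plain text search handles well.
    return base64.b32encode(digest).decode("ascii")[:length]


def build_index(entries):
    """Map each key to every structure identifier reported with that pattern."""
    index = defaultdict(list)
    for structure_id, shifts_ppm in entries:
        index[spectral_hashkey(shifts_ppm)].append(structure_id)
    return index


if __name__ == "__main__":
    # Made-up peak lists and identifiers, for demonstration only.
    entries = [
        ("structure-1", [14.1, 22.7, 31.9, 128.4]),
        ("structure-2", [128.3, 31.8, 22.6, 14.2]),
    ]
    index = build_index(entries)
    query_key = spectral_hashkey([14.0, 22.8, 32.0, 128.2])
    # Both identifiers come back: same pattern, two structural proposals.
    print(query_key, index[query_key])
```

A key of this kind only supports exact lookup of the quantised pattern: two shifts that fall on either side of a bin boundary produce different keys, so a real similarity search needs more than this sketch shows.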

Feel free to test it and post your feedback here!