Thursday, April 17, 2008

Spectral Similarity Search Engine using PUBCHEM-Structures

The final version of my 'Spectral Similarity Search Engine' is now online. It holds approx. 3 billion structure-spectrum pairs derived from 70,000,000 PUBCHEM structures and compounds (including the Chemspider structures). The average search time across all 3 billion pairs is between 1 and 2 seconds, which is not bad for approx. 250 GB of data!

Feel free to test it: http://nmrpredict.orc.univie.ac.at/case/propose.php

Technical description: http://nmrpredict.orc.univie.ac.at/csearch_summary/strpro.html

Describing a C-NMR spectrum by a 15-character 'Spectral Hashkey', as developed for this project, allows spectral patterns to be indexed in much the same way as InChIKeys index structures. Spectral patterns can therefore be searched with the usual text-based search engines such as Google. With this technique it is possible to put the correlation between a spectral pattern and its associated chemical structure on the web AND to search for it. You might be surprised how many authors 'sell' the same spectrum with different structural proposals to the community WITHOUT referencing their wrong original proposal! This gives us a tool for answering the question 'Has somebody already seen this spectrum, and which structure has been associated with it?' The technique can also be used to obtain structure proposals for a query peak table.
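To make the idea of a text-searchable spectral key more concrete, here is a minimal sketch in Python. It is NOT the algorithm used by the search engine (that is described in the technical description linked above); the function name spectral_key, the 1 ppm binning, and the use of SHA-256 with Base32 encoding are all assumptions chosen purely for illustration. It only shows how a peak list can be reduced to a fixed-length 15-character string that an ordinary text index can store and match.

```python
import base64
import hashlib
import math


def spectral_key(shifts_ppm, bin_width=1.0, length=15):
    """Hypothetical 15-character key for a 13C peak list.

    Illustration only: the binning and hashing choices here are
    assumptions, not the 'Spectral Hashkey' algorithm of the project.
    """
    # Assign each shift to the nearest bin so that small measurement
    # differences within a bin collapse to the same value.
    bins = sorted(math.floor(s / bin_width + 0.5) for s in shifts_ppm)
    # Sorting makes the key independent of the order of the peaks.
    canonical = ",".join(str(b) for b in bins)
    digest = hashlib.sha256(canonical.encode("ascii")).digest()
    # Base32 yields only A-Z and 2-7, which plain text search handles well.
    return base64.b32encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    # Made-up peak list, used only to show the call.
    print(spectral_key([128.4, 77.0, 21.3]))
```

A key of this kind only supports exact matching: two shifts that fall on opposite sides of a bin boundary produce completely different keys, so a real similarity search needs more than this sketch provides.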

Feel free to test it and post your feedback here!

4 comments:

Anonymous said...

good, good! keep it up! XD

Anonymous said...

Promising approach and VERY fast ! (hko)

Wolfgang Robien said...

(1) There was an interesting remark about utilization of multiplicity information. For details please see:

http://depth-first.com
see article about 'pherobase' and the comment-section

(2) The search engine has been used about 150 times in the few days since my announcement, but I have received only 4 pieces of feedback (2 comments here, 1 comment on depth-first, 1 personal email). I hope to get more.

Anonymous said...

interesting.
How accurate does the input need to be? One decimal place?
Different solvents?
Keep up the good work!