CSEARCH NMR-Database

Thursday, April 8, 2010

CNMR-Predictions free of charge

Chemspider has launched C13-NMR predictions using the NMRSHIFTDB-database system. An (again critical) evaluation can be found at

http://nmrpredict.orc.univie.ac.at/chemspider_nmrshiftdb.html

I strongly recommend taking a textbook on carbon NMR and rerunning the examples I have shown on my webpage - otherwise you won't believe it !

Before somebody starts a 'blog-war', please convince me that the thousands of literature citations claiming that a terminal CH3 in an alkyl chain resonates at 14 ppm are wrong - I will immediately publish an "Erratum" here.

In order to make the superior CSEARCH-technology available free of charge, an interface has been set up at

http://nmrpredict.orc.univie.ac.at/c13robot/robot.php

Every request is processed twice for your information - give it a try and enjoy yourself !

Stay tuned - more to come ! Wolfgang Robien

Saturday, May 9, 2009

Quality Analysis of Journals with respect to their C13-NMR Data

I have analyzed the combined CSEARCH and SPECINFO - collections, which are commercially available via http://chemgate.emolecules.com and as NMRPREDICT via http://www.modgraph.co.uk/ , with respect to the quality of the underlying data and their origin in a large variety of journals.

The result of this investigation has been reviewed in 'Trends in Analytical Chemistry' and is available here:

http://dx.doi.org/10.1016/j.trac.2009.03.012

I hope that this paper promotes the integration of spectral similarity searches and spectrum prediction engines into the peer-reviewing process. The largest NMR-database available is the combined CSEARCH+SPECINFO collection named NMRPredict (available from MODGRAPH, see http://www.modgraph.co.uk/ ), which currently holds some 450,000 spectra, with massive upgrades expected in the near future. Details about the ongoing data-extraction process can be found at:

http://nmrpredict.orc.univie.ac.at/csearchlite/update.htm

The integration of the CSEARCH-based prediction engines into data processing programs is also possible - the available DLL fits into MESTRENOVA ( http://www.mestrec.com/ ) and into Bruker's TopSpin-program ( http://www.bruker-biospin.com/topspin.html ).

Stay tuned - more to come, despite my 'impressive office', which can be visited at http://nmrpredict.orc.univie.ac.at/csearchlite/Home_of_CSEARCH.html ;-))

Wolfgang Robien

Wednesday, December 3, 2008

SymyxDraw 3.1 and InChIKeys:

The new Symyx Draw 3.1, which can be downloaded free of charge for non-commercial purposes from their website, includes a nice feature that allows WEB-searches directly from your drawing using the InChIKey.

In the menu 'Chemistry --> Inchi settings' you have a line named 'WebBrowserURL:'.
The default value here is: http://www.google.com/search?q=%INCHI%
This performs a search using 'Google' as search engine for the variable '%INCHI%', which will be substituted by the actual string calculated from your drawing.
This command performs a search over all indexed webpages containing this particular string. If you are more specifically interested in available NMR-data for your drawn molecule, simply replace the line by the following:

http://www.google.com/search?q=site:nmrpredict.orc.univie.ac.at+%INCHI%

This command performs a site-specific search over all my indexed INCHIKEY-pages, which link to the NMR-data ( follow the links to the supplier OR recall the data directly by clicking the 'WEBCSEARCH'-Logo - the NMRShiftDB data are already online, the other data are under installation).
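The placeholder substitution described above can be sketched in a few lines of Python. This is only an illustration: how Symyx Draw performs the substitution internally is not documented here, the function name build_search_url is made up, and the URL-quoting step is an assumption. The two templates are the ones quoted above; the sample key is the standard InChIKey of ethanol.

```python
from urllib.parse import quote

# The two 'WebBrowserURL:' values quoted in this post.
DEFAULT_TEMPLATE = "http://www.google.com/search?q=%INCHI%"
SITE_TEMPLATE = (
    "http://www.google.com/search?q="
    "site:nmrpredict.orc.univie.ac.at+%INCHI%"
)


def build_search_url(template: str, identifier: str) -> str:
    """Replace the %INCHI% placeholder by the URL-quoted identifier.

    Hypothetical helper: it only mimics what the drawing program is
    assumed to do with the template string.
    """
    return template.replace("%INCHI%", quote(identifier, safe=""))


# Sample identifier (ethanol), used purely for illustration.
sample_key = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

print(build_search_url(DEFAULT_TEMPLATE, sample_key))
print(build_search_url(SITE_TEMPLATE, sample_key))
```

An InChIKey contains only letters and hyphens, so the quoting step leaves it unchanged; it only matters if a full InChI string (with '/' and '=') is substituted instead.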

Technical hint: I strongly recommend clicking 'Edit --> Select all' in order to force Symyx Draw to calculate the strings from your complete molecule - when you select only a part, only this part will be used to calculate the descriptors.

Hope you enjoy ! Your feedback is highly appreciated !

Friday, May 2, 2008

Misassigned NMR-Spectra

Today I have put online my list of literature citations that are suspected to have at least one assignment error within their NMR-data.
This list has been generated by searching for the string 'reassign' within the REMARK-field. Every data modification is documented by 'Operator/Date/Reason' within the remark. This long list holds 2,388 literature citations with more than 6,000 spectra, which have been reassigned during data-input and/or during our correction cycles. For details see
http://nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html
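The way such a list can be generated is easy to sketch. The following Python fragment is not the CSEARCH code; the record layout (SpectrumRecord with the fields citation and remark) and the sample entries are invented for illustration. Only the idea - a substring search for 'reassign' in the remark and the 'Operator/Date/Reason' pattern - is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class SpectrumRecord:
    citation: str  # hypothetical field: literature reference of the entry
    remark: str    # hypothetical field: free-text REMARK of the entry


def find_reassigned(records):
    """Return (sorted distinct citations, number of spectra) for all
    records whose remark contains the string 'reassign'."""
    hits = [r for r in records if "reassign" in r.remark.lower()]
    citations = sorted({r.citation for r in hits})
    return citations, len(hits)


# Invented sample data following the 'Operator/Date/Reason' pattern.
sample = [
    SpectrumRecord("Ref A", "XY/2008-05-02/C-3 and C-5 reassigned"),
    SpectrumRecord("Ref A", "XY/2008-05-02/Reassignment of C-7"),
    SpectrumRecord("Ref B", "XY/2008-04-11/typo in shift value corrected"),
]

citations, n_spectra = find_reassigned(sample)
print(citations, n_spectra)
```

One citation can contribute several spectra, which is why the number of spectra in the list is larger than the number of citations.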

In the case of NMR-spectroscopy, data-curation is extremely important, because many assignments are based on comparison with reference material ..... imagine what happens when your reference material is wrong ? Your assignment is wrong too, but now with better statistical parameters ! Great, isn't it ?

Thursday, April 17, 2008

Spectral Similarity Search Engine using PUBCHEM-Structures

The final version of my 'Spectral Similarity Search Engine' is now online. It holds approx. 3 billion structure-spectrum pairs derived from 70,000,000 PUBCHEM-Structures & Compounds (including the Chemspider structures). The average search time is somewhere between 1 and 2 seconds for 3 billion structure-spectrum pairs - not that bad for approx. 250 GB of data !

Feel free to test it: http://nmrpredict.orc.univie.ac.at/case/propose.php

Technical description: http://nmrpredict.orc.univie.ac.at/csearch_summary/strpro.html

The description of a C-NMR spectrum using a 15-character 'Spectral Hashkey', as developed for this project, allows indexing of spectral patterns in the same way as InChIKeys do for structures. Therefore spectral patterns can be searched using the usual text-based search engines like Google. Using this technique it is possible to put the correlation between a spectral pattern and its associated chemical structure on the web AND to search for it - you might be surprised how many authors 'sell' the same spectrum with different structural proposals to the community WITHOUT referencing their wrong original proposal ! With this technique we have a tool in hand which allows us to answer the question 'Has somebody already seen this spectrum and which structure has been associated with it ?' On the other hand, this technique can be used to get structure proposals for a query peak table.
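To make the idea of a text-searchable key more concrete, here is a toy sketch in Python. It is NOT the 'Spectral Hashkey' algorithm of this project, which is described on the technical page linked above; the binning of the shift values, the use of SHA-256 and the function name toy_spectral_key are all assumptions chosen only to show how a peak list can be turned into a fixed-length string.

```python
import hashlib


def toy_spectral_key(shifts_ppm, bin_width=1.0, length=15):
    """Toy key for a peak list (illustration only, not the real hashkey).

    The shifts are rounded into bins of 'bin_width' ppm and sorted, so
    that the peak order and small deviations do not change the key.
    """
    bins = sorted(round(shift / bin_width) for shift in shifts_ppm)
    text = ",".join(str(b) for b in bins)
    digest = hashlib.sha256(text.encode("ascii")).hexdigest()
    return digest[:length].upper()


# Two peak lists of the 'same' spectrum, reported slightly differently.
key_1 = toy_spectral_key([14.1, 22.7, 31.9])
key_2 = toy_spectral_key([31.9, 14.3, 22.6])
print(key_1, key_1 == key_2)
```

Once every spectrum carries such a string, an ordinary text search engine can index it like any other word. The toy version has an obvious weakness: two peaks on opposite sides of a bin boundary give different keys, so a real similarity search needs a more careful encoding.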

Feel free to test and post your feedback here !

Saturday, March 29, 2008

Revision of Assignment

On my webpage http://nmrpredict.orc.univie.ac.at/csearchlite/NMRSHIFTDB_March_2008.html I have proposed a reassignment of 10 signals (out of 16 !) based solely on spectrum prediction using CSEARCH, even though the original assignment was done by means of HH-COSY, HMQC and HMBC. I am glad that these proposals have been fully integrated into NMRShiftDB, obviously after extensive verification by another program according to the protocols of my web-server.

The statement 'This was possible, because the data are open' is definitely wrong - within a more professional system such a wrong entry would never be able to step from the 'purgatory database' into the 'production database.' The detailed analysis can be found on the webpage given above.

Tuesday, March 25, 2008

My New Office

About 2 weeks ago I had to move into my new office, which will be the "Home of CSEARCH" for approximately one year.

Stop by (at least virtually) and enjoy !

http://nmrpredict.orc.univie.ac.at/csearchlite/Home_of_CSEARCH.html

Your comments are welcome !

Tuesday, March 4, 2008

Basic Misinterpretations of NMR-Data

I have created a series of pages on my webserver dealing with misinterpretations, typos and any other type of error within NMR-data. I am definitely not talking about errors below 10 ppm - I am talking about errors that can easily be detected by applying appropriate computer algorithms using a few seconds of CPU-time.

At the moment 2 examples are online - I promise 'More to come' ! - Stay tuned, check back !

http://nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html

In order to do a serious job I have to cite every erroneous paper I find during my daily work - BUT I don't want to blame anybody personally. On the other hand I think it's necessary to analyze the quality of available NMR-data, because this is the basis for solving future structure elucidation problems ! Keep in mind what is necessary to perform this task: state-of-the-art algorithms for automatic data-checking with an underlying database of highly verified spectra AND the largest CNMR-database available (despite its size of more than half a million C-spectra it is still incomplete).

Wednesday, February 27, 2008

Proton Prediction

A nice article on proton prediction can be found in the latest issue of Spectroscopy Europe

http://www.spectroscopyeurope.com/TD_20_1.pdf

A few more links summarizing where this new development has already been integrated can be found at

http://nmrpredict.orc.univie.ac.at/

Saturday, February 9, 2008

InChIKey Resolver

Tony Williams ('Chemspiderman') posted an interesting article on his weblog at http://www.chemspider.com/blog/we-need-an-inchikey-resolver-and-we-need-it-now.html

dealing with the 'translation' of an InChIKey back to a structural diagram via a 'lookup-service'.

I like Tony's idea of an InChIKey-resolver and I would like to support it. The only questions/remarks I have deal with the efficiency of such a process in our world of 'parallel systems'.

A few facts first:

At the moment approx. 33 million organic compounds are known. Chemspider holds approx. 21M, PUBCHEM-Compounds approx. 18M structures, which represents 2/3 of known chemistry. I know that within CHEMSPIDER structure correction is an ongoing process, as it is e.g. within my own CSEARCH-project. NMRShiftDB has been substantially improved over the last months, etc.

Now all these systems exchange structures ..... 'A' gets 10,000 structures from 'B', 'A' does some corrections and gives its structures to 'C'. 'B' doesn't know about A's corrections and also gives its structures to 'C'. Now 'C' has 2 "versions" of the same structure - in principle you can ignore that for an InChIKey-resolver, but the situation is much more complicated, because CHEMSPIDER, PUBCHEM, etc. have dozens of contributors.
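The exchange between 'A', 'B' and 'C' can be played through in a small Python sketch. Everything in it is invented for illustration: the compound name, the simplified structure strings and the helper contribute are assumptions and do not describe how any of the real repositories store or merge their data.

```python
from collections import defaultdict


def contribute(repository, contributor, records):
    """Add a contributor's 'name -> structure' records to a repository,
    remembering who delivered which version of a structure."""
    for name, structure in records.items():
        repository[name][structure].add(contributor)


# 'B' holds the original (wrong) drawing of a compound.
b_records = {"compound X": "R-O-O-H"}

# 'A' gets the structures from 'B' and corrects one of them.
a_records = dict(b_records)
a_records["compound X"] = "R-C(=O)-O-H"

# 'C' receives the structures from both 'A' and 'B'.
c_repository = defaultdict(lambda: defaultdict(set))
contribute(c_repository, "A", a_records)
contribute(c_repository, "B", b_records)

# 'C' now holds two "versions" of the same compound.
versions = {s: sorted(c) for s, c in c_repository["compound X"].items()}
print(versions)
```

With only three systems the duplicate is easy to spot; with dozens of contributors exchanging data in all directions, nobody can tell any more which version is the corrected one.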

I definitely understand data-curation, and I know that data-curation is sometimes work for a 'Sherlock Holmes', because the experimental parts of publications (and especially NMR-assignments) tend to be cryptic. We have a lot of systems in parallel, with everybody doing his/her job seriously and spending a lot of time on data-curation. What we need is not 10, 20, maybe 100 structure repositories - each of them is incomplete (see above). What we really need is ONE SINGLE STRUCTURE REPOSITORY ( we live on only ONE PLANET ! ) - now I also put my kevlar vest on and lay 300 feet of landmines around my house - we have it, it's CAS ! Sorry to say so, but this is the most complete one. When you are interested in a specific structure and you don't find it in Chemspider, emolecules, Pubchem, etc. - what does this answer tell you ? It simply tells you that it is not stored - it DOES NOT TELL you that IT DOES NOT EXIST ! I am quite sure I will be (hopefully only virtually) beaten by the community for this statement, but please keep in mind the relationship between 'new things' (algorithms, data, new procedures, etc.) and 'data-curation' when hosting a large database. The 'curation-effort' doesn't increase linearly with the size of the database - it's at least a quadratic relationship.

What we need is ONE, CENTRALIZED place for structures and 'retrieval functionality' (including this InChIKey-resolver), which covers the COMPLETE KNOWN CHEMISTRY and NOT hundreds of incomplete and heavily overlapping installations. Let me know when I can take off my kevlar vest ;-))

An example to show that this (highly desirable) curation-process leads to a lot of confusion:

Globostellatic acid F: was drawn with C-O-O-H (hydroperoxide) instead of a carboxyl group in NMRSHIFTDB -> the data went to PUBCHEM ( CID: 15938977 / original NMRSHIFTDB-number was 22047)

Within NMRShiftDB this entry has been corrected: NMRSHIFTDB-number 20093989 and went again to PUBCHEM: CID=11526176

Do a search on PUBCHEM for the name 'globostellatic' - you end up with 2 'globostellatic acid F' structures, one is correct, the other is a hydroperoxide instead of an acid. It's simply applied error-propagation ...... like in school, when you peek at your neighbor's work. When you copy it perfectly, you are consistent, but your examination might also be completely wrong when your neighbor fails. In chemistry we have a more technical term - it's called 'citation'.