Wednesday, December 3, 2008
In the new Symyx Draw 3.1, which can be downloaded free of charge for non-commercial purposes from their website, there is a nice new feature: it allows web searches directly from your drawing using the InChIKey.
In the menu 'Chemistry --> Inchi settings' you will find a line named 'WebBrowserURL:'.
The default value here is: http://www.google.com/search?q=%INCHI%
This performs a Google search for the variable '%INCHI%', which is substituted by the actual string calculated from your drawing.
This command searches all indexed webpages containing this particular string. If you are more specifically interested in available NMR-data for your drawn molecule, simply replace the line by the following:
http://www.google.com/search?q=site:nmrpredict.orc.univie.ac.at+%INCHI%
This command performs a site-specific search over all my indexed InChIKey pages, which link to the NMR-data (follow the links to the supplier OR recall the data directly by clicking the 'WEBCSEARCH' logo - the NMRShiftDB data are already online, the other data are being installed).
Technical hint: I strongly recommend clicking 'Edit --> Select all' in order to force Symyx Draw to calculate the strings from your complete molecule. When you select only a part, only this part will be used to calculate the descriptors.
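To illustrate what the 'WebBrowserURL' setting does, here is a minimal Python sketch of the same placeholder substitution done outside Symyx Draw. The function name build_search_url and the URL-encoding step are assumptions made for illustration; whether Symyx Draw itself encodes the string before opening the browser is not documented here.

```python
from urllib.parse import quote

# The two templates quoted in the post.
DEFAULT_TEMPLATE = "http://www.google.com/search?q=%INCHI%"
SITE_TEMPLATE = (
    "http://www.google.com/search?q=site:nmrpredict.orc.univie.ac.at+%INCHI%"
)


def build_search_url(template: str, inchi_string: str) -> str:
    """Replace the %INCHI% placeholder with the (URL-encoded) string.

    Hypothetical helper: it only mimics the substitution described above.
    """
    # Encoding is an assumption; a plain InChIKey (letters and a hyphen)
    # passes through unchanged, while a full InChI string gets escaped.
    return template.replace("%INCHI%", quote(inchi_string, safe=""))


if __name__ == "__main__":
    # Placeholder key in the style used elsewhere on this blog, not a real compound.
    print(build_search_url(SITE_TEMPLATE, "ABCDEFGHIJKLMN-ABCDEFGHIJ"))
```

With the site-specific template, only the pages on nmrpredict.orc.univie.ac.at are searched for the key.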
Hope you enjoy ! Your feedback is highly appreciated !
Friday, May 2, 2008
Misassigned NMR-Spectra
This list has been generated by searching the string 'reassign' within the REMARK-field. Every data modification is documented by 'Operator/Date/Reason' within the remark. This long list holds 2,388 literature citations with more than 6,000 spectra, which have been reassigned either during data-input and/or during our correction cycles. For details see
http://nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html
In the case of NMR-spectroscopy, data-curation is extremely important, because many assignments are based on comparison with reference material ... Imagine what happens when your reference material is wrong ? Your assignment is wrong too, but now with better statistical parameters ! Great, isn't it ?
Thursday, April 17, 2008
Spectral Similarity Search Engine using PUBCHEM-Structures
Feel free to test it: http://nmrpredict.orc.univie.ac.at/case/propose.php
Technical description: http://nmrpredict.orc.univie.ac.at/csearch_summary/strpro.html
The description of a C-NMR spectrum using a 15-character 'Spectral Hashkey', as developed for this project, allows indexing of spectral patterns in the same way InChIKeys allow indexing of structures. Therefore spectral patterns can be searched using the usual text-based search engines like Google. Using this technique it is possible to put the correlation between a spectral pattern and its associated chemical structure on the web AND to search for it - you might be surprised how many authors 'sell' the same spectrum with different structural proposals to the community WITHOUT referencing their wrong original proposal ! With this technique we have a tool in hand which allows us to answer the question 'Has somebody already seen this spectrum, and which structure has been associated with it ?' On the other hand, this technique can be used to get structure proposals for a query peak table.
Saturday, March 29, 2008
Revision of Assignment
The statement 'This was possible, because the data are open' is definitely wrong - within a more professional system such a wrong entry would never be able to step from the 'purgatory database' into the 'production database.' The detailed analysis can be found on the webpage given above.
Tuesday, March 25, 2008
My New Office
Stop by (at least virtually) and enjoy !
http://nmrpredict.orc.univie.ac.at/csearchlite/Home_of_CSEARCH.html
Your comments are welcome !
Tuesday, March 4, 2008
Basic Misinterpretations of NMR-Data
At the moment 2 examples are online - I promise 'More to come' ! - Stay tuned, check back !
http://nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html
In order to do a serious job I have to cite every erroneous paper I find during my daily work - BUT I don't want to blame anybody personally. On the other hand I think it's necessary to analyze the quality of available NMR-data, because this is the basis for solving future structure elucidation problems ! Keep in mind what is necessary to perform this task: state-of-the-art algorithms for automatic data-checking with an underlying database of highly verified spectra AND the largest CNMR-database available (despite its size of more than half a million C-spectra it is still incomplete).
Wednesday, February 27, 2008
Proton Prediction
http://www.spectroscopyeurope.com/TD_20_1.pdf
A few more links summarizing where this new development has already been integrated can be found on
http://nmrpredict.orc.univie.ac.at/
Saturday, February 9, 2008
InChIKey Resolver
At the moment approx. 33 million organic compounds are known. Chemspider holds approx. 21M, PUBCHEM-Compounds approx. 18M structures, which represents roughly 2/3 of known chemistry. I know that within CHEMSPIDER structure correction is an ongoing process, as it is e.g. within my own CSEARCH-project. NMRShiftDB has been substantially improved over the last months, etc.
Tuesday, February 5, 2008
Spectral Searching on PUBCHEM-Structures
Some background information:
- C-NMR spectra have been calculated using CSEARCH-NN-technology
- Spectral search technique is based on SAHO as implemented into CSEARCH
- The main intention of this system is to get a feeling for the compound class of an unknown. It must be clearly stated that a database of 5M unique structures is definitely too small to cover known organic chemistry (approx. 33M at 02/2007). When taking into account the possible structures for a given molecular formula, 5M structures represent a negligible part of possible organic chemistry !
In the meantime there were massive updates on PUBCHEM - this was the reason for rerunning the predictions and implementing another (much faster) search technique. The principle is still based on Wolfgang Bremser's SAHO-technique, but the speed has been increased to allow searching of 1 billion (10**9) CNMR-spectra within less than 3 seconds on a single CPU. At the moment the system is only partly installed and allows searching of 405,704,611 spectral patterns (usually in 1.2-1.6 seconds).
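For readers who want to relate the two timing figures above, here is a small back-of-the-envelope calculation in Python. It assumes that search time scales linearly with the number of spectral patterns, which is only an assumption for this sketch and not a statement about how the search technique actually behaves; the function names are likewise chosen just for illustration.

```python
# Figures quoted above.
INSTALLED_PATTERNS = 405_704_611      # spectral patterns currently searchable
FAST_SECONDS, SLOW_SECONDS = 1.2, 1.6 # usual search time range
TARGET_PATTERNS = 10**9               # design target: 1 billion spectra


def throughput(patterns: int, seconds: float) -> float:
    """Spectral patterns scanned per second."""
    return patterns / seconds


def projected_seconds(target: int, patterns: int, seconds: float) -> float:
    """Search time for `target` patterns, ASSUMING purely linear scaling."""
    return target * seconds / patterns


if __name__ == "__main__":
    for secs in (FAST_SECONDS, SLOW_SECONDS):
        rate = throughput(INSTALLED_PATTERNS, secs)
        full = projected_seconds(TARGET_PATTERNS, INSTALLED_PATTERNS, secs)
        print(f"{secs:.1f} s now -> {rate / 1e6:.0f}M patterns/s, "
              f"about {full:.2f} s for 10**9 patterns")
```

Under this linear assumption, the faster end of the measured range extrapolates to just under 3 seconds for 10**9 patterns, and the slower end to just under 4 seconds.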
Key features:
- PREDICTED CNMR-spectra for approx. 23M unique structures downloaded from PUBCHEM using CSEARCH-NN-technology
- Structures deposited from CHEMSPIDER are already included
- Intention is again to give some flavour of possible compound classes for an unknown
A detailed description of the search-technique will be given soon - stay tuned !
Another nice feature of this system: Whenever an experimental set of NMR-data is available within CSEARCH / SPECINFO / NMRPredict / NMRShiftDB / CHEMGATE, this information is automatically included in the resulting table of structures !
Feel free to test it ! The URL is
http://nmrpredict.orc.univie.ac.at/case/propose.php
Your feedback is highly appreciated - use the comment section !
Thursday, January 31, 2008
NMRPredict as robot-referee
One of many possible applications of a program like NMRPredict is structure verification. An excellent example comes from the debate on 1,7-Diaza[12]annulenes, which have been shown by Manfred Christl to be well-known pyridinium salts. A simple spectral similarity search using NMRPredict - applied either by the authors of the 2 papers (Angew.Chem. & Org.Lett.) or by the referees - would have shown that these spectral data have been known since 1980. A detailed analysis including screen dumps can be found on:
http://nmrpredict.orc.univie.ac.at/csearchlite/Annulenes_or_Pyridines.html
Wednesday, January 30, 2008
Prediction of H1-NMR Spectra
I am proud that I could apply the 'Best'-technology, which has already been successfully implemented in the HOSE-code and NN-based prediction engines for C13, to the H1-prediction module, giving an average deviation of 0.18ppm on a testset of 90,000 well-assigned proton-spectra provided by Wiley. It was a great pleasure for me to work together with Ernö and Ray on this subject. We all know that there is room for further improvements - the corresponding concepts are already there and are waiting for implementation and subsequent testing.
For detailed information have a look at:
http://www.modgraph.co.uk/best_proton_press_release.htm
Real-world structure verification examples can be found on the MESTRELAB RESEARCH webpage:
http://www.mestrec.com/recursos.php?idr=54&i18n=1
http://www.mestrec.com/recursos.php?idr=55&i18n=1
Tuesday, January 8, 2008
NMR-Spectral Data and InChIKeys
Sunday, January 6, 2008
Could someone explain to me
I am now surprised that the NMRShiftDB-collection ( http://nmrshiftdb.org/ ) increased by only 8 structures within 7 weeks ( from Nov 18th, 2007 to Jan 6th, 2008 ), when OSCAR-3 is around, which allows automatic extraction of NMR-data from articles ?! For legal reasons only the automatic extraction of data from OA-journals seems to be possible, which reduces the number of available data. Therefore I simply want to see ONE SINGLE FULLY ASSIGNED C-NMR spectrum which has been AUTOMATICALLY EXTRACTED by OSCAR-3 from the chemical literature.
A corresponding question has been deposited at Peter Murray-Rust's Weblog - I hope to get an answer. Check back, I'll keep you up-to-date.
My questions can be found on
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=916#comments
Just for your comparison:
The increase of spectra within CSEARCH can be found here - without OSCAR-3 support ;-))
CSEARCH Data and InChIKeys
Within the CSEARCH-environment all structures have been converted into InChIKeys, and together with my data-exchange protocols dating back to the late '80s a collection of links has been built. For a specific two-dimensional molecular topology, each page summarizes all systems where the corresponding C13/O17/N15/F19/P31/B11/Si29-spectrum is available.
The following systems have been indexed:
CSEARCH including upcoming data
SPECINFO
CHEMGATE
NMRPredict
NMRPredict ONLINE
KnowItAll
KnowItAll Anywhere
NMRShiftDB
University of Mainz, In-house database
A total number of nearly half a million spectra from approx. 350,000 different structures has been indexed in a systematic way. The pages have already been crawled by the most important search engines.
How to make use of this 'portal of existing NMR-spectral information' ?
Generate the InChIKey for your query structure, e.g. "ABCDEFGHIJKLMN-ABCDEFGHIJ"; now take the first 14 characters (before the hyphen!) and construct the following URL:
http://nmrpredict.orc.univie.ac.at/inchikey/ABCDEFGHIJKLMN.html
Request the corresponding page; in case this particular 'structure family' (=two-dimensional molecular topology) has C/O/N/F/P/B/Si NMR-data available in one of the above listed systems, you will get a list of links to them. In case there is no NMR-spectral information available, your http-request will be answered by a 'Page not found (404)' - error.
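The lookup described above can be sketched in a few lines of Python. The function names resolver_url and has_nmr_data are hypothetical, the input check assumes the 14-character first block consists of uppercase letters only, and the network call assumes the pages are still served under the URL pattern given above.

```python
import urllib.error
import urllib.request

# URL pattern taken from the description above.
BASE_URL = "http://nmrpredict.orc.univie.ac.at/inchikey/"


def resolver_url(inchikey: str) -> str:
    """Build the page URL from the first 14 characters of an InChIKey."""
    first_block, hyphen, _rest = inchikey.strip().partition("-")
    looks_valid = (
        bool(hyphen)
        and len(first_block) == 14
        and first_block.isascii()
        and first_block.isalpha()
        and first_block.isupper()
    )
    if not looks_valid:
        raise ValueError(f"not a usable InChIKey: {inchikey!r}")
    return f"{BASE_URL}{first_block}.html"


def has_nmr_data(inchikey: str, timeout: float = 10.0) -> bool:
    """Return True if the page exists, False on a 404 (no NMR-data indexed)."""
    try:
        with urllib.request.urlopen(resolver_url(inchikey), timeout=timeout) as page:
            return page.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is not a "no data" answer


if __name__ == "__main__":
    # Placeholder key from the text above, not a real compound.
    print(resolver_url("ABCDEFGHIJKLMN-ABCDEFGHIJ"))
```

Because only the first block of the key is used, all stereoisomers of one two-dimensional topology resolve to the same page.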
Saturday, January 5, 2008
Stereochemistry
Within our 2-step process of integrating new data, the second step consists mainly of adding stereochemical information to each record. During the past few days, stereochemical information has been added to some 2,000 records within CSEARCH. A detailed summary of the data-upgrades can be found on http://nmrpredict.orc.univie.ac.at/csearchlite/update.htm
CSEARCH NMR-Database
1) Information about new algorithms and spectral search techniques
2) Information about new data available
3) Information about new activities on my web-server 'nmrpredict.orc.univie.ac.at'
Your feedback is highly appreciated ! Hopefully a fruitful discussion can be started here !