Thursday, January 31, 2008

NMRPredict as robot-referee

As is well known within the NMR community, NMRPredict uses CSEARCH technology for predicting and searching X-nuclei spectra. The underlying databases consist of the combined collections of CSEARCH and SPECINFO.

One of the many possible applications of a program like NMRPredict is structure verification. An excellent example comes from the debate on 1,7-Diaza[12]annulenes, which have been shown by Manfred Christl to be well-known pyridinium salts. A simple spectral similarity search using NMRPredict, applied either by the authors of the 2 papers (Angew.Chem. & Org.Lett.) or by the referees, would have shown that these spectral data have been known since 1980. A detailed analysis including screen dumps can be found at:

http://nmrpredict.orc.univie.ac.at/csearchlite/Annulenes_or_Pyridines.html

Wednesday, January 30, 2008

Prediction of H1-NMR Spectra

The prediction of H1-NMR spectra within MODGRAPH's NMRPredict program is based on the algorithms developed by Ray Abraham's and Ernö Pretsch's groups. Both techniques perform excellently on their own, but a combination of these methods gives superior results.

I am proud that I could apply the 'Best' technology, which has already been successfully implemented in the HOSE-code and NN-based prediction engines for C13, to the H1-prediction module, giving an average deviation of 0.18 ppm on a test set of 90,000 well-assigned proton spectra provided by Wiley. It was a great pleasure for me to work together with Ernö and Ray on this subject. We all know that there is room for further improvement; the corresponding concepts are already there, waiting for implementation and subsequent testing.
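For readers who want to run a comparable check on their own data, here is a minimal sketch of how an average deviation can be computed. It assumes that "average deviation" means the mean absolute difference between predicted and experimental shifts; the function name and the three shift values are purely illustrative and are not taken from the Wiley test set.

```python
from typing import Sequence


def average_deviation(predicted: Sequence[float],
                      experimental: Sequence[float]) -> float:
    """Mean absolute difference (in ppm) between two equally long shift lists.

    Assumption: 'average deviation' is read here as the mean absolute error.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("both lists must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)


# Made-up H1 shifts (ppm), for illustration only.
predicted_shifts = [7.30, 3.70, 1.20]
experimental_shifts = [7.26, 3.65, 1.25]
print(f"{average_deviation(predicted_shifts, experimental_shifts):.3f} ppm")
```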

For detailed information, have a look at:
http://www.modgraph.co.uk/best_proton_press_release.htm

Real-world structure verification examples can be found on the MESTRELAB RESEARCH webpage:
http://www.mestrec.com/recursos.php?idr=54&i18n=1
http://www.mestrec.com/recursos.php?idr=55&i18n=1

Tuesday, January 8, 2008

NMR-Spectral Data and InChIKeys

Within my ongoing project to create a portal for existing NMR-spectral data, another set of some 90,000 structures has been made available to me. They will be processed within the next 2 weeks, and my collection of links will be updated accordingly. Afterwards, more than 500,000 spectra from some 400,000 different structures will be available. Feel free to generate requests automatically using utilities like 'curl' or 'wget'. Please be so kind as to restrict your requests to fewer than 100 per day!
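If you prefer a script over 'curl' or 'wget', the daily limit can be respected by spacing the requests evenly. The following is only a sketch: the helper name 'throttled' and its parameters are my own invention for illustration, and the default of 99 is simply one way to stay below 100 requests per day.

```python
import time
from typing import Callable, Iterable, Iterator

SECONDS_PER_DAY = 24 * 60 * 60


def throttled(items: Iterable[str],
              max_per_day: int = 99,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield items one by one, pausing between them.

    The pause is chosen so that roughly max_per_day items are released
    per 24 hours. The sleep function is injectable so the helper can be
    tested without actually waiting.
    """
    if max_per_day < 1:
        raise ValueError("max_per_day must be at least 1")
    pause = SECONDS_PER_DAY / max_per_day
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause)  # wait before every request except the first
        yield item
```

A loop such as 'for key in throttled(my_keys): ...' would then issue one request about every 15 minutes; the request itself is left to whatever tool you already use.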

Sunday, January 6, 2008

Could someone explain to me

I have found an article on "Open Data in Science" by Peter Murray-Rust (it can be downloaded from http://www.dspace.cam.ac.uk/handle/1810/194890 ), in which 'OSCAR-3', a tool for extracting data from the chemical literature, is mentioned in the context of C-NMR spectroscopy.

I am surprised that the NMRShiftDB collection ( http://nmrshiftdb.org/ ) grew by only 8 structures within 7 weeks (from Nov 18th, 2007 to Jan 6th, 2008) while OSCAR-3, which allows automatic extraction of NMR data from articles, is around?! For legal reasons, automatic extraction seems to be possible only from OA journals, which reduces the amount of available data. Therefore I simply want to see ONE SINGLE FULLY ASSIGNED C-NMR spectrum which has been AUTOMATICALLY EXTRACTED by OSCAR-3 from the chemical literature.

A corresponding question has been posted on Peter Murray-Rust's weblog; I hope to get an answer. Check back, I'll keep you up to date.

My questions can be found at
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=916#comments

Just for comparison:
The increase of spectra within CSEARCH can be found here - without OSCAR-3 support ;-))

CSEARCH Data and InChIKeys

InChIKeys are an excellent tool for identical-structure searches on the web using the usual search engines like Google, Yahoo, MSN, etc. The architecture of the InChIKey allows searching for two-dimensional topologies when only the first 14 characters are used, whereas the full InChIKey additionally covers features like stereochemistry.
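To illustrate the difference between the two levels of matching, here is a small sketch that groups InChIKeys by their first block. The keys are placeholders rather than real compounds, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def connectivity_block(inchikey: str) -> str:
    """Return the part of the InChIKey before the first hyphen."""
    return inchikey.split("-", 1)[0]


def group_by_topology(inchikeys: Iterable[str]) -> Dict[str, List[str]]:
    """Collect full InChIKeys under their shared first block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in inchikeys:
        groups[connectivity_block(key)].append(key)
    return dict(groups)


# Placeholder keys: the first two share a topology but differ in the
# second block (e.g. stereochemistry); the third is a different topology.
keys = [
    "ABCDEFGHIJKLMN-ABCDEFGHIJ",
    "ABCDEFGHIJKLMN-ZYXWVUTSRQ",
    "NMLKJIHGFEDCBA-ABCDEFGHIJ",
]
for block, members in group_by_topology(keys).items():
    print(block, len(members))
```

Searching with the full key corresponds to an exact match on one list entry, while searching with the first block returns the whole group.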

Within the CSEARCH environment all structures have been converted into InChIKeys and, together with my data-exchange protocols dating back to the late '80s, a collection of links has been built. Each page summarizes, for a specific two-dimensional molecular topology, all systems in which the corresponding C13/O17/N15/F19/P31/B11/Si29 spectrum is available.

The following systems have been indexed:

CSEARCH including upcoming data
SPECINFO
CHEMGATE
NMRPredict
NMRPredict ONLINE
KnowItAll
KnowItAll Anywhere
NMRShiftDB
University of Mainz, In-house database

In total, nearly half a million spectra from approx. 350,000 different structures have been indexed in a systematic way. The pages have already been crawled by the most important search engines.

How to make use of this 'portal of existing NMR-spectral information'?

Generate the InChIKey for your query structure, e.g. "ABCDEFGHIJKLMN-ABCDEFGHIJ"; now take the first 14 characters (before the hyphen!) and construct the following URL:

http://nmrpredict.orc.univie.ac.at/inchikey/ABCDEFGHIJKLMN.html

Request the corresponding page. If this particular 'structure family' (= two-dimensional molecular topology) has C/O/N/F/P/B/Si NMR data available in one of the systems listed above, you will get a list of links to them. If no NMR-spectral information is available, your HTTP request will be answered with a 'Page not found (404)' error.
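The lookup described above can be sketched in a few lines of Python using only the standard library. The function names are hypothetical; the URL pattern and the meaning of the 404 response are as described in this post. Any error other than 404 is passed on to the caller rather than being interpreted.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

BASE_URL = "http://nmrpredict.orc.univie.ac.at/inchikey/"


def portal_url(inchikey: str) -> str:
    """Build the portal URL from the first 14 characters of an InChIKey."""
    skeleton = inchikey.strip().split("-")[0]
    if len(skeleton) != 14 or not skeleton.isalpha() or not skeleton.isupper():
        raise ValueError(f"unexpected InChIKey format: {inchikey!r}")
    return f"{BASE_URL}{skeleton}.html"


def has_nmr_data(inchikey: str, timeout: float = 10.0) -> bool:
    """Return True if the portal page exists, False if the server answers 404."""
    try:
        with urlopen(portal_url(inchikey), timeout=timeout) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False  # no NMR data indexed for this topology
        raise  # other HTTP errors are not interpreted here


# Building the URL needs no network access:
print(portal_url("ABCDEFGHIJKLMN-ABCDEFGHIJ"))
```

Calling 'has_nmr_data' sends a real request, so please keep the daily request limit in mind when looping over many keys.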

Saturday, January 5, 2008

Stereochemistry

Most of you are aware that stereochemical information was implemented into the CSEARCH data during the early 1990s. Since 1996, stereochemical information can be utilized during spectrum prediction and in different types of searches.

Within our 2-step process of integrating new data, the second step consists mainly of adding stereochemical information to each record. During the past few days, stereochemical information has been added to some 2,000 records within CSEARCH. A detailed summary of the data-upgrades can be found on http://nmrpredict.orc.univie.ac.at/csearchlite/update.htm

CSEARCH NMR-Database

This blog will inform you about new developments within my CSEARCH-project.

1) Information about new algorithms and spectral search techniques
2) Information about new data available
3) Information about new activities on my web-server 'nmrpredict.orc.univie.ac.at'

Your feedback is highly appreciated! Hopefully a fruitful discussion can be started here!