EMI

Empirical Metabolite Identification via GA Feature Selection and Bayes Classification
The interpretation of nuclear magnetic resonance (NMR) experimental results for metabolomics studies requires intensive signal processing and multivariate data analysis techniques. A critical step in the typical workflow is the identification of significant metabolites, typically compiled post hoc. Current techniques rely on manual tuning and are built on databases of pure compound samples, where the experimental conditions are simulated in the laboratory. Herein, we develop a novel metabolite identification algorithm that uses a Bayes classifier with genetic algorithm (GA) feature selection and is built upon empirical spectroscopic data. This approach captures the inherent variability in experimental data while greatly reducing the need to build databases of pure compounds. The ability to annotate spectra by learning patterns within empirical data allows the metabolomics community to use existing datasets to improve and extend our method. The feasibility and accuracy of our algorithm are shown by measuring the specificity (>0.75) and sensitivity (>0.65) on ¹H urine-derived spectroscopic data. A genetic algorithm successfully removes more than 60% of the features without sacrificing accuracy, which is necessary to reduce redundant data and remove irrelevant data in the empirical dataset. This increase in efficiency is critical to extending and improving a community-annotated identification database.
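
To make the idea of GA feature selection wrapped around a Bayes classifier concrete, the sketch below shows one way such a pipeline could look. It is not the project's source code. Everything in it is an assumption made for illustration: the synthetic data (standing in for binned spectra, with a label for whether a metabolite is present), the Gaussian naive Bayes model, the fitness function (balanced accuracy minus a small per-feature penalty), and all names and parameter values.

```python
import math
import random

def make_data(n, n_features, n_informative, rng):
    """Synthetic stand-in for binned spectra (assumption, not real NMR data)."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are always present
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def fit_gnb(X, y, mask):
    """Fit Gaussian naive Bayes using only the features kept by the bit mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    classes = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in cols:
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mean, var))
        classes[c] = (math.log(len(rows) / len(X)), stats)
    return {"cols": cols, "classes": classes}

def predict_gnb(model, x):
    """Return the class with the highest log posterior."""
    best, best_score = None, -math.inf
    for c, (log_prior, stats) in model["classes"].items():
        score = log_prior
        for j, (mean, var) in zip(model["cols"], stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x[j] - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

def sens_spec(model, X, y):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for x, label in zip(X, y):
        pred = predict_gnb(model, x)
        if label == 1:
            tp += int(pred == 1)
            fn += int(pred == 0)
        else:
            tn += int(pred == 0)
            fp += int(pred == 1)
    return tp / (tp + fn), tn / (tn + fp)

def fitness(mask, train, valid, penalty=0.002):
    """Balanced accuracy on validation data, minus a cost per kept feature."""
    sens, spec = sens_spec(fit_gnb(train[0], train[1], mask), *valid)
    return (sens + spec) / 2 - penalty * sum(mask)

def ga_select(train, valid, n_features, rng, pop_size=20, generations=15, p_mut=0.05):
    """Simple GA: elitism, tournament selection, uniform crossover, bit-flip mutation."""
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(m, train, valid) for m in pop]
        order = sorted(range(pop_size), key=lambda i: fits[i], reverse=True)
        next_pop = [pop[order[0]][:], pop[order[1]][:]]  # keep the two best masks

        def tournament():
            return pop[max(rng.sample(range(pop_size), 3), key=lambda i: fits[i])]

        while len(next_pop) < pop_size:
            a, b = tournament(), tournament()
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            next_pop.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = next_pop
    return max(pop, key=lambda m: fitness(m, train, valid))

rng = random.Random(0)
N_FEATURES = 30
train = make_data(120, N_FEATURES, 5, rng)
valid = make_data(80, N_FEATURES, 5, rng)
test = make_data(80, N_FEATURES, 5, rng)

best_mask = ga_select(train, valid, N_FEATURES, rng)
model = fit_gnb(train[0], train[1], best_mask)
sensitivity, specificity = sens_spec(model, *test)
print(f"kept {sum(best_mask)}/{N_FEATURES} features, "
      f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

The per-feature penalty in the fitness function is one simple way to push the search toward smaller feature sets; other formulations, such as a hard cap on the number of features or a multi-objective GA, would serve the same purpose. Sensitivity and specificity are reported on a held-out split so that the selected mask is not evaluated on the data used to choose it.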

Interested in contributing to this project?
This is an open-source project hosted on GitHub. We welcome additional collaboration and contributions. The GitHub page is available here. The source code is being updated frequently, so please check back regularly.