This project aims to develop new statistical machine learning methods for metabolomics data from diverse platforms, including targeted and unbiased/global mass spectrometry (MS), labeled MS experiments for measuring metabolic flux, and Nuclear Magnetic Resonance (NMR) platforms. Unbiased MS and NMR profiling studies result in identifying a large number of unnamed spectra, which cannot be directly matched to known metabolites and are hence often discarded in downstream analyses. The first aim develops a novel kernel penalized regression method for analysis of data from unbiased profiling studies. It provides a systematic framework for extracting the relevant information from unnamed spectra through a kernel that highlights the similarities and differences between samples, and in turn boosts the signal from named metabolites. This results in improved power in identification of named metabolites associated with the phenotype of interest, as well as improved prediction accuracy. An extension of this kernel-based framework is also proposed to allow for systematic integration of metabolomics data from diverse profiling studies, e.g. targeted and unbiased MS profiling technologies.
The second aim provides a formal inference framework for kernel penalized regression and thus complements the discovery phase of the first aim.
The third aim focuses on metabolic pathway enrichment analysis that tests for both orchestrated changes in the activities of steady-state metabolites in a given pathway and aberrations in the mechanisms of metabolic reactions.
The fourth aim of the project provides a unified framework for network-based integrative analysis of static (based on mass spectrometry) and dynamic (based on metabolic flux) metabolomics measurements, thus providing an integrated view of the metabolome and the fluxome. Finally, the last aim implements the proposed methods in easy-to-use open-source software leveraging the R language, the capabilities of the Cytoscape platform and the Galaxy workflow system, thus providing an expandable platform for further developments in the area of metabolomics. The proposed software tool will also provide a plug-in to the Data Repository and Coordination Center (DRCC) data sets, where all regional metabolomics centers supported by the NIH Common Fund Metabolomics Program deposit curated data.
Metabolomics, i.e. the study of small molecules involved in metabolism, provides a dynamic view into processes that reflect the actual physiology of the cell, and hence offers vast potential for detection of novel biomarkers and targeted therapies for complex diseases. However, despite this potential, the development of computational methods for analysis of metabolomics data lags the rapid growth of metabolomics profiling technologies. The current application addresses this need by developing novel statistical machine learning methods for integrative analysis of static and dynamic metabolomics measurements, as well as easy-to-use open-source software to facilitate the application of these methods.