This project aims to develop new statistical machine learning methods for metabolomics data from diverse platforms, including targeted and unbiased (global) mass spectrometry (MS), labeled MS experiments for measuring metabolic flux, and Nuclear Magnetic Resonance (NMR) platforms. Unbiased MS and NMR profiling studies yield a large number of unnamed spectra, which cannot be directly matched to known metabolites and are therefore often discarded in downstream analyses. The first aim develops a novel kernel penalized regression method for analyzing data from unbiased profiling studies. It provides a systematic framework for extracting the relevant information from unnamed spectra through a kernel that captures the similarities and differences between samples, which in turn boosts the signal from named metabolites. This yields greater power to identify named metabolites associated with the phenotype of interest, as well as improved prediction accuracy. An extension of this kernel-based framework is also proposed to allow systematic integration of metabolomics data from diverse profiling studies, e.g., targeted and unbiased MS profiling technologies.
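The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, assuming a generalized-ridge form: a sample-by-sample kernel H, built from the unnamed spectra, weights the squared-error loss for a regression of the phenotype on the named metabolites X, and a penalty matrix Q (identity by default) shrinks the coefficients. The Gaussian kernel, the bandwidth, the loss weighting, and the helper names `gaussian_kernel` and `kernel_penalized_ridge` are hypothetical choices made for illustration, not the proposal's actual method. Setting the gradient of (y - Xb)'H(y - Xb) + lam b'Qb to zero gives the linear system (X'HX + lam Q) b = X'Hy, which the code solves directly.

```python
import numpy as np


def gaussian_kernel(U, bandwidth=1.0):
    """Sample-by-sample similarity matrix from an n x q matrix U.

    U is a hypothetical matrix of unnamed-spectra intensities (one row per
    sample); the Gaussian kernel is an illustrative choice, not the proposal's.
    """
    sq_norms = np.sum(U ** 2, axis=1)
    # Squared Euclidean distances between all pairs of samples.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * U @ U.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def kernel_penalized_ridge(X, y, H, Q=None, lam=1.0):
    """Hypothetical generalized ridge estimate for named-metabolite effects.

    Minimizes (y - X b)' H (y - X b) + lam * b' Q b, whose minimizer solves
    (X' H X + lam Q) b = X' H y. X is n x p (named metabolites), H is an
    n x n sample kernel, and Q is a p x p penalty (identity by default).
    """
    p = X.shape[1]
    if Q is None:
        Q = np.eye(p)
    lhs = X.T @ H @ X + lam * Q
    rhs = X.T @ H @ y
    return np.linalg.solve(lhs, rhs)


# Usage on simulated data (all values are synthetic).
rng = np.random.default_rng(0)
n, p, q = 50, 5, 20
X = rng.normal(size=(n, p))  # stand-in for named metabolites
U = rng.normal(size=(n, q))  # stand-in for unnamed spectra
beta_true = np.array([1.0, -0.5, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

H = gaussian_kernel(U, bandwidth=np.sqrt(q))
beta_hat = kernel_penalized_ridge(X, y, H, lam=0.1)
print(np.round(beta_hat, 2))
```

With H set to the identity matrix the sketch reduces to ordinary ridge regression, which makes it easy to see how the kernel changes the fit: samples that the unnamed spectra deem similar have their residuals coupled in the loss.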
The second aim provides a formal inference framework for kernel penalized regression and thus complements the discovery phase of the first aim.
The third aim focuses on metabolic pathway enrichment analysis that tests both orchestrated changes in the activities of steady-state metabolites in a given pathway and aberrations in the mechanisms of metabolic reactions.
The fourth aim of the project provides a unified framework for network-based integrative analysis of static (based on mass spectrometry) and dynamic (based on metabolic flux) metabolomics measurements, thus providing an integrated view of the metabolome and the fluxome. Finally, the fifth aim implements the proposed methods in easy-to-use open-source software that leverages the R language, the capabilities of the Cytoscape platform, and the Galaxy workflow system, creating an expandable platform for further developments in metabolomics. The software will also provide a plug-in to the Data Repository and Coordination Center (DRCC) data sets, where all regional metabolomics centers supported by the NIH Common Fund's Metabolomics Program deposit curated data.

Public Health Relevance

Metabolomics, i.e., the study of small molecules involved in metabolism, provides a dynamic view of processes that reflect the actual physiology of the cell, and hence offers vast potential for detecting novel biomarkers and developing targeted therapies for complex diseases. Despite this potential, however, the development of computational methods for analyzing metabolomics data lags behind the rapid growth of metabolomics profiling technologies. This application addresses that need by developing novel statistical machine learning methods for integrative analysis of static and dynamic metabolomics measurements, as well as easy-to-use open-source software to facilitate the application of these methods.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ravichandran, Veerasamy
University of Washington
Biostatistics & Other Math Sci
Schools of Public Health
United States
Mathur, Ravi; Rotroff, Daniel; Ma, Jun et al. (2018) Gene set analysis methods: a systematic comparison. BioData Min 11:8
Randolph, Timothy W; Zhao, Sen; Copeland, Wade et al. (2018) Kernel-penalized regression for analysis of microbiome data. Ann Appl Stat 12:540-566
Randolph, Timothy W; Ding, Jimin; Kundu, Madan G et al. (2017) Adaptive penalties for generalized Tikhonov regularization in statistical regression models with application to spectroscopy data. J Chemom 31:
Wang, Xiaoliang; Shojaie, Ali; Zhang, Yuzheng et al. (2017) Exploratory plasma proteomic analysis in a randomized crossover trial of aspirin among healthy men and women. PLoS One 12:e0178444
Chen, Shizhe; Witten, Daniela; Shojaie, Ali (2017) Nearly assumptionless screening for the mutually-exciting multivariate Hawkes process. Electron J Stat 11:1207-1234
Basu, Sumanta; Duren, William; Evans, Charles R et al. (2017) Sparse network modeling and metscape-based visualization methods for the analysis of large-scale metabolomics data. Bioinformatics 33:1545-1553
Miles, Fayth L; Navarro, Sandi L; Schwarz, Yvonne et al. (2017) Plasma metabolite abundances are associated with urinary enterolactone excretion in healthy participants on controlled diets. Food Funct 8:3209-3218
Chen, Shizhe; Shojaie, Ali; Witten, Daniela M (2017) Network Reconstruction From High-Dimensional Ordinary Differential Equations. J Am Stat Assoc 112:1697-1707
Seshadri, Chetan; Sedaghat, Nafiseh; Campo, Monica et al. (2017) Transcriptional networks are associated with resistance to Mycobacterium tuberculosis infection. PLoS One 12:e0175844
Zhao, Sen; Shojaie, Ali (2016) A significance test for graph-constrained estimation. Biometrics 72:484-93

The 10 most recent of the project's 17 publications are listed above.