Increasingly available metabolomics data enable greater understanding of metabolite changes in response to physiological or disease processes. Recent developments have proven metabolomics to be a valuable technology for advancing medical research by accelerating the translation of knowledge from bench to bedside. However, the effective use of these data requires expertise in both metabolomics and statistics, because a series of data pre-processing steps must precede statistical analysis, such as data conversion, data scaling, data normalization, peak alignment and metabolite annotation, among many others. Despite the promise of metabolomics in the clinic, well-documented challenges limit its full potential, including the identification and validation of metabolite biomarkers and metabolite-based prediction of disease or its progression. These barriers have significantly hampered the application of metabolomics to clinical and translational research. To overcome these challenges, our team proposes to develop a series of multivariate statistical methods that are specifically designed for metabolomics data analysis. More specifically, instead of investigating one metabolite at a time, a group of biologically related metabolites will be modeled simultaneously. Meanwhile, other clinical covariates (such as gender, age, and BMI) will be evaluated for their effects on the metabolites. The proposed project has three main goals: (1) introduce the new idea of using a group of metabolites as potential biomarkers for diseases, where, by incorporating biological knowledge to group correlated metabolites, we propose to employ the seemingly unrelated regression model to investigate the relationship between a group of metabolites and disease status while adjusting for the effects of other clinical covariates; (2) construct metabolic networks to better understand the systematic perturbations of metabolism that accompany human diseases, where the networks can serve as more robust biomarkers for disease diagnostics; and (3) advocate disease prediction based on the combination of metabolite profiles, clinical covariates, and their interactions. A direct modeling approach, generalized orthogonal components regression, is proposed to handle the large number of metabolites relative to the small number of individuals. The utility of the methods will be evaluated extensively through simulation studies and real data collected for different diseases, including publicly available data as well as in-house data from our ongoing cancer care engineering project. With all these data, the methods will be compared to the most popular method, partial least squares discriminant analysis. The proposed statistical methods will be made freely available to the research community through GitHub, cceHUB, Metabolomics Consortium Data Repository, and the Metabolomics Workbench. The project is directly responsive to RFA-RM-15-021 because it will foster close collaboration between metabolomics experts and biostatisticians, produce efficient and reliable statistical methods that can be used to maximize the value of existing metabolomics resources, and help realize the promise of metabolomics in early diagnosis of common complex diseases.
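As a rough illustration of the seemingly unrelated regression idea in goal (1), the sketch below fits a two-metabolite system by two-step feasible generalized least squares in Python with NumPy. It is a minimal sketch under stated assumptions, not the project's implementation: the function name sur_fgls, the simulated data, the choice of covariates per equation, and every coefficient value are hypothetical and were chosen only to make the example run.

```python
import numpy as np

def sur_fgls(ys, Xs):
    """Two-step feasible GLS for a seemingly unrelated regression system.

    ys: list of M response vectors (one per metabolite), each of length n
    Xs: list of M design matrices, each of shape (n, k_j)
    Returns (list of M coefficient vectors, M x M residual covariance).
    """
    n, M = len(ys[0]), len(ys)
    # Step 1: equation-by-equation OLS residuals estimate the
    # cross-equation error covariance.
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for y, X in zip(ys, Xs)
    ])
    sigma = resid.T @ resid / n
    sigma_inv = np.linalg.inv(sigma)
    # Step 2: GLS on the stacked system with error covariance
    # kron(sigma, I_n), assembled block by block.
    ks = [X.shape[1] for X in Xs]
    off = np.concatenate([[0], np.cumsum(ks)])
    A = np.zeros((off[-1], off[-1]))
    b = np.zeros(off[-1])
    for i in range(M):
        for j in range(M):
            A[off[i]:off[i + 1], off[j]:off[j + 1]] = sigma_inv[i, j] * (Xs[i].T @ Xs[j])
            b[off[i]:off[i + 1]] += sigma_inv[i, j] * (Xs[i].T @ ys[j])
    beta = np.linalg.solve(A, b)
    return [beta[off[i]:off[i + 1]] for i in range(M)], sigma

# Hypothetical simulated data: two correlated metabolites, a binary
# disease indicator, and two standardized clinical covariates.
rng = np.random.default_rng(0)
n = 500
disease = rng.integers(0, 2, size=n).astype(float)
age = rng.normal(size=n)
bmi = rng.normal(size=n)
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)

X1 = np.column_stack([np.ones(n), disease, age])  # metabolite 1 equation
X2 = np.column_stack([np.ones(n), disease, bmi])  # metabolite 2 equation
y1 = X1 @ np.array([0.2, 1.0, 0.3]) + errors[:, 0]
y2 = X2 @ np.array([-0.1, -0.5, 0.4]) + errors[:, 1]

coefs, sigma = sur_fgls([y1, y2], [X1, X2])
print("disease effect, metabolite 1:", coefs[0][1])
print("disease effect, metabolite 2:", coefs[1][1])
print("estimated error covariance:\n", sigma)
```

Note that when every equation uses exactly the same covariates, the point estimates from this procedure coincide with equation-by-equation ordinary least squares; in that case the benefit of the joint model lies in the estimated cross-metabolite error covariance, which is what allows a group of metabolites to be tested jointly.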

Public Health Relevance

As the systematic study of small-molecule metabolites, metabolomics can provide insights into the molecular mechanisms underlying diseases, and the analysis of metabolomics data requires appropriate statistical methods. The proposed multivariate statistical methods have the potential to efficiently identify metabolite biomarkers for earlier and more accurate disease detection and for monitoring disease progression and treatment effects. This will accelerate the utilization of metabolomics in medical research and has the potential to positively impact human health outcomes.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Small Research Grants (R03)
Project #
1R03CA211831-01
Application #
9223351
Study Section
Special Emphasis Panel (ZRG1-BST-U (50)R)
Program Officer
Spalholz, Barbara A
Project Start
2016-09-14
Project End
2017-08-31
Budget Start
2016-09-14
Budget End
2017-08-31
Support Year
1
Fiscal Year
2016
Total Cost
$171,467
Indirect Cost
$38,907
Name
Purdue University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
072051394
City
West Lafayette
State
IN
Country
United States
Zip Code
47907