Data preprocessing is critical for the success of any MS-based untargeted metabolomics study, as it is the first informatics step in making sense of the data. Despite the enormous contributions that existing software tools have made to metabolomics, errors in compound identification and relative quantitation still plague the field. This issue is becoming more serious as the sensitivity of LC/MS and GC/MS platforms continues to increase. Preprocessing involves peak detection; peak grouping and annotation (for LC/MS data) or spectral deconvolution (for GC/MS data); and peak alignment. Existing software tools invariably yield an immense number of false positive and false negative peaks, produce inaccurate peak groups, misalign detected peaks, and extract inaccurate relative quantitation of metabolites. These errors can translate downstream into spurious or missing compound identifications and lead to misleading interpretations of the metabolome. Furthermore, users need to specify a large number of parameters for existing software tools to work. Unfortunately, general users usually do not understand how to optimize these parameters, and optimizing for one aspect (e.g., sensitivity) often has deleterious effects on another (e.g., specificity). We will address these challenges by developing more accurate algorithms that improve the rigor and reproducibility of data preprocessing. The proposed algorithms will be implemented in Java and integrated with the widely used MZmine 2, making the software cross-platform and user-friendly, with rich visualization capabilities. In addition, the implementation will be optimized for memory efficiency and computing speed to allow large-scale data preprocessing. Extensive testing of the software will be conducted in close collaboration with metabolomics core facilities and users around the world.

Public Health Relevance

The proposed algorithms will allow scientists to extract information about metabolites from metabolomics data more accurately. This more accurate information will help scientists investigate the mechanisms of various diseases and develop therapeutic measures to treat them.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01CA235507-03
Application #
10005903
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Zanetti, Krista A
Project Start
2018-09-19
Project End
2022-08-31
Budget Start
2020-09-01
Budget End
2021-08-31
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of North Carolina Charlotte
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
066300096
City
Charlotte
State
NC
Country
United States
Zip Code
28223