Abstract

In omic studies of all types (e.g., genomic, transcriptomic, proteomic, metabolomic), technical batch effects pose a fundamental challenge to quality control and reproducibility. The possibilities for serious error are greatly magnified in metabolomics, however, because a range of platform, operator, instrument, and environmental factors can cause batch (or trend) effects. Hence, there is a need for routine surveillance and correction of batch effects within and across metabolomics laboratories and technological platforms. Accordingly, we propose here to develop the MetaBatch algorithms, computational tool, and web portal. For development of MetaBatch, we will leverage our experience in developing MBatch, a tool that became indispensable for quality control of data in all 33 projects of The Cancer Genome Atlas (TCGA) program.
Our first aim is to translate the successful quality control model from TCGA to metabolomics by customizing and extending the MBatch pipeline for detection, quantitation, diagnosis, interpretation, and correction of batch and trend effects.
The second aim is to develop and incorporate innovative metabolomics-specific algorithms, including major visualization resources such as our interactive Next-Generation Clustered Heat Maps.
The third aim is to distribute MetaBatch to the research community as open-source software and in cloud-based and Galaxy versions.
The fourth aim is to provide plug-in capability for integration of MetaBatch with other metabolomic resources, prominently including Metabolomics Workbench (in collaboration with Dr. Shankar Subramaniam) and others developed within the Common Fund Metabolomics Program.
Our fifth aim is to promote MetaBatch actively and to interact extensively with other Consortium members and the metabolomics research community. With active support from MD Anderson Faculty and Academic Development, we will provide documentation, tutorials, videos, demonstrations, and training to accelerate adoption and to solicit feedback on limitations, possible improvements, and additional modules that would be useful in real-world workflows.
We bring a variety of assets to the project, including: the MBatch resource as a starting point for software development; multidisciplinary expertise in bioinformatics, biostatistics, software engineering, biology, and clinical medicine; PIs with a combined 21 years of experience in molecular profiling studies of clinical disease (in a consortial context); international leadership in batch effects analysis; a software engineering team with a track record of producing high-end, highly visual bioinformatics packages and websites; a team of 20 analysts whose expertise can be called on; extensive computing resources, including one of the most powerful academically based machines in the world; strong institutional support; and close working relationships with first-class basic, translational, and clinical researchers throughout MD Anderson, one of the foremost cancer centers in the country. Our bottom-line mission will be to aid the research community's effort to improve rigor and reproducibility in metabolomics, advancing scientific understanding and the alleviation of disease.

Public Health Relevance

Our principal goals are (i) to help protect against "batch effect" quality control problems in metabolomic data; (ii) to provide the research community with user-friendly, highly interactive bioinformatic tools for doing so; and (iii) to participate actively in the Common Fund Metabolomics Consortium, a community of researchers dedicated to scientific understanding and the alleviation of disease.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01CA235510-02
Application #
9773012
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Zanetti, Krista A
Project Start
2018-09-01
Project End
2022-08-31
Budget Start
2019-09-01
Budget End
2020-08-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Texas MD Anderson Cancer Center
Department
Biostatistics & Other Math Sci
Type
Hospitals
DUNS #
800772139
City
Houston
State
TX
Country
United States
Zip Code
77030