An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis

Liang, Faming

Abstract

The dramatic improvement in data collection and acquisition technologies over the past decades has enabled scientists to collect vast amounts of health-related data from biomedical studies. If analyzed properly, these data will expand our knowledge and support the testing of new hypotheses about disease management, from diagnosis to prevention to personalized treatment. However, biomedical data can be rather complex, and analyzing them poses many challenges for existing methods. This proposal attempts to address three fundamental challenges: (i) Missing data are ubiquitous in biomedical research. How can complex biomedical data be used fully in the presence of missing values? (ii) Growing data size typically brings growing complexity in the patterns in the data and in the models needed to account for those patterns. What is a general recipe for estimating the parameters of complex models? (iii) Biomarker identification from high-throughput omics data has been one of the major focuses of cancer research. Yet despite intense effort, the number of biomarkers approved by the FDA each year for clinical use is still in the single digits. An important factor contributing to this failure is the lack of appropriate statistical methods for analyzing such heterogeneous and high-dimensional data. Toward fuller use of complex biomedical data, this project proposes an imputation-consistency algorithm as a general algorithm for high-dimensional missing data problems. The algorithm is then extended to address the other two challenges under the principles of conditioning and consistency; in particular, this project proposes highly efficient and effective statistical algorithms that address the heterogeneity and high-dimensionality issues encountered in biomarker identification and eQTL analysis.
The proposed algorithms are applied to (i) select anticancer drug-sensitive genes with the CCLE and SANGER data, (ii) identify prognostic mRNA biomarkers for multiple types of cancer using the TCGA data, (iii) conduct eQTL analysis for multiple types of cancer using the TCGA data, and (iv) identify informative circulating biomarkers for type 1 diabetes. The proposed methods are highly efficient and general and can be applied to other types of disease as well. Statistically, this project aims to develop general, effective, and highly efficient algorithms for complex data analysis; biomedically, this project will significantly improve the accuracy of biomarker identification from omics data, which advances the understanding of molecular mechanisms and the development of precision medicine.

Public Health Relevance

Successful completion of this project will generate hands-on tools for complex biomedical data analysis and identify biomarkers with potential clinical use for type 1 diabetes and multiple cancers. This will improve our understanding of the mechanisms of complex diseases and our ability to predict disease risk and prognosis, and will accelerate the integration of biomarkers into clinical trials and the development of personalized medicine, which will ultimately enhance our public health system and improve patient care.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM126089-04
Application #: 9842625
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul

Project Start: 2018-01-01
Project End: 2021-12-31
Budget Start: 2020-01-01
Budget End: 2020-12-31
Support Year: 4
Fiscal Year: 2020

Institution

Name: Purdue University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 072051394

City: West Lafayette
State: IN
Country: United States
Zip Code: 47907

Related projects


NIH 2021 R01 GM	An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis Liang, Faming / Purdue University
NIH 2020 R01 GM	An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis Liang, Faming / Purdue University
NIH 2019 R01 GM	An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis Liang, Faming / Purdue University
NIH 2018 R01 GM	An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis Liang, Faming / Purdue University
