The broad, long-term objectives of this project are to develop semiparametric regression methods for analyzing censored data, which are commonly encountered in biomedical research on chronic diseases. This renewal application is focused on addressing the computational challenges in the analysis of big data involving hundreds of thousands to tens of millions of individuals with thousands to tens of millions of variables. The specific aims are to develop: (1) a communication-efficient, distributed boosting algorithm based on semiparametric efficient score functions for fitting the Cox proportional hazards model to a wide variety of big censored data; (2) a communication-efficient, distributed boosting algorithm that embeds a random feature-set selection scheme into variable selection in high-dimensional settings; (3) a communication-efficient, distributed boosting algorithm for fitting a Cox model with latent factors to multiple types of high-dimensional features with missing values; and (4) a distributed EM algorithm that incorporates both the preconditioned conjugate-gradient method for matrix inversion and a novel modification of the Laplace approximation to numerical integration for fitting a random-effect Cox model with a large number of genetically related individuals. Each of these aims addresses important new challenges arising from today's big biomedical studies. The proposed methods and algorithms are based on likelihood and other sound statistical principles. The desired asymptotic properties of the estimators will be established rigorously through innovative use of modern empirical process theory and other advanced mathematical tools. The proposed methods and algorithms will be evaluated extensively through simulation studies mimicking real data and tested in the cloud computing environment, which provides high data security guarantees and scalable computing infrastructures.
In addition, the methods and algorithms will be applied to our ongoing biomedical studies, including the NHLBI Trans-Omics for Precision Medicine program and the UK Biobank. Finally, efficient, reliable, and user-friendly open-source software with proper documentation will be produced. The overall impact of the proposed work will be to create new paradigms for survival analysis, advance biomedical research in the United States and other countries, and accelerate the search for effective strategies to prevent and treat cardiovascular diseases, cancers, AIDS, and other diseases of utmost importance to global public health.

Public Health Relevance

This research intends to tackle new computational challenges in the analysis of big data from cutting-edge biomedical research, including precision medicine programs and biobanks. The proposed paradigms will accelerate the search for effective strategies to prevent and treat cardiovascular disorders, cancers, AIDS, and other diseases of utmost importance to global public health.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Research Project (R01)
Project #
2R01HL149683-29A1
Application #
9966371
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Wolz, Michael
Project Start
2020-04-21
Project End
2024-03-31
Budget Start
2020-04-21
Budget End
2021-03-31
Support Year
29
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of North Carolina Chapel Hill
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599