The broad, long-term objectives of this project are to develop semiparametric regression methods for analyzing censored data, which are commonly encountered in biomedical research on chronic diseases. This renewal application focuses on the computational challenges of analyzing big data involving hundreds of thousands to tens of millions of individuals with thousands to tens of millions of variables. The specific aims are to develop: (1) a communication-efficient, distributed boosting algorithm based on semiparametric efficient score functions for fitting the Cox proportional hazards model to a wide variety of big censored data; (2) a communication-efficient, distributed boosting algorithm that embeds a random feature-set selection scheme into variable selection in high-dimensional settings; (3) a communication-efficient, distributed boosting algorithm for fitting a Cox model with latent factors to multiple types of high-dimensional features with missing values; and (4) a distributed EM algorithm that incorporates both the preconditioned conjugate-gradient method for matrix inversion and a novel modification of the Laplace approximation for numerical integration, for fitting a random-effect Cox model to data on a large number of genetically related individuals. Each of these aims addresses important new challenges arising from today's large-scale biomedical studies. The proposed methods and algorithms are based on likelihood and other sound statistical principles. The desired asymptotic properties of the estimators will be established rigorously through innovative use of modern empirical process theory and other advanced mathematical tools. The proposed methods and algorithms will be evaluated extensively through simulation studies mimicking real data and tested in a cloud computing environment, which provides strong data security guarantees and scalable computing infrastructure.
In addition, the methods and algorithms will be applied to our ongoing biomedical studies, including the NHLBI Trans-Omics for Precision Medicine program and the UK Biobank. Finally, efficient, reliable, and user-friendly open-source software with proper documentation will be produced. The overall impact of the proposed work will be to create new paradigms for survival analysis, advance biomedical research in the United States and other countries, and accelerate the search for effective strategies to prevent and treat cardiovascular diseases, cancers, AIDS, and other diseases of utmost importance to global public health.
This research intends to tackle new computational challenges in the analysis of big data from cutting-edge biomedical research, including precision medicine programs and biobanks. The proposed paradigms will accelerate the search for effective strategies to prevent and treat cardiovascular disorders, cancers, AIDS, and other diseases of utmost importance to global public health.