The use of human genome discoveries and other established risk predictors for early disease prediction is an essential step towards precision medicine. However, the task of developing clinically useful risk prediction models is hampered by the present state of evidence, in which currently known risk predictors are insufficient for accurately predicting most human diseases. With rapidly evolving high-throughput technologies and ever-decreasing costs, it has become feasible to collect diverse types of omic data in large-scale studies. While the multi-level omic data generated from these studies hold great promise for novel predictors to further improve existing models, the high dimensionality of omic data, the heterogeneous etiology of human diseases, and the complex inter-relationships among various levels of omic data bring tremendous analytic challenges. New methods and software are greatly needed to address these challenges and to facilitate ongoing and future high-dimensional risk prediction research. The goal of this application is thus to complete the development of a random field (RF) framework and software for high-dimensional risk prediction research using omic data, and then apply the framework to Alzheimer's disease (AD). The proposed research will integrate a kernel function and a spatial adaptive lasso into RF, making it applicable to high-dimensional data with a large number of predictors. Moreover, the new framework is able to utilize the family design to address several important issues (e.g., genetic heterogeneity) in predicting complex diseases, and will adopt a cross-diffusion process to integrate information from different levels of omic data. Based on preliminary simulation results, our central hypothesis is that the proposed framework attains more accurate and robust performance than existing methods.
The successful completion of this project should address analytical challenges posed by massive amounts of omic data, and advance methodology and software development for high-dimensional risk prediction in general. The application of the new methods and software to large-scale AD datasets could also lead to novel AD risk prediction models that could be further replicated and investigated through collaborative research.

Public Health Relevance

Risk prediction capitalizing on emerging human genome findings holds great promise for improved healthcare and precision medicine. The proposed research by a new early-stage investigator will develop an analytical framework and software for risk prediction using omic data, and will apply the new framework to Alzheimer's disease. The success of the project will facilitate high-dimensional risk prediction research in general, and will benefit translational research aimed at developing accurate risk prediction models for Alzheimer's disease.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
University of Florida
Biostatistics & Other Math Sci
Schools of Medicine
United States
Shen, Xiaoxi; Lu, Qing (2018) Joint analysis of genetic and epigenetic data using a conditional autoregressive model. BMC Genet 19:71
Li, Ming; He, Zihuai; Tong, Xiaoran et al. (2018) Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method. Genetics 210:463-476
He, Zihuai; Zhang, Min; Zhan, Xiaowei et al. (2014) Modeling and testing for joint association using a genetic random field model. Biometrics 70:471-9