Time to onset of chronic diseases such as cancer, cardiovascular disease, and diabetes is expected to be influenced by multiple gene-gene interactions that add to the complexity of the genotype-phenotype mapping relationship. Unfortunately, parametric statistical methods such as Cox regression lack sufficient power to detect high-order gene-gene interactions due to the sparseness of the data. Machine learning methods offer a more powerful alternative but rely on computationally intensive search methods to identify the top models. We propose here to develop a powerful and computationally efficient bioinformatics strategy that combines a machine learning algorithm with Cox regression to identify gene-gene and gene-environment interaction models that are associated with time of onset of chronic disease. Specifically, we first propose to develop a novel Robust Survival Multifactor Dimensionality Reduction (RS-MDR) method for the detection of gene-gene interactions in rare variants that influence time of onset of human disease (AIM 1). The power of the RS-MDR method will be evaluated by comparing it to existing methods in simulation studies. We then propose to change the representation space of the gene-gene interaction models using RS-MDR's constructive induction method and apply L1-penalized Cox regression to identify a set of interaction models that can predict patients' survival probability (AIM 2). We hypothesize that RS-MDR can effectively identify high-order interaction models and that the combined approach provides a powerful and computationally efficient way to select a set of interaction models. We will use extensive simulations derived from genome-wide association study (GWAS) data to thoroughly evaluate this hypothesis. Next, we will apply the new combined method to detect and characterize gene-gene and gene-environment interactions in GWAS data from large population-based studies of lung cancer and rheumatoid arthritis (AIM 3).
Results from the real data analysis will be used to refine the method. Finally, we will distribute the proposed method as part of an open-source R software package (AIM 4). We anticipate that the proposed method will combine the strengths of both parametric and non-parametric methods and enable detection of interaction models that jointly affect time of onset of chronic diseases. This is important because time of onset has more variation than case-control status and may be more clinically relevant. Furthermore, studies of genetic factors predicting time of onset have not been pursued aggressively using GWAS, despite the relevance of this information for the discovery of high-risk variants such as mutations in BRCA1.
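To make the constructive induction step concrete, the following is a minimal Python sketch of how a two-locus genotype combination might be collapsed into a single binary high-risk/low-risk attribute for survival data. It is an illustration under stated assumptions, not the RS-MDR method itself: the function name `constructive_induction` is hypothetical, and the labeling rule (a genotype cell is high risk when its event rate per unit of follow-up time exceeds the overall rate) is a simple stand-in, since the statistic RS-MDR would use is not specified here.

```python
from collections import defaultdict


def constructive_induction(genotypes_a, genotypes_b, times, events):
    """Collapse a pair of loci into one binary attribute (1 = high risk).

    Assumption: a genotype cell is labeled high risk when its event rate
    (events per unit of follow-up time) exceeds the overall event rate.
    This is an illustrative rule, not the RS-MDR labeling statistic.
    Follow-up times are assumed to be strictly positive.
    """
    overall_rate = sum(events) / sum(times)

    # Accumulate event counts and follow-up time per two-locus genotype cell.
    cell_events = defaultdict(int)
    cell_time = defaultdict(float)
    for a, b, t, e in zip(genotypes_a, genotypes_b, times, events):
        cell_events[(a, b)] += e
        cell_time[(a, b)] += t

    # Cells whose event rate exceeds the overall rate form the high-risk group.
    high_risk = {
        cell
        for cell in cell_time
        if cell_events[cell] / cell_time[cell] > overall_rate
    }

    # Each subject receives the label of the cell they fall into.
    return [
        1 if (a, b) in high_risk else 0
        for a, b in zip(genotypes_a, genotypes_b)
    ]


# Toy data: genotypes coded 0/1/2, follow-up times, event indicators (1 = onset).
snp_a = [0, 0, 1, 1, 2, 2]
snp_b = [0, 0, 1, 1, 0, 0]
follow_up = [10, 12, 2, 3, 8, 9]
onset = [0, 1, 1, 1, 0, 0]

risk_attribute = constructive_induction(snp_a, snp_b, follow_up, onset)
```

In a pipeline of the kind described above, one such binary attribute per candidate interaction would serve as a covariate in an L1-penalized Cox model, so that the penalty selects among interaction models rather than among individual genotypes.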
This project will provide a general method for genome data analysis that can potentially be applied to any specific disease. We focus the application of this method on an investigation into the role of genetic polymorphisms in lung cancer and rheumatoid arthritis. This will increase our knowledge of the basic biology and lead to advances in screening and targeted therapy.