The efforts of the Human Genome Project are beginning to provide important findings for human health. Technological advances in the laboratory, particularly in characterizing human genomic variation, have created new approaches for studying the human genome. However, current statistical and computational strategies take only partial advantage of this wealth of information. In the quest for susceptibility genes for common, complex diseases, we face many challenges. Selecting the genetic, clinical, and environmental factors important for the trait of interest becomes increasingly difficult as high-throughput data generation technologies are developed. We know that genes do not act in isolation; numerous other factors are therefore likely to be important in complex disease phenotypes. However, techniques for robust statistical modeling of important variables to predict clinical outcomes are limited in their ability to capture interaction effects. Ultimately, we want to know which factors are important in order to provide superior prevention, diagnosis, and treatment of human disease. Unfortunately, meaningful interpretation of statistical models for biomedical research has been lacking because of the inherent difficulty of making such connections. Thus, a technology that embraces the complexity of human disease and integrates multiple data sources, including biological knowledge from the public domain, through a powerful analytical framework is essential for dissecting the architecture of common diseases. ATHENA, the Analysis Tool for Heritable and Environmental Network Associations, is a novel framework that incorporates variable selection, modeling, and interpretation to learn more about diseases of public health interest. As the field gains experience in analyzing large-scale genomic data, it is crucial that we learn from each other and develop and codify the best strategies.

Public Health Relevance

Many common, complex diseases are likely due to a combination of genetic and environmental risk factors. Our ability to extract all of the meaningful information from very large genomic and phenotypic datasets has been limited by our analytic strategies. The methodology described in this proposal is a powerful new approach to maximize the information learned from large datasets in order to improve prevention, diagnosis, and treatment of diseases of public health interest.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Vanderbilt University Medical Center
Schools of Medicine
United States
Hall, Molly A; Wallace, John; Lucas, Anastasia et al. (2017) PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies. Nat Commun 8:1167
Hohman, Timothy J; Bush, William S; Jiang, Lan et al. (2016) Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium. Neurobiol Aging 38:141-150
Moore, Carrie Colleen Buchanan; Basile, Anna Okula; Wallace, John Robert et al. (2016) A biologically informed method for detecting rare variant associations. BioData Min 9:27
Kim, Dokyoon; Li, Ruowang; Dudek, Scott M et al. (2015) Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer. J Biomed Inform 56:220-8
Pendergrass, Sarah A; Verma, Shefali S; Hall, Molly A et al. (2015) Next-generation analysis of cataracts: determining knowledge driven gene-gene interactions using biofilter, and gene-environment interactions using the Phenx Toolkit*. Pac Symp Biocomput :495-505
Kim, Dokyoon; Joung, Je-Gun; Sohn, Kyung-Ah et al. (2015) Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc 22:109-20
Kim, Dokyoon; Li, Ruowang; Dudek, Scott M et al. (2015) Binning somatic mutations based on biological knowledge for predicting survival: an application in renal cell carcinoma. Pac Symp Biocomput :96-107
Ritchie, Marylyn D; Holzinger, Emily R; Li, Ruowang et al. (2015) Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet 16:85-97
Kim, Dokyoon; Shin, Hyunjung; Sohn, Kyung-Ah et al. (2014) Incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction. Methods 67:344-53
Holzinger, Emily R; Dudek, Scott M; Frase, Alex T et al. (2014) ATHENA: the analysis tool for heritable and environmental network associations. Bioinformatics 30:698-705

Showing the most recent 10 out of 34 publications