An important goal of infectious disease research is to develop genetic predictors of susceptibility. Our success in this endeavor will depend critically on the informatics methods and software available for making sense of high-dimensional genetic and genomic data. The goal of this research program is to develop, evaluate, distribute, and support novel biomedical computing algorithms and open-source software for identifying combinations of genetic predictors of clinically important infectious disease outcomes. This application targets the growing body of rare genetic variants identified by high-throughput DNA sequencing, and our clinical application focuses on predicting antiretroviral response in HIV/AIDS clinical trials. We propose a highly innovative Hierarchical Rare Variant Collapsing Machine (HRVCM) algorithm for identifying and collapsing combinations of rare variants across gene regions (AIM 1). We will then integrate the collapsed HRVCM variables into our popular Multifactor Dimensionality Reduction (MDR) method, which will assess them in combination with common single-nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) (AIM 2). Our HRVCM-MDR approach will, for the first time, make it possible to assess non-additive interactions among sets of rare and common variants simultaneously in genetic studies of infectious diseases. We will apply these methods to approximately 13 million rare and common variants from nearly 3,000 subjects who participated in an AIDS Clinical Trials Group (ACTG) study, to evaluate the risk of virologic failure on efavirenz-containing antiretroviral therapy (ART) regimens (AIM 3). Finally, we will release all methods to the biomedical research community as open-source software through our freely available MDR software package (AIM 4).
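The abstract does not specify how HRVCM collapses variants or exactly how the collapsed variables enter MDR. As a rough illustration only, the sketch below swaps in the simplest collapsing scheme in place of HRVCM, which is hierarchical and more elaborate: a subject is coded 1 if they carry any minor allele among the rare variants in a region, else 0. It pairs this with the core MDR pooling step, in which each multi-locus genotype cell is labeled high risk when its case:control ratio exceeds a threshold, by default the overall case:control ratio. The helper names (`collapse_rare_variants`, `mdr_high_risk_cells`, `balanced_accuracy`) and the toy data are hypothetical; this is not code from the MDR software package.

```python
from collections import defaultdict


def collapse_rare_variants(genotypes):
    """Hypothetical helper: collapse one subject's rare-variant genotypes in a
    gene region into a carrier indicator. This is a simple burden-style
    collapse, NOT the HRVCM algorithm. `genotypes` holds minor-allele counts."""
    return 1 if any(g > 0 for g in genotypes) else 0


def mdr_high_risk_cells(features, status, threshold=None):
    """Hypothetical helper for the MDR pooling step. Each distinct tuple in
    `features` is one multi-locus genotype cell; `status` is 1 for cases
    (e.g. virologic failure) and 0 for controls. Returns {cell: is_high_risk}."""
    cases, controls = defaultdict(int), defaultdict(int)
    for cell, y in zip(features, status):
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        n_cases = sum(status)
        n_controls = len(status) - n_cases
        if n_cases == 0 or n_controls == 0:
            raise ValueError("need both cases and controls")
        threshold = n_cases / n_controls  # overall case:control ratio
    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases.get(cell, 0), controls.get(cell, 0)
        if n_ctrl == 0:
            labels[cell] = True  # cases only: ratio is unbounded, so high risk
        else:
            labels[cell] = n_case / n_ctrl > threshold
    return labels


def balanced_accuracy(features, status, labels):
    """Mean of sensitivity and specificity when 'high risk' predicts a case.
    Cells absent from `labels` are treated as low risk (an assumption)."""
    tp = fn = tn = fp = 0
    for cell, y in zip(features, status):
        predicted_case = labels.get(cell, False)
        if y == 1:
            tp, fn = (tp + 1, fn) if predicted_case else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if predicted_case else (fp, tn + 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Toy, made-up data: 8 subjects, 3 rare variants in one region plus one common SNP.
rare = [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0],
        [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
snp = [2, 2, 1, 0, 2, 1, 0, 1]      # common-SNP minor-allele counts
status = [1, 1, 1, 0, 1, 1, 0, 0]   # 1 = case, 0 = control
features = [(collapse_rare_variants(r), s) for r, s in zip(rare, snp)]
labels = mdr_high_risk_cells(features, status)
print(balanced_accuracy(features, status, labels))  # prints 0.9
```

On this toy input the (0, 1) cell, with one case and one control, falls below the 5:3 threshold and is labeled low risk, so one case is missed. A full MDR analysis would also search over many combinations of variables and use cross-validation, both of which are omitted here.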

Public Health Relevance

The overall goal of this application is to develop innovative computational methods for the genetic analysis of infectious diseases. We will focus on methods that can detect synergistic effects of multiple genetic variants, regardless of whether those variants are rare or common in human populations. We will apply these methods to the study of response to antiretroviral therapy in HIV/AIDS.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Mckaig, Rosemary G
University of Pennsylvania
Schools of Medicine
United States
Verma, Shefali Setia; Ritchie, Marylyn D (2018) Another Round of "Clue" to Uncover the Mystery of Complex Traits. Genes (Basel) 9:
Olson, Randal S; La Cava, William; Mustahsan, Zairah et al. (2018) Data-driven advice for applying machine learning to bioinformatics problems. Pac Symp Biocomput 23:192-203
Piette, Elizabeth R; Moore, Jason H (2018) Identification of epistatic interactions between the human RNA demethylases FTO and ALKBH5 with gene set enrichment analysis informed by differential methylation. BMC Proc 12:59
Urbanowicz, Ryan J; Olson, Randal S; Schmitt, Peter et al. (2018) Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 85:168-188
Huang, Jing; Du, Jingcheng; Duan, Rui et al. (2018) Characterization of the Differential Adverse Event Rates by Race/Ethnicity Groups for HPV Vaccine by Integrating Data From Different Sources. Front Pharmacol 9:539
Basile, Anna O; Byrska-Bishop, Marta; Wallace, John et al. (2018) Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants. Bioinformatics 34:527-529
Urbanowicz, Ryan J; Meeker, Melissa; La Cava, William et al. (2018) Relief-based feature selection: Introduction and review. J Biomed Inform 85:189-203
Moore, Jason H; Shestov, Maksim; Schmitt, Peter et al. (2018) A heuristic method for simulating open-data of arbitrary complexity that can be used to compare and evaluate machine learning methods. Pac Symp Biocomput 23:259-267
Chernikova, Diana A; Madan, Juliette C; Housman, Molly L et al. (2018) The premature infant gut microbiome during the first 6 weeks of life differs based on gestational maturity at birth. Pediatr Res 84:71-79
Piette, Elizabeth R; Moore, Jason H (2018) Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV). BioData Min 11:6

Showing the most recent 10 out of 23 publications