Every individual genome predisposes its carrier to some set of diseases. Despite extensive research efforts, however, the heritable causes of complex disease remain elusive. This is largely due to the inherent complexity of pathogenesis pathways and the interaction of individual genomic determinants with the environment. Elucidating the causative genetics of pathogenesis will spur the development of better treatments and of prevention tactics that modulate the presence of individual-specific stressors. Here, we propose to build AVA, Dx (Analysis of Variation for Association with Disease), a computational method for defining the functional role of DNA variation in complex diseases. AVA, Dx will use exome sequence data to pinpoint the molecular pathways affected in disease and to predict individual disease predisposition. As a proof of concept, we will use the nearly two thousand available sequenced exomes of Tourette Disorder, Crohn's Disease, and Chronic Obstructive Pulmonary Disease cohorts to build separate AVA, Dx instances. For each disease cohort, we will first build a predictor of the impact of genetic variation on molecular gene function. This predictor will be unique in its ability to account for variant genotype when evaluating the impact of all kinds of gene-associated variants: rare and common, coding and non-coding. We will then encode each exome in our set as a vector of function impact scores for all genes. Based on this set of vectors, feature selection techniques will identify disease genes, i.e., genes whose exome-specific function changes correlate best with the clinical annotation of individual disease status (disease/healthy). Note that in this manner we expect to find a sizeable set of novel disease genes. Finally, we will train a machine learning classifier to recognize functional differences in the sets of selected genes and thereby distinguish the clinical status of newly sequenced exomes (individuals).
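The encoding, gene-selection, and classification steps described above can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the AVA, Dx implementation. It assumes each variant has already been scored in [0, 1] by the variant-impact predictor and that genotype is represented by the number of alternate alleles (1 heterozygous, 2 homozygous). The per-gene aggregation rule 1 - ∏(1 - s)^g is a hypothetical choice. Ranking genes by absolute Pearson (point-biserial) correlation with disease status stands in for the unspecified feature selection techniques, and a nearest-centroid rule stands in for the unspecified classifier. All gene names and scores are made-up toy values, and the function names are hypothetical.

```python
from math import prod, sqrt

# HYPOTHETICAL input format: each exome is a list of
# (gene, impact_score, alt_allele_count) tuples. impact_score in [0, 1] stands
# in for the variant-impact predictor's output; alt_allele_count is 1 for a
# heterozygous and 2 for a homozygous-alternate genotype.


def encode_exome(variants, genes):
    """Encode one exome as a vector with one function-impact score per gene.

    Hypothetical aggregation: every alternate allele is treated as an
    independent chance to disrupt the gene, so
    score = 1 - prod((1 - s) ** alt_count). Genes without variants get 0.0.
    """
    factors = {g: [] for g in genes}
    for gene, score, alt_count in variants:
        if gene in factors:
            factors[gene].append((1.0 - score) ** alt_count)
    return [1.0 - prod(factors[g]) for g in genes]


def pearson(xs, ys):
    """Pearson correlation; with 0/1 labels this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_genes(vectors, labels, k):
    """Indices of the k genes whose scores correlate most strongly with status."""
    n_genes = len(vectors[0])
    strength = [abs(pearson([v[j] for v in vectors], labels)) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: strength[j], reverse=True)[:k]


def train_centroids(vectors, labels, selected):
    """Nearest-centroid stand-in for the classifier; needs both classes present."""
    centroids = {}
    for cls in (0, 1):
        rows = [[v[j] for j in selected] for v, y in zip(vectors, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(vector, centroids, selected):
    """Return the status (1 = disease, 0 = healthy) of the closest centroid."""
    x = [vector[j] for j in selected]
    dist = {cls: sum((a - b) ** 2 for a, b in zip(x, c)) for cls, c in centroids.items()}
    return min(dist, key=dist.get)


# Toy usage with hypothetical genes and made-up scores.
genes = ["GENE_A", "GENE_B", "GENE_C"]
cohort = [  # (variants, status)
    ([("GENE_A", 0.9, 1)], 1),
    ([("GENE_A", 0.8, 2), ("GENE_C", 0.1, 1)], 1),
    ([("GENE_B", 0.5, 1)], 0),
    ([("GENE_C", 0.2, 1)], 0),
]
vectors = [encode_exome(v, genes) for v, _ in cohort]
labels = [y for _, y in cohort]
selected = select_genes(vectors, labels, k=1)
model = train_centroids(vectors, labels, selected)
new_exome = encode_exome([("GENE_A", 0.7, 1)], genes)
print([genes[j] for j in selected], predict(new_exome, model, selected))
# prints: ['GENE_A'] 1
```

The exponent on (1 - s) is what lets a homozygous damaging variant weigh more than a heterozygous one, which is one simple way to reflect genotype. In practice, gene selection should be repeated inside each cross-validation fold so that disease genes are never chosen using the exomes later used to evaluate the classifier.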
As the exome sequencing techniques used in our study vary by cohort, we will build flexibility with respect to experimental setup into our analysis framework. As a result, AVA, Dx techniques will be useful for drawing conclusions from existing sequencing data. AVA, Dx will generate experimentally testable hypotheses of disease pathogenesis by pinpointing the affected molecular functions. Moreover, AVA, Dx will be prognostic, allowing determination of disease predisposition prior to clinical diagnosis.

Public Health Relevance

Every person is genetically predisposed to a number of disorders that could significantly affect their lifespan or quality of life. One in four adults in the United States is diagnosable with a mental illness in any given year. Autoimmune disorders and chronic obstructive pulmonary disease each affect one in ten people. Despite extensive research efforts, however, the genetic causes of these and other complex diseases remain elusive. Here we propose to develop AVA, Dx (Analysis of Variation for Association with Disease), a novel computational method that leverages predictions of the functional effects of genome variants in disorder-specific genes to predict individual disease susceptibility. We will demonstrate proof-of-concept functionality of our method using the genetic and clinical data from Tourette disorder, Crohn's disease, and chronic obstructive pulmonary disease patients and their families. AVA, Dx will motivate new experimentally testable hypotheses regarding the biological mechanisms of various diseases and provide a means for earlier prognosis, more accurate diagnosis, and the development of better treatments.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project--Cooperative Agreements (U01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Rutgers University
Schools of Earth Sciences/Natur
New Brunswick
United States
Mahlich, Yannick; Steinegger, Martin; Rost, Burkhard et al. (2018) HFSP: high speed homology-driven function annotation of proteins. Bioinformatics 34:i304-i312
Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana (2017) clubber: removing the bioinformatics bottleneck in big data analyses. J Integr Bioinform 14:
Daneshjou, Roxana; Wang, Yanran; Bromberg, Yana et al. (2017) Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges. Hum Mutat 38:1182-1192
Mahlich, Yannick; Reeb, Jonas; Hecht, Maximilian et al. (2017) Common sequence variants affect molecular function more than rare variants? Sci Rep 7:1608
Willsey, A Jeremy; Fernandez, Thomas V; Yu, Dongmei et al. (2017) De Novo Coding Variants Are Strongly Associated with Tourette Disorder. Neuron 94:486-499.e9
Miller, M; Bromberg, Y; Swint-Kruse, L (2017) Computational predictors fail to identify amino acid substitution effects at rheostat positions. Sci Rep 7:41329
Bruse, Shannon; Moreau, Michael; Bromberg, Yana et al. (2016) Whole exome sequencing identifies novel candidate genes that modify chronic obstructive pulmonary disease susceptibility. Hum Genomics 10:1
Reeb, Jonas; Hecht, Maximilian; Mahlich, Yannick et al. (2016) Predicted Molecular Effects of Sequence Variants Link to System Level of Disease. PLoS Comput Biol 12:e1005047
Goldberg, Tatyana; Rost, Burkhard; Bromberg, Yana (2016) Computational prediction shines light on type III secretion origins. Sci Rep 6:34516
Rost, Burkhard; Radivojac, Predrag; Bromberg, Yana (2016) Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 590:2327-41