Genomic medicine offers hope for improved diagnostic methods and for more effective, patient-specific therapies. Genome-wide association studies (GWAS) elucidate genetic markers that improve understanding of risks and causes for many diseases, and may guide diagnosis and therapy on a patient-specific basis. This project will take another approach to identifying gene-disease associations: perform a "reverse GWAS," or phenome-wide association study (PheWAS), to determine which phenotypes are associated with a given genotype. The project is enabled by a large DNA biobank coupled to a de-identified copy of the electronic medical record (EMR). This project has four specific aims. First, the project will develop and validate a standardized approach to extracting disease phenotypes from the EMR, integrating national standard terminologies of clinical disorders and descriptors relating to treatment and diagnosis of each disease to create a sharable knowledge base. The project will use natural language processing, structured data queries, and heuristic and machine learning methods to accurately identify patients with each disease and corresponding controls.
The second aim is to perform PheWAS analyses using existing genotype data. To validate the method, the project will use PheWAS to "rediscover" SNPs with known disease associations. The project will also investigate statistical methods for large-scale multiple hypothesis testing to discover novel phenotype associations.
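As an illustration of the kind of analysis the second aim describes, the following is a minimal sketch of a PheWAS scan for a single SNP: each phenotype is tested for association with carrier status, and the resulting p-values are then adjusted for multiple testing. The choice of Fisher's exact test and the Benjamini-Hochberg procedure is an assumption made for this example, not the project's stated method, and all function names and data structures (`fisher_exact_two_sided`, `benjamini_hochberg`, `phewas`, the `carrier` and `phenotypes` inputs) are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are carriers / non-carriers; columns are cases / controls.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carrier cases with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def phewas(carrier, phenotypes):
    """Scan all phenotypes for association with one genotype.

    carrier:    dict mapping patient id -> True if the patient carries the allele
    phenotypes: dict mapping phenotype name -> (set of case ids, set of control ids)
    Returns (name, raw p, Bonferroni p, BH-adjusted p) tuples sorted by raw p.
    """
    names, pvals = [], []
    for name, (cases, controls) in phenotypes.items():
        a = sum(carrier[p] for p in cases)      # carrier cases
        b = sum(carrier[p] for p in controls)   # carrier controls
        c = len(cases) - a                      # non-carrier cases
        d = len(controls) - b                   # non-carrier controls
        names.append(name)
        pvals.append(fisher_exact_two_sided(a, b, c, d))
    m = len(pvals)
    fdr = benjamini_hochberg(pvals)
    rows = [(n, p, min(1.0, p * m), q) for n, p, q in zip(names, pvals, fdr)]
    return sorted(rows, key=lambda r: r[1])
```

A real scan would cover many more phenotypes and would typically use regression models that adjust for covariates such as age and sex; the sketch only shows how per-phenotype tests and a multiple-testing correction fit together.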
The third aim is to apply the PheWAS algorithms at four other sites with EMR-linked DNA biobanks and compare results. In the fourth aim, the project will validate novel phenotype-genotype associations discovered through PheWAS with new genotyping in a previously untested population. The tools generated from this project will not only make PheWAS possible, but will also broadly enable clinical research and subsequent genetic studies.
The promise of genomic medicine is to predict disease risk and guide treatment for individuals based on their genetic information. This project will develop methods to identify diseases from electronic medical records and then find novel genetic associations from existing genomic data.
|Rasmussen, Luke V; Thompson, Will K; Pacheco, Jennifer A et al. (2014) Design patterns for the development of electronic health record-driven phenotype extraction algorithms. J Biomed Inform 51:280-6|
|Shameer, Khader; Denny, Joshua C; Ding, Keyue et al. (2014) A genome- and phenome-wide association study to identify genetic variants influencing platelet count and volume and their pleiotropic effects. Hum Genet 133:95-109|
|Heatherly, Raymond; Denny, Joshua C; Haines, Jonathan L et al. (2014) Size matters: how population size influences genotype-phenotype association studies in anonymized data. J Biomed Inform 52:243-50|
|Carroll, Robert J; Bastarache, Lisa; Denny, Joshua C (2014) R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics 30:2375-6|
|Ritchie, Marylyn D; Denny, Joshua C; Zuvich, Rebecca L et al. (2013) Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 127:1377-85|
|Lasko, Thomas A; Denny, Joshua C; Levy, Mia A (2013) Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS One 8:e66341|
|Heatherly, Raymond D; Loukides, Grigorios; Denny, Joshua C et al. (2013) Enabling genomic-phenomic association discovery without sacrificing anonymity. PLoS One 8:e53875|
|Wei, Wei-Qi; Cronin, Robert M; Xu, Hua et al. (2013) Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc 20:954-61|