Electronic health records (EHRs) are a powerful and efficient tool for biological discovery globally. However, a vital step for EHR-based research is valid, accurate, and reliable phenotyping (i.e., correctly identifying individuals with a particular trait of interest). Conventional approaches to phenotyping are ad hoc, domain expert dependent, rule-based, and usually specific to EHR environments. However, each requires an extensive investment of time and resources to develop due to the heterogeneity, complexity, inaccuracy, and frequent fragmentation of EHRs. The lack of general, automatic, and portable approaches to enable accurate high- throughput phenotyping is a critical barrier that hampers our ability to leverage valuable clinical data in EHRs for better healthcare. We propose a new generalizable high-throughput approach: Phenotyping by Measured, Automated Profile (PheMAP) that we have developed from public resources and will further refine and implement across various EHRs. We recognize that mass information about phenotypes is often described in significant detail and continuedly accumulated within publicly available resources (e.g., MedlinePlus and Wikipedia). We hypothesize this information can be retrieved, filtered, organized, measured, and formalized into standard EHR phenotype profiles. Indeed, we have used such an ensemble approach to integrate four generalizable online medication resources (e.g., SIDER and RxNorm) to create MEDI--a resource linking 2,136 medications and 13,304 indications. In preliminary studies, we extended this strategy to phenotyping and created a prototype PheMAP. For each phenotype, we identified relevant clinical concepts and weighted each based on its importance to the phenotype. We then mapped all associated concepts to commonly-used clinical terminologies. Our preliminary studies showed an average consistency of 98.6%0.8% between our early-stage PheMAP and three validated eMERGE algorithms (Type 2 Diabetes, dementia, and hypothyroidism). We seek support to refine and optimize PheMAP and develop tools to allow researchers to implement PheMAP efficiently in different EHRs. This will allow researchers to rapidly and accurately determine the status of thousands of phenotypes for millions of individuals with minimal human intervention. Since PheMAP is created using independent resources that are more generalizable than a local clinical dataset, the implementation will generate more consistent outcomes in different EHRs for large-scale analyses.The work we propose is a necessary step toward being able to conduct high-throughput genome-wide and phenome-wide association analyses (GWASs and PheWASs). We will use data from multiple biobanks to accomplish these tasks. Specifically, we will achieve the following goals in this grant: 1.refine PheMAP and conduct large-scale validation, 2. implement PheMAP and perform representative GWASs and PheWASs, 3. Use PheMAP to conduct GWASs for unstudied or understudied diseases and phenotypes, and 4. Share PheMAP to facilitate research using EHRs.
Electronic health records (EHRs) are a powerful and efficient tool for biological discovery globally while a vital step for EHR-based research is valid, accurate, and reliable phenotyping. Conventional approaches to phenotyping are ad hoc, domain expert dependent, rule-based, and usually specific to EHR environments. We propose to refine, validate, and share a new generalizable high-throughput approach: Phenotyping by Measured, Automated Profile (PheMAP) that allows researchers to rapidly and accurately determine the status of thousands of phenotypes for millions of individuals with minimal human intervention.