Integrative data science approaches for rare disease discovery in health records

Pejaver, Vikas

Abstract

There are nearly 7,000 diseases that have a prevalence of only one in 2,000 individuals or less. Yet, such rare diseases are estimated to collectively affect over 300 million people worldwide, representing a significant healthcare concern. Although rare diseases have predominantly genetic origins, nearly half of them do not manifest symptoms until adulthood and frequently confound discovery and diagnosis. Even in the case of early onset disorders, the sheer number of possible diagnoses can often overwhelm clinicians. As a result, rare diseases are often diagnosed with delay, misdiagnosed or even remain undiagnosed, not only disrupting patient lives but also hindering progress on our understanding of such diseases. Data science methods that mine large-scale retrospective health record data for phenotypic information will aid in timely and accurate diagnoses of rare diseases, especially when combined with additional data types, thus, having significant real- world impact. This proposal will integrate electronic health record (EHR) data sets with publicly available vocabularies and ontologies, and genomic data for the improved identification and characterization of patients with rare diseases, using approaches from machine learning, natural language processing (NLP) and basic bioinformatics. The work has three specific aims and will be carried out in two phases. During the mentored phase, the principal investigator (PI) will develop data-driven methods to extract standardized concepts related to rare diseases from clinical notes and infer the occurrence of each disease (Aim 1). He will also develop data science approaches to compare and contrast longitudinal patterns associated with patients' journeys through the healthcare system when seeking a diagnosis for a rare disease, and aid in clinical decision-making by leveraging these patterns (Aim 2). During the independent phase (Aim 3), computational methods will be developed for the integrated modeling and analysis of genotypic (from Aim 3) and phenotypic information (from Aims 1 and 2). Cohorts to be sequenced will cover diseases for which causal genes or disease definitions are unclear (discovery), as well as those for which these are well known (validation). This work will be carried out under the mentorship of four faculty members with complementary expertise in biomedical informatics, data science, NLP, and rare disease genomics at the University of Washington, the largest medical system in the Pacific Northwest (four million EHRs), world-renowned researchers in medical genetics, and a robust data science environment. In addition, under the direction of the mentoring team, the PI will complete advanced coursework, receive training in translational bioinformatics and clinical research informatics, submit manuscripts, and seek an independent research position. This proposal will yield preliminary results for subsequent studies on data-driven phenotyping and enable the realization of the PI's career goals by providing him with the necessary training to build on his machine learning and basic bioinformatics expertise to transition into an independent investigator in biomedical data science.

Public Health Relevance

Rare genetic diseases are estimated to affect the lives of 25 to 30 million Americans and their families, and present a significant economic burden on the healthcare system. Currently, our knowledge of the broad spectrum of the 7,000 observed rare diseases is limited to a few well-studied ones, hindering our ability to make correct and timely diagnoses. The objective of this study is to improve the identification of patients with rare diseases in healthcare systems by developing data science approaches that automatically recognize rare disease-related patterns in patient health records and correlate them with genomic data, thus, aiding in diagnosis and discovery.

Funding Agency

Agency: National Institute of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Career Transition Award (K99)
Project #: 5K99LM012992-02
Application #: 9884791
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Sim, Hua-Chuan

Project Start: 2019-03-04
Project End: 2021-02-28
Budget Start: 2020-03-01
Budget End: 2021-02-28
Support Year: 2
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: University of Washington
Department: Other Health Professions
Type: Schools of Medicine
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2020 K99 LM	Integrative data science approaches for rare disease discovery in health records Pejaver, Vikas Rao / University of Washington
NIH 2019 K99 LM	Integrative data science approaches for rare disease discovery in health records Pejaver, Vikas Rao / University of Washington

Comments

Be the first to comment on Vikas Pejaver's grant

Recent in Grantomics:

Recently viewed grants:

Recently added grants: