An unsolved problem in health informatics is how to apply the past experiences of patients, stored in large-scale medical records systems, to predict patient outcomes and to individualize care. One approach to prediction, heretofore impractical, is to rapidly find a patient cohort similar enough to an index case that the health experiences and outcomes of this cohort are informative for prediction. This task is formidable because of the large variability across the vast number of patient attributes, with the added complexity of sequences of patient encounters evolving over time. Epidemiological considerations such as confounding by indication for treatment also come into play. The objectives of this research effort are (1) to create a modular test bed that uses a big data systems architecture to support research in rapid individualized prediction of outcomes from large clinical repositories and (2) to explore various approaches to making pragmatic near-term predictions of outcomes. Using the Department of Veterans Affairs (VA) Informatics and Computing Infrastructure (VINCI) database, a research database with records of tens of millions of patients, we will explore, in an elastic cloud environment, two synergistic strategies for rapidly finding a cohort of patients who are similar enough to an index patient to predict near-term treatment response and/or adverse effects: (1) temporal alignment of critical events, including the use of gene sequence alignment methods to relax requirements for exact temporal matching; and (2) conceptual distance metrics to model the degree of content similarity of case records. The initial domain of application will be treatment of type 2 diabetes. The approach will apply open source big data methodologies, including Hadoop and Accumulo, to store and filter medical log files. The content of these logs will be processed by a combination of strategies, including conceptual markup of events using natural language processing tools, matching of event streams, and statistical data mining methods, to rapidly retrieve and identify patients who are sufficiently similar to an index case to support personalized yet pragmatic clinical predictions of outcomes.
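
As an illustration of the first strategy, the sketch below aligns two patients' ordered event streams with Needleman-Wunsch global alignment, a classic gene sequence alignment method. It is a minimal sketch under stated assumptions, not the project's implementation: the abstract does not say which alignment algorithm is used, and the function name `align_events`, the event labels, and the scoring values (match, mismatch, gap) are all hypothetical. Gaps let one patient have an extra encounter without breaking the match, which is one way to relax exact temporal matching.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_events(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,      # assumed reward for identical events
    mismatch: int = -1,  # assumed penalty for differing events
    gap: int = -1,       # assumed penalty for an event with no counterpart
) -> Tuple[int, List[Pair]]:
    """Globally align two event sequences (Needleman-Wunsch).

    Returns the alignment score and the aligned pairs; None marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two events
                score[i - 1][j] + gap,            # event in a only
                score[i][j - 1] + gap,            # event in b only
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical event streams for an index patient and a candidate match.
index_case = ["dx_t2dm", "rx_metformin", "lab_a1c_high", "rx_insulin"]
candidate = ["dx_t2dm", "rx_metformin", "visit_eye_exam", "lab_a1c_high", "rx_insulin"]

best_score, aligned = align_events(index_case, candidate)
```

Here the candidate's extra eye exam is absorbed as a single gap rather than disqualifying the match. In a cohort search, a score like this could be computed against many candidates and used to rank them, although thresholds and scoring weights would need to be chosen empirically.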

Public Health Relevance

This proposal studies how to use the experience of past patients, stored in electronic medical records systems, to help clinicians make practical decisions on the care of complex patients with type 2 diabetes. The research applies methods adapted from Internet search engines and from studies of the human genome to determine what it means for one patient's disease experiences to be similar and relevant to another's.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM108346-04
Application #
8840825
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Marcus, Stephen
Project Start
2013-08-01
Project End
2017-04-30
Budget Start
2015-05-01
Budget End
2016-04-30
Support Year
4
Fiscal Year
2015
Total Cost
Indirect Cost
Name
Medical University of South Carolina
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
183710748
City
Charleston
State
SC
Country
United States
Zip Code
29403
Frey, Lewis J (2018) Data integration strategies for predictive analytics in precision medicine. Per Med 15:543-551
Frey, Lewis J; Bernstam, Elmer V; Denny, Joshua C (2016) Precision medicine informatics. J Am Med Inform Assoc 23:668-70
Dunlea, Robert; Lenert, Leslie (2015) Understanding Patients' Preferences for Referrals to Specialists for an Asymptomatic Condition. Med Decis Making :
Frey, L J; Lenert, L; Lopez-Campos, G (2014) EHR Big Data Deep Phenotyping. Contribution of the IMIA Genomic Medicine Working Group. Yearb Med Inform 9:206-11
Lenert, Leslie; Dunlea, Robert; Del Fiol, Guilherme et al. (2014) A model to support shared decision making in electronic health records systems. Med Decis Making 34:987-95