With the increasing use of electronic medical records for a wide variety of patients, a large investment is being made in a resource that remains vastly underused. The need is especially real in mental health, where problems are highly individualized, require personalized intervention, and are often accompanied by rich data not easily captured in structured templates. Information extracted from free text in existing records could serve as large-scale stand-alone datasets or be combined with other data. Without scalable and effective computational approaches to capture this data, much time, effort and money are spent creating limited-use records that could instead be leveraged into valuable data sources to inform existing research and lead to new insights, progress and treatments. Our broad, long-term goal is processing free text in electronic health records (EHR) in mental health. We focus on Autism Spectrum Disorders (ASD), a particularly interesting example of both shortcomings and opportunities. ASD's estimated prevalence has increased over the years, from 1 in 150 in 2000 to 1 in 68 in 2010 (1-5). These numbers are based on surveillance using electronic health records. The increasing prevalence is not well understood, and hypotheses range from changing diagnostic criteria to environmental factors. The lines of inquiry used to find cures are similarly broad and range from brain scans and genetics, resulting in large structured datasets, to highly individualized therapies, resulting in rich but unstructured data. Currently, the text information in electronic records is not being leveraged on a large scale. The proposed project continues our preliminary work and uses a data-driven approach to create human-interpretable models that allow automated extraction of relevant structured data from free text. The Diagnostic and Statistical Manual of Mental Disorders (DSM) is the starting point for identifying features. A database of thousands of records is leveraged to design and test the algorithms.
The two specific aims are: 1) design and test natural language processing (NLP) algorithms to detect DSM criteria for ASD in free text in EHR, and 2) demonstrate the feasibility and usefulness of the models for large-scale analysis of ASD cases, which is inconceivable today with current approaches. Our methods include analysis of free text in electronic records and end-user annotations to create a large gold standard of instances of DSM criteria for ASD; application of machine learning and rule-based approaches to create human-interpretable models for automated annotation of diagnostic patterns in textual records; and demonstration of usefulness through new research (e.g., automatically detecting ASD vs. no-ASD status for challenging cases; evaluating prevalence of symptoms over time). Through NLP algorithms, this project has the potential to shift the field significantly away from the current paradigm, in which attempts to understand ASD rely on small-scale data from individual interventions and lack integration between different data sources, and toward leveraging information from existing large-scale data sources to propose novel analyses and hypotheses.

Public Health Relevance

The lack of sophisticated tools to extract relevant diagnostic patterns from free text in the increasingly large number of electronic medical/health records is a critical barrier to leveraging already available data in the field of mental health. Natural language processing (NLP) algorithms designed specifically for mental health can make new data analysis and integration with other sources possible at a scale previously unseen. Using a data-driven process, this project will design NLP algorithms to annotate free text with criteria from the Diagnostic and Statistical Manual of Mental Disorders (DSM) and will demonstrate scope, feasibility and usefulness by focusing on Autism Spectrum Disorders (ASD), where prevalence is increasing and much rich clinical text is stored in electronic health records (EHR).

Agency
National Institutes of Health (NIH)
Institute
Agency for Healthcare Research and Quality (AHRQ)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21HS024988-02
Application #
9547263
Study Section
Healthcare Information Technology Research (HITR)
Program Officer
Hsiao, Janey
Project Start
2017-09-01
Project End
2019-08-31
Budget Start
2018-09-01
Budget End
2019-08-31
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of Arizona
Department
Administration
Type
Sch of Business/Public Admin
DUNS #
806345617
City
Tucson
State
AZ
Country
United States
Zip Code
85721