Major depressive disorder (MDD) is highly prevalent and is a major driver of disability and health care costs. Progress in improving diagnosis and treatment of this disorder has been hindered by its heterogeneity in clinical presentation and course. Such heterogeneity makes the underlying neurobiology difficult to characterize and has led to efforts to identify more homogeneous subgroups. These efforts date back to the dawn of the modern psychopharmacologic era: they initially focused on atypical and melancholic depression and, more recently, on subtypes such as anxious and irritable depression. Subtyping efforts are complicated by a paucity of large clinical cohorts with similar ascertainment and phenotyping. In particular, the available data often focus on a very narrow range of depressive symptoms and a restricted set of comorbidities, and typically encompass only the acute phase of treatment. As a result, despite intriguing findings in one or occasionally two cohorts, subtyping has not been widely deployed in clinical practice or used to meaningfully improve translational investigation. The utility of electronic health records and registries for creating in silico cohort studies has been demonstrated in numerous settings, including psychiatry. Beyond sample size and efficiency of ascertainment, these data types often offer a broader range of captured non-depressive phenotypes and the availability of longitudinal data. The present study therefore proposes to create a very large cohort of individuals with MDD, defined by a validated algorithm and spanning two health systems, and to apply novel machine learning methods to identify MDD subtypes.
These subtypes will be validated by comparison with standard phenotypic definitions, by annotation by trained raters using a standard 'intruder' paradigm, and by correlation with medication prescribing. Then, as proof of concept, the biological basis of these subtypes will be characterized by examining heritability and polygenic risk in a large genetic biobank. Beyond establishing convergent validity, this last step will demonstrate the broader applicability of data-driven subtypes for translational investigation in biobanks and registries. The study builds on existing collaborations between a team experienced in phenotypic and genomic studies of mood disorders and in the application of electronic health records, and a team active in developing and applying emerging machine learning methods. It will lay the groundwork for further validation and application of data-driven disease subtyping across medicine.

Public Health Relevance

The wide variation in symptoms of major depressive disorder complicates efforts to understand the underlying causes of this illness. Applying machine learning methods to electronic health records should enable the identification of more specific disease subgroups. These subgroups will facilitate efforts to understand the causes of depression and to begin developing more targeted treatments.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
High Priority, Short Term Project Award (R56)
Project #
1R56MH115187-01A1
Application #
9717620
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Ferrante, Michele
Project Start
2018-08-10
Project End
2019-08-09
Budget Start
2018-08-10
Budget End
2019-08-09
Support Year
1
Fiscal Year
2018
Name
Massachusetts General Hospital
DUNS #
073130411
City
Boston
State
MA
Country
United States