Family history is an essential predictor of disease risk, yet it is often incomplete, inaccurate, and underutilized in today's clinical settings. With the increasingly widespread adoption of electronic medical records (EMRs), individuals are now often born into a health system in which many of their family members already have substantial longitudinal EMRs. These records present a vast untapped resource for deriving data-driven family histories, that is, family histories constructed directly from the EMRs of patients' family members. The goal of this project is to systematically quantify the value of data-driven family histories for predicting an individual's future risk of disease, compared with the patient-reported family histories currently available in the individual's own EMR. Improved family histories can drive better medical decisions about whom to screen and which preventative actions to take, saving costs and improving outcomes. This study will also lay the foundations for the development of privacy-preserving data-sharing frameworks that enable linking of family medical records. We will use a unique comprehensive health database of over 4 million covered lives with family-linkage information.
Aim 1. Assemble data-driven and patient-reported family histories for a selection of common diseases, and compare reporting rates and bivariate risks associated with different histories. We will assemble data-driven, patient-reported and combined family histories for a set of diseases for which family history is believed to be an important risk factor, and compare the relative reporting rates and bivariate disease risks associated with the different types of histories.
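As a rough illustration of the kind of comparison Aim 1 describes, the sketch below computes a reporting rate and a bivariate relative risk for each type of family history. It is a minimal sketch under stated assumptions, not the project's method: the `Person` record, the `reporting_rate` and `relative_risk` helpers, and the toy cohort are all hypothetical, and the counts are invented for illustration rather than drawn from any study data.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record layout, assumed for illustration only.
    has_disease: bool
    reported_history: bool      # family history noted in the patient's own EMR
    data_driven_history: bool   # diagnosis found in a linked relative's EMR


def reporting_rate(people, has_history):
    """Fraction of the cohort flagged as having a positive family history."""
    return sum(has_history(p) for p in people) / len(people)


def relative_risk(people, has_history):
    """Disease risk with a positive history divided by risk without one."""
    exposed = [p for p in people if has_history(p)]
    unexposed = [p for p in people if not has_history(p)]
    if not exposed or not unexposed:
        raise ValueError("both history groups must be non-empty")
    risk_exposed = sum(p.has_disease for p in exposed) / len(exposed)
    risk_unexposed = sum(p.has_disease for p in unexposed) / len(unexposed)
    if risk_unexposed == 0:
        raise ValueError("relative risk undefined: no cases without history")
    return risk_exposed / risk_unexposed


# The three history definitions compared in Aim 1.
def reported(p):
    return p.reported_history


def data_driven(p):
    return p.data_driven_history


def combined(p):
    return p.reported_history or p.data_driven_history


# Toy cohort of 20 people; counts are made up, not study data.
cohort = (
    [Person(True, True, True)] * 2
    + [Person(True, False, True)] * 2
    + [Person(True, False, False)] * 1
    + [Person(False, True, True)] * 2
    + [Person(False, False, True)] * 2
    + [Person(False, False, False)] * 11
)

if __name__ == "__main__":
    for name, rule in [("patient-reported", reported),
                       ("data-driven", data_driven),
                       ("combined", combined)]:
        print(name, reporting_rate(cohort, rule), relative_risk(cohort, rule))
```

In this toy cohort every patient-reported history is also captured by the data-driven one, so the combined and data-driven figures coincide; with real records the two sources would be expected to overlap only partially.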
Aim 2. Develop Bayesian predictive models for each of the selected diseases, and compare the performance of models based on data-driven, patient-reported, and combined family histories. We will develop Bayesian models that predict an individual's future disease risk for each of the diseases above, and measure the improvements in model performance achieved by including patient-reported family histories, data-driven family histories, and combined family histories. These models will use all coded information available in family members' medical records, including family history of other diseases. Thus, we will aim to identify novel heritability associations, alongside known ones.
Aim 3. Quantify the predictive value of data-driven family history information taken from different family members at different levels of aggregation. Any future data-sharing framework should let individuals decide which medical information they wish to share with which family members. We aim to start building the evidence base to inform these decisions: for each of the disease models above, we will quantify the effects of including information from different family members at different levels of detail.

Public Health Relevance

Family histories are essential for clinical risk prediction, yet they are often incomplete, inaccurate and underutilized in today's clinical settings. This project will help protect public health by developing improved approaches to providing more complete, accurate and detailed family histories, enabling better risk prediction in clinical practice.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Wise, Anastasia Leigh
Boston Children's Hospital
United States