Enormous amounts of biomedical data are generated by hospitals, but most of this data is available only after people become ill. For people with chronic diseases such as diabetes, though, many important events happen outside of the medical system. Patient-generated health data (PGHD) can provide detailed insight into an individual's health during daily life. With long-term continuous glucose data, activity data, and food logs, we could develop personalized models of how these factors affect blood glucose and deliver personalized guidance to patients on how to better manage it. Transforming PGHD into information to guide decisions is a highly general problem that applies to all forms of diabetes and to other chronic diseases. We specifically focus on identifying dietary and lifestyle risk factors for gestational diabetes mellitus (GDM). GDM occurs in 9% of pregnancies and leads to a 7-fold increase in type 2 diabetes risk after birth, making it a significant public health problem. Pregnancy provides an ideal test bed for methods designed to make use of PGHD and uncover causes, as outcomes can be captured in a limited study duration. Motivated by trying to find causes and effects of nutrition in pregnancy, we develop generalizable algorithms that address widespread challenges in the use of PGHD for causal inference. First, existing causal inference methods assume we have well-defined variables (e.g., body weight), but nutrition can be measured in many ways (calories, macronutrients, food groups). Choosing among these puts a large burden on users and limits the potential for data-driven inference. We introduce the first causal inference algorithm that automatically identifies optimal variable granularity for each relationship by leveraging ontologies. This allows identification of different effects between, say, protein and specific meats on health outcomes, without users needing to specify such hypotheses.
Second, while individual-level data is essential for personalized inference, only limited data may be available when a treatment decision must be made or when health status is changing over time, such as during pregnancy. Leveraging population data can yield more accurate inferences, but existing methods are unable to identify relevant data dynamically, and pregnant individuals may be more similar to others at the same stage of pregnancy than to themselves in the recent past. We introduce new methods for dynamic causal transfer learning that continually identify and adapt relevant population data for personalized causal inference. We initially test our approach on publicly available ICU, diabetes, and nutrition datasets, before collecting a unique dietary and activity dataset from 150 pregnant individuals.

Public Health Relevance

Gestational diabetes mellitus (GDM) is a significant public health problem, and while changes in diet during pregnancy may increase risk, it is currently unknown which dietary factors have the most influence. This project develops better methods for continually assessing GDM risk and gaining long-term, individualized insight into diet. The methods developed will be generalizable to other types of health data and may in particular yield insights into causes and effects of diabetes and chronic disease.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Ye, Jane
Stevens Institute of Technology
Biostatistics & Other Math Sci
Biomed Engr/Col Engr/Engr Sta
United States