Given the wealth and availability of genomic and environmental exposure data, computational methods provide a powerful opportunity to identify population-specific determinants of disease. Data types emerging from a diverse set of molecular and environmental profiling technologies cannot be properly analyzed using traditional statistical routines, and new computational approaches are needed. In line with the President's Precision Medicine Initiative, the goal of this proposal is to develop computational methods and integrate large-scale genetic and environmental exposure datasets to elucidate factors that affect preterm birth (PTB) in diverse populations. Preterm birth, or the delivery of an infant prior to 37 weeks of gestation, is a major health concern. Infants born prematurely, comprising about 12% of US newborns, have elevated risks of neonatal mortality and a wide array of health problems. Preterm birth rates vary among ethnic groups, with frequencies significantly elevated in African Americans and moderately elevated in Hispanics in comparison to Europeans. Environmental and socioeconomic factors alone may not explain these disparities, and despite the evidence for a genetic basis to preterm birth, no causal genetic variants have been identified to date. In this proposal, I aim to leverage the rich genetic and environmental variation data and develop computational approaches to advance our understanding of the biology of preterm birth as it relates to all populations. To that end, I propose three aims.
In aim 1, I will develop computational methods to identify and validate novel genetic factors for preterm birth through a genome-wide association (GWA) study in diverse ethnic populations. I have obtained from dbGaP a comprehensive set of publicly available PTB case and control datasets, consisting of ethnically diverse mothers and babies and including 3,500 cases and nearly 16,000 controls, and will carry out an ancestry-based case-control GWA study to identify genetic factors influencing PTB.
In aim 2, I will develop analytical methodology to identify environmental and socioeconomic factors that impact preterm birth in diverse ethnic populations. I propose to integrate linked California State databases covering over 3 million births across diverse populations with geographical location data, pollution levels, and UV exposure data from the Environmental Protection Agency to determine whether these exposures contribute to population-specific PTB risk.
In aim 3, I will carry out integrative data analysis and build computational models to identify population-specific interactions between the genetic and environmental factors affecting PTB risk. I hypothesize that gene-environment interactions contribute to population differences in preterm birth risk following environmental exposures. The proposed work will allow us to learn more about the etiology of PTB and could also be extended to other phenotypes of interest. This project is the logical next step in studying the interaction of genetics and environment in the context of disease, and its results can be used to inform precise population-specific diagnostic and therapeutic strategies.
Given the wealth and availability of genomic and environmental exposure data, computational methods provide a powerful opportunity to identify population-specific determinants of disease. The goal of this proposal is to develop computational approaches to integrate diverse genetic and environmental exposure datasets to elucidate factors that affect disease in diverse populations and apply them to the study of preterm birth. The methodology developed as part of this proposal can be extended and applied to other phenotypes of interest and inform precise population-specific diagnostic and therapeutic strategies.