Advancing our understanding of social and behavioral effects in aging will increasingly require the integration of complex and often disparate data sets. To measure such effects, investigators must manage demographic data on mortality and fertility, behavioral or survey data on social relationships, and, increasingly, biomarker data that capture genetic and physiological variation. Extant databases are designed to facilitate analysis of either complex demographic and sociobehavioral data or complex genetic and genomic data; databases that meet the challenge of integrating all of these data types do not yet exist. Consequently, the research community is not reaping the full benefits of these data sets for understanding aging. Further, we lack models for how to house these data types together in a cohesive and accessible fashion. We propose to develop such a model by building on an existing database on wild primates that houses individual-based, multidimensional, longitudinal phenotypic data that have already proved valuable for studies of social and behavioral effects on aging. We also have a growing set of complementary genetic and genomic data on the same individuals, including candidate gene and whole genome resequencing, gene expression, and epigenetic data sets that promise to capture physiological changes across the life course. In the proposed work, we will provide centralized, integrated archival storage for these multidimensional data sets, create a seamless integration of genetic and phenotypic information at the individual level, and provide a much-needed, well-documented model of such an integration. We will also work with the National Archive of Computerized Data on Aging (NACDA) to build mechanisms for sharing these data with other researchers in the field.
Specifically, we propose to (1) build database modules to house our expanding multi-dimensional genetic and genomic data sets, (2) link these new genetics and genomics modules to our existing database (BABASE) and to each other, to integrate our genetic and phenotypic information, and (3) create a public portal to BABASE that will allow open access to all components of the genetics/genomics modules of the database, as well as open access to key aging-related components of the phenotypic data; this portal will be accessible through NACDA. Our new genetic modules will draw on database module designs pioneered by Chado, a branch of the Generic Model Organism Database project (GMOD). Together, these efforts will provide important archival storage for these valuable data sets, increase the efficiency of data analysis, and promote new, synergistic research directions, including collaborations with outside investigators that will allow us to gain deeper insights into aging in natural mammal populations. In addition, because all the code underlying BABASE and its new extensions will be open source, the proposed work will produce models for how other population studies focused on aging can achieve similar goals.
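The individual-level link described in aim (2) can be illustrated with a minimal sketch. It is not the BABASE or Chado schema: the table names (`individual`, `genotype`), the column names, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the production system. The sketch assumes only that each genetic record carries the same individual identifier used by the phenotypic records.

```python
import sqlite3


def build_demo_db():
    """Create a toy database with hypothetical phenotypic and genetic tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical phenotypic table: one row per study individual.
        CREATE TABLE individual (
            ind_id     TEXT PRIMARY KEY,
            sex        TEXT NOT NULL,
            birth_year INTEGER NOT NULL
        );
        -- Hypothetical genetic table: every genotype must point at a
        -- known individual, which is the individual-level link.
        CREATE TABLE genotype (
            ind_id   TEXT NOT NULL REFERENCES individual(ind_id),
            locus    TEXT NOT NULL,
            allele_1 TEXT NOT NULL,
            allele_2 TEXT NOT NULL,
            PRIMARY KEY (ind_id, locus)
        );
        """
    )
    # Made-up illustrative rows, not real study data.
    conn.executemany(
        "INSERT INTO individual VALUES (?, ?, ?)",
        [("A01", "F", 2001), ("A02", "M", 2003)],
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?, ?)",
        [("A01", "locus_x", "A", "G")],
    )
    conn.commit()
    return conn


def linked_records(conn, locus):
    """Return every individual with its genotype at `locus`, if one exists."""
    query = """
        SELECT i.ind_id, i.sex, i.birth_year, g.allele_1, g.allele_2
        FROM individual AS i
        LEFT JOIN genotype AS g
               ON g.ind_id = i.ind_id AND g.locus = ?
        ORDER BY i.ind_id
    """
    return conn.execute(query, (locus,)).fetchall()


if __name__ == "__main__":
    for row in linked_records(build_demo_db(), "locus_x"):
        print(row)
```

The left join keeps individuals that have not yet been genotyped, so phenotypic records are never dropped from an analysis simply because the genetic data are incomplete. The foreign key prevents genetic records from being stored for an individual that is unknown to the phenotypic tables.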
Genetic and genomic data play an increasingly important role in understanding the aging process, including social and behavioral effects on aging. While database systems for storing genomic data alone are well developed, the research community is lagging behind in its ability to link these genomic data to the traits of individual people (or animals), limiting their applicability to aging research and other kinds of work; specifically, we lack mechanisms to link complex, multi-dimensional, individual-based trait data with complex, multi-dimensional genetic/genomic data. We propose to build such a mechanism and provide open access to it, as well as to the data we store in it, as a model for the integration of complex genetic/genomic data with individual trait data.
Lea, Amanda J; Tung, Jenny; Zhou, Xiang (2015) A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data. PLoS Genet 11:e1005650