Cardiovascular disease (CVD) and its associated risk factors such as hypertension and dyslipidemia constitute a major public-health burden due to increased mortality and morbidity and rising health care costs. Massive epidemiological data are needed to detect the small effects of many individual genes and the environment on these traits. However, sample sizes needed to make powerful inferences may only be reached by integrating multiple epidemiological studies. Meaningful integration of information from multiple studies requires the development of data ontologies which make it possible to integrate information across studies in an optimum manner so as to maximize the information content and hence the statistical power for detecting small effect sizes. A second compounding problem of data integration is that software applications that manage such study data are typically non-interoperable, i.e. """"""""silos"""""""" of data, and are incapable of being shared in a syntactically and semantically meaningful manner. Consequently, an infrastructure that integrates across studies in an interoperable manner is needed to ensure that epidemiological cardiovascular research remains a viable and major player in the biomedical informatics revolution which is currently underway. The cancer Biomedical Informatics Grid (caBIGTM) is addressing these problems in the cancer domain by developing software systems that are able to exchange information or that are syntactically interoperable by accessing metadata that is semantically annotated using controlled vocabularies. Our overarching goal is to develop ontologies for integrating cardiovascular epidemiological data from multiple studies. Specifically, we propose three Aims: First, develop cardiovascular data ontologies and vocabularies for each of three disparate multi-center epidemiological studies that facilitate data integration across the studies and data mining for various phenotypes. Second, adopt a technology infrastructure that leverages the cardiovascular data ontologies and vocabularies using Model Driven Architecture (MDA) and caBIGTM tools to facilitate the integration and widespread sharing of cardiovascular data sets. Third, facilitate seamless data sharing and promote widespread data dissemination among research communities cutting across clinical, translational and epidemiological domains, primarily through collaboration with the established CardioVascular Research Grid (CVRG).

Public Health Relevance

Cardiovascular disease (CVD) is a leading cause of mortality and morbidity which contributes substantially to rising health care costs and consequently constitutes a major public health burden. Therefore, understanding the genetic and environmental effects on these CVD traits is important. Massive epidemiological study data are needed to detect the small individual effects of genes and their interactions, and integration of multiple epidemiological studies are necessary for generating large sample sizes. Unfortunately, integrating information from multiple studies in a meaningful manner requires the development of data ontologies (language and grammar). Our proposal addresses this need, and does this in a way that is informative and user-friendly from the End User's point of view.

Agency
National Institute of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Research Project (R01)
Project #
1R01HL094286-01
Application #
7558424
Study Section
Special Emphasis Panel (ZRG1-BST-G (50))
Program Officer
Larkin, Jennie E
Project Start
2009-07-01
Project End
2011-06-30
Budget Start
2009-07-01
Budget End
2010-06-30
Support Year
1
Fiscal Year
2009
Total Cost
$488,000
Indirect Cost
Name
Washington University
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
068552207
City
Saint Louis
State
MO
Country
United States
Zip Code
63130
Shimoyama, Mary; Nigam, Rajni; McIntosh, Leslie Sanders et al. (2012) Three ontologies to define phenotype measurement data. Front Genet 3:87