Complex diseases are caused by a variety of genomic, transcriptomic, epigenomic, and proteomic factors, and many studies have suggested that these factors do not act in isolation but rather interact (crosstalk) at multiple levels and depend on one another in an intertwined manner. A variety of genomic techniques, such as SNP genotyping, gene expression microarrays, and the emerging next-generation sequencing (NGS), have generated vast amounts of multiscale genomic data, providing multi-dimensional and complementary information. However, these multiscale genomic data have not yet been well integrated and associated with clinical data for comprehensive analysis of a disease. The difficulty lies in the complexity and heterogeneity of these multi-omics data. In addition, the specific properties of these data (e.g., their correlations across multiple levels, small sample sizes but large numbers of biomarkers, group structures) have not been well considered, which necessitates a paradigm shift in the technical approaches. The goal of this project is therefore to tackle these significant bioinformatics challenges by developing innovative integration approaches, such as sparse models, that take into account the specific features of multiscale genomic data. Furthermore, we will apply them to the diagnosis (e.g., identification of genes) and risk prediction of complex diseases (e.g., osteoporosis). Our multi-/inter-disciplinary research team, consisting of statisticians, geneticists, molecular biologists, bioinformaticians, and biomedical engineers with complementary expertise, has worked synergistically in the past few years and contributed significantly to the development of data integration approaches.
Building on this work, we plan to accomplish the following specific aims: 1) to extract genetic signatures (e.g., CNVs) from multiple NGS samples and incorporate them into multi-omics studies; 2) to study the crosstalk/correlations among multi-omics data, from which epistatic networks can be detected; 3) to develop data integration techniques that can combine multiple genomic factors for the identification of risk genes and regions; and 4) to construct a sparse regression model to predict quantitative traits with increased power from multiple sources of genomic information, including pathways and interaction networks. We will validate our model with the study of osteoporosis at the Tulane Center for Bioinformatics and Genomics. With data collected from over 20,000 patients, we have, to our knowledge, the largest and most comprehensive datasets, which will serve as a unique platform for validating our approaches. We anticipate that the project will have a large and sustained impact. The successful implementation of the project will enable us to 1) better elucidate specific genetic risk mechanisms for osteoporosis; 2) search for potential drug targets; and 3) ultimately obtain novel approaches for better prevention and treatment of osteoporosis. Upon completion of the project, we will provide a set of efficient and powerful analytical tools for integrative data analysis and make them freely available through our ongoing software development of GCATs (Genomic Convergence Analysis Tools) for multiscale genomic data management and analysis.
The study of genetic mechanisms underlying complex diseases is of paramount importance for diagnosis and prognosis. Our integrated and comprehensive approach promises to greatly change current ways of genomic data analysis, which, for example, do not fully utilize correlated and complementary information or incorporate prior knowledge from multi-omics data. Given the ubiquitous use of multi-omics techniques in biomedicine, our approaches, which offer a new paradigm for multiscale genomic data integration, will therefore have a significant impact.
Xu, Chao; Zhang, Jigang; Wang, Yu-Ping et al. (2014) Characterization of human chromosomal material exchange with regard to the chromosome translocations using next-generation sequencing data. Genome Biol Evol 6:3015-24
Lin, Dongdong; Cao, Hongbao; Calhoun, Vince D et al. (2014) Sparse models for correlative and integrative analysis of imaging and genetic data. J Neurosci Methods 237:69-78
Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi et al. (2014) Population clustering based on copy number variations detected from next generation sequencing data. J Bioinform Comput Biol 12:1450021
Cao, Shaolong; Qin, Huaizhen; Deng, Hong-Wen et al. (2014) A unified sparse representation for sequence variant identification for complex traits. Genet Epidemiol 38:671-9