Integrating large-scale genomics data has huge potential to accelerate the identification of disease genes in humans. Current integrative approaches for predicting disease genes face three major challenges. First, previous integrations generally limit genomic data input to one species at a time, while disease datasets are often generated in multiple model organisms. Second, public functional genomic datasets are dominated and biased by certain data types and accessible tissues, a problem that can be addressed by expert curation of input datasets. Third, when multiple tissue-specific networks have been generated, a mathematical formulation is lacking to prioritize among these competing networks for the specific disease under consideration. This collaborative proposal aims to address these challenges by exploring a prototype of bioinformatics tools that integrate multiple relevant global and tissue-specific networks across mammalian species for a specific disease, here ataxia. The proposal is based on our preliminary data in developing both global and cerebellum-specific networks to prioritize ataxia-associated genes, and on the two PIs' complementary expertise in genomic data integration and experimental ataxia gene confirmation. We will 1) use domain-specific and multiple-species data to establish global, brain, cerebellum, related-tissue, and ataxia-specific networks, and develop web tools to explore these networks; and 2) develop multiple kernel learning algorithms to weight and integrate multiple networks to predict ataxia-associated genes. Although the algorithms will be developed targeting ataxia only, we envision that this expert-driven integrative approach will be adaptable to other disease gene identification scenarios.
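As a rough illustration of what weighting and integrating competing networks could look like, the sketch below scores each network by how well it separates known disease genes from the rest, then ranks candidates on the weighted combination. It is not the algorithm proposed here: it swaps in kernel-target alignment as a simple stand-in for multiple kernel learning, and the gene names, the two toy networks, and all function names are hypothetical.

```python
import math

def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def alignment_weights(networks, labels):
    """Weight each network by its alignment with the label matrix y * y^T.

    labels: +1 for a known disease gene, -1 for every other gene.
    Negative alignments are clipped to zero, then weights are normalized.
    """
    target = [[yi * yj for yj in labels] for yi in labels]
    target_norm = math.sqrt(frobenius_inner(target, target))
    raw = []
    for net in networks:
        norm = math.sqrt(frobenius_inner(net, net))
        score = frobenius_inner(net, target) / (norm * target_norm) if norm > 0 else 0.0
        raw.append(max(score, 0.0))
    total = sum(raw)
    if total == 0:
        # No network is informative: fall back to equal weights.
        return [1.0 / len(networks)] * len(networks)
    return [r / total for r in raw]

def combine(networks, weights):
    """Weighted sum of the networks, entry by entry."""
    n = len(networks[0])
    return [[sum(w * net[i][j] for w, net in zip(weights, networks))
             for j in range(n)] for i in range(n)]

def rank_candidates(combined, genes, labels):
    """Rank non-seed genes by mean combined similarity to the seed genes."""
    seeds = [i for i, y in enumerate(labels) if y > 0]
    scores = {}
    for i, gene in enumerate(genes):
        if labels[i] > 0:
            continue  # seeds are already known; only candidates are ranked
        scores[gene] = sum(combined[i][s] for s in seeds) / len(seeds)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): G1 and G2 play the role of known ataxia genes.
genes = ["G1", "G2", "G3", "G4"]
labels = [1, 1, -1, -1]
tissue_net = [[1.0, 0.9, 0.8, 0.1],
              [0.9, 1.0, 0.7, 0.1],
              [0.8, 0.7, 1.0, 0.1],
              [0.1, 0.1, 0.1, 1.0]]
global_net = [[1.0, 0.1, 0.1, 0.9],
              [0.1, 1.0, 0.1, 0.8],
              [0.1, 0.1, 1.0, 0.1],
              [0.9, 0.8, 0.1, 1.0]]

weights = alignment_weights([tissue_net, global_net], labels)
combined = combine([tissue_net, global_net], weights)
ranking = rank_candidates(combined, genes, labels)
print(weights, ranking)
```

In this toy setup the network in which the two seed genes are tightly connected receives the larger weight, so candidates close to the seeds in that network rise in the ranking. A real multiple kernel learning formulation would learn the weights jointly with a classifier rather than from alignment alone.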
Computational networks generated through large-scale genomic data integration can help place genes, proteins and their mutations into functional context. Traditional functional networks do not address tissue specificity and are limited to a single species, even though animal models have often significantly informed human disease research. We propose a strategy to integrate multiple tissue-specific functional networks to prioritize disease genes by incorporating expert knowledge and genomics data from multiple mammalian species. We will develop our strategy in the context of a rare genetic disease, ataxia, for which we have extensive expert knowledge for collecting relevant genomic datasets and have gathered an initial candidate gene set to test. We expect that a successful implementation of our pipeline will become a prototype for gene identification in other diseases using integrated tissue-specific networks, and that it may eventually be brought to clinical settings in which DNA from subjects with genetic disorders of unknown cause is being sequenced.
Li, Hong-Dong; Menon, Rajasree; Omenn, Gilbert S et al. (2014) The emerging era of genomic data integration for analyzing splice isoform function. Trends Genet 30:340-7
Shi, Lihong; Sierant, M C; Gurdziel, Katherine et al. (2014) Biased, non-equivalent gene-proximal and -distal binding motifs of orphan nuclear receptor TR4 in primary human erythroid cells. PLoS Genet 10:e1004339
Li, Hong-Dong; Menon, Rajasree; Omenn, Gilbert S et al. (2014) Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence. Proteomics 14:2709-18
Zhu, Fan; Guan, Yuanfang (2014) Predicting dynamic signaling network response under unseen perturbations. Bioinformatics 30:2772-8
Omenn, Gilbert S; Guan, Yuanfang; Menon, Rajasree (2014) A new class of protein cancer biomarker candidates: differentially expressed splice variants of ERBB2 (HER2/neu) and ERBB1 (EGFR) in breast cancer cell lines. J Proteomics 107:103-12
Shi, Lihong; Lin, Yu-Hsuan; Sierant, M C et al. (2014) Developmental transcriptome analysis of human erythropoiesis. Hum Mol Genet 23:4528-42
Eksi, Ridvan; Li, Hong-Dong; Menon, Rajasree et al. (2013) Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data. PLoS Comput Biol 9:e1003314