High-throughput biotechnologies have generated a large number and variety of molecular networks, including protein interaction networks, gene coexpression networks, and regulatory networks. Network biology is an emerging field that aims to understand basic biological mechanisms and disease processes by using molecular networks. Computational and statistical tools are therefore urgently needed to mine biological knowledge from multiple networks. However, few such computational algorithms are available, and almost no statistical methods have been developed for multiple network analysis. The investigators hypothesize 1) that efficient score functions for gene subnetworks can be defined so that a high score correlates with biological significance, 2) that the statistical significance of biological networks is mathematically tractable, and 3) that efficient computational tools can be developed to find statistically significant patterns in biological networks. The objective of this application is to test these hypotheses. In addition, the researchers will develop the software necessary to implement the resulting algorithms. As a practical application, and to gain an understanding of the molecular networks involved in aging, these algorithms will be applied to a large collection of aging-related gene expression datasets. The investigators will pursue these objectives through the following specific aims: 1) define novel scoring functions for network modules that take both node degrees (the number of links of a node) and edge transitivity (the dependency between links forming triangles) into consideration, and develop efficient computational algorithms to identify molecular modules with high scores; 2) develop a rigorous theory to evaluate the statistical significance of the identified molecular modules; and 3) apply the fully developed tools to analyze a large collection of aging-related datasets and experimentally test a subset of the predictions in yeast.
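The two quantities named in the first aim, node degree and edge transitivity, can be illustrated with a minimal Python sketch. This is not the investigators' scoring function, which the abstract does not specify. The function names (`build_adjacency`, `node_degrees`, `edge_triangle_counts`, `module_density`), the toy gene labels, and the use of plain edge density as a stand-in module score are all assumptions made for illustration only.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_degrees(adj):
    """Node degree: the number of links of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def edge_triangle_counts(adj):
    """For each edge (u, v), count the triangles it closes, i.e. the number
    of neighbors shared by u and v. Node labels must be orderable."""
    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                counts[(u, v)] = len(adj[u] & adj[v])
    return counts


def module_density(adj, module):
    """Hypothetical stand-in score: the fraction of possible node pairs
    inside the module that are actually linked."""
    members = sorted(set(module))
    k = len(members)
    if k < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(members, 2) if v in adj.get(u, ()))
    return internal / (k * (k - 1) / 2)


# Toy network with hypothetical gene labels: a triangle A-B-C plus a pendant node D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
toy_adj = build_adjacency(toy_edges)
toy_degrees = node_degrees(toy_adj)
toy_triangles = edge_triangle_counts(toy_adj)
```

In this toy network, the edges of the triangle each share one common neighbor, while the pendant edge C-D shares none. A score that accounts for transitivity would treat those two kinds of edges differently, which plain edge density does not.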
The large number of networks, their size, and their complexity together make this an especially challenging project. The results of this research can be extremely useful for large-scale network analysis, and therefore for the systematic understanding of biology.
Identifying genetic subnetworks related to diseases or drug treatments is an important and challenging problem in biomedical research. The statistical and computational tools developed in this application for the analysis of multiple networks will be essential for this effort. The tools will be used to identify genetic networks specific to aging.
|Liu, Xuemei; Wan, Lin; Li, Jing et al. (2011) New powerful statistics for alignment-free sequence comparison under a pattern transfer model. J Theor Biol 284:106-16|
|Meng, Lu; Sun, Fengzhu; Zhang, Xuegong et al. (2011) Sequence alignment as hypothesis testing. J Comput Biol 18:677-91|
|Li, Wenyuan; Liu, Chun-Chi; Zhang, Tong et al. (2011) Integrative analysis of many weighted co-expression networks using tensor computation. PLoS Comput Biol 7:e1001106|
|Wan, Lin; Reinert, Gesine; Sun, Fengzhu et al. (2010) Alignment-free sequence comparison (II): theoretical power of comparison statistics. J Comput Biol 17:1467-90|
|Zhai, Zhiyuan; Ku, Shih-Yen; Luan, Yihui et al. (2010) The power of detecting enriched patterns: an HMM approach. J Comput Biol 17:581-92|
|Zhou, Linqi; Ma, Xiaotu; Arbeitman, Michelle N et al. (2009) Chromatin regulation and gene centrality are essential for controlling fitness pleiotropy in yeast. PLoS One 4:e8086|
|Wang, Wenhui; Nunez-Iglesias, Juan; Luan, Yihui et al. (2009) Usefulness and limitations of dK random graph models to predict interactions and functional homogeneity in biological networks under a pseudo-likelihood parameter estimation approach. BMC Bioinformatics 10:277|
|Reinert, Gesine; Chew, David; Sun, Fengzhu et al. (2009) Alignment-free sequence comparison (I): statistics and power. J Comput Biol 16:1615-34|
|Pape, Utz J; Rahmann, Sven; Sun, Fengzhu et al. (2008) Compound poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands. J Comput Biol 15:547-64|