High-throughput technologies now make it possible to measure the activities and interactions of tens of thousands of molecules in a cell simultaneously, opening new doors to systems-level exploration in biology. Advanced computational methods are being developed to analyze these large datasets, to extract patterns that represent biological knowledge, and to construct predictive models that promise to predict characteristics of an organism, such as cancer outcomes or plant growth phenotypes. To help achieve these goals, this project aims to develop efficient and effective computational algorithms and tools that integrate heterogeneous, noisy high-throughput data and analyze them in ways that treat genes as interconnected, rather than independent, components of the cell.

Biological networks are mathematical models that describe the interactions among molecules in the cell and are critical to modeling and understanding complex biological systems. However, the size and complexity of biological networks, together with noisy and incomplete data, pose major challenges to network-based data analysis. A practical and intuitive strategy for tackling these challenges is to analyze and use such networks at the level of functional pathways, i.e., groups of genes or proteins involved in similar biological processes. Working at this level would significantly reduce the complexity of biological networks and improve the understanding of complex phenotypes. Because current knowledge of functional pathways is rather limited for most species, this project will develop a set of algorithms and software tools for fully automated discovery of dense subnetworks as candidate functional modules, together with module-oriented algorithms for analyzing and using biological networks in several real applications.

First, this project will develop algorithms that use information embedded in network topology to improve network quality and module discovery. For networked data (e.g., protein-protein interaction data), topology will be used to improve edge reliability, and subsequently module discovery, through a novel topological similarity measure based on random walks on graphs. For non-networked data (e.g., transcriptomic data), global network topology will be used to construct an "optimal" network that enables fully automated module discovery without any user-specified parameters. Second, this research will develop computational methods to systematically investigate the relationship between network topology and biological function, which is expected to advance current understanding of the organizing principles of biological networks and to facilitate prioritizing genes in disease studies. Finally, this project proposes a novel Steiner tree-based algorithm for identifying potential causal genes associated with cancer phenotypes, and a novel similarity metric for comparing patients based on pathway- or subnetwork-level gene expression patterns, which can be easily combined with existing clustering and classification algorithms for network-based prediction of cancer outcomes.

The final outputs of this project will include both bioinformatics tools for integrative data analysis and databases of biological knowledge discovered from different input datasets. These tools and resources will be made freely available on the web for the broad range of researchers interested in bioinformatics algorithm development or applications. They will also be applied to study several biological processes of central interest to collaborators, who have committed to validating some of the computational predictions. These applications include identifying novel plant hormone response genes, predicting and characterizing DNA damage response genes, and predicting metastasis potential in breast cancer patients, all by integrating protein-protein interaction and transcriptomic data. This project will also contribute to the advancement of computing through novel network link prediction and module discovery algorithms and network-constrained clustering and classification methods, which are expected to have immediate applications in domains beyond the biological sciences. The activities undertaken as part of this research will be incorporated into several courses and will expand the educational and research opportunities available at the University of Texas at San Antonio, a minority-serving institution where the majority of undergraduates are from under-represented minorities. These activities are expected to increase geographic and ethnic diversity and to encourage the participation of minority groups in bioinformatics and computational biology research.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1218201
Program Officer: Sylvia Spengler
Budget Start: 2012-10-01
Budget End: 2017-09-30
Fiscal Year: 2012
Total Cost: $452,657
Name: University of Texas at San Antonio
City: San Antonio
State: TX
Country: United States
Zip Code: 78249