The objectives of this research program are (1) to develop and apply novel computational approaches for uncovering genome-wide networks of interactions between genes and proteins, and (2) to conduct related educational activities in a newly established bioinformatics program in the Department of Electrical Engineering and Computer Science at the University of Kansas. Specifically, building on the reconstruction of biological networks of moderate size, the new research will computationally uncover genome-wide biological networks and map interactions of genes and proteins across a variety of organisms. The research directions include: simultaneously integrating multiple sources of biological knowledge into dynamic Bayesian networks for learning networks of gene interactions; learning networks of protein interactions from heterogeneous data; learning integrated networks of gene and protein interactions; learning genome-wide networks of gene and protein interactions; and cross-species network learning. The research will advance the state of the art by developing machine learning methods for effectively integrating multiple sources of prior knowledge drawn from different data sources, including methods for learning from highly heterogeneous data and for learning large-scale networks. The research will also produce new methods and user-friendly software that molecular biologists can apply to gain insight into diverse biological problems, such as how biological processes are regulated on a genome scale and how individual biomolecules interact with one another in the cell.

Learning with prior knowledge and highly heterogeneous data sources is fundamental to computational biology, information theory, machine learning, data mining, and other areas. Thus, the proposed research will benefit a variety of application domains, including research in biology and medicine. The biological discoveries derived from this project will also contribute to a variety of fields, including agricultural development, rational drug design, and health care. The research program will foster and facilitate collaborations between biologists and the PI. The educational components are closely tied to the research activities and include (1) developing and improving bioinformatics courses that are closely related to the research outlined here and integrating them into the core bioinformatics curriculum, and (2) providing special training opportunities in the interdisciplinary area of bioinformatics for a wide range of students, from high school through graduate school, including groups typically underrepresented in science and technology.

Project Report

With the completion of the Human Genome Project and the development of advanced high-throughput technologies, the great challenge currently confronting scientists in life science research is how to computationally model and elucidate complex biological networks from high-throughput biological data sets. Toward this end, the proposed research focused on developing and applying novel computational methods for reconstructing genome-wide biological networks from high-throughput data. The objectives of this research program are (1) to develop and apply novel computational approaches for uncovering genome-wide networks of interactions between genes and proteins, and (2) to conduct related educational activities in bioinformatics. Toward achieving these goals, we have developed and implemented novel machine learning algorithms for (1) simultaneously integrating multiple sources of biological knowledge into Bayesian networks for learning networks of gene interactions; (2) learning networks of protein interactions from heterogeneous data; (3) learning integrated networks; and (4) learning genome-wide networks of gene and protein interactions. The intellectual merit of this proposal includes novel computational methods for uncovering genome-wide biological networks. We have developed and applied multiple machine learning methods for (i) effectively integrating multiple sources of prior knowledge drawn from different data sources, (ii) learning from highly heterogeneous data, and (iii) learning large-scale networks.
We have also developed user-friendly software that molecular biologists can apply to gain insight into diverse biological problems. Examples include the University of Kansas Proteomics Service (KUPS), which provides high-quality protein-protein interaction (PPI) datasets for researchers interested in elucidating PPIs with in silico methods, and the University of Kansas Gene Ontology Analysis Layer (KU GOAL), which is free and open to all users for discovering similar genes or gene products, validating drug targets, and developing analyzers for measuring GO-based functional similarity (see www.ittc.ku.edu/chenlab). The broader impacts of this proposal include computational algorithms that are fundamental to computational biology, information theory, machine learning, data mining, and other areas. The biological discoveries derived from this project also contribute to a variety of fields, including biology, rational drug design, and health care. Furthermore, the research program fostered and facilitated collaborations between biologists and the PI. The educational components were closely tied to the research activities and included new bioinformatics courses closely related to the research outlined here, as well as special training opportunities in the interdisciplinary area of bioinformatics for a wide range of students, from high school through graduate school, including groups typically underrepresented in science and technology (e.g., minority and female students).

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1347706
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2012-11-15
Budget End: 2014-04-30
Support Year:
Fiscal Year: 2013
Total Cost: $85,226
Indirect Cost:
Name: Wayne State University
Department:
Type:
DUNS #:
City: Detroit
State: MI
Country: United States
Zip Code: 48202