The overall Phase II objective is to continue developing and validating new systems biology computational tools for inferring gene regulatory relationships from gene expression data obtained in multi-perturbation gene knockout (KO) experiments. NIH's Knockout Mouse Project (KOMP) is an initiative to generate a public resource of mouse embryonic stem (ES) cells containing a null mutation in every gene in the mouse genome, a resource important for deciphering the complexity of biological systems in mice and, ultimately, in humans. We anticipate that a new generation of multi-perturbation/KO studies with a biological system perspective will emerge in all areas of biomedical research. New computational tools for deciphering genetically regulated responses (genotype-to-phenotype signaling cascades) will significantly advance our understanding of the molecular targets and mechanisms of many diseases. Today, researchers need new tools to manage and interpret the tremendous volumes of gene/protein expression data generated by multi-perturbation investigations. Seralogix's Phase II efforts focus on improving existing functionality and creating new functionality for learning larger-scale (biological system level) gene regulatory networks, and on integrating this network learning functionality into our existing Biosystem Analysis Framework (BAF). The BAF comprises a suite of integrated mathematical analysis and modeling tools and databases. Its core tools are based on Dynamic Bayesian Networks (DBNs). DBNs allow us to systematically integrate prior knowledge with empirical time-course expression data for modeling, pattern recognition, and eventually the biological system genetic network learning proposed herein. Our algorithmic innovation, proven feasible in Phase I, is the incorporation of biological prior knowledge and multi-perturbation data into our DBNs to enable a genetic network learning approach. 
This approach is based on well-established Bayesian statistical methods, which we adopt in a sampling scheme enhanced with biological prior knowledge to overcome the intrinsic difficulty of learning network structure from sparse and noisy gene expression data. We showed in Phase I that prior knowledge, coupled with Bayesian network learning methods and multi-perturbation/KO experimental data, resulted in reliable identification of gene regulatory relationships. We believe this approach can be scaled up, leading to a more robust mathematical/functional system level model. Further, we believe that integrating genetic network learning into Seralogix's BAF will provide an important new tool for identifying novel gene regulatory relationships and gaining insight into disease processes, and that it will have significant commercial potential for Seralogix. We will collaborate with the Texas Institute of Genomic Medicine, which is studying the genomic causes of birth defects and will provide mouse gene expression KO data. Our Phase II aims include: 1) scaling our approach to support biological system level network learning; 2) statistical assessment and biological validation of our learned networks; 3) developing new tools and techniques to interrogate the resulting system network models so that biologists can extract important knowledge.

Public Health Relevance

A central goal of modern biological research is to fully elucidate the interplay and regulation of the molecular determinants that control health and disease, including cell cycling, development, aging, and the progressive and recurrent pathogenesis of complex diseases. New computational methods (software tools) for identifying and deciphering genetically regulated responses (e.g., signaling cascades) will significantly advance our understanding of the molecular targets and mechanisms of many diseases of high public health concern. The discovery of underlying genetic functions and relationships will be extremely important for medical breakthroughs, especially for the safe and effective development of drugs and diagnostics. Today, researchers are hindered by the tremendous volumes of gene/protein expression data generated by knockout investigations. Computational tools that transform these volumes of raw genomic/proteomic data into actionable knowledge via mathematical modeling will help guide and accelerate researchers' investigations of genetic disorders and their identification of targets for intervention and treatment.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 2R44HG004457-02
Application #: 7999392
Study Section: Special Emphasis Panel (ZRG1-IMST-E (15))
Program Officer: Bonazzi, Vivien
Project Start: 2007-07-01
Project End: 2012-06-30
Budget Start: 2010-09-27
Budget End: 2011-06-30
Support Year: 2
Fiscal Year: 2010
Total Cost: $537,957
Indirect Cost:
Name: Seralogix
Department:
Type:
DUNS #: 119169386
City: Austin
State: TX
Country: United States
Zip Code: 78746