A holy grail of bioinformatics is the creation of whole-cell models that enhance human understanding and facilitate discovery. One successful and widely used effort toward this goal is the Gene Ontology (GO), a massive project to manually annotate genes with terms describing molecular functions, biological processes and cellular components, and to record the relationships between terms, for example, that the small and large ribosomal subunits together form the ribosome. GO is widely used to understand the function of a gene or group of genes. Unfortunately, GO is limited by the effort required to create and update it by hand. It exists only for well-studied organisms, and even then in only one generic form per organism, with limited overall genome coverage and a bias toward well-studied genes and functions. GO cannot be used to learn about an uncharacterized gene or to discover a new function, and an ontology cannot be quickly assembled for a new organism, let alone for a specific cell type or disease state. This proposed research will change this state of affairs. Prior work has already shown that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to computationally infer an ontology whose coverage and power are equivalent to those of the manually curated GO Cellular Component ontology. However, this first attempt was limited in the types of experimental data it used and in its ability to infer the more generally useful Biological Process ontology. Here, machine learning approaches will be applied to integrate many types of experimental data into ontology construction and to analyze the type of biological information each experiment provides, revealing which experiments are most informative for capturing Biological Process information.
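The idea of inferring an ontology directly from data can be illustrated with a toy example. The sketch below is not the method used in the prior work or in this proposal; it assumes that the experimental data have already been summarized as a single pairwise similarity score per gene pair, and it uses plain average-linkage agglomerative clustering so that each merge becomes a new term whose children are the two merged terms. The gene names (S1, S2, L1, L2, X1), the similarity scores and the `threshold` parameter are all invented for illustration.

```python
from itertools import combinations


def infer_toy_ontology(genes, similarity, threshold):
    """Build a hierarchy of nested gene sets ("terms") by average-linkage
    agglomerative clustering (an illustrative stand-in, not the published method).

    similarity maps frozenset({gene_a, gene_b}) -> score (higher = more related).
    Merging stops once the best average similarity drops below `threshold`.
    Returns (terms, roots): terms maps name -> (gene set, child term names).
    """
    # Every gene starts as a leaf term with no children.
    terms = {g: (frozenset([g]), []) for g in genes}
    active = list(genes)  # terms that have not yet been merged into a parent
    counter = 0

    def avg_sim(a, b):
        # Average pairwise similarity between the genes of two terms.
        ga, gb = terms[a][0], terms[b][0]
        total = sum(similarity[frozenset((x, y))] for x in ga for y in gb)
        return total / (len(ga) * len(gb))

    while len(active) > 1:
        a, b = max(combinations(active, 2), key=lambda p: avg_sim(*p))
        if avg_sim(a, b) < threshold:
            break  # remaining terms are too dissimilar to share a parent
        counter += 1
        name = f"T{counter}"
        terms[name] = (terms[a][0] | terms[b][0], [a, b])
        active = [t for t in active if t not in (a, b)] + [name]
    return terms, active


# Hypothetical data: S1/S2 behave like small-subunit genes, L1/L2 like
# large-subunit genes, and X1 is unrelated to both.
genes = ["S1", "S2", "L1", "L2", "X1"]
scores = {("S1", "S2"): 0.9, ("L1", "L2"): 0.8}
similarity = {}
for a, b in combinations(genes, 2):
    default = 0.1 if "X1" in (a, b) else 0.5
    similarity[frozenset((a, b))] = scores.get((a, b), default)

terms, roots = infer_toy_ontology(genes, similarity, threshold=0.3)
for name in ("T1", "T2", "T3"):
    members, children = terms[name]
    print(name, sorted(members), "children:", children)
print("roots:", roots)
# T1 ['S1', 'S2'] children: ['S1', 'S2']
# T2 ['L1', 'L2'] children: ['L1', 'L2']
# T3 ['L1', 'L2', 'S1', 'S2'] children: ['T1', 'T2']
# roots: ['X1', 'T3']
```

In this toy run, T1 and T2 play the roles of the small and large ribosomal subunits and T3 the role of the ribosome, mirroring the GO relationship described above, while the unrelated gene X1 is left outside the hierarchy.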
Furthermore, the paradigm explored here, going from high-throughput experimental data to an ontology, will be used to develop a computational tool that highlights novel types of hypotheses inaccessible to current methods for analyzing high-throughput experimental data. Preliminary work has shown GO to be useful for predicting synthetic lethal pairs of genes, i.e. genes that are individually non-essential but cause cell death when knocked out together. Given the high mutation rate in cancer, these pairs provide potential cancer drug targets: if a cancer cell has lost one gene of a pair, the other gene becomes essential in that cell but not in normal cells, so a drug targeting its gene product could kill only cancer cells. Because data-driven ontologies are less hindered by bias and limited coverage and are specifically designed to capture only functional relationships, this proposal will explore the hypothesis that data-driven ontologies are better suited than GO to predicting synthetic lethal pairs. To this end, algorithms will be developed to construct a data-driven ontology of yeast DNA repair and to use this ontology to predict synthetic lethal pairs of genes. Overall, this proposal will develop the computational and experimental roadmap to construct a whole-cell model of gene function (an ontology) and to use the model to discover useful biology (synthetic lethal pairs).
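The definition of a synthetic lethal pair, and the reasoning that turns such a pair into a drug target, can be written down directly. The sketch below is only an encoding of that definition, not the proposal's ontology-based prediction algorithm. It assumes knockout outcomes are simplified to viable/lethal booleans (real screens typically measure quantitative fitness), and the genes A to D and their outcomes are hypothetical.

```python
from itertools import combinations


def synthetic_lethal_pairs(single_viable, double_viable):
    """Return gene pairs that are synthetic lethal: each single knockout is
    viable (the gene is non-essential), but the double knockout is lethal.

    single_viable: gene -> True if the single knockout survives.
    double_viable: frozenset({a, b}) -> True if the double knockout survives;
    pairs that were never tested are simply absent.
    """
    pairs = []
    for a, b in combinations(sorted(single_viable), 2):
        if not (single_viable[a] and single_viable[b]):
            continue  # an essential gene cannot be part of a synthetic lethal pair
        key = frozenset((a, b))
        if key in double_viable and not double_viable[key]:
            pairs.append((a, b))
    return pairs


def candidate_targets(sl_pairs, lost_gene):
    """Genes whose inhibition should be lethal only in cells that lack lost_gene,
    e.g. a tumor in which lost_gene has been inactivated by mutation."""
    return sorted(b if a == lost_gene else a for a, b in sl_pairs if lost_gene in (a, b))


# Hypothetical knockout outcomes; D is essential on its own.
single_viable = {"A": True, "B": True, "C": True, "D": False}
double_viable = {
    frozenset(("A", "B")): False,
    frozenset(("A", "C")): True,
    frozenset(("B", "C")): False,
    frozenset(("A", "D")): False,  # lethal, but only because D is essential
}
sl = synthetic_lethal_pairs(single_viable, double_viable)
print(sl)  # [('A', 'B'), ('B', 'C')]
print(candidate_targets(sl, "B"))  # ['A', 'C']
```

The (A, D) pair shows why the single-knockout check matters: its double knockout is lethal, but that lethality is already explained by D being essential, so it does not count as synthetic lethal.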

Public Health Relevance

This proposal develops a new framework for using the results of commonly performed, genome-wide experiments that has the potential to create whole-cell models of gene function, similar to the widely used Gene Ontology, directly from data without manual intervention. This will allow the creation of useful models of cells from different organisms, tissues and diseases, which researchers can use to discover the functions of unstudied genes and to uncover new functions performed by the cell. Furthermore, this proposal will use these models to discover synthetic lethal pairs of genes, which are potential new cancer drug targets.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Individual Predoctoral NRSA for M.D./Ph.D. Fellowships (ADAMHA) (F30)
Project #
5F30HG007618-03
Application #
9145523
Study Section
Special Emphasis Panel (ZRG1-F08-A (20)L)
Program Officer
Gatlin, Christine L
Project Start
2014-09-01
Project End
2017-08-31
Budget Start
2016-09-01
Budget End
2017-08-31
Support Year
3
Fiscal Year
2016
Total Cost
$48,576
Indirect Cost
Name
University of California San Diego
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
804355790
City
La Jolla
State
CA
Country
United States
Zip Code
92093

Publications

Kramer, Michael H; Farré, Jean-Claude; Mitra, Koyel et al. (2017) Active Interaction Mapping Reveals the Hierarchical Organization of Autophagy. Mol Cell 65:761-774.e5
Yu, Michael Ku; Kramer, Michael; Dutkowski, Janusz et al. (2016) Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems. Cell Syst 2:77-88