Literature and Data Driven Hypothesis Generation for High Throughput Experiments

Jagadish, H V

Abstract

Microarray gene expression analyses are widely used in biomedical research today. Thousands of genes can be assayed in a single experiment, and differences in their expression levels can be observed across an experimental condition of interest, such as diseased versus healthy tissue. The difficulty is that gene expression levels vary naturally, and samples and microarrays introduce experimental variation of their own. Consequently, it is hard to know which observed differences are biologically significant and which are merely the result of random fluctuations. It is generally accepted that this problem is best addressed by integrating other sources of biological knowledge, such as co-occurrence in the literature, in the Gene Ontology, or in pre-defined gene sets. However, most techniques still produce only a ranked list of genes or gene clusters, which still requires biological interpretation. A biomedical scientist knows well what to do if a single gene, or a set of genes on a known pathway, is shown to be differentially expressed. The difficulty with interpreting the results of high throughput experiments is that the human effort required does not scale to hundreds of genes and, even worse, human expertise cannot be as deep across such a large set of genes as it is for a particular gene under careful investigation. Most standard computational approaches manipulate candidate genes in bulk, performing analyses that no biomedical scientist would conduct if a single gene were at hand. The goal of this project is to emulate computationally, for thousands of candidate genes, what a biomedical scientist would want to do for one gene. This involves bringing to bear biological knowledge, as found in the literature and in public databases, to develop biologically sound hypotheses that could explain the observed differential expression.
Specifically, we will develop techniques to generate putative pathways dynamically, bootstrapping from observed differential expression data and drawing on external evidence of relationships from the literature and from interaction databases. In a separate project, not part of this proposal, we have developed techniques for extracting gene and protein interaction information from the biomedical literature, including important details such as the type of interaction and the experimental conditions. We will exploit this extracted information resource, which currently includes the full text of all articles in PubMed Central. The expected output of our algorithm will be a small number of hypothesized pathways that the scientist can choose to evaluate further experimentally.
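
As a rough illustration of the idea of growing putative pathways from differentially expressed genes, the following minimal sketch links genes that pass an expression cutoff whenever an interaction between them has enough supporting evidence, then reports the connected groups. It is not the project's algorithm: the function name `candidate_pathways`, the `score_cutoff` and `min_evidence` parameters, and the input layout (a score per gene, plus interaction triples carrying an evidence count) are all assumptions made for illustration.

```python
from collections import defaultdict


def candidate_pathways(expression_scores, interactions,
                       score_cutoff=1.0, min_evidence=2):
    """Group differentially expressed genes linked by supported interactions.

    expression_scores: dict mapping gene -> differential expression score
        (hypothetical layout, e.g. a log2 fold change).
    interactions: iterable of (gene_a, gene_b, evidence_count) tuples
        (hypothetical layout, e.g. number of supporting articles).
    """
    # Seeds: genes whose absolute score passes the cutoff.
    seeds = {g for g, s in expression_scores.items() if abs(s) >= score_cutoff}

    # Keep only interactions between two seeds with enough evidence.
    neighbors = defaultdict(set)
    for a, b, evidence in interactions:
        if a in seeds and b in seeds and a != b and evidence >= min_evidence:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Connected components of the filtered graph are the candidates.
    pathways, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        pathways.append(sorted(component))

    # Rank by total absolute expression change, strongest first.
    pathways.sort(
        key=lambda p: sum(abs(expression_scores[g]) for g in p),
        reverse=True,
    )
    return pathways


# Toy input with placeholder gene names (not real data).
scores = {"A": 2.1, "B": -1.8, "C": 0.2, "D": 1.5, "E": -1.2}
links = [("A", "B", 3), ("B", "C", 5), ("D", "E", 2), ("A", "D", 1)]
print(candidate_pathways(scores, links))
```

In this toy input, gene C falls below the expression cutoff and the A-D link has too little evidence, so the sketch returns two small candidate groups. A realistic version would also need to weigh interaction type and experimental conditions, which the abstract names as part of the extracted evidence.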

Public Health Relevance

Bench biology has been transformed through the recent development of high throughput techniques, which permit the scientist to perform thousands of experiments in parallel at low cost. But this has in turn caused interpretation of experimental results to become a bottleneck. This project uses computational techniques to glean biological knowledge from the literature and from public databases to address this challenge.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 1R01LM010138-01
Application #: 7727544
Study Section: Special Emphasis Panel (ZLM1-AP-E (M3))
Program Officer: Ye, Jane

Project Start: 2009-07-01
Project End: 2011-06-30
Budget Start: 2009-07-01
Budget End: 2010-06-30
Support Year: 1
Fiscal Year: 2009
Total Cost: $435,850
Indirect Cost

Institution

Name: University of Michigan Ann Arbor
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects


NIH 2010 R01 LM	Literature and Data Driven Hypothesis Generation for High Throughput Experiments Jagadish, H V. / University of Michigan Ann Arbor	$591,848
NIH 2009 R01 LM	Literature and Data Driven Hypothesis Generation for High Throughput Experiments Jagadish, H V. / University of Michigan Ann Arbor	$435,850

Publications

Farfán, Fernando; Ma, Jun; Sartor, Maureen A et al. (2012) THINK Back: KNowledge-based Interpretation of High Throughput data. BMC Bioinformatics 13 Suppl 2:S4

Ma, Jun; Sartor, Maureen A; Jagadish, H V (2011) Appearance frequency modulated gene set enrichment testing. BMC Bioinformatics 12:81
