Detecting relations among heterogeneous genomic datasets

Noble, William

Abstract

During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data. The long-term objective of this work is to provide a coherent computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Hence, the proposed research plan develops algorithms and computational tools for learning from heterogeneous data sets. We focus on the analysis of the yeast genome because so many genome-wide data sets are currently available; however, the tools we develop will be applicable to any genome. We approach this task using two recent trends from the field of machine learning: kernel algorithms that represent data via specialized similarity functions, and transductive algorithms that exploit the availability of unlabeled test data during the training phase of the algorithm. We focus on two tasks: (1) classifying groups of genes that are of interest to our collaborators, including components of the spindle pole body, cell cycle regulated genes, and genes involved in meiosis and sporulation, splicing, alcohol metabolism, etc., and (2) predicting protein-protein interactions. These two specific aims are not only important scientific tasks, but also represent typical challenges that future genomic studies will face. Accomplishing these aims requires the integration of many heterogeneous sources of data, the prediction of multiple properties of genes and proteins, the explicit introduction of domain knowledge, the automatic introduction of knowledge from side information, scalability to large data sizes, and tolerance of high levels of noise.
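
The kernel-based integration described above can be illustrated with a minimal sketch. It is not the project's actual method or code: the toy genes, the choice of a linear kernel and a 2-mer spectrum kernel, the equal weights, and all function names are assumptions made for illustration, and a simple kernel perceptron stands in for the support vector machines used in the related publications. The point it shows is that each data type gets its own similarity function, and the resulting kernel matrices can be summed with non-negative weights to give a single kernel for learning.

```python
import math
from collections import Counter


def linear_kernel(x, y):
    """Dot product between two numeric vectors (e.g. expression profiles)."""
    return sum(a * b for a, b in zip(x, y))


def spectrum_kernel(s, t, k=2):
    """Count k-mers shared by two sequences (a simple string kernel)."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))


def gram(items, kernel):
    """Kernel matrix, normalized so that every diagonal entry is 1."""
    raw = [[kernel(a, b) for b in items] for a in items]
    n = len(items)
    return [[raw[i][j] / math.sqrt(raw[i][i] * raw[j][j]) for j in range(n)]
            for i in range(n)]


def combine(grams, weights):
    """Weighted sum of kernel matrices (weights assumed non-negative)."""
    n = len(grams[0])
    return [[sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n)]
            for i in range(n)]


def kernel_perceptron(K, labels, epochs=20):
    """Train a kernel perceptron; returns one dual coefficient per item."""
    n = len(labels)
    alpha = [0.0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * labels[j] * K[j][i] for j in range(n))
            if labels[i] * score <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(K, labels, alpha, i):
    """Predicted class (+1 or -1) for item i, given the kernel matrix K."""
    score = sum(alpha[j] * labels[j] * K[j][i] for j in range(len(labels)))
    return 1 if score > 0 else -1


# Hypothetical toy data: two views of the same four genes.
expression = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.0], [0.1, 0.0, 1.0], [0.0, 0.2, 0.9]]
sequences = ["ACGTAC", "ACGTTC", "GGGCCC", "GGCCCA"]
labels = [1, 1, -1, -1]  # +1 = member of the gene class, -1 = non-member

K = combine([gram(expression, linear_kernel), gram(sequences, spectrum_kernel)],
            weights=[0.5, 0.5])
alpha = kernel_perceptron(K, labels)
predictions = [predict(K, labels, alpha, i) for i in range(len(labels))]
print(predictions)
```

Equal weights are the simplest choice here; in practice the weights could be tuned or learned from the data, and the predictions would be evaluated on held-out genes rather than on the training set as in this toy example.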

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants Phase II (R33)
Project #: 5R33HG003070-03
Application #: 7120160
Study Section: Special Emphasis Panel (ZRG1-SSS-Y (11))
Program Officer: Bonazzi, Vivien

Project Start: 2004-09-30
Project End: 2008-08-31
Budget Start: 2006-09-01
Budget End: 2008-08-31
Support Year: 3
Fiscal Year: 2006
Total Cost: $414,036
Indirect Cost

Institution

Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2006 R33 HG	Detecting relations among heterogeneous genomic datasets Noble, William Stafford / University of Washington	$414,036
NIH 2005 R33 HG	Detecting relations among heterogeneous genomic datasets Noble, William Stafford / University of Washington	$412,000
NIH 2004 R33 HG	Detecting relations among heterogeneous genomic datasets Noble, William Stafford / University of Washington	$400,000

Publications

Muratore, Kathryn E; Engelhardt, Barbara E; Srouji, John R et al. (2013) Molecular function prediction for a family exhibiting evolutionary tendencies toward substrate specificity swapping: recurrence of tyrosine aminotransferase activity in the Iα subfamily. Proteins 81:1593-609

Sankararaman, Sriram; Kimmel, Gad; Halperin, Eran et al. (2008) On the inference of ancestries in admixed populations. Genome Res 18:668-75

Qiu, Jian; Noble, William Stafford (2008) Predicting co-complexed protein pairs from heterogeneous data. PLoS Comput Biol 4:e1000054

Pena-Castillo, Lourdes; Tasan, Murat; Myers, Chad L et al. (2008) A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol 9 Suppl 1:S2

Bleakley, Kevin; Biau, Gerard; Vert, Jean-Philippe (2007) Supervised reconstruction of biological networks with local models. Bioinformatics 23:i57-65

Qiu, Jian; Hue, Martial; Ben-Hur, Asa et al. (2007) A structural alignment kernel for protein structures. Bioinformatics 23:1090-8

Vert, Jean-Philippe; Qiu, Jian; Noble, William S (2007) A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics 8 Suppl 10:S8

Xing, Eric P; Jordan, Michael I; Sharan, Roded (2007) Bayesian haplotype inference via the Dirichlet process. J Comput Biol 14:267-84

Mann, Tobias P; Noble, William Stafford (2006) Efficient identification of DNA hybridization partners in a sequence database. Bioinformatics 22:e350-8

Lewis, Darrin P; Jebara, Tony; Noble, William Stafford (2006) Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics 22:2753-60

Showing the most recent 10 out of 20 publications
