Recent cancer genome sequencing projects and human genome-wide association studies (GWAS) have underscored the principle that complex phenotypes such as cancer or disease susceptibility do not result from variants in the same single gene in all individuals. Rather, the inherited or somatic variants responsible for these phenotypes affect multiple genes in cellular signaling, regulatory, and metabolic pathways. New genome sequencing technologies now measure these sequence variants in large numbers of samples, while other technologies measure genome-wide networks of interactions between genes. There is an urgent need for computational techniques that identify pathways, or groups of genes, associated with a phenotype.

This project will develop robust algorithmic and statistical techniques for four challenges in the analysis of DNA sequence variants in the context of known and novel gene-gene interactions. (1) Incorporating prior knowledge of gene interactions. The project will develop a diffusion model to identify subnetworks of a genome-scale interaction network that are enriched for genetic variants across multiple samples. (2) Deriving robust statistical tests that overcome the multiple-hypothesis-testing problem in network analysis. Biological interaction networks containing tens to hundreds of thousands of nodes and edges have an enormous number of subnetworks that might be enriched for variants. The proposed work will design techniques to evaluate multiple candidate subnetworks with rigorous bounds on the false discovery rate. (3) Performing de novo identification of gene groups without an interaction network. The proposed work will examine combinatorial approaches that extract subsets of altered genes without prior knowledge of their interactions. These approaches will leverage the growing number of sequenced samples. (4) Implementing the algorithms and evaluating them on biological data from two applications: (a) somatic mutations identified in cancer genome sequencing studies, and (b) rare genetic variants in human association studies. These applications will be conducted in collaboration with two biomedical research groups.
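Item (1) names a diffusion model but does not specify its form. Purely as an illustration, the Python sketch below assumes a random walk with restart (personalized-PageRank-style diffusion): per-gene variant counts are spread over an undirected interaction network, and genes whose diffused score passes a threshold are grouped into connected subnetworks. The function names `diffuse` and `hot_subnetworks`, the restart probability, the threshold rule, and the toy network are hypothetical choices, not the project's method, and the sketch omits the false-discovery-rate statistics described in item (2).

```python
from collections import defaultdict


def diffuse(edges, heat, restart=0.4, iters=200, tol=1e-12):
    """Spread per-gene variant counts ("heat") over an undirected network.

    Hypothetical sketch of a random walk with restart, iterating
    p <- restart * h + (1 - restart) * W p, where h is the normalized heat
    vector and W moves each node's mass equally to its neighbors.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    total = float(sum(heat.values()))
    if total <= 0:
        raise ValueError("heat must contain at least one positive count")
    nodes = sorted(set(nbrs) | set(heat))
    h = {n: heat.get(n, 0) / total for n in nodes}
    p = dict(h)
    for _ in range(iters):
        nxt = {n: restart * h[n] for n in nodes}
        for u in nodes:
            if not nbrs[u]:
                # Isolated gene: the walker has nowhere to go, so it stays put.
                nxt[u] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(nbrs[u])
            for v in nbrs[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:  # the update is a contraction, so this converges
            break
    return p  # diffused scores; they sum to 1


def hot_subnetworks(edges, scores, threshold, min_size=2):
    """Connected components of the subgraph induced by genes whose diffused
    score is at least `threshold` (a hypothetical extraction rule)."""
    keep = {n for n, s in scores.items() if s >= threshold}
    adj = defaultdict(set)
    for u, v in edges:
        if u != v and u in keep and v in keep:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(comp) >= min_size:
            components.append(comp)
    return components


if __name__ == "__main__":
    # Toy path network; genes A and B are mutated in 5 and 3 samples.
    toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G")]
    scores = diffuse(toy_edges, {"A": 5, "B": 3})
    for gene in sorted(scores):
        print(gene, round(scores[gene], 4))
    print(hot_subnetworks(toy_edges, scores, threshold=0.05))
```

The restart term keeps mass anchored at mutated genes, so unmutated neighbors such as C receive some score while distant genes receive little; raising `restart` makes the diffusion more local. On a genome-scale network, a sparse linear solver would usually replace this dictionary-based power iteration.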

Algorithms developed in this proposal will be implemented and released as open-source software for use by the biological and medical community. The project will partially support the training of graduate students, and undergraduates will be involved in implementing the proposed algorithms. Finally, research from this project will be incorporated as pedagogical examples in multiple undergraduate and graduate courses.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1016648
Program Officer: Sylvia Spengler
Budget Start: 2010-08-15
Budget End: 2015-07-31
Fiscal Year: 2010
Total Cost: $527,627
Name: Brown University
City: Providence
State: RI
Country: United States
Zip Code: 02912