This project's goal is to develop innovative statistical approaches to multi-study genomic data analysis. Specific targets include: generalization to the genomics context of meta-analysis tools used in medicine and the social sciences; metrics for evaluating the reproducibility of expression measurements across platforms in the absence of a gold standard; approaches for deriving and validating common expression scales across platforms; and a novel reformulation of the combination problem based on constructing "coexpression matrices", in which an element represents the coexpression of a subset of genes in a given study. The project includes software implementation, application to a set of representative genomic analyses, and development of a public-domain support website.

Genomic studies measure simultaneously the activity of a large portion of the thousands of genes in a biological system. Over the past decade they have given a great impulse to the life sciences and changed the way in which biology, medicine, and biotechnology make progress. A large number and variety of genomic studies are accruing. Because acquiring biological samples is costly and difficult, especially in medicine, most genomic investigations use a limited number of samples and focus on highly specific problems. This scenario poses two important questions for the genomics community. First, given the wide variety of genomic technologies and protocols, there is concern about the reproducibility of genomic findings across technologies and laboratories. How can one systematically use the large body of available genomic information to assess reproducibility? Second, given the large but fragmented and heterogeneous set of studies that are accruing, there is concern about the scientific community's ability to integrate the resulting knowledge efficiently. How can one analyze genomic data across studies, across technologies, and across related biological systems? This project's overall goal is to address these two questions by developing data analysis tools for comparing and integrating genomic information across studies, measurement technologies, and biological systems. Today, multi-study genomic analyses are rare, despite the wide availability of genomic data in the public domain. The premise underlying this proposal is that this is due in large part to the lack of specific, systematic, and rigorous statistical approaches and the associated software tools. This project aims to provide such tools and therefore, if the investigator's premise is correct, will promote a more extensive, more efficient, and more rigorous use of the vast resources made available by the massive investment in genomic studies.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1041698
Program Officer: Junping Wang
Budget Start: 2009-09-01
Budget End: 2011-10-31
Fiscal Year: 2010
Total Cost: $212,427
Name: Dana-Farber Cancer Institute
City: Boston
State: MA
Country: United States
Zip Code: 02215