The main objectives of this proposal are to develop improved statistical methods for detecting sets of genes that are differentially expressed across two or more conditions and to apply these methods to discover new genetic and physiological mechanisms that control food intake, nutrient utilization, energy regulation, and metabolism. The developed statistical methods will provide powerful alternatives to the tests of enrichment or overrepresentation that have become popular tools for interpreting microarray experiments. The proposed methods gain advantages over existing methods by (1) recognizing, accounting for, and utilizing dependence among genes; (2) maintaining continuous information about the degree of difference between gene set expression distributions; (3) identifying interesting gene sets by comparing each gene set across treatments rather than comparing gene sets to one another; and (4) capturing information about differential expression contained in the joint expression distributions rather than using only marginal distributions. In addition to their use for identifying differentially expressed gene sets in traditional microarray experiments, the proposed methods offer a new and powerful approach for identifying genetic loci that control the expression of gene networks.

The integrated research team of statisticians and biologists will identify the best of the proposed methods by theoretical study of their asymptotic properties, by comparisons of their performance on simulated data sets designed to mimic structures found in real data sets, and by weighing the value of biological insights provided by their application to actual data from a variety of microarray experiments. The asymptotic framework used in this research considers the statistical properties of the testing procedures as both the dimension of the data vectors (number of genes in a set) and the sample size (number of experimental units) grow large. Such a framework permits evaluation of methods for use on data of very high dimension and produces results that are intrinsically interesting from the statistical point of view.
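
As a rough illustration of points (3) and (4) above, the sketch below compares the joint expression distribution of a single gene set across two treatments with a permutation test. It is not the method developed in this proposal, which the abstract does not specify. The two-sample energy distance is swapped in here as one well-known multivariate statistic, and the function names, array layout (rows are experimental units, columns are genes in the set), and permutation count are all assumptions made for the example.

```python
import numpy as np


def energy_statistic(x, y):
    """Two-sample energy distance between the rows of x and the rows of y.

    Assumed layout: each row is one experimental unit, each column one gene
    in the set. The statistic uses whole expression vectors, so it responds
    to differences in the joint distribution, not only in per-gene marginals.
    """
    def mean_pairwise_distance(a, b):
        # Euclidean distance between every row of a and every row of b.
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2)).mean()

    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def gene_set_permutation_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for one gene set compared across two treatments.

    Treatment labels are shuffled across experimental units while each
    unit's expression vector stays intact, so dependence among genes in the
    set is preserved under the null. (Illustrative choice, not the
    proposal's procedure.)
    """
    rng = np.random.default_rng(seed)
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = energy_statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Simulated usage: 20 units per treatment, a set of 5 genes (made-up data).
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(20, 5))
treated = rng.normal(3.0, 1.0, size=(20, 5))
stat, p = gene_set_permutation_test(control, treated, n_perm=199)
```

The statistic is kept continuous rather than thresholded, in the spirit of point (2), and the test asks whether one gene set differs between treatments rather than ranking gene sets against each other.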

The developed methods will be used to investigate genetic control of food intake and energy regulation in pigs and to discover genetic regions that control the expression of gene networks in a population of mice that serve as a model for human obesity. The insights provided by these studies may be used to develop treatment strategies for human obesity. In addition, the proposed methods have much broader application to nearly any microarray-based investigation of differential gene expression. Applications range from the identification of sets of genes that play a role in distinguishing cancerous tissue from non-cancerous tissue to the identification of sets of genes important for developing high-quality plant material suitable for conversion to biofuel. The general goal of this work is to provide scientific researchers with powerful tools for identifying the most important genes behind a wide variety of biological phenomena.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
0714978
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2007-08-15
Budget End
2011-07-31
Support Year
Fiscal Year
2007
Total Cost
$552,927
Indirect Cost
Name
Iowa State University
Department
Type
DUNS #
City
Ames
State
IA
Country
United States
Zip Code
50011