Genome-wide studies now play a central role in modern biology and medicine. There continues to be a substantial need for new statistical methods for these studies, particularly as study designs become more ambitious, sample sizes increase, and new technologies emerge. The overall goal of the proposed research is to develop statistical methods and software for understanding high-throughput molecular profiling data, centered on characterizing genome-wide gene expression. We propose to develop statistical models, methods, and software that allow one to rigorously characterize variation in gene expression in terms of both study design and latent variables. The proposed research focuses particularly on the most modern form of gene expression profiling, RNA-Seq, and on the most ambitious and biologically fruitful problems currently being studied. We will develop rigorous, flexible, and robust models of variation in high-throughput data that encompass: (i) latent sources of systematic variation and more general sources of dependence among features, (ii) a principled dissection of sources of variation in next-generation RNA-Seq data and new methods for emerging RNA-Seq data, (iii) rigorous multiple hypothesis testing of associations between genomic features and latent variables, (iv) simultaneous inference of complex yet commonly sought-after statistical hypotheses, and (v) dissemination to the greater research community through user-friendly and platform-independent software packages.

Public Health Relevance

Measuring genome-wide gene expression variation has been a revolutionary tool in biomedicine over the past decade. This is a data-centric endeavor that requires new and sophisticated statistical methods to arrive at sound biological conclusions. The proposed work will make novel contributions to statistical methods and software that will be applied to genome-wide gene expression studies in humans and model organisms, for both microarray data and next-generation sequencing data.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG002913-10A1
Application #: 8580300
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Struewing, Jeffery P
Project Start: 2003-07-01
Project End: 2016-06-30
Budget Start: 2013-08-22
Budget End: 2014-06-30
Support Year: 10
Fiscal Year: 2013
Total Cost: $377,856
Indirect Cost: $143,163

Name: Princeton University
Department: None
Type: Organized Research Units
DUNS #: 002484665
City: Princeton
State: NJ
Country: United States
Zip Code: 08544

Publications

Robinson, David G; Chen, Wei; Storey, John D et al. (2014) Design and analysis of Bar-seq experiments. G3 (Bethesda) 4:11-8
Marstrand, Troels T; Storey, John D (2014) Identifying and mapping cell-type-specific chromatin programming of gene expression. Proc Natl Acad Sci U S A 111:E645-54
Leek, Jeffrey T; Johnson, W Evan; Parker, Hilary S et al. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882-3
Woo, Sangsoon; Leek, Jeffrey T; Storey, John D (2011) A computationally efficient modular optimal discovery procedure. Bioinformatics 27:509-15
Gresham, David; Boer, Viktor M; Caudy, Amy et al. (2011) System-level analysis of genes and functions affecting survival during nutrient starvation in Saccharomyces cerevisiae. Genetics 187:299-317
Leek, Jeffrey T (2011) Asymptotic conditional singular value decomposition for high-dimensional genomic data. Biometrics 67:344-52
Mecham, Brigham H; Nelson, Peter S; Storey, John D (2010) Supervised normalization of microarrays. Bioinformatics 26:1308-15
Leek, Jeffrey T; Storey, John D (2008) A general framework for multiple testing dependence. Proc Natl Acad Sci U S A 105:18718-23
Biswas, Shameek; Storey, John D; Akey, Joshua M (2008) Mapping gene expression quantitative trait loci by singular value decomposition and independent component analysis. BMC Bioinformatics 9:244
Dabney, Alan R; Storey, John D (2007) Normalization of two-channel microarrays accounting for experimental design and intensity-dependent relationships. Genome Biol 8:R44

Showing the most recent 10 out of 17 publications