Statistical Methods for High-Throughput Gene Expression Profiling

Storey, John

Abstract

Studies carried out at the genome-wide level now play a central role in modern biology and medicine. There continues to be a substantial need for new statistical methods that can be applied in these studies, particularly as study designs become more ambitious, sample sizes increase, and new technologies emerge. The overall goal of the proposed research is to develop statistical methods and software useful in understanding high-throughput molecular profiling data centered around characterizing genome-wide gene expression. We propose to develop statistical models, methods, and software that allow one to rigorously characterize variation of gene expression in terms of both study design and latent variables. Our proposed research is particularly focused on the most modern form of gene expression profiling, RNA-Seq, as well as the most ambitious and biologically fruitful problems currently being studied. We will develop rigorous, flexible, and robust models of variation in high-throughput data that encompass: (i) latent sources of systematic variation and more general sources of dependence among features, (ii) a principled dissection of sources of variation in next-generation RNA-Seq data and new methods for emerging RNA-Seq data, (iii) rigorous multiple hypothesis testing of associations between genomic features and latent variables, (iv) simultaneous inference of complex, yet commonly sought-after statistical hypotheses, and (v) dissemination to the greater research community through user-friendly and platform-independent software packages.

Public Health Relevance

Measuring genome-wide gene expression variation has been a revolutionary tool in biomedicine over the past decade. This is a data-centric endeavor that requires new and sophisticated statistical methods in order to arrive at sound biological conclusions. The proposed work will make novel contributions to statistical methods and software that will be applied to genome-wide gene expression studies in humans and model organisms, for both microarray data and next-generation sequencing data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG002913-10A1
Application #: 8580300
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Struewing, Jeffery P

Project Start: 2003-07-01
Project End: 2016-06-30
Budget Start: 2013-08-22
Budget End: 2014-06-30
Support Year: 10
Fiscal Year: 2013
Total Cost: $377,856
Indirect Cost: $143,163

Institution

Name: Princeton University
Department
Type: Organized Research Units
DUNS #: 002484665

City: Princeton
State: NJ
Country: United States
Zip Code: 08544

Publications

Hackett, Sean R; Zanotelli, Vito R T; Xu, Wenxin et al. (2016) Systems-level analysis of mechanisms regulating yeast metabolic flux. Science 354:

Ochoa, Alejandro; Storey, John D; Llinás, Manuel et al. (2015) Beyond the E-Value: Stratified Statistics for Protein Domain Prediction. PLoS Comput Biol 11:e1004509

Chung, Neo Christopher; Storey, John D (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31:545-54

Robinson, David G; Wang, Jean Y; Storey, John D (2015) A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays. Nucleic Acids Res 43:e131

Robinson, David G; Storey, John D (2014) subSeq: determining appropriate sequencing depth through efficient read subsampling. Bioinformatics 30:3424-6

Robinson, David G; Chen, Wei; Storey, John D et al. (2014) Design and analysis of Bar-seq experiments. G3 (Bethesda) 4:11-8

Kim, Jinhee; Ghasemzadeh, Nima; Eapen, Danny J et al. (2014) Gene expression profiles associated with acute myocardial infarction and risk of cardiovascular death. Genome Med 6:40

Marstrand, Troels T; Storey, John D (2014) Identifying and mapping cell-type-specific chromatin programming of gene expression. Proc Natl Acad Sci U S A 111:E645-54

Jaffe, Andrew E; Storey, John D; Ji, Hongkai et al. (2013) Gene set bagging for estimating the probability a statistically significant result will replicate. BMC Bioinformatics 14:360

Leek, Jeffrey T; Johnson, W Evan; Parker, Hilary S et al. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882-3

Showing the most recent 10 out of 25 publications
