Studies carried out at the genome-wide level now play a central role in modern biology and medicine, and there is a substantial need for new statistical methods that can be applied in these studies. The overall goal of the proposed research is to develop statistical methods and software useful in understanding genomic data. The particular focus is on functional genomics, where data from DNA microarrays and large-scale genotyping can be used to study how large numbers of genes work together to accomplish various functional roles. New statistical methods for these high-dimensional data sets will be developed, taking biological knowledge into account whenever possible. Genetics plays a role in almost every human disease, whether the disease itself is inherited or is associated with a substantial change in the activity of genes. This work aims to contribute to the understanding of the molecular biology and genetic basis of human disease by providing analytical tools for genomics studies. The particular focus of this competitive renewal is to develop a broad framework for modeling the interdependence of expression levels among genes, as manifested in differential expression variation and in their regulatory networks, by (i) borrowing strength across the genes' expression measurements through novel multivariate models, (ii) utilizing multiple data types, such as large-scale genotyping and gene expression, to build a framework for dissecting causation from correlation, and (iii) rethinking randomization and experimental design as they can be utilized in this high-dimensional genomics setting. From this work, the aim is to provide methodology that characterizes gene expression variation in terms of both common sources of variation among genes and specific causal relationships among pairs of genes.

Public Health Relevance

The primary mechanism by which the information in the genome is put to work in our cells is gene expression. Changes in gene expression have been shown to be associated with many important human diseases. Technologies that measure the expression of thousands of genes simultaneously are now in widespread use. This grant will aid in understanding the role of gene expression in human diseases by providing quantitative methods for studying how expression variation functions on a large scale.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brooks, Lisa
Princeton University
United States
Hackett, Sean R; Zanotelli, Vito R T; Xu, Wenxin et al. (2016) Systems-level analysis of mechanisms regulating yeast metabolic flux. Science 354:
Ochoa, Alejandro; Storey, John D; Llinás, Manuel et al. (2015) Beyond the E-Value: Stratified Statistics for Protein Domain Prediction. PLoS Comput Biol 11:e1004509
Chung, Neo Christopher; Storey, John D (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31:545-54
Robinson, David G; Wang, Jean Y; Storey, John D (2015) A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays. Nucleic Acids Res 43:e131
Marstrand, Troels T; Storey, John D (2014) Identifying and mapping cell-type-specific chromatin programming of gene expression. Proc Natl Acad Sci U S A 111:E645-54
Robinson, David G; Storey, John D (2014) subSeq: determining appropriate sequencing depth through efficient read subsampling. Bioinformatics 30:3424-6
Robinson, David G; Chen, Wei; Storey, John D et al. (2014) Design and analysis of Bar-seq experiments. G3 (Bethesda) 4:11-8
Kim, Jinhee; Ghasemzadeh, Nima; Eapen, Danny J et al. (2014) Gene expression profiles associated with acute myocardial infarction and risk of cardiovascular death. Genome Med 6:40
Jaffe, Andrew E; Storey, John D; Ji, Hongkai et al. (2013) Gene set bagging for estimating the probability a statistically significant result will replicate. BMC Bioinformatics 14:360
Leek, Jeffrey T; Johnson, W Evan; Parker, Hilary S et al. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882-3

Showing the most recent 10 out of 25 publications