Data acquisition capacity in the biomedical field has increased substantially in the last ten years. Genetics is perhaps the most spectacular example: we have gone from obtaining the first sequence of the human genome through a large-scale, multi-center effort to a multiplicity of studies, each relying on the DNA sequences of thousands of subjects. At the same time, technological advancements allow us to measure human phenotypes with unprecedented precision and resolution. To harness the information in these new large-scale datasets, novel analytical methods that are well adapted to the scale of the problem are needed. This proposal focuses on developing statistical approaches for the analysis of resequencing data, with the goal of identifying the genetic underpinnings of medically relevant, and possibly multivariate, phenotypes. Specifically, we are motivated by the concrete needs emerging from the analysis of datasets collected to study metabolic syndrome and bipolar disorder.
In Aim 1, we consider the case where resequencing is motivated by the goal of identifying, within genomic loci whose relevance was previously established, the genetic variants that influence phenotypes.
In Aim 2, we take on the challenge of providing guarantees on the reproducibility of the identified results when multiple phenotypes are investigated simultaneously. All methodology developed will be implemented in software released to the scientific community. The statistical tools we plan to develop are varied, ranging from Bayesian hierarchical modeling to innovative strategies for controlling the number of false discoveries. They are all well adapted to the characteristics of contemporary datasets, allowing for the search for sparse signals in high-dimensional spaces. A postdoctoral scholar and a graduate student will contribute to the research program, and the training they will acquire is an additional benefit of the proposed work.

Public Health Relevance

A substantial fraction of the complex diseases that represent current public health challenges have a genetic component: unraveling those genes and mutations that influence the diseases fosters understanding of the relevant biological pathways, facilitates prevention, informs treatment, and inspires drug development. Technological advancements and substantial public investments have enabled the collection of large datasets where genetic and phenotypic variation is measured with unprecedented resolution. New powerful methods of analysis are needed to harness this information and translate it into medically relevant applications: the proposed research would develop some of these needed statistical methods.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Ramos, Erin
Stanford University
Schools of Medicine
United States
Brzyski, Damian; Peterson, Christine B; Sobczyk, Piotr et al. (2017) Controlling the Rate of GWAS False Discoveries. Genetics 205:61-75
Peterson, C B; Bogomolov, M; Benjamini, Y et al. (2016) TreeQTL: hierarchical error control for eQTL findings. Bioinformatics 32:2556-8
Peterson, Christine B; Service, Susan K; Jasinska, Anna J et al. (2016) Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder. PLoS Genet 12:e1006046
Pagani, Lucia; St Clair, Patricia A; Teshiba, Terri M et al. (2016) Genetic contributions to circadian activity rhythm and sleep pattern phenotypes in pedigrees segregating for severe bipolar disorder. Proc Natl Acad Sci U S A 113:E754-61
Peterson, Christine B; Bogomolov, Marina; Benjamini, Yoav et al. (2016) Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies. Genet Epidemiol 40:45-56
Stell, Laurel; Sabatti, Chiara (2016) Genetic Variant Selection: Learning Across Traits and Sites. Genetics 202:439-55
Bogdan, Małgorzata; van den Berg, Ewout; Sabatti, Chiara et al. (2015) SLOPE-Adaptive Variable Selection via Convex Optimization. Ann Appl Stat 9:1103-1140
Fears, Scott C; Schür, Remmelt; Sjouwerman, Rachel et al. (2015) Brain structure-function associations in multi-generational families genetically enriched for bipolar disorder. Brain 138:2087-102
Service, Susan K; Teslovich, Tanya M; Fuchsberger, Christian et al. (2014) Re-sequencing expands our understanding of the phenotypic impact of variants at GWAS loci. PLoS Genet 10:e1004147
Fears, Scott C; Service, Susan K; Kremeyer, Barbara et al. (2014) Multisystem component phenotypes of bipolar disorder for genetic investigations of extended pedigrees. JAMA Psychiatry 71:375-87