Data acquisition capacity in the biomedical field has increased substantially in the last ten years. Genetics is perhaps the most spectacular example: we have gone from obtaining the first sequence of the human genome through a large-scale, multi-center effort to a multiplicity of studies, each relying on the DNA sequences of thousands of subjects. At the same time, technological advancements allow us to measure human phenotypes with unprecedented precision and resolution. To harness the information in these new large-scale datasets, novel analytical methods that are well adapted to the scale of the problem are needed. This proposal focuses on developing statistical approaches for the analysis of resequencing data, with the goal of identifying the genetic underpinnings of medically relevant, and possibly multivariate, phenotypes. We are motivated by the concrete needs emerging from the analysis of datasets collected to study metabolic syndrome and bipolar disorder.
In Aim 1, we consider the case where resequencing is motivated by the goal of identifying genetic variants that influence phenotypes in genomic loci whose relevance was previously established.
In Aim 2, we take on the challenge of providing guarantees on the reproducibility of the identified results when multiple phenotypes are investigated simultaneously. All methodology developed will be implemented in software released to the scientific community. The statistical tools we plan to develop are varied, including Bayesian hierarchical modeling and innovative strategies to control the number of false discoveries. They are all well adapted to the characteristics of contemporary datasets, allowing for the search for sparse signals in high-dimensional spaces. A postdoctoral scholar and a graduate student will contribute to the research program, and the training they will acquire is an additional benefit of the proposed work.
A substantial fraction of the complex diseases that represent current public health challenges have a genetic component: identifying the genes and mutations that influence these diseases fosters understanding of the relevant biological pathways, facilitates prevention, informs treatment, and inspires drug development. Technological advancements and substantial public investments have enabled the collection of large datasets where genetic and phenotypic variation is measured with unprecedented resolution. Powerful new methods of analysis are needed to harness this information and translate it into medically relevant applications: the proposed research would develop some of these needed statistical methods.
Fears, Scott C; Service, Susan K; Kremeyer, Barbara et al. (2014) Multisystem component phenotypes of bipolar disorder for genetic investigations of extended pedigrees. JAMA Psychiatry 71:375-87
Service, Susan K; Teslovich, Tanya M; Fuchsberger, Christian et al. (2014) Re-sequencing expands our understanding of the phenotypic impact of variants at GWAS loci. PLoS Genet 10:e1004147