One of the most important challenges facing biology today is to make sense of genetic variation. Understanding how genotypic variation translates into phenotypic variation is fundamental to our understanding of evolution and has enormous practical implications for human health, agriculture, and conservation, as the large number of genome-wide association studies now underway attests. The long-term objective of this project is to develop association mapping methods that exploit the power of sequence-level data. The project has three main aims. First, the development of theoretical methods to allow efficient analysis of sequence-level genetic data. We propose to investigate the effect of different experimental designs and data imputation methods on the power of the study, aiming to find designs that optimize the ability to detect genetic variation that is associated with phenotypic variation. We also propose to develop methods that allow for the unique challenges and opportunities presented by sequence-level data. Second, the development of population genetics models for the evolution of copy number variation (CNV) data. Our proposal will develop models that will allow us to assess the utility of proposed mechanisms for change in copy number and the effects of patterns of copy number variation on patterns of polymorphism in nearby sequence, and that will also provide key theoretical underpinnings for future model-based methods, for example for haplotype inference. Third, the development of theoretical methods to allow efficient analysis of sequence-level data in situations where the distribution of traits of interest is correlated with global features of the data (such as genetic ancestry or location). Our focus is on the integration of mixed models and cluster-based methods.
Project Narrative
One of the most important challenges facing biology today is to understand how genetic variation between individuals translates into variation we can see or measure, like blood pressure in humans, or drought tolerance in rice. Our proposal seeks to develop methods that will help us use DNA sequence-level data to understand the genetic causes of human phenotypes such as disease susceptibility.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
Research Project (R01)
Project #
3R01MH084678-03S1
Application #
8064560
Study Section
Special Emphasis Panel (ZMH1-ERB-C (06))
Program Officer
Bender, Patrick
Project Start
2008-09-25
Project End
2012-06-30
Budget Start
2010-07-01
Budget End
2012-06-30
Support Year
3
Fiscal Year
2010
Total Cost
$374,520
Indirect Cost
Name
University of Southern California
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
072933393
City
Los Angeles
State
CA
Country
United States
Zip Code
90089
Liang, Wei E; Thomas, Duncan C; Conti, David V (2012) Analysis and optimal design for association studies using next-generation sequencing with case-control pools. Genet Epidemiol 36:870-81
Kang, Chul Joo; Marjoram, Paul (2012) A sample selection strategy for next-generation sequencing. Genet Epidemiol 36:696-709
Kang, Chul Joo; Marjoram, Paul (2012) Exact coalescent simulation of new haplotype data from existing reference haplotypes. Bioinformatics 28:838-44
Jung, Hsuan; Marjoram, Paul (2011) Choice of summary statistic weights in approximate Bayesian computation. Stat Appl Genet Mol Biol 10:
Kang, Chul Joo; Marjoram, Paul (2011) Inference of population mutation rate and detection of segregating sites from next-generation sequence data. Genetics 189:595-605
Jiang, Rong; Tavare, Simon; Marjoram, Paul (2009) Population genetic inference from resequencing data. Genetics 181:187-97