Genome Analysis: Data Accuracy, Haplotyping, and Mapping

Stephens, Matthew

Abstract

Long-term objective: to develop quantitative methods and software for the interpretation and analysis of human genetic variation. The methods will be tailored to the specific needs of large-scale studies of sequence variation, particularly those attempting to understand the genetic basis of complex diseases.
The aim will be to supply scientists involved in such studies with an integrated set of tools to a) monitor and improve data quality, b) design effective studies, and c) perform powerful data analyses, ultimately reducing the cost of developing effective medical treatments for common diseases.
Major specific aims:
1. To develop automatic methods for calling genotypes from sequence trace data, and for assigning each genotype call a "quality score", quantifying the probability that the call is correct, allowing data accuracy to be carefully monitored.
2. To extend an existing statistical method for inferring haplotypes from population genotype data to allow it to impute missing genotypes, identify potential genotyping errors, and make it more applicable to data on a larger (genomic) scale.
3. To develop methods to infer recombination rates, and identify potential recombination "hotspots" or "coldspots", from population data (information that will aid in the design of effective mapping studies aiming to locate variants affecting disease susceptibility).
4. To develop methods for linkage disequilibrium mapping that make efficient use of data from many SNP markers simultaneously, thus reducing the costs, and increasing the chances of success, of mapping studies.
Aim 1 will be achieved through a statistical analysis of pertinent sequence trace features for analyst-called genotypes.
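As an illustration of what a per-call quality score might look like, the sketch below converts a probability that a genotype call is correct into a log-scaled score. The Phred-style scale, the function name `phred_quality`, and the cap of 60 are assumptions made for this example; the abstract does not say which scale the project uses.

```python
import math


def phred_quality(p_correct: float, cap: float = 60.0) -> float:
    """Map the probability that a call is correct to a Phred-style score.

    Assumption: Q = -10 * log10(P(error)), capped at `cap` so that a
    probability of exactly 1.0 does not produce an infinite score.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_error))


if __name__ == "__main__":
    # Hypothetical calls with their estimated probabilities of being correct.
    for p in (0.9, 0.99, 0.999):
        print(f"P(correct) = {p}: Q = {phred_quality(p):.1f}")
```

On this scale, each additional 10 points corresponds to a tenfold drop in the estimated error probability, which makes low-confidence calls easy to flag with a single threshold.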
Aims 2-4 will exploit population genetics models that make predictions about patterns of haplotypic variation expected in natural populations, and how patterns of linkage disequilibrium will be affected by variations in local recombination rate. Computational statistical methods, such as Markov chain Monte Carlo, will be used extensively in implementing these methods. The methods will be tested on real and simulated data. User-friendly software will be developed, documented, distributed and supported.
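For readers unfamiliar with linkage disequilibrium, the following minimal sketch computes two standard pairwise summaries, D and r², from phased two-SNP haplotypes. It is only a descriptive statistic, not the model-based methods proposed in the aims; the function name `pairwise_ld` and the 0/1 allele coding are assumptions for the example.

```python
from typing import Sequence, Tuple


def pairwise_ld(haplotypes: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic SNPs.

    Each haplotype is a pair of alleles coded 0 or 1 (assumed coding).
    D   = P(1,1) - P(1 at SNP A) * P(1 at SNP B)
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB))
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Toy data: the two SNPs are perfectly correlated.
    sample = [(1, 1), (1, 1), (0, 0), (0, 0)]
    d, r2 = pairwise_ld(sample)
    print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

Recombination between the two sites breaks up such associations over generations, which is why patterns of r² across a region carry information about local recombination rates.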

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 7R01HG002585-05
Application #: 7117392
Study Section: Genome Study Section (GNM)
Program Officer: Ramos, Erin

Project Start: 2002-09-20
Project End: 2008-08-31
Budget Start: 2006-09-01
Budget End: 2008-08-31
Support Year: 5
Fiscal Year: 2006
Total Cost: $299,786
Indirect Cost

Institution

Name: University of Chicago
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 005421136

City: Chicago
State: IL
Country: United States
Zip Code: 60637

Publications

Zhu, Xiang; Stephens, Matthew (2018) Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat Commun 9:4361

Gerard, David; Stephens, Matthew (2018) Empirical Bayes shrinkage and false discovery rate estimation, allowing for unwanted variation. Biostatistics :

Al-Asadi, Hussein; Dey, Kushal K; Novembre, John et al. (2018) Inference and visualization of DNA damage patterns using a Grade of Membership Model. Bioinformatics :

Zhu, Xiang; Stephens, Matthew (2017) Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann Appl Stat 11:1561-1592

Dey, Kushal K; Hsiao, Chiaowen Joyce; Stephens, Matthew (2017) Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genet 13:e1006599

Stephens, Matthew (2017) False discovery rates: a new deal. Biostatistics 18:275-294

Petkova, Desislava; Novembre, John; Stephens, Matthew (2016) Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet 48:94-100

Lu, Mengyin; Stephens, Matthew (2016) Variance adaptive shrinkage (vash): flexible empirical Bayes estimation of variances. Bioinformatics 32:3428-3434

Raj, Anil; Wang, Sidney H; Shim, Heejung et al. (2016) Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife 5:

Shim, Heejung; Chasman, Daniel I; Smith, Joshua D et al. (2015) A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS One 10:e0120758

Showing the most recent 10 out of 46 publications
