Quality Control Genotype Calling and Study Design for 1000 Genomes Project

Rocha Abecasis, Goncalo

Abstract

The 1000 Genomes Project aims to achieve a nearly complete catalog of common human genetic variants by generating high-quality sequence data surveying the genomes of >1000 individuals. This catalog will include SNPs, copy number variants, and short insertion and deletion polymorphisms. By cataloging and describing the relationships between these variants, the Project will provide important benefits to genetic association studies of complex disease. Specifically, availability of very complete lists of candidate functional variants will: (a) accelerate fine-mapping efforts in gene regions identified through genome-wide association studies or candidate gene studies; (b) improve the power of future genetic association studies by enabling the design of next-generation genotyping microarrays that more fully represent human genetic variation; and (c) enhance the analysis of ongoing and already completed association studies by improving our ability to "impute" or "predict" untyped genetic variants. This application supports the execution of several tasks essential to the completion of the 1000 Genomes Project. Specifically, we propose working with production centers to finalize the design of the project (for example, by deciding the depth of sequencing required for each individual examined, or the read length and insert size for the associated sequencing libraries) and to evaluate the trade-offs among different choices of individuals to sequence; we also propose to monitor the data generated, to provide regular summaries of data quality, and to identify problems with sample tracking before data are released; finally, we will help generate genotype and haplotype calls and prepare submissions of project results to public databases. We believe that timely completion of these tasks, in collaboration with other groups participating in the analysis of project data, is critical to ensure the genetics community obtains maximum benefit from the project.

Public Health Relevance

Reconstructing the genome sequences of many individuals will allow the 1000 Genomes Project to deliver catalogs of common genetic variants and of the relationships between these variants in the population. These catalogs are an essential component of genetic association studies focused on complex diseases such as diabetes, asthma, cancer, and aging-associated disorders. In this application, we propose to help design a data collection strategy for the project, to monitor the quality of the primary sequence data, and to analyze the primary sequence data to deliver a processed dataset that is useful to the genetics community at large.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG005214-01S1
Application #: 7929931
Study Section: Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer: Brooks, Lisa

Project Start: 2009-09-19
Project End: 2012-08-31
Budget Start: 2009-09-19
Budget End: 2012-08-31
Support Year: 1
Fiscal Year: 2009
Total Cost: $477,709
Indirect Cost

Institution

Name: University of Michigan Ann Arbor
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects


NIH 2010 U01 HG	Quality Control Genotype Calling and Study Design for 1000 Genomes Project	Rocha Abecasis, Goncalo / University of Michigan Ann Arbor	$781,690
NIH 2010 U01 HG	Quality Control Genotype Calling and Study Design for 1000 Genomes Project	Rocha Abecasis, Goncalo / University of Michigan Ann Arbor	$28,596
NIH 2009 U01 HG	Quality Control Genotype Calling and Study Design for 1000 Genomes Project	Rocha Abecasis, Goncalo / University of Michigan Ann Arbor	$688,625
NIH 2009 U01 HG	Quality Control Genotype Calling and Study Design for 1000 Genomes Project	Rocha Abecasis, Goncalo / University of Michigan Ann Arbor	$477,709

Publications

1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74

1000 Genomes Project Consortium; Abecasis, Goncalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65

Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N et al. (2012) Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am J Hum Genet 91:839-48

Voight, Benjamin F; Kang, Hyun Min; Ding, Jun et al. (2012) The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet 8:e1002793

Elhaik, Eran (2012) Empirical distributions of F(ST) from large-scale human polymorphism data. PLoS One 7:e49837

Lango Allen, Hana (see original citation for additional authors) (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467:832-8

Willer, Cristen J; Li, Yun; Abecasis, Goncalo R (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26:2190-1

Pruim, Randall J; Welch, Ryan P; Sanna, Serena et al. (2010) LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26:2336-7

Li, Yun; Willer, Cristen J; Ding, Jun et al. (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34:816-34

Sanna, Serena; Pitzalis, Maristella; Zoledziewska, Magdalena et al. (2010) Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nat Genet 42:495-7

Showing the most recent 10 out of 12 publications
