The 1000 Genomes Project aims to achieve a nearly complete catalog of common human genetic variants by generating high-quality sequence data surveying the genomes of >1000 individuals. This catalog will include SNPs, copy number variants, and short insertion and deletion polymorphisms. By cataloging and describing the relationships between these variants, the Project will provide important benefits to genetic association studies of complex disease. Specifically, availability of very complete lists of candidate functional variants will: (a) accelerate fine-mapping efforts in gene regions identified through genome-wide association studies or candidate gene studies; (b) improve the power of future genetic association studies by enabling design of next-generation genotyping microarrays that more fully represent human genetic variation; and (c) enhance the analysis of ongoing and already completed association studies by improving our ability to "impute" or "predict" untyped genetic variants. This application supports the execution of several tasks essential to the completion of the 1000 Genomes Project. Specifically, we propose working with production centers to finalize the design of the project (for example, by deciding the depth of sequencing required for each individual that is examined or the read length and insert size for the associated sequencing libraries) and to evaluate the trade-offs from different choices of individuals to sequence; we also propose to monitor the data generated to provide regular summaries of data quality and to identify problems with sample tracking before data is released; finally, we will help generate genotype and haplotype calls and prepare submissions of project results to public databases. We believe that timely completion of these tasks, in collaboration with other groups participating in the analysis of project data, is critical to ensuring that the genetics community obtains maximum benefit from the project.

Public Health Relevance

Reconstructing the genome sequence of many individuals will allow the 1000 Genomes Project to deliver catalogs of common genetic variants and the relationships between these variants in the population. These catalogs are an essential component of genetic association studies focused on complex diseases such as diabetes, asthma, cancer, and aging-associated disorders. In this application, we propose to help design a data collection strategy for the project, to monitor the quality of the primary sequence data, and to analyze the primary sequence data to deliver a processed dataset that is useful to the genetics community at large.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG005214-01S1
Application #: 7929931
Study Section: Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer: Brooks, Lisa
Project Start: 2009-09-19
Project End: 2012-08-31
Budget Start: 2009-09-19
Budget End: 2012-08-31
Support Year: 1
Fiscal Year: 2009
Total Cost: $477,709
Indirect Cost:
Name: University of Michigan Ann Arbor
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 073133571
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109
1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74
1000 Genomes Project Consortium; Abecasis, Goncalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65
Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N et al. (2012) Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am J Hum Genet 91:839-48
Voight, Benjamin F; Kang, Hyun Min; Ding, Jun et al. (2012) The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet 8:e1002793
Elhaik, Eran (2012) Empirical distributions of F(ST) from large-scale human polymorphism data. PLoS One 7:e49837
Lango Allen, Hana (see original citation for additional authors) (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467:832-8
Willer, Cristen J; Li, Yun; Abecasis, Goncalo R (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26:2190-1
Pruim, Randall J; Welch, Ryan P; Sanna, Serena et al. (2010) LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26:2336-7
Li, Yun; Willer, Cristen J; Ding, Jun et al. (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34:816-34
Sanna, Serena; Pitzalis, Maristella; Zoledziewska, Magdalena et al. (2010) Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nat Genet 42:495-7

Showing the most recent 10 out of 12 publications