The International 1000 Genomes Project (1000 Genomes) aims to leverage the emergence of next-generation sequencing technologies to catalogue common human genetic variability. The ambitious goals and timeline of 1000 Genomes will require highly coordinated collaboration among multiple research groups. While aspects of our development efforts will involve all three platforms, a major proportion of our proposal will focus on optimal integration of SOLiD data into the 1000 Genomes data production pipeline. Our goal is to ensure that the unique capabilities of the platform are maximized during data processing, while adhering to the common data and analytical standards established across 1000 Genomes.
In Aim 1, we will develop tools for monitoring data quality, focusing partly on detecting experimental biases and partly on developing better quality metrics.
In Aim 2, we will develop tools for detecting genetic variation through recursive use of alignment and variant discovery.
In Aim 3, we will further develop client/server software capable of simultaneous viewing of sequence data across multiple sites for the purpose of quality control and variant inspection. The deliverable of our proposal is a series of stand-alone software utilities that can be integrated into the software pipelines developed by the 1000 Genomes DCCs and that fit within the collective analytical framework of 1000 Genomes participants. This collaborative proposal includes teams from academia (UCLA), industry (Applied Biosystems), and non-profit research institutes (TGen).
The purpose of this proposal is to develop tools for monitoring and interpreting data from the 1000 Genomes Project.
Li, Heng; Homer, Nils (2010) A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 11:473-83
1000 Genomes Project Consortium; Abecasis, Gonçalo R; Altshuler, David et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061-73
Homer, Nils; Nelson, Stanley F (2010) Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA. Genome Biol 11:R99