The 1000 Genomes Project is an initiative to sequence the complete genomes of over 1000 individuals and create a reference set of common and uncommon genetic variation among various ethnic populations. The project aims to identify all types of genetic variation more comprehensively, including single nucleotide polymorphisms (SNPs) and structural genome variants (SVs), which include regions that have been duplicated, deleted, inverted, or translocated over the course of human evolution. Some of these structural variants have been correlated with many different disease phenotypes and thus play a major role in human health. During the pilot phase of this project, numerous diverse yet complementary analytical methods were developed to detect these types of variation on multiple sequencing platforms. However, there remains a need to combine these approaches in an optimal fashion so that they can be applied to the large amounts of genomic sequence data that will be produced during the production phase. Our consortium includes members of the structural genomic variation analysis group for the 1000 Genomes Project, who have been analyzing data from the 1000 Genomes pilot project 2 over the past year. We will coordinate our resources to develop a unified process for analyzing these data. We will research new ways of integrating and optimizing our existing detection methods, and will cooperate with similar international and industrial efforts to provide a set of high-quality structural variants to the biomedical research community.
Specific Aim 1: Facilitate and coordinate computational analysis to provide structural variation data from the sequence data being generated by the 1000 Genomes Project.
Specific Aim 2: Research and develop new methods for integrating and processing structural genomic variation data.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer
Brooks, Lisa
Brigham and Women's Hospital
United States
Yang, Lixing; Luquette, Lovelace J; Gehlenborg, Nils et al. (2013) Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153:919-29
1000 Genomes Project Consortium; Abecasis, Goncalo R; Auton, Adam et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56-65
Chen, Feng; Ding, Li (2012) Co-survival of the fittest few: mosaic amplification of receptor tyrosine kinases in glioblastoma. Genome Biol 13:141
Demichelis, Francesca; Setlur, Sunita R; Banerjee, Samprit et al. (2012) Identification of functionally active, low frequency copy number variants at 15q21.3 and 12q21.31 associated with prostate cancer risk. Proc Natl Acad Sci U S A 109:6686-91
Chen, Ken; Wallis, John W; Kandoth, Cyriac et al. (2012) BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data. Bioinformatics 28:1923-4
Lee, Eunjung; Iskow, Rebecca; Yang, Lixing et al. (2012) Landscape of somatic retrotransposition in human cancers. Science 337:967-71
Hormozdiari, Fereydoun; Alkan, Can; Ventura, Mario et al. (2011) Alu repeat discovery and characterization within human genomes. Genome Res 21:840-9
Pinto, Dalila; Darvishi, Katayoon; Shi, Xinghua et al. (2011) Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol 29:512-20
Mills, Ryan E; Walter, Klaudia; Stewart, Chip et al. (2011) Mapping copy number variation by population-scale genome sequencing. Nature 470:59-65
Hormozdiari, Farhad; Hach, Faraz; Sahinalp, S Cenk et al. (2011) Sensitive and fast mapping of di-base encoded reads. Bioinformatics 27:1915-21

Showing the most recent 10 out of 14 publications