This application addresses broad Challenge Area (08) Genomics and specific Challenge Topic 08-DA-102, Improved Bioinformatics Analysis for Deep Sequencing. The number of human samples undergoing whole-genome sequencing is expected to increase dramatically in the next few years as advances in next-generation sequencing technologies continue to lower the cost of sequencing. In addition to revealing sequence variation, these data can be used to estimate DNA copy number variation and, in turn, to examine the correlation between copy number and phenotype. In this proposal, we aim to develop a series of computational steps and an integrated analysis pipeline for accurate estimation of copy number from next-generation sequencing data. This involves efficient processing of the sequencing data, including appropriate alignment procedures and correction for experimental artifacts. To estimate copy number along each chromosome, we will develop novel segmentation procedures, for both single and multiple samples, that take advantage of the specific nature of sequencing data. Importantly, we also address issues in experimental design, especially the effect of sequencing depth (genome coverage) and read length on the resolution and accuracy of copy number profiles. Our studies use data from several platforms, including Solexa, SOLiD, and Complete Genomics. The pipeline developed in this proposal will be implemented on a powerful distributed computing system and made freely available to the research community. The results of this project will thus enable efficient extraction of copy number from whole-genome sequencing data and will facilitate the rapid translation of next-generation sequencing technology into the identification of structural variations associated with normal or disease phenotypes.
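The abstract does not specify the pipeline's algorithms. As a rough illustration of the read-depth reasoning behind the question of sequencing depth and read length, the Python sketch below assumes reads start uniformly along the genome, so that counts in fixed-size bins are Poisson, and it ignores experimental artifacts such as GC-content bias and mappability. The segmentation step is a plain recursive binary segmentation with a hypothetical sum-of-squares penalty; it is not the segmentation procedure proposed here, nor the BIC-based method of Xi et al. (2011). All function names and default parameters are hypothetical.

```python
import math
from typing import List, Tuple


def expected_reads_per_bin(coverage: float, read_length: int, bin_size: int) -> float:
    """Expected number of reads starting in a bin of `bin_size` bp.

    Mean coverage c = N * L / G (N reads of length L over a genome of G bp),
    so read starts occur at a rate of c / L per bp (uniformity assumed).
    """
    return coverage * bin_size / read_length


def min_bin_size(coverage: float, read_length: int,
                 copy_ratio: float = 1.5, z: float = 3.0) -> float:
    """Rough smallest bin (bp) at which a copy ratio (default 3:2, a one-copy
    gain in a diploid region) shifts the expected count by z Poisson SDs.

    Requires (copy_ratio - 1) * lam >= z * sqrt(lam), i.e.
    lam >= (z / (copy_ratio - 1)) ** 2 reads per bin.
    """
    lam = (z / (copy_ratio - 1.0)) ** 2
    return lam * read_length / coverage


def log2_ratios(counts: List[int], expected: float, pseudo: float = 0.5) -> List[float]:
    """Per-bin log2(observed / expected); the pseudocount avoids log2(0)."""
    return [math.log2((c + pseudo) / (expected + pseudo)) for c in counts]


def binary_segmentation(values: List[float], penalty: float,
                        min_size: int = 2) -> List[Tuple[int, int, float]]:
    """Recursively split wherever the drop in residual sum of squares exceeds
    `penalty`; returns (start, end_exclusive, mean) segments in genomic order."""
    if not values:
        return []
    # Prefix sums give O(1) residual sum of squares for any interval.
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i: int, j: int) -> float:
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    segments: List[Tuple[int, int, float]] = []

    def split(i: int, j: int) -> None:
        total = sse(i, j)
        best_gain, best_k = 0.0, None
        # Each side of a candidate breakpoint keeps at least `min_size` bins.
        for k in range(i + min_size, j - min_size + 1):
            gain = total - sse(i, k) - sse(k, j)
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > penalty:
            split(i, best_k)  # left half first keeps segments in order
            split(best_k, j)
        else:
            segments.append((i, j, (prefix[j] - prefix[i]) / (j - i)))

    split(0, len(values))
    return segments


if __name__ == "__main__":
    print(min_bin_size(coverage=30, read_length=100))  # 120.0 bp
    counts = [30] * 10 + [45] * 10 + [30] * 10          # synthetic gain in bins 10-19
    for start, end, mean in binary_segmentation(log2_ratios(counts, 30), penalty=0.3):
        print(start, end, round(mean, 3))
```

Under this criterion, 30× coverage with 100 bp reads needs about 36 reads per bin (roughly 120 bp) to separate a one-copy gain from the diploid baseline. Real read counts are overdispersed relative to Poisson, so this figure is a lower bound. The penalty controls how readily a breakpoint is accepted and would have to be tuned to the noise level of the data.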

Public Health Relevance

A large number of prevalent diseases, most notably cancer, involve variations in DNA copy number. Precise characterization of DNA copy number variation in both normal and diseased individuals is therefore important for understanding its impact on human health and disease. Next-generation sequencing technology will be the major source of copy number data in the coming years, and the resources created in this project will enable researchers to obtain accurate copy number profiles from their experimental data.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
NIH Challenge Grants and Partnerships Program (RC1)
Project #
5RC1HG005482-02
Application #
7935506
Study Section
Special Emphasis Panel (ZRG1-GGG-F (58))
Program Officer
Brooks, Lisa
Project Start
2009-09-22
Project End
2012-06-30
Budget Start
2010-07-01
Budget End
2012-06-30
Support Year
2
Fiscal Year
2010
Total Cost
$376,168
Indirect Cost
Name
Harvard University
Department
Miscellaneous
Type
Schools of Medicine
DUNS #
047006379
City
Boston
State
MA
Country
United States
Zip Code
02115
Yang, Lixing; Luquette, Lovelace J; Gehlenborg, Nils et al. (2013) Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153:919-29
Lee, Eunjung; Iskow, Rebecca; Yang, Lixing et al. (2012) Landscape of somatic retrotransposition in human cancers. Science 337:967-71
Xi, Ruibin; Hadjipanayis, Angela G; Luquette, Lovelace J et al. (2011) Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A 108:E1128-36
Xi, Ruibin; Kim, Tae-Min; Park, Peter J (2010) Detecting structural variations in the human genome using next generation sequencing. Brief Funct Genomics 9:405-15