We propose to develop, demonstrate, and validate a pipeline for high-throughput, low-cost targeted resequencing of all human exons based on next-generation (gen2) sequencing techniques, in support of the long-term goal of enabling sequencing to be used routinely to characterize genotypes and genetic variation in genome-wide medical targets for large populations of individuals. We will develop and integrate two genome-scale target capture methods--padlock capture and hybridization capture--that are well suited to the short reads yielded by the gen2 technologies that will be our primary focus (Illumina Genome Analyzer and Polony). We expect that a pipeline that can be used to define the RefSeq exome of thousands of subjects will play a critical role in the next phase of genetic medical research. We propose three specific Aims:
Aim 1 (Capture and sequence the human exome with padlock probes): Here we will develop a padlock probe library for exon capture; optimize the capture protocol to reduce cost and amplification bias and to increase coverage; similarly optimize generation of the padlock probe set; and scale up from small sets of exons to the entire RefSeq exome.
Aim 2 (Capture and sequence the human exome by hybridization selection): Here we will develop methods for targeted capture of exonic sheared DNA fragments on nitrocellulose filters; optimize capture and hybridization protocols and reduce cost; develop molecular bar-coding methods for multiplexed sequencing of multiple subject exome libraries; and scale up to RefSeq exome sequencing.
Aim 3 (Develop data analysis and management for targeted exome sequence): Here we will develop algorithms for calling genotypes from sequence generated by Aims 1 and 2 that take into account sequence quality and coverage distributions; provide feedback to Aims 1 and 2 regarding feasibility of proposed coverage and accuracy targets; develop algorithms for indel detection; and provide for computer resources, software, and data distribution.
Accuracy, replicability, coverage, cost, and quality control are common themes supported by all Aims. We will use samples with known genotype content in assessing capture efficiency, accuracy, and algorithm effectiveness. The two capture technologies we develop are both complementary and mutually supporting. For instance, sheared exome capture will provide opportunities to detect larger indels than padlock capture, but the padlock probes may be useful reagents for sheared exome capture. Our group is one of the few pioneering all component aspects required by this RFA.
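As an illustration of what Aim 3 means by genotype calling that accounts for sequence quality and coverage, the following is a minimal sketch and not the project's algorithm. It assumes a single biallelic site in a diploid sample, independent reads, a uniform substitution error model derived from Phred quality scores, and an arbitrary minimum-depth cutoff. All function and parameter names (`genotype_log_likelihoods`, `call_genotype`, `min_depth`) are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def genotype_log_likelihoods(bases, quals, ref, alt):
    """Return log10 likelihoods keyed by the number of alt alleles (0, 1, 2).

    Assumption: an erroneous base call is equally likely to be any of the
    three other bases, so P(observed | true allele) is 1 - e on a match
    and e / 3 on a mismatch.
    """
    loglik = {0: 0.0, 1: 0.0, 2: 0.0}
    for base, q in zip(bases, quals):
        e = phred_to_error(q)
        p_ref = 1 - e if base == ref else e / 3
        p_alt = 1 - e if base == alt else e / 3
        for n_alt in loglik:
            frac = n_alt / 2  # fraction of chromosomes carrying alt
            loglik[n_alt] += math.log10((1 - frac) * p_ref + frac * p_alt)
    return loglik


def call_genotype(bases, quals, ref, alt, min_depth=8):
    """Return the most likely alt-allele count, or None if coverage is too low.

    min_depth is an illustrative threshold, not a value from the proposal.
    """
    if len(bases) < min_depth:
        return None
    ll = genotype_log_likelihoods(bases, quals, ref, alt)
    return max(ll, key=ll.get)


if __name__ == "__main__":
    # Ten reads at Q30: five supporting the reference A, five supporting G.
    print(call_genotype("AAAAAGGGGG", [30] * 10, ref="A", alt="G"))
```

A production caller would also need priors on genotype frequency, handling of mapping quality, and a model of capture-specific allelic bias, none of which is attempted here.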

Public Health Relevance

This research will immediately advance medical research by providing technology for the cost-effective sequencing of thousands of individual DNA samples in large populations, giving deep insight into the genetic variations at work in human health and disease. It will eventually also enable health providers to learn the genetic variations of their patients very inexpensively and thus help refine and improve their medical care.

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project (R01)
Project #: 3R01HL094963-02S1
Application #: 8060093
Study Section: Special Emphasis Panel (ZHG1-HGR-N (O1))
Program Officer: Gan, Weiniu
Project Start: 2008-09-30
Project End: 2012-06-30
Budget Start: 2010-04-15
Budget End: 2012-06-30
Support Year: 2
Fiscal Year: 2010
Total Cost: $250,013
Indirect Cost:

Name: Harvard University
Department: Genetics
Type: Schools of Medicine
DUNS #: 047006379
City: Boston
State: MA
Country: United States
Zip Code: 02115

Barua, Moumita; Shieh, Eric; Schlondorff, Johannes et al. (2014) Exome sequencing and in vitro studies identified podocalyxin as a candidate gene for focal and segmental glomerulosclerosis. Kidney Int 85:124-33
Barua, Moumita; Stellacci, Emilia; Stella, Lorenzo et al. (2014) Mutations in PAX2 associate with adult-onset FSGS. J Am Soc Nephrol 25:1942-53
Ball, Madeleine P; Thakuria, Joseph V; Zaranek, Alexander Wait et al. (2012) A public resource facilitating clinical use of genomes. Proc Natl Acad Sci U S A 109:11920-7
Howden, Sara E; Gore, Athurva; Li, Zhe et al. (2011) Genetic correction and analysis of induced pluripotent stem cells from a patient with gyrate atrophy. Proc Natl Acad Sci U S A 108:6537-42
Gore, Athurva; Li, Zhe; Fung, Ho-Lim et al. (2011) Somatic coding mutations in human induced pluripotent stem cells. Nature 471:63-7
Zhang, Kun; Li, Jin Billy; Gao, Yuan et al. (2009) Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human. Nat Methods 6:613-8