Making structurally complex genomic regions accessible

Page, David

Abstract

The most structurally complex regions in the genome are composed of ampliconic sequences, which are defined as repeats that display >99% identity and are >10 kb in length. Ampliconic regions are of immensely disproportionate biomedical significance and interest. However, these regions are inaccessible to standard genome sequencing strategies, so they are grossly misrepresented in or entirely missing from reference genome assemblies. Biomedical researchers cannot extract insights from parts of the genome to which they have no access, so our understanding of the frequency and mechanism of amplicon-mediated rearrangements and their role in disease is far from complete. Furthermore, ampliconic sequences are systematically excluded from all experiments based on mapping to the reference sequence (e.g., exome re-sequencing, RNA-seq, ChIP-seq), severely limiting the insights to be gained from such studies. The chief obstacle to accessing entire genomes is not a lack of interest on the part of the biomedical research community, but the lack of a practical, affordable, and distributable technology with which to generate reference-quality sequence of ampliconic regions. Single Haplotype Iterative Mapping and Sequencing (SHIMS) is the only proven strategy to assemble such regions. SHIMS relies on the use of mapped large-insert clones (usually BACs) derived from a single haplotype so that polymorphisms do not confound the assembly of ampliconic repeats. The major bottleneck and cost associated with the traditional SHIMS approach, SHIMS 1.0, is the sequencing of individual BACs. Using standard capillary-based sequencing, this endeavor is expensive in terms of both reagents and highly skilled labor. Here we propose to dramatically restructure the SHIMS operational paradigm, so that ultra-high-quality reference sequence can be generated by a small research team at modest cost.
We will achieve this by setting up an efficient SHIMS 2.0 pipeline encompassing all steps in generating finished BAC sequence using the Illumina MiSeq platform. We will sequence pools of 192 indexed BACs, generating deep sequence coverage that will dramatically reduce, if not eliminate, the need for directed finishing. We will optimize all components of the process, from high-throughput plasmid preparation and DNA fragmentation to de novo sequence assembly and quality assessment, with an eye toward quality of product, cost, efficiency, and reproducibility. We will ensure that this new technology and software are distributable, and we will actively promote and support the application of the SHIMS 2.0 pipeline by other researchers to complex genomic regions. For example, it will be possible to use SHIMS 2.0 to assemble multiple human genomes, providing an invaluable resource for studies in human genetics. The SHIMS 2.0 strategy can also be applied in other species, enabling insight into the evolutionary dynamics of ampliconic regions. In addition, applying SHIMS 2.0 to improve the genomes of model organisms will be of tremendous benefit to researchers in multiple biomedical disciplines.
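
As a rough illustration of two quantities in the abstract, the sketch below encodes the stated definition of ampliconic sequence (>99% identity, >10 kb) and estimates the mean sequence coverage per clone when 192 indexed BACs share one sequencing run. The run yield and BAC insert size used in the example are hypothetical values chosen for illustration; they do not come from the grant. The estimate also assumes reads are spread evenly across clones and ignores vector and host contamination.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract's definition of ampliconic sequence.
MIN_IDENTITY = 0.99      # repeats must display >99% identity
MIN_LENGTH_BP = 10_000   # and be >10 kb in length


@dataclass
class Repeat:
    """A repeat unit described by its identity to its paralog and its length."""
    name: str
    identity: float   # fraction between 0 and 1
    length_bp: int


def is_ampliconic(repeat: Repeat) -> bool:
    """Apply the abstract's definition; both thresholds are strict."""
    return repeat.identity > MIN_IDENTITY and repeat.length_bp > MIN_LENGTH_BP


def mean_coverage_per_bac(run_yield_bp: int, n_bacs: int, insert_bp: int) -> float:
    """Mean coverage per clone, assuming reads divide evenly across the pool."""
    return run_yield_bp / (n_bacs * insert_bp)


if __name__ == "__main__":
    # Hypothetical repeats, for illustration only.
    repeats = [
        Repeat("long_near_identical", identity=0.9995, length_bp=150_000),
        Repeat("long_but_diverged", identity=0.95, length_bp=50_000),
        Repeat("identical_but_short", identity=0.999, length_bp=3_000),
    ]
    for r in repeats:
        print(r.name, is_ampliconic(r))

    # Hypothetical run yield (5 Gb) and BAC insert size (180 kb).
    coverage = mean_coverage_per_bac(5_000_000_000, n_bacs=192, insert_bp=180_000)
    print(f"approximate mean coverage per BAC: {coverage:.0f}x")
```

Only the first hypothetical repeat satisfies both thresholds. The coverage figure scales linearly with run yield and inversely with pool size, so the same function can be used to explore other pooling schemes.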

Public Health Relevance

Structurally complex or repetitive regions of the genome are of immensely disproportionate medical significance because of their susceptibility to large-scale rearrangements, which can add or subtract genes and cause disease. Because of the inherent difficulty in sequencing such complex regions, especially using the latest sequencing technologies, they are missing from or poorly represented in genome sequences of humans and important model organisms, impeding the study of the nature and mechanism of disease-causing rearrangements. We will develop a practical, affordable, and distributable technology capable of generating accurate sequences of complex genomic regions and ensure that the technology has a broad impact by actively promoting its use by other researchers.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG007852-01A1
Application #: 8880459
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Smith, Michael

Project Start: 2015-04-24
Project End: 2018-03-31
Budget Start: 2015-04-24
Budget End: 2016-03-31
Support Year: 1
Fiscal Year: 2015
Total Cost: $923,439
Indirect Cost: $442,865

Institution

Name: Whitehead Institute for Biomedical Research
DUNS #: 120989983

City: Cambridge
State: MA
Country: United States
Zip Code: 02142

Related projects

NIH 2017 R01 HG: Making structurally complex genomic regions accessible. Page, David C. / Whitehead Institute for Biomedical Research. $810,946
NIH 2016 R01 HG: Making structurally complex genomic regions accessible. Page, David C. / Whitehead Institute for Biomedical Research.
NIH 2015 R01 HG: Making structurally complex genomic regions accessible. Page, David C. / Whitehead Institute for Biomedical Research. $923,439

Publications

Teitz, Levi S; Pyntikova, Tatyana; Skaletsky, Helen et al. (2018) Selection Has Countered High Mutability to Preserve the Ancestral Copy Number of Y Chromosome Amplicons in Diverse Human Lineages. Am J Hum Genet 103:261-275

Ly, Peter; Teitz, Levi S; Kim, Dong H et al. (2017) Selective Y centromere inactivation triggers chromosome shattering in micronuclei and repair by non-homologous end joining. Nat Cell Biol 19:68-75

Bellott, Daniel W; Skaletsky, Helen; Cho, Ting-Jan et al. (2017) Avian W and mammalian Y chromosomes convergently retained dosage-sensitive regulators. Nat Genet 49:387-394

Hughes, Jennifer F; Skaletsky, Helen; Koutseva, Natalia et al. (2015) Sex chromosome-to-autosome transposition events counter Y-chromosome gene loss in mammals. Genome Biol 16:104
