Genes in highly identical segmental duplications (SDs) play critical roles in human evolution and disease. SDs themselves mediate pathogenic duplications, deletions, and other rearrangements whose effects range from neurodevelopmental conditions like autism to syndromic congenital diseases. The genes contained within SDs, once duplicated, are fertile ground for adaptive tinkering, and may provide innovations that underlie the evolution of human-specific traits. However, the duplicate nature of these genes has always presented extra challenges to their study. They are found in regions of the genome that are some of the most difficult to sequence and assemble; they suffer from incomplete and inaccurate annotation due to the difficulty of correctly assigning and assembling sequenced fragments of transcripts; and, relatedly, for many duplicated genes it is not known whether they are functional, i.e., whether they encode a translated and functioning protein. This project seeks to annotate segmentally duplicated genes at the level of transcription and translation and proposes a strategy to address these challenges. We will leverage a haploid genome to better discriminate between highly identical copies of genome sequence, we will combine single-molecule long-read sequencing technology with a custom cDNA enrichment strategy to accurately determine transcription of SD genes, and we will take advantage of new developments in mass spectrometry technology to identify paralog-specific peptides and determine which of these genes are translated. The goal of this study is to identify functional, protein-coding genes among segmentally duplicated regions of the human genome. The generalizable approach developed in this study can be applied to duplicated space in other genomes as well. These genes will serve as candidates for future studies of human evolution and disease.
If successful, this study will shed considerable light on one of the oldest and most challenging problems in the study of the human genome.

Public Health Relevance

The human genome has been dramatically shaped by the effects of large segmental duplications, which still today mediate disease-causing rearrangements that contribute both to common neurodevelopmental diseases like autism spectrum disorder and epilepsy and to rare, severe, congenital syndromes. Such regions of the genome, however, remain poorly assembled and annotated, severely limiting the ability to make associations with disease and evolution. This project will develop state-of-the-art genomic and proteomic approaches to resolve the sequence, structure, and protein-coding potential of such regions, which will lead to a better understanding of how these duplications cause these rearrangements and what their functional consequences are for human evolution and disease.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Individual Predoctoral NRSA for M.D./Ph.D. Fellowships (ADAMHA) (F30)
Project #
5F30HG009478-02
Application #
9402541
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gatlin, Christine L
Project Start
2016-12-16
Project End
2020-12-15
Budget Start
2017-12-16
Budget End
2018-12-15
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of Washington
Department
Genetics
Type
Schools of Medicine
DUNS #
605799469
City
Seattle
State
WA
Country
United States
Zip Code
98195
Dougherty, Max L; Nuttle, Xander; Penn, Osnat et al. (2017) The birth of a human-specific neural gene by incomplete duplication and gene fusion. Genome Biol 18:49