Genes in highly identical segmental duplications (SDs) play critical roles in human evolution and disease. SDs themselves mediate pathogenic duplications, deletions, and other rearrangements whose effects range from neurodevelopmental conditions like autism to syndromic congenital diseases. The genes contained within SDs, once duplicated, are fertile ground for adaptive tinkering, and may provide innovations that underlie the evolution of human-specific traits. However, the duplicate nature of these genes has always presented extra challenges to their study. They are found in regions of the genome that are among the most difficult to sequence and assemble; they suffer from incomplete and inaccurate annotation because sequenced fragments of their transcripts are difficult to assign and assemble correctly; and, relatedly, for many duplicated genes it is not known whether they are functional, i.e., whether they encode a translated and functioning protein. This project seeks to annotate segmentally duplicated genes at the level of transcription and translation and proposes a strategy to address these challenges. We will leverage a haploid genome to better discriminate between highly identical copies of genome sequence; we will combine single-molecule long-read sequencing technology with a custom cDNA enrichment strategy to accurately determine transcription of SD genes; and we will take advantage of new developments in mass spectrometry technology to identify paralog-specific peptides and determine which of these genes are translated. The goal of this study is to identify functional, protein-coding genes among segmentally duplicated regions of the human genome. The generalizable approach developed in this study can be applied to duplicated regions of other genomes as well. These genes will serve as candidates for future studies of human evolution and disease.
If successful, this study will shed considerable light on one of the oldest and most challenging problems in the study of the human genome.

Public Health Relevance

The human genome has been dramatically shaped by the effects of large segmental duplications, which still mediate disease-causing rearrangements that contribute both to common neurodevelopmental diseases such as autism spectrum disorder and epilepsy and to rare, severe congenital syndromes. Such regions of the genome, however, remain poorly assembled and annotated, severely limiting the ability to make associations with disease and evolution. This project will develop state-of-the-art genomic and proteomic approaches to resolve the sequence, structure, and protein-coding potential of such regions, which will lead to a better understanding of how these duplications cause rearrangements and what their functional consequences are for human evolution and disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Individual Predoctoral NRSA for M.D./Ph.D. Fellowships (ADAMHA) (F30)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gatlin, Tina L
University of Washington
Schools of Medicine
United States
Kronenberg, Zev N; Fiddes, Ian T; Gordon, David et al. (2018) High-resolution comparative analysis of great ape genomes. Science 360:
Fiddes, Ian T; Lodewijk, Gerrald A; Mooring, Meghan et al. (2018) Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis. Cell 173:1356-1369.e22
Dougherty, Max L; Underwood, Jason G; Nelson, Bradley J et al. (2018) Transcriptional fates of human-specific segmental duplications in brain. Genome Res 28:1566-1576
Dougherty, Max L; Nuttle, Xander; Penn, Osnat et al. (2017) The birth of a human-specific neural gene by incomplete duplication and gene fusion. Genome Biol 18:49