Although it is generally assumed that the trillions of cells in a human body share identical DNA sequences, in reality we are a mosaic of genomes. The extent of this mosaicism is largely unknown, but both theoretical and empirical studies suggest that the burden of somatic mutations in humans is considerable. Indeed, in addition to cancer and ageing, two processes where somatic mutations are known to play an integral role, over thirty additional disease phenotypes are attributable to somatic variability. Somatic mutations have also been hypothesized to play a role in other complex diseases and to account for some of the missing heritability observed for many traits. Nonetheless, there have been few systematic and comprehensive studies of human somatic variability among tissues and individuals, and therefore the landscape of somatic mutations remains largely unknown. This gap in knowledge is a significant impediment to many ongoing and future studies of human phenotypic variation and disease susceptibility, such as the interpretation of somatic variability in cancer genome sequencing projects. To address this gap, the goal of the proposed project is to leverage the resources created by the GTEx Project to rigorously and systematically analyze patterns of human somatic variability.
In Aim 1, we will perform deep exome sequencing on 15 tissues, each collected from 40 individuals (600 total exomes), and identify somatic sequence and structural variation. Importantly, we have carefully designed the study and selected the tissues to facilitate testing biologically important hypotheses, such as how patterns and levels of somatic mutation vary as a function of tissue type, age, and sex. Moreover, we will experimentally validate a large number of putative somatic mutations, which will allow filtering criteria to be adjusted, resulting in a robust catalog of somatic mutations.
In Aim 2, we will capitalize on the RNA-Seq data generated by the GTEx Project and use innovative approaches to test the hypothesis that somatic mutations contribute to gene expression variability. Collectively, these data will profoundly increase our understanding of human somatic mutations, their patterns and characteristics among tissues and individuals, and their influence on transcript abundance. Moreover, our data will be a considerable resource for the GTEx Project and the broader scientific community, and we will make all project data easily accessible.

Public Health Relevance

Although it is often assumed that DNA sequences among our bodies' trillions of cells are identical, in reality we are composed of a mosaic of genomes. The goal of this project is to leverage advances in sequencing technology to generate the most comprehensive assessment of somatic mutations to date. These data will facilitate the identification and interpretation of mutations that cause human disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZRG1-IMST-M (50))
Program Officer: Volpi, Simona
University of Washington
Schools of Medicine
United States
Searle, Brian C; Gittelman, Rachel M; Manor, Ohad et al. (2016) Detecting Sources of Transcriptional Heterogeneity in Large-Scale RNA-Seq Data Sets. Genetics 204:1391-1396
Moon, Sunjin; Akey, Joshua M (2016) A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets. Genome Res 26:834-43
Shendure, Jay; Akey, Joshua M (2015) The origins, determinants, and consequences of human mutations. Science 349:1478-83