De novo DNA sequence mutations are mutations that are not inherited from a parent, and they play an important role in many human disorders, including cancer, autism, schizophrenia, and heart conditions. However, de novo mutations can be difficult to identify because sequencing errors are more common than mutations. Current approaches to analyzing DNA sequence data cannot identify de novo mutations reliably at genome scale, because each candidate mutation must be confirmed through costly and time-consuming validation. Our goal is to improve the identification of de novo mutations in order to understand their role in genetic disorders. We will develop a novel statistical approach to identify de novo mutations and implement it in software so that the method is readily available to other researchers. Our first objective is to determine the probability that an apparent DNA sequence change in short-read sequencing data from families is due to a de novo mutation. To determine this probability, we will integrate over the other possible sources of error and noise, including sequencing error, population diversity, and chromosome segregation. Second, we will extend this model to detect somatic de novo mutations between multiple tissues from the same individual (e.g., matched tumor-normal datasets). Third, we will develop new models for single-cell sequencing data, whose error probabilities differ from those of the data types above. These three aims are unified by a common approach: using genealogical information relating people, tissues, or individual cells to improve the accuracy of de novo mutation discovery. Finally, we will implement these methods in an easy-to-use software package that will make the identification of de novo mutations possible for scientists working on subjects ranging from variation in mutation rates to the effects of aging.
The methods developed here can benefit the hundreds, if not thousands, of studies that will search for and characterize de novo mutations in the coming decade. This package will be open source and free for the community to use.
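
As a rough illustration of the first objective, the sketch below computes the probability that a site in a mother-father-child trio carries a de novo mutation by summing over parental genotypes, transmitted alleles, and read errors. It is a toy model under stated assumptions, not the project's actual method or software: the site is biallelic, parental genotypes follow Hardy-Weinberg proportions, reads follow a binomial error model, and every function name and default value (`alt_freq`, `mu`, `err`) is hypothetical and chosen only for illustration.

```python
from itertools import product
from math import comb


def read_likelihood(ref, alt, genotype, err):
    """P(ref/alt read counts | genotype), binomial model with symmetric error.

    genotype is the number of alternate alleles carried (0, 1, or 2).
    """
    p_alt = (genotype / 2) * (1 - err) + (1 - genotype / 2) * err
    return comb(ref + alt, alt) * p_alt**alt * (1 - p_alt) ** ref


def genotype_prior(genotype, alt_freq):
    """Hardy-Weinberg prior on a parental genotype (population diversity)."""
    q = alt_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q**2][genotype]


def transmit(parent_genotype, allele, mu):
    """P(child receives `allele` from this parent): (total, without mutation).

    Segregation picks one parental allele at random; the transmitted copy
    then flips to the other allele with probability mu.
    """
    p_alt = parent_genotype / 2
    p_same = p_alt if allele == 1 else 1 - p_alt
    no_mutation = p_same * (1 - mu)
    with_mutation = (1 - p_same) * mu
    return no_mutation + with_mutation, no_mutation


def de_novo_posterior(reads, alt_freq=0.001, mu=1e-8, err=0.005):
    """P(at least one de novo mutation | read counts for mom, dad, child).

    reads maps "mom", "dad", and "child" to (ref_count, alt_count) tuples.
    The default parameter values are illustrative assumptions only.
    """
    total = 0.0        # P(data), all explanations
    no_mutation = 0.0  # P(data and no mutation event)
    for g_mom, g_dad in product(range(3), repeat=2):
        parents = (
            genotype_prior(g_mom, alt_freq)
            * genotype_prior(g_dad, alt_freq)
            * read_likelihood(*reads["mom"], g_mom, err)
            * read_likelihood(*reads["dad"], g_dad, err)
        )
        # Sum over the allele the child receives from each parent.
        for a_mom, a_dad in product((0, 1), repeat=2):
            t_mom, t_mom_clean = transmit(g_mom, a_mom, mu)
            t_dad, t_dad_clean = transmit(g_dad, a_dad, mu)
            child = read_likelihood(*reads["child"], a_mom + a_dad, err)
            total += parents * t_mom * t_dad * child
            no_mutation += parents * t_mom_clean * t_dad_clean * child
    return 1 - no_mutation / total


if __name__ == "__main__":
    site = {"mom": (30, 0), "dad": (30, 0), "child": (15, 15)}
    print(de_novo_posterior(site))
```

In this toy model, a child with balanced reference and alternate reads whose parents show only reference reads yields a posterior near 1, while a single stray alternate read in the child is attributed to sequencing error. Extending the same sum over a tissue or cell genealogy, rather than a trio, is the kind of generalization the second and third aims describe.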

Public Health Relevance

Each new child is born with 60-100 new mutations not present in their parents. New mutations within genes have been shown to be a risk factor for common pediatric diseases, such as autism and developmental delay. Despite the importance of mutation rates in clinical practice, our understanding of how these rates vary between individuals on the basis of sex, age, genetic background, and environmental exposure is rudimentary at best. The research proposed here will create software that improves the accuracy of new mutation detection, and thus enable the integration of this important class of variation into medical genetics research.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG007178-05
Application #: 9478284
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa
Project Start: 2014-05-08
Project End: 2019-02-28
Budget Start: 2018-03-01
Budget End: 2019-02-28
Support Year: 5
Fiscal Year: 2018
Total Cost:
Indirect Cost:

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 068552207
City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Nagirnaja, Liina; Aston, Kenneth I; Conrad, Donald F (2018) Genetic intersection of male infertility and cancer. Fertil Steril 109:20-26
Chiang, Colby; Scott, Alexandra J; Davis, Joe R et al. (2017) The impact of structural variation on human gene expression. Nat Genet 49:692-699
Sievert, Christian; Nieves, Lizbeth M; Panyon, Larry A et al. (2017) Experimental evolution reveals an effective avenue to release catabolite repression via mutations in XylR. Proc Natl Acad Sci U S A 114:7349-7354
Wu, Steven H; Schwartz, Rachel S; Winter, David J et al. (2017) Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 33:2322-2329
Tan, Meng How; Li, Qin; Shanmugam, Raghuvaran et al. (2017) Dynamic landscape and regulation of RNA editing in mammals. Nature 550:249-254
Wilfert, Amy B; Chao, Katherine R; Kaushal, Madhurima et al. (2016) Genome-wide significance testing of variation from single case exomes. Nat Genet 48:1455-1461
Zeng, Qinglong; Sukumaran, Jeet; Wu, Steven et al. (2015) Neutral Models of Microbiome Evolution. PLoS Comput Biol 11:e1004365
Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan et al. (2014) Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing. PLoS Genet 10:e1004462
Ramu, Avinash; Noordam, Michiel J; Schwartz, Rachel S et al. (2013) DeNovoGear: de novo indel and point mutation discovery and phasing. Nat Methods 10:985-987