This proposal for the NIH Pathway to Independence Award (K99/R00) focuses on training Dr. PingHsun Hsieh to become an independent investigator in large-scale genomics and human population genetics. Dr. Hsieh is a population geneticist by training, and the proposed studies will extend his training to long-read-based sequencing technologies and novel machine-learning approaches for studying the fitness consequences of new mutations, with a focus on structural variants (SVs), in humans and nonhuman primates. Another essential component will be the development of resources identifying which types of new SVs are most likely to be pathogenic and hence most worth further effort by medical researchers. The methods developed in this work will enable other researchers to conduct more hypothesis-free analyses of SVs in disease etiology. Specifically, the training program will center on the study of the distribution of fitness effects of new SVs in humans and nonhuman primates using high-quality SV calls and genotypes from several large-scale long- and short-read sequencing projects. The mentored work will take place under the supervision of the primary mentor, Dr. Evan Eichler, and the co-mentor, Dr. Sharon Browning, both at the University of Washington (UW). The mentor and co-mentor are well-established experts in, respectively, the characterization of genomic variation using high-throughput technologies and the development of stochastic modeling methods for large-scale genetic data. Dr. Hsieh will also receive advice from a formal advisory committee and through activities arranged by the Department of Genome Sciences (GS). The department is an ideal setting for the mentored training, giving the candidate access to outstanding scientists in areas including the genetics of model organisms, disease, population genetics, and the development of high-throughput genomic technologies.
Although SVs are generally deemed deleterious given their size, they can also be beneficial, and the distribution of fitness effects (DFE) of new SVs (i.e., the relative frequencies of beneficial, neutral, and deleterious SVs) remains elusive. In the proposed studies, we will infer the DFE of new SVs and other variants to assess their relative importance in nature, which in turn will help prioritize variants (e.g., SVs vs. single-nucleotide variants [SNVs]) in medical genetics. Specifically, in the K99/R00 phases we will (1) infer the DFE of new SVs and SNVs using a diverse panel of ~100 long-read and ~4,000 short-read high-coverage human and nonhuman primate genomes; (2) compare the DFE of new mutations among primates using contemporary and ancient DNA genomes; and (3) study the fitness effects and selective constraints on diseases in different mutation categories in large cohorts of >20,000 genomes. The skills learned in this proposal are at the cutting edge of the field and are tailored to give the candidate broad knowledge in new areas of genomics, which will be applicable to many organisms and diseases and critical to the candidate's future independent laboratory.

Public Health Relevance

Understanding the relative abundance of beneficial, neutral, and deleterious mutations in different variant categories (e.g., genic vs. intergenic) provides useful guidance and resources for biomedical researchers to prioritize disease-causing mutations and strategize their efforts. To date, however, little effort has been made to study the full spectrum of fitness effects of new structural variants, an important but largely underappreciated form of genomic variation. The work proposed here seeks to leverage long-read sequencing technologies and develop novel machine-learning approaches to quantify the fitness effects of new mutations, with a focus on structural variants, and subsequently delineate the relative importance of different types of mutations in genetic diseases.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Career Transition Award (K99)
Project #: 1K99HG011041-01
Application #: 9950314
Study Section: National Human Genome Research Institute Initial Review Group (GNOM)
Program Officer: Sofia, Heidi J
Project Start: 2020-05-01
Project End: 2022-04-30
Budget Start: 2020-05-01
Budget End: 2021-04-30
Support Year: 1
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469
City: Seattle
State: WA
Country: United States
Zip Code: 98195