Characterizing the functions of protein-coding genes is an important goal in the post-genomic era. While proteins are the ultimate effectors of most cellular functions, including those mis-regulated in disease, we have an extremely limited understanding of the roles of the majority of proteins in the human proteome. Though powerful, existing technologies for the high-throughput interrogation of protein-coding genes, including CRISPR/Cas9-based approaches and RNA interference, require extended periods of time to effect changes in protein levels, and thus suffer from two critical shortcomings. First, they are unable to detect the contribution of growth-essential genes to any cellular process other than viability, as any cell carrying a perturbation in such a gene would fail to propagate. Second, compensatory and adaptive effects have ample opportunity to manifest, thus confounding screen results by ameliorating the effect of the perturbation or by generating a novel, unrelated effect. To address these critical limitations, I propose to develop a new screening technology that will minimize the time between perturbation and screen readout by inducibly and rapidly degrading endogenous proteins. This is made possible by a readily scalable endogenous tagging technology that harnesses homology-independent targeted integration to insert a synthetic exon into the intron of a protein-coding gene at the site of a double-strand break. The synthetic exon will encode a multifunctional ligand-binding protein that, depending on the ligand, will lead to fluorescence or rapid degradation. Pooled libraries of sgRNAs targeting different introns allow for the creation of custom libraries of cells, where each cell carries this multifunctional tag on a different protein.
The utility of this approach will be established in Aims 1 and 2 by testing (1) whether cells that have undergone rapid depletion of growth-essential proteins are maintained in the cell library at the end of the short perturbation window and (2) whether rapid depletion and CRISPR knockout of the same protein produce different effects on a well-established phenotype, due to the distorting effects of adaptation events in the knockout.
Aim 3 will use a machine learning approach and the data from thousands of attempted tagging events to identify how the features of a potential tag site dictate the likelihood that a functional protein carrying the multifunctional tag will be produced. The resulting model will be applied to the protein-coding genome to predict high-quality tag sites for as many protein-coding genes as possible. This will establish an improved screening paradigm that will allow for the pooled interrogation of the contributions of thousands of proteins to a phenotype of interest, and will thus accelerate the rate at which we come to understand the poorly understood elements of the protein-coding genome. These efforts will be well supported by the outstanding resources for experimentation and mentorship at both the University of Pennsylvania and the Children's Hospital of Philadelphia, and will provide excellent training in experimental techniques for protein perturbation and characterization, as well as computational literacy in the broadly useful field of machine learning.

Public Health Relevance

Proteins are the ultimate effectors of most cellular functions in health and disease. Yet the functions of the majority of proteins encoded by the human genome are very poorly understood. A novel high-throughput screening strategy leveraging rapid protein degradation will dramatically accelerate functional characterization of the human proteome.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
1F31HG011185-01
Application #
9989245
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gatlin, Tina L
Project Start
2020-09-01
Project End
Budget Start
2020-09-01
Budget End
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Type
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104