Despite extraordinary advances in genome engineering, tools for precise and efficient gene correction and delivery across all cell types are still lacking. Current programmable DNA cleavage tools, such as meganucleases, zinc finger nucleases, transcription activator-like effector nucleases (TALENs), and CRISPR-Cas9, rely on recruiting the cell's DNA repair machinery: either error-prone non-homologous end joining (NHEJ) for gene knockout, or homology-directed repair (HDR) for precise correction. However, HDR is inactive in post-mitotic cells such as neurons and is often inefficient, achieving at most 50% correction. Genome editing still lacks efficient, robust tools that can insert, delete, and recombine large stretches of DNA sequence. Moreover, delivery is a significant barrier to deploying tissue-specific genome engineering technologies: current vehicles, including widely used viral vectors and liposomal approaches, have limited cargo capacity, offer variable efficiency, and lack precise tissue tropism. The proposed work will computationally mine bacteria for new classes of gene editing enzymes and delivery vectors. Natural recombinases and transposases can mediate programmed DNA rearrangement and insertion, and these enzyme classes are found in bacterial phage defense islands and mobile genetic islands, which remain largely uncharacterized. Additionally, recent work has demonstrated that retroviral and retrotransposon Gag-like proteins can self-assemble and encapsulate nucleic acids in extracellular vesicles (EVs) and virus-like particles (VLPs) for cell-to-cell communication. As with defense islands, these proteins can be systematically catalogued with a bioinformatic pipeline and characterized experimentally.
The proposed work will focus on three main goals: 1) mining signatures of phage defense, mobile genetic activity, and VLP-forming activity to build a machine learning pipeline that comprehensively discovers novel gene clusters in new metagenomic sequences, with a focus on proteins that can manipulate nucleic acids or self-assemble; 2) cloning candidate gene clusters from metagenomic samples, screening them at high throughput with biochemical and bacterial assays for gene editing and capsid formation, and engineering them to improve activity; and 3) evaluating the most promising candidates in mammalian systems with assays for highly efficient gene insertion and VLP formation. The work will elucidate novel bacterial phage defense and VLP biology and yield new technologies for more efficient genetic manipulation and gene delivery. Moreover, this gene exploration and engineering framework will serve as a model for discovering diverse bacterial gene clusters and defense systems, evaluating biochemical activity across a range of assays, and converting these findings into high-impact biotechnologies. The resulting technologies will accelerate biomedical research and enable deeper exploration of basic biological processes and disease mechanisms.

Public Health Relevance

More than 5,000 human diseases are caused by known genetic variation, including point mutations, insertions, and deletions, but programmable tools and delivery vectors for reliably studying and modeling these diseases are lacking. We propose a two-pronged approach to discovering novel gene editing and delivery strategies: (1) mining bacterial and archaeal systems to identify novel defense systems and enzymes with activities useful for genome editing, and (2) identifying gene clusters linked to extracellular vesicle or virus-like particle formation to enable better gene delivery systems. The technologies developed from this work will accelerate the study and modeling of disease biology and provide a framework for general prokaryotic gene discovery across many areas of biotechnology.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Exploratory/Developmental Grants (R21)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Shabman, Reed Solomon
Massachusetts Institute of Technology
Organized Research Units
United States