Advances in synthetic biology have accelerated to the point where the synthesis of entire genomes is now possible. However, the underlying technologies are painstaking: producing a new chromosome or genome requires multiple years of effort, working from small fragments to ever larger assemblies. The assembly process is cumbersome in large measure because it requires an ordered series of hierarchical homologous recombination steps, each of which proceeds through transformation into an organism, primarily yeast. The speed (and ultimately the scale) of large-fragment assembly would be greatly improved if it were possible to routinely amplify very long stretches of DNA (> 100 kb) in vitro. To that end, this proposal focuses on the further development of a novel directed evolution method known as Compartmentalized Self-Replication (CSR), in which polymerases expressed in cells within emulsions undergo thermal cycling to amplify their own genes. We will use CSR to generate long-read DNA polymerases that should prove capable of producing PCR amplicons > 100 kb in length with few errors. To achieve this goal, we propose to develop a novel library construction method that efficiently brings together sequence and structural domains from a variety of DNA polymerase variants to form diverse chimeras (Aim 1.1), and to sieve these libraries using improvements to CSR that will allow us to select for extreme processivity in yeast (Aim 1.2) and efficient error correction (Aim 1.3). The resulting variants will be characterized for their ability to synthesize long amplicons in vitro (Aim 2.1), for their fidelity (Aim 2.2), and for their detailed kinetic properties (Aim 2.3). Finally, to better ensure the processivity of the resultant polymerase chimeras, we will append either DNA-binding domains (Aim 3.1) or clamps (Aim 3.2) that should substantially improve their ability to grip DNA.
In addition to accelerating the ongoing revolution in genome synthesis, such long-read polymerases should also pave the way to new sequencing technologies, including single-molecule and single-cell sequencing.

Public Health Relevance

By developing a DNA polymerase that can copy and amplify DNA over very long (chromosome-sized) stretches, we will generate a tool for the research community, industry, and medicine that can be used both to create genomes from scratch and to read long stretches of DNA sequence. The former application will aid the growing field of synthetic biology as it crafts new organisms, while the latter should spur additional advances in next-generation DNA sequencing. While there is ample activity in both spheres, the lack of an enzyme that can amplify long stretches of DNA is a bottleneck that slows further advances.

National Institutes of Health (NIH)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Rampulla, David
University of Texas Austin
Schools of Arts and Sciences
United States