Life scientists rely heavily on engineered DNA molecules in basic research and in the development of new biotechnology products. Proper documentation of these molecules is critical to ensure the reproducibility and safety of the processes in which they are used. Documentation generally comes in the form of standard computer files. However, there is no direct linkage between the DNA molecules and the files that describe them. Keeping track of the association between a DNA molecule and the computer file describing it proves surprisingly challenging because DNA molecules can mutate on their own or be modified at the molecular level. Similarly, biologists can edit the sequence files using bioinformatics software. As a result, there are widespread discrepancies between the physical sequences of the molecules in circulation in the life sciences community (both commercial and research) and their supposed reference sequences. Many synthetic biologists have wasted weeks or months working with plasmids that turned out to differ from what they were expecting. The project will focus on embedding the plasmid documentation in the DNA molecule itself using methods from computer security. The project supports the training of a postdoc in chemical engineering, a computer science graduate student, and an undergraduate student majoring in biochemistry. It will also initiate a community effort to evaluate the possible application of self-documenting plasmids to securing the synthetic biology supply chain, and it will support outreach efforts to educate the general public through news publications and YouTube videos. Software will be released open-source, plasmids will be deposited with Addgene, a community repository, and data will be made available through Figshare.

The project will develop DNAdoc, software designed to ensure the identity and fidelity of plasmids and other synthetic DNA molecules used in life science research. DNAdoc will allow developers of plasmids and other synthetic DNA to capture essential data within the plasmid itself and enable users to retrieve this information from sequencing data. These self-documenting plasmids will be composed of two types of sequences: (1) functional sequences, which encode genes and other functional elements, and (2) documenting sequences, whose only purpose is to encode information about the functional sequences. The integrity of the structural link between DNA sequences coding for biological functions and sequences documenting these functions will be ensured by cybersecurity techniques, including digital signature protocols and error correction codes. The project includes two objectives. Objective 1 focuses on writing self-documenting plasmids and characterizing the performance penalty arising from the insertion of documenting sequences. Objective 2 focuses on developing a bioinformatics workflow that produces a complete, high-quality sequence file from Illumina sequencing data. Information on the progress of this research will be found at https://peccoud.org.
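
The idea of a documenting sequence that is cryptographically tied to a functional sequence can be illustrated with a minimal sketch. This is not DNAdoc's actual encoding or protocol, which the abstract does not specify. Everything below is an assumption made for illustration: the two-bits-per-base mapping (A, C, G, T for 00, 01, 10, 11), the 8-byte tag length, the function names, and the example sequence. A keyed HMAC from the Python standard library stands in for the digital signature protocol, and error correction is left out entirely.

```python
import hashlib
import hmac

# Assumed mapping for illustration only: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"
TAG_BYTES = 8  # hypothetical truncated tag length (8 bytes -> 32 bases)


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; the sequence length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_documenting_sequence(functional: str, key: bytes) -> str:
    """Derive a DNA-encoded tag from the functional sequence.

    HMAC-SHA256 is a stand-in here; a real digital signature would use
    an asymmetric key pair rather than a shared secret.
    """
    message = functional.upper().encode("ascii")
    tag = hmac.new(key, message, hashlib.sha256).digest()[:TAG_BYTES]
    return bytes_to_dna(tag)


def verify(functional: str, documenting: str, key: bytes) -> bool:
    """Check that the documenting sequence matches the functional sequence."""
    expected = make_documenting_sequence(functional, key)
    return hmac.compare_digest(expected, documenting.upper())


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical key
    functional = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # arbitrary example sequence
    documenting = make_documenting_sequence(functional, key)
    print("documenting sequence:", documenting)
    print("intact plasmid verifies:", verify(functional, documenting, key))
    mutated = functional[:-1] + "A"  # single-base substitution
    print("mutated plasmid verifies:", verify(mutated, documenting, key))
```

In this sketch, a single-base change in the functional sequence causes verification to fail, which is the property the project aims for when reading plasmids back from sequencing data. A practical scheme would also need error correction so that mutations or sequencing errors in the documenting sequence itself could be tolerated.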

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1934573
Program Officer: Jean Gao
Budget Start: 2019-08-01
Budget End: 2021-07-31
Fiscal Year: 2019
Total Cost: $300,000

Name: Colorado State University-Fort Collins
City: Fort Collins
State: CO
Country: United States
Zip Code: 80523