With the exceptional growth of data generated by social and economic networks and by wide-scale scientific experiments in physics, biology and astronomy, efficient and durable data storage has become a problem of paramount importance. To address this challenge, various new solutions for ultra-high density storage media have been considered, including macromolecular (DNA) recording systems. DNA-based systems are inherently nonvolatile, retain information under standard environmental conditions for thousands of years, and offer unprecedented information densities and fast data replication/copying rates. This project is concerned with developing new error control coding solutions for DNA-based storage systems and for various DNA synthesis (writing) and DNA sequencing (reading) models and technologies. The project also includes plans for curriculum development and for outreach to industry partners and interdisciplinary collaborators.
Error control coding is a critical aspect of all storage systems. The research in this project introduces and develops error control coding techniques for DNA storage, including DNA Profile Codes, Asymmetric Lee Distance Codes, and Weakly Mutually Uncorrelated Codes. These techniques are expected to improve the accuracy of DNA storage systems by adapting the stored sequences to the media, avoiding common sequencing errors, and simplifying the process of DNA synthesis. The coding methods are also expected to allow for random access and to reduce the implementation cost of the technology. In turn, this will enable miniaturization of the write system, minimize sequencing error rates, and significantly speed up the process of DNA assembly.
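As a rough illustration of one of the code families named above, the sketch below checks a weak mutual uncorrelatedness property on a small set of DNA strings. The definition used here is an assumption taken from the general literature on such codes and is not stated in this abstract: a code is treated as kappa-weakly mutually uncorrelated if no proper prefix of length at least kappa of any codeword appears as a suffix of any codeword, including itself. The function name `is_wmu` and the example sequences are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def is_wmu(code: Sequence[str], kappa: int) -> bool:
    """Return True if `code` satisfies the assumed kappa-WMU property.

    Assumed definition (not taken from the abstract): no proper prefix of
    length >= kappa of any codeword equals a suffix of any codeword,
    where a codeword is also compared against itself.
    """
    if kappa < 1:
        raise ValueError("kappa must be at least 1")
    for a in code:
        for b in code:
            # Proper prefixes of `a` have lengths 1 .. len(a) - 1.
            for length in range(kappa, len(a)):
                if length <= len(b) and b[-length:] == a[:length]:
                    return False
    return True


# Hypothetical example: the only prefix/suffix overlaps have length 1
# ("A" in ACGA, "T" in TCGT), so the property fails for kappa = 1
# and holds for kappa = 2.
example_code = ["ACGA", "TCGT"]
print(is_wmu(example_code, 1))
print(is_wmu(example_code, 2))
```

Under this assumed definition, sequences that pass the check cannot be confused with shifted overlaps of one another beyond length kappa - 1, which is one plausible reason such codes are associated with random access in the description above. The brute-force check is meant only to make the property concrete and is not a code construction.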