This NSF award supports development of an instrument capable of sequencing human and other genomes at unprecedentedly low cost, with long read lengths, high speed, and high accuracy. To achieve this goal, the probe of an atomic force microscope (AFM) will be functionalized with a DNA polymerase, and the conformational perturbations of the polymerase during DNA synthesis will be monitored by the AFM in real time. Because different nucleotides cause different conformational perturbations of the enzyme, the sequence of the template DNA will be read out directly from the order of unique conformational perturbations as the template travels through the polymerase. With instruments currently on the market, genome sequencing is expensive, which hampers many applications such as personalized medicine. These instruments produce only short read segments, which makes post-sequencing genome assembly computationally demanding. The short reads also cause problems when sequencing genomes that contain long repeats. Existing methods require sample amplification by polymerase chain reaction (PCR). Because of inaccuracies introduced during sample amplification, the resulting genome sequences are not accurate enough for applications such as disease diagnosis. During traditional sequencing, information on epigenetic DNA base modifications, which are linked to many important biological processes, is lost. The proposed AFM-based instrument is expected to overcome these problems. Because the new instrument uses single-molecule sequencing, the DNA sample preparation and amplification costs incurred by existing technologies will be minimized. Many known sequencing technologies require expensive reagents. The new technology may need only natural nucleotides, which will further reduce costs.
The sample DNA sequenced with the proposed instrument will not need amplification by PCR, so the inaccuracies introduced during sample preparation in existing sequencing methods will be avoided. Because DNA sequences are read out directly and continuously during DNA synthesis, the sequencing speed and read length are expected to far exceed those of technologies currently on the market. Because the original sample DNA is used for sequencing, and epigenetically modified nucleobases are predicted to cause conformational fluctuations of the polymerase that differ from those caused by unmodified ones, the new instrument is expected to sequence genomes without losing any epigenetic base modification information.
The instrument will have a broad impact on many research areas such as human health, food, energy, environment, and national security, all of which demand sequencing the genomes of humans, animals, plants, bacteria, viruses, or other organisms. Besides sequencing, the instrument will also find application in studies of DNA polymerase conformational dynamics, yielding data that cannot be obtained directly using known techniques. Similar instruments for studying other enzymes can also be readily made using the technologies developed in this project. These instruments will help to answer important fundamental questions about enzyme catalysis. Initially, the new sequencing service will be provided to biological research labs through collaborations. Later, the service and the instrument will be made commercially available to medical, academic, and commercial labs. The project is highly multidisciplinary. Three research groups with expertise in biology, chemistry, and engineering will work together to develop the instrument. During this process, two postdoctoral researchers and at least two PhD students will gain extensive research experience in these fields. In addition, three or more undergraduate students will be trained. Some of these next-generation scientists are expected to help commercialize the instrument and sequencing technology.