We aim to provide a comprehensive foundation for the development of an ultra-low-cost, ultra-fast nucleic acid polymer sequencing technology based on single-atom resolution transmission electron microscopy (TEM) of heavy atom-labeled nucleic acid polymers. Our particular approach is based on TEM imaging of ultra-dense (3 nm strand-to-strand spacing) parallel arrays of high molecular weight ssDNA molecules labeled base-selectively with heavy atoms. This will allow read lengths of at least ~150 kb and potentially as much as 2-4 Mb or more, with no special difficulties posed by highly repetitive DNA. With appropriate optimization, automation, and scaling, and with further funding beyond the scope of this proposal, this technology ("TEM sequencing") will potentially enable human genome sequencing at significantly lower cost and with much greater speed and consensus accuracy/completeness than other proposed third-generation sequencing approaches. Our project will involve the optimization of our novel ssDNA array deposition protocol, improvement of imaging conditions and substrate quality, and the subsequent design and building of a prototype TEM sequencing system. With this system we hope to demonstrate the approach's potential by delivering a human reference genome assembly that we believe may possess unprecedented consensus accuracy and completeness, owing to the inherently extreme read lengths and high coverage enabled by the approach.

Public Health Relevance

The pace and impact of biomedical research on improving human health may be greatly increased by the development of ultra-low-cost, ultra-high-quality genome sequencing. Our electron microscopy-based approach employs preparation and readout that are unbiased by sequence content and offers extremely long read lengths (at least 150,000 bases and potentially as much as 2-4 Mb). This suggests that nearly gapless assemblies will be achievable, shedding light on previously unassembled long repetitive regions and structural variations with potentially important roles in complex disease. Furthermore, our models indicate that TEM sequencing may enable sequencing of whole human genomes to >99.9999% consensus accuracy and completeness in <10 minutes/genome, at a cost of <$100, and thus its realization may have a broad, near-term, lasting impact on biomedical research.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
High Impact Research and Research Infrastructure Programs (RC2)
Project #
1RC2HG005592-01
Application #
7853623
Study Section
Special Emphasis Panel (ZHG1-HGR-N (O2))
Program Officer
Schloss, Jeffery
Project Start
2009-09-30
Project End
2011-07-31
Budget Start
2009-09-30
Budget End
2010-07-31
Support Year
1
Fiscal Year
2009
Total Cost
$1,230,309
Indirect Cost
Name
Harvard University
Department
Genetics
Type
Schools of Medicine
DUNS #
047006379
City
Boston
State
MA
Country
United States
Zip Code
02115
Payne, Andrew C; Andregg, Michael; Kemmish, Kent et al. (2013) Molecular threading: mechanical extraction, stretching and placement of DNA molecules from a liquid-air interface. PLoS One 8:e69058
Kanavarioti, Anastassia; Greenman, Kevin L; Hamalainen, Mark et al. (2012) Capillary electrophoretic separation-based approach to determine the labeling kinetics of oligodeoxynucleotides. Electrophoresis 33:3529-43