The sequences and concentrations of nucleic acid molecules within a sample hold vast amounts of scientific and clinical information that can be used to understand pathways and inform treatment. However, our current tools for nucleic acid sequence analysis fall orders of magnitude short of analyzing all 10^17 nucleotides of DNA within a typical 1 mL sample of human blood. Enrichment, i.e. the selective capture or retention of desired DNA loci or sequences, is crucial to effective and rapid next-generation sequencing (NGS) of DNA and RNA samples. Current enrichment techniques (predominantly multiplexed PCR, hybrid capture, and molecular inversion probes) all suffer from limited uniformity of capture and limited capture specificity. The first limitation results in poor quantitation of sequences relative to one another (e.g. copy number variations), and the second results in wasted NGS reads downstream. Because most enrichment applications require high multiplexing, the large number of potential interactions between probes and target sequences makes it generally difficult to optimize an enrichment panel systematically, whether rationally or empirically. The PI proposes to develop novel hybridization probes and systems to allow multiplexed capture and enrichment of DNA and RNA sequences. Unlike previous hybrid capture techniques, the PI's approach focuses on probes with custom designable kinetics of hybridization to different sequences, and seeks to utilize precise predictive understanding to design probes that produce desired sequence capture behavior at particular points in time. By using differential hybridization kinetics, the research team will be able to achieve complex pre-equilibrium enrichment distributions that cannot be achieved at equilibrium. The research team will use a uniquely knowledge-driven design process, based on biophysical models of nucleic acids, and use only minimal empirical optimization.
To further enhance the predictability of new capture probe set design, the team will also use novel methods to quickly and more accurately measure nucleic acid thermodynamics and kinetics at native conditions.
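
The claim that pre-equilibrium capture can produce enrichment distributions unavailable at equilibrium can be illustrated with a toy calculation. The sketch below is not the PI's model: it assumes simple two-state reversible binding with the probe in large excess (pseudo-first-order kinetics), and the rate constants, the probe concentration, and the names `captured_fraction` and `enrichment_ratio` are all hypothetical values chosen for illustration. Two targets are given the same equilibrium yield but association rates that differ 100-fold, so any difference in capture comes from kinetics alone.

```python
import math

# Hypothetical probe concentration (mol/L); the probe is assumed to be in
# large excess so that its concentration stays effectively constant.
PROBE_CONC = 1e-8

# Hypothetical rate constants: k_on in 1/(M*s), k_off in 1/s.
# Both targets share the same k_on/k_off ratio, hence the same equilibrium yield.
TARGETS = {
    "fast": {"k_on": 1e6, "k_off": 1e-6},
    "slow": {"k_on": 1e4, "k_off": 1e-8},
}


def captured_fraction(k_on, k_off, probe_conc, t):
    """Fraction of a target bound to probe at time t (seconds).

    Assumes two-state reversible binding with probe in excess:
    f(t) = f_eq * (1 - exp(-k_obs * t)), with k_obs = k_on*[P] + k_off.
    """
    k_obs = k_on * probe_conc + k_off
    f_eq = k_on * probe_conc / k_obs
    return f_eq * (1.0 - math.exp(-k_obs * t))


def enrichment_ratio(t):
    """Ratio of captured fractions (fast target over slow target) at time t."""
    fast = captured_fraction(probe_conc=PROBE_CONC, t=t, **TARGETS["fast"])
    slow = captured_fraction(probe_conc=PROBE_CONC, t=t, **TARGETS["slow"])
    return fast / slow


if __name__ == "__main__":
    for t in (60, 300, 3600, 1e6):
        print(f"t = {t:>9.0f} s  fast/slow capture ratio = {enrichment_ratio(t):.3f}")
```

Under these assumptions the capture ratio is large at early time points and approaches 1 as both reactions near equilibrium, so stopping the capture at a chosen time selects between distributions that equilibrium alone would not distinguish.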

Public Health Relevance

There are roughly 10^17, or one hundred million billion, nucleotides of DNA in every milliliter of human blood, and next-generation sequencing (NGS) is roughly a factor of 10 million short of being able to sequence all of that. Consequently, we need a way of enriching the sample for DNA of interest by removing the vast majority of the healthy human DNA that does not provide meaningful scientific or clinical information. This project proposes new, rationally designed reagents to enrich for specific sequences and classes of sequences with better uniformity and specificity than prior methods, based on customizable binding speeds (kinetics).
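
The arithmetic behind this statement can be made explicit. The short sketch below uses only the two round figures quoted above (about 10^17 nucleotides per milliliter and a shortfall of about 10 million); the variable names are illustrative, and the results are order-of-magnitude estimates rather than measured throughput.

```python
# Round figures taken from the statement above.
NUCLEOTIDES_PER_ML = 1e17   # nucleotides of DNA per mL of blood
SHORTFALL_FACTOR = 1e7      # NGS falls short by roughly this factor

# Nucleotides that could be sequenced, implied by the two figures above.
implied_capacity = NUCLEOTIDES_PER_ML / SHORTFALL_FACTOR

# Fraction of the sample that would have to be removed so that the
# remainder fits within that implied capacity.
fraction_to_remove = 1.0 - 1.0 / SHORTFALL_FACTOR

if __name__ == "__main__":
    print(f"implied sequencing capacity: {implied_capacity:.0e} nucleotides")
    print(f"fraction of sample to remove: {fraction_to_remove:.7%}")
```

On these figures, enrichment has to discard all but about one part in ten million of the input DNA, which is why capture specificity matters so much.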

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Smith, Michael
Rice University
Biomedical Engineering
Biomed Engr/Col Engr/Engr Sta
United States
Pinto, Alessandro; Chen, Sherry X; Zhang, David Yu (2018) Simultaneous and stoichiometric purification of hundreds of oligonucleotides. Nat Commun 9:2467
Zhang, Jinny X; Fang, John Z; Duan, Wei et al. (2018) Predicting DNA hybridization kinetics from sequence. Nat Chem 10:91-98
Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu (2017) Modular probes for enriching and detecting complex nucleic acid sequences. Nat Chem 9:1222-1228