In the past few years, we have experienced a paradigm shift in human genetics. Accumulating lines of evidence have highlighted the pivotal role of rare genetic variations in a wide variety of traits and diseases. Studying rare variations is a needle-in-a-haystack problem, as large cohorts have to be sequenced in order to capture the variations and gain statistical power. The performance of high-throughput sequencing is growing exponentially, providing sufficient capacity to profile an extensive number of specimens. However, sample preparation schemes do not scale with sequencing capacity. A brute-force approach of preparing hundreds to thousands of specimens for sequencing is cumbersome and cost-prohibitive. The next challenge, therefore, is to develop a scalable technique that circumvents the bottleneck in sample preparation. We have recently devised a sequencing strategy, called DNA Sudoku, which has tremendous potential to address this challenge. DNA Sudoku is based on pooling the specimens according to combinatorial patterns as a means of multiplexing, which substantially reduces the costs of large-scale experiments. We have demonstrated the feasibility of DNA Sudoku by profiling tens of thousands of bacterial colonies in a single sequencing run. We propose to leverage the success of our proof-of-concept study and to adapt DNA Sudoku for the study of rare genetic variations in large human cohorts. As a test case, we plan to use the strategy to study rare variations that are implicated in Jewish genetic diseases by sequencing risk loci in a cohort of 1,000 ethnically matched individuals. This proactive approach will create, for the first time, a comprehensive catalogue of risk alleles and can have immediate clinical implications. Success of the proposed project has the potential for a far-reaching technological impact. It will introduce an ultra-cost-effective sequencing strategy for a wide variety of studies on monogenic and complex diseases, overcoming a critical barrier in genomics. Future developments in DNA sequencing will further increase the requirement for scalable multiplexing schemes, rendering our strategy a long-lasting one, with rich theoretical foundations and a plurality of applications.
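
To make the idea of pooling specimens according to combinatorial patterns concrete, the following is a minimal sketch and not the project's actual protocol. The abstract does not specify the pooling design, so the sketch assumes a modular scheme: each specimen index is placed in one pool per "window", chosen by its remainder modulo the window size, with pairwise coprime window sizes. The function names, the window sizes (32, 33, 35), and the carrier indices are hypothetical, and sequencing errors are ignored.

```python
from itertools import combinations
from math import gcd


def build_pools(n_specimens, windows):
    """Map each pool (window, residue) to the set of specimen indices it holds."""
    # Assumption: window sizes are pairwise coprime and the product of the two
    # smallest is at least n_specimens, so two specimens share at most one pool.
    if len(windows) < 2:
        raise ValueError("need at least two pooling windows")
    for a, b in combinations(windows, 2):
        if gcd(a, b) != 1:
            raise ValueError(f"windows {a} and {b} are not coprime")
    smallest = sorted(windows)[:2]
    if smallest[0] * smallest[1] < n_specimens:
        raise ValueError("windows too small to separate all specimens")
    return {
        (w, r): set(range(r, n_specimens, w))
        for w in windows
        for r in range(w)
    }


def simulate_positive_pools(pools, carriers):
    """Return the pools that would show the variant (idealized, error-free)."""
    return {pool for pool, members in pools.items() if members & carriers}


def decode(n_specimens, windows, positive_pools):
    """Call a specimen a carrier only if every pool it belongs to is positive."""
    return [
        i
        for i in range(n_specimens)
        if all((w, i % w) in positive_pools for w in windows)
    ]


# Hypothetical example: 1,000 specimens, two of which carry a rare variant.
windows = (32, 33, 35)
pools = build_pools(1000, windows)
carriers = {17, 642}
positives = simulate_positive_pools(pools, carriers)
print(len(pools), decode(1000, windows, positives))
```

In this hypothetical setting, 1,000 specimens are covered by 32 + 33 + 35 = 100 pools rather than 1,000 individual preparations. Because any two specimens share at most one pool, three windows are enough to identify up to two carriers unambiguously; denser carrier sets would need more windows or follow-up testing.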

Public Health Relevance

Finding individuals who carry disease alleles in their genomes is a needle-in-a-haystack problem. As a solution, we propose to develop an innovative and highly efficient DNA sequencing strategy that is based on recent breakthroughs in the field of signal processing and is reminiscent of solving Sudoku puzzles. The strategy will be used to address a long-standing problem: creating a comprehensive catalogue of risk alleles for severe genetic diseases, which has immediate clinical implications and can prevent devastating cases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Exploratory/Developmental Grants (R21)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Schloss, Jeffery
Whitehead Institute for Biomedical Research
United States
Zielinski, Dina; Gordon, Assaf; Zaks, Benjamin L et al. (2014) iPipet: sample handling using a tablet. Nat Methods 11:784-5
Erlich, Yaniv; Narayanan, Arvind (2014) Routes for breaching and protecting genetic privacy. Nat Rev Genet 15:409-21
Golan, David; Erlich, Yaniv; Rosset, Saharon (2012) Weighted pooling--practical and cost-effective techniques for pooled high-throughput sequencing. Bioinformatics 28:i197-206
Gymrek, Melissa; Erlich, Yaniv (2011) Using DNA sequencers as stethoscopes. Genome Med 3:73