Innovative low-cost, low-input DNA library preparation techniques based on microfluidic molecular barcoding have been developed. These technologies, also known as Linked-Read sequencing, augment standard short reads with long-range information by identifying reads that originate from the same DNA molecule. The few available tools do not meet the fast-growing needs of the community because they focus on simple applications, lack the desired sensitivity, and are computationally expensive. This project will develop sensitive algorithms for mapping Linked-Reads to the reference genome. The method also has the potential to solve other challenging problems, such as the accurate discovery and assembly of structural variants and the resolution of repeat regions in the genome.

While a very large number of computational methods exist for standard short-read sequencing technologies, there is not yet a comprehensive set of scalable algorithms that can be applied effectively to Linked-Read sequencing technologies. This project will develop novel probabilistic models and algorithms that leverage Linked-Read sequencing technologies to address several computationally challenging problems. In particular, the project will develop a novel probabilistic framework and a sensitive algorithm for mapping Linked-Reads to the reference genome, with applications to sensitive structural variation detection. Successful completion of this project will provide fast and scalable computational methods that can be applied to large-scale whole-genome data. The planned methods, tools, and related test data sets will be made freely available on GitHub, with extensive documentation, for the community to use and benefit from.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer
Sylvia Spengler
Joan and Sanford I. Weill Medical College of Cornell University
New York
United States