Next-generation sequencing refers to a collection of high-throughput DNA sequencing technologies that originated about a decade ago and are now the de facto equipment underpinning all modern genomics studies, owing to their cost-effectiveness, ubiquity, and versatility. This project is conducting comprehensive reproducibility and assessment experiments to characterize the state of the art in the field and to make the findings publicly visible and accessible. The project results are expected to become a valuable resource for practitioners, researchers, and the large community of users of next-generation sequencing bioinformatics. The project involves several undergraduate students and raises awareness of research integrity and reproducibility issues among young researchers. The project is establishing benchmark datasets to evaluate bioinformatics software for multiple next-generation sequencers, multiple types of biological organisms, multiple application contexts, and multiple problem scales. The research spans assessment of software products for read error correction, read mapping to target genomes and reference databases, and assembly of genomes and transcriptomes. Reproducibility experiments are conducted to independently verify the results of important software products, based on results and datasets published in the literature. The software products are also evaluated on a range of metrics: quality of results, robustness and sensitivity to parameter values, run-time performance, memory usage, and ability to process real-world datasets. The project work will result in comprehensive recommendations for practitioners and will establish the state of the art so that future research efforts can be channeled appropriately.

Agency
National Science Foundation (NSF)
Institute
Division of Computing and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
1718479
Program Officer
Almadena Chtchelkanova
Project Start
Project End
Budget Start
2017-07-15
Budget End
2021-06-30
Support Year
Fiscal Year
2017
Total Cost
$499,984
Indirect Cost
Name
Georgia Tech Research Corporation
Department
Type
DUNS #
City
Atlanta
State
GA
Country
United States
Zip Code
30332