The objective of this project is to develop algorithms for new and emerging high-throughput DNA sequencing technologies. These technologies are lowering the cost of DNA sequencing by orders of magnitude and thereby enabling a variety of new applications. These new applications, combined with the varying characteristics of the DNA sequences produced by these technologies, are increasing demand for efficient algorithms to interpret the resulting large volumes of DNA sequence data. The PI will develop a new class of robust algorithms for genome assembly and discovery of DNA sequence variants. Some of these algorithms will rely on the availability of a closely related reference genome sequence, while others will operate de novo, directly from the individual DNA sequences (i.e., reads) produced by a DNA sequencing machine. In the latter case, the PI will design algorithms that exploit the longer-range DNA sequence information available in newer single-molecule and nanopore sequencing technologies. These algorithms will retain high sensitivity and specificity while scaling to billions or trillions of nucleotides and thousands of genomes. Finally, the PI will introduce combinatorial algorithms for the study of genome rearrangements in heterogeneous mixtures of DNA sequences. Such mixtures arise in metagenomics and cancer genomics, where the sequenced DNA is a mixture of genomes from different species or from cells harboring different mutations, respectively. The PI collaborates closely with biologists and technology developers to ensure the relevance and applicability of the algorithms. At the same time, some of the algorithms and techniques from graph theory, combinatorial optimization, and probability that are developed in the proposal are applicable to problems outside of biology.
Broader Impact: The proposed research will be integrated with an educational component that includes the development of an undergraduate seminar in personal genomics, a summer research experience in computational biology for high-school students, and the incorporation of a computational biology module into a summer computing camp for 9th-grade girls. The PI will continue to actively mentor and recruit undergraduate and graduate students, including women and underrepresented minorities. Finally, software implementing the algorithms will be freely distributed to the scientific community through a public webserver.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1053753
Program Officer
Mitra Basu
Project Start
Project End
Budget Start
2011-01-01
Budget End
2017-06-30
Support Year
Fiscal Year
2010
Total Cost
$461,790
Indirect Cost
Name
Brown University
Department
Type
DUNS #
City
Providence
State
RI
Country
United States
Zip Code
02912