Regulatory RNA genes are pervasive in bacteria. Our understanding of these noncoding genes has increased dramatically in recent years, thanks in part to advances in high-throughput sequencing technology, which enables, among other things, experiments that produce massive amounts of data about RNA transcripts in bacteria. However, processing the resulting large data sets can be a bottleneck in biological and medical research, partly because existing methods are insufficient for analyzing such data from bacteria. This project aims to develop new algorithms for correcting errors in the large data sets generated by bacterial high-throughput sequencing experiments. Further, a computational system will be designed for managing and analyzing the sequencing data, with the aim of systematically annotating the transcripts evidenced by the data. Finally, since many RNA genes in bacteria act as regulators of other transcripts, novel methods will be developed to identify the interactions between these noncoding RNAs and their regulatory targets. The methods developed will be applied and evaluated in several different bacterial systems.
High-throughput sequencing experiments can provide information about gene expression in human pathogens during infection, but existing computational methods for processing this information are insufficient. In this project, computational tools and methods will be developed for analyzing high-throughput sequencing data, and these new methods will be evaluated on data collected from various bacterial organisms, including Escherichia coli, Neisseria gonorrhoeae, and Vibrio cholerae. More broadly, the computational infrastructure developed in this project will serve as a resource for biological and medical researchers studying the many bacteria that are human pathogens.