Molecular genetics, metagenomics, and bioinformatics are central to species/strain identification, virulence determination, pathogenicity characterization, and source attribution. Faster, cheaper sequencing technologies make it possible to sequence uncultured microbes sampled directly from their habitats. Together, these advances have produced massive volumes of metagenomic data that are valuable for detecting biological threats. Distilling meaningful information from millions of new genomic sequences presents serious challenges to bioinformaticians. The taxonomic content of such sequences has been studied intensively, but few methods exist for studying the associations and interactions among metagenomic count data, human genomic data, and clinical outcomes. This project proposes to develop novel parametric and nonparametric methods for bacterial taxa identification, clinical outcome prediction, and bacterial community structure estimation. Taxa will be selected based on changes in both their abundance and their correlation structures. This project will also develop statistical learning methods for evaluating bacterial community dynamics and for causal inference from longitudinal metagenomic data. In addition, it will develop efficient computational methods for detecting gene-microbe interactions through integrated analysis of metagenomic and genomic data. The proposed methodologies and algorithms will be evaluated and validated using simulation studies and publicly available metagenomic and genomic data sets.
The threat of terrorist or criminal use of pathogenic organisms and their toxins remains a serious concern in the United States. Bioterrorism uses viruses, bacteria, fungi, and toxins to cause widespread illness or death in people, animals, or agricultural crops. The analytical methods and software developed in this proposal are expected to provide an important bioinformatics resource for researchers who aim to use metagenomic data to prevent bioterrorism and to help convict bioterrorists. The methods and software would also contribute to environmental and human metagenomic research more broadly, with potential impact on public health research. Many diseases, including obesity, inflammatory bowel disease, bacterial vaginosis, and cancer, have been associated with shifts in the microbiota. Finally, this project will contribute to the training of graduate students and postdoctoral researchers in a cutting-edge interdisciplinary research area that combines biology, statistics, and computer science.