Genomics has revolutionized research into infectious diseases and is poised to revolutionize the clinic. Through the activities in this technology core, we will provide high-throughput genome sequencing and analysis focused on understanding host, pathogen, and microbiome interactions as determinants of disease outcome. We will provide state-of-the-art, large-scale, high-throughput sequencing data for analysis of genomes, transcriptomes, metagenomes, metatranscriptomes, and microRNAs using the best methodologies and technologies available. We will use a portion of our combined annual sequencing capacity, which is now >38 terabases of high-quality, passed-filter data per year, to sequence infectious disease agents, their hosts, and their vectors. Continued cost savings, without loss of sequencing quality or output, will be achieved through: (a) the extensive use of standard operating procedures (SOPs) and (b) the evaluation, development, and incorporation of enhancements, modifications, and improvements as they become available. These will include the addition of new sequencing platforms, more efficient protocols, and more robust methodologies. To better sequence mixed host/microbe specimens, we propose to test targeted enrichment systems on PacBio genomic libraries and on Illumina transcriptome libraries to measure expression. We will continue to improve our abilities to assemble genomes and transcriptomes, identify open reading frames, and annotate gene function, particularly with respect to metagenome and metatranscriptome data. We will provide bioinformatic pipelines to analyze data types that span multiple projects, including measurement of genetic variation in populations and communities. Comparative genomics will be enabled through pipelines for ortholog prediction and pan-genome analysis.
Transcriptome analyses will include RNAseq data alignment and visualization, differential expression analysis, heterogeneous RNAseq analysis, novel transcript identification, and miRNAseq analysis. Microbiome analyses will include 16S rRNA amplicon analysis, whole metagenome shotgun classification, and metatranscriptome analysis. Stable, robust pipelines with the potential for long-lasting value will be implemented in the Cloud Virtual Resource (CloVR) and distributed to the research community as easy-to-use virtual machines. When multiple data types are available, they will be integrated and visualized using Sybil and Circleator.

Public Health Relevance

Infectious agents are leading causes of death worldwide, and many of these deaths are potentially preventable through vaccination. In the US, infectious respiratory diseases are a leading cause of death. This core will provide data and analysis to support the research of some of the nation's pre-eminent genomics researchers focused on developing new diagnostics, treatments, and vaccines for the fight against infectious disease.

National Institutes of Health (NIH)
Research Program--Cooperative Agreements (U19)
Study Section: Special Emphasis Panel (ZAI1)
University of Maryland Baltimore
United States
Tallon, Luke J; Liu, Xinyue; Bennuru, Sasisekhar et al. (2014) Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis. BMC Genomics 15:788
Crabtree, Jonathan; Agrawal, Sonia; Mahurkar, Anup et al. (2014) Circleator: flexible circular visualization of genome-associated data with BioPerl and SVG. Bioinformatics 30:3125-7