Globus Genomics has been developed at the Computation Institute, University of Chicago, as an advanced genomics analysis platform that runs as software-as-a-service on Amazon Web Services and is powered by Globus and Galaxy. It was developed to meet the needs of both researchers and core lab providers who require a high-quality service with state-of-the-art capabilities to help streamline data movement, simplify the creation of genomics analysis pipelines, automate the execution of those pipelines, and run analysis at very large scale on elastic compute infrastructure. Globus Genomics, under development for three years, has been used extensively by researchers at leading institutions, including University of Washington, University of Chicago, Washington University in St. Louis, Georgetown University, and Johns Hopkins. There is significant potential and demand to expand use of the service to meet the rapidly growing genomics analysis needs of both existing users and large communities of new users. We now propose work that will amplify the utility and impact of Globus Genomics by providing (1) scalability to thousands of simultaneous analyses by thousands of users, (2) support for state-of-the-art high-performance workflows and tools, including large-scale imputation analysis and consensus calling on structural variants, (3) automated cost and performance optimization to reduce cloud computing costs and turnaround times, and (4) powerful dashboards for end-to-end and summary views of large-scale analyses.
These enhancements will be enabled by development in the following key areas: enhancing and extending the Globus Genomics computational framework to enable high-performance reliable execution of standard and novel NGS analysis workflows on large and extremely large datasets; creating and maintaining state-of-the-art pipelines for variant calling, whole genome analysis, RNASeq and ChipSeq, which involves computationally profiling the latest versions of tools and understanding different computational modalities for optimal execution on Amazon Web Services; creating a profiling and optimization framework to enable automated, cost- and/or time-optimal configuration of NGS applications and workflows on large cloud systems; and creating an automatic computational provisioning framework. The grant award would allow us to address key needs of current and prospective users and thus to provide an important bioinformatics platform to researchers who otherwise could not easily access such capabilities.

Public Health Relevance

More than 300 researchers across 25 universities and research organizations, in fields such as neurodevelopmental disorders, cancer, diabetes, and cardiovascular disease, have used Globus Genomics to analyze multiple terabytes of sequence data. We propose here to expand the performance and utility of this software system by creating optimized, scalable analysis pipelines for exome, whole genome, and RNASeq datasets; scaling up the analysis by adopting best-of-breed computational technologies; and validating our efforts by working closely with our existing users.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Di Francesco, Valentina
University of Chicago
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States