The study of genomes is a critical and rapidly growing component in understanding the variability of life, biological functions, population dynamics, and how organisms respond to external influences. Genomics has qualitatively improved our ability to investigate biological dynamics and to make important discoveries that form the foundation for understanding environmental change, developing and protecting crops, and improving health outcomes. Genome analysis, however, is a significant challenge for the practicing biologist. Most biologists who need to undertake genome science are not sufficiently expert in the relevant analytical tools and do not fully understand the complex workflows required to get from the initial data generated by sequencers to a biologically meaningful result. In addition, few have access to the supercomputing resources and large-scale storage required for processing and managing genomics data. The National Center for Genome Analysis Support (NCGAS) addresses these challenges by providing an integrated service composed of expert consulting and educational services, hardened and optimized software available through easy-to-use web-based workflow management tools, large-memory supercomputers, and large-scale data storage and publishing facilities. These resources are particularly useful for researchers from smaller and minority-serving institutions, which typically do not have access to the required expertise and cyberinfrastructure, yet whose investigations are equally important. Since its inception in 2011, the NCGAS has supported over 80 research projects representing over $61M in funded research. It has engaged in 51 training events that served 691 individuals, of whom 241 were from traditionally underserved populations.
The NCGAS (http://ncgas.org) was established in 2011 through a National Science Foundation ABI development award to help the national research community complete genomics research that requires data management and computational infrastructure at scale. The NCGAS is a partnership among the Indiana University Pervasive Technology Institute, the Pittsburgh Supercomputing Center, the Texas Advanced Computing Center, and the San Diego Supercomputer Center. It meets the technology challenges of modern genome science by providing expert bioinformatics consulting services for genome analysis, particularly genome and transcriptome assembly, including research design, data analysis, and visualization. It optimizes, supports, and delivers genome analysis software on national supercomputing systems such as those funded by the NSF eXtreme Digital (XD) program and coordinated by the eXtreme Science and Engineering Discovery Environment (XSEDE) and the Open Science Grid (OSG). The NCGAS maintains and supports easy-to-use gateways, including Galaxy web portals, for genome analysis workflows that lower the barriers for scientists to create, execute, document, and share genomics analyses. It distributes software tools for genome analysis to research computing facilities and the general research community so that IT managers can more easily install these tools on their systems. It provides long-term archival storage services, as well as a digital library resource for the dissemination of data sets, publications, reports, and collections of files, which allows research to remain visible and data to be reused for decades to come. It delivers education and outreach programs on genome analysis, interpretation, and data management to biology faculty and students nationally. These programs will enhance the technology literacy of practicing scientists and help grow the bioinformatics workforce.
These services are aimed particularly at smaller institutions across the country that lack access to supercomputers, bioinformatics expertise, or training. The NCGAS will enable breakthroughs that would not be possible without advanced cyberinfrastructure support.