Recent declines in the cost of DNA sequencing have enabled biologists to conduct experiments that produce very large DNA and protein sequence data sets. Understanding this data requires computational analyses to recognize known sequences and group new ones by similarity. As data sets grow, these analyses become a serious bottleneck to progress. Computer scientists have therefore tried to accelerate sequence analysis using hybrid computing architectures that combine multicore CPUs with accelerators, such as field-programmable gate arrays and graphics engines, whose performance equals that of tens or hundreds of CPU cores. To more effectively accelerate biosequence analysis tasks, new infrastructure is needed to facilitate both development of accelerated analytical tools and their deployment to biologists.
This project is a planning effort to create development and deployment infrastructure for accelerated biosequence analysis applications. The PIs are developing design criteria for a preferred hardware platform and set of software tools to speed the creation, validation, and deployment of biosequence accelerators. Key activities include qualifying hardware platforms, developing prototype software and firmware, and consulting developer and user communities for accelerated sequence analysis tools to guide the planning effort. In particular, the PIs are organizing a special track at a major accelerator design conference to solicit input on proposed infrastructure.
Developing the proposed infrastructure will stimulate creation of biosequence analysis accelerators with low cost, rapid deployment, and a large supporting developer and user community. More agile development will boost adoption of accelerators by biologists, empowering labs to analyze massive biosequence data sets and speeding discovery.
This project is a planning effort for a major new infrastructure for building and deploying accelerated bioinformatics computations. The infrastructure will enable rapid development of new streaming computations in biosequence analysis and related fields for deployment on a mixture of conventional CPUs, field-programmable gate arrays (FPGAs), and graphics processors (GPUs), with user interfaces suitable for the biological end-user community. Key aspects of the planning effort include (1) recruitment of a community of bioinformatics programmers and end-users for the new infrastructure, and (2) development of prototype hardware, software, and tools to evaluate the fitness of various approaches and components for the full system. With respect to community recruitment, we identified a cohort of accelerator developers from multiple institutions, including Northeastern University, UNC Charlotte, and the University of Florida, who are interested in working with us on the development, testing, and deployment of the new infrastructure. We also recruited an initial cohort of bioinformatics and biological users with no accelerator expertise, from both Washington University and Columbia University. Subsequently, we were invited to join CHREC, the NSF Center for High-Performance Reconfigurable Computing, and are working to develop the appropriate industrial relationships to prepare a formal proposal to NSF. With respect to prototype development, our group explored prototype hardware and software for the planned platform by developing firmware for one of the proposed hardware substrates (the GiDEL ProcStar cards) and software extensions for the Auto-Pipe system, which is planned as a substrate for streaming computations on the platform.
In addition to the above work, which was directly funded by the current award, we made additional advances in modeling and optimization of streaming applications, with the assistance of doctoral students supported on this and related awards. The common thread was the question of how to effectively model streaming applications of the type we are developing, and how to use these models to improve the applications' performance. Our work directly supported one doctoral student, who defended his PhD in September 2014, as well as two research staff, and provided research opportunities for two other doctoral students, one of whom defended her PhD in August 2013.