This Small Business Innovation Research (SBIR) Phase II project will build an FPGA-based bioinformatics appliance for processing DNA sequence data faster, at lower cost, and with less power. Over the last decade the cost of sequencing a genome has dropped by six orders of magnitude and the throughput of the process has increased by five orders of magnitude. The trend shows no sign of abating, and industry experts expect the $1,000 genome mark to be reached within the next year. The combination of lower prices and higher throughput has led to what is being called "the data deluge" or "the data tsunami". Taming this deluge has become a major issue in bioinformatics and a principal bottleneck to further advances. The objective of this Phase II project is to contribute a solution to the processing problem based on Field Programmable Gate Arrays (FPGAs), non-conventional computing platforms that operate at significantly higher efficiency, measured in cost and power per unit of performance.
The mechanism of genetic coding, identified by Watson and Crick in 1953, was one of the premier scientific advances of the twentieth century. It took another twenty years to identify a feasible approach to deciphering the genetic code of a particular individual, and twenty more to actually implement it. The first human genome was sequenced in 2003. By 2010, fewer than 1,000 human genomes had been sequenced, but rapidly decreasing costs and increasing throughput promise that the number will grow exponentially, and medical researchers foresee a near future in which the whole population is sequenced as part of standard medical practice. The advances enabled by partial or full sequencing of the population will revolutionize health care, ushering in an era of personalized, genetics-based medicine. If successfully deployed, the proposed approach has the potential to address the so-called data deluge and bring about significant savings in both processing time and power consumption.
Network feeds related to social media, financial markets, intelligence, and cybersecurity are all growing exponentially in message volume. On the receiving end of these information deluges are workstations and servers whose high-performance CPUs are increasingly taxed and overloaded. The results are dropped messages, overflowing buffers, and, in general, an inability to perform sophisticated inline processing of the data. Financial markets are particularly sensitive to these trends, as one of their key requirements is low-latency response. The goal of this NSF Phase II project was to develop a high-performance, low-latency approach that filters peak-volume data feeds in hardware on the network card, offloading this work from the CPU and freeing it for other tasks. The result of our work is Avalanche, a suite of software and firmware for low-latency payload filtering on the Solarflare ApplicationOnload Engine (AOE). The attached figure shows the message capture rate for both hardware filtering with Avalanche and standard software filtering. Without Avalanche, software filtering alone drops to a near-zero capture rate at roughly 1.5M packets per second. Avalanche filters in hardware using a Field Programmable Gate Array (FPGA) that is part of the Solarflare AOE. Because the feed processing is done in hardware, it is deterministic, low latency, and able to handle the maximum network line rate. As the demands of network feed handling and processing grow, we believe an increasing number of customers and applications will be drawn to technologies such as Avalanche.
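To make the offload concrete, the minimal C sketch below shows the kind of per-packet payload pattern match that a software filter must run on the CPU for every arriving message; it is this inner loop that Avalanche moves into the AOE's FPGA, so that only matching packets ever reach the host. The packet contents, the pattern, and the match_payload() helper are hypothetical illustrations for this sketch, not the Avalanche API or any Solarflare interface.

    /*
     * Illustrative software payload filter: scan each packet's payload
     * for a byte pattern and "capture" only the packets that match.
     * All names and data here are hypothetical; Avalanche performs the
     * equivalent match in the AOE's FPGA at line rate.
     */
    #include <stdio.h>
    #include <string.h>
    #include <stddef.h>

    /* Return 1 if `pattern` occurs anywhere in `payload`, else 0. */
    static int match_payload(const unsigned char *payload, size_t len,
                             const unsigned char *pattern, size_t plen)
    {
        if (plen == 0 || plen > len)
            return plen == 0;
        for (size_t i = 0; i + plen <= len; i++) {
            if (memcmp(payload + i, pattern, plen) == 0)
                return 1;
        }
        return 0;
    }

    int main(void)
    {
        /* Hypothetical market-data payloads; in practice these arrive
         * from the NIC at millions of packets per second. */
        const char *packets[] = {
            "MSG|SYM=IBM|PX=187.20",
            "MSG|SYM=AAPL|PX=612.05",
            "MSG|SYM=IBM|PX=187.25",
        };
        const char *pattern = "SYM=IBM";

        for (size_t i = 0; i < sizeof packets / sizeof packets[0]; i++) {
            if (match_payload((const unsigned char *)packets[i],
                              strlen(packets[i]),
                              (const unsigned char *)pattern,
                              strlen(pattern)))
                printf("captured: %s\n", packets[i]);
        }
        return 0;
    }

At millions of packets per second, even this simple byte comparison consumes the CPU and packets are dropped, which is why moving the match into the network card's FPGA restores full line-rate capture while leaving the CPU free for downstream processing.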