MRI/Dev.: Novel Computing Instrument for Big Data

Project Proposed: This project develops CompGen, an instrument that adopts a hardware-software co-design approach and aims to provide:
- a vehicle for biologists and computer scientists to collaborate on new algorithms that are significantly faster and more accurate at the scale essential for handling the data deluge;
- a software framework and tool set for algorithm development that supports diverse data analysis and visualization;
- a framework for developing accelerators and for mapping computations to heterogeneous computational resources and hierarchical database storage.

Promising technologies include emerging die-stacked and non-volatile memories as well as accelerators (GPUs, FPGAs, and APUs). The project brings together a multidisciplinary team of geneticists, bioinformatics specialists, computer and algorithm designers, and data-mining experts. The research to be enabled spans a wide and eclectic variety of problems with direct impact on health and social issues. Directions include understanding the impact of climate change on gene expression and ecosystems, bringing genetic analysis into medical clinics, identifying effective antibiotics, and exploring socio-genomic relations among stress, depression, and genetics in low-income African-American mothers.

CompGen provides an environment for managing and processing genomic information and for developing new algorithms. The instrument brings disruptive computing architectures and algorithmic techniques to the analysis of genomic data while providing high-accuracy results, resilience to errors, and scalability with growing data volumes. It enables the challenges of scale and diversity in genomic data to be addressed through new algorithms, models, and statistical methods. The instrument development focuses on reduction of data volume, optimization of the storage hierarchy, identification and implementation of computational primitives, data visualization, mathematical-toolkit optimization, and performance and reliability assessment. These developments are expected to lead to new computational structures and hardware/software architectures that can be incorporated into hierarchical databases as well as heterogeneous processors for data analysis, compression, and optimization.

Broader Impacts: In addition to serving many research areas, CompGen will serve as a tool for educating students and professionals in efficient ways to process and analyze genomic data, and in handling big data in general. The instrument will serve multidisciplinary classes in which students gain hands-on research experience, as well as introductory classes that expose students to applications and tools. Existing outreach and education programs will be used to give the instrument broad exposure; plans include Open House events attracting thousands of visitors, Coursera courses, and minority outreach workshops. A mentoring tool, Mytri, will be used for networking among female students. Moreover, the CompGen design will be made available to others, with the goal of fundamentally changing the methods by which big datasets are handled in genomics research. To this end, an R&D consortium of hospitals, companies, and universities has been established to help identify needs, provide sources of data, act as early adopters, and ensure that new technologies are transferred smoothly into widespread use.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1337732
Program Officer: Rita Rodriguez
Budget Start: 2013-09-15
Budget End: 2019-08-31
Fiscal Year: 2013
Total Cost: $1,800,000
Institution: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820