Advances in DNA sequencing technology have now made it practical and affordable to generate datasets containing millions of genetic attributes that can be tested for association with disease susceptibility. The computational complexity of searching for genetic interactions over such high-dimensional datasets imposes great challenges for genome-wide association studies (GWAS). In Phase I of this project, the Parabon research team, led by principal investigator Dr. Jason Moore of Dartmouth College Geisel School of Medicine, began addressing these bottlenecks by developing a distributed software service for analyzing gene-gene interactions over large GWAS datasets. In particular, the multifactor dimensionality reduction (MDR) algorithm was adapted for use in the Parabon(R) Crush genome mining application. MDR was augmented to employ Crush's opportunistic evolution search algorithm to enable deep, cloud-powered search, across thousands of compute nodes, to identify complex patterns of gene-gene interaction associated with human disease endpoints or forensically relevant traits. The resultant Crush-MDR Software as a Service (SaaS) application, which is available as an online "cloud" service or in-house enterprise application, was validated and shown to have excellent performance characteristics using simulated GWAS data, and then used to analyze a dataset from the Alzheimer's Disease Neuroimaging Initiative. In Phase II, the Parabon development team will extend the analytical capabilities of the Crush-MDR service and address other GWAS and next-generation sequencing (NGS) bottlenecks by enhancing its Parabon(R) Frontier(R) Compute Platform, a commercial cloud computing platform designed for high-performance computing (HPC) applications.
Our overall objective (which was derived from interactions with prospective customers) is to produce a Platform as a Service (PaaS) solution that will greatly accelerate bioinformatics research by providing a comprehensive set of cloud services that collectively address many common bioinformatics bottlenecks and barriers to collaboration.

Public Health Relevance

The availability of powerful new DNA sequencing technology creates a tremendous opportunity to identify new genetic risk factors for common human diseases. We have established a novel software platform that lets users identify combinations of genetic risk factors using high-performance computing as an online service. We will extend this software platform in new ways and introduce it to the industrial and academic communities as a commercial product.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Small Business Technology Transfer (STTR) Grants - Phase II (R42)
Project #
2R42GM097765-03
Application #
8648112
Study Section
Special Emphasis Panel (ZRG1-IMST-K (14))
Program Officer
Lyster, Peter
Project Start
2011-09-19
Project End
2017-04-30
Budget Start
2014-06-02
Budget End
2015-04-30
Support Year
3
Fiscal Year
2014
Total Cost
$750,000
Indirect Cost
Name
Parabon Computation, Inc.
Department
Type
DUNS #
158679253
City
Reston
State
VA
Country
United States
Zip Code
20190