Advances in DNA sequencing technology have made it practical and affordable to generate datasets containing millions of genetic attributes that can be tested for association with disease susceptibility. The computational complexity of searching for genetic interactions in such high-dimensional datasets poses major challenges for genome-wide association studies (GWAS). In Phase I of this project, the Parabon research team, led by principal investigator Dr. Jason Moore of the Dartmouth College Geisel School of Medicine, began addressing these bottlenecks by developing a distributed software service for analyzing gene-gene interactions in large GWAS datasets. In particular, the multifactor dimensionality reduction (MDR) algorithm was adapted for use in the Parabon(R) Crush genome mining application. MDR was augmented with Crush's opportunistic evolution search algorithm to enable deep, cloud-powered search across thousands of compute nodes for complex patterns of gene-gene interaction associated with human disease endpoints or forensically relevant traits. The resulting Crush-MDR Software as a Service (SaaS) application, available as an online cloud service or as an in-house enterprise application, was validated on simulated GWAS data, where it showed excellent performance characteristics, and was then used to analyze a dataset from the Alzheimer's Disease Neuroimaging Initiative. In Phase II, the Parabon development team will extend the analytical capabilities of the Crush-MDR service and address other GWAS and next-generation sequencing (NGS) bottlenecks by enhancing the Parabon(R) Frontier(R) Compute Platform, a commercial cloud computing platform designed for high-performance computing (HPC) applications.
Our overall objective, derived from interactions with prospective customers, is to produce a Platform as a Service (PaaS) solution that accelerates bioinformatics research by providing a comprehensive set of cloud services that collectively address many common bioinformatics bottlenecks and barriers to collaboration.
The availability of powerful new DNA sequencing technology creates a tremendous opportunity to identify new genetic risk factors for common human diseases. We have established a novel software platform that lets users identify combinations of genetic risk factors using high-performance computing as an online service. We will extend this platform in novel ways and introduce it to the industrial and academic communities as a commercial product.
Moore, Jason H; Andrews, Peter C; Olson, Randal S et al. (2017) Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases. BioData Min 10:19
Moore, Jason H (2015) The critical need for computational methods and software for simulating complex genetic and genomic data. Genet Epidemiol 39:1
Moore, Jason H; Amos, Ryan; Kiralis, Jeff et al. (2015) Heuristic identification of biological architectures for simulating complex hierarchical genetic interactions. Genet Epidemiol 39:25-34