Genomics GPUs and next generation computational statistics

Sobel, Eric

Abstract

With computational demands in genetics growing exponentially, concerns are rising about whether traditional CPUs can deliver the needed computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing units (GPUs) offer a far cheaper and more widely distributed solution. Hundreds of processing cores are fabricated on a single GPU card, and several cards fit inside a desktop computer. Consequently, cheap hardware already exists that promises a hundred-fold speedup of many basic algorithms. Projections from GPU vendors suggest that these devices will grow rapidly in computational power and versatility over the next decade. Software development is therefore the main hurdle to exploiting GPUs, and this proposal targets that weak link in the chain of modern computing. Through a series of demonstration projects and the production of low-level software libraries, we hope to catalyze the spread of GPUs in genetics. The specific projects include: 1) eQTL mapping, 2) variance component models for QTL mapping, 3) genotype and haplotype construction, 4) estimation of ethnic admixture, 5) isoform discovery through RNA-Seq technology, 6) computation of genetic landscapes and clines, 7) construction of gene networks from random multigraphs, and 8) design of new parallel algorithms for data mining. High-dimensional optimization is a common thread enabling all of these applications. Our previous research on optimization has demonstrated the efficacy of four fundamental ideas: penalized estimation, coordinate descent, the MM (majorization-minimization) principle, and separation of parameters. These same ideas also lend themselves to parallel computing. Implementing our demonstration projects on GPUs will require subroutines of considerable general value in computational statistics.
We intend to release our toolbox libraries to the open-source community, including C/C++, Fortran, and R software wrappers. This may produce a multiplier effect that improves the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable, well-documented software demonstrates our commitment to this philosophy.
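To illustrate how penalized estimation, the MM principle, and separation of parameters fit parallel hardware, here is a minimal Python sketch. It is not code from this project. The function name mm_nnls and the choice of ridge-penalized nonnegative least squares are illustrative assumptions, and the sketch further assumes that every entry of X and y is nonnegative. It uses the classic multiplicative MM update: because the majorizing surrogate separates the parameters, each new coordinate depends only on the previous iterate, so all p updates could in principle run simultaneously, for example one GPU thread per parameter.

```python
def mm_nnls(X, y, lam=0.0, iters=500):
    """Ridge-penalized nonnegative least squares by a multiplicative MM update.

    Minimizes 0.5*||y - X b||^2 + 0.5*lam*||b||^2 over b >= 0.
    Illustrative assumption: every entry of X and y is nonnegative.
    """
    n, p = len(X), len(X[0])
    # Precompute X^T y and X^T X once; each iteration then costs O(p^2).
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    b = [1.0] * p  # strictly positive starting point
    for _ in range(iters):
        # (X^T X b + lam * b)_j: the positive part of the gradient.
        denom = [sum(XtX[j][k] * b[k] for k in range(p)) + lam * b[j]
                 for j in range(p)]
        # Separated surrogate: each new b[j] uses only the previous iterate,
        # so the p coordinate updates are mutually independent.
        b = [b[j] * Xty[j] / denom[j] if denom[j] > 0 else 0.0
             for j in range(p)]
    return b
```

For example, with X = [[1, 0], [0, 1], [1, 1]] and y = [1, 2, 3], the unpenalized fit converges to approximately [1, 2]. Plain Python lists keep the sketch dependency-free. A GPU version would replace the per-coordinate comprehension with a kernel that updates all coordinates at once.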

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of available genetic data. In fact, our ability to collect genetic information now far outstrips our ability to use that information to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of this vast amount of genetic data, and thus to improve genetic researchers' ability to map and characterize genes that underlie human diseases and trait variation.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006139-02
Application #: 8324508
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien

Project Start: 2011-08-26
Project End: 2015-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 2
Fiscal Year: 2012
Total Cost: $359,174
Indirect Cost: $104,401

Institution

Name: University of California Los Angeles
Department: Genetics
Type: Schools of Medicine
DUNS #: 092530369

City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Related projects


NIH 2020 R01 HG	Genomics, EHRs, GPUs, and Next Generation Computational Statistics Sobel, Eric / University of California Los Angeles
NIH 2017 R01 HG	Genomics, GPUs, and Next Generation Computational Statistics Sobel, Eric / University of California Los Angeles
NIH 2016 R01 HG	Genomics, GPUs, and Next Generation Computational Statistics Sobel, Eric / University of California Los Angeles
NIH 2015 R01 HG	Genomics, GPUs, and Next Generation Computational Statistics Sobel, Eric / University of California Los Angeles
NIH 2014 R01 HG	Genomics GPUs and next generation computational statistics Sobel, Eric / University of California Los Angeles	$349,937
NIH 2013 R01 HG	Genomics GPUs and next generation computational statistics Sobel, Eric / University of California Los Angeles	$341,953
NIH 2012 R01 HG	Genomics GPUs and next generation computational statistics Sobel, Eric / University of California Los Angeles	$359,174
NIH 2011 R01 HG	Genomics GPUs and next generation computational statistics Sobel, Eric / University of California Los Angeles	$359,971

Publications

Ho, Lam Si Tung; Xu, Jason; Crawford, Forrest W et al. (2018) Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol 76:911-944

Tolkoff, Max R; Alfaro, Michael E; Baele, Guy et al. (2018) Phylogenetic Factor Analysis. Syst Biol 67:384-399

Crawford, Forrest W; Ho, Lam Si Tung; Suchard, Marc A (2018) Computational methods for birth-death processes. Wiley Interdiscip Rev Comput Stat 10:

Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor et al. (2018) Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza. Stat Med 37:195-206

Suchard, Marc A; Lemey, Philippe; Baele, Guy et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 4:vey016

Dudas, Gytis; Carvalho, Luiz Max; Bedford, Trevor et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544:309-315

Keys, Kevin L; Chen, Gary K; Lange, Kenneth (2017) Iterative hard thresholding for model selection in genome-wide association studies. Genet Epidemiol 41:756-768

Baele, Guy; Lemey, Philippe; Rambaut, Andrew et al. (2017) Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST. Bioinformatics 33:1798-1805

Zhang, Yiwen; Zhou, Hua; Zhou, Jin et al. (2017) Regression Models For Multivariate Count Data. J Comput Graph Stat 26:1-13

Baele, Guy; Suchard, Marc A; Rambaut, Andrew et al. (2017) Emerging Concepts of Data Integration in Pathogen Phylodynamics. Syst Biol 66:e47-e65

Showing the most recent 10 out of 85 publications
