With computational demands in genetics growing exponentially, concerns are rising about whether traditional CPUs can deliver the needed computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing units (GPUs) offer a far cheaper and more widely distributed solution. Hundreds of processing cores are fabricated on a single card, and several cards fit inside a desktop computer. Cheap hardware therefore already exists that promises a hundred-fold speedup of many basic algorithms. Projections from the vendors of GPUs suggest that these devices will grow rapidly in computational power and versatility over the next decade. Software development is thus the main hurdle hindering the exploitation of GPUs. This proposal targets this weak link in the chain of modern computing. Through a series of demonstration projects and the production of low-level software libraries, we hope to catalyze the spread of GPUs in genetics. The specific projects include: 1) eQTL mapping, 2) variance component models for QTL mapping, 3) genotype and haplotype construction, 4) estimation of ethnic admixture, 5) isoform discovery through RNA-Seq technology, 6) computation of genetic landscapes and clines, 7) construction of gene networks from random multigraphs, and 8) design of new parallel algorithms for data mining. High-dimensional optimization is a common thread enabling all of these applications. Our previous research on optimization has demonstrated the efficacy of four fundamental ideas, namely, penalized estimation, coordinate descent, the MM (majorization-minimization) principle, and separation of parameters. These ideas also propel parallel computing. Implementation of our demonstration projects on GPUs will require the production of subroutines of considerable general value in computational statistics. We intend to release our toolbox libraries to the open source community, including C/C++, Fortran, and R software wrappers. This release may produce a multiplier effect that improves the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable software with superior documentation shows our commitment to this philosophy.

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information currently far outstrips our ability to use this information to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006139-02
Application #: 8324508
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien
Project Start: 2011-08-26
Project End: 2015-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 2
Fiscal Year: 2012
Total Cost: $359,174
Indirect Cost: $104,401

Name: University of California Los Angeles
Department: Genetics
Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Suchard, Marc A; Lemey, Philippe; Baele, Guy et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 4:vey016
Ho, Lam Si Tung; Xu, Jason; Crawford, Forrest W et al. (2018) Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol 76:911-944
Tolkoff, Max R; Alfaro, Michael E; Baele, Guy et al. (2018) Phylogenetic Factor Analysis. Syst Biol 67:384-399
Crawford, Forrest W; Ho, Lam Si Tung; Suchard, Marc A (2018) Computational methods for birth-death processes. Wiley Interdiscip Rev Comput Stat 10:
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor et al. (2018) Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza. Stat Med 37:195-206
Dudas, Gytis; Carvalho, Luiz Max; Bedford, Trevor et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544:309-315
Keys, Kevin L; Chen, Gary K; Lange, Kenneth (2017) Iterative hard thresholding for model selection in genome-wide association studies. Genet Epidemiol 41:756-768
Baele, Guy; Lemey, Philippe; Rambaut, Andrew et al. (2017) Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST. Bioinformatics 33:1798-1805
Zhang, Yiwen; Zhou, Hua; Zhou, Jin et al. (2017) Regression Models For Multivariate Count Data. J Comput Graph Stat 26:1-13
Baele, Guy; Suchard, Marc A; Rambaut, Andrew et al. (2017) Emerging Concepts of Data Integration in Pathogen Phylodynamics. Syst Biol 66:e47-e65
