With computational demands in genetics growing exponentially, concerns are rising about whether traditional CPUs can deliver the needed computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing units (GPUs) offer a far cheaper and more widely distributed solution. Each GPU card carries hundreds of processing cores, and several cards fit inside a desktop computer. Thus, inexpensive hardware already exists that promises hundred-fold speedups for many basic algorithms. GPU vendors project that these devices will grow rapidly in computational power and versatility over the next decade. Software development is therefore the main hurdle to exploiting GPUs, and this proposal targets that weak link in the chain of modern computing. Through a series of demonstration projects and the production of low-level software libraries, we hope to catalyze the spread of GPUs in genetics. The specific projects are: 1) eQTL mapping, 2) variance component models for QTL mapping, 3) genotype and haplotype construction, 4) estimation of ethnic admixture, 5) isoform discovery through RNA-Seq technology, 6) computation of genetic landscapes and clines, 7) construction of gene networks from random multigraphs, and 8) design of new parallel algorithms for data mining. High-dimensional optimization is the common thread enabling all of these applications. Our previous research on optimization has demonstrated the efficacy of four fundamental ideas: penalized estimation, coordinate descent, the MM (majorization-minimization) principle, and separation of parameters. These same ideas also facilitate parallel computing. Implementing our demonstration projects on GPUs will require subroutines of considerable general value in computational statistics.
We intend to release our toolbox libraries to the open source community, including C/C++, Fortran, and R software wrappers. This may lead to a multiplier effect that will improve the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable software with superior documentation shows our commitment to this philosophy.

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information now far outstrips our ability to use it to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of this vast body of genetic data, and thus improve genetic researchers' ability to map and characterize genes that underlie human diseases and trait variation.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006139-02
Application #: 8324508
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien
Project Start: 2011-08-26
Project End: 2015-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 2
Fiscal Year: 2012
Total Cost: $359,174
Indirect Cost: $104,401
Name: University of California Los Angeles
Department: Genetics
Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095
Molak, Martyna; Suchard, Marc A; Ho, Simon Y W et al. (2015) Empirical calibrated radiocarbon sampler: a tool for incorporating radiocarbon-date and calibration error into Bayesian phylogenetic analyses of ancient DNA. Mol Ecol Resour 15:81-6
Zhou, Hua; Wu, Yichao (2014) A Generic Path Algorithm for Regularized Statistical Estimation. J Am Stat Assoc 109:686-699
Zhou, Hua; Li, Lexin (2014) Regularized matrix regression. J R Stat Soc Series B Stat Methodol 76:463-483
Lange, Kenneth; Chi, Eric C; Zhou, Hua (2014) A Brief Survey of Modern Optimization for Statisticians. Int Stat Rev 82:46-70
Bouckaert, Remco; Heled, Joseph; Kühnert, Denise et al. (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10:e1003537
Bielejec, Filip; Lemey, Philippe; Baele, Guy et al. (2014) Inferring heterogeneous evolutionary processes through time: from sequence substitution to phylogeography. Syst Biol 63:493-504
Nylinder, Stephan; Lemey, Philippe; De Bruyn, Mark et al. (2014) On the biogeography of Centipeda: a species-tree diffusion approach. Syst Biol 63:178-91
Nunes, Marcio R T; Palacios, Gustavo; Faria, Nuno Rodrigues et al. (2014) Air travel is associated with intracontinental spread of dengue virus serotypes 1-3 in Brazil. PLoS Negl Trop Dis 8:e2769
Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor et al. (2014) Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog 10:e1003932
Lange, Kenneth; Papp, Jeanette C; Sinsheimer, Janet S et al. (2014) Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data. Annu Rev Stat Appl 1:279-300

Showing the most recent 10 out of 32 publications