With computational demands in genetics growing exponentially, concerns are rising about whether traditional CPUs can deliver the needed computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing units (GPUs) offer a far cheaper and more distributed solution. Hundreds of these units are fabricated on a single card, and several cards fit inside a desktop computer. Thus, cheap hardware currently exists that promises a hundred-fold speedup of many basic algorithms. Projections from the vendors of GPUs suggest that these devices will grow rapidly in computational power and versatility over the next decade. Software development is therefore the main hurdle hindering the exploitation of GPUs. This proposal targets this weak link in the chain of modern computing. Through a series of demonstration projects and the production of low-level software libraries, we hope to catalyze the spread of GPUs in genetics. The specific projects include: 1) eQTL mapping, 2) variance component models for QTL mapping, 3) genotype and haplotype construction, 4) estimation of ethnic admixture, 5) isoform discovery through RNA-Seq technology, 6) computation of genetic landscapes and clines, 7) construction of gene networks from random multigraphs, and 8) design of new parallel algorithms for data mining. High-dimensional optimization is a common thread enabling all of these applications. Our previous research on optimization has demonstrated the efficacy of four fundamental ideas, namely, penalized estimation, coordinate descent, the MM (majorization-minimization) principle, and separation of parameters. These ideas also propel parallel computing. Implementation of our demonstration projects on GPUs will require the production of subroutines of considerable general value in computational statistics.
We intend to release our toolbox libraries to the open source community, including C/C++, Fortran, and R software wrappers. This may lead to a multiplier effect that will improve the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable software with superior documentation shows our commitment to this philosophy.

Public Health Relevance

The human genome project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information has currently far outstripped our ability to make use of this information in understanding the basis of disease and human diversity.
Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG006139-01
Application #
8085977
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Bonazzi, Vivien
Project Start
2011-08-26
Project End
2015-06-30
Budget Start
2011-08-26
Budget End
2012-06-30
Support Year
1
Fiscal Year
2011
Total Cost
$359,971
Indirect Cost
Name
University of California Los Angeles
Department
Genetics
Type
Schools of Medicine
DUNS #
092530369
City
Los Angeles
State
CA
Country
United States
Zip Code
90095
Baele, Guy; Lemey, Philippe; Rambaut, Andrew et al. (2017) Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST. Bioinformatics 33:1798-1805
Dudas, Gytis; Carvalho, Luiz Max; Bedford, Trevor et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544:309-315
Baele, Guy; Suchard, Marc A; Rambaut, Andrew et al. (2017) Emerging Concepts of Data Integration in Pathogen Phylodynamics. Syst Biol 66:e47-e65
Zhou, Hua; Blangero, John; Dyer, Thomas D et al. (2017) Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data. Genet Epidemiol 41:174-186
Zhang, Yiwen; Zhou, Hua; Zhou, Jin et al. (2017) Regression Models For Multivariate Count Data. J Comput Graph Stat 26:1-13
Ho, Lam Si Tung; Xu, Jason; Crawford, Forrest W et al. (2017) Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol :
Keys, Kevin L; Chen, Gary K; Lange, Kenneth (2017) Iterative hard thresholding for model selection in genome-wide association studies. Genet Epidemiol 41:756-768
Shringarpure, Suyash S; Bustamante, Carlos D; Lange, Kenneth et al. (2016) Efficient analysis of large datasets and sex bias with ADMIXTURE. BMC Bioinformatics 17:218
Schuemie, Martijn J; Hripcsak, George; Ryan, Patrick B et al. (2016) Robust empirical calibration of p-values using observational data. Stat Med 35:3883-8
Shaddox, Trevor R; Ryan, Patrick B; Schuemie, Martijn J et al. (2016) Hierarchical Models for Multiple, Rare Outcomes Using Massive Observational Healthcare Databases. Stat Anal Data Min 9:260-268

Showing the most recent 10 out of 79 publications