With computational demands in genetics growing exponentially, concerns are rising about whether traditional CPUs can deliver the needed computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing units (GPUs) offer a far cheaper and more distributed solution. Hundreds of these units are fabricated on a single card, and several cards fit inside a desktop computer. Thus, cheap hardware currently exists that promises a hundred-fold speedup of many basic algorithms. Projections from the vendors of GPUs suggest that these devices will grow rapidly in computational power and versatility over the next decade. Software development is therefore the main hurdle hindering the exploitation of GPUs. This proposal targets this weak link in the chain of modern computing. Through a series of demonstration projects and the production of low-level software libraries, we hope to catalyze the spread of GPUs in genetics. The specific projects include: 1) eQTL mapping, 2) variance component models for QTL mapping, 3) genotype and haplotype construction, 4) estimation of ethnic admixture, 5) isoform discovery through RNA-Seq technology, 6) computation of genetic landscapes and clines, 7) construction of gene networks from random multigraphs, and 8) design of new parallel algorithms for data mining. High-dimensional optimization is a common thread enabling all of these applications. Our previous research on optimization has demonstrated the efficacy of four fundamental ideas, namely, penalized estimation, coordinate descent, the MM (majorization-minimization) principle, and separation of parameters. These ideas also propel parallel computing. Implementation of our demonstration projects on GPUs will require the production of subroutines of considerable general value in computational statistics.
We intend to release our toolbox libraries to the open source community, including C/C++, Fortran, and R software wrappers. This may lead to a multiplier effect that will improve the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable software with superior documentation shows our commitment to this philosophy.
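To make the link between the MM principle, penalized estimation, and separation of parameters concrete, the following is a minimal sketch and not the software described in this proposal. It assumes a lasso-penalized least-squares problem on simulated toy data, and the names `mm_lasso`, `soft_threshold`, `residuals`, `objective`, and `simulate` are hypothetical. The least-squares loss is majorized by a quadratic whose curvature is bounded by the trace of X^T X (a crude but valid bound chosen here for simplicity), so the surrogate splits into one-dimensional problems that can be updated independently.

```python
import random


def soft_threshold(z, t):
    """Minimizer over b of 0.5 * (b - z)**2 + t * abs(b)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def residuals(X, y, beta):
    """Return y - X beta for a row-major list-of-lists matrix X."""
    return [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]


def objective(X, y, beta, lam):
    """Lasso objective 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    r = residuals(X, y, beta)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(b) for b in beta)


def mm_lasso(X, y, lam, n_iter=300):
    """Minimize the lasso objective by MM with a separable quadratic majorizer."""
    p = len(X[0])
    # Assumed curvature bound: trace(X^T X) is at least the largest eigenvalue
    # of X^T X, so the quadratic surrogate majorizes the least-squares loss.
    L = sum(x * x for row in X for x in row)
    beta = [0.0] * p
    history = [objective(X, y, beta, lam)]
    for _ in range(n_iter):
        r = residuals(X, y, beta)
        # The surrogate separates across coordinates: every update below uses
        # only the previous iterate, so on a GPU each coordinate could be
        # assigned to its own thread.
        beta = [
            soft_threshold(
                beta[j] + sum(row[j] * ri for row, ri in zip(X, r)) / L,
                lam / L,
            )
            for j in range(p)
        ]
        history.append(objective(X, y, beta, lam))
    return beta, history


def simulate(n=40, p=5, seed=0):
    """Toy data (illustrative only): first and third predictors have effects."""
    rng = random.Random(seed)
    truth = [2.0, 0.0, -1.5, 0.0, 0.0][:p]
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, truth)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    beta, history = mm_lasso(X, y, lam=5.0)
    print("estimate:", [round(b, 3) for b in beta])
    print("objective: %.4f -> %.4f" % (history[0], history[-1]))
```

Because each iteration minimizes a surrogate that lies above the objective and touches it at the current iterate, the values stored in `history` should never increase. A tighter curvature bound would speed convergence; the trace is used here only to keep the sketch short.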

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information has far outstripped our ability to make use of this information in understanding the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006139-02
Application #: 8324508
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bonazzi, Vivien
Project Start: 2011-08-26
Project End: 2015-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 2
Fiscal Year: 2012
Total Cost: $359,174
Indirect Cost: $104,401

Name: University of California Los Angeles
Department: Genetics
Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Shringarpure, Suyash S; Bustamante, Carlos D; Lange, Kenneth et al. (2016) Efficient analysis of large datasets and sex bias with ADMIXTURE. BMC Bioinformatics 17:218
Heintzman, Peter D; Froese, Duane; Ives, John W et al. (2016) Bison phylogeography constrains dispersal and viability of the Ice Free Corridor in western Canada. Proc Natl Acad Sci U S A 113:8057-63
Crawford, Forrest W; Stutz, Timothy C; Lange, Kenneth (2016) Coupling bounds for approximating birth-death processes by truncation. Stat Probab Lett 109:30-38
Baele, Guy; Lemey, Philippe; Suchard, Marc A (2016) Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty. Syst Biol 65:250-64
Zhou, Hua; Zhou, Jin; Hu, Tao et al. (2016) Genome-wide QTL and eQTL analyses using Mendel. BMC Proc 10:239-244
Clark, Michelle M; Blangero, John; Dyer, Thomas D et al. (2016) The Quantitative-MFG Test: A Linear Mixed Effect Model to Detect Maternal-Offspring Gene Interactions. Ann Hum Genet 80:63-80
Zhou, Hua; Blangero, John; Dyer, Thomas D et al. (2016) Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data. Genet Epidemiol :
Zhou, Jin J; Hu, Tao; Qiao, Dandi et al. (2016) Boosting Gene Mapping Power and Efficiency with Efficient Exact Variance Component Tests of SNP Sets. Genetics :
Crawford, Forrest W; Weiss, Robert E; Suchard, Marc A (2015) Sex, Lies and Self-Reported Counts: Bayesian Mixture Models for Heaping in Longitudinal Count Data via Birth-Death Processes. Ann Appl Stat 9:572-596
Zhou, Hua; Lange, Kenneth (2015) Path Following in the Exact Penalty Method of Convex Programming. Comput Optim Appl 61:609-634

Showing the most recent 10 out of 68 publications