This project targets the design, development, and distribution of Bayesian statistical algorithms and software for analyzing massive amounts of sequence and phenotype data to study the emergence and spread of rapidly evolving pathogens. To this end, the project will build novel phylogeographic models to explore mechanisms of disease spread and ecological barriers, and will invent and implement data integration techniques for genotypic and phenotypic evolution to study, for example, antigenic competition between host and pathogen. Finally, the project will develop high-performance statistical tools to learn from such Big Data in molecular epidemiology. Statistical computing techniques will include massively parallel extensions of existing parametric models at the genome scale and original non-parametric inference tools.

Combating the spread of pathogens and their associated disease burden is a tremendous challenge requiring sustained research effort and decisive public health measures, and the availability of genomic data is a major asset in characterizing these pathogens. What remains lacking is a marriage of statistical thinking and evolutionary biology to integrate these data, through phylogenetic reconstructions, with geographic sampling information and with pathogen phenotypic and epidemiological dynamics. This gap is one of the most pressing problems in statistical phylogenetics, and this project addresses it. Statistical advances will be widely disseminated through timely upgrades to currently popular software and through new low-level libraries. The massively parallel algorithms and resulting software from this project can be deployed across a rapidly expanding range of large-scale problems in statistics and medicine. Results will also be published in accessible, peer-reviewed articles that describe these advances with reference to genomics and, more generally, to other Big Data problems in medicine. Finally, the research will provide training to graduate, undergraduate, and high school students from groups underrepresented in the sciences.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1264153
Program Officer
Junping Wang
Project Start
Project End
Budget Start
2013-09-15
Budget End
2019-08-31
Support Year
Fiscal Year
2012
Total Cost
$1,530,985
Indirect Cost
Name
University of California Los Angeles
Department
Type
DUNS #
City
Los Angeles
State
CA
Country
United States
Zip Code
90095