This award will provide funds to the Statistics Department of the University of California, Berkeley, in order to create a computing environment that will support the research of several faculty, postdoctoral candidates, and graduate students. The equipment is critical to the rapid processing of large-scale data and simulations in support of projects in bioinformatics, environmental modeling and neuroscience. These research projects utilize algorithms and methodologies that require multiple compute processors and large memory.

Examples of the expected outcomes of the research that will be carried out with the aid of these computers are: prediction models for soil moisture fields under current and future climatic conditions; stochastic models for the dependence of fluid flow on the geometry of rock fractures; new methodologies for mapping ground-level aerosol in large metropolitan areas based on satellite data; the creation of an automated disease diagnosis system from data in public repositories; and a better understanding of the roles that different regions of the brain play in processing visual information.

Project Report

Funding under the SCREMS program enabled the researchers to purchase computing equipment to carry out their research. The majority of the funding was used to purchase three 4-node compute servers. The methodologies, findings, and computational techniques that were developed using this computing equipment enabled advancements in theoretical and applied statistics.

The theoretical research included the study of properties of hierarchical multi-label classification, lasso regression, high-dimensional modeling including sparse modeling, sparse clustering, bootstrap regression, and computer simulation design. Researchers used the compute clusters to carry out extensive simulation studies to evaluate the performance of the proposed statistical methods.

The applied research carried out under this grant was in the areas of astronomy, bioinformatics, biological technology, climate science, and neuroscience. With the aid of the compute cluster, large and complex data from these various fields of application were modeled and analyzed. For example, advances in biological technology have fueled the generation of enormous amounts of complex data. The large size and novel forms of these datasets urgently call for the development of high-dimensional statistical methodologies and computational analytic tools. One field that has been rapidly transformed by the availability of massive and complex data is systems biology, where high-throughput technologies have enabled measurements of biological processes at genomic levels. Research carried out with assistance from this grant has focused on analyzing these high-dimensional, intrinsically noisy data using methods from statistics and applied probability in order to extract information on how a living system functions.

Another example of research supported by this grant is in climate science, where a Bayesian hierarchical model for surface wind fields over the globe was constructed. Surface winds are intrinsically multivariate with spatially heteroscedastic behavior over the globe. This research project developed an innovative model of wind fields at the global scale over land and sea. Motivated by the geostrophic relationship, a varying coefficient model was fitted to wind fields using the pressure gradient.

Additionally, a project in neuroscience fitted high-dimensional models of brain activity in the visual cortex. This work required fitting thousands of high-dimensional regression models, one for each voxel/grid-point in an fMRI experiment, which would be infeasible without the resources of a cluster computer because the matrices involved would not fit into memory on a personal computer and because the cluster allows for parallelization of the model fitting over the cells in the fMRI three-dimensional grid.
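The voxel-by-voxel fitting described above can be illustrated with a minimal sketch. The report does not say which regression method or software was used, so everything here is an assumption made for illustration: ridge regression stands in for the high-dimensional models, a thread pool stands in for the cluster nodes, the data are synthetic, and the names `fit_ridge_chunk` and `fit_voxelwise` are hypothetical. The point is only that each voxel's model is independent of the others, so the voxels can be split into chunks and fitted in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fit_ridge_chunk(X, Y_chunk, alpha):
    """Closed-form ridge fit for every voxel (column) in Y_chunk.

    Ridge regression is an assumed stand-in; the report does not name the method.
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    # Solve (X'X + alpha * I) B = X'Y for all voxels in the chunk at once.
    return np.linalg.solve(gram, X.T @ Y_chunk)


def fit_voxelwise(X, Y, alpha=1.0, n_chunks=4, max_workers=4):
    """Fit one ridge model per voxel by splitting voxels into independent chunks."""
    chunks = np.array_split(np.arange(Y.shape[1]), n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chunk depends only on X and its own columns of Y,
        # so the chunks can be processed concurrently.
        results = pool.map(lambda idx: fit_ridge_chunk(X, Y[:, idx], alpha), chunks)
        return np.hstack(list(results))


# Synthetic stand-in for an fMRI experiment: rows are time points,
# columns of Y are voxels, columns of X are stimulus features.
rng = np.random.default_rng(0)
n_time, n_features, n_voxels = 200, 10, 50
X = rng.standard_normal((n_time, n_features))
true_coef = rng.standard_normal((n_features, n_voxels))
Y = X @ true_coef + 0.1 * rng.standard_normal((n_time, n_voxels))

coef = fit_voxelwise(X, Y, alpha=1.0)
print(coef.shape)  # (10, 50): one coefficient vector per voxel
```

On a real cluster the chunks would more plausibly be dispatched to separate nodes or processes, each holding only its own slice of the response matrix, which is what eases the memory constraint mentioned in the report.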

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1026441
Program Officer: Jennifer Pearl
Project Start:
Project End:
Budget Start: 2010-09-15
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $101,213
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94710