The primary aim of this project is to develop the concepts, methods and algorithms to extract information about the nature of a joint distribution in high-dimensional space from samples generated by Monte Carlo algorithms. The work will build on the equi-energy sampling approach recently developed by the principal investigator's group under prior NSF support. The secondary aim of this project is to implement Monte Carlo sampling methods on radically new hardware and computation models, such as those based on field-programmable gate arrays. Some of the Monte Carlo methods developed under the primary aim will be implemented on these new architectures, in order to enhance our ability to solve hard inference problems by Monte Carlo computation. This project will provide interdisciplinary training of next-generation scientists working at the interface of statistics, computation and biology.

By developing methods to study the shape, topology and entropy of the posterior distribution, this research will provide fundamental tools for Bayesian inference. As such, it will have an impact on numerous application areas ranging from computational biology to economic analysis. Beyond Bayesian statistics, this research will also have an impact on other scientific areas that use Monte Carlo sampling to study a distribution, e.g. equilibrium statistical physics, where one is interested in understanding the energy landscape associated with a Boltzmann distribution.

Project Report

During this funding period (2013-2014), we have published the following papers:

Arwen Meister, Ye Henry Li, Bokyung Choi, and Wing Hung Wong (2013). Learning a Nonlinear Dynamical System Model of Gene Regulation: A Perturbed Steady-State Approach. Annals of Applied Statistics, 7(3), 1311-1333. In this paper we proposed a new experimental design to collect data that are informative for mechanistic network reconstruction, and developed an associated statistical method for estimating the network from such data.

Junhee Seok, Lu Tian, and Wing H. Wong (2014). Density estimation on multivariate censored data with optional Pólya tree. Biostatistics, 15(1), 182-195.

Chang-Hung Tsai, Tung-Yu Wu, Shu-Yu Hsu, Chia-Ching Chu, Fang-Ju Ku, Ying-Siao Liao, Chih-Lung Chen, Wing Hung Wong, Hsie-Chia Chang and Chen-Yi Lee, "A 7.11mJ/Gb/Query Data-Driven Machine Learning Processor (D2MLP) for Big Data Analysis and Applications," IEEE Symposium on VLSI Circuits, Jun. 2014. In this paper we implemented the Bayesian sequential partitioning method (developed in previous years under this project) in an energy-efficient application-specific integrated circuit chip.

We have also made progress on the revision of the following paper:

Hui Jiang, Chao Du, Kun Yang, Wing Hung Wong (2013). Efficient computation of Optional Polya Tree. In this paper we developed approximate inference algorithms for multivariate density estimation based on the optional Pólya tree approach. These algorithms make it feasible to apply the method to large data sets.

Finally, we developed the following software and made it freely available to the scientific community:

fast-opt/smooth-opt: Computationally efficient multivariate density estimation package
Download site: http://web.stanford.edu/group/wonglab/software.html

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0906044
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2009-08-01
Budget End: 2014-07-31
Support Year:
Fiscal Year: 2009
Total Cost: $1,012,007
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304