Modern science, especially biochemistry, has become dependent on numerical analysis of the large amounts of data generated in nearly every experiment. Scientific advancement in biology and in understanding disease pathogenesis will likely depend on the analysis of the huge corpus of biomolecular data (e.g., microarray, RNA, and DNA sequence data). This advancement is linked to the field's ability to continue developing statistical methodologies capable of detecting a robust "signal" that can be reproducibly identified across multiple experiments, all of which generate noisy data. The PI has shown how the theoretical framework of spectral analysis with Markov chains unifies several statistical methods for identifying structure in data observed with noise: discrete Fourier analysis, correspondence analysis, principal components analysis, and spectral clustering. This unifying framework also provides insight into, and generalizations of, these more traditional methods. The PI's proposed research therefore has two major directions. In one direction, it will continue basic methodological development of exploratory data analysis, with a focus on methods capable of identifying biological signals observed in noisy experimental conditions. In the other, it will focus on rigorous statistical analysis of this methodology, which is in wide use in statistics, computer science, and bioinformatics.

Statistical methods developed here will be aimed particularly at the study of cellular regulation of gene and protein expression. These cellular mechanisms have wide-ranging importance in understanding human disease, including cancer and infectious disease. The data analytic methods developed under this grant will be implemented and made publicly available through Bioconductor, an open-source software project built on R. The broad goal of this proposal is to work toward a methodological unification of approaches from statistics, biology, and computer science as applied to biomolecular data. Thus, it falls roughly into the field of bioinformatics.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0940077
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2009-03-01
Budget End: 2012-07-31
Support Year:
Fiscal Year: 2009
Total Cost: $113,883
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304