This project is developing novel statistical data mining methods that will help scientists explore and understand large observational data sets. A specific focus of this research is on developing algorithms that can extract, from large data sets, models of the dynamic behavior of objects over time, with an emphasis on selected data-driven problems in biology and geoscience. Examples include automated discovery of genetic regulatory mechanisms from time-course expression measurements, and clustering and prediction of cyclone trajectories. Statistical learning principles are being used to guide algorithm development and to produce publicly available software tools. An educational component of this project is leading to an increased awareness among students of the important role of computer science and statistics in data-driven science.
The results from this project have the potential for significant and broad impact in the primary focus areas of geoscience and biology, as well as in other scientific and engineering areas involving large observational data sets from dynamic processes. In the geosciences, the new algorithms can yield improved modeling and prediction of extra-tropical and tropical cyclones, reducing the socio-economic risks associated with cyclonic events and potentially providing valuable clues about possible climate change. In the biosciences, improved understanding of gene regulatory mechanisms (obtained via new network discovery algorithms) can provide the basis for significant advances in systems biology and medicine, such as the identification of the regulatory mechanisms for cancer-related genes and the resulting development of gene-specific medical treatments.