This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
The project investigates statistical software analysis, which infers relationships among program components by using statistical properties derived from multiple program executions.
To motivate statistical techniques, it is useful to draw an analogy to static analysis. Static analysis infers dependencies between program components: if a value changes in one component, how does that affect a value in a different component? Static analysis tends to work best for local properties, where the pieces of the program being related are not separated by a great deal of other computation. The statistical analog of dependency is correlation. Instead of proving the presence or absence of a dependency definitively through static reasoning, one can observe at run time that some properties of two components are strongly or weakly correlated. Importantly, correlation is not limited by syntactic or even dynamic locality: if two components are correlated, that correlation can be detected by asking the appropriate statistical question, no matter how much time or computation passes between the execution of one component and the execution of the other.
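As a purely illustrative sketch (not part of the project itself; the component names and data below are synthetic), a standard correlation coefficient over observations gathered from many executions can reveal such a relationship even when the two observation points are separated by arbitrary amounts of intervening computation:

import numpy as np

rng = np.random.default_rng(1)

# One observation per program execution at each of two components, e.g. a
# value logged at component A early in the run and a value logged at
# component B much later.  (Both the scenario and the data are synthetic.)
runs = 500
observed_at_a = rng.normal(size=runs)
# Component B's observation depends on component A's plus noise, regardless
# of how much unrelated computation separates the two observation points.
observed_at_b = 2.0 * observed_at_a + rng.normal(scale=0.5, size=runs)

correlation = np.corrcoef(observed_at_a, observed_at_b)[0, 1]
print(f"correlation across executions: {correlation:.2f}")  # close to 1.0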
The initial focus is on using cross-correlation, which computes the maximum correlation between two sequences of observations as one is shifted in time relative to the other, to formalize statistical correlation between software components that have a direction in time. This idea gives rise to a natural graph that captures the strength and direction of the statistical influence one component has upon another; these graphs are analogous to traditional dependency graphs, but have unique and useful properties.
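The sketch below is a hypothetical illustration of this idea, not the project's implementation; the component traces are synthetic. It computes a lagged correlation between two observation sequences: the magnitude of the best correlation suggests the strength of influence, and the sign of the lag at which it occurs suggests its direction.

import numpy as np


def cross_correlation(x, y, max_lag):
    """Return (best_correlation, best_lag): the maximum Pearson correlation
    between x and a copy of y shifted by up to max_lag steps, and the lag
    at which it occurs.  A positive lag means x tends to lead y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_corr, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        if len(a) < 2 or np.std(a) == 0 or np.std(b) == 0:
            continue
        corr = np.corrcoef(a, b)[0, 1]
        if abs(corr) > abs(best_corr):
            best_corr, best_lag = corr, lag
    return best_corr, best_lag


# Hypothetical per-timestep observations of two components: B echoes A's
# behavior three steps later, plus noise, so A "influences" B.
rng = np.random.default_rng(0)
a_trace = rng.normal(size=200)
b_trace = np.roll(a_trace, 3) + 0.1 * rng.normal(size=200)

corr, lag = cross_correlation(a_trace, b_trace, max_lag=10)
print(f"max correlation {corr:.2f} at lag {lag}")  # strong correlation near lag 3

In a full analysis, a measurement like this for each pair of components would supply the edge weights (correlation strength) and edge directions (sign of the best lag) of the influence graph described above.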