The investigators study a new class of statistical methods for learning time series and graphical models. Their approach is based on spectral analysis and matrix decomposition methods, which have enjoyed tremendous success in applications but have drawn less attention in the context of graphical models. The goal of this investigation is to extend the earlier successes of matrix decomposition methods to more complicated time series and to certain graphical models, leading to new statistical machine learning algorithms with important practical applications.

In the information age, an important measure of computer intelligence is the ability to analyze the huge amounts of data that have become available electronically and to make critical decisions in uncertain environments. Statistical machine learning is the main technique for analyzing electronic data, and graphical models are mathematical tools that help both computer systems and human operators understand these complex data in order to facilitate decision making. However, traditional algorithms for learning graphical models have limitations that restrict the capabilities of modern computing systems. The current research develops a new class of mathematical algorithms that can be used to design more effective graphical models, which in turn allow modern computers to analyze data more accurately and achieve a higher level of intelligence.

Project Report

Fast linear methods were created for estimating hidden-state time series models. These have been extended to include some forms of factorial hidden-state models. For example, if a very slow Markov model and a much faster one jointly generate an observed process, the method we have developed recovers both hidden processes and uses them to forecast the future of the observed process. Other extensions were made to the usual spectral estimation methods to allow them to be used in a regression-like setting. These methods have been applied to a variety of data sets. The primary focus has been on linguistic tasks such as named entity recognition and other NLP problems, but neural data has also been analyzed successfully, and we have had some success with very high-frequency financial data. In related work, we have extended these methods to trees, covering both traditional parse trees and dependency parse trees. In both cases, the resulting estimation methods are competitive in accuracy with existing methods. The new methods run 10 to 100 times faster and hence allow much more data to be analyzed.
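The report does not spell out the estimators themselves. As an illustration of what a "fast linear method" for a hidden-state model can look like, the following is a minimal sketch of the observable-operator spectral approach for a plain hidden Markov model, in the style of the published Hsu, Kakade, and Zhang algorithm. It is not necessarily the investigators' method. The toy matrices `T`, `O`, and `pi`, and all function names, are assumptions made for this example, and exact moments stand in for the empirical counts that would be used on real data.

```python
import numpy as np

def exact_moments(T, O, pi):
    """Low-order observation moments of an HMM, computed from known parameters.

    T[i, j] = Pr[h_{t+1}=i | h_t=j], O[x, j] = Pr[obs=x | h=j], pi = initial law.
    On real data these would be estimated from unigram, bigram, and trigram counts.
    """
    P1 = O @ pi                                   # P1[i]    = Pr[x1=i]
    P21 = O @ T @ np.diag(pi) @ O.T               # P21[i,j] = Pr[x2=i, x1=j]
    # P3x1[x][i,j] = Pr[x3=i, x2=x, x1=j]
    P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
            for x in range(O.shape[0])]
    return P1, P21, P3x1

def spectral_hmm(P1, P21, P3x1, m):
    """Recover observable operators using one SVD and a few pseudoinverses."""
    U = np.linalg.svd(P21)[0][:, :m]              # top-m left singular vectors
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    right = np.linalg.pinv(U.T @ P21)
    B = [U.T @ P3 @ right for P3 in P3x1]         # one m x m operator per symbol
    return b1, binf, B

def sequence_prob(seq, b1, binf, B):
    """Joint probability of an observation sequence under the learned operators."""
    state = b1
    for x in seq:
        state = B[x] @ state
    return float(binf @ state)

def forward_prob(seq, T, O, pi):
    """Reference value from the standard forward recursion with true parameters."""
    alpha = pi.copy()
    for x in seq:
        alpha = T @ (O[x] * alpha)
    return float(alpha.sum())

# Hypothetical toy model: 2 hidden states, 3 observation symbols.
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
O = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])
pi = np.array([0.6, 0.4])

b1, binf, B = spectral_hmm(*exact_moments(T, O, pi), m=2)
seq = [0, 2, 1, 1, 0]
print(sequence_prob(seq, b1, binf, B), forward_prob(seq, T, O, pi))
```

With exact moments the two printed probabilities should agree up to floating-point error. With moments estimated from data they agree only approximately, and the sketch assumes that `T` and `O` have full rank and that every hidden state has positive initial probability. The speed of this family of methods comes from replacing iterative likelihood maximization with a single SVD and a handful of matrix products.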

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1018651
Program Officer: Todd Leen
Project Start:
Project End:
Budget Start: 2010-08-01
Budget End: 2013-07-31
Support Year:
Fiscal Year: 2010
Total Cost: $224,998
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104