This project will develop machine learning algorithms, software tools, and supporting theory for solving structural supervised learning problems. Existing theory and algorithms have focused on learning an unknown function from training examples, where the unknown function maps from a feature vector to one of a small number of classes. Emerging applications in science and industry require learning much more complex functions that map from complex input spaces (e.g., 2-dimensional maps, time series, sequences, and graphs) to complex output spaces (e.g., other 2-dimensional maps, time series, sequences, and graphs). Examples include detecting fraudulent transactions in a transaction sequence, assigning parts of speech (noun, verb, etc.) to each word in a sentence, identifying genes in DNA sequences, and assigning a land cover class to each pixel in a remote-sensed image. However, existing statistical, machine learning, and data mining packages do not provide any support for these complex tasks, nor has machine learning theory been developed to analyze these tasks. This project will develop a general formulation of the structural supervised learning (SSL) problem, design and test a collection of algorithms for solving SSL problems, and develop a prototype system that will provide "off-the-shelf" tools for practitioners to develop SSL applications.

This project will have several broader impacts. First, many important problems confronting society involve finding patterns in sequential, spatial, or structural data. These include (a) law enforcement challenges, such as detecting theft of credit cards and health insurance fraud, (b) security challenges, such as detecting attempts by terrorists to send bombs in shipping containers, (c) health and safety applications, such as detecting outbreaks of diseases from temporal and spatial data about emergency room admissions, and (d) ecological applications, such as monitoring the health of ecosystems by analyzing sequences of remote-sensed images. The tools and techniques developed in this work can address all of these problems. Second, the project will train graduate students, including women (who are underrepresented in computer science), and provide them opportunities to attend scientific conferences and workshops.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0307592
Program Officer: Daniel F. DeMenthon
Budget Start: 2003-06-01
Budget End: 2007-11-30
Fiscal Year: 2003
Total Cost: $464,176

Name: Oregon State University
City: Corvallis
State: OR
Country: United States
Zip Code: 97331