A primary goal of machine learning is to have computers "learn" from data, and to make predictions based on what they have learned. Machine learning has been used in many applications, such as identification of spam emails, detection of suspicious computer network traffic, and detection of malignant tumors. The use of machine learning is based on the implicit assumption that there is a mathematical function that describes, with some accuracy, the relation between the inputs to the prediction problem and the correct prediction. The function is not arbitrary; instead, it is of a certain restricted type. However, not all types of functions are efficiently learnable. Also, learnability depends crucially on the type of data that is available.

This project focuses on the learnability of Boolean functions, a central topic in computational learning theory. The research in this project falls into three main categories: learning from random examples, learning with costs, and DNF learning and minimization. Problems in the first category address core open questions in the standard PAC learning model and explore the extent to which access to data from different probability distributions can aid in learning. Problems in the second category are motivated by concrete problems in protein engineering, databases, and cyber-security, where there are costs associated with determining the value of inputs, or in obtaining data. The third category concerns problems of properly learning DNF formulas using DNF hypotheses, related complexity theoretic problems concerning DNF minimization, and problems concerning the complexity of certificates of DNF size.

Broadly, this project seeks to expand our understanding of which types of functions are efficiently learnable by computers, and under what conditions. The research on learning with costs can yield advances in the application areas that motivate it. DNF minimization is a central problem in both complexity theory and in the design of logic circuits; the research on DNF has the potential for impact in both these areas.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
0917153
Program Officer
Balasubramanian Kalyanasundaram
Project Start
Project End
Budget Start
2009-08-01
Budget End
2014-07-31
Support Year
Fiscal Year
2009
Total Cost
$338,698
Indirect Cost
Name
Polytechnic University of New York
Department
Type
DUNS #
City
Brooklyn
State
NY
Country
United States
Zip Code
11201