The goal of the proposed research is to extend the reach of machine learning technology so that computers can perform a broader range of tasks than is currently possible. In particular, the goal is to enable machines to extract knowledge from data in a form that supports robust reasoning. Currently, representations that support reasoning on a large scale are typically programmed, and the results suffer from brittleness: they do not behave well in unforeseen situations. In the proposed approach, called knowledge infusion, the goal is to acquire the rules on which reasoning will be done by a combination of programming and learning, and to maintain a continuous process of learning and checking against an environment to ensure that the rules are reliable. The aim is to handle unaxiomatized or commonsense knowledge, which encompasses the bulk of the knowledge that humans handle every day, as embodied in speech or text, replete as these often are with inconsistencies, ambiguities and errors. This can be distinguished from knowledge that is known to be axiomatizable, such as most knowledge of a mathematical nature. Axiomatized knowledge can, in general, be easily programmed, and computers can usually exploit it fully, up to any inherent computational complexity limitations of the problem at hand.

The central aims of the research are to develop algorithms that realize knowledge infusion and that are computationally efficient and effective even for very large datasets, and to identify the fundamental limits of the phenomenon. The techniques used will be drawn from theoretical computer science, and experiments on large datasets will be carried out as needed.

The goal is to infuse commonsense knowledge about the world into machines on a large scale, in such a way that the machines can reason with it with a controlled level of robustness. Success in this endeavor can be expected to have applications in almost all areas of computing that involve either human interaction with a computer or computation on data that was generated by, or refers to, humans. Hence there are numerous connections with the national priority areas EVS and NHS, and with the technical focus areas int and dmc.

Broader Impact:

If successful, the results of the research will help make computers more effective at handling commonsense or unaxiomatized information about the world. This would extend the usefulness of computers to new areas and contribute to prosperity (EVS). It would also enable large datasets to be analyzed automatically with greater functionality than has hitherto been possible (NHS).

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
0427129
Program Officer
Balasubramanian Kalyanasundaram
Project Start
Project End
Budget Start
2004-09-01
Budget End
2012-08-31
Support Year
Fiscal Year
2004
Total Cost
$908,049
Indirect Cost
Name
Harvard University
Department
Type
DUNS #
City
Cambridge
State
MA
Country
United States
Zip Code
02138