This research involves a detailed study of the connections between cryptography and computational learning theory. Cryptography is about manipulating information in order to achieve confidentiality, integrity, privacy, and related goals, while learning theory is about efficiently extracting information from some unknown object. Learning theory provides a rigorous basis for the practically important field of machine learning, and cryptography plays a similar role for the crucial area of computer security. In this research the investigators work to obtain new cryptographic results based on the presumed hardness of various problems in computational learning theory, and conversely to obtain new learning results via cryptography, thus extending and deepening the current understanding of both areas and the connections between them. The research is integrated with a plan to achieve broader impact through education: developing an advanced course on the connections between cryptography and learning theory at Columbia University, and advising and guiding a diverse group of graduate students in their development as researchers and educators.

In more detail, the research involves (i) constructing and applying new cryptographic primitives, such as public-key cryptosystems and pseudorandom generators with very low circuit complexity, from learning problems that are widely believed to be hard; (ii) continuing ongoing work on the average-case learnability of various well-studied concept classes such as decision trees and DNF formulas; (iii) establishing computational hardness of learning for various Boolean function classes, using tools from cryptography; (iv) working to obtain computational separations between pairs of well-studied learning models by showing that learning problems that have polynomial-time algorithms in one model are intractable (under a cryptographic assumption) in the other model; and (v) exploring the foundational issue of what minimal assumptions are required to prove computational hardness of learning.

Project Report

Our project addressed the interface between cryptography and learning theory. Roughly speaking, cryptography is about manipulating and encoding information so that it is difficult to reconstruct the original information (for instance, so that sensitive information like a credit card number can be sent securely over the Internet). Learning theory is about efficiently extracting information from an unknown object (such as learning an underlying prediction rule that explains a large data set). So learning theory and cryptography can be viewed as "two sides of the same coin" -- if a scheme for processing information is cryptographically secure then it is hard to learn the underlying information, while on the other hand any time we have an efficient learning algorithm it means that there is a provable violation of cryptographic security.

One of our major research findings was establishing a new connection between cryptography and learnability for an important class of functions that are widely studied in computer science, called "monotone functions". A monotone function is one whose output never decreases when its input increases. So, for example, the function f(x) = x^2 - 4x + 4 is not monotone (because increasing the input from 1 to 2 causes the output of the function to drop from 1 to 0), but the function f(x) = 3x + 4 is monotone (because increasing x always causes f(x) to increase). It's clear that if a function is monotone then it is at least a little bit predictable, since we know that increasing the input to such a function can never cause the output to decrease. But how much help does monotonicity really give us if we are trying to learn an unknown function to high accuracy? Are monotone functions always easy to learn?

This is an important question because many real-world phenomena that scientists are interested in modeling correspond to monotone functions. For example, the probability of developing a disease might plausibly be a monotone function of exposure to various environmental toxins (more exposure means a higher likelihood of developing the disease), but we don't know the precise nature of this relationship. If we had efficient and general ways to learn relationships like this which correspond to monotone functions, it could be a very useful tool.

Versions of this question have been studied for quite a while in learning theory, and preliminary evidence before our work suggested that monotone functions might be much easier to learn than nonmonotone functions. We used tools from cryptography to show that in fact, for many learning problems, monotonicity is not very helpful if we want to learn an unknown function to high accuracy. To put it another way, while monotonicity does provide a certain "weak" form of predictability (because increasing inputs never yield decreasing outputs), there are relatively simple monotone functions which are almost as hard to learn to high accuracy as nonmonotone functions.

While this particular finding of ours was "bad news" in terms of coming up with efficient learning algorithms, there are some bright sides. For one thing, knowing what we *can't* do is important for guiding research efforts -- our findings may save future researchers from wasting time trying to develop efficient learning algorithms for problems which we now know provably don't have such algorithms.
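To make the monotonicity definition above concrete, here is a small Python sketch (an illustrative helper we call is_monotone -- it is not code from the project) that checks the two example functions on a few sample points:

def is_monotone(f, xs):
    # f is monotone over the sorted sample points xs if its output
    # never decreases as the input increases.
    ys = [f(x) for x in xs]
    return all(y1 <= y2 for y1, y2 in zip(ys, ys[1:]))

xs = list(range(0, 5))  # sample points 0, 1, 2, 3, 4
print(is_monotone(lambda x: x**2 - 4*x + 4, xs))  # False: f(1) = 1 but f(2) = 0
print(is_monotone(lambda x: 3*x + 4, xs))         # True: output rises with x

Of course, checking a handful of sample points can only refute monotonicity, never prove it; the sketch is meant only to illustrate the definition.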
Another positive aspect is that, with our new understanding of how monotone functions can be "cryptographically hard", we or other researchers may be able to build new cryptographic schemes for securely encoding and communicating information using monotone functions. Since monotone functions are often simpler or more efficient to compute than nonmonotone functions, this could lead to simpler, faster, and better schemes for cryptographically secure communication in the future.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0716245
Program Officer: Nina Amla
Budget Start: 2007-09-01
Budget End: 2011-08-31
Fiscal Year: 2007
Total Cost: $396,236
Name: Columbia University
City: New York
State: NY
Country: United States
Zip Code: 10027