The goal of this project is to make possible database searching in a privacy-constrained manner: A private database provider allows only properly authorized searches by clients, in a manner that does not reveal the search criteria yet enforces the requirement that the client learns only what is authorized by the search. The initial focus will be on techniques for the case of exact matches, later extended to the much more difficult case of approximate matching. If multiple matches are found, either all of them are returned, or a subset of the "best" of them, under appropriately defined notions of quality, is returned; in approximate matching there is a natural notion of quality, namely, having smaller distance to the target of the search as specified by the query. The technical challenges include verification of the validity of a search request, and then carrying out the search, in manner that enforces the search's authorized criteria without revealing them. The project holds the promise of leading to substantial improvements in the highly unsatisfactory current "state of the practice" for searches carried out on private and sensitive databases, that unnecessarily reveal too much information and prevent useful collaborations from taking place due to concerns over the leakage of sensitive information. The minimal-disclosure feature of the protocols will also make possible a de facto "defense in depth", in that a compromised server will no longer automatically imply the compromise of all the clients' interactions with that server.
The main technical goal of this project is to provide techniques for privacy-constrained searching. At a high level, we want to enable a party to search a remote database in a private and constrained manner, i.e., so that no party carries out unauthorized queries or learns what they are not supposed to know. Challenges include how to ensure that queries to the data are properly authorized and can be performed efficiently. The framework includes situations where the remote server that stores the data is not trusted with learning its contents, and cannot be relied on to faithfully carry out the client's remote query (in which case the cheating needs to be detected). This makes the work of relevance to cloud computing, because one of the major impediments to larger-scale use of cloud services is concern for confidentiality of the data and the queries carried out on it: Whether the server is inside national borders can matter, as can the citizenship and security clearance of its system administrators, etc. To obtain more assurance about the data’s treatment, clients often use higher-grade (hence more expensive) cloud services, e.g., Google Apps provides such Premium cloud services with segregated systems for US government customers and meets their unique needs, such as for data to be stored in the US only. Our work can achieve confidentiality using the existing commercial cloud services, i.e., without the abovementioned premium cloud services. The notion of efficiency in this framework includes the usual minimization of the number of rounds, amount of computation and storage, but also requires that processing a search query should place a light burden on the client because the client is typically weaker than the remote server. The main educational goal of the proposal is to produce individuals with a unique combination of talents, who are ideally suited apply the considerable power of security and online interaction technologies to societal needs. The main results of this project include: 1. Privacy-preserving searching involves comparisons between different items, and the most important of all distance metrics between two strings is the edit distance for comparing two sequences, which is a building block to outsourcing senstive genomic computations. We made substantial progress on the problem of secure outsourcing of sequence comparisons by a client to untrusted remote servers. 2. We designed protocols to outsource comparisons in a trusted manner. This is a useful building block for verifying the results of third party storage. 3. We have designed protocols in the pre-computational model for various problems including: i) item existence queries, ii) rank queries, iii) one and two dimensionalrange queries, iv) nearest neighbor queries (in one and two dimensions),and v) word retrieval queries.