This research explores challenges in developing privacy-preserving information networks and services (PPNs). Next generation healthcare information systems and applications, such as personalized and predictive medicine, need PPNs for privacy-preserving information sharing and dissemination among independent healthcare providers, enabling information access over distributed access controlled content, while safeguarding personal health information and medical privacy of individuals from unauthorized disclosures.

The intellectual merits of this research include the development of: (1) privacy-preserving search capabilities over distributed access controlled content, a critical functionality for PPNs; (2) a suite of utility-aware data anonymization services, preserving the privacy of personal medical information against unauthorized disclosure, at the same time maximizing the data utility for medical service providers; and (3) the PPN architecture and middleware optimized for high availability, scalability and failure recovery.

The broad impact is two-fold. First, this research will create a better and broader understanding of the challenges and functional requirements for building the next generation of privacy preserving networked information systems over distributed access controlled content. A domain-specific proof-of-concept prototype on top of the PPN core will be developed for discovering and analyzing risk factors for resistant bacterial infections. These real-world studies will be conducted in collaboration with Morehouse School of Medicine and Children's Healthcare of Atlanta, and will be used as both a driver and a testbed for this research. Second, this research will demonstrate that the PPN is an enabling infrastructure for real-time, continuous and on-demand data analysis over massively distributed and privately shared data repositories.

Project Report

With unrelenting advances in information and communications technology, there is an increasing concern that privacy (in many forms) is disappearing (e.g., the IEEE Spectrum cover article on "The Collapse of Internet Privacy", August 2014). At the same time, some of the promising technologies (e.g., homomorphic encryption for privacy-preserving computations and queries) have been hampered by efficiency concerns. In this somewhat pessimistic atmosphere, our research results show that there is hope, maybe even a bright future, for building and providing privacy-preserving information services on the Internet. Our optimism is based on concrete technical advances that provide an efficient and generic software platform for important information-intensive application areas such as health care and social services, as well as many business and economic activities ranging from banking to electronic commerce. Granted, privacy in the classic sense, defined by the amount of information shared, has become increasingly difficult to achieve due to the growing amount of data available on citizens. The aforementioned Spectrum cover article on browser fingerprinting describes the collection of detailed browsing data at IP address granularity, which can provide sufficient information to characterize and identify users through their browsing behavior. This erosion of classic privacy is reinforced by the increasing capability of large corporations and nation-states to gather and integrate data from many sources. In contrast to this trend, our research results show that practically useful privacy-preserving algorithms and methods can be designed and implemented that provide appropriate privacy protection while efficiently offering significant querying capabilities. These privacy-preserving indexing algorithms (PPI, including variants such as ss-PPI and e-PPI) can be used as efficient and effective software platforms on which privacy-preserving information services are built.
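To make the idea of a privacy-preserving index more concrete, the following is a minimal sketch in Python. It is not the project's PPI, ss-PPI, or e-PPI algorithm. It assumes one simple obfuscation strategy (mixing decoy providers into each index entry so that the index alone does not reveal where an owner's records are held) and assumes that each provider enforces its own access control when contacted. All names, data, and parameters here (`build_index`, `search`, `decoys_per_owner`, the provider and patient identifiers) are hypothetical.

```python
import random

def build_index(true_providers, all_providers, decoys_per_owner, rng):
    """Map each owner to a shuffled list of true providers plus random decoys.

    Assumption for illustration: decoy entries are what hide the true
    owner-to-provider association from anyone who reads the index.
    """
    index = {}
    for owner, providers in true_providers.items():
        others = [p for p in all_providers if p not in providers]
        decoys = rng.sample(others, min(decoys_per_owner, len(others)))
        candidates = list(providers) + decoys
        rng.shuffle(candidates)  # do not leak which entries are real via ordering
        index[owner] = candidates
    return index

def search(index, owner, searcher, provider_stores, acl):
    """Look up candidate providers, then contact each one.

    Each provider checks its own access control list before answering,
    so decoys and unauthorized requests simply return nothing.
    """
    results = []
    for provider in index.get(owner, []):
        if searcher not in acl.get(provider, set()):
            continue  # provider refuses an unauthorized searcher
        results.extend(provider_stores[provider].get(owner, []))
    return results

# Hypothetical toy network of five providers and two patients.
ALL_PROVIDERS = ["hospital_a", "hospital_b", "clinic_c", "clinic_d", "lab_e"]
TRUE_PROVIDERS = {
    "patient_17": {"hospital_a"},
    "patient_42": {"clinic_c", "lab_e"},
}
PROVIDER_STORES = {
    "hospital_a": {"patient_17": ["rec-1"]},
    "hospital_b": {},
    "clinic_c": {"patient_42": ["rec-2"]},
    "clinic_d": {},
    "lab_e": {"patient_42": ["rec-3"]},
}
ACL = {provider: {"dr_lee"} for provider in ALL_PROVIDERS}

INDEX = build_index(TRUE_PROVIDERS, ALL_PROVIDERS, 2, random.Random(7))
```

In this sketch an authorized searcher still retrieves every matching record, at the cost of contacting a few extra providers, while the index itself only narrows the search to a candidate set. The number of decoys is the knob that trades search efficiency against how much the index discloses.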
Technically, for some important classes of mission-critical data such as electronic medical records (EMR) that will enable personalized and predictive medicine, it is feasible to design and implement efficient privacy-preserving indexing algorithms that provide very useful information while preserving user privacy. By privacy preserving index we mean that the index construction process is designed to prevent breaches of content privacy. More specifically, this project has made three main contributions: (1) Developing a systematic approach to designing and implementing privacy preserving indexing techniques for information networks over access controlled content. (2) Developing a utility-aware anonymization framework and toolkit for perturbing sensitive data prior to publishing. (3) Integrating these techniques and developing a prototype of privacy preserving information networks. The toolkit incorporates our own anonymization algorithms, such as geometric data rotation algorithms, as well as Sweeney's generalization-based k-anonymity algorithm and Wisconsin's Incognito algorithm for k-anonymity and l-diversity. The findings and techniques developed in this project show that preserving privacy is a worthwhile and feasible goal, even as more data become available (often called Big Data), eroding absolute privacy in the classic sense. From this project, we produced a prototype software implementation of the algorithms on privacy preserving information networks, which has been used as a hands-on software platform for educational purposes in both undergraduate and graduate courses in the School of Computer Science at Georgia Tech, where students learned about the importance of privacy and the means to preserve it while building information-intensive services. At the same time, the project results also illustrate the difficulties and challenges of achieving privacy in the classic sense (i.e., in terms of bytes shared).
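As an illustration of generalization-based k-anonymity and of what "utility-aware" can mean in practice, here is a minimal Python sketch. It is neither Sweeney's algorithm nor Incognito, and it is not the project's toolkit. It assumes a toy table whose quasi-identifiers are age and ZIP code, and a fixed, hand-picked list of generalization levels ordered from finest to coarsest. The records, function names, and levels are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Age and ZIP code are
# treated as quasi-identifiers; diagnosis is the sensitive attribute.
RECORDS = [
    (34, "30332", "MRSA"),
    (36, "30331", "influenza"),
    (38, "30339", "MRSA"),
    (52, "30310", "asthma"),
    (57, "30314", "MRSA"),
    (59, "30318", "diabetes"),
]

# Assumed generalization levels, finest first: (age bucket width, ZIP digits kept).
LEVELS = [(5, 4), (10, 4), (10, 3), (20, 3), (50, 2)]

def generalize(record, age_width, zip_digits):
    """Coarsen the quasi-identifiers of one record; keep the sensitive value."""
    age, zip_code, diagnosis = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, zip_prefix, diagnosis)

def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter((age, zip_code) for age, zip_code, _ in records)
    return all(count >= k for count in groups.values())

def anonymize(records, k, levels=LEVELS):
    """Return the least-generalized table that is k-anonymous, or None.

    Trying the finest level first is the (very simplified) utility-aware
    part: the data is coarsened only as much as the privacy target requires.
    """
    for age_width, zip_digits in levels:
        candidate = [generalize(r, age_width, zip_digits) for r in records]
        if is_k_anonymous(candidate, k):
            return candidate
    return None

PUBLISHED = anonymize(RECORDS, k=3)
```

With these toy records and `k=3`, the 5-year age buckets leave some groups with a single record, so the sketch falls back to 10-year buckets with four ZIP digits kept. A request for `k=4` cannot be met by any of the assumed levels, and `anonymize` returns `None` rather than publishing a table that misses the target.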
These fundamental results have significant impact on many application areas, since the technical capabilities of the platform can shape the functionality and compliance of those information services. On the application side, the project also included faculty members at Morehouse School of Medicine, who applied our software tools in practical health care environments. As an independent and complementary effort, there is active research investigating the opportunities and technical requirements for hosting privacy preserving information network systems in virtualized cloud environments such as Amazon EC2. This is in anticipation of future healthcare cloud environments, where healthcare providers collaborate, as in an integrated network, to offer privacy preserving information sharing and integration over access controlled content. We have started combining some of our system virtualization research efforts with this project by building and deploying such information services as a large-scale network system application in the cloud. Our current research efforts include (i) studying the performance implications of running network I/O intensive workloads between remote clients and a virtualized cloud, where multiple virtual machines share the same hardware infrastructure, (ii) developing state monitoring techniques and optimizations for cloud services in order to scale the healthcare information networks on demand, and (iii) addressing security and privacy issues in hosting healthcare data and applications in third-party clouds.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0905493
Program Officer: William Bainbridge
Project Start:
Project End:
Budget Start: 2009-09-15
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $1,076,272
Indirect Cost:
Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332