Modern organizations such as businesses, non-profits, government agencies, and universities collect and use personal information from a range of sources, shared with specific expectations about how it will be managed and used. Accordingly, they must find ways to comply with these expectations, which may be complex and varied, as well as with relevant privacy laws and regulations, while minimizing operational risk and carrying out the core functions of the organization efficiently and effectively. Designing organizational processes to manage personal information is one of the greatest challenges facing organizations (see, e.g., a recent survey by Deloitte and the Ponemon Institute [TI07]), with far-reaching implications for every individual whose personal information is available to modern organizations, i.e., all of us.
This project responds to these challenges by developing methods, algorithms, and prototype tools for integrating privacy, compliance, and risk evaluation into complex organizational processes. It explores, articulates, and formally characterizes the scope and nature of stakeholders' privacy expectations as well as those embodied in key regulations, such as HIPAA, GLBA, COPPA, Basel II, and Sarbanes-Oxley (SOX). It incorporates the diverse perspectives and areas of expertise of its multidisciplinary research team, which includes three computer scientists, one philosopher, and collaborating researchers from IBM. This industry connection facilitates interaction with product teams that have served complex organizations concerned with business process integrity, information security, privacy, and information risk management. The research builds on "contextual integrity" (a philosophical account of privacy) as well as language- and risk-based methods for privacy policy specification and enforcement. Extensive training and educational opportunities are provided to undergraduate and graduate students, and research results are integrated into courses at CMU, NYU, Stanford, and UPenn.
Privacy has become a significant concern in modern society as personal information about individuals is increasingly collected, used, and shared, often using digital technologies, by a wide range of organizations. To mitigate privacy concerns, organizations are required to respect privacy laws in regulated sectors (e.g., HIPAA in healthcare, GLBA in the financial sector) and to adhere to self-declared privacy policies in self-regulated sectors (e.g., the privacy policies of Web services companies such as Google and Facebook). Enforcing these kinds of privacy policies is difficult because privacy laws and enterprise policies typically identify a complex set of conditions governing the disclosure of personal information. For example, the HIPAA Privacy Rule includes over 80 clauses that permit, deny, and even require the disclosure of personal health information, making it difficult to manually ensure that all disclosures comply with the law.

The research team at Carnegie Mellon University created a formal language for specifying a rich class of privacy policies. They then used this language to produce the first complete formal specification of the disclosure clauses in two important US privacy laws: the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule and the Gramm-Leach-Bliley Act (GLBA). Recognizing that certain portions of complex privacy policies such as HIPAA are subjective and may require input from human auditors for compliance determination, the specification cleanly separates the subjective and objective portions of a given policy. The team then developed an algorithm that checks audit logs for compliance with privacy policies expressed in their language. The algorithm has two distinguishing characteristics. First, it automatically checks the objective portion of the privacy policy for compliance and outputs the subjective portion for inspection by human auditors. Second, recognizing that audit logs are often incomplete in practice (i.e., they may not contain enough information to determine whether a policy is violated), the algorithm proceeds iteratively: in each iteration it provably checks as much of the policy as it can over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. Initial experiments with a prototype implementation checking compliance of simulated audit logs with the HIPAA Privacy Rule indicate that the algorithm is fast enough to be used in practice.

Related collaborative efforts by research teams at Stanford University and the University of Pennsylvania produced other algorithms for checking the compliance of actions with privacy policies, with a focus on the healthcare domain. In addition, a joint team from Carnegie Mellon University and New York University conducted a multidisciplinary study of the privacy implications of moving court records online, from both technology and policy standpoints, and presented concrete recommendations for mitigating the privacy threats arising from this move.
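To make the residual-policy idea concrete, the following sketch shows one way an iterative check over an incomplete log could work. It is an illustrative toy in Python, not the project's actual algorithm or policy language, which handles far richer policies such as the full HIPAA specification; all names here (Atom, reduce_policy, the toy treatment/authorization rule) are hypothetical. Each atomic check evaluates to true, false, or unknown when the log lacks the needed information; decided atoms are resolved immediately, and the undecided ones are returned as a residual policy to be re-checked once the log is extended.

```python
from dataclasses import dataclass
from typing import Optional, Union

# A log entry maps attribute names to recorded values; a missing key
# means the log does not (yet) record that piece of information.
Log = dict

@dataclass
class Atom:
    """An atomic check: True/False if the log decides it, None if the
    log is missing the information needed to decide it."""
    name: str
    key: str
    expected: object

    def eval(self, log: Log) -> Optional[bool]:
        if self.key not in log:
            return None  # incomplete log: cannot decide this atom yet
        return log[self.key] == self.expected

@dataclass
class Or:
    left: "Policy"
    right: "Policy"

@dataclass
class And:
    left: "Policy"
    right: "Policy"

Policy = Union[Atom, Or, And, bool]

def reduce_policy(p: Policy, log: Log) -> Policy:
    """Check as much of the policy as the current log allows; return
    True/False if fully decided, otherwise a residual policy built
    from the atoms the log cannot yet decide."""
    if isinstance(p, bool):
        return p
    if isinstance(p, Atom):
        v = p.eval(log)
        return p if v is None else v  # undecided atoms survive as residue
    l, r = reduce_policy(p.left, log), reduce_policy(p.right, log)
    if isinstance(p, Or):
        if l is True or r is True:
            return True
        if l is False:
            return r
        if r is False:
            return l
        return Or(l, r)
    # And: short-circuit on a decided conjunct, keep the rest as residue
    if l is False or r is False:
        return False
    if l is True:
        return r
    if r is True:
        return l
    return And(l, r)

# Toy HIPAA-style rule: a disclosure is permitted if it is for
# treatment, or if the patient has authorized it.
policy = Or(Atom("for_treatment", "purpose", "treatment"),
            Atom("patient_authorized", "authorized", True))

log1 = {"purpose": "marketing"}          # authorization not yet recorded
residual = reduce_policy(policy, log1)   # -> the patient_authorized atom

log2 = {"purpose": "marketing", "authorized": True}  # log extended later
print(reduce_policy(policy, log2))       # -> True (disclosure compliant)
```

Run on a log that records the purpose of a disclosure but not the patient's authorization, the check cannot decide compliance and instead returns the authorization atom as a residual policy; once the log is extended with that fact, a later iteration decides the policy outright. The three-valued evaluation is what lets the sketch distinguish "violated" from "not yet determinable", mirroring the iterative structure described above.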