(Taken from application abstract): We propose to conduct research on appropriate architectures for systems of patient identification, to develop a toolkit of techniques and implementations that will allow us to build a handful of such demonstrable and testable systems, to develop and study methods for de-identifying patient data without destroying its utility for research, and to evaluate our results against existing and proposed desiderata for the usability and protection of health data. The results of the proposed work will inform policy makers and information system architects of the range of possible solutions to the tasks of patient identification and de-identification. They will also provide a reference set of tools for implementors who are developing health information systems. The proposed research and evaluation plan is summarized in the following specific aims.

1. Develop the Health Information Identification and De-Identification Toolkit (HIIDIT), a toolkit that provides a range of solutions for patient identification and de-identification to meet various national and patient objectives in healthcare access, delivery, and research.
2. Apply HIIDIT to two different tasks: retrieving individual patient data for clinical care within a multi-institutional healthcare system, and retrieving aggregate data for a multi-institutional clinical research trial.
3. Evaluate the application of HIIDIT in terms of protection of confidentiality within the context of national healthcare delivery and research.
4. Develop new methods to de-identify data in databases that include both coded fields and narrative text, and develop formal criteria for evaluating the success of de-identification methods.

The HIIDIT proposal is narrowly defined to include only issues of identification and de-identification. It does not encompass the much larger agenda of creating a Master Patient Index (MPI).

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Bean, Carol A
Children's Hospital Boston
United States
Butte, A J; Ye, J; Haring, H U et al. (2001) Determining significant fold differences in gene expression analysis. Pac Symp Biocomput :6-17
Mandl, K D; Szolovits, P; Kohane, I S (2001) Public standards and patients' control: how to keep electronic medical records accessible but private. BMJ 322:283-7
Butte, A J; Kohane, I S (2000) Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput :418-29
Nigrin, D J; Kohane, I S (2000) Glucoweb: a case study of secure, remote biomonitoring and communication. Proc AMIA Symp :610-4
Butte, A J; Tamayo, P; Slonim, D et al. (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci U S A 97:12182-6
Sun, Y; van Wingerde, F J; Kohane, I S et al. (1999) The challenges of automating a real-time clinical practice guideline. Clin Perform Qual Health Care 7:28-35
Kohane, I S; Dong, H; Szolovits, P (1998) Health information identification and de-identification toolkit. Proc AMIA Symp :356-60
van Wingerde, F J; Sun, Y; Harary, O et al. (1998) Linking multiple heterogeneous data sources to practice guidelines. Proc AMIA Symp :391-5