New environmental exposure technologies for chemical biomonitoring and person-specific sampling, such as in homes or breathing space, are crucial advances for finding environmental causes of disease and guiding prevention. These methods yield valuable exposure data for major NIH cohort studies and for surveillance, notably NHANES biomonitoring for more than 200 chemicals in the US population and the new EPA ExpoCast. Personal exposure measurements are expensive and sometimes non-repeatable, motivating open data-sharing, including in online repositories. At the same time, personal measurements raise new ethical concerns about the possibility that the identity of study participants might be revealed even in data considered anonymized, a process called re-identification. Proliferating public data and computing power will continue to increase privacy risks. Prompted by visible instances of re-identification, healthcare and genetics researchers, among others, are debating and investigating new practices for redacting shared data, warning participants of privacy risks, and sometimes requesting "open consent" to share data without protecting privacy. However, computational privacy risks have not yet been investigated for environmental chemical exposure data. These data may pose risks through novel linkage strategies using data such as on real estate, environmental compliance, permits, weather, and consumer purchases. Re-identification could result in stigma for "contaminated" individuals or communities, reveal behavior a person considers private (e.g., smoking or use of overseas products banned in the US), trigger legal obligations if a regulated chemical is measured, or affect property values, insurance, or employability. This project will empirically evaluate privacy risks and develop solutions for environmental health studies.
It will engage an Advisory Council of environmental health scientists, computer scientists, policymakers, community leaders, and bioethicists to provide input and seek consensus on complex ethical and values-based considerations. Building on the investigators' established computational model for health data, this study will develop a model for predicting re-identification risk in environmental health studies. By applying the model to 10 important environmental studies, the project will quantify privacy risks in this field and identify specific data fields that contribute to risk. The model will be validated by testing the actual number of re-identifications in a household exposure study. Based on results indicating risky data fields, the study will test and seek to optimize procedures to redact or mask data to improve privacy while retaining scientific utility for data-sharing. Because data-sharing decisions ultimately rest on participants' informed consent, the project complements computational analyses by asking participants in two large, innovative online studies about their understandings and values related to privacy and data-sharing. Results from this project will provide researchers with ethically and technically sound methods for sharing environmental data, contributing to more-rapid discovery of preventable causes of disease.
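
As a rough illustration of what identifying "data fields that contribute to risk" can mean in practice, the sketch below computes a simple uniqueness measure: the share of records whose combination of field values appears only once in a dataset. It is not the investigators' model, which this summary does not describe. The function name `unique_fraction`, the field names, and the records are all hypothetical and made up for the example.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Share of records whose combination of values in `fields` is unique.

    A unique combination is easier to link to outside data, so a higher
    share suggests higher re-identification risk (a crude proxy only).
    """
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical, made-up records; not drawn from any real study.
records = [
    {"zip": "02458", "year_built": 1950, "smoker": "no"},
    {"zip": "02458", "year_built": 1950, "smoker": "no"},
    {"zip": "02458", "year_built": 1987, "smoker": "yes"},
    {"zip": "02139", "year_built": 1950, "smoker": "no"},
]

all_fields = ["zip", "year_built", "smoker"]
baseline = unique_fraction(records, all_fields)

# Drop one field at a time: the fall in uniqueness is a rough indication
# of how much that field contributes to risk on its own.
contribution = {
    f: baseline - unique_fraction(records, [g for g in all_fields if g != f])
    for f in all_fields
}
```

In this toy dataset, removing `zip` lowers the share of unique records while removing either other field does not, so `zip` would be the first candidate for redaction or coarsening. A real assessment would also need to account for which outside datasets an adversary could link against.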

Public Health Relevance

Personal environmental exposure assessments, such as chemical measurements in blood, urine, homes, and breathing space, are crucial tools for finding environmental causes of disease. These measurements are expensive and, in the case of disasters, non-repeatable, so sharing data is crucial to advance knowledge and stretch public research dollars. This project will help researchers more rapidly identify how environmental exposures affect health by developing empirically tested, ethically sound methods for sharing environmental health and exposure data in online repositories while protecting the privacy of study participants.

National Institutes of Health (NIH)
National Institute of Environmental Health Sciences (NIEHS)
Research Project (R01)
Study Section
Special Emphasis Panel (SEIR)
Program Officer
Finn, Symma
Silent Spring Institute
United States
Zarate, Oscar A; Brody, Julia Green; Brown, Phil et al. (2016) Balancing Benefits and Risks of Immortal Data: Participants' Views of Open Consent in the Personal Genome Project. Hastings Cent Rep 46:36-45
Yasnoff, William A; Sweeney, Latanya; Shortliffe, Edward H (2013) Putting health IT on the path to success. JAMA 309:989-90