This application addresses broad Challenge Area (10) Information Technology for Processing Health Care Data and specific Challenge Topic 10-HD-102: Data Archiving and Dissemination. The sharp increase in the sophistication of social science data systems that accompanied computer-assisted data collection methods created a concomitant increase in the risk of disclosing individual respondents' identities when the data are shared more broadly. Public use data files, which substantially reduce the risk of disclosure through statistical and technical methods, often also reduce the analytic utility of these data. Data producers have increasingly chosen to retain the original analytic potential of the data by releasing the files under a modified data use agreement or legal contract with analysts. Large data collection programs, both inside and outside the Federal Statistical System, increasingly issue a substantial number of these contracts annually. The contracts often place a large burden on the end user to provision and secure computing platforms that are designed to protect the electronic security of the data files. Different data systems also often require separate machinery for each data use contract. This ad hoc system for securing and disseminating confidential data has limited both the availability and the security of the data. In this project, the Inter-university Consortium for Political and Social Research and partners at the RAND Corporation and the Survey Research Center at the University of Michigan will build and test a data storage and dissemination system for confidential data, which obviates the need for users to build and secure their own computing environments. Recent advances in public utility (or "cloud") computing now make it feasible to provision powerful, secure data analysis platforms on demand.
We will leverage these advances to build a system that collects "system configuration" information from analysts using a simple web interface, and then produces a custom computing environment for each confidential data contract holder. Each custom system will secure the data storage and usage environment in accordance with the confidentiality requirements of each data file. When the analysis has been completed, this custom system will be fed into a "virtual shredder" before final disposal. This prototype data dissemination system will be tested for (1) system functionality (i.e., does it remove the usual barriers to data access?); (2) storage and computing security (i.e., does it keep the data secure?); and (3) usability (i.e., is the entire system easier to use?). Contract holders of two major data systems (the Panel Study of Income Dynamics and the Los Angeles Family and Neighborhood Study) will be recruited to assess both the user interface and the analytic flexibility of the new customized computing environments. The project is designed to improve access to important public health data by lowering barriers to sophisticated research collections. Data systems that include information on morbidity, mortality, and biomarkers will be more accessible to the research community. The computing architecture described here will also improve data security and, thus, encourage more widespread sharing of research data.