Genomic data are vital for advancing medical research and achieving breakthroughs. However, disclosure of genomic data has serious privacy implications that can erode data contributors' trust and restrict researchers' access to data. To facilitate data-driven genomic research, it is crucial to address the privacy risks in data sharing and to develop privacy-preserving solutions to protect study participants. This project will study the privacy risks under realistic attack models and develop privacy methods that balance individual privacy and the utility of shared data. Overall, the proposed solutions will enable institutions to share high-utility data while providing strong privacy assurance to data contributors, facilitating data collection and improving data usability. In the first aim, a privacy-preserving data publication framework will be developed to "safely anonymize" genomic data and optimize the released data toward application needs. The framework will protect individuals from re-identification and also prevent inference attacks that may be conducted using publicly available phenotypes (e.g., eye/hair color). In the second aim, customizable privacy solutions will be developed against realistic adversarial models when data statistics are released. Building on recent privacy models, the proposed solutions will account for the adversary's external knowledge and customizable sensitive information to effectively strike a balance between privacy and utility, improving data usability compared to standard differential privacy models. This project will advance current solutions for genomic data anonymization and improve the usability of differential privacy and its variants, with the goal of facilitating highly usable and privacy-preserving data sharing. This work will widen access to genomic data, promote transparency, and facilitate reproducibility for genomic applications.
This project is in line with the mission of the National Human Genome Research Institute (NHGRI), as the proposed techniques enhance data sharing and promote collaborative genomic research. The applicant's career goal is to become an independent investigator with a primary appointment in a biomedical informatics program, with a focus on genome privacy technologies, at a major US research university. His long-term objective is to develop new privacy-preserving technologies for data sharing and data analytics, in order to facilitate collaborative research efforts in genomics and precision medicine. The applicant proposes a carefully designed career development plan, which includes a variety of training activities to complement his computer science skills with additional biomedical knowledge and smooth his transition to an independent researcher. The UCSD Health Department of Biomedical Informatics will serve as an exceptional platform for his career development, given the experience of several faculty in privacy technologies, computational biology, and genomic medicine, and the department's close collaboration with other institutions worldwide.
While sharing genomic data holds great promise for advancing biomedical research and personalized medicine, the uniqueness of genomic data and the rich information they carry can put participants' privacy at risk. In this project, we will develop practical solutions for genomic data sharing that provide data contributors with strong privacy assurance while enabling the release of high-quality data. The outcome of the proposed research will encourage data contributors to participate in studies, widen access to genomic data, and, in the long term, facilitate genomic data-driven medical research to promote health and ease the burden of disease.