Free access to research data is vital to scientific discovery. However, publicly sharing biomedical data raises privacy concerns: it puts at risk the identity and health information of individuals who have volunteered to release their information anonymously for medical research. Single nucleotide polymorphisms (SNPs) are one type of genomic sequence data generated rapidly by high-throughput methods, and they attract tremendous research attention. Free exchange of these personal genotypes also poses difficult challenges for protecting privacy and information security. To address these challenges, I propose an investigation to obtain an accurate assessment of the privacy risk assumed by research subjects whose SNPs are disseminated in public biomedical databases. This knowledge will give database privacy officers and policy makers the information they need to protect the privacy of research subjects. In particular, I will develop methods to examine linkage disequilibrium (LD) patterns among SNPs throughout the genome, and I will compile a "risk map" detailing the genomic locations most likely to threaten privacy. Because of LD, a small set of tag SNPs can capture the majority of SNP information content in the genome. Tag SNPs are thus valuable tools in genetics for reducing the effort needed to map genes to diseases and phenotypes: only the tag SNPs, rather than the entire genome, need to be examined. However, because of that very attribute, they are also the high-risk SNPs that could lead to the identification of individuals. Therefore, it is important to study the relationship between tag SNPs and privacy. I have previously developed methods that find tag SNPs with good performance. I propose improving these tagging methods and developing new ones to compile a comprehensive list of tag SNPs in the human genome. I will evaluate the ability of tag SNPs to disclose the identity of individuals.
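As a rough illustration of how LD lets a few tag SNPs stand in for many others, the sketch below computes the standard pairwise r^2 statistic from phased 0/1 haplotypes and then picks tags with a simple greedy cover. It is not the tagging method referred to in this abstract, which is not described here; the function names, the SNP identifiers, and the 0.8 threshold are all assumptions made for the example.

```python
from itertools import combinations

def r_squared(a, b):
    """LD r^2 between two biallelic SNPs, each a list of 0/1 alleles
    observed on the same set of phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n                               # frequency of allele 1 at SNP a
    p_b = sum(b) / n                               # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n    # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                                 # a monomorphic SNP carries no LD signal
        return 0.0
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    return d * d / denom

def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy cover (an assumed, illustrative strategy): repeatedly pick the
    SNP that is in high LD with the most SNPs not yet covered.
    haplotypes maps a hypothetical SNP id to its 0/1 allele list."""
    ids = list(haplotypes)
    neighbors = {s: {s} for s in ids}              # every SNP covers itself
    for s, t in combinations(ids, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            neighbors[s].add(t)
            neighbors[t].add(s)
    untagged = set(ids)
    tags = []
    while untagged:
        best = max(ids, key=lambda s: len(neighbors[s] & untagged))
        tags.append(best)
        untagged -= neighbors[best]
    return tags

# Toy data: snp1 and snp2 are perfectly correlated, snp3 is not.
example = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],
    "snp3": [1, 0, 1, 0, 1, 0],
}
tags = greedy_tag_snps(example)
```

In this toy input one of the two correlated SNPs is enough to represent both, which is the same property that makes tag SNPs informative about an individual.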
I have also previously established an initial probabilistic model for risk assessment. I propose to further develop a knowledgebase of tag SNPs with their locations and frequencies, and an automatic risk assessment tool that uses the probabilistic risk assessment model and the tag SNP knowledgebase. I will evaluate the usability and functionality of the risk assessment tool by applying it to existing public genomic databases. I will also make the resulting methods available on the web for real-time tag SNP detection and risk assessment, and will distribute the tools and software for open source development.
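The probabilistic model itself is not specified in this abstract. Purely as an illustration of what a frequency-based risk estimate can look like, the sketch below assumes Hardy-Weinberg proportions and statistically independent SNPs (a reasonable first approximation for tag SNPs chosen to be in low LD with one another) and computes the chance that two unrelated individuals share the same genotype at every SNP in a set. All names and inputs are hypothetical.

```python
def genotype_freqs(maf):
    """Genotype frequencies under Hardy-Weinberg equilibrium (an assumption)
    for a biallelic SNP with minor allele frequency maf."""
    q = maf
    p = 1.0 - q
    return (p * p, 2 * p * q, q * q)

def match_probability(mafs):
    """Probability that two unrelated individuals carry identical genotypes
    at every SNP, assuming the SNPs are independent of one another."""
    prob = 1.0
    for maf in mafs:
        # Chance both individuals share a genotype at this SNP.
        prob *= sum(f * f for f in genotype_freqs(maf))
    return prob

def expected_matches(mafs, population_size):
    """Expected number of other people in the population whose genotypes
    coincide with a given individual's across all listed SNPs."""
    return (population_size - 1) * match_probability(mafs)

# Thirty hypothetical independent SNPs, each with MAF 0.5, in a population of one million.
risk = expected_matches([0.5] * 30, 1_000_000)
```

When the expected number of coincidental matches falls far below one, the SNP set effectively singles out an individual, which is the kind of situation a risk assessment tool would need to flag.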

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Career Transition Award (K22)
Project #
1K22LM009105-01
Application #
7076320
Study Section
Special Emphasis Panel (ZLM1-AP-K (J2))
Program Officer
Ye, Jane
Project Start
2006-09-15
Project End
2009-09-14
Budget Start
2006-09-15
Budget End
2007-09-14
Support Year
1
Fiscal Year
2006
Total Cost
$151,982
Indirect Cost
Name
University of North Carolina Chapel Hill
Department
Type
Schools of Nursing
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599