The field of human genetics has undergone a revolution in the past 10 years with the advent of high-throughput genomic technologies that can measure human variation at low cost. The flagship application of these technologies has been the genome-wide association study (GWAS), in which genetic variation information is collected from hundreds of thousands of individuals, some of whom have a specific disease and some of whom are healthy. Identifying correlations between genetic variants and disease status has led to the discovery of hundreds of new genes involved in dozens of human diseases. All applications of these technologies, including GWAS, require individuals to "share" their genetic data. In today's typical GWAS, thousands of individuals must consent to have their genetic information collected and incorporated into a database that also contains information on their disease status. Unfortunately, an individual's genetic data is extremely sensitive, as it is considered medical information.
In this proposal, the team addresses the natural tension between privacy and the application of personal genomics technologies by capitalizing on recent breakthroughs in cryptography. They present a novel technological approach that keeps an individual's genetic data private while still taking full advantage of the genetic information. The approach builds on several recently developed techniques in an area broadly referred to as secure computing, which addresses the problem of allowing a collection of individuals to compute an output that depends on all of their inputs without revealing their individual inputs to one another. The core of this proposal focuses on the application of secure computing to two specific problems in personal genomics. The first is the identification of relatives from genetic variation information while preserving the privacy of the genetic material. The second is the identification of disease-causing variants without sacrificing individual patients' genetic privacy.
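The secure computing idea described above, in which a group jointly computes an output without revealing individual inputs, can be illustrated with a toy example. The sketch below uses additive secret sharing to compute the total count of a variant allele across participants. It is an illustration only and not the protocol developed in this proposal: the function names, the modulus, and the sample allele counts are all hypothetical, and the participants are assumed to follow the protocol honestly.

```python
import secrets

# Modulus for share arithmetic (an illustrative choice, not from the proposal).
PRIME = 2**61 - 1

def make_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Simulate a sum in which no participant sees another's raw input."""
    n = len(private_values)
    # Row i holds the shares participant i hands out, one per participant.
    distributed = [make_shares(v, n) for v in private_values]
    # Participant j adds the shares it received (column j) and publishes
    # only that partial sum, which looks random on its own.
    partial_sums = [sum(row[j] for row in distributed) % PRIME for j in range(n)]
    # Combining the published partial sums yields the true total.
    return sum(partial_sums) % PRIME

# Hypothetical inputs: copies of a variant allele (0, 1, or 2) per participant.
allele_counts = [0, 1, 2, 1, 0]
print(secure_sum(allele_counts))
```

Each individual share is a uniformly random number, so a single share reveals nothing about a participant's genotype; only the combined total is learned. The result is correct as long as the true total is smaller than the modulus.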
The development of the techniques presented in this proposal will have a profound impact on personal genomics and the field of genetics in general for several reasons. First, easing privacy fears will remove a major barrier to participation in personal genomics, likely increasing the public's use of recent advances in genetic and genomic technologies. This increased use will accelerate the medical benefits of these technologies. Second, the current thinking is that it is impossible to protect privacy in personal genomics, so the results of this project will surprise many in the field and lead to a rethinking of how to handle privacy in genetic studies. Finally, this research direction will likely lead to new problems and research directions for the cryptography research community and foster new collaborations between genetics researchers, cryptographers, and mathematicians.
This project also contributes to training the next generation of interdisciplinary scientists. The investigators all teach advanced undergraduate courses in both genetics and cryptography, and the topics developed in this proposal will likely be included in the curricula of these courses. In addition, the graduate students involved in this proposal will obtain interdisciplinary training in both genetics and computer science theory.