Core 1 - Research & Development
Contemporary biomedical and behavioral sciences require sophisticated computation. In Core 1, a team of quantitative scientists (information and computer scientists, biostatisticians, mathematicians, and software engineers) will develop the software infrastructure (i.e., the BCl core), services, and tools for use by biomedical and behavioral researchers. An illustration of the major components is shown in Figure B-1. Current state-of-the-art research infrastructures containing biomedical data warehouses essentially have three levels of data disclosure: (1) query result counts, (2) de-identified data, and (3) identified data. De-identification and anonymization are related but different concepts. While de-identification consists of the removal of particular identifiers, anonymization ensures that data cannot be traced back to one particular individual. Simplistic measures (Murphy SN & Chueh HC 2002) are currently applied at level (1) to prevent the tracing of information to a particular individual using the results of several query counts, and previous research indicates that the de-identification of data disclosed at level (2) is not sufficient to preserve individual privacy (Sweeney 1997). Therefore, at both levels (1) and (2), robust anonymization algorithms are necessary. Formal proofs of adherence to quantitative privacy criteria are hard to produce and are consequently available for only a few methods in limited settings (Lasko 2007). As a consequence, most approaches in use today have not been rigorously validated, either theoretically or with real data. The three levels of disclosure outlined above are insufficient for responsible data sharing beyond the scope of an institutional IRB (in a HIPAA covered entity), such as in a federated data warehouse to which multiple institutions or sources can contribute data. 
For this and other reasons, institutional clinical data repositories for research, some of which receive federal funding for their creation and/or maintenance, have been restricted to researchers who are formally affiliated with the institution. To address this limitation and progress towards a stage in which data can be shared across institutions, we propose research into: (a) a tool that interfaces between clinical data and a user and that can answer limited queries while ensuring that privacy is preserved, (b) a tool that can simulate real data in a privacy-preserving manner to the point that the simulated data can be used as a proxy in population-based analyses, and (c) a cryptographic data submission protocol that hides the identity of the submitting entity.
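
To make the level (1) problem concrete, the following is a minimal sketch of how a query interface like the one in (a) might perturb result counts before returning them. It assumes the Laplace mechanism from differential privacy, which the abstract does not name as the chosen method; the function `noisy_count`, the parameter `epsilon`, and the toy records are all hypothetical and serve only as illustration.

```python
import random


def noisy_count(records, predicate, epsilon, rng=None):
    """Return a count of matching records with Laplace noise added.

    Assumption (not from the abstract): privacy is enforced with the
    Laplace mechanism. A counting query changes by at most 1 when one
    record is added or removed, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise


# Hypothetical toy data standing in for a clinical data warehouse.
records = [
    {"age": 34, "diagnosis": "diabetes"},
    {"age": 51, "diagnosis": "asthma"},
    {"age": 47, "diagnosis": "diabetes"},
    {"age": 62, "diagnosis": "hypertension"},
    {"age": 58, "diagnosis": "diabetes"},
]


def has_diabetes(record):
    return record["diagnosis"] == "diabetes"


if __name__ == "__main__":
    # The exact count is 3; each call returns a different perturbed value.
    print(noisy_count(records, has_diabetes, epsilon=0.5))
```

Smaller values of `epsilon` add more noise and make it harder to isolate one individual by differencing several query counts, at the cost of less accurate answers. A real interface would also need to track the cumulative privacy budget across queries, which this sketch omits.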

National Institutes of Health (NIH)
Specialized Center--Cooperative Agreements (U54)
Study Section
Special Emphasis Panel (ZRG1)
University of California San Diego
La Jolla
United States
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang et al. (2014) Differentially private distributed logistic regression using private and public data. BMC Med Genomics 7 Suppl 1:S14
Li, Zhonghan; Chao, Ti-Chun; Chang, Kung-Yen et al. (2014) The long noncoding RNA THRIL regulates TNFα expression through its interaction with hnRNPL. Proc Natl Acad Sci U S A 111:1002-7
Hinske, Ludwig Christian; França, Gustavo S; Torres, Hugo A M et al. (2014) miRIAD-integrating microRNA inter- and intragenic data. Database (Oxford) 2014:
Kinsella, Marcus; Patel, Anand; Bafna, Vineet (2014) The elusive evidence for chromothripsis. Nucleic Acids Res 42:8231-42
Patel, Anand; Schwab, Richard; Liu, Yu-Tsueng et al. (2014) Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations. Genome Res 24:318-28
Hepler, N Lance; Scheffler, Konrad; Weaver, Steven et al. (2014) IDEPI: rapid prediction of HIV-1 antibody epitopes and other phenotypic features from sequence data using a flexible machine learning platform. PLoS Comput Biol 10:e1003842
Ronen, Roy; Zhou, Dan; Bafna, Vineet et al. (2014) The genetic basis of chronic mountain sickness. Physiology (Bethesda) 29:403-12
Gordon, C T; Jimenez-Fernandez, S; Daniels, L B et al. (2014) Pregnancy in women with a history of Kawasaki disease: management and outcomes. BJOG 121:1431-8
Gan, Zhuohui; Wang, Jianwu; Salomonis, Nathan et al. (2014) MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data. BMC Bioinformatics 15:69
Doan, Son; Lin, Ko-Wei; Conway, Mike et al. (2014) PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. J Am Med Inform Assoc 21:31-6

Showing the most recent 10 out of 58 publications