Phenotype Discovery in NHLBI Genomic Studies (PhD)

Ohno-Machado, Lucila

Abstract

Researchers continually upload data into public repositories at a rapid pace, yet utilize few common standards for annotation, making it close to impossible to compare or associate data across studies. To address this problem, we will develop a defined metadata model and build an integrated system called Phenotype Discovery (PhD) that enables researchers to query and find genomic studies of interest in public repositories as well as upload new data into our database (sdGaP) in a standardized manner. A Query Interpreter (QI) will utilize text mining and natural language processing techniques to map free text into concepts in biomedical ontologies, allowing non-structured queries to be answered efficiently. In Phase I of the project, we will develop a proof-of-concept system that can retrospectively structure phenotypic descriptions in dbGaP, and will work with domain experts in pneumology to build use cases and evaluate the automated mappings. In Phase II of the project, we will extend the domain expertise to cardiology, hematology, and sleep disorders to build a more comprehensive system, expanding the phenotype annotation to transcriptome databases, and integrating a flexible automated genotype annotation tool for sdGaP. We will develop a user-friendly interface to prospectively assist researchers in uploading their data with standardized phenotypic annotations. We will provide the tool for free from our website and continuously improve its quality, based on user feedback and usage data.

Public Health Relevance

Phenotype Discovery (PhD) represents a novel, automated system to describe the characteristics of patients whose genetic information is available in public data repositories, without compromising their privacy. This initiative is greatly needed so that more researchers can make use of data collected from projects funded by public agencies. PhD uses new methodology for natural language processing and semantic integration to interpret the narrative text as well as variables and their values from studies in genomic databases. Standardized terminologies will be utilized to ensure that data can be analyzed across different studies.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Exploratory/Developmental Cooperative Agreement Phase I (UH2)
Project #: 5UH2HL108785-02
Application #: 8303361
Study Section: Special Emphasis Panel (ZHL1-CSR-K (M1))
Program Officer: Larkin, Jennie E

Project Start: 2011-07-19
Project End: 2013-05-31
Budget Start: 2012-06-05
Budget End: 2013-05-31
Support Year: 2
Fiscal Year: 2012
Total Cost: $516,448
Indirect Cost: $232,102

Institution

Name: University of California San Diego
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 804355790

City: La Jolla
State: CA
Country: United States
Zip Code: 92093

Related projects


NIH 2012 UH2 HL	Phenotype Discovery in NHLBI Genomic Studies (PhD) Ohno-Machado, Lucila / University of California San Diego	$516,448
NIH 2011 UH2 HL	Phenotype Discovery in NHLBI Genomic Studies (PhD) Ohno-Machado, Lucila / University of California San Diego	$540,294

Publications

Doan, Son; Lin, Ko-Wei; Conway, Mike et al. (2014) PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. J Am Med Inform Assoc 21:31-6

Hinske, Ludwig Christian; França, Gustavo S; Torres, Hugo A M et al. (2014) miRIAD-integrating microRNA inter- and intragenic data. Database (Oxford) 2014:

Jiang, Xiaoqian; Ji, Zhanglong; Wang, Shuang et al. (2013) Differential-Private Data Publishing Through Component Analysis. Trans Data Priv 6:19-34

Li, Pinghao; Wang, Shuang; Kim, Jihoon et al. (2013) DNA-COMPACT: DNA COMpression based on a pattern-aware contextual modeling technique. PLoS One 8:e80377

Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang et al. (2013) Nucleotide sequence alignment using sparse coding and belief propagation. Conf Proc IEEE Eng Med Biol Soc 2013:588-91

Jiang, Xiaoqian; Sarwate, Anand D; Ohno-Machado, Lucila (2013) Privacy technology to support data sharing for comparative effectiveness research: a systematic review. Med Care 51:S58-65

Ross, Mindy K; Lin, Ko-Wei; Truong, Karen et al. (2013) Text Categorization of Heart, Lung, and Blood Studies in the Database of Genotypes and Phenotypes (dbGaP) Utilizing n-grams and Metadata Features. Biomed Inform Insights 6:35-45

Alipanah, Neda; Lin, Ko-Wei; Venkatesh, Vinay et al. (2013) Phenotype Information Retrieval for Existing GWAS Studies. AMIA Jt Summits Transl Sci Proc 2013:4-8

Lin, Ko-Wei; Tharp, Melissa; Conway, Mike et al. (2013) Feasibility of using Clinical Element Models (CEM) to standardize phenotype variables in the database of genotypes and phenotypes (dbGaP). PLoS One 8:e76384

Ohno-Machado, Lucila (2012) To share or not to share: that is not the question. Sci Transl Med 4:165cm15
