Free Text Gene Name Recognition

Wilbur, Willy

Abstract

1) I have been a co-organizer of the BioCreative Workshops since 2005 and have taken part in BioCreative II (2007), BioCreative III (2010), the BioCreative-2012 Workshop (2012), and BioCreative IV (2013). My group is taking part in BioCreative V (2015), which has not yet taken place. The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools that are useful to researchers and database curators in the biological sciences.

2) We are currently working to develop more general methods of finding high-value articles for protein-protein interaction (PPI) based on their abstracts. This effort involves not only more powerful ranking methods, but also ways to display evidence so that the user can evaluate an article quickly.

3) We are also investigating an approach to named entity recognition for a large number of biologically important entity types. We have found certain general patterns that can be used to find genes and other entity types with higher reliability than a general CRF achieves. This is ongoing research that promises more useful general patterns.

4) We have begun a project called BioC, an effort to create a general XML format, defined by a DTD, together with software to read and write this format. Currently this approach has been implemented in C++, Java, Python, Perl, Ruby, and Go. The idea is to use this common currency to make software modules that are useful for natural language processing more interoperable. The project is in its early stages, but we already have software to read and write the format in the languages mentioned, significant NLP processing modules that use it, and over 25 gold standard NLP annotated data sets available in the format. The approach was featured in the BioCreative IV Workshop and forms the basis of the BioC Collaborative Track at BioCreative V, which will take place shortly. This track has received contributions from eight teams besides our own and has built a user interface that displays annotated articles to BioGRID curators to assist them in their work.
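To make the BioC idea concrete, here is a minimal sketch of writing and reading a BioC-style XML collection with only the Python standard library. It is not the project's own BioC library. The element names (collection, document, passage, annotation, infon, location) follow the BioC DTD as commonly published, and the function names, source, date, key values, and sample sentence are hypothetical and used for illustration only.

```python
import xml.etree.ElementTree as ET


def build_collection(doc_id, text, annotations):
    """Build a BioC-style collection holding one document with one passage.

    annotations is a list of (offset, length, entity_type) tuples.
    The source, date, and key values are placeholders (assumptions).
    """
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"
    ET.SubElement(collection, "date").text = "2015-01-01"
    ET.SubElement(collection, "key").text = "example.key"

    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id

    passage = ET.SubElement(document, "passage")
    # The passage starts at document offset 0, so annotation offsets
    # index directly into the passage text in this sketch.
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text

    for i, (start, length, entity_type) in enumerate(annotations):
        ann = ET.SubElement(passage, "annotation", id=str(i))
        ET.SubElement(ann, "infon", key="type").text = entity_type
        ET.SubElement(ann, "location", offset=str(start), length=str(length))
        ET.SubElement(ann, "text").text = text[start:start + length]
    return collection


def read_annotations(xml_string):
    """Return (doc_id, type, offset, length, text) for every annotation."""
    root = ET.fromstring(xml_string)
    results = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for ann in document.iter("annotation"):
            loc = ann.find("location")
            results.append((
                doc_id,
                ann.findtext("infon[@key='type']"),
                int(loc.get("offset")),
                int(loc.get("length")),
                ann.findtext("text"),
            ))
    return results


sentence = "BRCA1 interacts with BARD1."
collection = build_collection("doc1", sentence, [(0, 5, "Gene"), (21, 5, "Gene")])
xml_string = ET.tostring(collection, encoding="unicode")
for record in read_annotations(xml_string):
    print(record)
```

Because every module reads and writes the same structure, a gene tagger written in one language could in principle pass its output to a module written in another without a custom converter, which is the interoperability goal described above.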

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM000093-15
Application #: 9160916
Support Year: 15
Fiscal Year: 2015

Institution

Name: National Library of Medicine

Related projects


NIH 2015 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine
NIH 2014 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine
NIH 2013 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine	$369,833
NIH 2012 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine	$195,229
NIH 2011 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine	$179,884
NIH 2010 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine	$195,870
NIH 2009 ZIA LM	Free Text Gene Name Recognition Wilbur, Willy / National Library of Medicine	$221,141

Publications

Kim, Sun; Lu, Zhiyong; Wilbur, W John (2015) Identifying named entities from PubMed for enriching semantic categories. BMC Bioinformatics 16:57

Comeau, Donald C; Batista-Navarro, Riza Theresa; Dai, Hong-Jie et al. (2014) BioC interoperability track overview. Database (Oxford) 2014:

Islamaj Doğan, Rezarta; Comeau, Donald C; Yeganova, Lana et al. (2014) Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora. Database (Oxford) 2014:

Kwon, Dongseop; Kim, Sun; Shin, Soo-Yong et al. (2014) Assisting manual literature curation for protein-protein interactions using BioQRator. Database (Oxford) 2014:

Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel et al. (2013) An overview of the BioCreative 2012 Workshop Track III: interactive text mining task. Database (Oxford) 2013:bas056

Kim, Sun; Kim, Won; Wei, Chih-Hsuan et al. (2012) Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information. Database (Oxford) 2012:bas042

Kim, Sun; Kwon, Dongseop; Shin, Soo-Yong et al. (2012) PIE the search: searching PubMed literature for protein interaction information. Bioinformatics 28:597-8

Krallinger, Martin; Vazquez, Miguel; Leitner, Florian et al. (2011) The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinformatics 12 Suppl 8:S3

Kim, Sun; Wilbur, W John (2011) Classifying protein-protein interaction articles using word and syntactic features. BMC Bioinformatics 12 Suppl 8:S9

Arighi, Cecilia N; Lu, Zhiyong; Krallinger, Martin et al. (2011) Overview of the BioCreative III Workshop. BMC Bioinformatics 12 Suppl 8:S1

Showing the most recent 10 out of 14 publications
