In the broadest terms, the goal of the proposed work is to make it easier for researchers to apply robust, scalable, entity-centered, heterogeneous data access to the biomedical literature. 'Entity-centered' means that information is indexed by the underlying entity, irrespective of what a surface mention of it looks like in any given data source. For example, there is a gene in FlyBase whose mentions in text include synonyms as diverse as 'Foil' and 'Mel(3)10', generic nominal referring expressions like 'the gene', and pronouns like 'it', as well as its FlyBase database ID, CG5490 [Morgan et al. 2002]. The Phase I proposal breaks down into two major efforts. First, the existing LingPipe suite of linguistic processing tools will be extended to meet the challenges of bioinformatics, resulting in LingPipe-Bio. This will be distributed to the research and entrepreneurial community as an open source suite of tools under dual open source/commercial licensing. Second, an existing interface for entity-centered data access, ThreatTracker (built for intelligence analysts), will be adapted into BioTracker, based on the needs of biomedical researchers.
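To make the idea of entity-centered indexing concrete, the following is a minimal sketch, assuming a hand-built synonym table and mentions that have already been extracted from each document. The `SYNONYMS` table, the `normalize` and `build_entity_index` helpers, and the document IDs are hypothetical illustrations, not the design of LingPipe-Bio or BioTracker. It covers only exact synonym lookup; resolving nominal expressions like 'the gene' or pronouns like 'it' to an entity requires coreference resolution, which the sketch omits.

```python
from collections import defaultdict

# Hypothetical synonym table: normalized surface mention -> canonical ID.
# Entries come from the FlyBase example above; a real system would load
# them from the database rather than hard-coding them.
SYNONYMS = {
    "foil": "CG5490",
    "mel(3)10": "CG5490",
    "cg5490": "CG5490",
}


def normalize(mention: str) -> str:
    """Trim and case-fold a surface mention before lookup."""
    return mention.strip().lower()


def build_entity_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each canonical entity ID to the IDs of documents mentioning it.

    `docs` maps a document ID to the gene mentions already extracted
    from it; mentions with no synonym-table entry are skipped.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, mentions in docs.items():
        for mention in mentions:
            entity = SYNONYMS.get(normalize(mention))
            if entity is not None:
                index[entity].add(doc_id)
    return dict(index)


docs = {
    "abstract-1": ["Foil"],
    "abstract-2": ["Mel(3)10"],
    "abstract-3": ["CG5490", "unknown-gene"],
}
index = build_entity_index(docs)
print(sorted(index["CG5490"]))  # ['abstract-1', 'abstract-2', 'abstract-3']
```

Because the index is keyed by the canonical ID rather than by the string that happened to appear, a query for CG5490 retrieves all three documents even though none of them share a surface form.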

Agency
National Institutes of Health (NIH)
Institute
National Center for Research Resources (NCRR)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43RR020259-01
Application #
6834548
Study Section
Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer
Swain, Amy L
Project Start
2004-08-09
Project End
2006-07-31
Budget Start
2004-08-09
Budget End
2005-07-31
Support Year
1
Fiscal Year
2004
Total Cost
$199,156
Organization Name
Alias-I
DUNS #
124340956
City
New York
State
NY
Country
United States
Zip Code
11211