Understanding Figures & Captions for Location Proteomics

Cohen, William

Abstract

This proposal is for mentored training in the molecular biosciences for an established computer scientist. The training plan includes basic and advanced coursework in modern biology, interactions with biological research groups, attendance at seminars and conferences, and laboratory training. The sponsor will provide mentoring on the culture and practices of biomedical research. The training institution has a longstanding tradition of interdisciplinary research and specific expertise in cutting-edge proteomics methods. The candidate will be fully committed to a combination of training and research. The research plan addresses the critical need to organize and summarize the knowledge in the vast biomedical literature. Curated databases are expensive to create and maintain, do not estimate the confidence of their assertions, and do not allow for divergent opinions. Information extraction (IE) methods can partially overcome these limitations by automatically extracting certain types of information from biomedical text.

In most genres of scientific publication, the most important results in a paper are illustrated in non-textual forms, such as images and graphs. The broad thesis underlying our proposed research is that one can provide better access to the information in online scientific publications by extracting information jointly from figure images and their accompanying captions. With the exception of earlier work by the Murphy group, biomedical IE systems have extracted information only from text, not from image data.

This proposal addresses these issues in the specific context of fluorescence microscope images depicting the subcellular localization of proteins. This goal is consonant with a major focus of current biomedical research: the identification of expressed genes and the description of the proteins they encode.
Motivated by recent large-scale projects aimed at identifying expressed genes and describing (or annotating) the proteins they encode, the Murphy group has developed automated systems for recognizing subcellular structures in 2D and 3D images. The group has also applied automated image analysis techniques to images harvested from online biomedical journal articles. That system will be extended into a robust, comprehensive toolkit for extracting, verifying, and querying biologically relevant information from the text and images in online journals. Building on this toolkit, a set of tools will be developed to help researchers identify and locate information in online journals. Upon completion of the proposed training, the candidate will be well placed to take a leadership position in applying machine learning to the range of experimental methods used in biomedical research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute on Drug Abuse (NIDA)
Type: Mentored Quantitative Research Career Development Award (K25)
Project #: 5K25DA017357-03
Application #: 7033080
Study Section: Human Development Research Subcommittee (NIDA)
Program Officer: Colvis, Christine

Project Start: 2004-04-01
Project End: 2007-03-31
Budget Start: 2006-04-01
Budget End: 2007-03-31
Support Year: 3
Fiscal Year: 2006
Total Cost: $133,459

Institution

Name: Carnegie-Mellon University
Department: Miscellaneous
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Related projects


NIH 2006 K25 DA	Understanding Figures & Captions for Location Proteomics	Cohen, William W. / Carnegie-Mellon University	$133,459
NIH 2005 K25 DA	Understanding Figures & Captions for Location Proteomics	Cohen, William W. / Carnegie-Mellon University	$162,750
NIH 2004 K25 DA	Understanding Figures & Captions for Location Proteomics	Cohen, William W. / Carnegie-Mellon University	$161,668

Publications

Kou, Zhenzhen; Cohen, William W; Murphy, Robert F (2007) A stacked graphical model for associating sub-images with sub-captions. Pac Symp Biocomput:257-68

Cohen, William W; Minkov, Einat (2006) A graph-search framework for associating gene identifiers with documents. BMC Bioinformatics 7:440

Kou, Zhenzhen; Cohen, William W; Murphy, Robert F (2005) High-recall protein entity recognition using a dictionary. Bioinformatics 21 Suppl 1:i266-73
