Recent developments in text mining research, and in scientific publication, have brought us to the moment when the long-standing potential of natural language processing technology to benefit biomedical researchers may finally be realized. Technological advances, recent results in computational linguistics, the maturation of biomedical ontologies, and the advent of resources such as PubMedCentral have set the stage for an integrated computational analysis of a large proportion of the full-text biomedical literature. Such an analysis could dramatically extend the ways in which biomedical researchers use the scientific literature, particularly in the analysis of genome-scale datasets, accelerating scientific discovery and making it more efficient. We hypothesize that it is now possible to extract a wide variety of ontologically grounded entities and relationships by processing the entire PubMedCentral document collection accurately and with good coverage; to use this extracted information to produce new genres of scientifically valuable tools and analysis techniques; and to demonstrate its utility in the analysis of genome-scale data. The challenges we plan to overcome range from fundamental linguistic issues (e.g. cross-document coreference resolution), to high-performance computing (e.g. scaling up integrated processing to millions of complex documents), to fielding practical systems that can exploit enormous knowledge bases to accelerate the analysis of very large molecular datasets.
Enormous amounts of biomedical information are now available in the PubMedCentral database, but computers cannot readily work with it because it is written in human language, and humans cannot read it all because of its sheer volume. The goal of this project is to harvest large amounts of that information automatically, making it available to humans in summarized form and to computers in machine-readable form.
|Funk, Christopher S; Hunter, Lawrence E; Cohen, K Bretonnel (2014) Combining heterogenous data for prediction of disease related and pharmacogenes. Pac Symp Biocomput :328-39|
|Mirel, Barbara; Görg, Carsten (2014) Scientists' sense making when hypothesizing about disease mechanisms from expression data and their needs for visualization support. BMC Bioinformatics 15:117|
|Cohen, K Bretonnel; Hunter, Lawrence E (2013) Chapter 16: text mining for translational bioinformatics. PLoS Comput Biol 9:e1003044|
|Comeau, Donald C; Islamaj Doğan, Rezarta; Ciccarese, Paolo et al. (2013) BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford) 2013:bat064|
|Liu, Haibin; Hunter, Lawrence; Kešelj, Vlado et al. (2013) Approximate subgraph matching-based literature mining for biomedical events and relations. PLoS One 8:e60954|
|Hill, David P; Adams, Nico; Bada, Mike et al. (2013) Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics 14:513|
|Epperson, L Elaine; Karimpour-Fard, Anis; Hunter, Lawrence E et al. (2011) Metabolic cycles in a circannual hibernator. Physiol Genomics 43:799-807|
|Galligan, James J; Fritz, Kristofer S; Tipney, Hannah et al. (2011) Profiling impaired hepatic endoplasmic reticulum glycosylation as a consequence of ethanol ingestion. J Proteome Res 10:1837-47|
|Grabek, Katharine R; Karimpour-Fard, Anis; Epperson, L Elaine et al. (2011) Multistate proteomics analysis reveals novel strategies used by a hibernator to precondition the heart and conserve ATP for winter heterothermy. Physiol Genomics 43:1263-75|
|Hindle, Allyson G; Karimpour-Fard, Anis; Epperson, L Elaine et al. (2011) Skeletal muscle proteomics: carbohydrate metabolism oscillates with seasonal and torpor-arousal physiology of hibernation. Am J Physiol Regul Integr Comp Physiol 301:R1440-52|
Showing the most recent 10 out of 28 publications