Data Mining and Model Building in Medical Informatics

Buchanan, Bruce

Abstract

Our long-term goal is to assist biomedical scientists by using computers to routinely extract and codify new knowledge from large biomedical databases. As large collections of data become more readily accessible, the opportunities for discovering new information increase. We propose to work toward this goal by extending our prior research on machine learning in two important directions: (1) codification of disparate pieces of knowledge into a coherent model (model building), and (2) discovery of new information in medical databases (data mining). Machine learning programs find classification rules (or decision trees or networks) that separate members of a target class from other individuals. These programs have emphasized predictive accuracy, with some attention to tradeoffs between accuracy and the cost of errors or between accuracy and simplicity. We propose a framework in which these and other tradeoffs are made explicit, and the criteria by which the tradeoffs are made can be modified. We also include semantic considerations among the criteria to control the internal coherence of models. "Data mining" is a recently coined term for using computers to explore large databases, with the goal of discovering new relationships but usually with no specific target defined at the outset. In addition to accuracy, simplicity, coherence, and cost, a program that purports to discover new relationships must be able to assess novelty. We propose to measure the extent to which proposed relationships are novel by comparing them against existing knowledge in the domain of discourse, and to look for unusual rules (and other relations) that would be very interesting if true. The program on which we primarily build, RL, is a knowledge-based learning program that learns classification rules from a collection of data.
RL has been demonstrated to be flexible enough to accept guidance from prior knowledge, and powerful enough to learn publishable information for scientists working in several different domains. Both parts of the research will require extending the RL system in new ways, detailed in the research plan, that are consistent with the overall design philosophy of the present system. We will primarily work with data already collected on pneumonia patients, with which we have considerable experience. We will test the generality of the criteria used to evaluate models and discoveries with a Bayesian network learning system, K2.
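
The abstract describes judging candidate classification rules by explicit, modifiable criteria (accuracy, cost of errors, simplicity) together with a novelty measure that compares each rule against existing knowledge. The Python sketch below is a hypothetical illustration of that idea, not RL's actual evaluation function: the `Criteria` weights, the linear weighted sum, and the use of Jaccard overlap with known rules as a novelty measure are all assumptions chosen for clarity, and the four records are invented toy data.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    """Hypothetical user-editable weights; changing them changes the tradeoffs."""
    accuracy: float = 1.0    # reward for overall accuracy on the data
    fp_cost: float = 1.0     # penalty for each false positive (per example)
    fn_cost: float = 1.0     # penalty for each false negative (per example)
    simplicity: float = 0.1  # reward for rules with fewer conditions
    novelty: float = 0.5     # reward for rules unlike those already known


def covers(rule, example):
    """A rule is a frozenset of (attribute, value) tests; it covers an example if all hold."""
    return all(example.get(attr) == val for attr, val in rule)


def novelty(rule, known_rules):
    """Assumed measure: 1 minus the largest Jaccard overlap with any known rule."""
    if not known_rules:
        return 1.0
    overlap = max(len(rule & k) / max(1, len(rule | k)) for k in known_rules)
    return 1.0 - overlap


def score(rule, examples, known_rules, w):
    """Weighted sum of accuracy, error cost, simplicity, and novelty for one rule."""
    fp = fn = 0
    for ex in examples:
        predicted = covers(rule, ex)
        if predicted and not ex["target"]:
            fp += 1  # rule fires on a non-member of the target class
        elif not predicted and ex["target"]:
            fn += 1  # rule misses a member of the target class
    n = len(examples)
    accuracy = (n - fp - fn) / n
    error_cost = (w.fp_cost * fp + w.fn_cost * fn) / n
    simplicity = 1.0 / max(1, len(rule))
    return (w.accuracy * accuracy - error_cost
            + w.simplicity * simplicity
            + w.novelty * novelty(rule, known_rules))


# Invented toy records; "target" marks membership in the target class.
examples = [
    {"A": 1, "B": 1, "target": True},
    {"A": 1, "B": 0, "target": True},
    {"A": 0, "B": 1, "target": False},
    {"A": 0, "B": 0, "target": False},
]
known_rules = [frozenset({("B", 1)})]  # stands in for existing domain knowledge

w = Criteria()
for rule in (frozenset({("A", 1)}), frozenset({("B", 1)})):
    print(sorted(rule), round(score(rule, examples, known_rules, w), 3))
```

With the default weights, the rule on `A` scores 1.6 and the rule on `B` scores 0.1, since `B` both misclassifies two records and duplicates the known rule. Raising `fp_cost` relative to `fn_cost`, or setting `novelty` to zero, reorders candidate rules without changing how they are generated, which illustrates one way the evaluation criteria could be kept open to modification.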

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006759-03
Application #: 6391275
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Florance, Valerie

Project Start: 1999-05-01
Project End: 2003-04-30
Budget Start: 2001-05-01
Budget End: 2003-04-30
Support Year: 3
Fiscal Year: 2001
Total Cost: $215,487
Indirect Cost

Institution

Name: University of Pittsburgh
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 053785812

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Related projects


NIH 2001 R01 LM	Data Mining and Model Building in Medical Informatics Buchanan, Bruce G. / University of Pittsburgh	$215,487
NIH 2000 R01 LM	Data Mining and Model Building in Medical Informatics Buchanan, Bruce G. / University of Pittsburgh	$213,046
NIH 1999 R01 LM	Data Mining and Model Building in Medical Informatics Buchanan, Bruce G. / University of Pittsburgh

Publications

Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi et al. (2004) Automatic annotation of protein motif function with Gene Ontology terms. BMC Bioinformatics 5:122

Chapman, W W; Bridewell, W; Hanbury, P et al. (2001) A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34:301-10
