(Taken from application abstract): We propose to develop automated techniques to facilitate classification and pattern recognition in biomedical data sets. These techniques will involve the development of novel neural network architectures, as well as the formulation of principles governing their construction and the explanation of their results. Specifically, to address the problem of recognizing infrequent categories, we will develop hierarchical and sequential systems of feedforward neural networks. These systems will use (a) prior knowledge of the domain and/or (b) natural clusters identified by clustering or other unsupervised learning methods to define intermediate classification goals, applying a divide-and-conquer approach to complex classification problems. Additionally, we will develop generic tools for pre-processing input data: transforming the original data, reducing dimensionality, and producing training and test sets suitable for cross-validation and bootstrap resampling. We will build tools for evaluating results that measure calibration, resolution, and variable importance, and that compare different models. Furthermore, we will develop standardized interfaces for selected existing classification models. We will use a component-based architecture to build our neural networks and will write interfaces to existing classification models (e.g., regression trees, logistic regression models) so that they can be interchanged in a user-friendly manner. Our pre-processing modules will prepare data for input to a variety of classification models. The models will first be evaluated in isolation and later combined, to test the hypothesis that the combined system performs better on real biomedical data sets in terms of calibration, resolution, and explanatory power.
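The divide-and-conquer idea can be illustrated with a minimal sketch. It assumes a two-stage design that the abstract does not spell out. First, plain k-means (one possible unsupervised method) partitions the inputs into clusters that act as intermediate classification goals. Each input is then routed to its nearest centroid, and a separate classifier is trained within each cluster. A single sigmoid unit, the smallest feedforward network, stands in for each network. The names `kmeans`, `nearest`, `sigmoid`, `SigmoidUnit`, and `HierarchicalClassifier` are hypothetical, and the sequential variant is not shown.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sketch: all names and design choices are illustrative assumptions,
# not the project's implementation.
Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nearest(centroids: Sequence[Vector], p: Vector) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(centroids[j], p)))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means; the returned centroids define the intermediate goals."""
    centroids = [list(p) for p in random.Random(seed).sample(list(points), k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class SigmoidUnit:
    """One sigmoid neuron (logistic model) trained by batch gradient descent on log-loss."""

    def __init__(self, lr: float = 0.5, epochs: int = 500):
        self.lr, self.epochs = lr, epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "SigmoidUnit":
        self.w, self.b, n = [0.0] * len(X[0]), 0.0, len(X)
        for _ in range(self.epochs):
            gw, gb = [0.0] * len(self.w), 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # derivative of log-loss w.r.t. z
                gw = [g + err * xi for g, xi in zip(gw, x)]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, x: Vector) -> float:
        return sigmoid(self.b + sum(w * xi for w, xi in zip(self.w, x)))


class HierarchicalClassifier:
    """Stage 1 routes an input to its nearest cluster; stage 2 is one unit per cluster."""

    def __init__(self, k: int):
        self.k = k
        self.centroids: List[Vector] = []
        self.experts: List[SigmoidUnit] = []

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "HierarchicalClassifier":
        self.centroids = kmeans(X, self.k)
        route = [nearest(self.centroids, x) for x in X]
        self.experts = []
        for j in range(self.k):
            Xj = [x for x, r in zip(X, route) if r == j]
            yj = [t for t, r in zip(y, route) if r == j]
            # Assumed fallback: an empty cluster's expert is trained on all the data.
            self.experts.append(SigmoidUnit().fit(Xj or list(X), yj or list(y)))
        return self

    def predict_proba(self, x: Vector) -> float:
        return self.experts[nearest(self.centroids, x)].predict_proba(x)


# XOR-like toy data: the label depends on the sign of x[1], flipped between two clusters.
X = [[-2.0, v] for v in (-1.0, -0.5, 0.5, 1.0)] + [[2.0, v] for v in (-1.0, -0.5, 0.5, 1.0)]
labels = [0, 0, 1, 1, 1, 1, 0, 0]
flat = SigmoidUnit().fit(X, labels)
hier = HierarchicalClassifier(k=2).fit(X, labels)
for x, t in zip(X, labels):
    print(x, t, round(flat.predict_proba(x), 2), round(hier.predict_proba(x), 2))
```

On this XOR-like toy data, the single unit trained on all eight points predicts exactly 0.5 everywhere because its gradients cancel by symmetry. The two-stage model, by contrast, labels every training point correctly. The example only illustrates why decomposition can help. It is not a result of the project.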
This research will (a) quantify the improvement in performance when a classification problem is systematically broken down into subproblems, (b) quantify the advantages of combining different types of classifiers, (c) create a library of reusable neural network classification models, data pre-processing tools, and evaluation tools that use standardized interfaces, and (d) foster dissemination of classification models and the use of the pre-processing and evaluation tools by making them available to other researchers through the World Wide Web. We will test four hypotheses: (1) Combinations of different modalities of classifiers perform significantly better than isolated models. (2) Hierarchical and sequential neural networks perform better than standard neural networks. (3) Unsupervised models can decompose a problem for hierarchical or sequential neural networks better than models that use prior knowledge. (4) It is possible to build a Classification Tool Kit composed of data pre-processing modules, classification models, and evaluation modules in which the components are independent, reusable, and interchangeable.
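Hypotheses (1) and (4) can be sketched around a shared interface. The sketch assumes a minimal `Classifier` protocol with `fit` and `predict_proba`. Any component that implements it can be passed to the same cross-validation and evaluation code: a baseline, a k-nearest-neighbour model, or an averaging combiner of heterogeneous models. Resolution is measured here as the area under the ROC curve (c-index), and calibration as a simple calibration-in-the-large difference. These metric choices, the averaging combiner, and all names (`PrevalenceModel`, `KNNModel`, `AveragingEnsemble`, `kfold`, `cross_validate`) are illustrative assumptions. A pre-processing component would plug in the same way but is omitted for brevity.

```python
import random
from typing import Callable, List, Protocol, Sequence, Tuple

# Hypothetical sketch: the interface, models, and metric choices are illustrative assumptions.
Vector = List[float]


class Classifier(Protocol):
    """Standardized interface assumed for every interchangeable model component."""
    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "Classifier": ...
    def predict_proba(self, x: Vector) -> float: ...


class PrevalenceModel:
    """Baseline: always predicts the positive rate observed in training."""

    def fit(self, X, y):
        self.rate = sum(y) / len(y)
        return self

    def predict_proba(self, x):
        return self.rate


class KNNModel:
    """Predicts the fraction of positives among the k nearest training cases."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict_proba(self, x):
        ranked = sorted(self.data, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
        top = ranked[: self.k]
        return sum(t for _, t in top) / len(top)


class AveragingEnsemble:
    """Combines heterogeneous classifiers by averaging their predicted probabilities."""

    def __init__(self, *members: Classifier):
        self.members = members

    def fit(self, X, y):
        for m in self.members:
            m.fit(X, y)
        return self

    def predict_proba(self, x):
        return sum(m.predict_proba(x) for m in self.members) / len(self.members)


def kfold(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffled k-fold split as (train_indices, test_indices) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds if f is not fold for j in f], fold) for fold in folds]


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Resolution as the c-index: P(a positive scores above a negative), ties count 1/2."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean predicted probability minus observed event rate (0 = calibrated on average)."""
    return sum(p) / len(p) - sum(y) / len(y)


def cross_validate(make_model: Callable[[], Classifier], X, y, k: int = 4) -> Tuple[float, float]:
    """Evaluates any component on pooled out-of-fold predictions."""
    preds = [0.0] * len(y)
    for train, test in kfold(len(y), k):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model.predict_proba(X[i])
    return auc(y, preds), calibration_in_the_large(y, preds)


X = [[v] for v in (-5.0, -4.5, -4.0, -3.5, -3.0, -2.5, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)]
y = [0] * 6 + [1] * 6
candidates = {
    "baseline": PrevalenceModel,
    "knn": KNNModel,
    "combined": lambda: AveragingEnsemble(KNNModel(), PrevalenceModel()),
}
for name, factory in candidates.items():
    resolution, calibration = cross_validate(factory, X, y)
    print(f"{name}: AUC={resolution:.3f}, calibration-in-the-large={calibration:+.3f}")
```

`cross_validate` relies only on the two interface methods. Swapping one model for another, or for a combination of models, therefore requires no change to the evaluation code. That is the kind of interchangeability the Tool Kit hypothesis describes.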

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM006538-02
Application #
2897393
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Bean, Carol A
Project Start
1998-09-30
Project End
2001-08-31
Budget Start
1999-09-01
Budget End
2001-08-31
Support Year
2
Fiscal Year
1999
Name
Brigham and Women's Hospital
DUNS #
071723621
City
Boston
State
MA
Country
United States
Zip Code
02115