Hospital clinical laboratory tests are a major source of the medical information used to diagnose, treat, and monitor patients. Errors in these tests lead to delays, additional clinical evaluation, additional expense, and sometimes erroneous treatments that increase risk to patients. One recent study suggests that errors in measured total blood calcium concentration due to instrument mis-calibration alone cost from $60M to $199M annually in the US. However, the vast majority of clinical laboratory errors do not originate in instrument mis-calibration. Clinical laboratory errors affect about 0.5% of samples collected. Approximately 75% of these errors originate in the pre-analytic phase, that is, during sample collection, transport, and storage before samples reach the analysis instruments. Yet the quality control measures standard in hospital clinical laboratories monitor only instrument calibration and are therefore completely blind to sample faults introduced in the pre-analytic phase, where most errors originate. Data derived from patient samples, rather than from instrument calibration checks, hold the key to detecting faults introduced in the pre-analytic phase. Current methods are either so insensitive to errors that they do not detect sample faults reliably, or they routinely flag normal samples as faulty.

This project brings together an interdisciplinary team of researchers from Oregon Health and Science University and Northeastern University, with expertise in machine learning, signal processing, and laboratory medicine, to develop and apply statistical machine learning technology that reliably detects errors in hospital clinical laboratory tests using data derived from patient samples. The primary obstacle to developing reliable statistical detectors for lab errors is the cost of labeling samples combined with the low error rate. Developing and evaluating any automated error-detection algorithm requires a sufficient number of samples, both faulty and non-faulty. Determining which tests are faulty requires a clinical lab expert to review the tests and other patient data (e.g., charts), a time-consuming and economically infeasible prospect given the low fault rate. The project addresses this challenge through active learning paradigms that select subsets of the data, with emphasis on rare classes, for labeling by human experts. The project focuses on chronic kidney disease because of its medical importance and the large data repository available at Oregon Health and Science University. This research will provide algorithms for clinical lab error detection that will extend to tests used for other diseases (for example, diabetes and heart failure).
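To make the labeling-cost argument concrete, the Python sketch below first estimates how many randomly chosen samples an expert would have to review to find a given number of faulty ones at the 0.5% error rate cited above, and then shows one simple way to prioritize samples for expert review. It is an illustration only and not the project's algorithm: the function names, the one-shot ranking by median absolute deviation (a stand-in for a real active learning loop, which would retrain a model after each round of labels), and the example calcium values are all assumptions.

```python
import math
import statistics


def expected_reviews(target_faulty: int, fault_rate: float) -> int:
    """Expected number of randomly chosen samples an expert must review
    to find `target_faulty` faulty samples at the given fault rate."""
    return math.ceil(target_faulty / fault_rate)


def select_for_labeling(values: list[float], budget: int) -> list[int]:
    """Return the indices of the `budget` samples farthest from the median,
    measured in units of the median absolute deviation (MAD).

    Hypothetical selection rule: it favors unusual results so that the
    expert's limited labeling time is spent where faults are more likely.
    """
    median = statistics.median(values)
    # Fall back to 1.0 if all deviations are zero, to avoid dividing by zero.
    mad = statistics.median([abs(v - median) for v in values]) or 1.0
    scores = [abs(v - median) / mad for v in values]
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical total calcium results (mg/dL); the values are made up.
calcium = [9.4, 9.5, 9.6, 9.5, 14.2, 9.3, 9.5, 5.1]

print(expected_reviews(100, 0.005))      # reviews needed under random sampling
print(select_for_labeling(calcium, 2))   # indices sent to the expert first
```

With these inputs, random sampling would require about 20,000 expert reviews to collect 100 faulty samples, while the ranking sends the two most unusual results (indices 4 and 7) to the expert first. An unusual result is not necessarily a faulty one, which is why expert labels are still needed.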

Ultimately, the error-detection algorithms developed in this research will make their way into clinical laboratory information systems and, through commercialization, into deployment on a scale large enough to have a widespread positive impact on laboratory costs and patient risk. The project provides cross-disciplinary training in statistical pattern recognition and clinical laboratory science for graduate and undergraduate students. Additional information about the project can be found at:

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer: Sylvia J. Spengler
Oregon Health and Science University
United States