High-throughput flow cytometry (HT-FC) is an emerging cell-analysis and screening technique employed in various fields of the life sciences, including drug discovery and clinical research. One of the major limitations of HT-FC is the lack of robust, rapid, and reproducible tools for data analysis and data mining. The current paradigm of flow cytometry (FC) analysis does not suit the HT format well. Traditionally, FC data are analyzed by interactive exploratory visualization, which requires preparing a number of 2-D scatter plots that an FC operator or researcher uses to evaluate sample characteristics visually. Although the recent interest of the computer science and bioinformatics communities in FC has spurred the development of automated compensation and gating techniques, the algorithms proposed to date still follow the traditional analysis pathway (compensation plus gating) and typically attempt to mimic trained human operators in delineating cell populations defined by the presence of fluorescent markers of varying intensities. Unfortunately, this model is not sustainable when hundreds or thousands of data sets must be processed in real time. The proposed research attempts to radically reinvent the data-analysis pipeline for HT-FC by applying spectral classification approaches to FC data. In the proposed framework, FC data will be modeled as a mixture of signals that can be quantitatively recovered if certain physical and biological constraints describing the experimental system are rigorously followed. We propose a set of algorithms that will allow us first to define and encode the domain knowledge describing the analyzed specimens, then to approximate the concentrations of labels, and from there to recover information about the presence or absence of specific phenotypes of interest. The techniques employed will functionally replace two steps in FC data analysis that have traditionally been viewed as separate: compensation and gating.
Instead, a new iterative spectral classification process will recover the quantitative characteristics of samples. This will allow fast, automated extraction of sample features, as well as mining of the collected specimens for similar datasets. The proposed algorithm will be prototyped in the R language for statistical computing, and relevant procedures will be made available to other researchers in the field of FC via the Bioconductor project. Upon successful testing and validation using various datasets contributed by collaborators, the classification algorithms will be implemented in PlateAnalyzer, an HT-FC data analysis package developed at Purdue University.
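As a rough illustration of the mixture model described above, and not the proposed algorithm itself, the following sketch treats the detector readings of a single cell as a linear mixture of known label spectra and recovers label abundances under one physical constraint: abundances cannot be negative. It then flags a label as present when its abundance exceeds a cutoff. The use of projected gradient descent, the spectra, the thresholds, and the function names `unmix_nonnegative` and `call_phenotype` are all assumptions made for this example. Python is used here only for illustration.

```python
# Illustrative sketch only: spectra, thresholds, and function names are hypothetical.

def unmix_nonnegative(spectra, observed, iterations=2000):
    """Estimate non-negative label abundances a so that spectra * a ~ observed.

    spectra:  one row per detector, one column per fluorescent label.
    observed: detector intensities measured for a single event (cell).
    Minimizes the squared residual by projected gradient descent.
    """
    n_det = len(spectra)
    n_lab = len(spectra[0])
    # Normal-equation terms: gram = S^T S, rhs = S^T y.
    gram = [[sum(spectra[d][i] * spectra[d][j] for d in range(n_det))
             for j in range(n_lab)] for i in range(n_lab)]
    rhs = [sum(spectra[d][i] * observed[d] for d in range(n_det))
           for i in range(n_lab)]
    # The trace of gram bounds its largest eigenvalue, so this step size is safe.
    step = 1.0 / sum(gram[i][i] for i in range(n_lab))
    abundances = [0.0] * n_lab
    for _ in range(iterations):
        grad = [sum(gram[i][j] * abundances[j] for j in range(n_lab)) - rhs[i]
                for i in range(n_lab)]
        # Gradient step, then projection onto the non-negative orthant.
        abundances = [max(0.0, abundances[i] - step * grad[i])
                      for i in range(n_lab)]
    return abundances


def call_phenotype(abundances, thresholds):
    """Mark each label as present (True) when its abundance exceeds its cutoff."""
    return [a > t for a, t in zip(abundances, thresholds)]


if __name__ == "__main__":
    # Hypothetical emission spectra: 3 detectors (rows) x 2 labels (columns).
    spectra = [[0.80, 0.10],
               [0.15, 0.70],
               [0.05, 0.20]]
    event = [84.0, 43.0, 13.0]  # synthetic reading for one cell
    estimate = unmix_nonnegative(spectra, event)
    print(estimate, call_phenotype(estimate, [10.0, 10.0]))
```

In this toy setting a single constrained fit produces the quantities that compensation and gating would otherwise supply in two separate steps. A realistic system would also need to account for detector noise, autofluorescence, and the biological constraints mentioned above.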
Flow cytometry (FC) is an important single-cell analysis tool employed in various clinical and research applications. The current FC data-analysis paradigm relies on an exploratory, interactive model that requires operators to evaluate samples manually, drawing on their expertise and experience. This project attempts to build an automated, robust, reproducible, and operator-independent data-analysis system that can be employed for FC data processing and data mining, limiting subjectivity and enhancing the value of FC techniques.