Recent work in automated image classification has demonstrated that, in several cases, a supervised machine learning approach can equal or even surpass image classification by human experts. It may seem doubtful that machines can surpass a human's pattern recognition skills. However, modern imaging systems provide a machine classifier with far more precise data than the eye provides the human brain, because they exceed the human eye in spatial and spectral resolution as well as in dynamic range. So, even though the human brain is a superior classifier, it is presented with inferior data and must compensate by filling in the gaps. We know that this compensation occurs because it is the basis for many optical illusions.
Automated image classification can be divided into two approaches: model-based and model-free. In traditional model-based systems, a model of what is being imaged is constructed manually and used as the basis for classification or for reporting quantitative information. Model-free systems make no assumptions about the underlying model and attempt to build a classifier using only the information in the images and a pre-classified training set. Model-based systems have the advantage that it is known precisely what the machine is looking at, and the machine can be guided to look at what is judged to be important while ignoring artifacts and noise. Their chief disadvantage is that what is judged to be important is subject to bias: it is an attempt to anthropomorphize the machine so that it sees much as a human does, which is not always appropriate or possible. A second disadvantage is that these machine vision approaches are highly specific to what is being imaged and can seldom be generalized to classify or measure something completely different.
The chief advantage of model-based systems is that they offer a direct and intuitive way to gather quantitative information about the image content. Model-free classification, by contrast, treats all images equivalently and performs exactly the same process whether the classifier is being built for grades of melanoma, sub-cellular organelles, or pollen grains. Our approach to developing classifiers reduces each image to a vector of 'signatures'. Each signature is a numeric value produced by an algorithm sensitive to a specific type of image content, and can be thought of as a sensor for a specific image characteristic (textures, intensity statistics, distribution of objects, etc.). A large collection of signatures (more than 800 in our case) ensures that a sufficient variety of sensors is available for many kinds of images. Each image is thus represented as a point in a space of more than 800 dimensions. This presents a problem: any collection of images will populate such a space very sparsely, and all images within it will be nearly equidistant, making it impossible to distinguish clusters or classes. We therefore build our classifiers in an iterative process that eliminates signatures with weak classification power, reducing the dimensionality of this space in a systematic, automated way. The algorithm usually converges to one or two dimensions more than the number of classes in the training set. The product of this training is a naive Bayesian network capable of classifying images that were not part of the original training set. One strength of Bayesian networks is that the result of classifying an image is a probability distribution over all of the classes in the training set. This distribution provides a quality of fit, which is an important metric for rejecting unclassifiable images.
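As a minimal sketch of the signature idea, the Python below reduces an image to a short numeric vector by concatenating the outputs of a few "sensor" functions. The three sensors (intensity moments, gradient-magnitude statistics, and a coarse intensity histogram) and the names `signature_vector` and `SIGNATURE_ALGORITHMS` are hypothetical stand-ins chosen for illustration. They are not the signature algorithms used in our classifiers, which number more than 800. Images are assumed to be 2-D NumPy arrays scaled to [0, 1].

```python
import numpy as np


def intensity_signatures(img):
    """Intensity moments: mean, standard deviation, skewness, excess kurtosis."""
    x = img.astype(float).ravel()
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd if sd > 0 else np.zeros_like(x)
    return [mu, sd, (z ** 3).mean(), (z ** 4).mean() - 3.0]


def edge_signatures(img):
    """Mean and spread of gradient magnitude: a crude edge/texture sensor."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return [mag.mean(), mag.std()]


def histogram_signatures(img, bins=8):
    """Fraction of pixels in each of `bins` equal intensity ranges over [0, 1]."""
    counts, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return (counts / max(counts.sum(), 1)).tolist()


# Hypothetical sensor set; the real signature set is far larger.
SIGNATURE_ALGORITHMS = [intensity_signatures, edge_signatures, histogram_signatures]


def signature_vector(img):
    """Concatenate every sensor's output into one fixed-length vector."""
    return np.concatenate(
        [np.asarray(f(img), dtype=float) for f in SIGNATURE_ALGORITHMS]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # synthetic stand-in for a microscope image
    vec = signature_vector(image)
    print(vec.shape)  # (14,)
```

Adding a sensor means appending another function to the list. Every image, whatever its subject, passes through the same fixed set of sensors, which is what makes the approach model-free.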
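The claim that images become nearly equidistant in a high-dimensional signature space can be checked numerically. The sketch below assumes uniformly distributed random points rather than real image signatures. It measures the relative contrast, (max - min) / min, between the distances from one point to its farthest and nearest neighbors. The function name `distance_contrast` is hypothetical.

```python
import numpy as np


def distance_contrast(n_points, dim, seed=0):
    """(max - min) / min of the distances from one random point to all others.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim, an
    assumption standing in for real image signatures.
    """
    rng = np.random.default_rng(seed)
    pts = rng.random((n_points, dim))
    d = np.linalg.norm(pts[1:] - pts[0], axis=1)
    return (d.max() - d.min()) / d.min()


if __name__ == "__main__":
    for dim in (2, 10, 100, 800):
        print(f"dim={dim:4d}  contrast={distance_contrast(200, dim):.3f}")
```

As the dimension rises from 2 toward 800, the contrast should shrink sharply. This loss of neighborhood structure is what motivates discarding weak signatures before classifying.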
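One way to implement iterative elimination of weak signatures is sketched below. It uses a Fisher discriminant score (between-class scatter of the class means divided by within-class scatter) to measure each signature's classification power. It uses leave-one-out accuracy of a nearest-centroid classifier to pick the final subset. Both choices, the 20% drop fraction, and all function names are illustrative assumptions, not a description of our actual procedure. `X` is an (images × signatures) array and `y` holds the class labels.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-signature ratio of between-class to within-class scatter."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)


def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-centroid classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(pred == y[i])
    return correct / n


def eliminate_weak_signatures(X, y, drop_frac=0.2, min_keep=2):
    """Repeatedly drop the lowest-scoring signatures; keep the best subset seen."""
    # Put signatures on a common scale using the pooled within-class spread.
    resid = np.concatenate([X[y == c] - X[y == c].mean(axis=0) for c in np.unique(y)])
    Xs = X / (resid.std(axis=0) + 1e-12)
    keep = np.arange(X.shape[1])
    best_keep, best_acc = keep, loo_centroid_accuracy(Xs, y)
    while len(keep) > min_keep:
        scores = fisher_scores(Xs[:, keep], y)
        n_drop = min(max(1, int(drop_frac * len(keep))), len(keep) - min_keep)
        keep = keep[np.argsort(scores)[n_drop:]]  # discard the weakest n_drop
        acc = loo_centroid_accuracy(Xs[:, keep], y)
        if acc >= best_acc:  # ties favor the smaller signature set
            best_keep, best_acc = keep, acc
    return np.sort(best_keep), best_acc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1, 2], 30)
    X = rng.normal(size=(90, 40))       # 40 synthetic signatures
    X[:, :3] += 4.0 * np.eye(3)[y]      # only signatures 0-2 carry class information
    kept, acc = eliminate_weak_signatures(X, y)
    print("kept signatures:", kept, "LOO accuracy:", round(acc, 3))
```

Because ties in accuracy go to the smaller set, the procedure drifts toward the lowest dimensionality that still preserves classification power.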
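The per-class probability distribution can be illustrated with a Gaussian naive Bayes classifier. This is a common form of naive Bayesian classifier in which each signature, within each class, is modeled as an independent normal distribution. The sketch below assumes that model. Using the maximum posterior probability as the quality-of-fit score and 0.8 as the rejection threshold are hypothetical choices for illustration. `GaussianNaiveBayes` and `classify_or_reject` are names introduced here.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with an independent normal distribution per (class, signature)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small variance floor avoids division by zero for constant signatures.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict_proba(self, X):
        X = np.atleast_2d(X)
        diff = X[:, None, :] - self.means_[None, :, :]  # (images, classes, signatures)
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_) + diff ** 2 / self.vars_).sum(axis=2)
        log_post = log_lik + self.log_priors_
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)  # one distribution per image


def classify_or_reject(model, X, min_confidence=0.8):
    """Return the most probable class, or None when its probability is too low."""
    proba = model.predict_proba(X)
    labels = model.classes_[proba.argmax(axis=1)].astype(object)
    labels[proba.max(axis=1) < min_confidence] = None
    return labels, proba


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
    y = np.repeat([0, 1], 200)
    model = GaussianNaiveBayes().fit(X, y)
    labels, proba = classify_or_reject(model, [[0, 0], [4, 4], [2, 2]])
    print(labels)
    print(proba.round(3))
```

An image lying between the two training classes, such as the midpoint above, receives a flat distribution. It will typically fall below the threshold and come back labeled `None`, so it can be routed for manual review instead of being forced into the nearest class.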
We have tested the generality of this classification approach on 10 different imaging problems thus far and were able to classify images reliably in all cases. With these results in hand, we are ready to apply this new type of assay system to three problems in biology that urgently need robust automated image analysis: the effects of aging on tissue morphology, high-content screening assays, and automated diagnostics. To study the effects of aging on tissue, we are collaborating with Catherine Wolkow of the LNS (NIA-IRP) on a longitudinal study of aging in the nematode C. elegans. The central question of this collaboration is: in a population of genetically identical worms living in close proximity, what makes individual worms age at different rates and die at different times? We plan to use our classification tools to assess structural changes in several tissues, including the pharynx, gonad, and several sections of the gut. We will then correlate these changes with functional assays such as pharynx pumping rate and motility, both of which are known predictors of lifespan. We have also recently shown that pharynx pumping rate correlates with morphology as assessed by manual scoring. The effects of aging and diet on mouse tissue morphology will be studied in collaboration with Kevin Becker, using tissue arrays from calorically restricted and normally fed mice prepared for the AGEMAP project. We plan to use our classifiers to detect morphological differences in different tissues as a consequence of aging and diet, to measure the aging rates of different tissues, and to measure the influence of diet on these rates. With Dr. Mark Eckley, we plan to establish a high-density, high-throughput RNAi screening platform using cells grown on microscope slides printed with double-stranded RNA. Preliminary results indicate that our classifiers can detect cells accumulating at different stages of the cell cycle.
We plan to look for genes whose inactivation leads to the accumulation of cells at specific stages of the cell cycle. We will also examine the effect of gene knockdown on the morphology of sub-cellular organelles and structures, including the nucleus, cytoskeleton, and mitochondria. Dr. Nikita Orlov and Tomasz Macura will extend our classifiers to report not only a qualitative "class" but also a quantitative measure of similarity. We plan to use these similarity measures to determine an absolute morphological age in our worm and tissue array experiments, and to sub-classify morphologies within our high-throughput RNAi screens. A preliminary result shows that, when trained with images of very young and very old worms, our classifier can report a score that can be used to interpolate an intermediate age.
Tomasz Macura, a student in the Cambridge-NIH Graduate Partnerships Program, is collaborating with Nicholas Screaton (Papworth Hospital NHS Trust) on a study entitled "Characterisation of interstitial lung disease and pulmonary hypertension using density and texture based analysis of computed tomography and histological data." This is a retrospective study of patients with pathologically proven interstitial lung disease who were imaged with computed tomography (CT). An initial learning set of 20-30 cases will be evaluated both subjectively, by experienced cardiothoracic radiologists, and by the classifier system developed for image-based assays. A test set of 20-30 cases will then be evaluated to assess the sensitivity, specificity, and accuracy of the classifier.
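A simple way to turn class probabilities into a continuous score is to attach a numeric value to each training class and take the probability-weighted average. The sketch below assumes this scheme. The function name `interpolated_value` and the example ages are hypothetical and are not taken from our worm experiments.

```python
import numpy as np


def interpolated_value(class_probs, class_values):
    """Probability-weighted average of the numeric value attached to each class."""
    p = np.asarray(class_probs, dtype=float)
    v = np.asarray(class_values, dtype=float)
    if p.shape != v.shape:
        raise ValueError("expected exactly one value per class")
    return float(np.dot(p / p.sum(), v))


if __name__ == "__main__":
    young_day, old_day = 2.0, 20.0  # hypothetical ages of the two training classes
    print(interpolated_value([0.5, 0.5], [young_day, old_day]))    # 11.0
    print(interpolated_value([0.75, 0.25], [young_day, old_day]))  # 6.5
```

Because the score is a convex combination of the class values, it always lies between the youngest and oldest training ages. Under this scheme it can interpolate but not extrapolate.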
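Sensitivity, specificity, and accuracy follow from the counts of true and false positives and negatives on the test set: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / N. The Python sketch below assumes a binary framing, such as disease present versus absent. It treats a rejected (unclassified) prediction as a negative call, which is one of several reasonable conventions. The labels in the usage lines are made up for illustration and are not study data.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN): fraction of true positives detected
    specificity: float  # TN / (TN + FP): fraction of true negatives cleared
    accuracy: float     # (TP + TN) / N: fraction of all cases classified correctly


def evaluate(y_true, y_pred, positive=1):
    """Score predictions against reference labels for one positive class.

    Any prediction other than `positive`, including a rejection (None),
    is counted as a negative call.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return BinaryMetrics(sens, spec, (tp + tn) / len(pairs))


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up labels, not study data
    calls = [1, 1, 1, 0, 0, 0, 0, 0]
    print(evaluate(truth, calls))
    # BinaryMetrics(sensitivity=0.75, specificity=1.0, accuracy=0.875)
```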