The use of automated microscopy is increasing rapidly in academic biological research, as well as in the pharmaceutical and biotechnology industries. Image analysis software can extract information-rich cellular data from microscopy images and eliminate tedious visual inspection. However, a significant challenge has been the lack of metrics and user-friendly software tools to explore the data and identify low-quality images that limit an experiment's value. This project will create tools to address these challenges and test them in the context of two biological experiments. It will also develop and characterize metrics of image quality; the final software will be distributed via the CellProfiler project as a validated, versatile, open-source toolbox of algorithms and metrics readily usable by biologists.

The research will improve data quality in a wide variety of biological experiments addressing important questions in basic science. Furthermore, the education and outreach efforts will increase the number of scientists trained in image analysis at the interface of biology and computer science, will increase the number of students interested in science, and will broaden the participation of people from under-represented minority groups in science, especially at the highest levels of achievement.

Project Report

High-content imaging (HCI), the use of automated microscopy to collect measurements from cellular images on a large scale, is a fairly recent innovation in biological research. This approach has proven invaluable in uncovering the processes that maintain both normal and pathological biological function. Hundreds of observable features (i.e., the cellular phenotype) can be collected quickly in an automated and robust fashion, permitting quantification of cellular changes arising from chemical or genetic treatments. However, the statistical robustness of this approach is only as good as the quality of the image data used as input. This concern affects HCI in general, but it is especially significant in two areas: (1) in signature-based experiments, where fluorescent probes are used to quantify a broad spectrum of subtle morphological responses in each sample; and (2) in time-lapse imaging, where dynamic cellular behavior is observed over time. In both instances, low-quality data can irretrievably corrupt the downstream analysis: even minor artifacts can overwhelm the fine details needed to detect differences in the former case, and degrade the ability to track a cell accurately in the latter. This Research Initiation Grant (RIG) focused on the need to develop specialized metrics to detect problematic images, and to provide them to researchers in the form of readily accessible, easy-to-use software.

Over the course of the RIG, a suite of metrics was implemented and evaluated for the purpose of HCI quality assessment. The measures were tailored towards the most common image aberrations, namely out-of-focus images and saturated debris, and ranged from image intensity statistics to pixel-to-pixel correlations, among others. The assessment was performed on a test set of synthetic HCI images (to better control and examine the impact of cell count) and on actual HCI image data.
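
To make the kinds of measures described above concrete, the following is a minimal sketch, not the project's implementation, of two simple image-quality scores: the percentage of pixels at the maximal intensity (a saturation indicator) and the slope of the radially averaged power spectrum on log-log axes (a focus indicator). The function names, the 3x3 box blur used as a stand-in for defocus, and the choice of a log-log slope as the power-spectrum summary are all assumptions made for illustration. The naive DFT is only practical for tiny images; an FFT library would be used in practice.

```python
import cmath
import math
import random


def percent_maximal(image, max_value=None):
    """Percentage of pixels sitting at the maximal intensity value.

    If `max_value` is None, the brightest value present in the image is used.
    """
    pixels = [p for row in image for p in row]
    if max_value is None:
        max_value = max(pixels)
    return 100.0 * sum(1 for p in pixels if p == max_value) / len(pixels)


def power_spectrum(image):
    """Naive 2-D DFT power |F(u, v)|^2. O(N^4), so only for tiny demo images."""
    h, w = len(image), len(image[0])
    power = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += image[y][x] * cmath.exp(1j * angle)
            power[u][v] = abs(acc) ** 2
    return power


def power_loglog_slope(image):
    """Least-squares slope of log(radially averaged power) versus log(radius)."""
    h, w = len(image), len(image[0])
    power = power_spectrum(image)
    sums, counts = {}, {}
    for u in range(h):
        for v in range(w):
            # Fold frequencies so the radius measures distance from the DC term.
            r = int(round(math.hypot(min(u, h - u), min(v, w - v))))
            if 1 <= r <= min(h, w) // 2:
                sums[r] = sums.get(r, 0.0) + power[u][v]
                counts[r] = counts.get(r, 0) + 1
    xs = [math.log(r) for r in sorted(sums)]
    ys = [math.log(sums[r] / counts[r]) for r in sorted(sums)]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def box_blur(image):
    """3x3 mean filter with wrap-around edges, a crude stand-in for defocus."""
    h, w = len(image), len(image[0])
    return [[sum(image[(y + dy) % h][(x + dx) % w]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]


# Hypothetical demo data: a 16x16 noise image and a blurred copy of it.
rng = random.Random(0)
sharp = [[rng.random() for _ in range(16)] for _ in range(16)]
blurred = box_blur(sharp)
print("sharp slope:  ", power_loglog_slope(sharp))
print("blurred slope:", power_loglog_slope(blurred))
```

Blurring removes high-frequency power, so the blurred image should yield a more negative slope than the sharp one. A perfectly uniform image would need special handling in this sketch, since its power is zero away from the DC term and the logarithm is undefined.
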
A metric summarizing the image power spectrum scored the highest for out-of-focus detection, combining robust performance across cell counts with low variance, whereas the percentage of pixels at the maximal intensity value performed well for saturation artifacts. These metrics were incorporated into a signature extraction experiment, with a resultant improvement in the downstream data analysis. These measures are now included as part of the CellProfiler biological image analysis and CellProfiler Analyst data exploration packages; both are freely available, open-source programs widely used in the biological community, enabling broad dissemination of the findings of this RIG aim. The results were also published in the Journal of Biomolecular Screening, which is widely read by the HCI community. Moreover, these findings constitute important sections of a chapter contributed to the Assay Guidance Manual, a public resource published by the NIH, as well as an upcoming chapter in Methods in Molecular Biology, which publishes step-by-step protocols for wider use.

For the time-lapse research aim, a new set of metrics was developed for identifying problems in cell tracking. By treating the tracked cell trajectory as a network graph, heuristics from graph theory could be employed to detect tracking aberrations. An entirely new software tool, CellProfiler Tracer, was developed to provide these metrics to biologists, and is included as part of CellProfiler Analyst for broader dissemination and use. The Tracer package presents the user with a set of interactive plots familiar to those working in the time-lapse domain, and allows for selection of individual points, viewing the associated cells in their temporal context, color-coding the display based on cellular measurements, and removal of points based on quality criteria.
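
The graph view of tracking described above can be illustrated with a small sketch. It assumes each detected cell is a node identified by a hypothetical (frame, label) pair and each frame-to-frame link is a directed edge. The two heuristics shown (flagging short branches that end soon after a split, and flagging nodes with more than one parent) are illustrative examples only and are not taken from CellProfiler Tracer.

```python
from collections import defaultdict


def build_graph(links):
    """Build child and parent adjacency maps from (parent, child) node pairs."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)
        parents[child].append(parent)
    # Return plain dicts so later lookups cannot silently add keys.
    return dict(children), dict(parents)


def short_terminal_branches(links, min_length=3):
    """Branches that start at a split and die out in fewer than `min_length` nodes.

    Such stubs are often segmentation or linking errors rather than real
    divisions; the threshold is an assumption to be tuned per experiment.
    """
    children, _ = build_graph(links)
    flagged = []
    for node, kids in children.items():
        if len(kids) < 2:
            continue  # not a split
        for kid in kids:
            branch = [kid]
            # Walk forward while the track continues without splitting again.
            while len(children.get(branch[-1], [])) == 1:
                branch.append(children[branch[-1]][0])
            ends_in_leaf = not children.get(branch[-1])
            if ends_in_leaf and len(branch) < min_length:
                flagged.append(branch)
    return flagged


def merge_nodes(links):
    """Nodes with more than one parent, i.e. two tracks joined into one."""
    _, parents = build_graph(links)
    return [node for node, ps in parents.items() if len(ps) > 1]


# Hypothetical track: one long trajectory, a one-frame stub after a split at
# frame 2, and a second track that merges into the first at frame 5.
links = [
    ((0, 1), (1, 1)),
    ((1, 1), (2, 1)),
    ((2, 1), (3, 1)),
    ((2, 1), (3, 2)),
    ((3, 1), (4, 1)),
    ((4, 1), (5, 1)),
    ((4, 3), (5, 1)),
]
print("short branches:", short_terminal_branches(links))
print("merge nodes:   ", merge_nodes(links))
```

Representing tracks as adjacency maps keeps each check a simple degree or path-length query, which is the appeal of the graph formulation; flagged nodes could then be reviewed or removed by the user.
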
The tool was debuted at MICCAI 2014, a prominent conference for those in the medical computing domain, and will soon be submitted for open-access publication to the journal Bioinformatics.

As part of the RIG, the PI also presented his work in order to promote teaching and learning. He has led 16 workshops in the use of image analysis software (totaling over 300 attendees), both at his home institution and abroad. In particular, a workshop at Emory University led to an invitation to present at the BioQUEST annual meeting, which is well attended by science teachers at the high-school and undergraduate levels. His work in HCI also led to his being featured in a Boston Globe article, "Why these employees love their jobs." He has also served on career panels for high-school students and undergraduate interns, as a judge for the New England Science Symposium (NESS) and the Annual Biomedical Research Conference for Minority Students (ABRCMS), and on a selection committee for the Broad’s Summer Research Program in Genomics. All of these activities aim to strengthen the pipeline of underrepresented minorities in STEM at higher levels of education.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1119830
Program Officer: Anne Maglia
Project Start:
Project End:
Budget Start: 2011-12-15
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2011
Total Cost: $195,574
Indirect Cost:
Name: Broad Institute, Inc.
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02142