This project will develop a framework to represent, analyze and interpret shapes extracted from images, supporting a wide range of biological investigations. The primary objectives are: (1) to develop a mathematical framework and computational tools for the quantification and analysis of shapes; (2) to integrate these computational models with machine learning and statistical inference methods to enable new discoveries, transforming imaging data into biological knowledge; (3) to deliver novel quantitative methodologies for shape analysis that start from a biological premise, rather than a purely geometric one. The aim is thus not only to quantitatively describe shape, but to develop methods for linking morphological variation to its underlying biological causes. To ensure that the project focuses on the methods that are most promising for biology and most broadly applicable, model and tool development will be guided and supported by a set of diverse case studies, ranging from the sub-cellular to the organismal scale.
Shape represents a complex and rich source of biological information that is fundamentally linked to underlying mechanisms and function. However, shape is still often examined on a qualitative basis in many disciplines in biology, an approach that is time-consuming and prone to human subjectivity. While ad hoc quantitative methods do exist, they are often inaccessible to non-experts and do not easily generalize to a wide variety of problems. The inability of biologists to systematically link shape to genetics, development, environment, function and evolution often precludes advances in biological research spanning diverse spatial and temporal scales, from the movement of molecules within a cell to adaptive changes in organismal morphology. The primary goal of this project is to develop a new suite of widely applicable quantitative methods and tools for the study of biological shape, addressing the significant need for consistent and repeatable analysis of shape data.
Shape is a fundamental characteristic of all biological organisms and a basis for understanding function, development, and evolution. The lack of a coherent framework for quantifying and analyzing the diversity of biological shapes, however, has prevented the objective testing of many hypotheses that rely on morphological data. Our collaborative research sought to develop methods that quantify shape and extract biologically meaningful information on morphological variation, using tools adapted from computer vision, machine learning, and applied mathematics. We sought to systematically link shape to genetics, development, function, environment, and evolution, and to apply our methods to a diverse set of biological problems, from understanding how environment influences embryonic development to interpreting adaptive responses and radiations in the paleontological record. Fossil pollen was one biological case study. Pollen comes in a range of shapes and textures. This morphological diversity reflects the taxonomic diversity and evolutionary history of the plants from which the pollen originates. It allows pollen to stand in as evidence of a plant on the paleontological landscape, and to document long-term changes in plant communities. However, the visual identification of fossil pollen is highly subjective. Classifications are restricted to the taxonomic level (order, family, genus) on which experts can agree, often resulting in the loss of whatever data may have been preserved about fossil species – and with it, a large amount of paleoecological and paleoenvironmental information. We began with the premise that computers could do what humans could not – consistently identify small differences in pollen shape and texture. We next developed algorithms that would quantify these differences.
Some of these algorithms adapted traditional machine learning and computer vision approaches, in which an image is described using characteristics that a human analyst would not recognize, e.g. pixel intensity. Other algorithms attempted to describe shape in biologically meaningful ways, such as texture or other visual semantics. We applied these approaches to a diverse set of modern and fossil pollen examples: modern grasses, Ice Age spruce, and 20-million-year-old Venezuelan pollen. We found that our numeric and computational approaches could match or exceed the human analyst in accuracy and consistency. These results lay the groundwork for a new approach to pollen analysis – one that uses machine learning and automation for large-scale analyses of fossil records, leading to more comprehensive, more consistent, and potentially more informative censuses of past vegetation. This has ramifications for a diverse set of fields and industries that rely on pollen analysis, from the honey and petroleum industries to research on plant evolution and paleoclimate to applications in forensics. The project, initiated through the Innovations in Biological Imaging and Visualization Ideas Lab, served as a model for bringing together an extremely diverse group of scientists. A large number of students and trainees were able to participate, including 18 undergraduates, 4 Master's and PhD students, and one postdoctoral associate at the University of Illinois alone. The Illinois group contributed to 25 conference presentations and 10 peer-reviewed papers. To ensure that our work would contribute to the community past the life of this grant, we collaborated in the development of a prototype pollen image database that could be adopted by the community, published our images in public repositories, and released our work as open source software.