Darwin's theory of evolution explained that millions of species are related and dealt biologists and paleontologists the enormous challenge of discovering the branching pattern of the Tree of Life. Work on this great challenge is producing a map of species relatedness through Earth history and answering questions such as "what is the closest relative of a species?" and "what species make important products?" To do this, scientists draw on all heritable features, including genotypes (e.g., DNA) and phenotypes (e.g., anatomy). Studying phenotypes, however, has remained complicated and slow because it has not been revolutionized by computer science and engineering innovations.
A team of biologists, computer scientists, and paleontologists will extend and adapt methods from computer vision, machine learning, crowd-sourcing, and natural language processing to enable rapid and automated study of phenotypes for the Tree of Life on a vast scale. The three-year goal is to release large phenomic datasets built using new methods, and to provide the public and scientific community with tools for future work. The project also plans to train teachers and students (kindergarten through postdoctoral levels) and to engage "citizen scientists." Enormous phenomic datasets, many with images, will serve an important public interest in biodiversity and the fossil record.