The mechanisms living systems use to establish and maintain complex three-dimensional shapes during embryonic development are poorly understood, even though molecular and cell biologists have generated mountains of data about genes and their effects on organisms. Fundamental advances in controlling biological form are stymied by the difficulty of obtaining shape information from the analysis of gene networks, so that it is currently difficult or impossible for scientists to generate testable models of shape from the experimental results of current biological research. The investigators will apply state-of-the-art computational science and artificial intelligence to create a novel suite of computational tools that will integrate numerous areas of biology and engineering to promote research into the mechanisms used by organisms for establishing and maintaining their three-dimensional shape. This "Bioinformatics of Shape" project will integrate experimental data, a new mathematical language, a system for storing and mining data, a modeling environment within which rule sets for regulatory mechanisms can be simulated on computers, and an artificial intelligence module that will help scientists discover and test novel ideas about how shape is generated through genetics. The benefits to society of this new kind of collaboration between computer scientists and biologists include the translation of molecular and cell biological data into a new level of understanding that could have implications for regenerative medicine, for adaptive and self-repairing devices in robotics, and for other engineering applications. The work will provide unique training opportunities for students, establish a proof of principle for new educational tools at the boundary between artificial intelligence and biology, and facilitate data-to-knowledge production in a number of fields, such as developmental biology, evolutionary biology, and the engineering of complex adaptive systems.
An extraordinary effort is currently under way to understand the regenerative abilities displayed by certain organisms and to replicate them in biomedical regenerative applications. Salamanders can regenerate a fully functional limb after its complete amputation, and can partially regenerate their brain, heart, and tail. More remarkably, planarian worms, despite having a complex anatomy that includes muscles, stomach, brain, eyes, and nerve cords, can regenerate their complete bodies from almost any amputated body piece. Indeed, their outstanding regenerative capacity has led them to be considered practically immortal. To decipher the mechanisms controlling these abilities, thousands of molecular perturbation experiments are being conducted, including genetic knockdowns, pharmacological treatments, and surgical manipulations. For example, knocking down certain genes or applying drugs that block communication between cells can induce an individual worm to regenerate two, three, or even more heads! Despite these impressive molecular techniques for perturbing and analyzing the regeneration process in these animals, no comprehensive mechanistic model is available that can explain more than one or two features of this extraordinary regeneration ability. Manually inferring the mechanisms that control the spatial and temporal morphological patterns resulting from these experiments is very challenging because of the complex, non-linear interactions of the biological regulatory networks characteristic of development and regeneration. Furthermore, there is a lack of centralized repositories in which to unambiguously describe and systematically analyze this large body of knowledge, as well as of proper modeling, simulation, and mining tools with which to formulate, test, and discover mechanistic model hypotheses.
In this project, we have created a new bioinformatics methodology to unambiguously describe and centralize the experimental knowledge on regeneration and to automate the reverse engineering of mechanistic models that can explain the regenerative capacity of planarian worms. We have created mathematical formalisms to unambiguously describe any perturbation experiment, including genetic knockdowns, pharmacological treatments, and surgical manipulations, together with its resultant morphological outcome. Using this formalism, we have curated a freely available database of planarian regeneration comprising more than a thousand different experiments published in the scientific literature. In addition, we have created a user-friendly mining tool to access, extend, and search the experiments in this database. We have also designed a mathematical framework for mechanistic models of the regulatory networks of planarian regeneration and developed a novel high-performance molecular simulator of tissue regeneration, capable of simulating the same regenerative experiments stored in the database. In this way, the simulator can automatically evaluate the predictive power of a given model and check its validity against the experimental results obtained with the real worm. Formulating mathematical models whose dynamics match the patterning outcomes found in the experiments is a very complicated undertaking. Consequently, we have designed and implemented a novel machine-learning algorithm to automate the discovery of such models. The algorithm works according to evolutionary principles. Starting with a population of random regulatory networks, the algorithm randomly combines and changes the networks, discarding those with the worst predictive ability. This process is repeated cyclically and, similar to evolution in nature, better and better regulatory networks evolve in the computer.
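The evolutionary search described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the project's implementation: a "network" is reduced to a flat list of interaction weights, the tissue simulator is replaced by a weighted sum, and all names (`predict`, `error`, `evolve`, `TOY_EXPERIMENTS`) are hypothetical.

```python
import random

# Hypothetical toy stand-ins: a "network" is a flat list of interaction
# weights, and each "experiment" pairs a perturbation vector with the
# outcome score that a good network should reproduce.
N_WEIGHTS = 4
TOY_EXPERIMENTS = [
    ([1, 0, 0, 0], 0.5),
    ([0, 1, 0, 0], -0.3),
    ([0, 0, 1, 0], 0.8),
    ([1, 1, 0, 1], 0.3),
]

def random_network(rng):
    """Create one random candidate network."""
    return [rng.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]

def predict(network, perturbation):
    """Placeholder for the simulator: here only a weighted sum."""
    return sum(w * x for w, x in zip(network, perturbation))

def error(network, experiments):
    """Total squared mismatch over all experiments (lower is better)."""
    return sum((predict(network, p) - outcome) ** 2 for p, outcome in experiments)

def crossover(a, b, rng):
    """Randomly combine two parents, weight by weight."""
    return [rng.choice(pair) for pair in zip(a, b)]

def mutate(network, rng, rate=0.3, scale=0.2):
    """Randomly change some weights by adding Gaussian noise."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w for w in network]

def evolve(experiments, pop_size=40, generations=200, seed=0):
    """Evolve a population and return the network with the lowest error."""
    rng = random.Random(seed)
    population = [random_network(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by predictive ability and discard the worst half.
        population.sort(key=lambda n: error(n, experiments))
        survivors = population[: pop_size // 2]
        # Refill the population with mutated recombinations of survivors.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=lambda n: error(n, experiments))
```

Because the surviving half is carried over unchanged in each cycle, the best error in the population can never get worse from one generation to the next. In the real setting, the `error` function would presumably compare simulated morphologies against the outcomes recorded in the curated database rather than compare scalar scores.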
Running this method for thousands of generations, the computer can obtain a regulatory network that correctly predicts all the experiments formally described in the input dataset. We implemented this novel computational methodology on a high-performance supercomputer and applied it to the formalized database of planarian experiments that we curated from the published literature. As a result, the computer automatically discovered the most comprehensive regulatory network model of planarian regeneration to date. This model can predict all the relevant experiments of head-versus-tail planarian regeneration, such as surgical amputations and the regeneration of incorrect morphologies after genetic knockdowns and pharmacological treatments. Furthermore, the model is completely mechanistic and can explain the dynamical behavior and interactions of the specific biological products required for the regulation of regeneration in planaria. The completeness and mathematical nature of the model allow us to generate testable predictions, which we have used to reveal previously unknown pathways and genes that are key to the regenerative abilities of the planarian worm. More broadly, this project has demonstrated for the first time the use of an artificial intelligence approach for the automated discovery of biological models from experimental morphological phenotypes. This serves as a crucial proof of principle for an automated approach to overcoming the fundamental problems holding back the fields of developmental, regenerative, and cancer biology. In summary, the novel bioinformatics system that we have developed in this project has revolutionized the way we store, share, and analyze biological experimental data and, crucially, how we create and validate new mechanistic hypotheses and biomedical applications using an automated artificial intelligence system.