This project develops a new instrument for accurate quantitative analysis of animal movement and vocal expression in real-world scenes, with the aim of facilitating innovative research on animal behavior and neuroscience in complex, realistic environments. While much progress has been made in investigating the brain mechanisms of behavior, this work has been limited primarily to studying individual subjects in relatively simple settings. For many social species, including humans, understanding neurobiological processes within more complex environments is critical because their brains have evolved to perceive and evaluate signals within a social context. Today's advances in video capture hardware and storage, and in computer vision and network science algorithms, make such analysis of animals possible. Past work has relied on subjective and time-consuming observations from video streams, which suffer from imprecision, low dimensionality, and the limitations of the expert analyst's sensory discriminability. This instrument will not only automate the detection of behaviors but also provide an exact numeric characterization in time and space for each individual in the social group. Although it is not explicitly part of the instrument, correlating social context with neural measurements becomes possible with the quantitative description our system provides; this task can be accomplished only once sufficient spatiotemporal precision has been achieved.
The instrument enables research in the behavioral and neural sciences and the development of novel algorithms in computer vision and network theory. In the behavioral sciences, the instrumentation allows the generation of network models of social behavior in small groups of animals or humans. These models can be used to ask questions ranging from how network dynamics influence sexual selection, reproductive success, and even health messaging to how vocal decision making in individuals gives rise to social dominance hierarchies. In the neural sciences, the precise spatiotemporal information the system provides can be used to evaluate the neural bases of sensory processing and behavioral decision making under precisely defined social contexts. Sensory responses to a given vocal stimulus, for example, can be evaluated in light of the context in which the animal heard the stimulus and both its own and the sender's prior behavioral history in the group. In computer vision, we propose novel approaches for the calibration of multiple cameras "in the wild", the combination of appearance and geometry for the extraction of exact 3D pose and body parts from video, the learning of attentional focus among animals in a group, the estimation of sound source, and the classification of vocalizations. In network theory, new approaches will be developed for the hierarchical discovery of behaviors in graphs, the incorporation of interactions beyond the pairwise level with simplicial complexes, and a novel theory of graph dynamics for the temporal evolution of social behavior. Because the instrumentation benefits behavioral and neural scientists, the code and algorithms developed will be open source so that the scientific community can extend them to suit particular applications. The proposed work also impacts computer vision and network science because the fundamental algorithms designed should advance the state of the art.
For performance evaluation of other computer vision algorithms, established datasets will be employed.