To apply machine learning to problems in the physical world, one needs models and algorithms that are faithful to the underlying physics. We consider the problem of understanding how the anatomical structure of the body and ears gives rise to the remarkable ability, innate in most animals and humans, to localize a sound source in a complex and noisy environment. The cues used for localization arise from the scattering of the acoustic wave off the listener's complex-shaped body and ears. Quantitatively, the resulting changes in the sound spectrum are characterized by the head-related transfer function (HRTF). Because every person's body is unique, the HRTF is highly individual. It is possible to measure the HRTF; however, the measurement requires specialized hardware and is tedious. There has been considerable interest in convenient methods for obtaining the HRTF. We propose to develop a machine learning framework that establishes the relationship between anatomy and the HRTF. An HRTF database of 100 subjects, together with their anthropometric measurements, is available. A novel LMA (Learning of Multiple Attributes) algorithm will be developed. The key properties of this algorithm are that it can incorporate physical constraints into the learning and can predict complex structured outputs in continuous spaces. The algorithm will find the low-dimensional manifold in the high-dimensional HRTF space and map the manifold structure to anatomical parameters.
The research will create novel machine learning algorithms that are able to incorporate physics-based constraints, and these algorithms will find application in other problems. Generating HRTFs from simple body measurements will enable personalized spatial audio in fields such as human-computer interaction, consumer electronics, auditory assistive devices for the vision-impaired, robotics, entertainment, education, and surveillance. Training of K-16 and graduate students in the proposed research will add to the nation's talent pool.
The human (and mammalian) ability to perceive the 3D location of a sound source using just two ears is remarkable. One of the key mechanisms that explain this ability is the ability to extract cues from the scattering of sound off the listener's own body and external ears. This scattering is characterized by the head-related transfer function (HRTF), which shows considerable interpersonal variability. A deep understanding of the HRTF would enable better audio displays for the visually impaired and would revolutionize how audio is presented over headphones. This project applied the techniques of statistical machine learning to analyze previously collected HRTF data. Research contributions included the development of a methodology to study HRTFs using Gaussian process regression (GPR). GPR is computationally intensive: exact inference scales cubically with the number of data points. Fast algorithms that exploit the "near-grid" character of HRTF data were developed. These algorithms were generalized and published at a machine learning conference as a general technique for fast GPR. A fast method for interpolating HRTF data was developed and shown to outperform previously proposed interpolation algorithms. A method was developed to fuse HRTF data measured for the same subject at different laboratories. Deep learning techniques were also extended to HRTF data. Graduate student Yuancheng Luo, a U.S. citizen, was supported via this grant. Several peer-reviewed papers were published.
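To illustrate why grid structure makes GPR fast, here is a minimal sketch. It is not the project's published algorithm. It assumes the data lie on a complete two-dimensional grid, whereas the project's methods handled the harder "near-grid" case. It also assumes a squared-exponential kernel on each axis, and the function names `rbf`, `kron_gpr_fit`, and `kron_gpr_predict` are hypothetical. On a full grid the kernel matrix factors as a Kronecker product K1 ⊗ K2. Eigendecomposing the two small factors therefore replaces one dense solve of size n1·n2 with O(n1^3 + n2^3) work plus matrix products.

```python
import numpy as np

def rbf(a, b, lengthscale):
    """Squared-exponential kernel matrix between 1-D input arrays a and b (assumed kernel choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def kron_gpr_fit(x1, x2, Y, ls1, ls2, noise):
    """Return weights A with A.ravel() == (K1 kron K2 + noise*I)^{-1} Y.ravel().

    Y has shape (len(x1), len(x2)): one observation per grid point.
    """
    # Eigendecompose each small per-axis kernel: K = Q diag(lam) Q^T.
    lam1, Q1 = np.linalg.eigh(rbf(x1, x1, ls1))
    lam2, Q2 = np.linalg.eigh(rbf(x2, x2, ls2))
    # Rotate the data into the joint eigenbasis: (Q1 kron Q2)^T vec(Y) == vec(Q1^T Y Q2).
    Yt = Q1.T @ Y @ Q2
    # Eigenvalues of K1 kron K2 are all products lam1[i] * lam2[j]; add the noise variance.
    Yt /= np.outer(lam1, lam2) + noise
    # Rotate back to obtain the GP weight matrix.
    return Q1 @ Yt @ Q2.T

def kron_gpr_predict(x1, x2, A, t1, t2, ls1, ls2):
    """Posterior mean on the test grid t1 x t2, using cross-kernels along each axis."""
    return rbf(t1, x1, ls1) @ A @ rbf(t2, x2, ls2).T

# Usage on synthetic data (the two axes are hypothetical stand-ins for, e.g., direction and frequency).
rng = np.random.default_rng(0)
x1 = np.linspace(0.0, 2.0 * np.pi, 20)
x2 = np.linspace(0.0, 1.0, 15)
Y = np.sin(x1)[:, None] * np.cos(3.0 * x2)[None, :] + 0.05 * rng.standard_normal((20, 15))
A = kron_gpr_fit(x1, x2, Y, ls1=1.0, ls2=0.2, noise=0.01)
Yhat = kron_gpr_predict(x1, x2, A, np.linspace(0.0, 2.0 * np.pi, 50), np.linspace(0.0, 1.0, 30), 1.0, 0.2)
```

On a small grid, comparing the result against a dense solve built with `np.kron` is a quick way to confirm the factorization. Handling missing grid points, as in the near-grid setting, requires additional machinery that this sketch does not attempt.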