This proposal tackles an urgent need for sensitive clinical outcome measures of autism spectrum disorder (ASD) by developing an objective, digital, multi-modal social communication metric using computational linguistics (e.g., acoustic features, turn-taking rates, word frequency metrics). Our automatic speech recognition and natural language analytics approach is designed to fix known weaknesses in traditional measurements by providing granular information in less time, with built-in scalability for characterizing very large samples. Since ASD is defined by observables (e.g., words, sounds, facial expressions, motor behaviors), it is ripe for an automated approach to digitizing behavior. This proposal piggybacks on a recently funded R01 that uses computer vision and machine learning to characterize nonverbal motor synchrony in teens with either ASD or another disorder in a brief social conversation (MH118327, PI: Schultz). Vocal components of the conversation are not studied in MH118327; thus, the richness of the verbal domain is left untapped. We hypothesize that automatically derived spoken language markers will significantly predict group and individual differences in social communication skill, and, when fused with nonverbal features, will lead to better prediction than either modality alone. Together, these two projects represent a rare chance to study all observable social signals emitted during social interaction in the same diverse sample of participants. If funded, this project will be the first to use short conversations and multi-modal data fusion to predict social communication skill and diagnostic group in a large, clinically diverse sample of individuals with ASD and other disorders. Our pilot studies showed that a relatively small set of vocal features from a six-minute interaction predicts diagnosis (ASD vs. typical development [TD]) with 84% accuracy.
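To make the feature families named above concrete, the following is a minimal, hypothetical sketch of how conversational features such as turn-taking rate and lexical diversity could be derived from a diarized transcript. The function name, feature names, and toy transcript are illustrative assumptions, not the proposal's actual analysis pipeline (which also uses acoustic features not shown here).

```python
# Hypothetical sketch: deriving simple conversational features from a
# diarized transcript. Feature names and structure are illustrative only.

def conversation_features(turns, duration_min):
    """turns: list of (speaker, utterance) pairs; duration_min: conversation length in minutes."""
    words = [w.lower() for _, utt in turns for w in utt.split()]
    # Turn-taking rate: number of speaker changes per minute of conversation
    changes = sum(1 for a, b in zip(turns, turns[1:]) if a[0] != b[0])
    return {
        "turn_taking_rate": changes / duration_min,
        "mean_turn_length": len(words) / len(turns),       # words per turn
        "type_token_ratio": len(set(words)) / len(words),  # lexical diversity
    }

# Toy six-minute conversation with two speakers
demo = [("child", "I like trains"), ("examiner", "Tell me more"),
        ("child", "trains go fast"), ("child", "very fast")]
feats = conversation_features(demo, duration_min=6)
```

In a real pipeline, the diarized turns would come from automatic speech recognition output rather than hand-entered text, and feature vectors like this would feed the machine learning classifiers described below.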
These machine learning analyses also predicted social communication skill dimensionally, providing a granular metric of individual differences. Combining this approach with nonverbal metrics (R01MH118327) using decision-level data fusion resulted in significantly better ASD vs. TD prediction (91% accuracy). These pilot results are promising, but several gaps remain.
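The decision-level fusion idea can be sketched in a few lines: each modality's classifier produces its own probability of ASD, and the fused decision combines them. The weighting scheme, probability values, and function name below are illustrative assumptions, not the proposal's actual fusion method or pilot values.

```python
# Hypothetical sketch of decision-level fusion: each modality's classifier
# outputs a probability of ASD, and fusion takes a weighted average.
# The weights and probabilities here are illustrative only.

def fuse_decisions(p_vocal, p_nonverbal, w_vocal=0.5):
    """Weighted average of per-modality ASD probabilities, thresholded at 0.5."""
    fused = w_vocal * p_vocal + (1 - w_vocal) * p_nonverbal
    return fused, ("ASD" if fused >= 0.5 else "TD")

prob, label = fuse_decisions(p_vocal=0.62, p_nonverbal=0.71)
# With equal weights: 0.5 * 0.62 + 0.5 * 0.71 = 0.665 -> "ASD"
```

Fusing at the decision level, rather than concatenating raw features, lets each modality keep its own tuned classifier; the "sophisticated multi-modal fusion methods" in Aim 3 would go beyond this simple averaging.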
In Aim 1 of this proposal, we assess the specificity of our vocal social communication approach by including a non-ASD psychiatric control group in our machine learning classification models, in addition to ASD and TD groups (N=250/group).
In Aim 2, we clinically validate our transdiagnostic dimensional metric in a large, diverse sample of participants.
In Aim 3, we test whether novel, sophisticated multi-modal fusion methods that combine vocal and nonverbal social communication features result in improved individual and group prediction. This proposal lays critical groundwork for an automated, precision medicine approach to studying, diagnosing, and caring for individuals with ASD and other mental health conditions. Successful completion of this project will transform how we quantify human behavior for a broad array of applications that demand efficient, scalable, and reliable measurement (e.g., genetic association studies, clinical trials, and standard clinical care), thus meeting multiple strategic priorities set by NIMH and NIDCD.
At its core, social communication is what you say and what you do while interacting with other people. Problems with social communication affect individuals with ASD throughout their lives, but expensive and time-consuming measurement tools have hindered the development of effective treatments for this core challenge. In this proposal, we aim to improve clinical characterization and diagnostic decision-making in ASD and other psychiatric conditions by digitizing human vocal interaction and using advanced machine learning techniques to fuse audio-video signals into an autism-specific algorithm, thus laying the groundwork for future precision medicine efforts and advancing the state of public health.