Over 30,000 people are killed in motor vehicle crashes on US roadways every year. Driver distraction from secondary in-vehicle activities, particularly among young drivers, has emerged as a major cause of these crashes. A substantial amount of research has analyzed small sets of naturalistic or simulated driving data, studying a small number of features for detecting the driver's engagement. However, many meaningful dependencies and patterns can only be discovered from large collections of data. The goal of this project is to leverage the two-petabyte federal database of naturalistic driving data to develop predictive analytics that detect a driver's disengagement from the driving task, in order to provide alerts to drivers and reduce the risk of motor vehicle crashes.
In this project, data pre-processing techniques are investigated for the large volume of heterogeneous data, with over 100 variables, in the Strategic Highway Research Program 2 (SHRP 2) data. Two families of scalable predictive modeling algorithms, based on instance-based learning and heterogeneous network mining, are developed. In addition, a novel distributed computing infrastructure is developed to support the scalable predictive analytics in pattern mining for driving behavior analysis, modeling, and prediction. The research outcomes of this project provide significant insight into current work on preventing injuries from motor vehicle crashes. The project extends the capabilities of machine learning, sensor informatics, and driving behavior analytics. The integrated education plan includes incorporating the research findings into courses offered in the Master of Science program in Health Informatics. The outreach plan involves organizing workshops, conferences, and seminars to disseminate the research outcomes.