Data science is rapidly evolving as an essential interdisciplinary field in which advances often result from combining ideas from various parts of mathematics, statistics, computer science, the physical sciences, and engineering, as well as other disciplines. New types of (large) data have emerged, presenting unprecedented complexities and challenges that require a new way of thinking. The goal of this Research Training Group (RTG) project is to provide inherently interdisciplinary training to undergraduate and graduate students, as well as postdoctoral fellows, developing skills that transcend individual disciplines. Training students through research involvement in areas that combine statistical and computational modeling and inference will provide the intellectual foundation for a new generation of scientists poised to make novel breakthroughs in this exciting area and to contribute to research and design efforts in the private and public sectors.
This project will catalyze original research on modeling and inference for large datasets. The program will involve three undergraduate students, six graduate students, and two postdoctoral fellows at any given time. The undergraduate component will be an integrated research experience consisting of a seminar course, in which students will learn and present material on key topics, and active participation in research projects, in which these topics are put into practice. Each trainee will work with at least two RTG faculty members to ensure a truly interdisciplinary training experience. A new course in "data science" will be developed to educate trainees across traditional departmental boundaries. Throughout the calendar year, the Research Training Group will sponsor advanced courses and research lectures for the benefit of the graduate and postdoctoral participants. The program builds upon the strengths and interactions of a dynamic group of faculty with expertise in the general area of probabilistic models and methods for computational inference. Trainees will benefit from individualized mentoring activities and participation in structured research groups. The interdisciplinary nature of these activities will produce a new generation of researchers capable of developing new approaches and ideas. Resources and tools for cross-training will be developed and disseminated to the community. Exposure to modern aspects of data science and computing will enhance the professional development of the trainees, and the project will strengthen cutting-edge research at the interface between statistics and computer science.