This project seeks to develop runtime system support for improving the performance and scalability of a wide variety of data-intensive systems written in managed, object-oriented languages. Big Data analytics has clearly become a key component of modern computing. Popular data processing frameworks such as Hadoop, Spark, Naiad, and Hyracks are all developed in managed languages such as Java, C#, or Scala. Developers choose these languages primarily for their fast development cycles, abundant library suites, and strong community support. However, a great deal of evidence shows that memory management in Big Data systems is prohibitively expensive and severely degrades system performance. This project develops a series of runtime techniques that can automatically reduce the temporal and spatial costs of the managed runtime. These techniques allow Big Data developers to fully enjoy the simplicity of managed languages without having to pay the performance price.
Modern life increasingly relies on Big Data analytics systems designed to support many concurrent users and answer their queries quickly. Behind these user-facing services are data-intensive computing systems that must quickly find useful information in a sea of data records. Their performance is therefore critically important to our daily lives. This project provides an immediate performance benefit for such data-intensive systems, leading to improved quality, usability, and user satisfaction. The educational component of this project includes the creation of new course materials, the recruitment of undergraduate students and students from under-represented groups, and the education of local programmers on how to develop highly efficient Big Data applications.