The rapid trend toward multicore architectures promises faster execution of computer programs but poses significant difficulties for software development, owing to the lack of good programming models for exploiting the parallelism in such architectures. This situation presents a significant opportunity for programming-language research: to supply effective languages and tools for writing desktop applications that exploit the performance of multicore hardware. It is well known that functional programming languages provide a good semantic base for concurrent and parallel programming, but for such languages to succeed, they must offer competitive performance. This research focuses on the technical challenges of efficiently implementing parallel functional languages. The characteristics of multicore and many-core architectures demand that implementations preserve sequential semantics in parallel constructs, manage the granularity and scheduling of parallel threads, and manage data locality. The research explores a collection of techniques that combine static program analyses, compiler transformations, and dynamic runtime policies. Empirical analysis of both traditional parallel benchmarks and small applications is used to evaluate the effectiveness of the techniques developed by this research. By addressing performance concerns, the research will enable the practical use of parallel functional languages for a broad range of applications.
The rapid trend toward multicore architectures creates significant difficulties for software development because of the lack of good programming models for exploiting the parallelism in such architectures. One of the main goals of this project is to make parallel programming accessible to a larger community of programmers. It is well known that functional programming languages provide a good semantic base for concurrent and parallel programming, but for such languages to succeed, they must offer competitive performance. This research focuses on the technical challenges in the efficient implementation of parallel functional languages. The main vehicle for this research is the Manticore system, which consists of a parallel dialect of Standard ML, called Parallel ML (PML), and a supporting compiler and parallel runtime system. Results include the design and implementation of the PML language and a variety of compiler and runtime-system techniques that improve the scalability of PML's parallel performance. The prototype implementation developed as part of this project demonstrates sequential performance that is competitive with other functional-language implementations and parallel performance that scales to 48 cores (the largest available test machine). The PML language design consists of a side-effect-free sequential core extended with syntactically lightweight annotations that introduce implicitly threaded fine-grain parallelism and with explicit message-passing concurrency for coarse-grain parallelism. This design allows applications to exploit parallelism at many different levels. The implicitly threaded mechanisms include parallel tuples, which provide fork-join parallelism; parallel bindings, which provide a future-like mechanism; and support for nested data parallelism (NDP) over irregular nested arrays.
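The semantics of the fork-join and future-style mechanisms described above can be sketched in Python using the standard `concurrent.futures` module. This is a loose analogy to illustrate the behavior, not PML syntax or Manticore's implementation; the `fib` function and the chosen arguments are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Naive Fibonacci stands in for an arbitrary pure computation.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor() as pool:
    # Analogue of a parallel tuple: both components may evaluate
    # concurrently, and the tuple is ready once both have finished
    # (fork-join parallelism).
    a, b = pool.submit(fib, 20), pool.submit(fib, 22)
    pair = (a.result(), b.result())

    # Analogue of a parallel binding: a future whose value is
    # demanded only at the point of use, so other work can overlap
    # with its evaluation.
    fut = pool.submit(fib, 18)
    total = pair[0] + pair[1] + fut.result()
```

Because the computations are side-effect free, evaluating them in parallel cannot change the result, which is what lets such annotations remain "syntactically lightweight" hints rather than semantic changes.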
In addition to these constructs, which have deterministic semantics, PML also provides a nondeterministic parallel-case construct to support speculative parallelism. The project has developed a broad range of new implementation techniques to support the efficient and scalable execution of PML programs. A main focus of the research has been effective work-stealing techniques, which balance the workload across processors while minimizing overhead. Another result has been new optimistic-concurrency protocols for PML's message-passing primitives. The runtime system includes a NUMA-aware parallel garbage collector that provides scalable performance. The research has also explored compiler techniques for the efficient execution of NDP computations on shared-memory multiprocessors (previous research on NDP compilation targeted vector hardware), as well as techniques for optimizing the granularity of recursive divide-and-conquer parallel programs. Parallelism is becoming increasingly important for most applications of computation, but existing parallel-programming techniques are not ready for mainstream use. This project is exploring an alternative path that has the potential to provide much higher-level and easier-to-use parallel languages. Such languages can make parallel hardware useful to a broad range of application areas and users. Over the course of this project, five students completed Master's degrees, and three of them went on to receive Doctoral degrees. The project has also provided the opportunity for nine undergraduates to be involved in research.
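The work-stealing discipline mentioned above can be sketched as follows. This is a single-threaded simulation for illustration only, assuming the common Cilk-style design in which each worker owns a deque, takes its own newest task (LIFO, for locality), and steals the oldest task from a victim (FIFO, so thieves get large chunks); it is not Manticore's actual scheduler. The granularity cutoff (`THRESHOLD`) echoes the divide-and-conquer granularity optimization described above; all names and parameters are invented for the example.

```python
import random
from collections import deque

NWORKERS = 4
THRESHOLD = 64  # below this size, run a task sequentially

def run(total_range):
    """Sum the integers in [lo, hi) by divide-and-conquer with work stealing."""
    deques = [deque() for _ in range(NWORKERS)]
    deques[0].append(total_range)        # all work starts on worker 0
    results = [0] * NWORKERS
    idle = 0
    while idle < NWORKERS:               # stop once no worker can find work
        idle = 0
        for w in range(NWORKERS):
            if deques[w]:
                lo, hi = deques[w].pop()             # owner takes newest task
            else:
                victims = [v for v in range(NWORKERS) if deques[v]]
                if not victims:
                    idle += 1
                    continue
                lo, hi = deques[random.choice(victims)].popleft()  # steal oldest
            if hi - lo <= THRESHOLD:                 # granularity cutoff
                results[w] += sum(range(lo, hi))
            else:                                    # split and push both halves
                mid = (lo + hi) // 2
                deques[w].append((mid, hi))
                deques[w].append((lo, mid))
    return sum(results)
```

The load-balancing benefit comes from idle workers pulling work to themselves rather than a central scheduler pushing it, which keeps coordination overhead off the common path where a worker simply pops its own deque.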