The goal of this research is to make parallel computing simple and efficient on general-purpose, commodity processors and networks. Recent work has focused on algorithms for the efficient implementation of software distributed shared memory (DSM) on low-cost, high-performance workstation clusters. This work has shown that, for a variety of scientific applications that synchronize with moderate frequency, software DSM can achieve performance close to that of a hardware implementation of shared memory. The objective is to improve on this work in two ways: (1) to achieve efficient execution of applications that synchronize with greater frequency, and (2) to support languages ordinarily used to solve symbolic problems. Specifically, the following approaches are evaluated: (a) run-time support for programming high-level synchronization operations, such as task queues, that reduces the amount of communication performed by the DSM system; (b) compiler support to reduce the amount, and/or hide the latency, of DSM communication; and (c) garbage collection algorithms integrated with the DSM system to enable the efficient execution of advanced programming languages.