The systolic array is a paradigm for concurrent computation that is well suited to VLSI technology. Many algorithms, for example in digital signal processing and scientific computation, are natural candidates for implementation on systolic arrays. We investigate fundamental systolic algorithms with the goal of realizing them on systolic arrays that are provably both time-minimal and processor-minimal. That is, we design arrays that operate systolically, exploit all the parallelism in the algorithm, and do so using as few processors as possible. The schedules that achieve this kind of optimality are often nonlinear, a significant departure from the extant research on optimal affine schedules for systolic algorithms. This research establishes the best that can be accomplished with systolic realizations of fundamental algorithms. A corresponding systolic programming system is also under investigation, since software tools are critical to realizing the potential of systolic arrays. The system is intended to be a natural, high-level vehicle for conducting research into systolic algorithms and for realizing them optimally on systolic arrays.
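To make the notions of time-minimality and a processor lower bound concrete, the following is a minimal sketch, not the method used in this research. It assumes a hypothetical example: an n x n x n cubical mesh of computations (as arises in matrix product) with unit dependence vectors along each axis. The function names `earliest_schedule` and `processor_lower_bound` are illustrative only. Each node is scheduled at the length of the longest dependence chain ending at it, so the largest time step equals the length of a longest chain, which no schedule can beat. The number of nodes sharing a time step is the number of processors that schedule needs at that step.

```python
from collections import Counter
from itertools import product


def earliest_schedule(nodes, deps):
    """Map each node to the length of the longest dependence chain ending at it."""
    node_set = set(nodes)
    time = {}
    # Lexicographic order is a topological order here because every
    # dependence vector is nonnegative and nonzero (an assumption of this sketch).
    for v in sorted(node_set):
        preds = [tuple(a - b for a, b in zip(v, d)) for d in deps]
        time[v] = 1 + max((time[p] for p in preds if p in node_set), default=0)
    return time


def processor_lower_bound(time):
    """Largest number of nodes that share a single time step."""
    return max(Counter(time.values()).values())


# Hypothetical example: a 4 x 4 x 4 cubical mesh with unit dependences.
n = 4
nodes = list(product(range(n), repeat=3))
deps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

schedule = earliest_schedule(nodes, deps)
makespan = max(schedule.values())        # length of a longest dependence chain
bound = processor_lower_bound(schedule)  # peak number of concurrent nodes

print(makespan, bound)
```

In this particular mesh every node lies on some longest chain, so the earliest-start schedule is the only time-minimal one and happens to be affine (time = i + j + k + 1). For other dependence graphs, nodes may have slack, and a time-minimal schedule that also minimizes processors need not be affine.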