The goal of this research is to develop scalable, automated methods to design high-performance embedded systems - particularly those with real-time latency and throughput requirements. In such systems, computations flow through heterogeneous interacting components, which are managed by local resource-schedulers and drivers. Hence, a key design problem is that of balancing and tuning the local resource-managers, so that they collaboratively meet the global end-to-end performance objectives. The proposed approach involves parametrically transforming the end-to-end constraints into local (per-component) constraints, and then solving them for optimal on-line performance. Specifically, the method works by (1) extracting performance models for key stages along the all data paths; (2) hierarchically grouping the per- component models into larger system-level prototypes; and (3) using the system-level models to help calibrate the local resource management policies. Step (3) is called "design synthesis." To carry it out, the proposed method relies on off-line algorithms to synthesize optimal CPU/IO bandwidth allocations for all pipeline stages. After synthesis, the analytic estimates are checked via simulation. Then the system is integrated, and rechecked via on- line benchmarking. Two reference applications are proposed to validate this framework: (1) A software realization of a large- scale radar signal processor, to be run on several workstation- clusters; and (2) a video server system connected via 100Mbs partitioned Ethernet.