Parallel program decomposition is the process by which a single application is decomposed into multiple processes for execution on parallel hardware. For real programs, the execution speed of a good decomposition can exceed that of a poor decomposition by factors of ten or more. Unfortunately, optimal program decomposition is known to be an NP-hard problem, and efficient exact solutions are unlikely to exist. Hence, the key to finding automatic approaches to the program decomposition lies in discovering efficient algorithms that produce approximate solutions. A novel decomposition algorithm with nearly linear time complexity is explored. The algorithm initially decomposes the program into a sequence of small, parallelizable sections of code, known as partitionables. A locally optimal decomposition is selected for each partitionable based on information contained in an empirically derived database. Dynamic data redistribution is combined with a static technique called decomposition smoothing to produce a good global decomposition. The principal contributions of the research are: (1) to define new methods for program decomposition that go beyond simply minimizing communication and consider aggregate parallel execution performance, (2) to establish the extent to which fast decomposition strategies can produce acceptable solutions for practical applications, and (3) to collect and disseminate a database of performance results from kernels of scientific applications.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
9409736
Program Officer
Yechezkel Zalcstein
Project Start
Project End
Budget Start
1994-07-15
Budget End
1997-06-30
Support Year
Fiscal Year
1994
Total Cost
$98,220
Indirect Cost
Name
University of Texas Austin
Department
Type
DUNS #
City
Austin
State
TX
Country
United States
Zip Code
78712