Multicast communication is an essential feature in parallel computers. It benefits several parallel applications and is useful for shared memory, distributed memory, and distributed-shared memory systems. Multicasting also helps in supporting a class of communication patterns known as collective operations. Multicasting can be supported in either hardware or software, but the hardware approach is significantly faster. Most vendors of parallel computers are either unaware of this advantage or have made unsuccessful attempts at implementing deadlock-free hardware multicasting schemes. This project is concerned with efficient hardware multicast routing techniques for meshes and switch-based parallel computers. It will address several implementation details, such as start-up latency, router delays, and multiple concurrent multicasts. The primary objective of the research is to minimize the total communication latency by reducing the number of start-up steps required for multicasting in meshes. For switch-based systems, this project will examine alternative switch designs to support efficient deadlock-free multicasting. Finally, an effort will be made to design low-cost routers that support multicast operation. Two primary constraints will be observed during this research. First, the cost and complexity of the algorithms will be minimized. Second, the PI will focus on techniques that not only support deadlock-free multicast operations but also enhance the performance of node-to-node unicast communications.
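To illustrate why the number of start-up steps dominates the comparison, the following is a minimal sketch under a simplified model that is not taken from the project itself. It assumes that a software multicast is built from unicasts in which every node already holding the message forwards it to one new node per step (recursive doubling), and that an idealized hardware multicast reaches all destinations with a single start-up. The function names and the start-up cost parameter are hypothetical.

```python
def software_multicast_steps(num_destinations: int) -> int:
    """Start-up steps for a unicast-based multicast (assumed model).

    The set of nodes holding the message can at most double each step,
    so d destinations need ceil(log2(d + 1)) steps. For non-negative
    integers this equals d.bit_length(), which avoids floating point.
    """
    if num_destinations < 0:
        raise ValueError("num_destinations must be non-negative")
    return num_destinations.bit_length()


def hardware_multicast_steps(num_destinations: int) -> int:
    """Start-up steps for an idealized hardware multicast (assumption).

    Assumes one multidestination message covers every destination.
    Real schemes on meshes may need more than one phase.
    """
    if num_destinations < 0:
        raise ValueError("num_destinations must be non-negative")
    return 1 if num_destinations > 0 else 0


def startup_latency(steps: int, t_startup_us: float) -> float:
    """Total start-up cost in microseconds, ignoring router and wire delays."""
    return steps * t_startup_us


if __name__ == "__main__":
    t_startup_us = 10.0  # hypothetical per-message start-up cost
    for d in (1, 3, 15, 63):
        sw = software_multicast_steps(d)
        hw = hardware_multicast_steps(d)
        print(d, sw, hw,
              startup_latency(sw, t_startup_us),
              startup_latency(hw, t_startup_us))
```

Under these assumptions the software scheme pays a start-up cost that grows logarithmically with the number of destinations, while the idealized hardware scheme pays it once. Router delays and contention between multiple multicasts, which the project also considers, are deliberately left out of this sketch.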