Supercomputers play an essential role in high-performance computing. Modern supercomputers are generally built as cluster systems, that is, multiple computers interconnected by a network. Parallel programs for such cluster systems are typically written with MPI (Message Passing Interface). In this paper, we aim to reduce the execution time of MPI_Allreduce, a collective communication operation used frequently in simulation codes. To this end, we integrate the network programmability of Software Defined Networking (SDN) into MPI_Allreduce so that it makes effective use of the interconnect bandwidth of the cluster system. An experiment conducted on a cluster system with a fat-tree interconnect indicates that our proposed MPI_Allreduce outperforms the MPI_Allreduce implementation in OpenMPI.
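The following sketch makes the semantics of MPI_Allreduce concrete. It is a minimal sequential Python simulation of a sum-allreduce over a hypothetical set of ranks, assuming the recursive-doubling algorithm. Recursive doubling is one textbook allreduce algorithm, not necessarily the one used by OpenMPI or by the SDN-based method proposed here. The function name `allreduce_recursive_doubling` is hypothetical, and no real MPI calls are made.

```python
def allreduce_recursive_doubling(buffers):
    """Simulate a sum-MPI_Allreduce across len(buffers) hypothetical ranks.

    buffers[r] is the local vector held by rank r. The basic recursive-doubling
    scheme assumed here requires the number of ranks to be a power of two.
    Returns a list holding each rank's final buffer.
    """
    p = len(buffers)
    if p == 0 or p & (p - 1):
        raise ValueError("rank count must be a power of two")
    state = [list(b) for b in buffers]  # copy so the inputs are untouched
    distance = 1
    while distance < p:
        # One communication step: rank r exchanges its partial sum with
        # rank r XOR distance, and both add the received vector elementwise.
        state = [
            [a + b for a, b in zip(state[r], state[r ^ distance])]
            for r in range(p)
        ]
        distance <<= 1
    # After log2(p) steps every rank holds the elementwise sum of all inputs.
    return state


print(allreduce_recursive_doubling([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

With four ranks, the loop runs log2(4) = 2 steps, and every rank ends with `[16, 20]`, the elementwise sum of all four input vectors. Each step is a round of simultaneous pairwise messages across the interconnect. This suggests why how those flows share network bandwidth can affect the execution time of MPI_Allreduce.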
Research papers (proceedings of international meetings)