2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) | 2019

Fuzzy Matching: Hardware Accelerated MPI Communication Middleware


Abstract


Contemporary parallel scientific codes often rely on message passing for inter-process communication. However, inefficient coding practices or multithreading (e.g., via MPI_THREAD_MULTIPLE) can severely stress the underlying message-processing infrastructure, with potentially unacceptable impacts on application performance. In this article, we propose and evaluate a novel method for addressing this issue: Fuzzy Matching. This approach has two components. First, it exploits the fact that most server-class CPUs include vector operations to parallelize message matching. Second, based on a survey of point-to-point communication patterns in representative scientific applications, the method further increases parallelization by allowing matches based on partial truth, i.e., by identifying probable rather than exact matches. We evaluate the impact of this approach on memory usage and performance on Knights Landing and Skylake processors. At scale (262,144 Intel Xeon Phi cores), the method shows up to 1.13 GiB of memory savings per node in the MPI library and a 95.9% improvement in matching time; smaller-scale runs show run-time improvements of up to 31.0% for full applications and up to 6.1% for optimized proxy applications.
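To make the two components concrete, here is a minimal C sketch under our own assumptions; it is not the authors' implementation. Posted receives are kept as a structure of arrays, and each entry stores a truncated 32-bit fingerprint of its (source, tag, context) envelope (8 + 16 + 8 bits, an arbitrary choice) plus a mask that zeroes wildcarded fields. A branch-free pass over a chunk of entries compares fingerprints, which a compiler can map to vector XOR/AND/compare instructions. Because the fingerprint is truncated, a hit is only a probable match and is confirmed against the full envelope. All names (`recv_queue`, `post_recv`, `match_fuzzy`, `ANY_SOURCE`, `ANY_TAG`, `LANES`) are hypothetical, and the code relies on auto-vectorization rather than explicit intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_SOURCE (-1) /* hypothetical stand-in for MPI_ANY_SOURCE */
#define ANY_TAG    (-1) /* hypothetical stand-in for MPI_ANY_TAG */
#define QUEUE_CAP  1024
#define LANES      8    /* entries filtered per chunk (e.g. 8 x uint32 = 256 bits) */

/* Posted-receive queue as a structure of arrays (assumed layout). */
typedef struct {
    uint32_t key[QUEUE_CAP];  /* truncated (src, tag, ctx) fingerprint */
    uint32_t mask[QUEUE_CAP]; /* bits cleared where a wildcard was posted */
    int32_t src[QUEUE_CAP], tag[QUEUE_CAP], ctx[QUEUE_CAP]; /* full envelope */
    size_t count;
} recv_queue;

/* Pack low bits of each field: src(8) | tag(16) | ctx(8). Lossy by design. */
static uint32_t make_key(int32_t src, int32_t tag, int32_t ctx) {
    return (((uint32_t)src & 0xFFu) << 24) | (((uint32_t)tag & 0xFFFFu) << 8) |
           ((uint32_t)ctx & 0xFFu);
}

static void post_recv(recv_queue *q, int32_t src, int32_t tag, int32_t ctx) {
    assert(q->count < QUEUE_CAP);
    uint32_t m = 0xFFu;                    /* context is always compared */
    if (src != ANY_SOURCE) m |= 0xFFu << 24;
    if (tag != ANY_TAG)    m |= 0xFFFFu << 8;
    size_t i = q->count++;
    q->key[i] = make_key(src, tag, ctx) & m;
    q->mask[i] = m;
    q->src[i] = src; q->tag[i] = tag; q->ctx[i] = ctx;
}

/* Exact MPI-style envelope check, used to confirm fingerprint hits. */
static int exact_match(const recv_queue *q, size_t i,
                       int32_t src, int32_t tag, int32_t ctx) {
    return q->ctx[i] == ctx &&
           (q->src[i] == ANY_SOURCE || q->src[i] == src) &&
           (q->tag[i] == ANY_TAG || q->tag[i] == tag);
}

/* Return the index of the oldest posted receive matching the incoming
 * envelope, or -1 if none matches. */
static long match_fuzzy(const recv_queue *q, int32_t src, int32_t tag, int32_t ctx) {
    const uint32_t k = make_key(src, tag, ctx);
    for (size_t base = 0; base < q->count; base += LANES) {
        size_t n = q->count - base < LANES ? q->count - base : LANES;
        unsigned hits = 0;
        /* Branch-free probable-match filter over one chunk. */
        for (size_t j = 0; j < n; j++)
            hits |= (unsigned)(((k ^ q->key[base + j]) & q->mask[base + j]) == 0) << j;
        /* Confirm hits in queue order to preserve MPI matching order. */
        for (size_t j = 0; j < n; j++)
            if (((hits >> j) & 1u) && exact_match(q, base + j, src, tag, ctx))
                return (long)(base + j);
    }
    return -1;
}
```

An exact match always produces a fingerprint hit, so the filter never misses a true match; it can only admit false positives (for example, sources 3 and 259 share the same low 8 bits), which the confirmation step rejects. Narrower fingerprints pack more entries into each vector register at the cost of more confirmations.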

Pages 210-220
DOI 10.1109/CCGRID.2019.00035
Language English
