Probability density for achievable performance (GFlop/s) using 1024 samples with different tiling and problem size. With eDRAM (DRAM = dynamic random-access memory), the function curve as a whole shifts to the upper right, implying that more samples can reach near-peak (for example, 90 percent) performance. In other words, having eDRAM increases the chance for less-optimized applications to reach "vendor-claimed" performance. However, the right boundary only moves a bit, indicating that eDRAM cannot significantly improve the raw peak performance. Credit: US Department of Energy
High-bandwidth memory can improve a computer's performance. On-package memory (OPM) is a popular option in many commercial systems. Before this effort, little was known about OPM's implications on speed and power use. The team experimentally characterized and analyzed modern OPM storage. They provided guidelines on tuning the memory to speed up high-performance computing (HPC) applications.
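The figure caption's reading of the density plot comes down to two numbers per configuration: the share of samples that land near peak, and the best sample overall. A minimal Python sketch of that calculation follows. The function name `near_peak_fraction`, the peak value, and both sample lists are made up for illustration and are not measurements from the study.

```python
def near_peak_fraction(samples_gflops, peak_gflops, threshold=0.9):
    """Return the fraction of samples at or above threshold * peak."""
    if not samples_gflops:
        raise ValueError("need at least one sample")
    cutoff = threshold * peak_gflops
    hits = sum(1 for s in samples_gflops if s >= cutoff)
    return hits / len(samples_gflops)


# Hypothetical numbers for illustration only, NOT data from the study.
PEAK = 100.0  # assumed vendor-claimed peak, in GFlop/s
without_opm = [35.0, 50.0, 62.0, 70.0, 78.0, 85.0, 91.0, 96.0]
with_opm = [55.0, 68.0, 80.0, 88.0, 92.0, 94.0, 96.0, 97.0]

for label, samples in (("without OPM", without_opm), ("with OPM", with_opm)):
    frac = near_peak_fraction(samples, PEAK)
    print(f"{label}: near-peak fraction = {frac}, best sample = {max(samples)}")
```

In this toy data the near-peak share doubles while the best sample barely changes, which is the pattern the caption describes: the bulk of the distribution shifts, but the right boundary moves only slightly.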
This study of OPMs is fundamental to advancing computing systems. It motivates software-architecture co-design exploration, validates models and simulations, and has produced general optimization guidelines. The work shows how to tune applications and architectures for the best performance on platforms with particular OPMs.
The researchers conducted a thorough experimental evaluation to discern how modern OPMs affect the performance and power efficiency of important HPC scientific kernels, the core computational routines at the heart of scientific applications. They examined different tuning modes of OPM and how those modes influence application tuning for the best system performance. The team from Pacific Northwest National Laboratory, University of Copenhagen, and Virginia Tech evaluated diverse HPC kernels on two Intel OPMs, eDRAM on multicore Broadwell and MCDRAM on manycore Knights Landing, with a large set of representative input matrices (for example, 968 matrices for sparse kernels). This study allowed the team to derive an intuitive visual analytical model to better explain complex architectural scenarios, as well as to provide general guidelines for future architecture optimizations and efficiency tuning.
More information: Ang Li et al. Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17 (2017). DOI: 10.1145/3126908.3126931