2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) | 2021

CMRC: Comprehensive Microarchitectural Register Coalescing for GPGPUs


Abstract


Graphics processing units (GPUs) deploy a large register file (RF) to achieve high compute throughput. This RF, however, accounts for a large portion of the GPU's total dynamic power. Additionally, the RF banks and operand collectors (OCs) are designed with a limited number of ports, which serializes accesses and degrades performance. In this work, we introduce CMRC, a coalescing-aware RF organization that exploits the narrow-width data frequently present in general-purpose applications to increase performance and reduce energy for GPGPUs. CMRC is a low-cost, comprehensive approach to register coalescing that combines narrow-width read and write accesses from the same or different warp instructions into fewer accesses, reducing port contention and access pressure. On general-purpose applications, CMRC reduces RF accesses by 31.8%, achieves a performance speedup of 16.5%, and reduces overall GPU energy by 32.2% on average, outperforming the best-in-class prior work by approximately 1.8x without requiring compiler support.
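The abstract does not describe CMRC's hardware mechanism, so the following is only a minimal toy sketch of the general idea of coalescing narrow-width register writes into fewer RF accesses. Everything in it is an assumption for illustration and not CMRC's design: the 16-bit narrow threshold, the rule that a warp register counts as narrow only if all of its lane values fit, the simple in-order pairing policy, and the hypothetical names `is_narrow` and `count_write_accesses`. It models write accesses only, not reads, banks, or operand-collector ports.

```python
from typing import List, Sequence

# Hypothetical threshold: a value is "narrow" if it fits in half of a 32-bit slot.
NARROW_BITS = 16


def is_narrow(lane_values: Sequence[int], bits: int = NARROW_BITS) -> bool:
    """True if every lane's signed value fits in `bits` signed bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return all(lo <= v <= hi for v in lane_values)


def count_write_accesses(writes: List[Sequence[int]], coalesce: bool) -> int:
    """Count RF write accesses for a stream of warp-register writes.

    Without coalescing, every write costs one access. With coalescing
    (a toy policy, assumed here), narrow writes are paired in arrival
    order, even if wide writes intervene, and each pair shares a single
    access; wide writes always cost a full access.
    """
    accesses = 0
    pending_narrow = False  # a narrow write waiting for a partner
    for lanes in writes:
        if coalesce and is_narrow(lanes):
            if pending_narrow:
                accesses += 1          # pair completed: one combined access
                pending_narrow = False
            else:
                pending_narrow = True  # hold until another narrow write arrives
        else:
            accesses += 1              # wide write (or coalescing disabled)
    if pending_narrow:
        accesses += 1                  # flush an unpaired narrow write
    return accesses


# Five warp-register writes (32 lanes each); only the second one is wide.
example = [[3] * 32, [70000] * 32, [-5] * 32, [12] * 32, [1] * 32]
print(count_write_accesses(example, coalesce=False))  # 5
print(count_write_accesses(example, coalesce=True))   # 3
```

In this toy model the four narrow writes collapse into two accesses, which illustrates why frequent narrow-width data can lower RF access counts. A real design must also handle read coalescing, bank conflicts, and the latency of holding a write while it waits for a partner, none of which this sketch attempts.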

Pages 1803-1808
DOI 10.23919/DATE51398.2021.9474225
Language English
Journal 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)
