Irregular Accesses Reorder Unit: Improving GPGPU Memory Coalescing for Graph-Based Workloads

07/14/2020
by   Albert Segura, et al.
0

GPGPU architectures have become established as the dominant parallelization and performance platform achieving exceptional popularization and empowering domains such as regular algebra, machine learning, image detection and self-driving cars. However, irregular applications struggle to fully realize GPGPU performance as a result of control flow divergence and memory divergence due to irregular memory access patterns. To ameliorate these issues, programmers are obligated to carefully consider architecture features and devote significant efforts to modify the algorithms with complex optimization techniques, which shift programmers priorities yet struggle to quell the shortcomings. We show that in graph-based GPGPU irregular applications these inefficiencies prevail, yet we find that it is possible to relax the strict relationship between thread and data processed to empower new optimizations. Based on this key idea, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension tightly integrated in the GPGPU pipeline. The IRU reorders data processed by the threads on irregular accesses which significantly improves memory coalescing, and allows increased performance and energy efficiency. Additionally, the IRU is capable of filtering and merging duplicated irregular access which further improves graph-based irregular applications. Programmers can easily utilize the IRU with a simple API, or compiler optimized generated code with the extended ISA instructions provided. We evaluate our proposal for state-of-the-art graph-based algorithms and a wide selection of applications. Results show that the IRU achieves a memory coalescing improvement of 1.32x and a 46 the memory hierarchy, which results in 1.33x and 13 and energy savings respectively, while incurring in a small 5.6

READ FULL TEXT

page 1

page 4

page 6

page 8

page 9

page 10

research
10/24/2019

Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications

Modern optimizing compilers are able to exploit memory access or computa...
research
12/02/2021

Efficient Calling Conventions for Irregular Architectures

We empirically evaluated thousands of different C calling conventions fo...
research
10/18/2020

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages

We propose Atomic Active Messages (AAM), a mechanism that accelerates ir...
research
09/01/2020

Helper Without Threads: Customized Prefetching for Delinquent Irregular Loads

The growing memory footprints of cloud and big data applications mean th...
research
03/17/2022

FUSED-PAGERANK: Loop-Fusion based Approximate PageRank

PageRank is a graph centrality metric that gives the importance of each ...
research
03/24/2023

Compiler Optimization for Irregular Memory Access Patterns in PGAS Programs

Irregular memory access patterns pose performance and user productivity ...
research
11/18/2022

AXI-Pack: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads

Data-intensive applications involving irregular memory streams are ineff...

Please sign up or login with your details

Forgot password? Click here to reset