A readahead prefetcher for GPU file system layer

09/11/2021
by Vasilis Dimitsas, et al.

GPUs are broadly used in I/O-intensive big data applications. Prior work demonstrates the benefits of using a GPU-side file system layer, GPUfs, to improve GPU performance and programmability in such workloads. However, GPUfs fails to provide high performance for a common I/O pattern in which a GPU processes a whole data set sequentially. In this work, we propose a number of system-level optimizations to improve the performance of GPUfs for such workloads. We perform an in-depth analysis of the interplay between the GPU I/O access pattern, CPU-GPU PCIe transfers, and SSD storage, and identify the main bottlenecks. We propose a new GPU I/O readahead prefetcher and a GPU page cache replacement mechanism to resolve them. The GPU I/O readahead prefetcher achieves more than 2× (geometric mean) higher bandwidth than the original GPUfs in a series of microbenchmarks. Furthermore, we evaluate the system on 14 applications derived from the RODINIA, PARBOIL, and POLYBENCH benchmark suites. Our prefetching mechanism improves their execution time by up to 50 transfer techniques.

