Characterizing Deep-Learning I/O Workloads in TensorFlow

by Steven W. D. Chien, et al.

The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. During training, a considerably large number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. Using the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on overall performance. Using a burst buffer to checkpoint to fast, small-capacity storage and asynchronously copy the checkpoints to slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to the slower storage on our benchmark environment.
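The burst-buffer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BurstBufferCheckpointer`, its methods, and the directory layout are hypothetical, and the checkpoint is modeled as an opaque byte string rather than a real TensorFlow checkpoint. The sketch only shows the general pattern of a blocking write to a fast tier followed by a background copy to a slow tier, using the Python standard library.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class BurstBufferCheckpointer:
    """Hypothetical helper: write to fast storage, drain to slow storage in the background."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)
        self.slow_dir = Path(slow_dir)
        self.fast_dir.mkdir(parents=True, exist_ok=True)
        self.slow_dir.mkdir(parents=True, exist_ok=True)
        self._pending = []  # background copy threads not yet joined

    def save(self, name, payload):
        """Block only for the fast-tier write, then start an asynchronous copy."""
        fast_path = self.fast_dir / name
        fast_path.write_bytes(payload)
        copier = threading.Thread(
            target=shutil.copy2, args=(fast_path, self.slow_dir / name)
        )
        copier.start()
        self._pending.append(copier)
        return fast_path

    def wait(self):
        """Wait until every pending copy to the slow tier has finished."""
        for copier in self._pending:
            copier.join()
        self._pending.clear()


def demo():
    """Save one fake 1 KiB checkpoint and return its size on the slow tier."""
    with tempfile.TemporaryDirectory() as root:
        ckpt = BurstBufferCheckpointer(Path(root) / "fast", Path(root) / "slow")
        ckpt.save("step_100.ckpt", b"\x00" * 1024)
        ckpt.wait()
        return (Path(root) / "slow" / "step_100.ckpt").stat().st_size
```

In this sketch the training loop would call `save` and continue immediately, calling `wait` only before exiting or before reusing fast-tier space. Both directories here are temporary folders on the same device, so the example demonstrates the control flow only and says nothing about the speedup reported above.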




Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

Deep learning models can take weeks to train on a single GPU-equipped ma...

Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences

Remote procedure call (RPC) is the backbone of many modern distributed s...

TensorFlow Doing HPC

TensorFlow is a popular emerging open-source programming framework suppo...

A Survey on Uncertainty Toolkits for Deep Learning

The success of deep learning (DL) fostered the creation of unifying fram...

Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms

Modern deep learning (DL) applications are built using DL libraries and ...

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models

Training deep learning (DL) models has become a norm. With the emergence...

Auto-Vectorizing TensorFlow Graphs: Jacobians, Auto-Batching And Beyond

We propose a static loop vectorization optimization on top of high level...
