ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

10/22/2022
by   Zhiying Xu, et al.

Deep learning models rely on highly optimized tensor libraries for efficient inference on heterogeneous hardware. Current deep learning compilers typically predetermine tensor layouts and then optimize the loops of each operator. However, such a unidirectional and one-off workflow strictly separates graph-level optimization and operator-level optimization into different system layers, missing opportunities for unified tuning. This paper proposes ALT, a compiler that performs joint graph- and operator-level optimizations for deep learning models. ALT provides a generic transformation module to manipulate layouts and loops with easy-to-use primitive functions. ALT further integrates an auto-tuning module that jointly optimizes graph-level data layouts and operator-level loops while guaranteeing efficiency. Experimental results show that ALT significantly outperforms state-of-the-art compilers (e.g., Ansor) in terms of both single-operator performance (e.g., 1.5x speedup on average) and end-to-end inference performance (e.g., 1.4x speedup on average).
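To make the layout side of this concrete, below is a minimal NumPy sketch of the kind of graph-level data layout transformation such primitive functions express: packing an NCHW activation tensor into a blocked NCHW[x]c layout that vectorized convolution kernels commonly prefer. The function name, signature, and block size are illustrative assumptions, not ALT's actual primitives.

```python
import numpy as np

def pack_nchw_to_nchwc(tensor: np.ndarray, c_block: int = 4) -> np.ndarray:
    """Hypothetical layout primitive: pack NCHW into NCHW[c_block]c.

    Splits the channel axis into (C // c_block, c_block) blocks and moves the
    inner block of channels innermost, so the innermost loop of a consuming
    operator can be vectorized over channels. Illustrative only; not ALT's API.
    """
    n, c, h, w = tensor.shape
    assert c % c_block == 0, "channel count must be divisible by c_block"
    # (N, C, H, W) -> (N, C//c_block, c_block, H, W) -> (N, C//c_block, H, W, c_block)
    return tensor.reshape(n, c // c_block, c_block, h, w).transpose(0, 1, 3, 4, 2)

# Example: a 1x8x32x32 activation packed with c_block=4 becomes 1x2x32x32x4.
x = np.random.rand(1, 8, 32, 32).astype("float32")
print(pack_nchw_to_nchwc(x).shape)  # (1, 2, 32, 32, 4)
```

Because a layout change like this alters the loop nests and memory access patterns of every operator that consumes the tensor, fixing layouts before loop tuning (the one-off workflow above) can lock in suboptimal choices; joint tuning of layouts and loops, as ALT performs, searches over both decisions together.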


Related research

- Cortex: A Compiler for Recursive Deep Learning Models (11/02/2020)
- Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs (10/18/2022)
- Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations (08/11/2020)
- Enabling Data Movement and Computation Pipelining in Deep Learning Compiler (10/29/2022)
- AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization (12/02/2022)
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance (10/25/2021)
- Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads (05/12/2021)
