Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization

03/14/2023
by Lukas Trümper, et al.

Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic properties of loop nests, via symbolic code analysis and performance profiling, respectively. Performance embeddings enable direct transfer of performance-tuning knowledge between applications, whether that knowledge comes from autotuning or from tailored improvements. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils. Transfer tuning reduces the search complexity by up to four orders of magnitude and outperforms the MKL library in sparse-dense matrix multiplication. The results exhibit clear correspondences between program characteristics and optimizations, outperforming prior specialized state-of-the-art approaches and generalizing beyond their capabilities.
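
As a rough illustration of the similarity-based lookup described above, the sketch below retrieves the optimization recorded for the most similar previously tuned loop nest. It is not the paper's implementation: the two-dimensional embedding vectors, the recipe names, and the function `nearest_optimization` are hypothetical, and plain Euclidean distance is assumed in place of whatever similarity measure the learned embedding space actually uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_optimization(query, database):
    """Return (distance, recipe) of the database entry closest to `query`.

    `database` is a list of (embedding, recipe) pairs, where each recipe
    names an optimization that was found for an already tuned loop nest.
    """
    embedding, recipe = min(database, key=lambda entry: euclidean(query, entry[0]))
    return euclidean(query, embedding), recipe


# Hypothetical embeddings; in the paper these would be derived from
# static code analysis and performance profiling of loop nests.
DATABASE = [
    ((0.9, 0.1), "tile+vectorize"),
    ((0.1, 0.9), "fuse+parallelize"),
]

# Embedding of a new, untuned loop nest (also made up for illustration).
distance, recipe = nearest_optimization((0.8, 0.2), DATABASE)
print(recipe)
```

Reusing the recipe of the nearest neighbor replaces a search over the full transformation space with a single lookup, which is the intuition behind the reduction in search complexity claimed above.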


