Vector-Processing for Mobile Devices: Benchmark and Analysis

by Alireza Khadem, et al.

Vector processing has become commonplace in today's CPU microarchitectures. Vector instructions improve performance and energy efficiency, which is crucial for resource-constrained mobile devices. The research community currently lacks a comprehensive benchmark suite to study the benefits of vector processing for mobile devices. This paper presents Swan, an extensive vector processing benchmark suite for mobile applications. Swan consists of a diverse set of data-parallel workloads from four commonly used mobile applications: an operating system, a web browser, an audio/video messaging application, and a PDF rendering engine. Using the Swan benchmark suite, we conduct a detailed analysis of the performance, power, and energy consumption of vectorized workloads, and show that: (a) Vectorized kernels increase the pressure on the cache hierarchy due to their higher rate of memory requests. (b) Vector processing is more beneficial for workloads with lower-precision operations and higher cache hit rates. (c) Limited instruction-level parallelism and strided memory accesses to multi-dimensional data structures prevent vector processing benefits from scaling with more SIMD functional units and wider registers. (d) Despite lower computation throughput than domain-specific accelerators, such as GPUs, vector processing outperforms these accelerators for kernels with lower operation counts. Finally, we show five common computation patterns in mobile data-parallel workloads that dominate the execution time.




