A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets

by   Elie M. Shaccour, et al.

The design of general purpose processors relies heavily on a workload gathering step in which representative programs are collected from various application domains. Processor performance, when running the workload set, is profiled using simulators that model the targeted processor architecture. However, simulating the entire workload set is prohibitively time-consuming, which precludes considering a large number of programs. To reduce simulation time, several techniques in the literature have exploited the internal program repetitiveness to extract and execute only representative code segments. Existing so- lutions are based on reducing cross-program computational redundancy or on eliminating internal-program redundancy to decrease execution time. In this work, we propose an orthogonal and complementary loop- centric methodology that targets loop-dominant programs by exploiting internal-program characteristics to reduce cross-program computational redundancy. The approach employs a newly developed framework that extracts and analyzes core loops within workloads. The collected characteristics model memory behavior, computational complexity, and data structures of a program, and are used to construct a signature vector for each program. From these vectors, cross-workload similarity metrics are extracted, which are processed by a novel heuristic to exclude similar programs and reduce redundancy within the set. Finally, a reverse engineering approach that synthesizes executable micro-benchmarks having the same instruction mix as the loops in the original workload is introduced. A tool that automates the flow steps of the proposed methodology is developed. Simulation results demonstrate that applying the proposed methodology to a set of workloads reduces the set size by half, while preserving the main characterizations of the initial workloads.


Workload Similarity Analysis using Machine Learning Techniques

Finding the similarity between two workload behaviors is helpful in 1. c...

Software Pipelining for Quantum Loop Programs

We propose a method for performing software pipelining on quantum for-lo...

NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

With the end of Moore's Law, there is a growing demand for rapid archite...

Accurate Open-set Recognition for Memory Workload

How can we accurately identify new memory workloads while classifying kn...

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

With the growing amount of data, data processing workloads and the manag...

Compiler-Guided Throughput Scheduling for Many-core Machines

Modern ARM-based servers such as ThunderX and ThunderX2 offer a tremendo...

Artist-Guided Semiautomatic Animation Colorization

There is a delicate balance between automating repetitive work in creati...

Please sign up or login with your details

Forgot password? Click here to reset