Learning to Improve Code Efficiency

08/09/2022
by Binghong Chen, et al.

Improvements in the performance of computing systems, driven by Moore's Law, have transformed society. As such hardware-driven gains slow down, it becomes even more important for software developers to focus on performance and efficiency during development. While several studies have demonstrated the potential of improved code efficiency (e.g., 2x better generational improvements compared to hardware), unlocking these gains in practice has been challenging. Reasoning about algorithmic complexity and the interaction of coding patterns with hardware is difficult for the average programmer, especially when combined with pragmatic constraints around development velocity and multi-person development. This paper seeks to address this problem. We analyze a large competitive programming dataset from the Google Code Jam competition and find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions. We propose using machine learning to automatically provide prescriptive feedback in the form of hints that guide programmers towards writing high-performance code. To automatically learn these hints from the dataset, we propose a novel discrete variational auto-encoder, where each discrete latent variable represents a different learned category of code edit that increases performance. We show that this method represents the multi-modal space of code efficiency edits better than a sequence-to-sequence baseline and generates a distribution of more efficient solutions.
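
The runtime gap quoted in the abstract can be illustrated with a small sketch. This is not the authors' analysis code: the function name `runtime_gap`, the sample runtimes, and the reading of "90th percentile" as the cutoff for the fastest 10% of solutions are all assumptions made for illustration.

```python
import statistics


def runtime_gap(runtimes):
    """Ratio of the median runtime to the runtime cutoff of the fastest 10%.

    Assumption: the "90th percentile" solution is one that is faster than
    90% of submissions, i.e. it sits at the 10th percentile of runtime.
    """
    if len(runtimes) < 2:
        raise ValueError("need at least two runtimes")
    # quantiles(n=10) returns nine cut points; the first one is the
    # 10th percentile of the runtime distribution.
    deciles = statistics.quantiles(runtimes, n=10, method="inclusive")
    fast_cutoff = deciles[0]
    median = statistics.median(runtimes)
    return median / fast_cutoff


# Hypothetical runtimes (seconds) for solutions to a single problem.
sample = [0.5, 0.6, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.5, 2.0, 3.0]
gap = runtime_gap(sample)
print(f"median solution is {gap:.2f}x slower than the fast cutoff")
```

A ratio near 2 on real data would correspond to the gap the abstract reports; the numbers above are invented and say nothing about the Code Jam dataset itself.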

research
11/04/2019

Learning based Methods for Code Runtime Complexity Prediction

Predicting the runtime complexity of a programming code is an arduous ta...
research
10/02/2021

Recommending Code Understandability Improvements based on Code Reviews

Developers spend 70% of their time understanding code. Code that is easy to read can save time, while hard-to-read code can lead...
research
07/15/2023

Creating a Dataset for High-Performance Computing Code Translation: A Bridge Between HPC Fortran and C++

In this study, we present a novel dataset for training machine learning ...
research
06/25/2018

Prior Attention for Style-aware Sequence-to-Sequence Models

We extend sequence-to-sequence models with the possibility to control th...
research
03/25/2019

Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors

We propose a novel approach to train a multi-modal policy from mixed dem...
research
04/25/2022

Fusionize: Improving Serverless Application Performance through Feedback-Driven Function Fusion

Serverless computing increases developer productivity by removing operat...