Understanding the Effectiveness of Large Language Models in Code Translation

by Rangeet Pan, et al.

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are actively exploring their potential to automate code translation, i.e., generating code in a target PL from its equivalent in another PL. A prerequisite for advancing LLM-based code translation is understanding the limitations of LLMs. To that end, we present a large-scale empirical study of the ability of LLMs, including both general LLMs and code LLMs, to translate code across pairs of languages, including C, C++, Go, Java, and Python. Our analysis involves the translation of 1,700 code samples from three distinct benchmarks and real-world projects. It reveals that LLMs cannot yet be reliably used to automate code translation: incorrect translations range from 52.7% […] across the studied LLMs. Further manual investigation of unsuccessful translations across all PLs identifies 14 root causes of translation bugs. Based on the insights from the empirical study, we propose a prompt-crafting approach that provides additional context to LLMs, improving the performance of LLM-based code translation by 5.5% […] benchmarks. Our study is the first of its kind, in terms of scale and breadth, to provide insights into the current limitations of LLMs in code translation and the opportunities for improving them. Our extensive dataset can help drive research in this area. It consists of 1,700 code samples written in five PLs with 10K+ tests, 43K+ code translations, 1,725 manually labeled bugs, and 1,365 bug-fix pairs generated using LLMs.
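The headline result above is a per-model rate of incorrect translations. The following is a minimal sketch of how such a rate could be computed. It assumes, without confirmation from the text here, that a translation counts as correct only if it compiles and passes every test shipped with its source sample. `TranslationResult`, `incorrect_rates`, the model names, and the sample outcomes are all hypothetical and made up for illustration. This is not the authors' evaluation harness.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    """Outcome of one LLM translating one code sample (hypothetical schema)."""
    model: str          # illustrative LLM identifier
    sample_id: str      # identifier of the source code sample
    compiled: bool      # whether the translated code compiled/ran at all
    tests_passed: int   # tests the translation passed
    tests_total: int    # tests shipped with the source sample

    @property
    def is_correct(self) -> bool:
        # Assumed criterion: compiles AND passes every test of the source sample.
        return self.compiled and self.tests_passed == self.tests_total


def incorrect_rates(results: Iterable[TranslationResult]) -> dict[str, float]:
    """Return the percentage of incorrect translations for each model."""
    totals: dict[str, int] = defaultdict(int)
    incorrect: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.model] += 1
        if not r.is_correct:
            incorrect[r.model] += 1
    return {m: 100.0 * incorrect[m] / totals[m] for m in totals}


# Made-up outcomes for two hypothetical models.
results = [
    TranslationResult("model-a", "s1", True, 5, 5),   # correct
    TranslationResult("model-a", "s2", True, 3, 5),   # fails 2 tests
    TranslationResult("model-b", "s1", False, 0, 5),  # does not compile
    TranslationResult("model-b", "s2", True, 4, 5),   # fails 1 test
    TranslationResult("model-b", "s3", True, 5, 5),   # correct
    TranslationResult("model-b", "s4", True, 0, 5),   # fails all tests
]
rates = incorrect_rates(results)
print(rates)  # {'model-a': 50.0, 'model-b': 75.0}
```

Taking the minimum and maximum of these per-model values yields a range across models, like the one quoted above.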




