Revisiting Lightweight Compiler Provenance Recovery on ARM Binaries

05/06/2023
by   Jason Kim, et al.
0

A binary's behavior is greatly influenced by how the compiler builds its source code. Although most compiler configuration details are abstracted away during compilation, recovering them is useful for reverse engineering and program comprehension tasks on unknown binaries, such as code similarity detection. We observe that previous work has thoroughly explored this on x86-64 binaries. However, there has been limited investigation of ARM binaries, which are increasingly prevalent. In this paper, we extend previous work with a shallow-learning model that efficiently and accurately recovers compiler configuration properties for ARM binaries. We apply opcode and register-derived features, that have previously been effective on x86-64 binaries, to ARM binaries. Furthermore, we compare this work with Pizzolotto et al., a recent architecture-agnostic model that uses deep learning, whose dataset and code are available. We observe that the lightweight features are reproducible on ARM binaries. We achieve over 99 approaches, while achieving a 583-times speedup during training and 3,826-times speedup during inference. Finally, we also discuss findings of overfitting that was previously undetected in prior work.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset