Meterstick: Benchmarking Performance Variability in Cloud and Self-hosted Minecraft-like Games (Extended Technical Report)
Due to their increasing popularity and strict performance requirements, online games have become a topic of interest for the performance engineering community. One of the most popular types of online games is the modifiable virtual environment (MVE), in which players can terraform the environment. The most popular MVE, Minecraft, provides not only entertainment but also educational support and social interaction to over 130 million people worldwide. MVEs currently support their many players by replicating isolated instances, each of which supports only up to a few hundred players under favorable conditions. In practice, as we show here, the real upper limit of supported players can be much lower. In this work, we posit that performance variability is a key cause of the lack of scalability in MVEs, investigate the causes of performance variability experimentally, and derive actionable insights. We propose an operational model for MVEs, which extends the state of the art with essential aspects, e.g., the consideration of environment-based workloads: sizable workload components that, once set in action, do not depend on player input. Starting from this model, we design the first benchmark that focuses on MVE performance variability, defining specialized workloads, metrics, and processes. We conduct real-world benchmarking of Minecraft-like MVEs, both cloud-based and self-hosted. We find that environment-based workloads and cloud deployment are significant sources of performance variability: peak latency degrades sharply to 20.7 times the arithmetic mean and exceeds the performance requirements by a factor of 7.4. We derive actionable insights for game developers, game operators, and other stakeholders to tame performance variability.
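To make the headline variability metric concrete, the following is a minimal sketch of how peak-over-mean tick latency could be computed from one benchmark run. It assumes tick durations sampled in milliseconds and Minecraft's standard 50 ms tick budget (20 ticks per second); the function name variability_report and the sample values are illustrative, not taken from the paper's artifact.

```python
# Hypothetical sketch: summarizing tick-latency variability for one run,
# in the spirit of the paper's peak-vs-mean metric. Names and sample data
# are illustrative assumptions, not the paper's benchmark code.

import statistics

TICK_BUDGET_MS = 50.0  # Minecraft-like games target 20 ticks/s, i.e., 50 ms per tick

def variability_report(tick_durations_ms: list[float]) -> dict[str, float]:
    """Summarize tick-time variability for one benchmark run."""
    mean = statistics.fmean(tick_durations_ms)
    peak = max(tick_durations_ms)
    return {
        "mean_ms": mean,
        "peak_ms": peak,
        # The paper reports peak latency reaching 20.7x the arithmetic mean.
        "peak_over_mean": peak / mean,
        # The paper reports peaks exceeding performance requirements by 7.4x.
        "peak_over_budget": peak / TICK_BUDGET_MS,
    }

# Example: a mostly steady run with one stalled tick dominating the peak.
print(variability_report([48.0, 50.0, 47.0, 52.0, 49.0, 1010.0]))
```

Reporting the peak relative to the mean, rather than the mean alone, is what exposes the tail behavior that averages hide; a single stalled tick, as in the example above, barely moves the mean but is exactly what players perceive as lag.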