Practical Bounds on Optimal Caching with Variable Object Sizes
Many recent caching systems aim to improve hit ratios, but practitioners have no good sense of how much further hit ratios can be improved. In other words, should the systems community continue working on this problem? Currently, there is no principled answer to this question. Most prior work assumes that all objects have the same size, but in practice object sizes often vary by several orders of magnitude. The few known results for variable object sizes provide very weak guarantees and are impractical to compute on traces of realistic length. We propose a new method to compute the offline optimal hit ratio under variable object sizes. Our key insight is to represent caching as a min-cost flow problem; hence we call our method the flow-based offline optimal (FOO). We show that, under simple independence assumptions and Zipf popularities, FOO's bounds become tight as the number of objects goes to infinity. From FOO we develop fast, practical methods to compute nearly tight bounds on the optimal hit ratio, which we call practical flow-based offline optimal (P-FOO). P-FOO enables the first analysis of optimal caching on realistic traces with hundreds of millions of requests. We evaluate P-FOO on several production traces, and the results show that recent caching systems are still far from optimal.
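The abstract does not spell out the flow graph, so the following is only a minimal sketch of one plausible way to cast caching as min-cost flow; the construction is our assumption, not necessarily the paper's exact one. Each request becomes a node. "Inner" edges i -> i+1 have capacity equal to the cache size and zero cost (bytes kept in cache). "Outer" edges connect each request to the next request of the same object, with capacity equal to the object's size and cost 1/size per byte, so routing a whole object around the cache costs exactly one miss. Every caching schedule that keeps whole objects between consecutive requests is a feasible flow in this graph, so the minimum cost is a lower bound on non-compulsory misses (first requests always miss). The names `MinCostFlow` and `flow_miss_lower_bound` are hypothetical, and the Bellman-Ford solver is suitable only for toy traces, nowhere near the trace lengths P-FOO targets.

```python
from fractions import Fraction


class MinCostFlow:
    """Successive-shortest-path min-cost flow using Bellman-Ford (toy sizes only).

    Each edge is stored as [to, capacity, cost, index_of_reverse_edge].
    """

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        # Forward edge plus a zero-capacity residual edge with negated cost.
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        """Send maximum flow from s to t at minimum total cost."""
        flow, total_cost = 0, Fraction(0)
        while True:
            # Bellman-Ford: residual costs may be negative, but no negative cycles.
            dist, prev = [None] * self.n, [None] * self.n
            dist[s] = Fraction(0)
            changed = True
            while changed:
                changed = False
                for u in range(self.n):
                    if dist[u] is None:
                        continue
                    for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                        if cap > 0 and (dist[v] is None or dist[u] + cost < dist[v]):
                            dist[v], prev[v] = dist[u] + cost, (u, i)
                            changed = True
            if dist[t] is None:
                return flow, total_cost
            # Bottleneck capacity along the shortest path, then augment.
            push, v = None, t
            while v != s:
                u, i = prev[v]
                cap = self.graph[u][i][1]
                push = cap if push is None else min(push, cap)
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[t]


def flow_miss_lower_bound(trace, sizes, cache_size):
    """Hypothetical helper: lower bound on non-compulsory misses for `trace`.

    trace: list of object ids; sizes: dict id -> integer size (e.g. bytes);
    cache_size: integer capacity in the same units.
    """
    n = len(trace)
    source, sink = n, n + 1
    mcf = MinCostFlow(n + 2)
    for i in range(n - 1):
        # Inner edge: bytes held in cache between consecutive requests (free).
        mcf.add_edge(i, i + 1, cache_size, Fraction(0))
    first, last = {}, {}
    for i, obj in enumerate(trace):
        if obj in last:
            # Outer edge: bytes of obj not cached since its previous request;
            # sending all sizes[obj] bytes here costs exactly one miss.
            mcf.add_edge(last[obj], i, sizes[obj], Fraction(1, sizes[obj]))
        else:
            first[obj] = i
        last[obj] = i
    required = 0
    for obj, f in first.items():
        if last[obj] != f:  # objects requested only once carry no flow
            mcf.add_edge(source, f, sizes[obj], Fraction(0))
            mcf.add_edge(last[obj], sink, sizes[obj], Fraction(0))
            required += sizes[obj]
    flow, cost = mcf.solve(source, sink)
    assert flow == required  # each object's chain of outer edges is always feasible
    return cost


if __name__ == "__main__":
    # a is 2 bytes, b is 1 byte, cache holds 2 bytes: they cannot both stay cached.
    print(flow_miss_lower_bound(["a", "b", "a", "b"], {"a": 2, "b": 1}, cache_size=2))
```

In this example any real schedule misses once (it can keep a or b, not both), while the flow can bypass just one byte of `a` at cost 1/2. That gap between the fractional value and the integral optimum illustrates why a flow relaxation yields a bound on the optimal hit ratio rather than the exact optimum.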