Towards Better Compressed Representations
We introduce the problem of computing a parsing in which each phrase has length at most m and the zeroth-order entropy of the parsing is minimized. Building on recent theoretical results, we devise a heuristic for this problem. The solution applies directly to succinct text representations and gives practical improvements. Moreover, the proposed heuristic yields a structure whose size can be bounded both by |S|H_{m-1}(S) and by (|S|/m)(H_0(S) + ... + H_{m-1}(S)), where H_k(S) is the k-th order empirical entropy of S. We also consider a similar problem in which the first-order entropy is minimized.
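To make the objective concrete, the following minimal Python sketch computes the zeroth-order empirical entropy of a parsing and compares a few candidate parsings with phrases of length at most m. It is not the heuristic proposed in the paper: the function names (`parsing_entropy`, `fixed_parsing`, `best_offset_parsing`) are hypothetical, the search only tries the m possible alignments of fixed-length blocks, and the cost is assumed to be the number of phrases times the entropy of the phrase sequence.

```python
from collections import Counter
from math import log2


def parsing_entropy(phrases):
    """Zeroth-order empirical entropy, in bits per phrase, of a phrase sequence."""
    n = len(phrases)
    if n == 0:
        return 0.0
    counts = Counter(phrases)
    return sum(c / n * log2(n / c) for c in counts.values())


def fixed_parsing(s, m, offset=0):
    """Cut s into phrases of length at most m.

    If offset > 0, the first phrase is the prefix of length `offset`;
    the remaining text is cut into blocks of length m (the last may be shorter).
    """
    phrases = [s[:offset]] if offset and s else []
    phrases += [s[i:i + m] for i in range(offset, len(s), m)]
    return phrases


def best_offset_parsing(s, m):
    """Illustrative baseline only: try all m alignments, keep the cheapest.

    Cost is assumed to be (number of phrases) * (entropy of the parsing).
    """
    best = None
    for offset in range(m):
        phrases = fixed_parsing(s, m, offset)
        cost = len(phrases) * parsing_entropy(phrases)
        if best is None or cost < best[0]:
            best = (cost, phrases)
    return best


if __name__ == "__main__":
    cost, phrases = best_offset_parsing("xabababab", 2)
    print(phrases, round(cost, 3))
```

For the text "xabababab" and m = 2, shifting the block boundaries by one symbol lets every block after the first be the same phrase, which lowers the entropy of the parsing compared with the unshifted blocks. A real solution would also consider phrases of varying length rather than only shifted fixed-length blocks.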