Towards Better Compressed Representations

11/07/2019
by Michał Gańczorz, et al.

We introduce the problem of computing a parsing in which each phrase has length at most m and which minimizes the zeroth-order entropy of the parsing. Based on recent theoretical results, we devise a heuristic for this problem. The solution has a straightforward application in succinct text representations and gives practical improvements. Moreover, the proposed heuristic yields a structure whose size can be bounded both by |S|H_{m-1}(S) and by (|S|/m)(H_0(S) + ... + H_{m-1}(S)), where H_k(S) is the k-th order empirical entropy of S. We also consider a similar problem in which the first-order entropy is minimized.
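
To make the objective concrete, the following is a minimal sketch of how the cost of a parsing could be measured. It is not the heuristic from the paper: it only builds a naive parsing into blocks of length m (optionally shifted by an offset) and evaluates the number of phrases times their zeroth-order empirical entropy. The function names `zeroth_order_entropy`, `fixed_length_parsing`, and `parsing_cost_bits` are hypothetical, chosen here for illustration.

```python
import math
from collections import Counter


def zeroth_order_entropy(symbols):
    """Empirical zeroth-order entropy, in bits per symbol, of a sequence."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def fixed_length_parsing(text, m, offset=0):
    """Naive parsing (an illustrative assumption, not the paper's heuristic):
    an optional first phrase of length `offset`, then blocks of length m.
    Every phrase has length at most m."""
    if m < 1 or not 0 <= offset < m:
        raise ValueError("need m >= 1 and 0 <= offset < m")
    phrases = [text[:offset]] if offset else []
    phrases += [text[i:i + m] for i in range(offset, len(text), m)]
    return phrases


def parsing_cost_bits(phrases):
    """Number of phrases times the zeroth-order entropy of the phrase sequence."""
    return len(phrases) * zeroth_order_entropy(phrases)


if __name__ == "__main__":
    text = "abababab"
    for offset in range(2):
        phrases = fixed_length_parsing(text, 2, offset)
        print(offset, phrases, parsing_cost_bits(phrases))
```

Even on this toy input, the choice of phrase boundaries changes the cost: with offset 0 every phrase is identical, while offset 1 introduces extra distinct phrases. This is the kind of variation that a parsing minimizing the entropy aims to exploit.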
