Constrained Optimal Querying: Huffman Coding and Beyond

10/08/2022
by   Shuyuan Zhang, et al.

Huffman coding is well known to be useful in certain decision problems that involve minimizing the average number of (freely chosen) queries needed to determine an unknown random variable. In problems where the queries are more constrained, however, the original Huffman coding no longer works. In this paper, we propose a general model to describe such problems and two coding schemes: one is Huffman-based, and the other is called GBSC (Greedy Binary Separation Coding). We prove the optimality of GBSC by induction on a binary decision tree, which shows that GBSC is at least as good as Shannon coding. We then compare the two algorithms based on these codes by testing them on two problems, DNA detection and 1-player Battleship, and find both to be decent approximation algorithms: the Huffman-based algorithm gives an expected length 1.1 times the true optimum in the DNA detection problem, and GBSC yields an average number of queries 1.4 times the theoretical optimum in 1-player Battleship.
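
As a point of reference for the unconstrained case only, the following is a minimal sketch of classical binary Huffman coding, in which each code bit is read as one freely chosen yes/no query. It does not implement the constrained model, the Huffman-based scheme, or GBSC from the paper; the function names and the example distribution are assumptions chosen for illustration.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(probs):
    """Return the codeword length of each outcome under a binary Huffman code.

    Each length equals the number of yes/no queries needed to identify
    that outcome when queries can be chosen freely.
    """
    if len(probs) == 1:
        return [0]  # a single outcome needs no query
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        # Merge the two least likely groups; every member gains one query.
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), group1 + group2))
    return lengths


def expected_queries(probs):
    """Average number of queries under the Huffman code for `probs`."""
    lengths = huffman_code_lengths(probs)
    return sum(p * n for p, n in zip(probs, lengths))


def entropy(probs):
    """Shannon entropy in bits, a lower bound on the average query count."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distribution over four outcomes (illustration only).
example = [0.5, 0.25, 0.125, 0.125]
print(huffman_code_lengths(example))
print(expected_queries(example), entropy(example))
```

For a distribution whose probabilities are all powers of 1/2, as in this example, the average number of queries matches the entropy; the constrained problems discussed in the abstract are exactly those where such free choice of queries is not available.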

research
02/03/2021

On Coding for an Abstracted Nanopore Channel for DNA Storage

In the emerging field of DNA storage, data is encoded as DNA sequences a...
research
05/07/2022

Rate-Constrained Shaping Codes for Finite-State Channels With Cost

Shaping codes are used to generate code sequences in which the symbols o...
research
08/11/2023

Embracing Errors is More Efficient than Avoiding Them through Constrained Coding for DNA Data Storage

DNA is an attractive medium for digital data storage. When data is store...
research
06/25/2019

Coding for Crowdsourced Classification with XOR Queries

This paper models the crowdsourced labeling/classification problem as a ...
research
04/15/2022

Generalized Universal Coding of Integers

Universal coding of integers (UCI) is a class of variable-length code, s...
research
03/18/2022

A constrained Shannon-Fano entropy coder for image storage in synthetic DNA

The exponentially increasing demand for data storage has been facing mor...
research
03/31/2019

Semisupervised Clustering by Queries and Locally Encodable Source Coding

Source coding is the canonical problem of data compression in informatio...
