Semisupervised Clustering by Queries and Locally Encodable Source Coding

03/31/2019
by Arya Mazumdar, et al.

Source coding is the canonical problem of data compression in information theory. In locally encodable source coding, each compressed bit depends on only a few bits of the input. In this paper, we show that a recently popular model of semisupervised clustering is equivalent to locally encodable source coding. In this model, the task is to perform multiclass labeling of unlabeled elements. At the beginning, we can ask an oracle a set of simple queries in parallel, and the oracle provides (possibly erroneous) binary answers. Each query can involve at most two elements (or, more generally, at most a fixed constant number Δ of elements). The labeling of all the elements (that is, the clustering) must then be performed based on the (noisy) query answers. The goal is to recover all the correct labels while minimizing the number of such queries. The equivalence to locally encodable source codes yields lower bounds on the number of queries required in a variety of scenarios. We also show fundamental limitations of pairwise 'same cluster' queries and propose pairwise AND queries, which provably perform better in many situations.
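
The query model described above can be illustrated with a small sketch. It is not taken from the paper: it assumes binary labels, an oracle that flips each answer independently with a fixed probability, and hypothetical helper names (`same_cluster`, `and_query`, `noisy_answers`). It shows one simple observation for the binary case, which may or may not be the limitation the paper analyzes: 'same cluster' answers are identical for a labeling and its complement, whereas AND answers can tell the two apart.

```python
import random

def same_cluster(labels, i, j):
    """Pairwise 'same cluster' query: 1 if both elements share a label."""
    return int(labels[i] == labels[j])

def and_query(labels, i, j):
    """Pairwise AND query on binary labels (assumed to be 0 or 1)."""
    return labels[i] & labels[j]

def noisy_answers(labels, pairs, query, flip_prob, rng):
    """Ask every query up front (non-adaptively); the oracle flips each
    binary answer independently with probability flip_prob (an assumed
    noise model for illustration)."""
    answers = []
    for i, j in pairs:
        bit = query(labels, i, j)
        if rng.random() < flip_prob:
            bit ^= 1
        answers.append(bit)
    return answers

# Hypothetical toy instance: five elements, two clusters.
labels = [0, 1, 1, 0, 1]
complement = [1 - x for x in labels]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
rng = random.Random(0)

# With a noiseless oracle, compare what each query type reveals.
sc = noisy_answers(labels, pairs, same_cluster, 0.0, rng)
sc_flipped = noisy_answers(complement, pairs, same_cluster, 0.0, rng)
aq = noisy_answers(labels, pairs, and_query, 0.0, rng)
aq_flipped = noisy_answers(complement, pairs, and_query, 0.0, rng)

print("same cluster:", sc, sc_flipped)
print("AND:         ", aq, aq_flipped)
```

In this toy instance the two 'same cluster' answer vectors coincide, so those answers alone cannot determine which cluster is which, while the AND answers differ between the labeling and its complement.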

research
06/25/2019

Coding for Crowdsourced Classification with XOR Queries

This paper models the crowdsourced labeling/classification problem as a ...
research
06/22/2017

Clustering with Noisy Queries

In this paper, we initiate a rigorous theoretical study of clustering wi...
research
09/07/2023

Noisy Computing of the 𝖮𝖱 and 𝖬𝖠𝖷 Functions

We consider the problem of computing a function of n variables using noi...
research
10/27/2021

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, a...
research
06/23/2017

Query Complexity of Clustering with Side Information

Suppose, we are given a set of n elements to be clustered into k (unknow...
research
10/28/2019

Same-Cluster Querying for Overlapping Clusters

Overlapping clusters are common in models of many practical data-segment...
research
10/08/2022

Constrained Optimal Querying: Huffman Coding and Beyond

Huffman coding is well known to be useful in certain decision problems i...