Complexity and Efficient Algorithms for Data Inconsistency Evaluating and Repairing

01/02/2020
by Dongjing Miao, et al.

Evaluating and repairing data inconsistency are major concerns in data quality management. As a basic computing task, optimal subset repair is not only applied for cost estimation during database repairing, but also used directly to derive a measure of database inconsistency. Computing an optimal subset repair means finding a minimum set of tuples in an inconsistent database whose removal leaves a consistent subset. Tight bounds on the complexity and efficient algorithms are still unknown. In this paper, we improve the existing complexity and algorithmic results, and give a fast estimation of the size of an optimal subset repair. First, we strengthen the dichotomy for the optimal subset repair computation problem: we show that it is not only APX-complete, but also NP-hard to approximate an optimal subset repair within a factor better than 17/16 in most cases. Second, we show a (2 - 0.5^{σ-1})-approximation whenever σ functional dependencies are given, and a (2 - η_k + η_k/k)-approximation when an η_k-portion of tuples have the k-quasi-Turán property for some k > 1. Finally, we show a sublinear estimator of the size of an optimal S-repair for subset queries; it outputs an estimate of ratio 2n + ϵn with high probability, thus yielding an estimate of the FD-inconsistency degree of ratio 2 + ϵ. To support a variety of subset queries for FD-inconsistency evaluation, we unify them as the ⊆-oracle, which can answer membership queries and return p uniformly sampled tuples whenever given a number p. Experiments are conducted on range queries as an implementation of the ⊆-oracle, and the results show the efficiency of our FD-inconsistency degree estimator.
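
To make the notion of a subset repair concrete, here is a minimal Python sketch. It is not the paper's algorithm. It uses the textbook maximal-matching heuristic for vertex cover on the conflict graph, which returns a removal set at most twice the optimal size. The function names, the dictionary representation of tuples, and the `(lhs, rhs)` encoding of a functional dependency are all assumptions made for illustration.

```python
from itertools import combinations

def conflicts(t1, t2, fds):
    """Two tuples conflict under an FD lhs -> rhs if they agree on every
    lhs attribute but differ on some rhs attribute."""
    for lhs, rhs in fds:
        if all(t1[a] == t2[a] for a in lhs) and any(t1[a] != t2[a] for a in rhs):
            return True
    return False

def approx_subset_repair(rows, fds):
    """Return indices of tuples to remove (hypothetical helper).

    Greedy maximal matching on the conflict graph: whenever two tuples
    that are both still present conflict, remove both. The chosen pairs
    are disjoint, and any repair must remove at least one tuple of each
    pair, so the result is at most twice the optimal size.
    This quadratic scan is for illustration only.
    """
    removed = set()
    for i, j in combinations(range(len(rows)), 2):
        if i in removed or j in removed:
            continue
        if conflicts(rows[i], rows[j], fds):
            removed.update((i, j))
    return removed

def is_consistent(rows, fds):
    """True if no pair of tuples violates any FD."""
    return not any(conflicts(a, b, fds) for a, b in combinations(rows, 2))

# Illustrative data: the FD zip -> city is violated by the second tuple.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
]
fds = [(["zip"], ["city"])]

removed = approx_subset_repair(rows, fds)
kept = [r for k, r in enumerate(rows) if k not in removed]
```

On this data the optimal repair removes only the "NYC" tuple, while the sketch removes two tuples, which is within the factor-2 guarantee. Dividing the size of the removal set by the number of tuples gives an upper estimate of the fraction of tuples that must be dropped to reach consistency.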

research
12/20/2017

Computing Optimal Repairs for Functional Dependencies

We investigate the complexity of computing an optimal repair of an incon...
research
09/29/2020

Database Repairing with Soft Functional Dependencies

A common interpretation of soft constraints penalizes the database for e...
research
08/30/2017

The Complexity of Computing a Cardinality Repair for Functional Dependencies

For a relation that violates a set of functional dependencies, we consid...
research
05/08/2018

On Secure Exact-repair Regenerating Codes with a Single Pareto Optimal Point

The problem of exact-repair regenerating codes against eavesdropping att...
research
04/22/2022

Uniform Operational Consistent Query Answering

Operational consistent query answering (CQA) is a recent framework for C...
research
07/26/2021

Approximating Sumset Size

Given a subset A of the n-dimensional Boolean hypercube 𝔽_2^n, the sumse...
research
07/31/2018

Improve3C: Data Cleaning on Consistency and Completeness with Currency

Data quality plays a key role in big data management today. With the exp...
