Computing Optimal Repairs for Functional Dependencies

12/20/2017
by Ester Livshits, et al.

We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where the integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair), which is obtained by a minimum number of tuple deletions, and an optimal update repair (optimal U-repair), which is obtained by a minimum number of value (cell) updates. For computing an optimal S-repair, we present a polynomial-time algorithm that succeeds on certain sets of FDs and fails on others. We prove the following about the algorithm. When it succeeds, it can also incorporate weighted tuples and duplicate tuples. When it fails, the problem is NP-hard and, in fact, APX-complete (hence, unless P = NP, it cannot be approximated better than some constant). Thus, we establish a dichotomy in the complexity of computing an optimal S-repair. We present general analysis techniques for the complexity of computing an optimal U-repair, some of which are based on the dichotomy for S-repairs. We also draw a connection to a past dichotomy in the complexity of finding a "most probable database" that satisfies a set of FDs with a single attribute on the left-hand side; the case of general FDs was left open, and we show how our dichotomy provides the missing generalization and thereby settles the open problem.
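To make the definition of an optimal S-repair concrete, the sketch below finds one by exhaustive search on a tiny relation. It is not the polynomial-time algorithm from the paper; it is an exponential-time brute force that only illustrates what "a minimum number of tuple deletions" means. The relation, the attribute names (emp, dept, city), the two FDs, and all function names are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

def violates(t1, t2, fds):
    """Return True if tuples t1 and t2 jointly violate some FD (lhs -> rhs)."""
    for lhs, rhs in fds:
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_lhs and not agree_rhs:
            return True
    return False

def is_consistent(rows, fds):
    """A relation satisfies a set of FDs iff no pair of tuples violates one."""
    return not any(violates(a, b, fds) for a, b in combinations(rows, 2))

def optimal_s_repair_bruteforce(rows, fds):
    """Try deleting 0, 1, 2, ... tuples; return the first consistent subset.

    Exponential in the number of tuples; for illustration only.
    """
    n = len(rows)
    for k in range(n + 1):
        for deleted in combinations(range(n), k):
            kept = [r for i, r in enumerate(rows) if i not in deleted]
            if is_consistent(kept, fds):
                return kept
    return []

# Hypothetical example: FDs emp -> dept and dept -> city.
fds = [(("emp",), ("dept",)), (("dept",), ("city",))]
rows = [
    {"emp": "ann", "dept": "sales", "city": "paris"},
    {"emp": "ann", "dept": "hr",    "city": "rome"},
    {"emp": "bob", "dept": "sales", "city": "paris"},
    {"emp": "cat", "dept": "sales", "city": "oslo"},
]

repair = optimal_s_repair_bruteforce(rows, fds)
print(len(rows) - len(repair), "deletions;", "kept:", repair)
```

In this example the conflicting pairs form a chain (tuple 1 with tuple 2 on emp -> dept, tuple 1 with tuple 4 and tuple 3 with tuple 4 on dept -> city), so no single deletion restores consistency and two deletions are needed. An optimal U-repair would instead count changed cells rather than deleted tuples.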

research
08/30/2017

The Complexity of Computing a Cardinality Repair for Functional Dependencies

For a relation that violates a set of functional dependencies, we consid...
research
01/02/2020

Complexity and Efficient Algorithms for Data Inconsistency Evaluating and Repairing

Data inconsistency evaluating and repairing are major concerns in data q...
research
12/26/2017

Pattern-Driven Data Cleaning

Data is inherently dirty and there has been a sustained effort to come u...
research
09/29/2020

Database Repairing with Soft Functional Dependencies

A common interpretation of soft constraints penalizes the database for e...
research
04/24/2018

Measuring and Computing Database Inconsistency via Repairs

We propose a generic numerical measure of inconsistency of a database wi...
research
01/13/2022

Certifiable Robustness for Nearest Neighbor Classifiers

ML models are typically trained using large datasets of high quality. Ho...
