Independence in Infinite Probabilistic Databases

10/30/2020
by Martin Grohe, et al.

Probabilistic databases (PDBs) model uncertainty in data. The current standard is to view PDBs as finite probability spaces over relational database instances. Since many attributes in typical databases have infinite domains, such as integers, strings, or real numbers, it is often more natural to view PDBs as infinite probability spaces over database instances. In this paper, we lay the mathematical foundations of infinite probabilistic databases. Our focus is then on independence assumptions. Tuple-independent PDBs play a central role in the theory and practice of PDBs. Here, we study infinite tuple-independent PDBs as well as related models such as infinite block-independent disjoint PDBs. While the standard model of PDBs focuses on a set-based semantics, we also study tuple-independent PDBs with a bag semantics and propose Poisson-PDBs as a suitable model. It turns out that for uncountable PDBs, Poisson-PDBs form a natural model of tuple-independence even for a set semantics, and they fit in neatly with the mathematical theory of Poisson processes. We also propose a new approach to PDBs with an open-world assumption, addressing issues raised by Ceylan et al. (Proc. KR 2016) and generalizing their work, which is still rooted in finite tuple-independent PDBs. Moreover, for countable PDBs, we propose an approximate query answering algorithm.
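To make the idea of approximate query answering over a countable tuple-independent PDB concrete, here is a minimal sketch. It is not the algorithm from the paper. It assumes a hypothetical family of facts f_1, f_2, ... with marginal probabilities p_i = 2^-i (so the probabilities have a finite sum) and approximates the probability that the instance contains no fact by truncating to the first n facts. The names `fact_probability`, `tail_bound`, and `approximate_prob_empty` are illustrative assumptions.

```python
import math
from typing import Tuple


def fact_probability(i: int) -> float:
    """Hypothetical marginal probability of the i-th fact (i >= 1)."""
    return 2.0 ** -i


def tail_bound(n: int) -> float:
    """Sum of p_i over all i > n; for p_i = 2^-i this is exactly 2^-n."""
    return 2.0 ** -n


def prob_no_fact_truncated(n: int) -> float:
    """Probability that none of the first n independent facts is present."""
    return math.prod(1.0 - fact_probability(i) for i in range(1, n + 1))


def approximate_prob_empty(eps: float) -> Tuple[float, int]:
    """Approximate P(instance is empty) up to additive error eps.

    The truncated value overestimates the true value by at most the
    tail sum of the remaining probabilities, so we pick the smallest n
    whose tail sum is at most eps.
    """
    n = 0
    while tail_bound(n) > eps:
        n += 1
    return prob_no_fact_truncated(n), n


if __name__ == "__main__":
    estimate, n_facts = approximate_prob_empty(1e-3)
    print(f"used {n_facts} facts, estimate = {estimate:.6f}")
```

The truncation is sound here because the facts are independent: the exact probability is the truncated product times the product over the remaining facts, and that remaining product is at least one minus the tail sum.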


research
07/02/2018

Probabilistic Databases with an Infinite Open-World Assumption

Probabilistic databases (PDBs) introduce uncertainty into relational dat...
research
08/21/2020

Tuple-Independent Representations of Infinite Probabilistic Databases

Probabilistic databases (PDBs) are probability spaces over database inst...
research
04/14/2019

Infinite Probabilistic Databases

Probabilistic databases (PDBs) are used to model uncertainty in data in ...
research
01/27/2022

Probabilistic Query Evaluation with Bag Semantics

We initiate the study of probabilistic query evaluation under bag semant...
research
02/27/2019

On Constrained Open-World Probabilistic Databases

Increasing amounts of available data have led to a heightened need for r...
research
04/06/2022

Computing expected multiplicities for bag-TIDBs with bounded multiplicities

In this work, we study the problem of computing a tuple's expected multi...
research
11/23/2022

Run-Based Semantics for RPQs

The formalism of RPQs (regular path queries) is an important building bl...
