Probabilistic Database System

A Probabilistic Database System is a Database System that can manage a Probabilistic Data Record Set.

See: Database Management System.

References

2009

(Dalvi et al., 2009) ⇒ Nilesh Dalvi, Christopher Ré, and Dan Suciu. (2009). “Probabilistic Databases: diamonds in the dirt.” In: Communications of the ACM, 52(7). doi:10.1145/1538788.1538810
- The de facto formal semantics of a probabilistic database is the possible worlds model.^{[Dalvi & Suciu, 2007]} By contrast, there is no agreement on a representation system, instead there are several approaches covering a spectrum between expressive power and usability.^{[Benjelloun et al., 2008]} A key concept in most representation systems is that of lineage, which is derived from early work on incomplete databases by Immelinski and Lipski.^{[Imielinski & Lipski, 1984]}
- Possible Worlds Semantics. In its most general form, a probabilistic database is a probability space over the possible contents of the database. It is customary to denote a (conventional) relational database instance with the letter I. Assuming there is a single table in our database, I is simply a set of tuples (records) representing that table; this is a conventional database. A probabilistic database is a discrete probability space PDB = (W, P), where W = {I₁,I₂, ..., I_n} is a set of possible instances, called possible worlds, and P: W → [0, 1] is such that ∑_j=1,nP(I_j) = 1. In the terminology of networks of belief, there is one random variable for each possible tuple whose values are 0 (meaning that the tuple is not present) or 1 (meaning that the tuple is present), and a probabilistic database is a joint probability distribution over the values of these random variables. *...
- Consider some tuple t (we use interchangeably the terms tuple and record in this article). The probability that the tuple belongs to a randomly chosen world is P(t) = ∑_j:t∈Ij P(I_j), and is also called the marginal probability of the tuple t. Similarly, if we have two tuples t₁, t₂, we can examine the probability that both are present in a randomly chosen world, denoted P(t₁t₂). When the latter is P(t₁)P(t₂), we say that t₁, t₂ are independent tuples; if it is 0 then we say that t₁, t₂ are disjoint tuples or exclusive tuples. If none of these hold, then the tuples are correlated in a nonobvious way. Consider a query Q, expressed in some relational query language like SQL, and a possible tuple t in the query's answer. P(t ∈ Q) denotes the probability that, in a randomly chosen world, t is an answer to Q. The job of a probabilistic database system is to return all possible tuples t₁, t₂, ...together with their probabilities P(t₁ ∈ Q), P(t₂ ∈ Q),....

2007

(Dalvi & Suciu, 2007) ⇒ Nilesh Dalvi, and Dan Suciu. (2007). “Efficient query evaluation on probabilistic databases.” In: VLDB Journal, 16(4). doi:10.1007/s00778-006-0004-3

Probabilistic Database System

References

2009

2007

Navigation menu

Search