Probabilistic Data Record Set

References

(Dalvi et al., 2009) ⇒ Nilesh Dalvi, Christopher Ré, and Dan Suciu. (2009). “Probabilistic Databases: diamonds in the dirt.” In: Communications of the ACM, 52(7). doi:10.1145/1538788.1538810
- A probabilistic database is a discrete probability space PDB = (W, P), where W = {I₁, I₂, ..., I_n} is a set of possible instances, called possible worlds, and P: W → [0, 1] is such that ∑_{j=1}^{n} P(I_j) = 1. In the terminology of networks of belief, there is one random variable for each possible tuple whose values are 0 (meaning that the tuple is not present) or 1 (meaning that the tuple is present), and a probabilistic database is a joint probability distribution over the values of these random variables. *...
- Consider some tuple t (we use interchangeably the terms tuple and record in this article). The probability that the tuple belongs to a randomly chosen world is P(t) = ∑_{j: t ∈ I_j} P(I_j), and is also called the marginal probability of the tuple t. Similarly, if we have two tuples t₁, t₂, we can examine the probability that both are present in a randomly chosen world, denoted P(t₁t₂). When the latter is P(t₁)P(t₂), we say that t₁, t₂ are independent tuples; if it is 0 then we say that t₁, t₂ are disjoint tuples or exclusive tuples. If none of these hold, then the tuples are correlated in a nonobvious way.
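
The definitions quoted above can be illustrated with a minimal sketch. It assumes a toy database of four equally likely possible worlds over three tuples; the names `worlds`, `marginal`, and `classify` and the tuple labels are hypothetical and do not come from the cited paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy probabilistic database: each possible world is a
# frozenset of tuple labels, mapped to its probability P(I_j).
worlds = {
    frozenset({"t1", "t2"}): Fraction(1, 4),
    frozenset({"t1"}): Fraction(1, 4),
    frozenset({"t2", "t3"}): Fraction(1, 4),
    frozenset(): Fraction(1, 4),
}

# The world probabilities must sum to 1 to form a probability space.
assert sum(worlds.values()) == 1


def marginal(worlds, *tuples):
    """Sum P(I_j) over the worlds I_j that contain every given tuple.

    With one tuple this is the marginal probability P(t); with two it is
    the joint probability P(t1 t2).
    """
    return sum(
        (p for world, p in worlds.items() if all(t in world for t in tuples)),
        Fraction(0),
    )


def classify(worlds, a, b):
    """Label a pair of tuples as independent, disjoint, or correlated."""
    joint = marginal(worlds, a, b)
    # Independence is checked first; a tuple with marginal 0 would satisfy
    # both conditions, and this sketch reports it as independent.
    if joint == marginal(worlds, a) * marginal(worlds, b):
        return "independent"
    if joint == 0:
        return "disjoint"
    return "correlated"


if __name__ == "__main__":
    for a, b in combinations(["t1", "t2", "t3"], 2):
        print(a, b, marginal(worlds, a, b), classify(worlds, a, b))
```

In this toy instance, t1 and t2 are independent (P(t1t2) = 1/4 = 1/2 · 1/2), t1 and t3 are disjoint (no world contains both), and t2 and t3 are correlated (P(t2t3) = 1/4, while P(t2)P(t3) = 1/8). Exact fractions are used so that the equality test for independence is not affected by floating-point rounding.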

Cavallo, R. and Pittarelli, M. The theory of probabilistic databases. In Proceedings of VLDB (1987), 71–81.