Probabilistic Data Record Set

From GM-RKB

A Probabilistic Data Record Set is a Data Record Set in which each Data Record is assigned a probability by an associated Probability Function.



References

2009

  • (Dalvi et al., 2009) ⇒ Nilesh Dalvi, Christopher Ré, and Dan Suciu. (2009). “Probabilistic Databases: diamonds in the dirt.” In: Communications of the ACM, 52(7). doi:10.1145/1538788.1538810
    • A probabilistic database is a discrete probability space $PDB = (W, P)$, where $W = \{I_1, I_2, \ldots, I_n\}$ is a set of possible instances, called possible worlds, and $P: W \to [0, 1]$ is such that $\sum_{j=1}^{n} P(I_j) = 1$. In the terminology of networks of belief, there is one random variable for each possible tuple whose values are 0 (meaning that the tuple is not present) or 1 (meaning that the tuple is present), and a probabilistic database is a joint probability distribution over the values of these random variables. ...
    • Consider some tuple $t$ (we use interchangeably the terms tuple and record in this article). The probability that the tuple belongs to a randomly chosen world is $P(t) = \sum_{j : t \in I_j} P(I_j)$, and is also called the marginal probability of the tuple $t$. Similarly, if we have two tuples $t_1, t_2$, we can examine the probability that both are present in a randomly chosen world, denoted $P(t_1 t_2)$. When the latter is $P(t_1) P(t_2)$, we say that $t_1, t_2$ are independent tuples; if it is 0 then we say that $t_1, t_2$ are disjoint tuples or exclusive tuples. If none of these hold, then the tuples are correlated in a nonobvious way.
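
The possible-worlds definitions quoted above can be illustrated with a minimal Python sketch. The toy worlds, their probabilities, and the helper names `marginal`, `joint`, and `classify` are assumptions made for illustration only; they do not come from Dalvi et al. (2009). Exact fractions are used so that the equality tests for independence and disjointness are not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical toy database over tuples "a", "b", "c".
# Each possible world is a frozenset of the tuples present in it,
# mapped to its probability P(I_j). The values are illustrative only.
WORLDS = {
    frozenset({"a", "b"}): Fraction(1, 4),
    frozenset({"a"}): Fraction(1, 4),
    frozenset({"b", "c"}): Fraction(1, 4),
    frozenset({"c"}): Fraction(1, 8),
    frozenset(): Fraction(1, 8),
}


def marginal(worlds, t):
    """Marginal probability P(t): total probability of worlds containing t."""
    return sum((p for world, p in worlds.items() if t in world), Fraction(0))


def joint(worlds, t1, t2):
    """P(t1 t2): total probability of worlds containing both tuples."""
    return sum(
        (p for world, p in worlds.items() if t1 in world and t2 in world),
        Fraction(0),
    )


def classify(worlds, t1, t2):
    """Label a pair of tuples as independent, disjoint, or correlated."""
    p12 = joint(worlds, t1, t2)
    if p12 == marginal(worlds, t1) * marginal(worlds, t2):
        return "independent"
    if p12 == 0:
        return "disjoint"
    return "correlated"


# The world probabilities must sum to 1 for a valid probability space.
assert sum(WORLDS.values()) == 1

for pair in [("a", "b"), ("a", "c"), ("b", "c")]:
    print(pair, classify(WORLDS, *pair))
```

In this toy instance the pair (a, b) satisfies the product rule, the pair (a, c) never co-occurs in any world, and the pair (b, c) falls into neither case. Enumerating worlds explicitly is only practical for very small examples, since the number of possible worlds can grow exponentially with the number of tuples.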

1987

  • (Cavallo & Pittarelli, 1987) ⇒ R. Cavallo, and M. Pittarelli. (1987). “The Theory of Probabilistic Databases.” In: Proceedings of VLDB, pp. 71–81.