Confidence (Association Rule Measure)

From GM-RKB

A Confidence is an association rule performance measure that estimates a conditional probability which indicates how often the rule has been found to be true.



References

2018a

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Association_rule_learning#Confidence Retrieved:2018-10-7.
    • Confidence is an indication of how often the rule has been found to be true.

      The confidence value of a rule, [math]\displaystyle{ X \Rightarrow Y }[/math] , with respect to a set of transactions [math]\displaystyle{ T }[/math] , is the proportion of the transactions that contains [math]\displaystyle{ X }[/math] which also contains [math]\displaystyle{ Y }[/math] .

      Confidence is defined as: [math]\displaystyle{ \mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X) }[/math] For example, the rule [math]\displaystyle{ \{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\} }[/math] has a confidence of [math]\displaystyle{ 0.2/0.2=1.0 }[/math] in the database, which means that for 100% of the transactions containing butter and bread the rule is correct (100% of the times a customer buys butter and bread, milk is bought as well).

      Note that [math]\displaystyle{ \mathrm{supp}(X \cup Y) }[/math] means the support of the union of the items in X and Y. This is somewhat confusing since we normally think in terms of probabilities of events and not sets of items. We can rewrite [math]\displaystyle{ \mathrm{supp}(X \cup Y) }[/math] as the probability [math]\displaystyle{ P(E_X \cap E_Y) }[/math] , where [math]\displaystyle{ E_X }[/math] and [math]\displaystyle{ E_Y }[/math] are the events that a transaction contains itemset [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] , respectively.[1]

      Thus confidence can be interpreted as an estimate of the conditional probability [math]\displaystyle{ P(E_Y | E_X) }[/math], the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.[2] [3]

  1. Michael Hahsler (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. http://michael.hahsler.net/research/association_rules/measures.html
  2. Hahsler, Michael (2005). "Introduction to arules – A computational environment for mining association rules and frequent item sets" (PDF). Journal of Statistical Software.
  3. Hipp, J.; Güntzer, U.; Nakhaeizadeh, G. (2000). "Algorithms for association rule mining – a general survey and comparison". ACM SIGKDD Explorations Newsletter. 2: 58. doi:10.1145/360402.360421.
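
The definition quoted above can be sketched in a few lines of Python. This is a minimal illustration, not code from any cited source: the function names `support` and `confidence` and the five-transaction toy database are assumptions, chosen so that the butter/bread/milk figures quoted above (0.2 / 0.2 = 1.0) come out the same.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        raise ValueError("transactions must be non-empty")
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X), an estimate of P(E_Y | E_X)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    supp_lhs = support(lhs, transactions)
    if supp_lhs == 0:
        # The ratio is undefined when no transaction contains the LHS.
        raise ValueError("LHS itemset never occurs; confidence is undefined")
    return support(lhs | rhs, transactions) / supp_lhs


# Hypothetical toy database (an assumption for illustration only).
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"butter"}),
    frozenset({"beer", "diapers"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
]

rule_conf = confidence({"butter", "bread"}, {"milk"}, transactions)
print(rule_conf)
```

Note that the union in `lhs | rhs` is a union of items, so a transaction must contain both itemsets to count; this is why the measure corresponds to the intersection of the events E_X and E_Y.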

2011

  • (Han, Pei & Kamber, 2011) ⇒ Jiawei Han, Jian Pei, and Micheline Kamber (2011). "Data mining: concepts and techniques" (PDF). Elsevier. p. 266. ISBN 978-0-12-381479-1
    • QUOTE: Let [math]\displaystyle{ I = \{I_1 , I_2 , \cdots , I_m\} }[/math] be an itemset. Let [math]\displaystyle{ D }[/math], the task-relevant data, be a set of database transactions where each transaction [math]\displaystyle{ T }[/math] is a nonempty itemset such that [math]\displaystyle{ T \subseteq I }[/math]. Each transaction is associated with an identifier, called a TID. Let [math]\displaystyle{ A }[/math] be a set of items. A transaction [math]\displaystyle{ T }[/math] is said to contain A if [math]\displaystyle{ A \subseteq T }[/math]. An association rule is an implication of the form [math]\displaystyle{ A \Rightarrow B }[/math], where [math]\displaystyle{ A \subset I,\; B \subset I,\; A \neq \emptyset,\; B \neq \emptyset }[/math], and [math]\displaystyle{ A \cap B = \emptyset }[/math]. The rule [math]\displaystyle{ A \Rightarrow B }[/math] holds in the transaction set [math]\displaystyle{ D }[/math] with support [math]\displaystyle{ s }[/math], where [math]\displaystyle{ s }[/math] is the percentage of transactions in [math]\displaystyle{ D }[/math] that contain [math]\displaystyle{ A \cup B }[/math] (i.e., the union of sets [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] say, or, both [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math]). This is taken to be the probability, [math]\displaystyle{ P(A \cup B) }[/math]. The rule [math]\displaystyle{ A \Rightarrow B }[/math] has confidence [math]\displaystyle{ c }[/math] in the transaction set [math]\displaystyle{ D }[/math], where [math]\displaystyle{ c }[/math] is the percentage of transactions in [math]\displaystyle{ D }[/math] containing [math]\displaystyle{ A }[/math] that also contain [math]\displaystyle{ B }[/math]. This is taken to be the conditional probability, [math]\displaystyle{ P(B|A) }[/math]. That is,

      [math]\displaystyle{ support (A\Rightarrow B) = P(A \cup B) \quad\quad }[/math] (6.2)

      [math]\displaystyle{ confidence (A\Rightarrow B) =P(B|A)\quad\quad }[/math](6.3)

      (...) From Eq. (6.3), we have

      [math]\displaystyle{ confidence (A\Rightarrow B) = P(B|A) = \dfrac{support (A \cup B)} {support (A)} = .... }[/math]
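
      To make the ratio concrete, consider a worked instance under purely hypothetical counts (these numbers are an assumption for illustration and do not come from the cited text): suppose [math]\displaystyle{ D }[/math] holds 10 transactions, 4 of which contain [math]\displaystyle{ A }[/math] and 3 of which contain both [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math]. Then

      [math]\displaystyle{ support (A \cup B) = \dfrac{3}{10} = 0.3, \quad support (A) = \dfrac{4}{10} = 0.4, \quad confidence (A\Rightarrow B) = \dfrac{0.3}{0.4} = 0.75 }[/math]

      so [math]\displaystyle{ B }[/math] appears in 75% of the transactions that contain [math]\displaystyle{ A }[/math].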

2008

that [math]\displaystyle{ X\ (Y) }[/math] occurs in a transaction

1993