Apriori Algorithm

From GM-RKB
Jump to navigation Jump to search

An Apriori Algorithm is an iterative association rule learning algorithm that computes the frequent itemsets where iteration [math]\displaystyle{ i }[/math] computes all frequent i-itemsets (itemsets with [math]\displaystyle{ i }[/math] elements) and each iteration has a candidate generation step, candidate counting step and candidate selection step.



References

2011

2009

  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Apriori_algorithm
    • In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
    • As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
    • Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It generates candidate item sets of length [math]\displaystyle{ k }[/math] from item sets of length [math]\displaystyle{ k-1 }[/math]. Then it prunes the candidates which have an infrequent sub pattern. According to the downward closure lemma, the candidate set contains all frequent [math]\displaystyle{ k }[/math]-length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates.
    • Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all [math]\displaystyle{ 2^{|S|}-1 }[/math] of its proper subsets.
  • (Wu & Kumar, 2009) ⇒ Xindong Wu, and Vipin Kumar, editors. (2009). “The Top Ten Algorithms in Data Mining.” Chapman & Hall. ISBN:1420089641

1994

  • Rakesh Agrawal, and Rakesh Srikant. (1994). “Fast Algorithms for Mining Association Rules.” In: Proceedings of VLDB 1994. ISBN 1-55860-153-8.
  • Heikki Mannila, H. Toivonen, and A. I. Verkamo. (1994). “Efficient Algorithms for Discovering Association Rules.” In: AAAI Workshop on Knowledge Discovery in Databases (SIGKDD).

1993