Apriori Algorithm

An Apriori Algorithm is an iterative association rule learning algorithm that computes the frequent itemsets where iteration $i$ computes all frequent i-itemsets (itemsets with $i$ elements) and each iteration has a candidate generation step, candidate counting step and candidate selection step.

References

2009

• (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Apriori_algorithm
• In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
• As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
• Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It generates candidate item sets of length $k$ from item sets of length $k-1$. Then it prunes the candidates which have an infrequent sub pattern. According to the downward closure lemma, the candidate set contains all frequent $k$-length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates.
• Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all $2^{|S|}-1$ of its proper subsets.
• (Wu & Kumar, 2009) ⇒ Xindong Wu, and Vipin Kumar, editors. (2009). “The Top Ten Algorithms in Data Mining." Chapman & Hall. ISBN:1420089641

1994

• Rakesh Agrawal, and Rakesh Srikant. (1994). "Fast Algorithms for Mining Association Rules." In: Proceedings of VLDB 1994. ISBN 1-55860-153-8.
• Heikki Mannila, H. Toivonen, and A. I. Verkamo. (1994). "Efficient Algorithms for Discovering Association Rules." In: AAAI Workshop on Knowledge Discovery in Databases (SIGKDD).