{"id":245,"date":"2012-10-02T07:33:53","date_gmt":"2012-10-02T11:33:53","guid":{"rendered":"http:\/\/learninglover.com\/blog\/?p=245"},"modified":"2013-05-08T07:40:43","modified_gmt":"2013-05-08T11:40:43","slug":"learning-the-apriori-algorithm","status":"publish","type":"post","link":"https:\/\/learninglover.com\/blog\/index.php\/2012\/10\/02\/learning-the-apriori-algorithm\/","title":{"rendered":"Learning the Apriori Algorithm"},"content":{"rendered":"<p><a href=\"http:\/\/www.learninglover.com\/examples.php?id=28\" title=\"Apriori Algorithm\"><img decoding=\"async\" src=\"http:\/\/learninglover.com\/blog\/wp-content\/uploads\/2012\/10\/apriori.jpg\"\/><\/a><\/p>\n<p>I have finished <a href=\"http:\/\/www.learninglover.com\/examples.php?id=28\" title=\"Apriori Algorithm\">a script that runs the Apriori algorithm<\/a>. <\/p>\n<p>When we are given a large set of transactions, we are often interested in discovering patterns inside these transactions. The Apriori algorithm provides a means for formulating what are known as &#8220;association rules&#8221; for the set of transactions. An association rule is an observation from the database between different items inside a transaction. For example, the statement &#8220;If a customer buys chips they are 60% more likely to also purchase dip&#8221; could be an association rule based on data from supermarket purchases. The number 60% is called the confidence we have in the rule and we are generally interested in rules with higher confidence. <\/p>\n<p>The Apriori algorithm takes as input a transaction database and a &#8220;threshold&#8221;. The initial pass through the database performs a count on each single item in the database and checks how many transactions contain each item. The algorithm proceeds by the Apriori property which states that &#8220;any subset of a frequent itemset must also be frequent&#8221;. What this means is that when checking the subsets of length 2 (and greater), we can ignore those subsets that contain an element that does not meet the minimum support, as it cannot be a part of a frequent itemset. <\/p>\n<p>So lets look at an example. Suppose our list of transactions for the items Chips, Dip, Soda, Napkins and Paper Plates are as follows: <\/p>\n<p>(Chips, Dip, Soda, Napkins, Paper Plates)<br \/>\n1 1 0 1 0<br \/>\n0 0 0 1 0<br \/>\n1 1 1 0 1<br \/>\n0 1 1 0 1<br \/>\n1 0 0 1 0<br \/>\n1 0 0 0 1<br \/>\n1 1 1 0 1<br \/>\n0 0 1 0 1<br \/>\n1 0 0 0 0<br \/>\n1 0 1 1 0<br \/>\n0 1 1 1 0<br \/>\n1 1 0 0 0<br \/>\n1 0 1 1 0<br \/>\n0 0 0 1 1<br \/>\n1 1 1 1 0<br \/>\n0 0 0 0 0<br \/>\n0 0 0 0 0<br \/>\n1 1 1 0 0<br \/>\n0 0 0 1 1<br \/>\n1 1 1 0 1 <\/p>\n<p>And lets suppose that we&#8217;re interested in finding the collections that occur more than a quarter (25%) of the time. <\/p>\n<p>What we see from simply summing the columns and dividing by the number of columns is that 60% of the people purchased chips, 45% purchased dip, 50% purchased soda, 45% purchased napkins, and 40% purchased paper plates. Since these are all above our minimum threshold, they are all possible as elements of future collections. <\/p>\n<p>When looking at larger collections, we see that:<br \/>\n &#8211; 35% of the people purchased both chips and dip<br \/>\n &#8211; 35% of the people purchased both chips and soda<br \/>\n &#8211; 25% of the people purchased both chips and napkins<br \/>\n &#8211; 35% of the people purchased both dip and soda<br \/>\n &#8211; 25% of the people purchased both soda and paper plates<br \/>\n &#8211; no other size two collections are above the minimum threshold. <\/p>\n<p>You&#8217;ll note that since only 20% of the people purchased chips and paper plates, it is not above the minimum threshold and so we can ignore it in future collections. From the set of size 2 itemsets, we can derive our list of size three itemsets. <\/p>\n<p>In particular, we see that:<br \/>\n &#8211; 25% of the people purchased all three of chips, dip, and soda.<br \/>\n &#8211; no other size three collections are above the minimum threshold. <\/p>\n<p>The hierarchy of the itemsets is displayed in the image above. <\/p>\n<p>To play around with this algorithm and understand more of its properties, visit <a href=\"http:\/\/www.learninglover.com\/examples.php?id=28\" title=\"Apriori Algorithm\">Apriori algorithm<\/a>. <\/p>\n<p>Other Blogs that have covered this topic:<br \/>\n<a href=\"http:\/\/auburnbigdata.blogspot.com\/2013\/03\/apriori-algorithm.html\" title=\"Analytics and Visualization of Big Data\">Analytics and Visualization of Big Data<\/a><br \/>\n<a href=\"http:\/\/statistical-research.com\/association-rule-learning-and-the-apriori-algorithm\/\" title=\"Statistical Research\">Statistical Research<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have finished a script that runs the Apriori algorithm. When we are given a large set of transactions, we are often interested in discovering patterns inside these transactions. The Apriori algorithm provides a means for formulating what are known as &#8220;association rules&#8221; for the set of transactions. An association rule is an observation from [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-245","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/245","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=245"}],"version-history":[{"count":0,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/245\/revisions"}],"wp:attachment":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=245"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=245"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=245"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}