05 November 2021

I am interested in understanding the applications of the Apriori algorithm. Given a data source with a transaction ID column and a list of items for each transaction, how can we use the Apriori algorithm to extract single-dimensional, single-level, Boolean association rules? The end goal is to understand which items were bought together.

Step 1. Build C1, the list of candidate 1-itemsets (the individual items), and then L1

  • Select a threshold and construct L1 from the items whose support count > threshold

(The "1" after the letter C means that each itemset in C1 contains only one item.)

L1 is the cleaned version of C1: it is the same as C1 with some itemsets removed because they do not satisfy the support threshold.
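To make Step 1 concrete, here is a minimal Python sketch. The transactions, the item names, and the threshold value are all made up for illustration; only the "support count > threshold" rule comes from the step above.

```python
from collections import Counter

# Hypothetical toy data: transaction id -> set of items bought.
transactions = {
    "T1": {"bread", "milk"},
    "T2": {"bread", "diapers", "beer", "eggs"},
    "T3": {"milk", "diapers", "beer", "cola"},
    "T4": {"bread", "milk", "diapers", "beer"},
    "T5": {"bread", "milk", "diapers", "cola"},
}
threshold = 2  # assumed value; an itemset is kept when its support count > threshold

# C1: every single item as a 1-itemset, mapped to its support count
# (the number of transactions that contain it).
c1 = Counter(
    frozenset([item]) for items in transactions.values() for item in items
)

# L1: C1 with the itemsets that fail the threshold removed.
l1 = {itemset: count for itemset, count in c1.items() if count > threshold}

for itemset, count in sorted(l1.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

With this toy data, "eggs" (count 1) and "cola" (count 2) are dropped, so L1 keeps beer, bread, diapers and milk.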

Step 2. Build C2, the list of candidate itemsets with 2 items each, by joining L1 with itself

  • Apply the support threshold and construct L2 from the itemsets whose support count > threshold
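A possible sketch of Step 2 in Python follows. The transactions, the contents of L1, and the threshold are hypothetical values chosen for illustration; the join is written as "take every pair of itemsets in L1 and merge them".

```python
from itertools import combinations

# Hypothetical toy transactions (item sets only; ids are not needed here).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
threshold = 2  # assumed value; keep itemsets with support count > threshold

# Assumed result of Step 1 for the data above.
l1 = [frozenset(["beer"]), frozenset(["bread"]),
      frozenset(["diapers"]), frozenset(["milk"])]

# C2: join L1 with itself, i.e. merge every pair of frequent 1-itemsets.
c2 = {a | b for a, b in combinations(l1, 2)}

# Count how many transactions contain each candidate pair.
support = {cand: sum(cand <= t for t in transactions) for cand in c2}

# L2: the candidates that pass the threshold.
l2 = {cand: n for cand, n in support.items() if n > threshold}

for cand, n in sorted(l2.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(cand), n)
```

Here C2 has six pairs, and four of them (each bought together in 3 transactions) survive into L2.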

Step 3. Build C3, the list of candidate itemsets with 3 items each, by joining L2 with itself

  • Apply the support threshold and construct L3 from the itemsets whose support count > threshold

Step 4. Build C4, the list of candidate itemsets with 4 items each, by joining L3 with itself

  • Apply the support threshold and construct L4 from the itemsets whose support count > threshold
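Steps 1 to 4 repeat the same pattern, so they can be sketched as one loop that stops when no candidates remain. This is a minimal sketch under assumptions, not a reference implementation: the function name and the toy data are hypothetical, and the join is simplified to "merge two itemsets from the previous level when the result has exactly k items", followed by the usual pruning of candidates that have an infrequent (k-1)-subset.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, threshold):
    """Return {itemset: support count} for every itemset with count > threshold."""
    transactions = [frozenset(t) for t in transactions]
    # C1: one candidate per distinct item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count support of each candidate Ck and keep the frequent ones (Lk).
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n > threshold}
        frequent.update(level)
        k += 1
        # Join: merge pairs from the previous level that give exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

result = apriori_frequent_itemsets(transactions, threshold=1)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop does not need to know in advance how many levels there are: with this data and a threshold of 1 it stops after L3, because no 4-item candidate survives pruning.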

After following this process, how do we generate all of the non-empty subsets of each itemset in L3, so that association rules can be formed from them?

Finally, how do we extract association rules from each subset?
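One common way to read these two questions: for a frequent itemset l, take every non-empty subset s that is smaller than l itself, and form the candidate rule s → (l − s). The sketch below assumes that reading; the function names and the example itemset are hypothetical.

```python
from itertools import combinations

def nonempty_proper_subsets(itemset):
    """All non-empty subsets of `itemset` except the itemset itself."""
    items = sorted(itemset)
    return [
        frozenset(subset)
        for size in range(1, len(items))
        for subset in combinations(items, size)
    ]

def candidate_rules(itemset):
    """Pair each subset (antecedent) with the remaining items (consequent)."""
    itemset = frozenset(itemset)
    return [(s, itemset - s) for s in nonempty_proper_subsets(itemset)]

# Hypothetical 3-itemset taken from L3.
for antecedent, consequent in candidate_rules({"bread", "diapers", "milk"}):
    print(sorted(antecedent), "->", sorted(consequent))
```

The full itemset is left out as an antecedent because its consequent would be empty, so a 3-itemset yields 6 candidate rules.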

Once we have determined the association rules, we can calculate the confidence of each rule via

c = Frequency(item1 union item2)/Frequency(item1)
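As a small worked example of this formula, the sketch below reads Frequency(X) as the number of transactions that contain every item in X, so "item1 union item2" means both sides of the rule appear together. The helper names and the toy data are assumptions made for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions)

def confidence(antecedent, consequent, transactions):
    """c = Frequency(antecedent union consequent) / Frequency(antecedent)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    # The antecedent is assumed to occur at least once (it comes from a
    # frequent itemset), so the denominator is not zero.
    return both / support_count(antecedent, transactions)

# Hypothetical toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

print(confidence({"diapers"}, {"beer"}, transactions))  # 3 / 4 = 0.75
print(confidence({"beer"}, {"diapers"}, transactions))  # 3 / 3 = 1.0
```

The two results differ because confidence is not symmetric: every basket with beer also has diapers, but not the other way round.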

Finally, how can we select the association rules that satisfy the condition above?
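One way this selection is often done is to pick a minimum confidence and keep only the rules that reach it. The sketch below assumes that approach: the support counts, the `min_confidence` value, and the use of ">=" for the comparison are illustrative choices, not something fixed by the steps above.

```python
from itertools import combinations

# Hypothetical support counts, as the frequent-itemset steps might produce.
support = {
    frozenset(["beer"]): 3,
    frozenset(["diapers"]): 4,
    frozenset(["beer", "diapers"]): 3,
}
min_confidence = 0.8  # assumed cut-off

def strong_rules(support, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the cut-off."""
    rules = []
    for itemset, count in support.items():
        # Itemsets with a single item have no proper subsets, so they yield no rules.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                # Every subset of a frequent itemset is frequent too, so its
                # count is expected to be present in `support`.
                conf = count / support[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

rules = strong_rules(support, min_confidence)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), conf)
```

With these numbers only beer → diapers (confidence 1.0) is kept; diapers → beer has confidence 0.75 and falls below the assumed cut-off.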
