If anything in the notes below is unclear, contact us by sending a mail to the Kung Fu Panda. We will get back to you very soon.

- Association rule mining is used for unsupervised pattern discovery in large transactional data.
- used to find useful insights in the large transaction databases of big retailers.
- results in a collection of association rules that describe the patterns found in the data.
- also called market basket analysis because it is commonly applied to supermarket transaction data.

- transaction data contain the items bought in each transaction.
- since the number of distinct items is very large and each transaction holds only a few of them, the data are stored as a sparse matrix.
- Apriori property: all subsets of a frequent itemset must also be frequent.
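
The sparse-matrix idea above can be pictured with a small base R sketch. The toy transactions and the names `transactions`, `incidence`, and `density` are made up for illustration, and the matrix here is an ordinary dense one; arules uses a real sparse format internally.

```r
# Toy transactions (hypothetical data, not taken from groc.csv).
transactions <- list(
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter", "milk"),
  c("eggs")
)

# All distinct items across the transactions.
items <- sort(unique(unlist(transactions)))

# One row per transaction, one column per item; TRUE marks a purchased item.
incidence <- t(sapply(transactions, function(tr) items %in% tr))
colnames(incidence) <- items
print(incidence)

# Share of filled cells. With thousands of items and a handful of items
# per basket, this share becomes tiny, which is why a sparse format pays off.
density <- mean(incidence)
print(density)
```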

- does not work well with small data sets, where it may produce no rules at all.
- produces rules that are easy to understand.
- useful for data mining.

- also produces some trivial rules, so manual effort is needed to separate the true knowledge from the trivial rules.

The quality of a rule is measured by two parameters: Support and Confidence

- Support
- indicates how often the item, itemset, or rule appears in the data.
- support(X) = count(X) / N, where count(X) is the number of transactions containing X and N is the total number of transactions.
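
As a worked example of the support formula, here is a minimal base R sketch. The toy transactions and the helper name `support` are assumptions made for illustration; they are not part of arules.

```r
# Toy transactions (hypothetical data).
transactions <- list(
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter", "milk"),
  c("eggs")
)

# support(X) = count(X) / N: the share of transactions that contain
# every item in the itemset X.
support <- function(itemset, transactions) {
  contains <- sapply(transactions, function(tr) all(itemset %in% tr))
  sum(contains) / length(transactions)
}

support_milk <- support("milk", transactions)                    # 3 of 4 transactions
support_milk_bread <- support(c("milk", "bread"), transactions)  # 2 of 4 transactions
print(support_milk)
print(support_milk_bread)
```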

- Confidence
- a measure of the rule's predictive power or accuracy.
- the confidence that X leads to Y equals the support of the itemset containing both X and Y divided by the support of the itemset containing only X.
- confidence(X->Y) = support(X & Y) / support(X)
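
The confidence formula can be checked by hand on a few transactions. The base R sketch below uses made-up data and hypothetical helper names (`support`, `confidence`); it also shows that confidence is not symmetric.

```r
# Toy transactions (hypothetical data).
transactions <- list(
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter", "milk"),
  c("eggs")
)

# Share of transactions that contain every item in the itemset.
support <- function(itemset, transactions) {
  mean(sapply(transactions, function(tr) all(itemset %in% tr)))
}

# confidence(X -> Y) = support(X & Y) / support(X)
confidence <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) / support(lhs, transactions)
}

conf_milk_bread <- confidence("milk", "bread", transactions)  # 0.5 / 0.75
conf_bread_milk <- confidence("bread", "milk", transactions)  # 0.5 / 0.5
print(conf_milk_bread)
print(conf_bread_milk)
```

In this toy data every basket with bread also has milk, but only two of the three baskets with milk have bread, so the two directions of the rule get different confidence values.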

- first, identify all the itemsets that meet a minimum support threshold.
- start with single items, then evaluate only the larger combinations whose subsets all pass the support threshold.
- for example, if {A,B} and {B,C} are frequent but {A,C} is not, then {A,B,C} is never evaluated.
- then create rules from those itemsets, keeping only the rules that meet a minimum confidence threshold.
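
The pruning step in the {A,B,C} example can be sketched in base R. This is a simplified illustration of level-wise candidate pruning, not the arules implementation; the data, the 0.5 threshold, and all helper names are assumptions.

```r
# Toy transactions (hypothetical): {A,B} and {B,C} are common, {A,C} is rare.
transactions <- list(
  c("A", "B"),
  c("A", "B"),
  c("B", "C"),
  c("B", "C"),
  c("A", "B", "C")
)
min_support <- 0.5

# Share of transactions that contain every item in the itemset.
support <- function(itemset, transactions) {
  mean(sapply(transactions, function(tr) all(itemset %in% tr)))
}
is_frequent <- function(s) support(s, transactions) >= min_support

# Level 1: frequent single items.
items <- sort(unique(unlist(transactions)))
frequent1 <- Filter(is_frequent, as.list(items))

# Level 2: candidate pairs are built only from frequent single items.
pairs <- combn(unlist(frequent1), 2, simplify = FALSE)
frequent2 <- Filter(is_frequent, pairs)

# Level 3: a triple becomes a candidate only if every 2-item subset is
# frequent (the Apriori property), so its support is never counted otherwise.
is_frequent_pair <- function(p) {
  any(vapply(frequent2, function(f) setequal(f, p), logical(1)))
}
triples <- combn(unlist(frequent1), 3, simplify = FALSE)
candidates3 <- Filter(function(t3) {
  all(vapply(combn(t3, 2, simplify = FALSE), is_frequent_pair, logical(1)))
}, triples)

print(frequent2)            # {A,B} and {B,C}
print(length(candidates3))  # {A,B,C} is pruned because {A,C} is not frequent
```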

- install.packages("arules")
- library(arules)
- groceries = read.transactions("groc.csv", sep = ",") # results in a sparse matrix.
- summary(groceries)
- inspect(groceries[1:5]) # show the first 5 transactions.
- itemFrequency(groceries[, 1:5]) # item frequency (support) of the first 5 items.
- itemFrequencyPlot(groceries, support = 0.1) # plot all items with support of at least 0.1.
- itemFrequencyPlot(groceries, topN = 20) # plot the 20 most frequent items.
- image(groceries[1:5]) # shows the first 5 rows of the sparse matrix.
- image(sample(groceries, 100)) # shows a random sample of 100 rows.
- myrules = apriori(data = groceries, parameter = list(support = 0.001, confidence = 0.25, minlen = 2))
- # support is the minimum support an itemset needs in order to be considered.
- # confidence is the minimum confidence a rule needs in order to be kept.
- # if we set the confidence too high, we will only get obvious results, like milk and bread.
- # if we set the confidence too low, we will get lots of rules which may be coincidences.
- # minlen specifies the minimum number of items a rule must contain.
- summary(myrules) # summary statistics of support, confidence, and lift, plus the distribution of rule lengths.
- inspect(sort(myrules, by = "lift")[1:5]) # top 5 rules sorted by lift.
- myRulesWithAnItem = subset(myrules, items %in% "myItem") # rules that contain myItem.
- myRulesWithAnItem = subset(myrules, items %in% c("myItem1", "myItem2")) # rules that contain either item.
- inspect(myRulesWithAnItem)
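
The rules above are sorted by lift, which these notes do not define. As a reminder, the standard definition, written with the same support and confidence notation, is:

```latex
\[
\mathrm{lift}(X \rightarrow Y)
  = \frac{\mathrm{confidence}(X \rightarrow Y)}{\mathrm{support}(Y)}
  = \frac{\mathrm{support}(X \,\&\, Y)}{\mathrm{support}(X)\,\mathrm{support}(Y)}
\]
```

A lift above 1 means X and Y appear together more often than would be expected if they were bought independently; a lift close to 1 suggests the rule may be a coincidence.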

Rule results are classified into three types

- Actionable
- Trivial
- Inexplicable