Association Rule Learning in Python - Diego's Digital Garden

This is a complete pipeline for association rule learning in [[Python]], delegating all the heavy lifting to the [[mlxtend]] package.

## Input data

The input is a collection of **transactions**: sets of items that have been consumed together. For example, this list of sets is a valid starting point for this task:

```python
# `attendance` is a DataFrame with one row per (identifier, sport) pair
sports_per_device = (
    attendance
    .groupby('identifier')
    .agg({'sport': set})
    .sport.to_list()
)
# [
#   {'Artistic Gymnastics', '3x3 Basketball'},
#   {'3x3 Basketball'},
#   {'Archery', '3x3 Basketball'},
#   {'Athletics', 'Archery'},
#   {'3x3 Basketball'},
#   ...
# ]
```

Before feeding it to [[mlxtend]]'s algorithms, we need to encode the data with a `TransactionEncoder`, which by default turns the transactions into a one-hot encoded boolean array with one row per transaction and one column per item:

```python
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
transactions = te.fit_transform(sports_per_device)
```

## Frequent item sets

Once we have the data ready, we can compute the frequent item sets using any of the provided algorithms: [[Apriori]], [[FP-Growth]], or [[FP-Max]]:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth

# The algorithms expect a boolean DataFrame whose columns are the item names
df = pd.DataFrame(transactions, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=.0001, use_colnames=True)

## Alternatively:
# frequent_itemsets = fpgrowth(df, min_support=.0001, use_colnames=True)
# frequent_itemsets = fpmax(df, min_support=.0001, use_colnames=True)

#      support                itemsets
# 5   0.267259             (Badminton)
# 1   0.098270               (Archery)
# 16  0.068881               (Fencing)
# 32  0.061534            (Water Polo)
# 2   0.053268   (Artistic Gymnastics)
```

## Association rules

Once the frequent item sets are computed, we can call the `association_rules` function to perform the [[Association Rules|association rule learning]] and derive all the necessary metrics:

```python
from mlxtend.frequent_patterns import association_rules

# Keep only the rules whose confidence is at least 0.7
association_rules(
    frequent_itemsets,
    metric="confidence",
    min_threshold=0.7
)

# antecedents          (Breaking, Cycling BMX Freestyle)
# consequents          (3x3 Basketball)
# antecedent support   0.000306
# consequent support   0.044237
# support              0.00023
# confidence           0.75
# lift                 16.954152
# leverage             0.000216
# conviction           3.823052
# zhangs_metric        0.941306
```
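
To make the metrics in that output less opaque, here is a minimal sketch that recomputes support, confidence, lift, leverage and conviction by hand, using their standard textbook definitions. It is not [[mlxtend]]'s implementation: the toy transactions and the helper names `support` and `rule_metrics` are hypothetical, and Zhang's metric is left out.

```python
import math

# Toy transactions (hypothetical data, not the attendance dataset above)
toy_transactions = [
    {"Archery", "Fencing"},
    {"Archery", "Fencing", "Badminton"},
    {"Archery", "Badminton"},
    {"Fencing"},
    {"Badminton"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Standard metrics for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    s_ac = support(a | c, transactions)
    if s_a == 0:
        raise ValueError("the antecedent never occurs, so the rule is undefined")

    confidence = s_ac / s_a          # estimate of P(consequent | antecedent)
    lift = confidence / s_c          # 1.0 would mean the two sides are independent
    leverage = s_ac - s_a * s_c      # observed minus expected joint support
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - s_c) / (1 - confidence)

    return {
        "antecedent support": s_a,
        "consequent support": s_c,
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


metrics = rule_metrics({"Archery"}, {"Fencing"}, toy_transactions)
for name, value in metrics.items():
    print(f"{name:<20}{value:.4f}")
```

The same formulas reproduce the rounded figures in the output above: a confidence of 0.75 divided by a consequent support of 0.044237 gives a lift of roughly 16.95, and (1 - 0.044237) / (1 - 0.75) gives a conviction of roughly 3.82.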