This is a complete pipeline for association rule learning in [[Python]], delegating all the heavy lifting to the [[mlxtend]] package.
## Input data
The input is a collection of **transactions**: sets of items that have been consumed together. For example, this list of sets, built here from an `attendance` DataFrame with `identifier` and `sport` columns, is a valid starting point:
```python
# Collect the set of sports seen for each identifier, one set per transaction
sports_per_device = (
    attendance
    .groupby('identifier')
    .agg({'sport': set})
    .sport.to_list()
)
# [
#     {'Artistic Gymnastics', '3x3 Basketball'},
#     {'3x3 Basketball'},
#     {'Archery', '3x3 Basketball'},
#     {'Athletics', 'Archery'},
#     {'3x3 Basketball'},
#     ...
# ]
```
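Before feeding it to [[mlxtend]]'s algorithms, we need to encode the transactions with a `TransactionEncoder`, which one-hot encodes them into a boolean matrix with one row per transaction and one column per item. The result is a dense NumPy array by default; passing `sparse=True` to `fit_transform` returns a SciPy sparse matrix instead: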
```python
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
transactions = te.fit_transform(sports_per_device)
```
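To make the encoding concrete, here is a minimal plain-Python sketch of what a one-hot encoding of transactions looks like. It is only an illustration, not mlxtend's implementation, and `one_hot_encode` is a hypothetical helper name:

```python
def one_hot_encode(transactions):
    """Return (sorted item labels, boolean rows), one row per transaction.

    Hypothetical helper for illustration only; mlxtend's TransactionEncoder
    is the tool actually used in this pipeline.
    """
    # One column per distinct item, in sorted order
    columns = sorted({item for t in transactions for item in t})
    # A cell is True when the transaction contains the column's item
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows


sample = [
    {'Artistic Gymnastics', '3x3 Basketball'},
    {'3x3 Basketball'},
    {'Archery', '3x3 Basketball'},
    {'Athletics', 'Archery'},
]
columns, rows = one_hot_encode(sample)
for row in rows:
    print(row)
```

Each row has as many cells as there are distinct items, which is why the matrix can get wide (and mostly `False`) on large catalogues.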
## Frequent item sets
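Once the data is ready, we can compute the frequent item sets using any of the provided algorithms: [[Apriori]], [[FP-Growth]], or [[FP-Max]]. Note that FP-Max returns only the maximal frequent item sets, so its output does not carry the subset supports that rule metrics are computed from: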
```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth

# Wrap the encoded boolean array in a DataFrame with item names as columns
df = pd.DataFrame(transactions, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=.0001, use_colnames=True)
# Alternatively:
# frequent_itemsets = fpgrowth(df, min_support=.0001, use_colnames=True)
# frequent_itemsets = fpmax(df, min_support=.0001, use_colnames=True)
#      support               itemsets
# 5   0.267259            (Badminton)
# 1   0.098270              (Archery)
# 16  0.068881              (Fencing)
# 32  0.061534           (Water Polo)
# 2   0.053268  (Artistic Gymnastics)
```
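The `support` column is the fraction of transactions that contain every item of the item set. The sketch below computes it by brute force on a toy sample. It enumerates all candidates, so it is neither Apriori nor FP-Growth and would not scale; `support` and `frequent_itemsets_bruteforce` are hypothetical helper names used only for illustration:

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets_bruteforce(transactions, min_support, max_len=2):
    """Enumerate every candidate up to `max_len` items and keep the frequent ones.

    Illustration only: the mlxtend algorithms prune the search space
    instead of testing every combination.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


sample = [
    {'Artistic Gymnastics', '3x3 Basketball'},
    {'3x3 Basketball'},
    {'Archery', '3x3 Basketball'},
    {'Athletics', 'Archery'},
    {'3x3 Basketball'},
]
found = frequent_itemsets_bruteforce(sample, min_support=0.4)
for itemset, s in found.items():
    print(set(itemset), s)
```

On this sample only the single items `3x3 Basketball` (4 of 5 transactions) and `Archery` (2 of 5) reach the 0.4 threshold; no pair does.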
## Association rules
Once the frequent item sets are computed, we can call the `association_rules` function to perform the [[Association Rules|association rule learning]] and derive the associated metrics:
```python
from mlxtend.frequent_patterns import association_rules
# Keep only the rules whose confidence is at least 0.7
association_rules(
    frequent_itemsets,
    metric="confidence",
    min_threshold=0.7
)
# antecedents           (Breaking, Cycling BMX Freestyle)
# consequents           (3x3 Basketball)
# antecedent support    0.000306
# consequent support    0.044237
# support               0.00023
# confidence            0.75
# lift                  16.954152
# leverage              0.000216
# conviction            3.823052
# zhangs_metric         0.941306
```
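Most of the metrics in that output follow directly from the three support values. The sketch below applies the standard textbook definitions for a rule A → C; `rule_metrics` is a hypothetical helper, not part of mlxtend, and Zhang's metric is left out for brevity:

```python
def rule_metrics(support_a, support_c, support_ac):
    """Derive rule metrics for A -> C from three supports.

    support_a:  support of the antecedent A
    support_c:  support of the consequent C
    support_ac: support of A and C together
    Hypothetical helper using the standard definitions.
    """
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # Conviction is unbounded when the rule always holds (confidence == 1)
    if confidence == 1:
        conviction = float('inf')
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        'confidence': confidence,
        'lift': lift,
        'leverage': leverage,
        'conviction': conviction,
    }


# Supports copied from the rule shown above; they are rounded in the
# printed output, so the derived values only match approximately.
metrics = rule_metrics(
    support_a=0.000306,
    support_c=0.044237,
    support_ac=0.00023,
)
for name, value in metrics.items():
    print(f'{name}: {value:.6f}')
```

A lift well above 1, as in this rule, means the antecedent and consequent appear together far more often than they would if they were independent.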