Frequent Item Sets in Data Sets (Association Rule Mining)


Last modified: 2021-04-17 02:12:59 Author: Mango

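Association mining searches a data set for items that frequently occur together. Frequent mining typically uncovers interesting associations and correlations between item sets in transactional and relational databases. In short, frequent mining shows which items appear together in a transaction or relation.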

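Need for association mining:
Frequent mining is the generation of association rules from a transactional data set. If two items X and Y are frequently bought together, it pays to place them close to each other in the store, or to offer a discount on one when the other is purchased. This can increase sales. For example, it may well turn out that a customer who buys milk and bread also buys butter.
The association rule is then ['milk'] ^ ['bread'] => ['butter']. The seller can therefore suggest butter to customers who buy milk and bread.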

Important definitions:

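Support: one measure of a rule's interestingness. It indicates how often the rule applies, and hence how useful it is. A support of 5% means that 5% of all transactions in the database follow the rule.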
```
Support(A -> B) = Support_count(A ∪ B) / Total number of transactions
```
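Confidence: a measure of how certain the rule is. A confidence of 60% means that 60% of the customers who bought milk and bread also bought butter.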
```
Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
```
A rule that satisfies both the minimum support and the minimum confidence thresholds is called a strong rule.
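As a rough illustration of the two measures, the sketch below computes support and confidence for the rule {milk, bread} => {butter}. The basket data, the function names (`support_count`, `support`, `confidence`) and the two thresholds are made up for this example and do not come from the article.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain antecedent ∪ consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of the antecedent's transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(transactions, both) / support_count(transactions, antecedent)


# Hypothetical basket data (an assumption, not from the article).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]

s = support(transactions, {"milk", "bread"}, {"butter"})
c = confidence(transactions, {"milk", "bread"}, {"butter"})

# Hypothetical thresholds; a rule is strong when it meets both.
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.6
is_strong = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE

print(f"support={s:.2f} confidence={c:.2f} strong={is_strong}")
# prints: support=0.40 confidence=0.67 strong=True
```

Note that `confidence` divides by the antecedent's support count, so this sketch assumes the antecedent occurs in at least one transaction.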

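Support_count(X): the number of transactions in which X appears. If X is A ∪ B, it is the number of transactions that contain both A and B.
Maximal item set: a frequent item set is maximal if none of its supersets is frequent.
Closed item set: an item set is closed if none of its immediate supersets has the same support count as the item set itself.
K-item set: an item set that contains K items. An item set is frequent if its support count is greater than or equal to the minimum support count.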
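The K-item set definition can be sketched as a brute-force search: enumerate every combination of K items and keep those whose support count reaches the minimum. The helper name `frequent_k_itemsets` and the basket data are assumptions for illustration; enumerating all combinations is only practical for very small item sets.

```python
from itertools import combinations


def frequent_k_itemsets(transactions, k, min_count):
    """Return {itemset: support_count} for every frequent itemset of size k."""
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, k):
        candidate = frozenset(combo)
        # Support count: transactions that contain the whole candidate.
        count = sum(1 for t in transactions if candidate <= t)
        if count >= min_count:
            result[candidate] = count
    return result


# Hypothetical basket data (an assumption, not from the article).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

pairs = frequent_k_itemsets(transactions, k=2, min_count=3)
for itemset, count in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
# prints:
# ['bread', 'butter'] 3
# ['bread', 'milk'] 3
```

Here {milk, butter} appears in only two transactions, so it falls below the minimum support count of 3 and is dropped.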

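Example of finding frequent item sets:
Consider a data set with the given transactions.

Let the minimum support count be 3.
The following relationship holds: maximal frequent => closed => frequent. That is, every maximal frequent item set is closed, and every closed frequent item set is frequent.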

1-frequent:
{A} = 3; // not closed due to {A, C}, and not maximal
{B} = 4; // not closed due to {B, D}, and not maximal
{C} = 4; // not closed due to {C, D}, and not maximal
{D} = 5; // closed item-set since no immediate super-set has the same count. Not maximal

2-frequent:
{A, B} = 2 // not frequent because support count < minimum support count, so ignore
{A, C} = 3 // not closed due to {A, C, D}
{A, D} = 3 // not closed due to {A, C, D}
{B, C} = 3 // not closed due to {B, C, D}
{B, D} = 4 // closed but not maximal due to {B, C, D}
{C, D} = 4 // closed but not maximal due to {B, C, D}

3-frequent:
{A, B, C} = 2 // not frequent because support count < minimum support count, so ignore
{A, B, D} = 2 // not frequent because support count < minimum support count, so ignore
{A, C, D} = 3 // maximal frequent
{B, C, D} = 3 // maximal frequent

4-frequent:
{A, B, C, D} = 2 // not frequent because support count < minimum support count, so ignore
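The classification above can be checked mechanically. The following sketch takes the support counts exactly as listed (with a minimum support count of 3) and derives the frequent, closed and maximal item sets from the definitions. The variable and helper names (`counts`, `immediate_supersets`, `show`) are assumptions for illustration; the transaction table itself is not used, only the counts.

```python
MIN_COUNT = 3

# Support counts copied from the listing above.
counts = {
    frozenset("A"): 3, frozenset("B"): 4, frozenset("C"): 4, frozenset("D"): 5,
    frozenset("AB"): 2, frozenset("AC"): 3, frozenset("AD"): 3,
    frozenset("BC"): 3, frozenset("BD"): 4, frozenset("CD"): 4,
    frozenset("ABC"): 2, frozenset("ABD"): 2,
    frozenset("ACD"): 3, frozenset("BCD"): 3,
    frozenset("ABCD"): 2,
}


def immediate_supersets(itemset):
    """Itemsets in the table that add exactly one item to `itemset`."""
    return [s for s in counts if len(s) == len(itemset) + 1 and itemset < s]


# Frequent: support count reaches the minimum.
frequent = {s for s, c in counts.items() if c >= MIN_COUNT}

# Closed: no immediate superset has the same support count.
closed = {
    s for s in frequent
    if all(counts[t] != counts[s] for t in immediate_supersets(s))
}

# Maximal: no immediate superset is frequent.
maximal = {
    s for s in frequent
    if all(counts[t] < MIN_COUNT for t in immediate_supersets(s))
}


def show(itemsets):
    """Render itemsets as sorted strings such as 'ACD' for printing."""
    return sorted("".join(sorted(s)) for s in itemsets)


print("closed: ", show(closed))    # ['ACD', 'BCD', 'BD', 'CD', 'D']
print("maximal:", show(maximal))   # ['ACD', 'BCD']
print("maximal <= closed <= frequent:", maximal <= closed <= frequent)  # True
```

Checking only immediate supersets is enough for maximality, because any frequent superset implies a frequent immediate superset: support counts never increase when items are added.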
