Weka: A Concise Tutorial

Weka - Association

It has been observed that people who buy beer often buy diapers at the same time. That is, there is an association between buying beer and buying diapers. Although this may seem surprising, this association rule is said to have been mined from large supermarket databases. Similarly, an association may be found between peanut butter and bread.

Finding such associations is valuable for supermarkets: they can stock diapers next to beer so that customers can locate both items easily, which can increase sales.

Apriori is one machine learning algorithm that finds probable associations and creates association rules from them. WEKA provides an implementation of the Apriori algorithm. You can define the minimum support and an acceptable confidence level when computing these rules. Here, you will apply the Apriori algorithm to the supermarket data provided with the WEKA installation.
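To make "support" and "confidence" concrete, here is a minimal sketch in plain Python. It does not use WEKA, and the baskets and the function names `support` and `confidence` are invented for illustration.

```python
# Toy baskets, invented for illustration only.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "peanut butter"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(antecedent <= basket for basket in baskets)
    n_both = sum(both <= basket for basket in baskets)
    return n_both / n_antecedent


print(support({"beer", "diapers"}, BASKETS))
print(confidence({"beer"}, {"diapers"}, BASKETS))
```

In this toy data, beer and diapers appear together in 2 of 5 baskets, so the support of the pair is 0.4. Beer appears in 3 baskets, 2 of which also contain diapers, so the confidence of the rule "beer ==> diapers" is 2/3.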

Loading Data

In the WEKA Explorer, open the Preprocess tab, click the Open file ... button, and select the supermarket.arff file from the installation folder. After the data is loaded, you will see the following screen:

[Screenshot: loading data]
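If you want to see what such a file holds outside the GUI, the sketch below reads a tiny ARFF-style text into a list of baskets using plain Python. It assumes a layout in which each attribute is an item, `t` marks a purchase, and `?` marks a missing value. The sample text, its attribute names, and the helper `arff_to_transactions` are made up for illustration, and the parser handles only this simple case, not the full ARFF format.

```python
# A made-up miniature file in ARFF style; not the real supermarket.arff.
SAMPLE_ARFF = """\
% toy example
@relation toy_supermarket
@attribute 'bread and cake' {t}
@attribute 'beer' {t}
@attribute 'baby needs' {t}
@attribute 'total' {low, high}
@data
t,?,t,high
?,t,t,low
t,t,?,low
"""


def arff_to_transactions(text):
    """Return (attribute_names, baskets) for a simple dense ARFF text.

    Only values equal to 't' are treated as purchased items, so the
    'total' column is ignored in this sketch.
    """
    names, baskets, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            baskets.append({n for n, v in zip(names, values) if v == "t"})
        elif lower.startswith("@attribute"):
            rest = line[len("@attribute"):].strip()
            if rest.startswith("'"):
                name = rest[1:rest.index("'", 1)]  # quoted name
            else:
                name = rest.split()[0]  # unquoted name
            names.append(name)
        elif lower.startswith("@data"):
            in_data = True
    return names, baskets


names, baskets = arff_to_transactions(SAMPLE_ARFF)
print(len(baskets), "instances,", len(names), "attributes")
```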

The dataset contains 4627 instances and 217 attributes. You can easily see how difficult it would be to detect associations among so many attributes by hand. Fortunately, the Apriori algorithm automates this task.
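The core idea that keeps this search manageable is that an itemset can be frequent only if all of its subsets are frequent, so candidates are built level by level and pruned early. The following is a minimal sketch of that level-wise search in plain Python. It is not WEKA's implementation, and the baskets and the function name `frequent_itemsets` are invented for illustration.

```python
from itertools import combinations

# Toy baskets, invented for illustration only.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "peanut butter"},
]


def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(baskets)

    def sup(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = sup(itemset)
        k += 1
        previous = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is
        # frequent, then check its own support against the data.
        current = [
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
            and sup(c) >= min_support
        ]
    return result


for itemset, s in frequent_itemsets(BASKETS, 0.3).items():
    print(sorted(itemset), s)
```

With a minimum support of 0.3, items such as chips and milk are discarded at the first level, so no pair containing them is ever counted.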

Associator

Click the Associate tab and then click the Choose button. Select the Apriori associator as shown in the screenshot:

[Screenshot: Associate tab]

To set the parameters for the Apriori algorithm, click its name. A window pops up, as shown below, in which you can set the parameters:

[Screenshot: Apriori algorithm parameters]
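WEKA's Apriori is documented as iteratively reducing the minimum support until it finds the required number of rules at the given minimum confidence. The sketch below imitates that loop in plain Python on toy data. The default values mirror WEKA's documented defaults (numRules 10, minMetric 0.9, delta 0.05, upperBoundMinSupport 1.0, lowerBoundMinSupport 0.1), but the code itself, including the brute-force helper `rules_at`, is an illustrative assumption and not WEKA's implementation.

```python
from itertools import combinations

# Toy baskets, invented for illustration only.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "peanut butter"},
]


def rules_at(baskets, min_support, min_conf):
    """Brute-force rule search; fine for toy data, too slow for real data."""
    n = len(baskets)

    def count(itemset):
        return sum(itemset <= basket for basket in baskets)

    items = sorted({item for basket in baskets for item in basket})
    found = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            n_whole = count(whole)
            if n_whole == 0 or n_whole / n < min_support:
                continue
            # Try every non-empty proper subset as the antecedent.
            for r in range(1, size):
                for lhs in combinations(combo, r):
                    lhs = frozenset(lhs)
                    conf = n_whole / count(lhs)
                    if conf >= min_conf:
                        found.append((lhs, whole - lhs, conf))
    return found


def mine_with_decreasing_support(baskets, num_rules=10, min_conf=0.9,
                                 delta=0.05, upper=1.0, lower=0.1):
    """Lower the support threshold step by step until enough rules appear."""
    max_steps = round((upper - lower) / delta)
    support, rules = upper, []
    for step in range(max_steps + 1):
        # Rounding avoids drift such as 0.19999999999999996.
        support = round(upper - step * delta, 10)
        rules = rules_at(baskets, support, min_conf)
        if len(rules) >= num_rules:
            break
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return support, rules[:num_rules]


final_support, best = mine_with_decreasing_support(BASKETS, num_rules=3)
print("stopped at minimum support", final_support)
for lhs, rhs, conf in best:
    print(sorted(lhs), "==>", sorted(rhs), conf)
```

A higher confidence threshold or a larger number of requested rules generally forces the loop to accept a lower support before it stops.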

After you set the parameters, click the Start button. After a while, you will see the results as shown in the screenshot below:

[Screenshot: results after clicking Start]

At the bottom of the output, you will find the best association rules that were detected. These can help the supermarket decide which products to stock on nearby shelves.
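Each rule in the output is typically printed as an antecedent with its instance count, then `==>`, then the consequent with the count of instances where the whole rule holds, followed by the confidence. The sketch below parses one such line in plain Python and recomputes the confidence from the two counts. The sample line is illustrative, so treat its item names and numbers as assumptions rather than the output you will get, and note that the exact layout may differ between WEKA versions. The names `RULE_PATTERN` and `parse_rule` are hypothetical.

```python
import re

# Illustrative rule line; the exact layout may vary across WEKA versions.
SAMPLE_LINE = (
    " 1. biscuits=t frozen foods=t fruit=t total=high 788 "
    "==> bread and cake=t 723    conf:(0.92)"
)

RULE_PATTERN = re.compile(
    r"^\s*\d+\.\s+(?P<lhs>.+?)\s+(?P<lhs_count>\d+)\s+==>\s+"
    r"(?P<rhs>.+?)\s+(?P<rhs_count>\d+)(?:\s|$)"
)


def parse_rule(line):
    """Split a rule line into its parts, or return None if it does not match."""
    match = RULE_PATTERN.match(line)
    if match is None:
        return None
    antecedent_count = int(match["lhs_count"])
    rule_count = int(match["rhs_count"])
    return {
        "antecedent": match["lhs"],
        "consequent": match["rhs"],
        "antecedent_count": antecedent_count,
        "rule_count": rule_count,
        # Confidence = instances matching the whole rule / instances
        # matching the antecedent.
        "confidence": rule_count / antecedent_count,
    }


rule = parse_rule(SAMPLE_LINE)
print(rule["antecedent"], "==>", rule["consequent"])
print(round(rule["confidence"], 2))
```

For the sample line, 723 of the 788 instances that match the antecedent also match the consequent, which rounds to the 0.92 shown in the line.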