Weka Concise Tutorial

Weka - Feature Selection

When a database contains a large number of attributes, several of them will not be significant for the analysis you are carrying out. Removing these unwanted attributes from the dataset is therefore an important step in developing a good machine learning model.

You could examine the entire dataset visually and decide which attributes are irrelevant, but that is a huge task for databases with many attributes, such as the supermarket dataset that you saw in an earlier lesson. Fortunately, WEKA provides an automated tool for feature selection.

This chapter demonstrates this feature on a database containing a large number of attributes.

Loading Data

In the Preprocess tab of the WEKA Explorer, select the labor.arff file for loading into the system. When you load the data, you will see the following screen −

loading data

Notice that there are 17 attributes. Our task is to create a reduced dataset by eliminating some of the attributes which are irrelevant to our analysis.
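The attribute count that the Explorer reports comes from the header of the ARFF file, where each attribute is declared on its own @attribute line before the @data section. The following is a minimal sketch of that idea in plain Java, not WEKA's own loader; the class name ArffHeaderSketch and the three-attribute sample header are hypothetical and are not taken from labor.arff.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: WEKA's own ARFF loader handles many more cases
// (full quoting rules, sparse data, and so on).
class ArffHeaderSketch {

    private static final String KEYWORD = "@attribute";

    // Returns the attribute names declared before the @data line.
    static List<String> attributeNames(String arffText) {
        List<String> names = new ArrayList<>();
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            if (line.regionMatches(true, 0, "@data", 0, 5)) {
                break; // the header ends where the data section begins
            }
            if (line.regionMatches(true, 0, KEYWORD, 0, KEYWORD.length())) {
                names.add(firstToken(line.substring(KEYWORD.length()).trim()));
            }
        }
        return names;
    }

    // Reads one attribute name, which may be wrapped in single or double quotes.
    private static String firstToken(String s) {
        if (s.isEmpty()) {
            return "";
        }
        char q = s.charAt(0);
        if (q == '\'' || q == '"') {
            int end = s.indexOf(q, 1);
            return end > 0 ? s.substring(1, end) : s.substring(1);
        }
        return s.split("\\s+")[0];
    }

    public static void main(String[] args) {
        // Hypothetical three-attribute header, not the real labor.arff.
        String sample = String.join("\n",
            "@relation toy",
            "@attribute duration numeric",
            "@attribute 'wage increase' numeric",
            "@attribute class {bad,good}",
            "@data",
            "1,2.5,good");
        List<String> names = attributeNames(sample);
        // prints: 3 attributes: [duration, wage increase, class]
        System.out.println(names.size() + " attributes: " + names);
    }
}
```

Running the same method over the text of labor.arff should report the attributes that the Preprocess tab lists, assuming the file follows the plain header layout handled here.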

Selecting Features

Click on the Select attributes tab. You will see the following screen −

select attributes

Under the Attribute Evaluator and Search Method, you will find several options. We will just use the defaults here. In the Attribute Selection Mode, select the Use full training set option.
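To make the two roles concrete: the evaluator assigns a score to a candidate set of attributes, and the search method decides which candidate sets to try. The sketch below illustrates this split in plain Java. It assumes the defaults are a correlation-based subset evaluator (CfsSubsetEval) paired with a best-first search, as in common WEKA 3 releases. It is not WEKA's implementation: absolute Pearson correlation stands in for the measure WEKA computes, a plain greedy forward search stands in for best-first search, and the class name CfsSketch and the toy data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not WEKA code: scores attribute subsets with a
// correlation-based merit and grows the subset greedily.
class CfsSketch {

    // Absolute Pearson correlation between two equally long columns.
    static double absCorr(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) return 0; // a constant column carries no signal
        return Math.abs(sxy / Math.sqrt(sxx * syy));
    }

    // "Evaluator": merit = k * rcf / sqrt(k + k(k-1) * rff), where rcf is the mean
    // attribute-target correlation and rff the mean attribute-attribute correlation.
    static double merit(List<Integer> subset, double[][] cols, double[] target) {
        int k = subset.size();
        if (k == 0) return 0;
        double rcf = 0, rff = 0;
        for (int a = 0; a < k; a++) {
            rcf += absCorr(cols[subset.get(a)], target);
            for (int b = a + 1; b < k; b++) {
                rff += absCorr(cols[subset.get(a)], cols[subset.get(b)]);
            }
        }
        rcf /= k;
        if (k > 1) rff /= k * (k - 1) / 2.0;
        return k * rcf / Math.sqrt(k + k * (k - 1) * rff);
    }

    // "Search method": greedy forward selection that stops once no single
    // additional attribute raises the merit.
    static List<Integer> select(double[][] cols, double[] target) {
        List<Integer> chosen = new ArrayList<>();
        double best = 0;
        while (true) {
            int bestIdx = -1;
            for (int j = 0; j < cols.length; j++) {
                if (chosen.contains(j)) continue;
                chosen.add(j);
                double m = merit(chosen, cols, target);
                chosen.remove(chosen.size() - 1);
                if (m > best + 1e-12) { best = m; bestIdx = j; }
            }
            if (bestIdx < 0) return chosen; // no candidate improves the merit
            chosen.add(bestIdx);
        }
    }

    public static void main(String[] args) {
        // Hypothetical toy data: column 0 tracks the target, column 1 duplicates
        // column 0 (redundant), column 2 is only weakly related to the target.
        double[] target = {1, 2, 3, 4, 5, 6};
        double[][] cols = {
            {1.1, 1.9, 3.2, 3.9, 5.1, 6.0},
            {1.1, 1.9, 3.2, 3.9, 5.1, 6.0},
            {4, 1, 5, 2, 6, 3}
        };
        // prints: Selected attribute indices: [0]
        System.out.println("Selected attribute indices: " + select(cols, target));
    }
}
```

The duplicate column is rejected because it adds redundancy without adding information about the target, and the weakly related column is rejected because it lowers the average relevance. This is the kind of trade-off a subset evaluator is designed to capture.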

Click on the Start button to process the dataset. You will see the following output −

start dataset

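At the bottom of the result window, you will get the list of Selected attributes. To get a visual representation, right-click on the result in the Result list.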

The output is shown in the following screenshot −

screenshot output

Clicking on any of the squares gives you a data plot for further analysis. A typical data plot is shown below −

data plot

This is similar to what we have seen in the earlier chapters. Use the different options available to analyze the results.

What’s Next?

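So far, you have seen the power of WEKA in quickly developing machine learning models. We used a graphical tool called Explorer to develop these models. WEKA also provides a command line interface that gives you more power than the Explorer does.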

Clicking the Simple CLI button in the GUI Chooser application starts this command line interface, as shown in the screenshot below −

gui chooser

Type your commands in the input box at the bottom. You will be able to do everything you have done so far in the Explorer, and much more. For details, refer to the WEKA documentation (https://www.cs.waikato.ac.nz/ml/weka/documentation.html).

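Lastly, WEKA is developed in Java and provides an interface to its API. So if you are a Java developer and keen to include WEKA ML implementations in your own Java projects, you can do so easily.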

Conclusion

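WEKA is a powerful tool for developing machine learning models. It provides implementations of several of the most widely used ML algorithms. Before applying these algorithms to your dataset, it also allows you to preprocess the data. The supported algorithms are grouped under Classify, Cluster, Associate, and Select attributes. The results at the various stages of processing can be visualized with rich and powerful visual representations. This makes it easier for a data scientist to quickly apply various machine learning techniques to a dataset, compare the results, and create the best model for final use.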