A Concise Weka Tutorial

What is Weka?

WEKA is open source software that provides tools for data preprocessing, implementations of several machine learning algorithms, and visualization tools, so that you can develop machine learning techniques and apply them to real-world data mining problems. The following diagram summarizes what WEKA offers:

[Figure: summary of what WEKA offers]

If you look at the start of the flow in the diagram, you will see that there are several stages in dealing with Big Data to make it suitable for machine learning:

First, you start with the raw data collected from the field. This data may contain null values and irrelevant fields. You use the data preprocessing tools provided in WEKA to cleanse the data.
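To make the cleansing step concrete, here is a minimal sketch in plain Python. It does not use WEKA's own preprocessing tools; it only illustrates the idea of dropping irrelevant fields and discarding records that contain null values. The records, the field names, and the `clean` helper are hypothetical.

```python
# Hypothetical raw records; None marks a missing (null) value.
raw_rows = [
    {"id": 1, "outlook": "sunny", "temperature": 85, "play": "no"},
    {"id": 2, "outlook": None, "temperature": 80, "play": "no"},
    {"id": 3, "outlook": "rainy", "temperature": None, "play": "yes"},
    {"id": 4, "outlook": "overcast", "temperature": 83, "play": "yes"},
]


def clean(rows, irrelevant_fields):
    """Drop irrelevant fields, then drop rows that still contain nulls."""
    cleaned = []
    for row in rows:
        # Keep only the fields that matter for learning.
        kept = {k: v for k, v in row.items() if k not in irrelevant_fields}
        # Discard the record if any remaining value is missing.
        if all(v is not None for v in kept.values()):
            cleaned.append(kept)
    return cleaned


# "id" is treated as irrelevant here (an assumption for this example).
cleaned_rows = clean(raw_rows, irrelevant_fields={"id"})
print(cleaned_rows)
```

Dropping incomplete records is only one possible policy; replacing missing values with a default is another, and which one fits depends on the data.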

Then, you save the preprocessed data in your local storage so that ML algorithms can be applied to it.
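As an illustration of saving preprocessed data locally, the sketch below writes a small dataset in ARFF, the text format WEKA reads natively. It is plain Python rather than WEKA code; the `to_arff` helper, the relation name, the attribute names, and the output file name are all assumptions made for this example.

```python
import os
import tempfile


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        # One comma-separated line per record, in attribute order.
        lines.append(",".join(str(row[name]) for name, _ in attributes))
    return "\n".join(lines) + "\n"


# Hypothetical preprocessed data.
attributes = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", "numeric"),
    ("play", ["yes", "no"]),
]
rows = [
    {"outlook": "sunny", "temperature": 85, "play": "no"},
    {"outlook": "overcast", "temperature": 83, "play": "yes"},
]

arff_text = to_arff("weather", attributes, rows)

# The file name and location are arbitrary choices for this sketch.
path = os.path.join(tempfile.gettempdir(), "weather_clean.arff")
with open(path, "w", encoding="utf-8") as f:
    f.write(arff_text)
print(arff_text)
```

The sketch does not quote values containing spaces or commas, which a real ARFF writer would need to handle.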

Next, depending on the kind of ML model you are trying to develop, you select one of the options such as Classify, Cluster, or Associate. Attribute selection allows features to be selected automatically to create a reduced dataset.
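One common way to select features automatically is to rank them by information gain with respect to the class and keep the top few. The sketch below shows that idea in plain Python; it is not WEKA's implementation, and the dataset, the `select_attributes` helper, and the choice of information gain as the criterion are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def select_attributes(rows, candidates, target, keep):
    """Return the `keep` candidate attributes with the highest gain."""
    ranked = sorted(candidates,
                    key=lambda a: information_gain(rows, a, target),
                    reverse=True)
    return ranked[:keep]


# Hypothetical nominal dataset; "play" is the class attribute.
rows = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "true", "play": "yes"},
]

selected = select_attributes(rows, ["outlook", "windy"], "play", keep=1)
# The reduced dataset keeps only the selected attributes plus the class.
reduced_rows = [{k: row[k] for k in selected + ["play"]} for row in rows]
print(selected)
```

In this toy data, "outlook" separates the classes far better than "windy", so it is the attribute that survives the reduction.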

Note that under each category, WEKA provides implementations of several algorithms. You select an algorithm of your choice, set the desired parameters, and run it on the dataset.
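The pattern of choosing an algorithm, setting a parameter, and running it on a dataset can be sketched with a small k-nearest-neighbours classifier, where `k` is the parameter being set. This is a plain-Python illustration, not WEKA's implementation; the dataset, the `knn_predict` function, and the value of `K` are assumptions.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training items by squared Euclidean distance to the query.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical dataset: (temperature, humidity) -> play.
train = [
    ((85, 85), "no"),
    ((80, 90), "no"),
    ((83, 86), "yes"),
    ((70, 96), "yes"),
    ((68, 80), "yes"),
    ((65, 70), "no"),
]

K = 3  # the parameter chosen for this run
prediction = knn_predict(train, (72, 90), k=K)
print(prediction)
```

Changing `K` and rerunning is the same kind of experiment you would do when adjusting an algorithm's parameters in WEKA.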

WEKA then gives you the statistical output of the model run. It also provides a visualization tool for inspecting the data.

Several models can be applied to the same dataset. You can then compare their outputs and select the one that best meets your purpose.
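Comparing models on the same dataset can be sketched as follows: two simple classifiers are trained on the same training data, scored on the same test data, and the one with the higher accuracy is picked. This is plain Python, not WEKA code; the data, the two model functions, and the use of accuracy as the only criterion are assumptions for illustration.

```python
from collections import Counter


def majority_model(train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda features: label


def nearest_neighbour_model(train):
    """Predict the label of the single closest training point."""
    def predict(features):
        nearest = min(
            train,
            key=lambda item: sum((a - b) ** 2
                                 for a, b in zip(item[0], features)),
        )
        return nearest[1]
    return predict


def accuracy(model, test):
    """Fraction of test records the model labels correctly."""
    correct = sum(1 for features, label in test if model(features) == label)
    return correct / len(test)


# Hypothetical train and test splits of one dataset.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.2), "b"), ((3.1, 2.9), "b"),
]
test = [
    ((1.1, 1.0), "a"), ((2.9, 3.0), "b"),
    ((3.2, 3.1), "b"), ((0.8, 0.9), "a"),
]

candidates = {
    "majority baseline": majority_model(train),
    "1-nearest neighbour": nearest_neighbour_model(train),
}
scores = {name: accuracy(model, test) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Accuracy on one small test split is a crude yardstick; in practice the comparison should use whatever measure matches your purpose.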

Overall, using WEKA speeds up the development of machine learning models.

Now that we have seen what WEKA is and what it does, the next chapter shows how to install WEKA on your local computer.