Weka Concise Tutorial
Weka - Loading Data
In this chapter, we start with the first tab, which you use to preprocess the data. Preprocessing is common to every algorithm you apply to your data when building a model, and it is a prerequisite for all subsequent operations in WEKA.
For a machine learning algorithm to give acceptable accuracy, you must cleanse your data first, because raw data collected from the field may contain null values, irrelevant columns, and so on.
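To make the two problems named above concrete, here is a minimal Python sketch of cleansing. It is not WEKA code. The records, the `id` column, and the `cleanse` helper are all hypothetical, and dropping incomplete rows is only one possible strategy (filling in missing values is another).

```python
# Hypothetical raw records: "id" is an irrelevant column,
# and None marks a missing (null) value.
raw = [
    {"id": 1, "outlook": "sunny", "humidity": 85, "play": "no"},
    {"id": 2, "outlook": None, "humidity": 90, "play": "yes"},
    {"id": 3, "outlook": "rainy", "humidity": None, "play": "yes"},
    {"id": 4, "outlook": "overcast", "humidity": 70, "play": "yes"},
]


def cleanse(records, drop_columns):
    """Remove the named columns, then drop any row that still has a null."""
    cleaned = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in drop_columns}
        if all(value is not None for value in kept.values()):
            cleaned.append(kept)
    return cleaned


clean = cleanse(raw, drop_columns={"id"})
for row in clean:
    print(row)
```

Only the two complete records survive, and none of them carries the `id` column any more.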
In this chapter, you will learn how to preprocess raw data and create a clean, meaningful dataset for further use.
First, you will learn to load a data file into the WEKA Explorer. The data can be loaded from the following sources:
- Local file system
- Web
- Database
In this chapter, we will look at all three options for loading data in detail.
Loading Data from Local File System
Just under the Machine Learning tabs that you studied in the previous lesson, you will find the following three buttons:
- Open file …
- Open URL …
- Open DB …
Click the Open file … button. A directory navigator window opens, as shown in the following screen:

Now, navigate to the folder where your data files are stored. The WEKA installation comes with many sample datasets for you to experiment with. These are available in the data folder of the WEKA installation.
For learning purposes, select any data file from this folder. The contents of the file are loaded into the WEKA environment. We will soon learn how to inspect and process the loaded data. Before that, let us look at how to load a data file from the Web.
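As a rough picture of what loading a local file involves, the Python sketch below reads a small file in ARFF, WEKA's native text format. It is not WEKA's own loader. The `parse_arff` and `load_arff_file` helpers and the sample content are hypothetical, and the parser assumes a simple subset of ARFF: unquoted attribute names, comma-separated dense rows, and `%` comment lines.

```python
import os
import tempfile

# Hypothetical, abbreviated sample in the style of WEKA's bundled data files.
SAMPLE = """\
% A tiny ARFF file for illustration
@relation weather.sample
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,FALSE,no
overcast,FALSE,yes
rainy,TRUE,no
"""


def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type_spec) pairs
    rows = []        # list of lists of string values
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip()
        elif keyword == "@attribute":
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec.strip()))
        elif keyword == "@data":
            in_data = True  # every later line is a data row
    return relation, attributes, rows


def load_arff_file(path):
    """Read an ARFF file from the local file system and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_arff(handle.read())


# Write the sample to a temporary file, then load it as a local file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".arff", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

relation, attributes, rows = load_arff_file(path)
os.remove(path)

print(relation)
print([name for name, _ in attributes])
print(len(rows), "rows")
```

To try it on real data, pass the path of a file from the data folder of your WEKA installation to `load_arff_file`. Files that use quoted names or the sparse format are outside what this sketch handles.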
Loading Data from Web
Once you click the Open URL … button, you see a window as follows:

We will open the file from a public URL. Type the following URL in the popup box:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.nominal.arff
You may specify any other URL where your data is stored. The Explorer loads the data from the remote site into its environment.
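Loading from a URL differs from loading a local file only in where the bytes come from. The Python sketch below illustrates that idea and is not WEKA's implementation. The `fetch_lines` and `summarize_arff` helpers are hypothetical, UTF-8 text is assumed, and the demonstration reads a temporary file through a `file://` URL so that it runs without network access.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_lines(url, timeout=10):
    """Download the resource at `url` and return its text as a list of lines."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")  # assumes UTF-8 content
    return text.splitlines()


def summarize_arff(lines):
    """Return (attribute_names, data_row_count) for simple ARFF content."""
    names = []
    row_count = 0
    in_data = False
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            row_count += 1
        elif line.lower().startswith("@attribute"):
            names.append(line.split(None, 2)[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, row_count


# Offline demonstration: serve a tiny hypothetical file through a file:// URL.
with tempfile.NamedTemporaryFile(
    "w", suffix=".arff", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(
        "@relation demo\n"
        "@attribute outlook {sunny, rainy}\n"
        "@attribute play {yes, no}\n"
        "@data\n"
        "sunny,no\n"
        "rainy,yes\n"
    )
    local_path = tmp.name

names, n_rows = summarize_arff(fetch_lines(Path(local_path).as_uri()))
os.remove(local_path)

print(names, n_rows)
```

To read a remote file instead, pass an `http://` or `https://` URL such as the one given above to `fetch_lines`. That call needs network access and depends on the remote site being available.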
Loading Data from DB
Once you click the Open DB … button, you see a window as follows:

Set the connection string for your database, set up the query for data selection, process the query, and load the selected records into WEKA.
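The same connect, query, and load sequence can be sketched in Python. This is an illustration, not what WEKA runs internally. An in-memory SQLite database from Python's standard library stands in for your real database, and the `weather` table, its rows, and the `load_records` helper are hypothetical.

```python
import sqlite3


def load_records(connection, query):
    """Run `query` and return (column_names, rows) for the selected records."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    return columns, rows


# Step 1: connect. ":memory:" plays the role of the connection string here.
connection = sqlite3.connect(":memory:")

# Hypothetical table and rows so the example is self-contained.
connection.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
connection.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("sunny", "FALSE", "no"),
        ("overcast", "FALSE", "yes"),
        ("rainy", "TRUE", "no"),
    ],
)

# Steps 2 and 3: set up the selection query, process it, and load the result.
query = "SELECT outlook, play FROM weather WHERE windy = 'FALSE' ORDER BY outlook"
columns, rows = load_records(connection, query)
connection.close()

print(columns)
print(rows)
```

The query selects both the columns and the rows that are loaded, so some filtering can happen at load time, before any other preprocessing.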