Weka: A Concise Tutorial

Weka - File Formats

WEKA supports a large number of data file formats. Here is the complete list:

arff
arff.gz
bsi
csv
dat
data
json
json.gz
libsvm
m
names
xrff
xrff.gz

The file types that WEKA supports are listed in the drop-down list box at the bottom of the screen. This is shown in the screenshot given below.

As you will notice, it supports several formats, including CSV and JSON. The default file type is Arff.

Arff Format

An Arff file contains two sections: a header and data.

The header describes the attributes and their types.
The data section contains the data rows, each a comma-separated list of values.

As an example of the Arff format, the Weather data file loaded from the WEKA sample databases is shown below:

From the screenshot, you can infer the following points:

The @relation tag defines the name of the relation (the dataset).
The @attribute tag declares an attribute and its type.
The @data tag starts the list of data rows, each containing comma-separated fields.
Attributes can take nominal values, as in the case of outlook shown here:

@attribute outlook {sunny, overcast, rainy}

Attributes can also take real (numeric) values, as in this case:

@attribute temperature real

You can also declare a target (class) variable, here called play, as shown:

@attribute play {yes, no}

The target takes one of two nominal values, yes or no.
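To make the header/data split concrete, here is a minimal Python sketch that reads a small Arff text and separates the relation name, the attribute declarations, and the data rows. It is not WEKA's own loader. The function name parse_arff and the SAMPLE_ARFF text are assumptions made for illustration. The sample is a shortened excerpt modeled on the Weather data, not the full file. Note that Arff writes the nominal value list in curly braces.

```python
# Shortened, illustrative sample modeled on the Weather data (not the full file).
SAMPLE_ARFF = """\
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple Arff string into (relation, attributes, rows).

    Hypothetical helper: it handles only unquoted names and values,
    which is enough to show the structure of the format.
    """
    relation = None
    attributes = []  # (name, type); type is a string or a list of nominal values
    rows = []
    in_data = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # '%' starts a comment line
            continue
        if in_data:
            # Everything after @data is a comma-separated data row.
            rows.append([value.strip() for value in line.split(",")])
            continue

        lower = line.lower()  # tags are matched case-insensitively
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{") and spec.endswith("}"):
                # Nominal attribute: keep the list of allowed values.
                attr_type = [value.strip() for value in spec[1:-1].split(",")]
            else:
                # Any other type, such as real, is kept as a lowercase string.
                attr_type = spec.lower()
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True

    return relation, attributes, rows


if __name__ == "__main__":
    relation, attributes, rows = parse_arff(SAMPLE_ARFF)
    print(relation)
    for name, attr_type in attributes:
        print(name, attr_type)
    print(len(rows), "data rows")
```

The sketch keeps every data value as a string. A fuller reader would convert the values of real attributes to numbers and check nominal values against the declared list.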

Other Formats

The Explorer can load data in any of the formats mentioned earlier. As arff is the preferred format in WEKA, you may load the data from any format and save it in arff format for later use. After preprocessing the data, just save it in arff format for further analysis.
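In WEKA itself the Explorer does this conversion for you. To illustrate what converting another format to arff involves, the following Python sketch turns a small CSV text into arff text. It is an illustration under simplifying assumptions, not WEKA's converter. The helper names is_number and csv_to_arff are hypothetical. A column is treated as real when every value parses as a number and as nominal otherwise. Names and values are assumed to contain no spaces, commas, or quotes.

```python
import csv
import io


def is_number(value):
    """Return True if the string parses as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into arff text.

    Hypothetical helper: numeric columns become 'real' attributes and
    all other columns become nominal attributes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines

    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in rows]
        if all(is_number(value) for value in column):
            lines.append("@attribute " + name + " real")
        else:
            # Distinct values in order of first appearance.
            distinct = list(dict.fromkeys(column))
            lines.append("@attribute " + name + " {" + ", ".join(distinct) + "}")

    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in rows)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_csv = "outlook,temperature,play\nsunny,85,no\novercast,83,yes\n"
    print(csv_to_arff(sample_csv, "weather"))
```

Because the nominal values are collected from the data, a value that never occurs in the CSV will be missing from the attribute declaration. When the full set of allowed values is known, it is safer to declare it by hand.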

Now that you have learned how to load data into WEKA, in the next chapter you will learn how to preprocess the data.