Weka: A Concise Tutorial

Weka - Introduction

The foundation of any Machine Learning application is data: not just a little data, but a very large amount of it, commonly referred to as Big Data.

Before you can train a machine to analyze big data, the data has to meet a few requirements:

  1. The data must be clean.

  2. It should not contain null values.
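As a rough illustration of the second requirement, the sketch below drops every record that has a missing field. It is plain Python rather than Weka's API; the sample rows, the column names, and the helper names `is_missing` and `drop_incomplete` are all made up for this example, and treating empty strings as null is an assumption.

```python
# Illustrative only: plain Python, not Weka's API.
# The sample rows and column names are invented for this sketch.
rows = [
    {"outlook": "sunny", "temperature": 85, "play": "no"},
    {"outlook": "rainy", "temperature": None, "play": "yes"},
    {"outlook": "overcast", "temperature": 83, "play": "yes"},
    {"outlook": "", "temperature": 70, "play": "yes"},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as null (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_incomplete(records):
    """Return only the records in which every field has a value."""
    return [r for r in records if not any(is_missing(v) for v in r.values())]


clean_rows = drop_incomplete(rows)
print(len(rows), "rows before,", len(clean_rows), "rows after")
```

Dropping records is only one possible policy; replacing missing values with a default or an average is another, and which one is appropriate depends on the data.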

Besides, not all the columns in the data table are useful for the type of analytics you are trying to achieve. The irrelevant columns, or 'features' as columns are called in Machine Learning terminology, must be removed before the data is fed into a machine learning algorithm.
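Removing irrelevant features can be sketched in a few lines. The example is plain Python rather than Weka's API; the records, the column names, and the helper `drop_columns` are hypothetical, and the choice of `customer_id` as the irrelevant column is an assumption made only for illustration.

```python
# Illustrative only: plain Python, not Weka's API.
# The records and column names are invented for this sketch.
records = [
    {"customer_id": 101, "age": 34, "income": 52000, "bought": "yes"},
    {"customer_id": 102, "age": 51, "income": 38000, "bought": "no"},
]


def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    unwanted = set(unwanted)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


# An arbitrary identifier carries no predictive signal, so it is removed here.
features = drop_columns(records, ["customer_id"])
print(list(features[0].keys()))
```

The helper returns new dictionaries and leaves the original records untouched, so the raw data stays available if a different set of columns is needed later.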

In short, big data needs a lot of preprocessing before it can be used for Machine Learning. Once the data is ready, you apply Machine Learning algorithms such as classification, regression, or clustering to solve the problem at hand.

The type of algorithm you apply depends largely on your domain knowledge. Even within a single type, such as classification, several algorithms are available, and you may want to test different algorithms of the same class to build an efficient machine learning model. While doing so, it helps to visualize the processed data, so you also need visualization tools.

In the upcoming chapters, you will learn about Weka, software that accomplishes all of the above with ease and lets you work with big data comfortably.