Machine Learning With Python: A Concise Tutorial
Regression Algorithms - Overview
Introduction to Regression
Regression is another important and widely used statistical and machine learning tool. The key objective of a regression task is to predict output labels or responses that are continuous numeric values for the given input data. The output is based on what the model has learned during the training phase. Regression models use the input data features (independent variables) and their corresponding continuous numeric output values (dependent or outcome variables) to learn the specific association between inputs and outputs.
Types of Regression Models
Regression models are of the following two types:
Simple regression model: This is the most basic regression model, in which predictions are formed from a single feature of the data.
Multiple regression model: As the name implies, in this regression model the predictions are formed from multiple features of the data.
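The difference between the two model types shows up mainly in the shape of the feature matrix. The following is a minimal sketch, assuming scikit-learn's LinearRegression as the model; the data is synthetic and noise-free, and all variable names and coefficients are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (made up for illustration only).
rng = np.random.default_rng(0)

# Simple regression: one feature, so X has shape (n_samples, 1).
X_simple = rng.uniform(0, 10, size=(50, 1))
y_simple = 3.0 * X_simple[:, 0] + 2.0

simple_model = LinearRegression().fit(X_simple, y_simple)
print("simple:   coef =", simple_model.coef_, "intercept =", simple_model.intercept_)

# Multiple regression: several features, so X has shape (n_samples, n_features).
X_multi = rng.uniform(0, 10, size=(50, 3))
y_multi = 1.5 * X_multi[:, 0] - 2.0 * X_multi[:, 1] + 0.5 * X_multi[:, 2] + 4.0

multi_model = LinearRegression().fit(X_multi, y_multi)
print("multiple: coef =", multi_model.coef_, "intercept =", multi_model.intercept_)
```

The simple model learns one coefficient, while the multiple model learns one coefficient per feature.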
Building a Regressor in Python
A regressor in Python can be constructed in the same way as a classifier. Scikit-learn, a Python library for machine learning, can also be used to build a regressor.
In the following example, we will build a basic regression model that fits a line to the data, i.e. a linear regressor. The necessary steps for building a regressor in Python are as follows:
Step 1: Importing the necessary Python packages
To build a regressor using scikit-learn, we need to import it along with the other necessary packages. We can import them using the following script:
import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
import matplotlib.pyplot as plt
Step 2: Importing dataset
After importing the necessary packages, we need a dataset to build the regression model. We can import one from the sklearn datasets or use another one as required. Here we use our own saved input data, whose path we set with the following script:
input_file = r'C:\linear.txt'
Next, we need to load this data. We use the np.loadtxt function to load it.
input_data = np.loadtxt(input_file, delimiter=',')
X, y = input_data[:, :-1], input_data[:, -1]
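To see what the slicing does without needing the file on disk, here is a small sketch that feeds np.loadtxt an in-memory string instead of a path. The four rows are hypothetical sample values, not the contents of linear.txt, and the variable names are chosen only for this illustration.

```python
import io
import numpy as np

# Hypothetical file contents: each row is "feature,target".
sample_text = """1.0,2.1
2.0,3.9
3.0,6.2
4.0,8.1
"""

# np.loadtxt accepts a file path or any file-like object.
sample_data = np.loadtxt(io.StringIO(sample_text), delimiter=',')

# All columns except the last are features; the last column is the target.
X_sample, y_sample = sample_data[:, :-1], sample_data[:, -1]

print(X_sample.shape)  # (4, 1)
print(y_sample.shape)  # (4,)
```

Note that X stays two-dimensional even with a single feature, which is the shape scikit-learn estimators expect.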
Step 3: Organizing data into training & testing sets
Since we need to test our model on unseen data, we divide the dataset into two parts: a training set and a test set. The following commands do this:
training_samples = int(0.6 * len(X))
testing_samples = len(X) - training_samples
X_train, y_train = X[:training_samples], y[:training_samples]
X_test, y_test = X[training_samples:], y[training_samples:]
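The slicing above keeps the rows in their original order, so if the file happens to be sorted, the test set may not be representative. One possible alternative, sketched here under the assumption that shuffling the rows is acceptable, is scikit-learn's train_test_split. The stand-in data and variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (hypothetical): 10 samples with one feature each.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo[:, 0] + 1.0

# Hold out 40% of the rows for testing, mirroring the 60/40 split above.
# random_state makes the shuffle reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, shuffle=True, random_state=0
)

print(len(X_tr), len(X_te))
```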
Step 4: Model training & prediction
After dividing the data into training and testing sets, we need to build the model. We will use Scikit-learn's LinearRegression class for this purpose. The following command creates a linear regressor object.
reg_linear = linear_model.LinearRegression()
Next, train this model with the training samples as follows:
reg_linear.fit(X_train, y_train)
Finally, we use the trained model to make predictions on the testing data.
y_test_pred = reg_linear.predict(X_test)
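After fitting, a LinearRegression object exposes the learned line through its coef_ and intercept_ attributes, and predict() simply evaluates that line. The sketch below illustrates this on a tiny hypothetical dataset that roughly follows y = 2x + 1; the data and variable names are assumptions made for the example.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data that roughly follows y = 2x + 1.
X_fit = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_fit = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

demo_reg = linear_model.LinearRegression()
demo_reg.fit(X_fit, y_fit)

# coef_ holds one slope per feature; intercept_ is the constant term.
slope, intercept = demo_reg.coef_[0], demo_reg.intercept_

# predict() evaluates the fitted line on new inputs.
X_new = np.array([[5.0], [6.0]])
manual_pred = slope * X_new[:, 0] + intercept
model_pred = demo_reg.predict(X_new)

print("slope =", slope, "intercept =", intercept)
print(manual_pred, model_pred)
```

The manually computed values and the output of predict() agree, which is a quick way to check how the fitted parameters are being used.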
Step 5: Plot & visualization
After prediction, we can plot and visualize the result with the help of the following script:
Example
plt.scatter(X_test, y_test, color='red')
plt.plot(X_test, y_test_pred, color='black', linewidth=2)
plt.xticks(())
plt.yticks(())
plt.show()
Output
In the resulting plot, the regression line (black) runs through the test data points (red).
Step 6: Performance computation
We can also measure the performance of our regression model with various performance metrics, as follows:
Example
print("Regressor model performance:")
print("Mean absolute error(MAE) =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error(MSE) =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
Output
Regressor model performance:
Mean absolute error(MAE) = 1.78
Mean squared error(MSE) = 3.89
Median absolute error = 2.01
Explained variance score = -0.09
R2 score = -0.09
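To make these metrics easier to interpret, the following sketch computes each of them by hand with NumPy, using their standard definitions. The true and predicted values are hypothetical and unrelated to the output above. It also shows why R2 can be negative: R2 compares the model's squared error with that of always predicting the mean of the true values, so a model that does worse than this baseline on the test set scores below zero.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))        # mean absolute error
mse = np.mean(errors ** 2)           # mean squared error
medae = np.median(np.abs(errors))    # median absolute error

# R2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Explained variance = 1 - Var(errors) / Var(y_true)
evs = 1.0 - np.var(errors) / np.var(y_true)

print("MAE =", mae)                  # 0.5
print("MSE =", mse)                  # 0.375
print("Median absolute error =", medae)  # 0.5
print("Explained variance =", evs)   # 0.9375
print("R2 =", r2)                    # 0.925
```

Explained variance and R2 differ only when the errors have a non-zero mean; when the mean error is zero, the two scores coincide.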
Types of ML Regression Algorithms
The most useful and popular ML regression algorithm is linear regression, which is further divided into two types:
- Simple Linear Regression algorithm
- Multiple Linear Regression algorithm
We will discuss both and implement them in Python in the next chapter.
Applications
The applications of ML regression algorithms are as follows:
Forecasting or predictive analysis: One of the important uses of regression is forecasting or predictive analysis. For example, we can forecast GDP, oil prices or, in simple words, any quantitative data that changes over time.
Optimization: We can optimize business processes with the help of regression. For example, a store manager can create a statistical model to understand the peak times at which customers arrive.
Error correction: In business, making the correct decision is as important as optimizing the business process. Regression can help us make correct decisions as well as correct decisions that have already been implemented.
Economics: Regression is one of the most widely used tools in economics. We can use it to predict supply, demand, consumption, inventory investment, etc.
Finance: A financial company is always interested in minimizing portfolio risk and wants to know the factors that affect its customers. All of these can be predicted with the help of a regression model.