Artificial Intelligence With Python: A Concise Tutorial

AI with Python – Supervised Learning: Regression

Regression is one of the most important tools in statistics and machine learning. It would not be wrong to say that the journey of machine learning starts with regression. It can be defined as a parametric technique that lets us make predictions from data by learning the relationship between input and output variables. Here the output variables depend on the input variables and are continuous-valued real numbers. This relationship is what matters in regression: it shows how the value of the output variable changes as the input variable changes. Regression is frequently used to predict prices, economic indicators, variations, and so on.
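To make "learning the relationship between input and output variables" concrete, here is a minimal sketch that fits a straight line y = w*x + b by ordinary least squares using only NumPy. The toy data and the variable names are assumptions made for illustration; they are not part of the tutorial's dataset.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones so the model can learn an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Solve min ||A @ [w, b] - y||^2 for the slope w and the intercept b.
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print("slope =", round(w, 2), "intercept =", round(b, 2))
```

The fitted slope tells us how much the output changes for a unit change in the input, which is exactly the relationship regression is meant to capture.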

Building Regressors in Python

In this section, we will learn how to build both single-variable and multivariable regressors.

Linear Regressor/Single Variable Regressor

Let us import a few required packages:

import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
import matplotlib.pyplot as plt

Now, we need to provide the input data. We have saved our data in a file named linear.txt.

input_file = 'D:/ProgramData/linear.txt'

We need to load this data by using the np.loadtxt function.

# Every column except the last holds input features; the last column is the target.
input_data = np.loadtxt(input_file, delimiter=',')
X, y = input_data[:, :-1], input_data[:, -1]
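If linear.txt is not at hand, the same loading and slicing step can be tried on an in-memory string. This is only a sketch: it assumes the file holds one comma-separated "x,y" pair per line, and the three rows and the name sample_text are illustrative.

```python
import io

import numpy as np

# Three rows in the "x,y" comma-separated layout that linear.txt is assumed to use.
sample_text = "2,4.8\n2.9,4.7\n2.5,5\n"

# np.loadtxt accepts any file-like object, not only a path.
input_data = np.loadtxt(io.StringIO(sample_text), delimiter=',')

# All columns except the last are features; the last column is the target.
X, y = input_data[:, :-1], input_data[:, -1]

print(X.shape, y.shape)
```

Note that X stays two-dimensional (one row per sample, one column per feature) while y is one-dimensional, which is the layout scikit-learn estimators expect.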

The next step is to train the model. First, let us split the data into training and testing samples.

training_samples = int(0.6 * len(X))
testing_samples = len(X) - training_samples

X_train, y_train = X[:training_samples], y[:training_samples]

X_test, y_test = X[training_samples:], y[training_samples:]
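The slicing above keeps the rows in file order, so the split depends on how the file happens to be sorted. A possible alternative, shown here as a sketch and not as the tutorial's own method, is scikit-learn's train_test_split, which shuffles before splitting. The placeholder arrays and the random_state value are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples with 2 features each.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# Shuffle the rows, then keep 60% for training and the rest for testing.
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=0
)

print(len(X_train), len(X_test))
```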

Now, we need to create a linear regressor object.

reg_linear = linear_model.LinearRegression()

Train the object with the training samples.

reg_linear.fit(X_train, y_train)
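After fitting, the learned line can be inspected through the coef_ and intercept_ attributes of LinearRegression. The sketch below uses hypothetical data that lies exactly on y = 3x + 2, so the recovered parameters are easy to check by eye.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data lying exactly on the line y = 3x + 2.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 3.0 * X_train[:, 0] + 2.0

reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)

# coef_ holds one weight per input feature; intercept_ is the bias term.
print("slope =", reg.coef_[0])
print("intercept =", reg.intercept_)
```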

Next, we make predictions on the testing data.

y_test_pred = reg_linear.predict(X_test)

Now plot and visualize the data.

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_test, y_test_pred, color = 'black', linewidth = 2)
plt.xticks(())
plt.yticks(())
plt.show()

Output

[Figure: scatter plot of the test samples (red) with the fitted regression line (black).]

Now, we can compute the performance of our linear regressor as follows:

print("Performance of Linear regressor:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explain variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

Output

Performance of Linear Regressor:

Mean absolute error = 1.78
Mean squared error = 3.89
Median absolute error = 2.01
Explain variance score = -0.09
R2 score = -0.09
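A negative R2 score may look surprising. R2 is defined as 1 - SS_res / SS_tot, so it drops below zero whenever the model's squared error on the test set is larger than that of simply predicting the mean of y_test. The following sketch recomputes the score by hand on deliberately bad, made-up predictions and compares it with sm.r2_score.

```python
import numpy as np
import sklearn.metrics as sm

# Hypothetical targets and deliberately poor predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([4.0, 3.0, 2.0, 1.0])

# Residual sum of squares: error of the model's predictions.
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: error of always predicting the mean of y_true.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot

print("manual R2 =", r2_manual)
print("sklearn R2 =", sm.r2_score(y_true, y_pred))
```

Read this way, the score of -0.09 reported above suggests that, on this small test split, the fitted line does slightly worse than a constant prediction.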

The code above uses the small dataset shown below. If you want a larger dataset, you can load one from the sklearn.datasets module.

2,4.8
2.9,4.7
2.5,5
3.2,5.5
6,5
7.6,4
3.2,0.9
2.9,1.9
2.4,3.5
0.5,3.4
1,4
0.9,5.9
1.2,2.58
3.2,5.6
5.1,1.5
4.5,1.2
2.3,6.3
2.1,2.8
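As a sketch of what using a larger dataset might look like, the example below loads the diabetes dataset bundled with scikit-learn through sklearn.datasets and repeats the same split, fit, and predict steps. The choice of load_diabetes is an assumption made for illustration; any other loader in that module could be substituted.

```python
from sklearn import datasets, linear_model
import sklearn.metrics as sm

# Bundled regression dataset: no download is needed.
X, y = datasets.load_diabetes(return_X_y=True)

# Same 60/40 ordered split as in the tutorial code.
training_samples = int(0.6 * len(X))
X_train, y_train = X[:training_samples], y[:training_samples]
X_test, y_test = X[training_samples:], y[training_samples:]

reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)
y_test_pred = reg.predict(X_test)

print("Samples, features:", X.shape)
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
```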

Multivariable Regressor

First, let us import a few required packages:

import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

Now, we need to provide the input data. We have saved our data in a file named Mul_linear.txt.

input_file = 'D:/ProgramData/Mul_linear.txt'

We will load this data by using the np.loadtxt function.

# Every column except the last holds input features; the last column is the target.
input_data = np.loadtxt(input_file, delimiter=',')
X, y = input_data[:, :-1], input_data[:, -1]

The next step is to train the model. First, we split the data into training and testing samples.

training_samples = int(0.6 * len(X))
testing_samples = len(X) - training_samples

X_train, y_train = X[:training_samples], y[:training_samples]

X_test, y_test = X[training_samples:], y[training_samples:]

Now, we need to create a linear regressor object.

reg_linear_mul = linear_model.LinearRegression()

Train the object with the training samples.

reg_linear_mul.fit(X_train, y_train)

Finally, we make predictions on the testing data and print the performance metrics.

y_test_pred = reg_linear_mul.predict(X_test)

print("Performance of Linear regressor:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explain variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

Output

Performance of Linear Regressor:

Mean absolute error = 0.6
Mean squared error = 0.65
Median absolute error = 0.41
Explain variance score = 0.34
R2 score = 0.33

Now, we will create polynomial features of degree 10 and train a second regressor on them. We will then compare the predictions of both models for a single sample data point.

polynomial = PolynomialFeatures(degree = 10)
X_train_transformed = polynomial.fit_transform(X_train)
datapoint = [[2.23, 1.35, 1.12]]
# Reuse the transformer already fitted on the training data instead of refitting it.
poly_datapoint = polynomial.transform(datapoint)

poly_linear_model = linear_model.LinearRegression()
poly_linear_model.fit(X_train_transformed, y_train)
print("\nLinear regression:\n", reg_linear_mul.predict(datapoint))
print("\nPolynomial regression:\n", poly_linear_model.predict(poly_datapoint))

Output

Linear regression:

[2.40170462]

Polynomial regression:

[1.8697225]
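Two points about the polynomial step may be worth a quick check. First, a degree-10 expansion of 3 features produces a large number of columns, far more than the handful of training rows in a dataset this small, so the polynomial prediction above should be read with caution. Second, chaining the transformer and the regressor in a pipeline avoids transforming the data by hand. The sketch below illustrates both; the degree of 2 and the toy data following y = x^2 are assumptions chosen for illustration, not values from the tutorial.

```python
from math import comb

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Count the columns a degree-10 expansion of 3 features produces.
# The zeros are placeholders; only the shape matters here.
expanded = PolynomialFeatures(degree=10).fit_transform(np.zeros((1, 3)))
n_poly_features = expanded.shape[1]
print("Degree-10 columns for 3 features:", n_poly_features)
print("Same count from the binomial formula:", comb(3 + 10, 10))

# Hypothetical training data that follows y = x^2 exactly.
X_train = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]])
y_train = X_train[:, 0] ** 2

# The pipeline expands the features and fits the linear model in one step,
# and applies the same expansion automatically at prediction time.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

prediction = model.predict([[4.0]])
print("Prediction at x = 4:", prediction)
```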

The code above uses the small dataset shown below. If you want a larger dataset, you can load one from the sklearn.datasets module.

2,4.8,1.2,3.2
2.9,4.7,1.5,3.6
2.5,5,2.8,2
3.2,5.5,3.5,2.1
6,5,2,3.2
7.6,4,1.2,3.2
3.2,0.9,2.3,1.4
2.9,1.9,2.3,1.2
2.4,3.5,2.8,3.6
0.5,3.4,1.8,2.9
1,4,3,2.5
0.9,5.9,5.6,0.8
1.2,2.58,3.45,1.23
3.2,5.6,2,3.2
5.1,1.5,1.2,1.3
4.5,1.2,4.1,2.3
2.3,6.3,2.5,3.2
2.1,2.8,1.2,3.6