Scikit Learn - Support Vector Machines
This chapter deals with a machine learning method called Support Vector Machines (SVMs).
Introduction
Support vector machines (SVMs) are powerful yet flexible supervised machine learning methods used for classification, regression, and outlier detection. SVMs are very efficient in high-dimensional spaces and are generally used in classification problems. They are popular and memory-efficient because they use a subset of the training points in the decision function.
The main goal of SVMs is to divide the datasets into a number of classes in order to find a maximum marginal hyperplane (MMH), which is done in the following two steps −
- Support Vector Machines first generate hyperplanes iteratively that separate the classes in the best way.
- After that, they choose the hyperplane that segregates the classes correctly.
Some important concepts in SVM are as follows −
- Support Vectors − The datapoints that are closest to the hyperplane. Support vectors help in deciding the separating line.
- Hyperplane − The decision plane or space that divides a set of objects belonging to different classes.
- Margin − The gap between the two lines passing through the closest data points of different classes is called the margin.
In a typical diagram of these concepts, the two classes are separated by a hyperplane, with the margin bounded by lines passing through the support vectors.
SVM in Scikit-learn supports both sparse and dense sample vectors as input.
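As a minimal sketch of this (the toy data is the same as in the examples later in this chapter), an SVC can be fitted on a SciPy sparse matrix exactly as on a dense array −

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
X_sparse = csr_matrix(X)            # same data in a sparse representation

clf = SVC(kernel='linear')
clf.fit(X_sparse, y)                # a dense X would work identically
print(clf.predict(csr_matrix([[0.8, 1.0]])))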
Classification of SVM
Scikit-learn provides three classes, namely SVC, NuSVC and LinearSVC, which can perform multiclass classification.
SVC
It is C-support vector classification whose implementation is based on libsvm. The module used by scikit-learn is sklearn.svm.SVC. This class handles multiclass support according to a one-vs-one scheme.
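To make the one-vs-one scheme concrete, here is a brief sketch (the dataset settings are illustrative): with three classes, the 'ovo' decision function has one column per pair of classes, i.e. n_classes * (n_classes - 1) / 2 = 3 columns −

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_classes=3, n_informative=4, random_state=0)
clf = SVC(decision_function_shape='ovo').fit(X, y)
print(clf.decision_function(X[:2]).shape)   # (2, 3): one score per class pair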
Parameters
The following table lists the parameters used by the sklearn.svm.SVC class −
| Sr.No | Parameter & Description |
| --- | --- |
| 1 | C − float, optional, default = 1.0. The penalty parameter of the error term. |
| 2 | kernel − string, optional, default = 'rbf'. Specifies the type of kernel to be used in the algorithm: one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'. |
| 3 | degree − int, optional, default = 3. The degree of the 'poly' kernel function; ignored by all other kernels. |
| 4 | gamma − {'scale', 'auto'} or float, optional, default = 'scale'. The kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernels. With the default gamma = 'scale', SVC uses 1/(n_features * X.var()); with gamma = 'auto', it uses 1/n_features. |
| 5 | coef0 − float, optional, default = 0.0. An independent term in the kernel function; only significant for 'poly' and 'sigmoid'. |
| 6 | tol − float, optional, default = 1e-3. The stopping criterion for iterations. |
| 7 | shrinking − Boolean, optional, default = True. Whether or not to use the shrinking heuristic. |
| 8 | verbose − Boolean, default = False. Enables or disables verbose output. |
| 9 | probability − Boolean, optional, default = False. Enables or disables probability estimates; this must be enabled prior to calling fit. |
| 10 | max_iter − int, optional, default = -1. As the name suggests, the maximum number of iterations within the solver; -1 means there is no limit on the number of iterations. |
| 11 | cache_size − float, optional. Specifies the size of the kernel cache in MB (megabytes). |
| 12 | random_state − int, RandomState instance or None, optional, default = None. The seed of the pseudo-random number generator used while shuffling the data. int: the seed used by the random number generator; RandomState instance: random_state is the random number generator itself; None: the RandomState instance used by np.random. |
| 13 | class_weight − {dict, 'balanced'}, optional. Sets the parameter C of class j to class_weight[j] * C for SVC. With the default option, all classes are supposed to have weight one; with class_weight = 'balanced', the values of y are used to automatically adjust weights. |
| 14 | decision_function_shape − 'ovo', 'ovr', default = 'ovr'. Decides whether the algorithm returns an 'ovr' (one-vs-rest) decision function of shape like all other classifiers, or the original 'ovo' (one-vs-one) decision function of libsvm. |
| 15 | break_ties − Boolean, optional, default = False. If True, predict breaks ties according to the confidence values of decision_function; if False, predict returns the first class among the tied classes. |
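As a quick check of the gamma rule in row 4 above, the value that gamma = 'scale' resolves to can be computed by hand; a sketch using the toy data from the example below −

import numpy as np

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]], dtype=float)
gamma_scale = 1.0 / (X.shape[1] * X.var())   # 1/(n_features * X.var()) ≈ 0.2857
gamma_auto = 1.0 / X.shape[1]                # 1/n_features = 0.5
print(gamma_scale, gamma_auto)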
Attributes
The following table lists the attributes of the sklearn.svm.SVC class −
| Sr.No | Attribute & Description |
| --- | --- |
| 1 | support_ − array-like, shape = [n_SV]. The indices of the support vectors. |
| 2 | support_vectors_ − array-like, shape = [n_SV, n_features]. The support vectors themselves. |
| 3 | n_support_ − array-like, dtype = int32, shape = [n_class]. The number of support vectors for each class. |
| 4 | dual_coef_ − array, shape = [n_class-1, n_SV]. The coefficients of the support vectors in the decision function. |
| 5 | coef_ − array, shape = [n_class * (n_class-1)/2, n_features]. This attribute, only available in the case of a linear kernel, provides the weights assigned to the features. |
| 6 | intercept_ − array, shape = [n_class * (n_class-1)/2]. The independent term (constant) in the decision function. |
| 7 | fit_status_ − int. 0 if the model was correctly fitted, 1 otherwise. |
| 8 | classes_ − array of shape = [n_classes]. The labels of the classes. |
Implementation Example
Like other classifiers, SVC also has to be fitted with the following two arrays −
- An array X holding the training samples; it is of size [n_samples, n_features].
- An array Y holding the target values, i.e. class labels for the training samples; it is of size [n_samples].
The following Python script uses the sklearn.svm.SVC class −
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
SVCClf = SVC(kernel='linear', gamma='scale', shrinking=False)
SVCClf.fit(X, y)
Output
SVC(C = 1.0, cache_size = 200, class_weight = None, coef0 = 0.0,
decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear',
max_iter = -1, probability = False, random_state = None, shrinking = False,
tol = 0.001, verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of the following Python script −
SVCClf.coef_
Output
array([[0.5, 0.5]])
Example
Similarly, we can get the values of other attributes as follows −
SVCClf.predict([[-0.5,-0.8]])
Output
array([1])
Example
SVCClf.n_support_
Output
array([1, 1])
Example
SVCClf.support_vectors_
Output
array([[-1., -1.],
       [ 1.,  1.]])
Example
SVCClf.support_
Output
array([0, 2])
Example
SVCClf.intercept_
Output
array([-0.])
Example
SVCClf.fit_status_
Output
0
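Besides these attributes, the fitted classifier also exposes a decision_function, which returns the signed distance of a sample from the separating hyperplane; a brief sketch continuing the same fit −

print(SVCClf.decision_function([[-0.5, -0.8]]))   # negative score, hence class 1
print(SVCClf.predict([[2, 2]]))                   # a point on the class-2 side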
NuSVC
NuSVC is Nu Support Vector Classification. It is another class provided by scikit-learn which can perform multiclass classification. It is like SVC, but NuSVC accepts a slightly different set of parameters. The parameter that differs from SVC is as follows −
- nu − float, optional, default = 0.5
It represents an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Its value should be in the interval (0, 1].
The rest of the parameters and attributes are the same as those of SVC.
Implementation Example
We can implement the same example using the sklearn.svm.NuSVC class as well.
import numpy as np
from sklearn.svm import NuSVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
NuSVCClf = NuSVC(kernel='linear', gamma='scale', shrinking=False)
NuSVCClf.fit(X, y)
Output
NuSVC(cache_size = 200, class_weight = None, coef0 = 0.0,
decision_function_shape = 'ovr', degree = 3, gamma = 'scale', kernel = 'linear',
max_iter = -1, nu = 0.5, probability = False, random_state = None,
shrinking = False, tol = 0.001, verbose = False)
We can get the outputs of the rest of the attributes as we did in the case of SVC.
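For instance, continuing the NuSVC fit above, the support-vector attributes are queried exactly as before −

print(NuSVCClf.support_)     # indices of the support vectors
print(NuSVCClf.n_support_)   # number of support vectors per class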
LinearSVC
It is Linear Support Vector Classification. It is similar to SVC with kernel = 'linear'. The difference between them is that LinearSVC is implemented in terms of liblinear while SVC is implemented in terms of libsvm. That is the reason LinearSVC has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples.
As for its parameters and attributes, it does not support 'kernel' because it is assumed to be linear, and it also lacks some of the attributes such as support_, support_vectors_, n_support_, fit_status_ and dual_coef_.
However, it supports the penalty and loss parameters as follows −
- penalty − string, 'l1' or 'l2' (default = 'l2'). This parameter is used to specify the norm ('l1' or 'l2') used in penalization (regularization).
- loss − string, 'hinge' or 'squared_hinge' (default = 'squared_hinge'). It represents the loss function, where 'hinge' is the standard SVM loss and 'squared_hinge' is the square of the hinge loss. See the sketch after this list for the supported combinations.
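Note that not every penalty/loss combination is supported: penalty = 'l1' is only available with loss = 'squared_hinge' and dual = False (the combination used in the script below), while penalty = 'l2' with loss = 'hinge' requires dual = True. A small sketch of the latter combination, with illustrative data −

from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_features=4, random_state=0)
clf = LinearSVC(penalty='l2', loss='hinge', dual=True, random_state=0)
clf.fit(X, y)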
Implementation Example
The following Python script uses the sklearn.svm.LinearSVC class −
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_features=4, random_state=0)
LSVCClf = LinearSVC(dual=False, random_state=0, penalty='l1', tol=1e-5)
LSVCClf.fit(X, y)
Output
LinearSVC(C = 1.0, class_weight = None, dual = False, fit_intercept = True,
intercept_scaling = 1, loss = 'squared_hinge', max_iter = 1000,
multi_class = 'ovr', penalty = 'l1', random_state = 0, tol = 1e-05, verbose = 0)
Example
Now, once fitted, the model can predict new values as follows −
LSVCClf.predict([[0,0,0,0]])
Example
For the above example, we can get the weight vector with the help of the following Python script −
LSVCClf.coef_
Regression with SVM
As discussed earlier, SVM is used for both classification and regression problems. Scikit-learn's Support Vector Classification (SVC) approach can be extended to solve regression problems as well. That extended method is called Support Vector Regression (SVR).
Basic similarity between SVM and SVR
The model created by SVC depends only on a subset of training data. Why? Because the cost function for building the model doesn’t care about training data points that lie outside the margin.
Similarly, the model produced by SVR (Support Vector Regression) depends only on a subset of the training data. Why? Because the cost function for building the model ignores any training data points close to the model prediction.
Scikit-learn provides three classes, namely SVR, NuSVR and LinearSVR, as three different implementations of SVR.
SVR
It is epsilon-support vector regression whose implementation is based on libsvm. Unlike SVC, there are two free parameters in the model, namely 'C' and 'epsilon'.
- epsilon − float, optional, default = 0.1
It represents the epsilon in the epsilon-SVR model and specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
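The epsilon-insensitive idea can be written down directly; a minimal sketch of the loss itself (plain NumPy, not scikit-learn API) −

import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    # residuals inside the epsilon-tube contribute zero loss
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(np.array([1.0, 2.0]), np.array([1.05, 2.4])))
# [0.  0.3] -- the first prediction falls inside the tube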
The rest of the parameters and attributes are similar to those we used in SVC.
Implementation Example
The following Python script uses the sklearn.svm.SVR class −
from sklearn import svm

X = [[1, 1], [2, 2]]
y = [1, 2]
SVRReg = svm.SVR(kernel='linear', gamma='auto')
SVRReg.fit(X, y)
Output
SVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, epsilon = 0.1, gamma = 'auto',
kernel = 'linear', max_iter = -1, shrinking = True, tol = 0.001, verbose = False)
Example
Now, once fitted, we can get the weight vector with the help of the following Python script −
SVRReg.coef_
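The fitted regressor can also be used for prediction in the usual way (the query point here is illustrative) −

SVRReg.predict([[1.5, 1.5]])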
NuSVR
NuSVR is Nu Support Vector Regression. It is like NuSVC, but NuSVR uses the parameter nu to control the number of support vectors. Moreover, unlike NuSVC, where nu replaced the C parameter, here it replaces epsilon.
Implementation Example
The following Python script uses the sklearn.svm.NuSVR class −
from sklearn.svm import NuSVR
import numpy as np

n_samples, n_features = 20, 15
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)
NuSVRReg = NuSVR(kernel='linear', gamma='auto', C=1.0, nu=0.1)
NuSVRReg.fit(X, y)
Output
NuSVR(C = 1.0, cache_size = 200, coef0 = 0.0, degree = 3, gamma = 'auto',
kernel = 'linear', max_iter = -1, nu = 0.1, shrinking = True, tol = 0.001,
verbose = False)
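As with SVR, the weight vector and predictions of the fitted model can be inspected; for example −

print(NuSVRReg.coef_)            # available because kernel = 'linear'
print(NuSVRReg.predict(X[:2]))   # predictions for the first two samples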
LinearSVR
It is Linear Support Vector Regression. It is similar to SVR with kernel = 'linear'. The difference between them is that LinearSVR is implemented in terms of liblinear, while SVR is implemented in terms of libsvm. That is the reason LinearSVR has more flexibility in the choice of penalties and loss functions. It also scales better to a large number of samples.
As for its parameters and attributes, it does not support 'kernel' because it is assumed to be linear, and it also lacks some of the attributes such as support_, support_vectors_, n_support_, fit_status_ and dual_coef_.
However, it supports the 'loss' parameter as follows −
- loss − string, optional, default = 'epsilon_insensitive'
It represents the loss function, where the epsilon_insensitive loss is the L1 loss and the squared_epsilon_insensitive loss is the L2 loss.
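In other words, the two options differ only in whether the residual outside the tube is penalized linearly or quadratically; a minimal sketch (plain NumPy, not scikit-learn API, using LinearSVR's default epsilon = 0.0) −

import numpy as np

def linear_svr_loss(residual, epsilon=0.0, squared=False):
    # 'epsilon_insensitive' (L1) vs 'squared_epsilon_insensitive' (L2)
    excess = np.maximum(0.0, np.abs(residual) - epsilon)
    return excess ** 2 if squared else excess

print(linear_svr_loss(np.array([0.5, -2.0])))                 # [0.5 2. ]
print(linear_svr_loss(np.array([0.5, -2.0]), squared=True))   # [0.25 4.  ]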
Implementation Example
The following Python script uses the sklearn.svm.LinearSVR class −
from sklearn.svm import LinearSVR
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, random_state=0)
LSVRReg = LinearSVR(dual=False, random_state=0,
                    loss='squared_epsilon_insensitive', tol=1e-5)
LSVRReg.fit(X, y)
Output
LinearSVR(
C=1.0, dual=False, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='squared_epsilon_insensitive',
max_iter=1000, random_state=0, tol=1e-05, verbose=0
)
Example
Now, once fitted, the model can predict new values as follows −
LSVRReg.predict([[0,0,0,0]])
Example
For the above example, we can get the weight vector with the help of the following Python script −
LSVRReg.coef_