OpenCV Python Concise Tutorial
OpenCV Python - Digit Recognition with KNN
KNN, which stands for K-Nearest Neighbours, is a supervised machine learning algorithm. It assigns a new data point to the category whose existing members it most resembles: all the available data is already labelled with distinct categories, and the new data point is placed in one of them based on similarity.
The KNN algorithm works on the following principle:
- Choose K, the number of neighbours to check, preferably an odd number so that votes are less likely to tie.
- Calculate the Euclidean distance from the new data point to each available data point.
- Take the K nearest neighbours according to the calculated distances.
- Count the number of these neighbours in each category.
- Assign the new data point to the category with the most neighbours.
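The steps above can be sketched in plain NumPy. This is a minimal illustration and not OpenCV's implementation; the function name `knn_predict` and the toy data are made up for the example.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count how many of those neighbours fall in each category
    categories, counts = np.unique(train_y[nearest], return_counts=True)
    # The category with the most neighbours wins
    return categories[np.argmax(counts)]

# Toy data (hypothetical): two clusters labelled 0 and 1
train_x = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [7.0, 6.5], [6.5, 7.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])

near_first = knn_predict(train_x, train_y, np.array([1.2, 1.4]))
near_second = knn_predict(train_x, train_y, np.array([6.4, 6.6]))
print(near_first, near_second)
```

With k=3 and well-separated clusters, each query takes the label of the cluster it sits in.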
As an example of implementing the KNN algorithm with OpenCV, we shall use the image digits.png, which consists of 5000 images of handwritten digits, each 20x20 pixels.
The first task is to split this image into the 5000 individual digits, which form our feature set, and convert the result to a NumPy array. The program is given below:
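import numpy as np
import cv2
# Load the digit sheet and convert it to grayscale
image = cv2.imread('digits.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Split into 50 rows, then split each row into 100 cells of 20x20 pixels
fset = []
for row in np.vsplit(gray, 50):
    cells = np.hsplit(row, 100)
    fset.append(cells)
# Resulting shape: (50, 100, 20, 20)
NP_array = np.array(fset)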
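To see what the split produces without needing the image file, the same vsplit/hsplit approach can be tried on a small synthetic sheet. The sheet below (2 rows by 3 columns of 4x4 tiles) is a made-up stand-in for digits.png, not the real data.

```python
import numpy as np

# Synthetic stand-in for the digit sheet: 2 rows x 3 columns of 4x4 tiles,
# where every pixel of a tile holds that tile's index (0..5)
tile_ids = np.arange(6).reshape(2, 3)
sheet = np.kron(tile_ids, np.ones((4, 4), dtype=int))  # shape (8, 12)

# Same approach as above: split into rows, then each row into tiles
tiles = np.array([np.hsplit(row, 3) for row in np.vsplit(sheet, 2)])

print(tiles.shape)   # (rows, columns, tile height, tile width)
print(tiles[1, 2])   # the tile at row 1, column 2
```

The first two axes index the tile's position in the grid and the last two hold its pixels, which is why the real sheet yields an array of shape (50, 100, 20, 20).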
Now we divide this data into a training set and a testing set, each of shape (2500, 400), where every row is one 20x20 image flattened to 400 values:
trainset = NP_array[:,:50].reshape(-1,400).astype(np.float32)
testset = NP_array[:,50:100].reshape(-1,400).astype(np.float32)
Next, we create the labels: one of the 10 digit classes (0 to 9) for each sample, with 250 samples per class in each set, as shown below:
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = np.repeat(k,250)[:,np.newaxis]
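The labels only line up with the samples if flattening the grid visits the digits in the same order as np.repeat produces them. The sketch below checks this on a hypothetical grid of class ids, assuming each digit occupies five consecutive rows of the sheet (which is what 250 labels per class for a 50-column half implies).

```python
import numpy as np

# Hypothetical stand-in: a 50 x 100 grid where each cell stores its digit class.
# Assumption: each digit occupies five consecutive rows, so class = row // 5.
rows = np.arange(50)
grid_classes = np.repeat(rows // 5, 100).reshape(50, 100)

# Take the left 50 columns and flatten row by row,
# mirroring the order in which reshape(-1, 400) emits the samples
flattened = grid_classes[:, :50].reshape(-1)

# Labels built the same way as in the tutorial
labels = np.repeat(np.arange(10), 250)

labels_match = np.array_equal(flattened, labels)
print(labels_match)
```

Because NumPy flattens in row-major order, the first 250 samples come from the first five rows, the next 250 from the following five, and so on.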
We are now in a position to start the KNN classification. Create the classifier object and train it on the training data.
knn = cv2.ml.KNearest_create()
knn.train(trainset, cv2.ml.ROW_SAMPLE, train_labels)
Choosing the value of k as 3, obtain the output of the classifier on the test set.
ret, output, neighbours, distance = knn.findNearest(testset, k = 3)
Compare the output with the test labels to check the accuracy of the classifier.
The program reports an accuracy of 91.64% in recognising the handwritten digits.
result = output==test_labels
correct = np.count_nonzero(result)
accuracy = (correct*100.0)/(output.size)
print(accuracy)
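A single accuracy figure does not show which digits get confused with each other. One possible follow-up, sketched here with made-up labels and predictions rather than the real classifier output, is a confusion matrix with per-class accuracy. The helper name `confusion_matrix` is hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted, n_classes=10):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (true_labels, predicted), 1)
    return matrix

# Made-up labels and predictions, for illustration only
true_labels = np.array([0, 0, 1, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(true_labels, predicted, n_classes=3)
# Fraction of each true class that was predicted correctly
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_accuracy)
```

To apply this to the arrays above, the inputs would likely need to be flattened integer arrays, for example `test_labels.ravel()` and `output.ravel().astype(int)`, since the classifier output is a floating-point column vector.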