Caffe2 Tutorial

Image Classification Using Pre-Trained Model

In this lesson, you will learn to use a pre-trained model to detect objects in a given image. You will use the squeezenet pre-trained module, which detects and classifies the objects in a given image with great accuracy.

Open a new Jupyter notebook and follow the steps to develop this image classification application.

Importing Libraries

First, we import the required packages using the code below −

from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace, models
import numpy as np
import skimage.io
import skimage.transform
from matplotlib import pyplot
import os
import urllib.request as urllib2
import operator

Next, we set up a few variables −

INPUT_IMAGE_SIZE = 227
mean = 128

The images used for training will obviously be of varied sizes. All these images must be converted into a fixed size for accurate training. Likewise, the test images and the images you want to classify in the production environment must also be converted to the same size as the one used during training. Thus, we create a variable called INPUT_IMAGE_SIZE having the value 227. Hence, we will convert all our images to the size 227x227 before using them in our classifier.

We also declare a variable called mean having the value 128, which is used later to improve the classification results.
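
The mean subtraction itself appears later in this lesson. As a minimal preview sketch (the sample pixel values below are purely illustrative), it simply re-centres the 0-255 pixel range around zero −

pixels = np.array([0.0, 0.5, 1.0]) * 255 - mean
print(pixels)   # roughly [-128, -0.5, 127]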

Next, we will develop two functions for processing the image.

Image Processing

The image processing consists of two steps. The first is to resize the image, and the second is to centrally crop the image. For these two steps, we will write two functions, one for resizing and one for cropping.

Image Resizing

First, we will write a function for resizing the image. As said earlier, we will resize the image to 227x227. So let us define the function resize as follows −

def resize(img, input_height, input_width):

We obtain the aspect ratio of the image by dividing the width by the height.

original_aspect = img.shape[1]/float(img.shape[0])

If the aspect ratio is greater than 1, it indicates that the image is wide, that is to say it is in the landscape mode. We now adjust the image height and return the resized image using the following code −

if(original_aspect>1):
   new_height = int(original_aspect * input_height)
   return skimage.transform.resize(img, (input_width, new_height),
      mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

If the aspect ratio is less than 1, it indicates the portrait mode. We now adjust the width using the following code −

if(original_aspect<1):
   new_width = int(input_width/original_aspect)
   return skimage.transform.resize(img, (new_width, input_height),
      mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

If the aspect ratio equals 1, we do not make any height/width adjustments.

if(original_aspect == 1):
   return skimage.transform.resize(img, (input_width, input_height),
      mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

The full function code is given below for your quick reference −

def resize(img, input_height, input_width):
   original_aspect = img.shape[1]/float(img.shape[0])
   if(original_aspect>1):
      new_height = int(original_aspect * input_height)
      return skimage.transform.resize(img, (input_width, new_height),
         mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
   if(original_aspect<1):
      new_width = int(input_width/original_aspect)
      return skimage.transform.resize(img, (new_width, input_height),
         mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
   if(original_aspect == 1):
      return skimage.transform.resize(img, (input_width, input_height),
         mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
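
To check the behaviour quickly, you could try the function on a dummy array shaped like the tree image used later in this lesson (a small illustrative sketch, not part of the original walkthrough) −

dummy = np.zeros((600, 960, 3), dtype=np.float32)
print(resize(dummy, 227, 227).shape)   # expected: (227, 363, 3)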

We will now write a function for cropping the image around its center.

Image Cropping

We declare the crop_image function as follows −

def crop_image(img,cropx,cropy):

We extract the dimensions of the image using the following statement −

y,x,c = img.shape

We compute the top-left starting point of the crop using the following two lines of code −

startx = x//2-(cropx//2)
starty = y//2-(cropy//2)

Finally, we return the cropped image by creating an image object with the new dimensions −

return img[starty:starty+cropy,startx:startx+cropx]

The entire function code is given below for your quick reference −

def crop_image(img,cropx,cropy):
   y,x,c = img.shape
   startx = x//2-(cropx//2)
   starty = y//2-(cropy//2)
   return img[starty:starty+cropy,startx:startx+cropx]
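
As with resize, a quick sanity check on a dummy array shows that the crop produces the expected shape (an illustrative sketch only) −

dummy = np.zeros((300, 400, 3), dtype=np.float32)
print(crop_image(dummy, 227, 227).shape)   # expected: (227, 227, 3)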

Now, we will write code to test these functions.

Processing Image

Firstly, copy an image file into the images subfolder within your project directory. Here, the tree.jpg file is copied into the project. The following Python code loads the image and displays it on the console −

img = skimage.img_as_float(skimage.io.imread("images/tree.jpg")).astype(np.float32)
print("Original Image Shape: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Original image')

The output is as follows −

[Output image: the original image and its printed shape]

Note that the size of the original image is 600 x 960. We need to resize it to our specification of 227 x 227. Calling our earlier-defined resize function does this job.

img = resize(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after resizing: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Resized image')

The output is as given below −

[Output image: the resized image and its printed shape]

Note that the image size is now 227 x 363. We need to crop it to 227 x 227 for the final feed to our algorithm. We call the previously defined crop_image function for this purpose.

img = crop_image(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after cropping: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Center Cropped')

The output of the code is mentioned below −

[Output image: the center-cropped image and its printed shape]

At this point, the image is of size 227 x 227 and is ready for further processing. We now swap the image axes so that the three colour channels are separated into three different planes.

img = img.swapaxes(1, 2).swapaxes(0, 1)
print("CHW Image Shape: " , img.shape)

Given below is the output −

CHW Image Shape: (3, 227, 227)
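
For reference, the same HWC to CHW conversion can be written as a single transpose. In this illustrative sketch, hwc_img is a placeholder name for the image before the two swapaxes calls −

# hwc_img stands for the (227, 227, 3) image before swapping axes
chw_img = np.transpose(hwc_img, (2, 0, 1))   # (227, 227, 3) -> (3, 227, 227)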

Note that the last axis has now become the first dimension in the array. We will now plot the three channels using the following code −

pyplot.figure()
for i in range(3):
   pyplot.subplot(1, 3, i+1)
   pyplot.imshow(img[i])
   pyplot.axis('off')
   pyplot.title('RGB channel %d' % (i+1))

The output is stated below −

[Output image: the three RGB channels plotted side by side]

Finally, we do some additional processing on the image: converting Red Green Blue to Blue Green Red (RGB to BGR), subtracting the mean for better results, and adding a batch size axis, using the following three lines of code −

# convert RGB --> BGR
img = img[(2, 1, 0), :, :]
# remove mean
img = img * 255 - mean
# add batch size axis
img = img[np.newaxis, :, :, :].astype(np.float32)

At this point, your image is in NCHW format and is ready to be fed into our network. Next, we will load our pre-trained model files and feed the above image into them for prediction.
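
Before running the network you could optionally verify that the preprocessed image has the expected layout; this defensive check is a small sketch and not part of the original walkthrough −

assert img.shape == (1, 3, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
assert img.dtype == np.float32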

Predicting Objects in Processed Image

We first set up the paths for the init and predict networks defined in the pre-trained models of Caffe.

Setting Model File Paths

Remember from our earlier discussion that all the pre-trained models are installed in the models folder. We set up the path to this folder as follows −

CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")

We set up the path to the init_net protobuf file of the squeezenet model as follows −

INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')

Likewise, we set up the path to the predict_net protobuf as follows −

PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')

We print the two paths for diagnostic purposes −

print(INIT_NET)
print(PREDICT_NET)

The above code along with the output is given here for your quick reference −

CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)

The output is mentioned below −

/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/init_net.pb
/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/predict_net.pb
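
If the printed paths do not match your installation, the predictor created below will fail to load. An optional defensive check (a small sketch, assuming the paths above) is −

for path in (INIT_NET, PREDICT_NET):
   if not os.path.exists(path):
      raise FileNotFoundError("Model file not found: " + path)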

Next, we will create a predictor.

Creating Predictor

We read the model files using the following two statements −

with open(INIT_NET, "rb") as f:
   init_net = f.read()
with open(PREDICT_NET, "rb") as f:
   predict_net = f.read()

The predictor is created by passing the contents of the two model files as arguments to the Predictor function.

p = workspace.Predictor(init_net, predict_net)

The p object is the predictor, which is used to predict the objects in any given image. Note that each input image must be in NCHW format, just as we prepared our tree.jpg file earlier.

Predicting Objects

Predicting the objects in a given image is trivial; it takes just a single line of code. We call the run method on the predictor object to perform object detection on the given image.

results = p.run({'data': img})

The prediction results are now available in the results object, which we convert to an array for readability.

results = np.asarray(results)

Print the dimensions of the array for your understanding using the following statement −

print("results shape: ", results.shape)

The output is as shown below −

results shape: (1, 1, 1000, 1, 1)

We will now remove the unnecessary singleton axes −

preds = np.squeeze(results)

The topmost prediction can now be retrieved by taking the max value in the preds array.

curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
print("Prediction: ", curr_pred)
print("Confidence: ", curr_conf)

The output is as follows −

Prediction: 984
Confidence: 0.89235985
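
An equivalent way to obtain the top prediction, shown here only as an alternative sketch, is to use NumPy's argmax −

top_idx = int(np.argmax(preds))
top_conf = float(preds[top_idx])
print("Prediction: ", top_idx)
print("Confidence: ", top_conf)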

As you can see, the model has predicted an object with index value 984 with 89% confidence. The index 984 alone does not tell us much about what kind of object was detected. We need to get the stringified name for the object using its index value. The objects that the model recognizes, along with their corresponding index values, are available in a GitHub repository.

Now, we will see how to retrieve the name of our object having the index value 984.

Stringifying Result

We define the URL of the codes file in the GitHub repository as follows −

codes = "https://gist.githubusercontent.com/aaronmarkham/cd3a6b6ac0
71eca6f7b4a6e40e6038aa/raw/9edb4038a37da6b5a44c3b5bc52e448ff09bfe5b/alexnet_codes"

We read the contents of the URL −

response = urllib2.urlopen(codes)

The response will contain the list of all codes and their descriptions. A few lines of the response are shown below to give you an idea of what it contains −

5: 'electric ray, crampfish, numbfish, torpedo',
6: 'stingray',
7: 'cock',
8: 'hen',
9: 'ostrich, Struthio camelus',
10: 'brambling, Fringilla montifringilla',

We now iterate through the entire response to locate our desired code of 984, using a for loop as follows −

for line in response:
   mystring = line.decode('ascii')
   code, result = mystring.partition(":")[::2]
   code = code.strip()
   result = result.replace("'", "")
   if (code == str(curr_pred)):
      name = result.split(",")[0][1:]
      print("Model predicts", name, "with", curr_conf, "confidence")

When you run the code, you will see the following output −

Model predicts rapeseed with 0.89235985 confidence

You may now try the model on another image.

Predicting a Different Image

To predict another image, simply copy the image file into the images folder of your project directory. This is the directory in which our earlier tree.jpg file is stored. Change the name of the image file in the code. Only one change is required, as shown below −

img = skimage.img_as_float(skimage.io.imread("images/pretzel.jpg")).astype(np.float32)

The original picture and the prediction result are shown below −

[Output image: the original pretzel.jpg image]

The output is mentioned below −

Model predicts pretzel with 0.99999976 confidence

As you can see, the pre-trained model is able to detect objects in a given image with a great amount of accuracy.

Full Source

The full source for the above code, which uses a pre-trained model for object detection in a given image, is given here for your quick reference −

from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace, models
import numpy as np
import skimage.io
import skimage.transform
from matplotlib import pyplot
import os
import urllib.request as urllib2
import operator

INPUT_IMAGE_SIZE = 227
mean = 128

def resize(img, input_height, input_width):
   original_aspect = img.shape[1]/float(img.shape[0])
   if(original_aspect>1):
      new_height = int(original_aspect * input_height)
      return skimage.transform.resize(img, (input_width, new_height),
         mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
   if(original_aspect<1):
      new_width = int(input_width/original_aspect)
      return skimage.transform.resize(img, (new_width, input_height),
         mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
   if(original_aspect == 1):
      return skimage.transform.resize(img, (input_width, input_height),
         mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

def crop_image(img,cropx,cropy):
   y,x,c = img.shape
   startx = x//2-(cropx//2)
   starty = y//2-(cropy//2)
   return img[starty:starty+cropy,startx:startx+cropx]

img = skimage.img_as_float(skimage.io.imread("images/pretzel.jpg")).astype(np.float32)
print("Original Image Shape: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Original image')

img = resize(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after resizing: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Resized image')

img = crop_image(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after cropping: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Center Cropped')

img = img.swapaxes(1, 2).swapaxes(0, 1)
print("CHW Image Shape: " , img.shape)
pyplot.figure()
for i in range(3):
   pyplot.subplot(1, 3, i+1)
   pyplot.imshow(img[i])
   pyplot.axis('off')
   pyplot.title('RGB channel %d' % (i+1))

# convert RGB --> BGR
img = img[(2, 1, 0), :, :]
# remove mean
img = img * 255 - mean
# add batch size axis
img = img[np.newaxis, :, :, :].astype(np.float32)

CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)

with open(INIT_NET, "rb") as f:
   init_net = f.read()
with open(PREDICT_NET, "rb") as f:
   predict_net = f.read()
p = workspace.Predictor(init_net, predict_net)

results = p.run({'data': img})
results = np.asarray(results)
print("results shape: ", results.shape)
preds = np.squeeze(results)
curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
print("Prediction: ", curr_pred)
print("Confidence: ", curr_conf)

codes = "https://gist.githubusercontent.com/aaronmarkham/cd3a6b6ac071eca6f7b4a6e40e6038aa/raw/9edb4038a37da6b5a44c3b5bc52e448ff09bfe5b/alexnet_codes"
response = urllib2.urlopen(codes)
for line in response:
   mystring = line.decode('ascii')
   code, result = mystring.partition(":")[::2]
   code = code.strip()
   result = result.replace("'", "")
   if (code == str(curr_pred)):
      name = result.split(",")[0][1:]
      print("Model predicts", name, "with", curr_conf, "confidence")

By now, you know how to use a pre-trained model to make predictions on your dataset.

What’s next is to learn how to define your neural network (NN) architectures in Caffe2 and train them on your dataset. We will now learn how to create a trivial single layer NN.