Caffe2 Concise Tutorial
Caffe2 - Introduction
Over the last few years, deep learning has become a major trend in machine learning. It has been successfully applied to problems that were previously considered unsolvable in vision, speech recognition and natural language processing (NLP), and it is proving useful in many more domains.
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework developed at the Berkeley Vision and Learning Center (BVLC). The Caffe project was created by Yangqing Jia during his Ph.D. at the University of California, Berkeley. Caffe provides an easy way to experiment with deep learning. It is written in C++ and provides bindings for Python and Matlab.
It supports many different types of deep learning architectures, such as CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory) and FC (Fully Connected). It supports GPUs and is thus well suited for production environments involving deep neural networks. It also supports optimized kernel libraries such as the NVIDIA CUDA Deep Neural Network library (cuDNN) on GPUs and the Intel Math Kernel Library (Intel MKL) on CPUs.
In April 2017, the U.S.-based social networking company Facebook announced Caffe2, which adds support for RNNs (Recurrent Neural Networks). In March 2018, Caffe2 was merged into PyTorch. The Caffe2 creators and community members have built models for solving various problems, and these are available to the public as pre-trained models. Caffe2 helps you use these models and create your own networks for making predictions on a dataset.
Before we go into the details of Caffe2, let us understand the difference between machine learning and deep learning. This is necessary to understand how models are created and used in Caffe2.
Machine Learning v/s Deep Learning
In any machine learning algorithm, whether traditional or deep, the selection of features in the dataset plays an extremely important role in reaching the desired prediction accuracy. In traditional machine learning techniques, feature selection is done mostly through human inspection, judgement and deep domain knowledge. Sometimes, you may seek help from a few tested feature selection algorithms.
The traditional machine learning flow is depicted in the figure below −

In deep learning, feature selection is automatic and is part of the deep learning algorithm itself. This is shown in the figure below −

In deep learning algorithms, feature engineering is done automatically. Generally, feature engineering is time-consuming and requires good domain expertise. To perform automatic feature extraction, deep learning algorithms typically need a huge amount of data, so if you have only thousands or tens of thousands of data points, a deep learning technique may fail to give you satisfactory results.
With larger datasets, deep learning algorithms produce better results than traditional ML algorithms, with the added advantage of little or no feature engineering.
Caffe2 - Overview
Now that you have some insight into deep learning, let us get an overview of what Caffe is.
Training a CNN
Let us learn the process for training a CNN to classify images. The process consists of the following steps −
- Data Preparation − In this step, we center-crop the images and resize them so that all images for training and testing are of the same size. This is usually done by running a small Python script on the image data.
- Model Definition − In this step, we define a CNN architecture. The configuration is stored in a .pb (protobuf) file. A typical CNN architecture is shown in the figure below.
- Solver Definition − We define the solver configuration file. The solver performs the model optimization.
- Model Training − We use the built-in Caffe utility to train the model. Training may take a considerable amount of time and CPU usage. After training is completed, Caffe stores the model in a file, which can later be used on test data and in final deployment for predictions.

What’s New in Caffe2
In Caffe2, you will find many ready-to-use pre-trained models, and you can regularly benefit from community contributions of new models and algorithms. The models that you create can scale up easily using GPU power in the cloud and can also be brought down to mobile devices for mass use through Caffe2's cross-platform libraries.
The improvements made in Caffe2 over Caffe may be summarized as follows −
- Mobile deployment
- New hardware support
- Support for large-scale distributed training
- Quantized computation
- Stress tested on Facebook
Pretrained Model Demo
The Berkeley Vision and Learning Center (BVLC) site provides demos of its pre-trained networks. One such network for image classification is available at https://caffe2.ai/docs/learn-more#null_caffe-neural-network-for-image-classification and is depicted in the screenshot below.

In the screenshot, the image of a dog is classified and labelled with its prediction accuracy. It also reports that classifying the image took just 0.068 seconds. You may try an image of your own choice by specifying the image URL or uploading the image itself using the options at the bottom of the screen.
Caffe2 - Installation
Now that you have enough insight into the capabilities of Caffe2, it is time to experiment with Caffe2 on your own. To use the pre-trained models or to develop models in your own Python code, you must first install Caffe2 on your machine.
On the installation page of the Caffe2 site, available at https://caffe2.ai/docs/getting-started.html, you will see the following options for selecting your platform and install type.

As you can see in the above screenshot, Caffe2 supports several popular platforms, including mobile ones.
We shall now go through the steps for the MacOS installation, on which all the projects in this tutorial were tested.
MacOS Installation
The installation can be of four types, as given below −
- Pre-Built Binaries
- Build From Source
- Docker Images
- Cloud
Depending upon your preference, select any of the above as your installation type. The instructions given here follow the Caffe2 installation site for pre-built binaries and use Anaconda for the Jupyter environment. Execute the following command at your console prompt −
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
In addition to the above, you will need a few third-party libraries, which are installed using the following commands −
conda install -c anaconda setuptools
conda install -c conda-forge graphviz
conda install -c conda-forge hypothesis
conda install -c conda-forge ipython
conda install -c conda-forge jupyter
conda install -c conda-forge matplotlib
conda install -c anaconda notebook
conda install -c anaconda pydot
conda install -c conda-forge python-nvd3
conda install -c anaconda pyyaml
conda install -c anaconda requests
conda install -c anaconda scikit-image
conda install -c anaconda scipy
Some of the tutorials on the Caffe2 website also require zeromq, which is installed using the following command −
conda install -c anaconda zeromq
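Once the packages above are installed, it can help to check which ones Python can actually locate before opening a notebook. The following is a minimal sketch using only the standard library; the helper name missing_packages and the list of top-level import names are assumptions made for illustration and are not taken from the Caffe2 documentation.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level import names assumed for this tutorial (hypothetical list).
required = ["caffe2", "numpy", "skimage", "matplotlib"]
print("Missing packages:", missing_packages(required) or "none")
```

An empty result suggests the environment is ready; any name that is listed can be installed with the corresponding conda command shown above.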
Windows/Linux Installation
Execute the following command at your console prompt −
conda install -c pytorch pytorch-nightly-cpu
As you may have noticed, you need Anaconda to use the above installation. You will also need to install the additional packages specified in the MacOS installation.
Testing Installation
To test your installation, a small Python script is given below, which you can copy and paste into your Jupyter project and execute.
from caffe2.python import workspace
import numpy as np
print ("Creating random data")
data = np.random.rand(3, 2)
print(data)
print ("Adding data to workspace ...")
workspace.FeedBlob("mydata", data)
print ("Retrieving data from workspace")
mydata = workspace.FetchBlob("mydata")
print(mydata)
When you execute the above code, you should see output similar to the following (the values differ from run to run because the data is random) −
Creating random data
[[0.06152718 0.86448082]
[0.36409966 0.52786113]
[0.65780886 0.67101053]]
Adding data to workspace ...
Retrieving data from workspace
[[0.06152718 0.86448082]
[0.36409966 0.52786113]
[0.65780886 0.67101053]]
The screenshot of the installation test page is shown here for your quick reference −

Now that you have installed Caffe2 on your machine, proceed to install the tutorial applications.
Tutorial Installation
Download the tutorials source using the following command on your console −
git clone --recursive https://github.com/caffe2/tutorials caffe2_tutorials
After the download is completed, you will find several Python projects in the caffe2_tutorials folder in your installation directory. The screenshot of this folder is given for your quick perusal.
/Users/yourusername/caffe2_tutorials

You can open some of these tutorials to see what Caffe2 code looks like. The next two projects described in this tutorial are largely based on the samples shown above.
It is now time to do some Python coding of our own. Let us first understand how to use a pre-trained model from Caffe2. Later, you will learn to create your own trivial neural network for training on your own dataset.
Caffe2 - Verifying Access to Pre-Trained Models
Before you learn to use a pre-trained model in your Python application, let us first verify that the models are installed on your machine and are accessible from Python code.
When you install Caffe2, the pre-trained models are copied into the installation folder. On a machine with an Anaconda installation, these models are available in the following folder.
anaconda3/lib/python3.7/site-packages/caffe2/python/models
Check the installation folder on your machine for the presence of these models. You can build the paths to the model files in the installation folder with the following short Python script −
import os
CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)
When the script runs successfully, you will see the following output −
/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/init_net.pb
/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/predict_net.pb
These are the paths at which the squeezenet model files are expected. If both files exist at these locations, the squeezenet model is installed on your machine and is accessible to your code.
Now, you are ready to write your own Python code for image classification using the Caffe2 squeezenet pre-trained model.
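Printing the paths does not by itself show that the files are present. A minimal sketch of an explicit check is given below; the helper name model_files_present is hypothetical, and the folder is the one used in this tutorial, which may differ on your installation.

```python
import os

def model_files_present(models_dir, model_name):
    """Check that both protobuf files of a pre-trained model exist on disk."""
    model_dir = os.path.join(models_dir, model_name)
    paths = [os.path.join(model_dir, fname)
             for fname in ("init_net.pb", "predict_net.pb")]
    return all(os.path.isfile(p) for p in paths)

# Folder used in this tutorial; adjust it to your own installation.
CAFFE_MODELS = "/anaconda3/lib/python3.7/site-packages/caffe2/python/models"
print("squeezenet available:", model_files_present(CAFFE_MODELS, "squeezenet"))
```

If the check prints False, the model files are missing or the folder is different, and the later steps that open init_net.pb and predict_net.pb would fail.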
Image Classification Using Pre-Trained Model
In this lesson, you will learn to use a pre-trained model to detect objects in a given image. You will use the squeezenet pre-trained model, which detects and classifies the objects in a given image with great accuracy.
Open a new Jupyter notebook and follow the steps to develop this image classification application.
Importing Libraries
First, we import the required packages using the code below −
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace, models
import numpy as np
import skimage.io
import skimage.transform
from matplotlib import pyplot
import os
import urllib.request as urllib2
import operator
Next, we set up a few variables −
INPUT_IMAGE_SIZE = 227
mean = 128
The images used for training will obviously be of varied sizes. All these images must be converted to a fixed size for accurate training. Likewise, the test images and the images you want to classify in production must be converted to the same size as the one used during training. Thus, we create the variable INPUT_IMAGE_SIZE above with the value 227, and we will convert all our images to 227x227 before using them in our classifier.
We also declare a variable called mean with the value 128, which is subtracted from the pixel values later to improve the classification results.
Next, we will develop two functions for processing the image.
Image Processing
The image processing consists of two steps. The first is to resize the image, and the second is to crop the image around its center. We will write one function for each step.
Image Resizing
First, we will write a function for resizing the image. As said earlier, we will resize the image to 227x227. So let us define the function resize as follows −
def resize(img, input_height, input_width):
We obtain the aspect ratio of the image by dividing the width by the height.
    original_aspect = img.shape[1]/float(img.shape[0])
If the aspect ratio is greater than 1, the image is wide, that is to say, it is in landscape mode. We now scale the longer side by the aspect ratio and return the resized image using the following code. Note that skimage.transform.resize expects the output shape as (rows, columns), so despite its name, new_height holds the new number of columns −
    if(original_aspect>1):
        new_height = int(original_aspect * input_height)
        return skimage.transform.resize(img, (input_width,
            new_height), mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
If the aspect ratio is less than 1, the image is in portrait mode. We now scale the longer side using the following code; here the variable new_width holds the new number of rows −
    if(original_aspect<1):
        new_width = int(input_width/original_aspect)
        return skimage.transform.resize(img, (new_width,
            input_height), mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
If the aspect ratio equals 1, the image is square and we resize it directly to the target size without any aspect adjustment.
    if(original_aspect == 1):
        return skimage.transform.resize(img, (input_width,
            input_height), mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
The full function code is given below for your quick reference −
def resize(img, input_height, input_width):
    # aspect ratio = width / height
    original_aspect = img.shape[1]/float(img.shape[0])
    if(original_aspect>1):
        # landscape: rows stay at the target size, columns are scaled up
        new_height = int(original_aspect * input_height)
        return skimage.transform.resize(img, (input_width,
            new_height), mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
    if(original_aspect<1):
        # portrait: columns stay at the target size, rows are scaled up
        new_width = int(input_width/original_aspect)
        return skimage.transform.resize(img, (new_width,
            input_height), mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
    if(original_aspect == 1):
        # square: resize directly to the target size
        return skimage.transform.resize(img, (input_width,
            input_height), mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
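The shape arithmetic inside resize can be checked without loading an image. The sketch below reproduces only the size calculation; the helper name resized_shape is hypothetical and it assumes a square target, as used in this tutorial.

```python
def resized_shape(height, width, target):
    """Return (rows, cols) so the shorter side equals `target`, keeping the aspect ratio."""
    aspect = width / float(height)
    if aspect > 1:    # landscape: rows fixed, columns scale up
        return (target, int(aspect * target))
    if aspect < 1:    # portrait: columns fixed, rows scale up
        return (int(target / aspect), target)
    return (target, target)

print(resized_shape(600, 960, 227))  # (227, 363)
```

For a 600 x 960 image the aspect ratio is 1.6, so the shorter side becomes 227 and the longer side becomes int(1.6 * 227) = 363, which is then reduced to 227 by the center crop.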
We will now write a function for cropping the image around its center.
Image Cropping
We declare the crop_image function as follows −
def crop_image(img,cropx,cropy):
We extract the dimensions of the image using the following statement −
    y,x,c = img.shape
We compute the top-left corner of the crop window using the following two lines of code −
    startx = x//2-(cropx//2)
    starty = y//2-(cropy//2)
Finally, we return the cropped image by slicing the array to the new dimensions −
    return img[starty:starty+cropy,startx:startx+cropx]
The entire function code is given below for your quick reference −
def crop_image(img,cropx,cropy):
    y,x,c = img.shape
    startx = x//2-(cropx//2)
    starty = y//2-(cropy//2)
    return img[starty:starty+cropy,startx:startx+cropx]
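To see the crop offsets at work, the following sketch applies the same slicing to a blank NumPy array of the size produced by the resize step. The array is a stand-in for a real image, and the function name crop_center is used here only to keep the sketch separate from the tutorial's crop_image.

```python
import numpy as np

def crop_center(img, cropx, cropy):
    """Return the central cropy x cropx window of an H x W x C array."""
    y, x = img.shape[:2]
    startx = x // 2 - cropx // 2
    starty = y // 2 - cropy // 2
    return img[starty:starty + cropy, startx:startx + cropx]

resized = np.zeros((227, 363, 3), dtype=np.float32)  # stand-in for the resized image
cropped = crop_center(resized, 227, 227)
print(cropped.shape)  # (227, 227, 3)
```

For a 227 x 363 input, startx is 363 // 2 - 227 // 2 = 68 and starty is 0, so 68 columns are dropped on the left and the rest of the excess on the right.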
Now, we will write code to test these functions.
Processing Image
First, copy an image file into the images subfolder within your project directory. Here, the file tree.jpg is copied into the project. The following Python code loads the image and displays it in the notebook −
img = skimage.img_as_float(skimage.io.imread("images/tree.jpg")).astype(np.float32)
print("Original Image Shape: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Original image')
The output is as follows −

Note that the size of the original image is 600 x 960. We need to bring this to our specification of 227 x 227. Calling our earlier-defined resize function is the first step.
img = resize(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after resizing: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Resized image')
The output is as given below −

Note that the image size is now 227 x 363. We need to crop this to 227 x 227 for the final feed to our algorithm. We call the previously defined crop_image function for this purpose.
img = crop_image(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after cropping: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Center Cropped')
The output of the code is shown below −

At this point, the image is of size 227 x 227 and is ready for further processing. We now swap the image axes so that the three colour channels come first, converting the array from height-width-channel (HWC) order to channel-height-width (CHW) order.
img = img.swapaxes(1, 2).swapaxes(0, 1)
print("CHW Image Shape: " , img.shape)
Given below is the output −
CHW Image Shape: (3, 227, 227)
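The effect of the two swapaxes calls can be verified on a tiny array. The sketch below uses a made-up 2 x 4 x 3 array in place of an image and compares the result with a single transpose call, which is an equivalent way of writing the same reordering.

```python
import numpy as np

hwc = np.arange(2 * 4 * 3).reshape(2, 4, 3)   # tiny H x W x C stand-in for an image
chw = hwc.swapaxes(1, 2).swapaxes(0, 1)        # the two swaps used in this tutorial
same = np.array_equal(chw, hwc.transpose(2, 0, 1))
print(chw.shape, same)  # (3, 2, 4) True
```

Each chw[c] is then a full height x width plane for one colour channel, which is what makes the per-channel plots possible.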
Note that the last axis has now become the first dimension in the array. We will now plot the three channels using the following code −
pyplot.figure()
for i in range(3):
    pyplot.subplot(1, 3, i+1)
    pyplot.imshow(img[i])
    pyplot.axis('off')
    pyplot.title('RGB channel %d' % (i+1))
The output is shown below −

Finally, we do some additional processing on the image: converting Red Green Blue to Blue Green Red (RGB to BGR), subtracting the mean for better results, and adding a batch size axis, using the following three lines of code −
# convert RGB --> BGR
img = img[(2, 1, 0), :, :]
# remove mean
img = img * 255 - mean
# add batch size axis
img = img[np.newaxis, :, :, :].astype(np.float32)
At this point, your image is in NCHW format and is ready to be fed into our network. Next, we will load our pre-trained model files and feed the above image into the model for prediction.
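The last three preprocessing steps can be wrapped in one function and tried on a tiny synthetic image. This is a sketch under the assumption that the input is a CHW array in RGB order with values in the range 0 to 1; the function name to_network_input is hypothetical.

```python
import numpy as np

def to_network_input(chw_rgb, mean=128):
    """Convert a CHW RGB image with values in [0, 1] to an NCHW BGR float32 batch."""
    bgr = chw_rgb[[2, 1, 0], :, :]           # reorder channels: RGB -> BGR
    centered = bgr * 255 - mean              # rescale to [0, 255], then subtract the mean
    return centered[np.newaxis, :, :, :].astype(np.float32)  # add the batch axis

rgb = np.zeros((3, 2, 2), dtype=np.float32)
rgb[0] = 1.0                                 # a pure red 2 x 2 image
batch = to_network_input(rgb)
print(batch.shape)  # (1, 3, 2, 2)
```

In the result, the red plane has moved to the last channel position with the value 255 - 128 = 127, while the empty blue and green planes hold -128.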
Predicting Objects in Processed Image
We first set up the paths for the init and predict networks defined in the pre-trained models of Caffe2.
Setting Model File Paths
Recall from our earlier discussion that all the pre-trained models are installed in the models folder. We set up the path to this folder as follows −
CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
We set up the path to the init_net protobuf file of the squeezenet model as follows −
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
Likewise, we set up the path to the predict_net protobuf file as follows −
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
We print the two paths for diagnostic purposes −
print(INIT_NET)
print(PREDICT_NET)
The above code along with the output is given here for your quick reference −
CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)
The output is shown below −
/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/init_net.pb
/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/predict_net.pb
Next, we will create a predictor.
Creating Predictor
We read the model files using the following two statements −
with open(INIT_NET, "rb") as f:
    init_net = f.read()
with open(PREDICT_NET, "rb") as f:
    predict_net = f.read()
The predictor is created by passing the serialized contents of the two files as parameters to the Predictor function.
p = workspace.Predictor(init_net, predict_net)
The p object is the predictor, which is used for predicting the objects in any given image. Note that each input image must be in NCHW format, as we prepared our tree.jpg file earlier.
Predicting Objects
Predicting the objects in a given image is trivial and takes a single line of code. We call the run method on the predictor object to classify the given image.
results = p.run({'data': img})
The prediction results are now available in the results object, which we convert to an array for readability.
results = np.asarray(results)
Print the dimensions of the array using the following statement −
print("results shape: ", results.shape)
The output is as shown below −
results shape: (1, 1, 1000, 1, 1)
We will now remove the unnecessary axes −
preds = np.squeeze(results)
The top prediction can now be retrieved by taking the max value in the preds array.
curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
print("Prediction: ", curr_pred)
print("Confidence: ", curr_conf)
The output is as follows −
Prediction: 984
Confidence: 0.89235985
As you can see, the model has predicted an object with index 984 with 89% confidence. The index 984 by itself does not tell us what kind of object was detected. We need to get the name of the object from its index value. The objects that the model recognizes, along with their corresponding index values, are available in a GitHub repository.
Now, we will see how to retrieve the name for our object with index value 984.
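Beyond the single best class, it is often useful to look at the few highest-scoring classes. The sketch below does this with NumPy on synthetic scores shaped like the results array above; the function name top_k and the score values are made up for illustration and are not real model output.

```python
import numpy as np

def top_k(results, k=5):
    """Return [(class_index, confidence), ...] for the k highest scores."""
    preds = np.squeeze(np.asarray(results))   # e.g. (1, 1, 1000, 1, 1) -> (1000,)
    order = np.argsort(preds)[::-1][:k]       # indices sorted by descending score
    return [(int(i), float(preds[i])) for i in order]

# Synthetic scores with the same shape as the squeezenet output.
fake = np.zeros((1, 1, 1000, 1, 1), dtype=np.float32)
fake[0, 0, 984, 0, 0] = 0.9
fake[0, 0, 7, 0, 0] = 0.05
print(top_k(fake, k=2))
```

With k=1 this gives the same index as the max over enumerate(preds) used in this tutorial.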
Stringifying Result
We store the URL of the GitHub gist that lists the codes as follows −
codes = "https://gist.githubusercontent.com/aaronmarkham/cd3a6b6ac071eca6f7b4a6e40e6038aa/raw/9edb4038a37da6b5a44c3b5bc52e448ff09bfe5b/alexnet_codes"
We read the contents of the URL −
response = urllib2.urlopen(codes)
The response contains a list of all codes and their descriptions. A few lines of the response are shown below to give you an idea of what it contains −
5: 'electric ray, crampfish, numbfish, torpedo',
6: 'stingray',
7: 'cock',
8: 'hen',
9: 'ostrich, Struthio camelus',
10: 'brambling, Fringilla montifringilla',
We now iterate over the lines of the response to locate our desired code of 984 using a for loop as follows −
for line in response:
    mystring = line.decode('ascii')
    code, result = mystring.partition(":")[::2]
    code = code.strip()
    result = result.replace("'", "")
    if (code == str(curr_pred)):
        name = result.split(",")[0][1:]
        print("Model predicts", name, "with", curr_conf, "confidence")
When you run the code, you will see the following output −
Model predicts rapeseed with 0.89235985 confidence
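If you need to look up more than one index, the lines can be parsed once into a dictionary instead of being scanned each time. The following is a minimal sketch that works on already-decoded text lines; it assumes the "index: 'description'," layout shown in the sample lines above, and the function name parse_labels is hypothetical.

```python
def parse_labels(lines):
    """Build {class_index: description} from lines such as  6: 'stingray',"""
    labels = {}
    for line in lines:
        code, sep, rest = line.partition(":")
        code = code.strip(" {\t\r\n")
        if not sep or not code.isdigit():
            continue                      # skip lines that are not "index: name"
        labels[int(code)] = rest.strip().rstrip(",}").strip("'\"")
    return labels

sample = [
    " 5: 'electric ray, crampfish, numbfish, torpedo',",
    " 6: 'stingray',",
    " 7: 'cock',",
]
labels = parse_labels(sample)
print(labels[6])                  # stingray
print(labels[5].split(",")[0])    # electric ray
```

With the real response, each line would first be decoded with line.decode('ascii') as in the loop above, after which labels[curr_pred] gives the description directly.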
You may now try the model on another image.
Predicting a Different Image
To predict another image, simply copy the image file into the images folder of your project directory. This is the directory in which our earlier tree.jpg file is stored. Then change the name of the image file in the code. Only one change is required, as shown below −
img = skimage.img_as_float(skimage.io.imread("images/pretzel.jpg")).astype(np.float32)
The original picture and the prediction result are shown below −

The output is shown below −
Model predicts pretzel with 0.99999976 confidence
As you can see, the pre-trained model classifies the object in this image with very high confidence.
Full Source
The full source for the above code, which uses a pre-trained model for object detection in a given image, is given here for your quick reference. It assumes the imports, the INPUT_IMAGE_SIZE and mean variables, and the resize function defined earlier in this lesson −
def crop_image(img,cropx,cropy):
    y,x,c = img.shape
    startx = x//2-(cropx//2)
    starty = y//2-(cropy//2)
    return img[starty:starty+cropy,startx:startx+cropx]
img = skimage.img_as_float(skimage.io.imread("images/pretzel.jpg")).astype(np.float32)
print("Original Image Shape: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Original image')
img = resize(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after resizing: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Resized image')
img = crop_image(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after cropping: " , img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Center Cropped')
img = img.swapaxes(1, 2).swapaxes(0, 1)
print("CHW Image Shape: " , img.shape)
pyplot.figure()
for i in range(3):
    pyplot.subplot(1, 3, i+1)
    pyplot.imshow(img[i])
    pyplot.axis('off')
    pyplot.title('RGB channel %d' % (i+1))
# convert RGB --> BGR
img = img[(2, 1, 0), :, :]
# remove mean
img = img * 255 - mean
# add batch size axis
img = img[np.newaxis, :, :, :].astype(np.float32)
CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)
with open(INIT_NET, "rb") as f:
    init_net = f.read()
with open(PREDICT_NET, "rb") as f:
    predict_net = f.read()
p = workspace.Predictor(init_net, predict_net)
results = p.run({'data': img})
results = np.asarray(results)
print("results shape: ", results.shape)
preds = np.squeeze(results)
curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
print("Prediction: ", curr_pred)
print("Confidence: ", curr_conf)
codes = "https://gist.githubusercontent.com/aaronmarkham/cd3a6b6ac071eca6f7b4a6e40e6038aa/raw/9edb4038a37da6b5a44c3b5bc52e448ff09bfe5b/alexnet_codes"
response = urllib2.urlopen(codes)
for line in response:
    mystring = line.decode('ascii')
    code, result = mystring.partition(":")[::2]
    code = code.strip()
    result = result.replace("'", "")
    if (code == str(curr_pred)):
        name = result.split(",")[0][1:]
        print("Model predicts", name, "with", curr_conf, "confidence")
By this time, you know how to use a pre-trained model for making predictions on your dataset.
The next step is to learn how to define your own neural network (NN) architectures in Caffe2 and train them on your dataset. We will now learn how to create a trivial single layer NN.
Caffe2 - Creating Your Own Network
In this lesson, you will learn to define a single layer neural network (NN) in Caffe2 and run it on a randomly generated dataset. We will write code to graphically depict the network architecture and to print the input, output, weight and bias values. To follow this lesson, you must be familiar with neural network architectures, their terminology and the mathematics used in them.
Network Architecture
Suppose we want to build a single layer NN as shown in the figure below −

Mathematically, this network is represented by the following equation −
Y = X * W^T + b
Here X, W and b are tensors and Y is the output. We will fill all three tensors with generated data, run the network and examine the output Y. To define the network and tensors, Caffe2 provides several Operator functions.
Caffe2 Operators
In Caffe2, an Operator is the basic unit of computation. The Caffe2 Operator is represented as follows.

Caffe2 provides an extensive list of operators. For the network that we are designing, we will use the operator called FC, which computes the result of passing an input X through a fully connected layer with a two-dimensional weight matrix W and a one-dimensional bias vector b. In other words, it computes the following mathematical equation −
Y = X * W^T + b
Here X has dimensions (M x k), W has dimensions (n x k), and b has dimensions (1 x n). The output Y has dimensions (M x n), where M is the batch size.
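The shape rule can be checked with a few lines of plain Python. This is only a sketch: `fc_output_shape` is a hypothetical helper written for illustration, not a Caffe2 function, and it assumes X and W are two-dimensional.

```python
def fc_output_shape(x_shape, w_shape, b_shape):
    """Return the shape of Y = X * W^T + b, or raise if the shapes disagree.

    Hypothetical helper for illustration; not part of Caffe2.
    """
    m, k = x_shape      # X is (M x k): M samples with k features each
    n, k_w = w_shape    # W is (n x k): one row of k weights per output
    if k != k_w:
        raise ValueError("X has %d features but W expects %d" % (k, k_w))
    if tuple(b_shape) not in ((n,), (1, n)):
        raise ValueError("b must hold one bias per output (%d values)" % n)
    return (m, n)


# Shapes used later in this lesson: X is 2 x 3, W is 5 x 3, b has 5 entries.
print(fc_output_shape((2, 3), (5, 3), (5,)))  # (2, 5)
```

Because W is stored as (n x k) rather than (k x n), the transpose in the equation is what makes the inner dimensions line up.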
For the tensors X and W, we will use the GaussianFill operator to create some random data. To generate the bias values b, we will use the ConstantFill operator.
We will now proceed to define our network.
Creating Network
First, import the required packages:
from caffe2.python import core, workspace
Next, define the network by calling core.Net as follows:
net = core.Net("SingleLayerFC")
The name of the network is specified as SingleLayerFC. At this point, a network object called net has been created. It does not contain any layers yet.
Creating Tensors
We will now create the three tensors required by our network. First, we create the X tensor by calling the GaussianFill operator as follows:
X = net.GaussianFill([], ["X"], mean=0.0, std=1.0, shape=[2, 3], run_once=0)
The X tensor has dimensions 2 x 3, with a mean of 0.0 and a standard deviation of 1.0.
Likewise, we create the W tensor as follows:
W = net.GaussianFill([], ["W"], mean=0.0, std=1.0, shape=[5, 3], run_once=0)
The W tensor is of size 5 x 3.
Finally, we create the bias vector b of size 5.
b = net.ConstantFill([], ["b"], shape=[5,], value=1.0, run_once=0)
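To make the effect of the two fill operators concrete, here is a plain-Python stand-in. It is a sketch under stated assumptions: `gaussian_fill` and `constant_fill` are hypothetical helpers that only mimic the kind of data the operators produce, using nested lists instead of Caffe2 tensors.

```python
import random


def gaussian_fill(shape, mean=0.0, std=1.0, rng=random):
    """Hypothetical stand-in for GaussianFill: rows x cols normal samples."""
    rows, cols = shape
    return [[rng.gauss(mean, std) for _ in range(cols)] for _ in range(rows)]


def constant_fill(shape, value=0.0):
    """Hypothetical stand-in for ConstantFill: a 1-D list of one value."""
    (length,) = shape
    return [value] * length


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
X = gaussian_fill([2, 3], rng=rng)
W = gaussian_fill([5, 3], rng=rng)
b = constant_fill([5], value=1.0)

print(len(X), len(X[0]))  # 2 3
print(len(W), len(W[0]))  # 5 3
print(b)                  # [1.0, 1.0, 1.0, 1.0, 1.0]
```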
Now comes the most important part of the code: defining the network itself.
Defining Network
We define the network in the following Python statement:
Y = X.FC([W, b], ["Y"])
We call the FC operator on the input data X. The weights are specified in W and the bias in b. The output is Y. Alternatively, you may create the network using the following Python statement, which is more verbose:
Y = net.FC([X, W, b], ["Y"])
At this point, the network has only been defined. Until we run it at least once, it will not contain any data. Before running the network, we will examine its architecture.
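The split between defining a network and running it can be illustrated without Caffe2 at all. The toy class below is purely hypothetical (`ToyNet`, `add_op`, and `run` are made-up names and say nothing about how Caffe2 is implemented): it records operations when they are declared and produces values only when `run` is called.

```python
class ToyNet:
    """Toy illustration of define-then-run; not how Caffe2 is implemented."""

    def __init__(self, name):
        self.name = name
        self.ops = []    # recorded (output_name, function, input_names)
        self.blobs = {}  # filled only when run() is called

    def add_op(self, output, func, inputs=()):
        # Declaring an operator stores it; nothing is computed here.
        self.ops.append((output, func, tuple(inputs)))
        return output

    def run(self):
        # Execute the recorded operators in declaration order.
        for output, func, inputs in self.ops:
            self.blobs[output] = func(*(self.blobs[i] for i in inputs))


toy = ToyNet("Toy")
toy.add_op("a", lambda: 2.0)
toy.add_op("b", lambda: 3.0)
toy.add_op("c", lambda a, b: a * b + 1.0, inputs=["a", "b"])

print(sorted(toy.blobs))  # [] -- nothing has been computed yet
toy.run()
print(toy.blobs["c"])     # 7.0
```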
Printing Network Architecture
Caffe2 stores the network definition as a protocol buffer message, which can be examined in a readable text form by calling the Proto method on the created net object:
print(net.Proto())
This produces the following output:
name: "SingleLayerFC"
op {
  output: "X"
  name: ""
  type: "GaussianFill"
  arg {
    name: "mean"
    f: 0.0
  }
  arg {
    name: "std"
    f: 1.0
  }
  arg {
    name: "shape"
    ints: 2
    ints: 3
  }
  arg {
    name: "run_once"
    i: 0
  }
}
op {
  output: "W"
  name: ""
  type: "GaussianFill"
  arg {
    name: "mean"
    f: 0.0
  }
  arg {
    name: "std"
    f: 1.0
  }
  arg {
    name: "shape"
    ints: 5
    ints: 3
  }
  arg {
    name: "run_once"
    i: 0
  }
}
op {
  output: "b"
  name: ""
  type: "ConstantFill"
  arg {
    name: "shape"
    ints: 5
  }
  arg {
    name: "value"
    f: 1.0
  }
  arg {
    name: "run_once"
    i: 0
  }
}
op {
  input: "X"
  input: "W"
  input: "b"
  output: "Y"
  name: ""
  type: "FC"
}
As you can see in the above listing, it first defines the operators that produce X, W, and b. Let us examine the definition of W as an example. The type of W is specified as GaussianFill. The mean is defined as float 0.0, the standard deviation as float 1.0, and the shape as 5 x 3.
op {
  output: "W"
  name: ""
  type: "GaussianFill"
  arg {
    name: "mean"
    f: 0.0
  }
  arg {
    name: "std"
    f: 1.0
  }
  arg {
    name: "shape"
    ints: 5
    ints: 3
  }
  ...
}
Examine the definitions of X and b for your own understanding. Finally, let us look at the definition of our single-layer network, which is reproduced here:
op {
  input: "X"
  input: "W"
  input: "b"
  output: "Y"
  name: ""
  type: "FC"
}
Here, the operator type is FC (fully connected), with X, W, and b as inputs and Y as the output. This network definition is verbose, and for large networks it becomes tedious to examine. Fortunately, Caffe2 provides a graphical representation of the created networks.
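A text-only way to cope with the verbosity is to reduce each operator to a single line. The sketch below works on a hand-written list of dictionaries that mirrors the printed definition; the `ops` list and the `summarize` helper are illustrative assumptions, not Caffe2 objects.

```python
# Hand-copied from the printed definition: type, inputs, outputs per operator.
ops = [
    {"type": "GaussianFill", "input": [], "output": ["X"]},
    {"type": "GaussianFill", "input": [], "output": ["W"]},
    {"type": "ConstantFill", "input": [], "output": ["b"]},
    {"type": "FC", "input": ["X", "W", "b"], "output": ["Y"]},
]


def summarize(op):
    """Render one operator as 'Type(inputs) -> outputs'."""
    return "{}({}) -> {}".format(
        op["type"], ", ".join(op["input"]), ", ".join(op["output"])
    )


for op in ops:
    print(summarize(op))
# GaussianFill() -> X
# GaussianFill() -> W
# ConstantFill() -> b
# FC(X, W, b) -> Y
```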
Network Graphical Representation
To get the graphical representation of the network, run the following code snippet, which, apart from the imports, is only two lines of Python code.
from caffe2.python import net_drawer
from IPython import display
graph = net_drawer.GetPydotGraph(net, rankdir="LR")
display.Image(graph.create_png(), width=800)
When you run the code, you will see the following output:

For large networks, the graphical representation is extremely useful for visualizing the network and debugging errors in its definition.
It is now time to run the network.
Running Network
You run the network by calling the RunNetOnce method on the workspace object:
workspace.RunNetOnce(net)
After the network has run once, all the randomly generated data has been created and fed into the network, and the output has been produced. The tensors created by running the network are called blobs in Caffe2. The workspace consists of the blobs you create and store in memory. This is quite similar to Matlab.
After running the network, you can examine the blobs that the workspace contains using the following print command:
print("Blobs in the workspace: {}".format(workspace.Blobs()))
You will see the following output:
Blobs in the workspace: ['W', 'X', 'Y', 'b']
Note that the workspace contains the three input blobs X, W, and b, as well as the output blob Y. Let us now examine the contents of these blobs.
for name in workspace.Blobs():
    print("{}:\n{}".format(name, workspace.FetchBlob(name)))
You will see the following output:
W:
[[ 1.0426593 0.15479846 0.25635982]
[-2.2461145 1.4581774 0.16827184]
[-0.12009818 0.30771437 0.00791338]
[ 1.2274994 -0.903331 -0.68799865]
[ 0.30834186 -0.53060573 0.88776857]]
X:
[[ 1.6588869e+00 1.5279824e+00 1.1889904e+00]
[ 6.7048723e-01 -9.7490678e-04 2.5114202e-01]]
Y:
[[ 3.2709925 -0.297907 1.2803618 0.837985 1.7562964]
[ 1.7633215 -0.4651525 0.9211631 1.6511179 1.4302125]]
b:
[1. 1. 1. 1. 1.]
Note that the data on your machine, and in fact on every run of the network, will be different because the inputs X and W are generated at random. You have now successfully defined a network and run it on your computer.
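As a sanity check, the sample output can be verified by hand against Y = X * W^T + b. The sketch below copies the printed values into nested lists and recomputes Y in plain Python; `fc` is an illustrative helper, not the Caffe2 operator, and the comparison uses a small tolerance because the printed numbers are rounded single-precision values.

```python
# Values copied from the sample output (rounded as printed).
W = [
    [1.0426593, 0.15479846, 0.25635982],
    [-2.2461145, 1.4581774, 0.16827184],
    [-0.12009818, 0.30771437, 0.00791338],
    [1.2274994, -0.903331, -0.68799865],
    [0.30834186, -0.53060573, 0.88776857],
]
X = [
    [1.6588869, 1.5279824, 1.1889904],
    [0.67048723, -9.7490678e-04, 0.25114202],
]
b = [1.0, 1.0, 1.0, 1.0, 1.0]
Y_printed = [
    [3.2709925, -0.297907, 1.2803618, 0.837985, 1.7562964],
    [1.7633215, -0.4651525, 0.9211631, 1.6511179, 1.4302125],
]


def fc(X, W, b):
    """Compute Y = X * W^T + b with nested lists (illustrative helper)."""
    return [
        [sum(x * w for x, w in zip(x_row, w_row)) + bias
         for w_row, bias in zip(W, b)]
        for x_row in X
    ]


Y = fc(X, W, b)
max_error = max(
    abs(y - y_ref)
    for row, ref_row in zip(Y, Y_printed)
    for y, y_ref in zip(row, ref_row)
)
print(max_error < 1e-4)  # True
```

Each entry of Y is the dot product of one row of X with one row of W, plus the matching bias, which is why the 2 x 3 input and 5 x 3 weights yield a 2 x 5 output.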
Caffe2 - Defining Complex Networks
In the previous lesson, you learned how to create a trivial network, execute it, and examine its output. The process for creating complex networks is similar. Caffe2 provides a large set of operators for creating complex architectures, and you are encouraged to examine the Caffe2 documentation for the list of operators. After studying the purpose of the various operators, you will be in a position to create complex networks and train them. For training, Caffe2 provides several predefined computation units, that is, operators. You will need to select the operators appropriate for the kind of problem you are trying to solve.
Once a network is trained to your satisfaction, you can store it in a model file similar to the pre-trained model files you used earlier. These trained models may be contributed to the Caffe2 repository for the benefit of other users, or you may simply use the trained model in your own private production environment.
Summary
Caffe2 is a deep learning framework that lets you experiment with several kinds of neural networks for making predictions on your data. The Caffe2 site provides many pre-trained models. You learned to use one of the pre-trained models to classify objects in a given image. You also learned to define a neural network architecture of your choice. Such custom networks can be trained using the many predefined operators in Caffe2. A trained model is stored in a file that can be taken into a production environment.