OpenCV Python 简明教程

OpenCV-Python - Quick Guide

OpenCV Python - Overview

OpenCV 代表 Open Source Computer Vision,是一个在实时计算机视觉应用程序编程中非常有用的函数库。计算机视觉一词指的是使用计算机程序对数字图像和视频进行分析的学科。计算机视觉是人工智能和机器学习等现代学科的重要组成部分。

OpenCV stands for Open Source Computer Vision and is a library of functions useful in real-time computer vision application programming. The term computer vision refers to the discipline of analysing digital images and videos using computer programs. Computer vision is an important constituent of modern disciplines such as artificial intelligence and machine learning.

OpenCV 最初由英特尔开发,是一个用 C++ 编写的跨平台库,同时也提供了面向许多其他编程语言(例如 Java 和 Python)的绑定。在本教程中,我们将介绍 OpenCV 的 Python 库的功能。

Originally developed by Intel, OpenCV is a cross-platform library written in C++, with bindings developed for many other programming languages such as Java and Python. In this tutorial, the functionality of OpenCV’s Python library is described.

OpenCV-Python

OpenCV-Python 是 OpenCV 库的 C++ 实现的 Python 包装器。它利用 NumPy 库进行数值运算,并且是为计算机视觉问题提供的快速原型制作工具。

OpenCV-Python is a Python wrapper around C++ implementation of OpenCV library. It makes use of NumPy library for numerical operations and is a rapid prototyping tool for computer vision problems.

OpenCV-Python 是一个跨平台的库,适用于所有操作系统 (OS) 平台,包括 Windows、Linux、macOS 和 Android。OpenCV 还支持图形处理单元 (GPU) 加速。

OpenCV-Python is a cross-platform library, available for use on all Operating System (OS) platforms including Windows, Linux, macOS and Android. OpenCV also supports Graphics Processing Unit (GPU) acceleration.

本教程专为计算机科学学生和专业人员设计,他们希望在计算机视觉应用领域获得专业知识。掌握 OpenCV-Python 的功能必须具备 Python 和 NumPy 库的先验知识。

This tutorial is designed for computer science students and professionals who wish to gain expertise in the field of computer vision applications. Prior knowledge of Python and the NumPy library is essential to understand the functionality of OpenCV-Python.

OpenCV Python - Environment Setup

在大多数情况下,使用 pip 就足以在电脑上安装 OpenCV-Python。

In most of the cases, using pip should be sufficient to install OpenCV-Python on your computer.

使用 pip 安装 OpenCV-Python 的命令如下 −

The command to install OpenCV-Python using pip is as follows −

pip install opencv-python

建议在新虚拟环境中执行此安装。当前版本的 OpenCV-Python 是 4.5.1.48,可以通过以下命令进行验证 −

Performing this installation in a new virtual environment is recommended. The current version of OpenCV-Python is 4.5.1.48 and it can be verified by the following command −

>>> import cv2
>>> cv2.__version__
'4.5.1'

由于 OpenCV-Python 依赖于 NumPy,后者也将自动安装。根据选择,你可以安装 Matplotlib 来渲染某些图形化输出。

Since OpenCV-Python relies on NumPy, it is also installed automatically. Optionally, you may install Matplotlib for rendering certain graphical output.
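
If you choose to install Matplotlib as well, it can be added with pip in the same way, for example −

pip install matplotlib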

在 Fedora 上,你可以通过下面提到的命令安装 OpenCV-Python −

On Fedora, you may install OpenCV-Python by the below mentioned command −

$ yum install numpy opencv*

OpenCV-Python 也可以通过从 http://sourceforge.net 获得的源构建来安装。请按照为其提供的安装说明进行操作。

OpenCV-Python can also be installed by building it from the source available at http://sourceforge.net. Follow the installation instructions given for the same.

OpenCV Python - Reading an image

cv2 程序包(OpenCV-Python 库的导入名称)提供了 imread() 函数来读取图像。

The cv2 package (the import name of the OpenCV-Python library) provides the imread() function to read an image.

读取图像的命令如下 −

The command to read an image is as follows −

img=cv2.imread(filename, flags)

flags 参数取以下常量之一 −

The flags parameter takes one of the following constants −

  1. cv2.IMREAD_COLOR (1) − Loads a color image.

  2. cv2.IMREAD_GRAYSCALE (0) − Loads the image in grayscale mode.

  3. cv2.IMREAD_UNCHANGED (-1) − Loads the image as such, including the alpha channel.

该函数将返回一个图像对象,可以使用 imshow() 函数来渲染该对象。imshow() 函数的命令如下 −

The function will return an image object, which can be rendered using imshow() function. The command for using imshow() function is given below −

cv2.imshow(window-name, image)

图像将显示在一个命名窗口中。新窗口在创建时会设置 AUTOSIZE 标志。waitKey() 是一个键盘绑定函数,它的参数是以毫秒为单位的时间。

The image is displayed in a named window. A new window is created with the AUTOSIZE flag set. The waitKey() function is a keyboard-binding function. Its argument is the time in milliseconds.

该函数将等待指定的时间并保持窗口显示,直到按下某个键。最后,我们可以销毁所有已创建的窗口。

The function waits for specified milliseconds and keeps the window on display till a key is pressed. Finally, we can destroy all the windows thus created.

显示 OpenCV 徽标的程序如下 −

The program to display the OpenCV logo is as follows −

import numpy as np
import cv2
# Load a color image
img = cv2.imread('OpenCV_Logo.png',1)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

上述程序将按如下方式显示 OpenCV 徽标 −

The above program displays the OpenCV logo as follows −

opencv logo

OpenCV Python - Write an image

CV2 程序包有 imwrite() 函数,可将图像对象保存到指定的文件中。

CV2 package has imwrite() function that saves an image object to a specified file.

使用 imwrite() 函数保存图像的命令如下 −

The command to save an image with the help of imwrite() function is as follows −

cv2.imwrite(filename, img)

图像格式由 OpenCV 根据文件扩展名自动确定。OpenCV 支持 *.bmp、*.dib、*.jpeg、*.jpg、*.png、*.webp、*.sr、*.tiff、*.tif 等图像文件类型。

The image format is automatically decided by OpenCV from the file extension. OpenCV supports *.bmp, *.dib, *.jpeg, *.jpg, *.png, *.webp, *.sr, *.tiff, *.tif etc. image file types.
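
In addition, imwrite() accepts an optional list of format-specific parameters. Below is a minimal sketch, reusing the OpenCV_Logo.png file from earlier, that saves a JPEG copy at an explicit quality setting −

import cv2
img = cv2.imread('OpenCV_Logo.png', 1)
# IMWRITE_JPEG_QUALITY takes a value from 0 to 100 (higher is better, default 95)
cv2.imwrite('opencv_logo_q90.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), 90])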

Example

以下程序加载 OpenCV 徽标图像并在按下“s”键时保存其灰度版本 −

Following program loads OpenCV logo image and saves its greyscale version when ‘s’ key is pressed −

import numpy as np
import cv2
# Load the image in grayscale
img = cv2.imread('OpenCV_Logo.png',0)
cv2.imshow('image',img)
key=cv2.waitKey(0)
if key==ord('s'):
   cv2.imwrite("opencv_logo_GS.png", img)
cv2.destroyAllWindows()

Output

opencv logo greyscale

OpenCV Python - Using Matplotlib

Python 的 Matplotlib 是一个功能强大的绘图库,它为各种绘图类型提供了大量的绘图函数。它还具有用于呈现图片的 imshow() 函数。它提供了诸如缩放、保存等附加功能。

Python’s Matplotlib is a powerful plotting library with a huge collection of plotting functions for a variety of plot types. It also has an imshow() function to render an image. It provides additional facilities such as zooming and saving.

Example

在运行以下程序之前,确保在当前工作环境中已安装 Matplotlib。

Ensure that Matplotlib is installed in the current working environment before running the following program.

import numpy as np
import cv2
import matplotlib.pyplot as plt
# Load the image in grayscale
img = cv2.imread('OpenCV_Logo.png',0)
plt.imshow(img)
plt.show()

Output

opencv logo matplotlib

OpenCV Python - Image Properties

OpenCV 将图像数据读取为 NumPy 数组。该 ndarray 对象的 shape 属性揭示了图像的属性,例如维度和通道数。

OpenCV reads the image data into a NumPy array. The shape attribute of this ndarray object reveals image properties such as dimensions and channels.

使用 shape 属性的命令如下 −

The command to use the shape attribute is as follows −

>>> img = cv.imread("OpenCV_Logo.png", 1)
>>> img.shape
(222, 180, 3)

在上述命令中:

In the above command −

  1. The first two items, shape[0] and shape[1], represent the height (number of rows) and width (number of columns) of the image.

  2. shape[2] stands for the number of channels.

  3. 3 indicates that the image has three colour channels, stored by OpenCV in Blue-Green-Red (BGR) order.

类似地,size 属性返回图像大小。图像大小的命令如下:

Similarly, the size property returns the size of the image. The command for the size of an image is as follows −

>>> img.size
119880

ndarray 中的每个元素代表一个图像像素。

Each element in the ndarray represents one image pixel.

我们可以借助下面提到的命令访问和操作任何像素的值。

We can access and manipulate any pixel’s value, with the help of the command mentioned below.

>>> p=img[50,50]
>>> p
array([ 1, 1, 255], dtype=uint8)

Example

以下代码将左上角 100x100 像素区域的颜色值更改为黑色。imshow() 函数可以验证结果。

Following code changes the colour value of the top-left 100x100 pixel region to black. The imshow() function can verify the result.

>>> for i in range(100):
   for j in range(100):
      img[i,j]=[0,0,0]

Output

imshow
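
Since the image is a NumPy array, the same change can be made far more concisely with array slicing instead of the explicit loops shown above; a brief equivalent −

>>> img[0:100, 0:100] = [0, 0, 0]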

可以使用 split() 函数将图像通道拆分为单独的平面,再使用 merge() 函数将这些通道合并。

The image channels can be split into individual planes by using the split() function. The channels can be merged by using the merge() function.

split() 函数将一个多通道数组拆分为多个单通道数组并返回。

The split() function splits a multi-channel array into separate single-channel arrays and returns them.

我们可以使用以下命令拆分图像通道:

We can use the following command to split the image channels −

>>> img = cv.imread("OpenCV_Logo.png", 1)
>>> b,g,r = cv.split(img)

您现在可以对每个平面进行操作。

You can now perform manipulation on each plane.
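
The planes can be recombined with the merge() function mentioned above; a short sketch continuing from the split −

>>> img_merged = cv.merge((b, g, r))
>>> cv.imshow("merged", img_merged)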

假设我们把蓝色通道里的所有像素设为 0,代码如下 −

Suppose we set all pixels in blue channel to 0, the code will be as follows −

>>> img[:, :, 0]=0
>>> cv.imshow("image", img)

结果的图像将如下所示 −

The resultant image will be shown as below −

individual planes

OpenCV Python - Bitwise Operations

位操作用于图像处理和提取图像中的基本部分。

Bitwise operations are used in image manipulation and for extracting the essential parts in the image.

OpenCV 中实现了以下运算符:

Following operators are implemented in OpenCV −

  1. bitwise_and

  2. bitwise_or

  3. bitwise_xor

  4. bitwise_not

Example 1

为了演示如何使用这些运算符,获取了两张带有已填充和未填充圆圈的图像。

To demonstrate the use of these operators, two images with filled and empty circles are taken.

以下程序演示了如何在 OpenCV-Python 中使用位运算符:

Following program demonstrates the use of bitwise operators in OpenCV-Python −

import cv2
import numpy as np

img1 = cv2.imread('a.png')
img2 = cv2.imread('b.png')

dest1 = cv2.bitwise_and(img2, img1, mask = None)
dest2 = cv2.bitwise_or(img2, img1, mask = None)
dest3 = cv2.bitwise_xor(img1, img2, mask = None)

cv2.imshow('A', img1)
cv2.imshow('B', img2)
cv2.imshow('AND', dest1)
cv2.imshow('OR', dest2)
cv2.imshow('XOR', dest3)
cv2.imshow('NOT A', cv2.bitwise_not(img1))
cv2.imshow('NOT B', cv2.bitwise_not(img2))

if cv2.waitKey(0) & 0xff == 27:
   cv2.destroyAllWindows()

Output

bitwise operators
bitwise operator
bitwises operator

Example 2

在另一个涉及位运算的示例中,将 opencv 标志叠加到另一张图像上。在这里,我们通过 threshold() 函数调用从标志中获得一个掩码数组,并在它们之间执行 AND 操作。

In another example involving bitwise operations, the opencv logo is superimposed on another image. Here, we obtain a mask array calling threshold() function on the logo and perform AND operation between them.

类似地,通过 NOT 操作我们得到一个反向掩码,再将它与背景区域进行 AND 操作,从而把背景中标志所在的区域抠除。

Similarly, an inverse mask is obtained with the NOT operation and is ANDed with the background region, so that the logo area in the background is blacked out.

以下程序演示了位操作的使用 −

Following is the program which demonstrates the use of bitwise operations −

import cv2 as cv
import numpy as np

img1 = cv.imread('lena.jpg')
img2 = cv.imread('whitelogo.png')
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask)
# Now black-out the area of logo
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)

# Take only region of logo from logo image.
img2_fg = cv.bitwise_and(img2,img2,mask = mask)
# Put logo in ROI
dst = cv.add(img2_fg, img1_bg)
img1[0:rows, 0:cols ] = dst
cv.imshow('Result', img1)
cv.waitKey(0)
cv.destroyAllWindows()

Output

掩码图像提供了以下结果:

The masked images give following result −

bitwise operators mask

OpenCV Python - Draw Shapes and Text

在本章中,我们将学习如何在 OpenCV-Python 的帮助下在图像中绘制形状和文本。让我们从了解在图像上绘制形状开始。

In this chapter, we will learn how to draw shapes and text on images with the help of OpenCV-Python. Let us begin by understanding about drawing shapes on images.

Draw Shapes on Images

我们需要了解 OpenCV-Python 中所需的函数,这有助于我们绘制图像上的形状。

We need to understand the required functions in OpenCV-Python, which help us to draw the shapes on images.

Functions

OpenCV-Python 程序包(称为 cv2)包含以下用于绘制相应形状的函数。

The OpenCV-Python package (referred to as cv2) contains the following functions to draw the respective shapes.

  1. cv2.line() − Draws a line segment connecting two points. Command − cv2.line(img, pt1, pt2, color, thickness)

  2. cv2.circle() − Draws a circle of the given radius with the given point as its center. Command − cv2.circle(img, center, radius, color, thickness)

  3. cv2.rectangle() − Draws a rectangle with the given points as its top-left and bottom-right corners. Command − cv2.rectangle(img, pt1, pt2, color, thickness)

  4. cv2.ellipse() − Draws a simple or thick elliptic arc or fills an ellipse sector. Command − cv2.ellipse(img, center, axes, angle, startAngle, endAngle, color, thickness)

Parameters

以上函数的通用参数如下 −

The common parameters to the above functions are as follows −

  1. img − The image on which you want to draw the shapes.

  2. color − Color of the shape. For BGR, pass it as a tuple. For grayscale, just pass the scalar value.

  3. thickness − Thickness of the line or circle etc. If -1 is passed for closed figures like circles, it will fill the shape.

  4. lineType − Type of line, whether 8-connected, anti-aliased line etc.

Example

以下示例显示如何在图像上绘制形状。程序如下 −

Following example shows how the shapes are drawn on top of an image. The program for the same is given below −

import numpy as np
import cv2
img = cv2.imread('LENA.JPG',1)
cv2.line(img,(20,400),(400,20),(255,255,255),3)
cv2.rectangle(img,(200,100),(400,400),(0,255,0),5)
cv2.circle(img,(80,80), 55, (255,255,0), -1)
cv2.ellipse(img, (300,425), (80, 20), 5, 0, 360, (0,0,255), -1)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

draw shapes

Draw Text

提供 cv2.putText() 函数在图像上写文本。命令如下 −

The cv2.putText() function is provided to write a text on the image. The command for the same is as follows −

cv2.putText(img, text, org, fontFace, fontScale, color, thickness)

Fonts

OpenCV 支持以下字体 −

OpenCV supports the following fonts −

Font Name − Enumerated Value

  1. FONT_HERSHEY_SIMPLEX − 0

  2. FONT_HERSHEY_PLAIN − 1

  3. FONT_HERSHEY_DUPLEX − 2

  4. FONT_HERSHEY_COMPLEX − 3

  5. FONT_HERSHEY_TRIPLEX − 4

  6. FONT_HERSHEY_COMPLEX_SMALL − 5

  7. FONT_HERSHEY_SCRIPT_SIMPLEX − 6

  8. FONT_HERSHEY_SCRIPT_COMPLEX − 7

  9. FONT_ITALIC − 16

Example

以下程序为照片添加文本标题,照片中是著名的足球运动员莱昂内尔·梅西。

Following program adds a text caption to a photograph showing Lionel Messi, the famous footballer.

import numpy as np
import cv2
img = cv2.imread('messi.JPG',1)
txt="Lionel Messi"
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img,txt,(10,100), font, 2,(255,255,255),2,cv2.LINE_AA)

cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

draw text

OpenCV Python - Handling Mouse Events

OpenCV 能够用一个回调函数来注册各种鼠标相关事件。这用于根据鼠标事件的类型来启动一个特定用户定义的动作。

OpenCV is capable of registering various mouse related events with a callback function. This is done to initiate a certain user defined action depending on the type of mouse event.

The mouse events recognised by OpenCV are as follows −

  1. cv.EVENT_MOUSEMOVE − The mouse pointer has moved over the window.

  2. cv.EVENT_LBUTTONDOWN − The left mouse button is pressed.

  3. cv.EVENT_RBUTTONDOWN − The right mouse button is pressed.

  4. cv.EVENT_MBUTTONDOWN − The middle mouse button is pressed.

  5. cv.EVENT_LBUTTONUP − The left mouse button is released.

  6. cv.EVENT_RBUTTONUP − The right mouse button is released.

  7. cv.EVENT_MBUTTONUP − The middle mouse button is released.

  8. cv.EVENT_LBUTTONDBLCLK − The left mouse button is double clicked.

  9. cv.EVENT_RBUTTONDBLCLK − The right mouse button is double clicked.

  10. cv.EVENT_MBUTTONDBLCLK − The middle mouse button is double clicked.

  11. cv.EVENT_MOUSEWHEEL − Positive for forward and negative for backward scrolling.

要根据鼠标事件来调用函数,就必须使用 setMouseCallback() 函数来注册。对应的命令如下所示:

To fire a function on a mouse event, it has to be registered with the help of setMouseCallback() function. The command for the same is as follows −

cv2.setMouseCallback(window, callback_function)

此函数将事件的类型和位置传递给回调函数以进行进一步处理。

This function passes the type and location of the event to the callback function for further processing.

Example 1

以下代码会在窗口图像背景上发生左键双击事件时绘制一个圆:

Following code draws a circle whenever left button double click event occurs on the window showing an image as background −

import numpy as np
import cv2 as cv
# mouse callback function
def drawfunction(event,x,y,flags,param):
   if event == cv.EVENT_LBUTTONDBLCLK:
      cv.circle(img,(x,y),20,(255,255,255),-1)
img = cv.imread('lena.jpg')
cv.namedWindow('image')
cv.setMouseCallback('image',drawfunction)
while(1):
   cv.imshow('image',img)
   key=cv.waitKey(1)
   if key == 27:
      break
cv.destroyAllWindows()

Output

运行以上程序然后在随机位置双击。将会看到类似的输出:

Run the above program and double click at random locations. The similar output will appear −

mouse events

Example 2

根据用户输入(1、2 或 3)交互式绘制矩形、线或圆:

Following program interactively draws either rectangle, line or circle depending on user input (1,2 or 3) −

import numpy as np
import cv2 as cv
# mouse callback function

drawing=True
shape='r'

def draw_circle(event,x,y,flags,param):
   global x1,x2
   if event == cv.EVENT_LBUTTONDOWN:
      drawing = True
      x1,x2 = x,y
   elif event == cv.EVENT_LBUTTONUP:
      drawing = False
      if shape == 'r':
         cv.rectangle(img,(x1,x2),(x,y),(0,255,0),-1)
      if shape == 'l':
         cv.line(img,(x1,x2),(x,y),(255,255,255),3)
      if shape=='c':
         cv.circle(img,(x,y), 10, (255,255,0), -1)
img = cv.imread('lena.jpg')
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)
while(1):
   cv.imshow('image',img)
   key=cv.waitKey(1)
   if key==ord('1'):
      shape='r'
   if key==ord('2'):
      shape='l'
   if key==ord('3'):
      shape='c'

   #print (shape)
   if key == 27:
      break

cv.destroyAllWindows()

如果按下“1”,将在鼠标左键按下和抬起的坐标之间绘制一个矩形。

On the window surface, a rectangle is drawn between the coordinates of the mouse left button down and up if ‘1’ is pressed.

如果用户选择 2,将使用坐标作为端点绘制一条线。

If user choice is 2, a line is drawn using coordinates as endpoints.

如果选择 3 绘制圆,则会在鼠标抬起事件的坐标处绘制一个圆。

On choosing 3 for the circle, it is drawn at the coordinates of the mouse up event.

在成功执行以上程序后,输出将显示如下图像:

Following image will be the output after the successful execution of the above mentioned program −

mouse event

OpenCV Python - Add Trackbar

OpenCV 中的轨迹条是滑块控件,通过手动在条上滑动标签,帮助从连续范围内选取一个变量值。标签位置与一个值同步。

Trackbar in OpenCV is a slider control which helps in picking a value for the variable from a continuous range by manually sliding the tab over the bar. Position of the tab is synchronised with a value.

createTrackbar() 函数使用以下命令创建一个 Trackbar 对象 −

The createTrackbar() function creates a Trackbar object with the following command −

cv2.createTrackbar(trackbarname, winname, value, count, TrackbarCallback)

在以下示例中,为用户提供了三个轨迹条,用于在 0 到 255 的范围内分别设置 R、G 和 B 的值。

In the following example, three trackbars are provided for the user to set the values of R, G and B, each in the range 0 to 255.

使用轨迹条位置值,绘制一个矩形,其填充颜色对应于 RGB 颜色值。

Using the track bar position values, a rectangle is drawn with the fill colour corresponding to RGB colour value.

Example

以下程序用于添加轨迹条 −

Following program is for adding a trackbar −

import numpy as np
import cv2 as cv
img = np.zeros((300,400,3), np.uint8)
cv.namedWindow('image')
def nothing(x):
   pass

# create trackbars for color change
cv.createTrackbar('R','image',0,255,nothing)
cv.createTrackbar('G','image',0,255,nothing)
cv.createTrackbar('B','image',0,255,nothing)

while(1):
   cv.imshow('image',img)
   k = cv.waitKey(1) & 0xFF
   if k == 27:
      break
   # get current positions of the three trackbars
   r = cv.getTrackbarPos('R','image')
   g = cv.getTrackbarPos('G','image')
   b = cv.getTrackbarPos('B','image')

   #s = cv.getTrackbarPos(switch,'image')
   #img[:] = [b,g,r]
   cv.rectangle(img, (100,100),(200,200), (b,g,r),-1)

cv.destroyAllWindows()

Output

trackbar

OpenCV Python - Resize and Rotate an Image

在本章中,我们将了解如何使用 OpenCV-Python 调整图像大小和旋转图像。

In this chapter, we will learn how to resize and rotate an image with the help of OpenCV-Python.

Resize an Image

可以使用 cv2.resize() 函数放大或缩小图像。

It is possible to scale up or down an image with the use of cv2.resize() function.

resize() 函数的使用方式如下:

The resize() function is used as follows −

cv2.resize(src, dsize, dst, fx, fy, interpolation)

一般来说,插值是在已知数据点之间估计数值的过程。

In general, interpolation is a process of estimating values between known data points.

当图形数据包含一个间隙,但在间隙的两侧或间隙内的几个特定点内有数据可用时,插值允许我们估算间隙内的值。

When graphical data contains a gap but data is available on either side of it, or at a few specific points within it, interpolation allows us to estimate the values within the gap.

在上面的 resize() 函数中,interpolation 标志决定了计算目标图像像素值时所使用的插值类型。

In the above resize() function, the interpolation flag determines the type of interpolation used for computing the pixel values of the destination image.

Types of Interpolation

插值类型如下:

The types of interpolation are as follows −

  1. INTER_NEAREST − A nearest-neighbor interpolation.

  2. INTER_LINEAR − A bilinear interpolation (used by default)

  3. INTER_AREA − Resampling using pixel area relation. It is a preferred method for image decimation but when the image is zoomed, it is similar to the INTER_NEAREST method.

  4. INTER_CUBIC − A bicubic interpolation over 4x4 pixel neighborhood

  5. INTER_LANCZOS4 − A Lanczos interpolation over 8x8 pixel neighborhood

缩小图像时首选的插值方法是 cv2.INTER_AREA;放大图像时首选 cv2.INTER_CUBIC(较慢)和 cv2.INTER_LINEAR。

Preferable interpolation methods are cv2.INTER_AREA for shrinking and cv2.INTER_CUBIC (slow) & cv2.INTER_LINEAR for zooming.

Example

以下代码将“messi.jpg”图像缩小到其原始高度和宽度的二分之一。

Following code resizes the ‘messi.jpg’ image to half its original height and width.

import numpy as np
import cv2
img = cv2.imread('messi.JPG',1)
height, width = img.shape[:2]
res = cv2.resize(img, (int(width/2), int(height/2)), interpolation = cv2.INTER_AREA)

cv2.imshow('image',res)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

resize image
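
The same halving can also be expressed with the fx and fy scale factors instead of an explicit dsize; a minimal sketch, assuming the same 'messi.JPG' file −

import cv2
img = cv2.imread('messi.JPG',1)
# dsize=None lets fx and fy define the output size (here 50% in each dimension)
res = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation = cv2.INTER_AREA)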

Rotate an image

OpenCV 使用仿射变换函数对图像进行诸如平移和旋转之类的操作。仿射变换是一种变换,可以表示为矩阵乘法(线性变换)后跟向量加法(平移)。

OpenCV uses affine transformation functions for operations on images such as translation and rotation. The affine transformation is a transformation that can be expressed in the form of a matrix multiplication (linear transformation) followed by a vector addition (translation).

cv2 模块提供两个函数 cv2.warpAffine 和 cv2.warpPerspective,您可以使用它们执行各种变换。cv2.warpAffine 采用 2x3 变换矩阵,而 cv2.warpPerspective 采用 3x3 变换矩阵作为输入。

The cv2 module provides two functions cv2.warpAffine and cv2.warpPerspective, with which you can have all kinds of transformations. cv2.warpAffine takes a 2x3 transformation matrix while cv2.warpPerspective takes a 3x3 transformation matrix as input.

为了找到用于旋转的变换矩阵,OpenCV 提供了一个函数 cv2.getRotationMatrix2D ,如下所示:

To find this transformation matrix for rotation, OpenCV provides a function, cv2.getRotationMatrix2D, which is as follows −

getRotationMatrix2D(center, angle, scale)

然后,我们将 getRotationMatrix2D() 函数返回的矩阵与图像一起传递给 warpAffine() 函数,以获得旋转后的图像。

We then pass the matrix returned by the getRotationMatrix2D() function, along with the image, to the warpAffine() function to obtain the rotated image.

以下程序将原始图像旋转 90 度,而不改变其尺寸:

Following program rotates the original image by 90 degrees without changing the dimensions −

Example

import numpy as np
import cv2
img = cv2.imread('OpenCV_Logo.png',1)
h, w = img.shape[:2]

center = (w / 2, h / 2)
mat = cv2.getRotationMatrix2D(center, 90, 1)
rotimg = cv2.warpAffine(img, mat, (h, w))
cv2.imshow('original',img)
cv2.imshow('rotated', rotimg)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

Original Image

original image

Rotated Image

rotated image

OpenCV Python - Image Threshold

在数字图像处理中,阈值处理是一个基于像素强度阈值创建二值图像的过程。阈值处理过程将前景像素与背景像素分离开来。

In digital image processing, the thresholding is a process of creating a binary image based on a threshold value of pixel intensity. Thresholding process separates the foreground pixels from background pixels.

OpenCV 提供了执行 simple, adaptiveOtsu’s 阈值处理的函数。

OpenCV provides functions to perform simple, adaptive and Otsu’s thresholding.

在简单阈值化中,所有值小于阈值的像素都被设置为零,其余的则被设置为最大像素值。这是最简单的阈值化形式。

In simple thresholding, all pixels with value less than threshold are set to zero, rest to the maximum pixel value. This is the simplest form of thresholding.

cv2.threshold() 函数具有以下定义。

The cv2.threshold() function has the following definition.

cv2.threshold(src, thresh, maxval, type, dst)

Parameters

图像阈值化的参数如下:

The parameters for the image thresholding are as follows −

  1. src − Input array.

  2. dst − Output array of the same size.

  3. thresh − Threshold value.

  4. maxval − Maximum value.

  5. type − Thresholding type.

Types of Thresholding

其他类型的阈值化如下所示:

Other types of thresholding are enumerated as below −

  1. cv.THRESH_BINARY − dst(x,y) = maxval if src(x,y) > thresh; 0 otherwise.

  2. cv.THRESH_BINARY_INV − dst(x,y) = 0 if src(x,y) > thresh; maxval otherwise.

  3. cv.THRESH_TRUNC − dst(x,y) = thresh if src(x,y) > thresh; src(x,y) otherwise.

  4. cv.THRESH_TOZERO − dst(x,y) = src(x,y) if src(x,y) > thresh; 0 otherwise.

  5. cv.THRESH_TOZERO_INV − dst(x,y) = 0 if src(x,y) > thresh; src(x,y) otherwise.

这些阈值类型根据以下图表对输入图像执行操作:

These threshold types result in operation on input image according to following diagram −

threshold

threshold() 函数返回所使用的阈值和经过阈值处理的图像。

The threshold() function returns the threshold value used and the thresholded image.

以下程序通过将阈值设为 127,从原始图像生成一个从 255 到 0 具有灰色值渐变的二进制图像。

Following program produces a binary image from the original with a gradient of grey values from 255 to 0 by setting a threshold to 127.

Example

最初的和产生的阈值二进制图像使用 Matplotlib 库并排绘制。

Original and resultant threshold binary images are plotted side by side using Matplotlib library.

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('gradient.png',0)
ret,img1 = cv.threshold(img,127,255,cv.THRESH_BINARY)

plt.subplot(2,3,1),plt.imshow(img,'gray',vmin=0,vmax=255)
plt.title('Original')
plt.subplot(2,3,2),plt.imshow(img1,'gray',vmin=0,vmax=255)
plt.title('Binary')
plt.show()

Output

threshold binary

自适应阈值化根据其周围的一个小区域确定像素的阈值。因此,得到了同一图像中不同区域的不同阈值。这为照明不同的图像提供了更好的结果。

The adaptive thresholding determines the threshold for a pixel based on a small region around it. So, different thresholds for different regions of the same image are obtained. This gives better results for images with varying illumination.

cv2.adaptiveThreshold() 方法采用以下输入参数:

The cv2.adaptiveThreshold() method takes following input arguments −

cv.adaptiveThreshold( src, maxValue, adaptiveMethod, thresholdType, blockSize, C[, dst] )

adaptiveMethod 具有以下枚举值:

The adaptiveMethod has following enumerated values −

  1. cv.ADAPTIVE_THRESH_MEAN_C − The threshold value is the mean of the neighbourhood area minus the constant C.

  2. cv.ADAPTIVE_THRESH_GAUSSIAN_C − The threshold value is a Gaussian-weighted sum of the neighbourhood values minus the constant C.

Example

在下面的示例中,对原始图像 (messi.jpg) 应用均值和高斯自适应阈值化。

In the example below, the original image (messi.jpg) is processed with mean and Gaussian adaptive thresholding.

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('messi.jpg',0)
img = cv.medianBlur(img,5)
th1 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
   cv.THRESH_BINARY,11,2)
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
   cv.THRESH_BINARY,11,2)
titles = ['Original', 'Mean Thresholding', 'Gaussian Thresholding']
images = [img, th1, th2]
for i in range(3):
   plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
   plt.title(titles[i])
   plt.xticks([]),plt.yticks([])
plt.show()

Output

原始图像和自适应阈值二值图像使用 matplotlib 绘制,如下所示 −

Original as well as adaptive threshold binary images are plotted by using matplotlib as shown below −

adaptive threshold binary

Example

OTSU 算法从图像直方图中自动确定阈值。除了 THRESH_BINARY 标志外,我们还需要传递 cv.THRESH_OTSU 标志。

The OTSU algorithm determines the threshold value automatically from the image histogram. We need to pass the cv.THRESH_OTSU flag in addition to the THRESH_BINARY flag.

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('messi.jpg',0)
# global thresholding
ret1,img1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
# Otsu's thresholding
ret2,img2 = cv.threshold(img,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
plt.subplot(2,2,1),plt.imshow(img,'gray',vmin=0,vmax=255)
plt.title('Original')
plt.subplot(2,2,2),plt.imshow(img1,'gray')

plt.title('Binary')
plt.subplot(2,2,3),plt.imshow(img2,'gray')
plt.title('OTSU')
plt.show()

Output

matplotlib 的绘图结果如下所示 −

The matplotlib’s plot result is as follows −

image histogram

OpenCV Python - Image Filtering

图像本质上是一个像素矩阵,其像素值为 0 到 255 之间的灰度值。彩色图像则是一个三维矩阵,具有对应于 RGB 的多个通道。

An image is basically a matrix of pixels whose values range from 0 to 255, corresponding to gray levels. A color image is a three dimensional matrix with a number of channels corresponding to RGB.

图像滤波是对像素值进行平均的过程,目的是改变原始图像的色调、亮度、对比度等。

Image filtering is a process of averaging the pixel values so as to alter the shade, brightness, contrast etc. of the original image.

通过应用低通滤波器,我们可以去除图像中的任何噪声。高通滤波器有助于检测边缘。

By applying a low pass filter, we can remove any noise in the image. High pass filters help in detecting the edges.

OpenCV 库提供 cv2.filter2D() 函数。它通过一个大小为 3X3 或 5X5 等的正方形矩阵内核对原始图像进行卷积。

The OpenCV library provides cv2.filter2D() function. It performs convolution of the original image by a kernel of a square matrix of size 3X3 or 5X5 etc.

卷积将内核矩阵在图像矩阵上水平和垂直滑动。在每个位置,对内核下方的所有像素求和并取平均值,然后用该平均值替换中心像素。

Convolution slides a kernel matrix across the image matrix horizontally and vertically. For each placement, the pixels under the kernel are summed, their average is computed, and the central pixel is replaced with that average value.

对所有像素执行此操作以获取输出图像像素矩阵。参考如下给出的图表 −

Perform this operation for all pixels to obtain the output image pixel matrix. Refer the diagram given below −

pixel matrix

cv2.filter2D() 函数需要输入数组、内核矩阵和输出数组参数。

The cv2.filter2D() function requires input array, kernel matrix and output array parameters.

Example

下图使用此函数获得二维卷积的平均图像结果。相关的程序如下 −

Following figure uses this function to obtain an averaged image as a result of 2D convolution. The program for the same is as follows −

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('opencv_logo_gs.png')
kernel = np.ones((3,3),np.float32)/9
dst = cv.filter2D(img,-1,kernel)
plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst),plt.title('Convolved')
plt.xticks([]), plt.yticks([])
plt.show()

Output

pixel matrixs

Types of Filtering Function

OpenCV 中的其他类型的滤波函数包括(列表之后给出一个简短的用法示例)−

Other types of filtering functions in OpenCV include the following (a brief usage sketch is given after the list) −

  1. BilateralFilter − Reduces unwanted noise keeping edges intact.

  2. BoxFilter − This is an average blurring operation.

  3. GaussianBlur − Eliminates high frequency content such as noise and edges.

  4. MedianBlur − Instead of average, it takes the median of all pixels under the kernel and replaces the central value.
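
As noted above, here is a brief usage sketch of these filters; it is only an illustration, assuming the same opencv_logo_gs.png input used in the earlier example −

import cv2 as cv
img = cv.imread('opencv_logo_gs.png')
blur_box = cv.blur(img, (5, 5))                      # simple box (average) blur
blur_gauss = cv.GaussianBlur(img, (5, 5), 0)         # Gaussian blur; sigma derived from kernel size
blur_median = cv.medianBlur(img, 5)                  # median of each 5x5 neighbourhood
blur_bilateral = cv.bilateralFilter(img, 9, 75, 75)  # edge-preserving noise reduction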

OpenCV Python - Edge Detection

此处的边缘表示图像中对象的边界。OpenCV 有一个 cv2.Canny() 函数,通过实现 Canny 算法识别图像中各种对象的边缘。

An edge here means the boundary of an object in the image. OpenCV has a cv2.Canny() function that identifies the edges of various objects in an image by implementing Canny’s algorithm.

Canny 边缘检测算法由 John Canny 发明。据此,对象的边缘通过执行以下步骤确定 −

Canny edge detection algorithm was developed by John Canny. According to it, object’s edges are determined by performing following steps −

第一步是减少图像中的噪声像素。这通过应用 5X5 高斯滤波器来完成。

First step is to reduce the noisy pixels in the image. This is done by applying 5X5 Gaussian Filter.

第二步涉及查找图像的强度梯度。通过应用 Sobel 算子来滤波第一阶段的平滑图像,以获得水平和垂直方向上的 一 阶导数 (Gx 和 Gy)。

Second step involves finding the intensity gradient of the image. The smooth image of the first stage is filtered by applying the Sobel operator to obtain first order derivatives in horizontal and vertical directions (Gx and Gy).

由这两个一阶导数可以得到边缘梯度的幅值,而导数之比的反正切给出边缘的方向。

The square root of the sum of their squares gives the edge gradient magnitude, and the inverse tangent of the ratio of the derivatives gives the direction of the edge.

\mathrm{边缘\:梯度\:G\:=\:\sqrt{G_x^2+G_y^2}}

\mathrm{Edge\:gradient\:G\:=\:\sqrt{G_x^2+G_y^2}}

\mathrm{角度\:\theta\:=\:\tan^{-1}(\frac{G_{y}}{G_{x}})}

\mathrm{Angle\:\theta\:=\:\tan^{-1}(\frac{G_{y}}{G_{x}})}

在获得梯度大小和方向之后,对图像进行全面扫描以移除任何可能不构成边缘的意外像素。

After getting gradient magnitude and direction, a full scan of the image is done to remove any unwanted pixels which may not constitute the edge.

下一步是使用 minval 和 maxval 两个阈值执行迟滞阈值处理。低于 minval 的强度梯度被视为非边缘并被丢弃,高于 maxval 的被视为确定的边缘;介于两者之间的则根据其连通性被判定为边缘点或非边缘。

Next step is to perform hysteresis thresholding by using the minval and maxval thresholds. Intensity gradients below minval are non-edges and are discarded, while those above maxval are sure edges. Those in between are treated as edge points or non-edges based on their connectivity.

所有这些步骤都通过 OpenCV 的 cv2.Canny() 函数执行,该函数需要输入图像数组和最小值和最大值参数。

All these steps are performed by OpenCV’s cv2.Canny() function which needs the input image array and minval and maxval parameters.

Example

以下是 Canny 边缘检测的示例。程序如下所示:

Here’s the example of canny edge detection. The program for the same is as follows −

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('lena.jpg', 0)
edges = cv.Canny(img,100,200)
plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edges of original Image'), plt.xticks([]), plt.yticks([])
plt.show()

Output

canny edge

OpenCV Python - Histogram

直方图显示了图像中的强度分布。它在 X 轴上绘制像素值(0 至 255),在 Y 轴上绘制像素数。

Histogram shows the intensity distribution in an image. It plots the pixel values (0 to 255) on X axis and number of pixels on Y axis.

通过使用直方图,可以理解指定图像的对比度、亮度和强度分布。直方图中的柱状条表示 X 轴上值的增量部分。

By using histogram, one can understand the contrast, brightness and intensity distribution of the specified image. The bins in a histogram represent incremental parts of the values on X axis.

在我们的案例中,它是像素值,默认柱状条大小为 1。

In our case, it is the pixel value and the default bin size is one.

在 OpenCV 库中,cv2.calcHist() 函数根据输入图像计算直方图。该函数的命令如下 −

In the OpenCV library, the cv2.calcHist() function computes the histogram of the input image. The command for the function is as follows −

cv.calcHist(images, channels, mask, histSize, ranges)

Parameters

函数 cv2.calcHist() 的参数如下 −

The cv2.calcHist() function’s parameters are as follows −

  1. images − It is the source image of type uint8 or float32, in square brackets, i.e., "[img]".

  2. channels − It is the index of the channel for which we calculate histogram. For a grayscale image, its value is [0]. For BGR images, you can pass [0], [1] or [2] to calculate the histogram of each channel.

  3. mask − Mask image is given as "None" for the full image. For a particular region of the image, you have to create a mask image for that and give it as the mask (a short sketch of this is given after the list).

  4. histSize − This represents our BIN count.

  5. ranges − Normally, it is [0,256].
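
As mentioned in item 3, restricting the histogram to a region requires a mask image; the following is a small sketch of that idea, in which the file name and region are assumptions chosen only for illustration −

import cv2 as cv
import numpy as np

img = cv.imread('lena.jpg', 0)
# white (255) pixels in the mask mark the region whose histogram is wanted
mask = np.zeros(img.shape[:2], np.uint8)
mask[100:300, 100:400] = 255
hist_mask = cv.calcHist([img], [0], mask, [256], [0, 256])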

Histogram using Matplotlib

直方图可以通过 Matplotlib 的 pyplot.plot() 函数绘制,也可以通过调用 OpenCV 库中的 polylines() 函数绘制。

A histogram plot can be obtained either with the help of Matplotlib’s pyplot.plot() function or by calling the polylines() function from the OpenCV library.

Example

通过以下程序计算图像的每个通道的直方图(lena.jpg),并绘制每个通道的强度分布 −

Following program computes histogram for each channel in the image (lena.jpg) and plots the intensity distribution for each channel −

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('lena.jpg')
color = ('b','g','r')
for i,col in enumerate(color):
   hist = cv.calcHist([img],[i],None,[256],[0,256])
   plt.plot(hist, color = col)
   plt.xlim([0,256])
plt.show()

Output

histogram

OpenCV Python - Color Spaces

色彩空间是一个数学模型,描述了如何表示颜色。它在特定的、可测量的、固定的可能颜色和亮度值范围内描述了颜色。

A color space is a mathematical model describing how colours can be represented. It is described in a specific, measurable, and fixed range of possible colors and luminance values.

OpenCV 支持以下知名的色彩空间 −

OpenCV supports following well known color spaces −

  1. RGB Color space − It is an additive color space. A color value is obtained by combination of red, green and blue colour values. Each is represented by a number ranging between 0 to 255.

  2. HSV color space − H, S and V stand for Hue, Saturation and Value. This is an alternative color model to RGB. This model is supposed to be closer to the way a human eye perceives any colour. Hue value is between 0 to 179, whereas S and V numbers are between 0 to 255.

  3. CMYK color space − In contrast to RGB, CMYK is a subtractive color model. The alphabets stand for Cyan, Magenta, Yellow and Black. White light minus red leaves cyan, green subtracted from white leaves magenta, and white minus blue returns yellow. All the values are represented on the scale of 0 to 100 %.

  4. CIELAB color space − The LAB color space has three components which are L for lightness, A color components ranging from Green to Magenta and B for components from Blue to Yellow.

  5. YCrCb color space − Here, Cr stands for R-Y and Cb stands for B-Y. This helps in separation of luminance from chrominance into different channels.

OpenCV 支持使用 cv2.cvtColor() 函数在色域之间转换图像。

OpenCV supports conversion of image between color spaces with the help of cv2.cvtColor() function.

cv2.cvtColor() 函数的命令如下 −

The command for the cv2.cvtColor() function is as follows −

cv.cvtColor(src, code, dst)

Conversion Codes

转换受以下预定义的转换代码控制。

The conversion is governed by following predefined conversion codes.

  1. cv.COLOR_BGR2BGRA − Add alpha channel to RGB or BGR image.

  2. cv.COLOR_BGRA2BGR − Remove alpha channel from RGB or BGR image.

  3. cv.COLOR_BGR2GRAY − Convert between RGB/BGR and grayscale.

  4. cv.COLOR_BGR2YCrCb − Convert RGB/BGR to luma-chroma.

  5. cv.COLOR_BGR2HSV − Convert RGB/BGR to HSV.

  6. cv.COLOR_BGR2Lab − Convert RGB/BGR to CIE Lab.

  7. cv.COLOR_HSV2BGR − Backward conversion from HSV to RGB/BGR.

Example

以下程序显示了将原始图像从 RGB 色彩空间转换为 HSV 和 Gray 方案的过程 −

Following program shows the conversion of original image with RGB color space to HSV and Gray schemes −

import cv2
img = cv2.imread('messi.jpg')
img1 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY )
img2 = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Displaying the images
cv2.imshow('original', img)
cv2.imshow('Gray', img1)
cv2.imshow('HSV', img2)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

rgb color space
gray color space
hsv color space

OpenCV Python - Morphological Transformations

基于形状对图像进行的简单操作称为形态学变换。最常见的两种变换是腐蚀(erosion)和膨胀(dilation)。

Simple operations on an image based on its shape are termed as morphological transformations. The two most common transformations are erosion and dilation.

Erosion

腐蚀会消除前景对象的边界。与 2D 卷积类似,内核在图像上滑动。只有当内核下的所有像素均为 1 时,原始图像中的该像素才会被保留。

Erosion gets rid of the boundaries of the foreground object. Similar to 2D convolution, a kernel slides across the image. A pixel in the original image is retained only if all the pixels under the kernel are 1.

否则将其变为 0,从而导致腐蚀。丢弃所有边界附近的像素。此过程对去除白噪声很有用。

Otherwise it is made 0 and thus, it causes erosion. All the pixels near the boundary are discarded. This process is useful for removing white noises.

OpenCV 中 erode() 函数的命令如下 −

The command for the erode() function in OpenCV is as follows −

cv.erode(src, kernel, dst, anchor, iterations)

Parameters

OpenCV 中的 erode() 函数使用以下参数 −

The erode() function in OpenCV uses following parameters −

src 和 dst 参数是大小相同的输入和输出图像数组。Kernel 是用于腐蚀的结构化元素矩阵。例如,3X3 或 5X5。

The src and dst parameters are input and output image arrays of the same size. Kernel is a matrix of structuring elements used for erosion. For example, 3X3 or 5X5.

anchor 参数默认为 -1,这意味着锚元素位于中心。Iterations 指腐蚀应用的次数。

The anchor parameter is -1 by default which means the anchor element is at center. Iterations refers to the number of times erosion is applied.

Dilation

它与腐蚀正好相反。此处,如果内核下的至少一个像素为 1,则像素元素为 1。因此,它增加了图像中的白色区域。

It is just the opposite of erosion. Here, a pixel element is 1, if at least one pixel under the kernel is 1. As a result, it increases the white region in the image.

dilate() 函数的命令如下 −

The command for the dilate() function is as follows −

cv.dilate(src, kernel, dst, anchor, iterations)

Parameters

dilate() 函数具有与 erode() 函数相同参数。这两个函数可以有 BorderType 和 borderValue 这两个其他可选参数。

The dilate() function has the same parameters such as that of erode() function. Both functions can have additional optional parameters as BorderType and borderValue.

BorderType 是图像边界类型的枚举(CONSTANT、REPLICATE、TRANSPARENT 等)。

BorderType is an enumerated type of image boundaries (CONSTANT, REPLICATE, TRANSPARENT etc.)

borderValue 用于恒定边界的情况。默认情况下,它为 0。

borderValue is used in case of a constant border. By default, it is 0.

Example

下面给出了一个示例程序显示 erode() 和 dilate() 函数的使用 −

Given below is an example program showing erode() and dilate() functions in use −

import cv2 as cv
import numpy as np
img = cv.imread('LinuxLogo.jpg',0)
kernel = np.ones((5,5),np.uint8)
erosion = cv.erode(img,kernel,iterations = 1)
dilation = cv.dilate(img,kernel,iterations = 1)
cv.imshow('Original', img)
cv.imshow('Erosion', erosion)
cv.imshow('Dilation', dilation)
cv.waitKey(0)
cv.destroyAllWindows()

Output

Original Image

morphological

Erosion

erosion

Dilation

dilation

OpenCV Python - Image Contours

轮廓是一条连接所有沿边界具有相同颜色或强度的连续点的曲线。轮廓非常适用于形状分析和对象检测。

Contour is a curve joining all the continuous points along the boundary having the same color or intensity. The contours are very useful for shape analysis and object detection.

Find Contour

在找到轮廓之前,我们应该应用阈值或 Canny 边缘检测。然后,通过使用 findContours() 方法,我们可以在二进制图像中找到轮廓。

Before finding contours, we should apply threshold or canny edge detection. Then, by using findContours() method, we can find the contours in the binary image.

使用 findContours() 函数的命令如下 −

The command for the usage of the findContours() function is as follows −

cv.findContours(image, mode, method, contours)

Parameters

findContours() 函数的参数如下 −

The parameters of the findContours() function are as follows −

  1. image − Source, an 8-bit single-channel image.

  2. mode − Contour retrieval mode.

  3. method − Contour approximation method.

mode 参数的值枚举如下 −

The mode parameter’s values are enumerated as follows −

  1. cv.RETR_EXTERNAL − Retrieves only the extreme outer contours.

  2. cv.RETR_LIST − Retrieves all of the contours without establishing any hierarchical relationships.

  3. cv.RETR_CCOMP − Retrieves all of the contours and organizes them into a two-level hierarchy.

  4. cv.RETR_TREE − Retrieves all of the contours and reconstructs a full hierarchy of nested contours.

另一方面,近似方法可以是从以下方法中选取一个 −

On the other hand, approximation method can be one from the following −

  1. cv.CHAIN_APPROX_NONE − Stores absolutely all the contour points.

  2. cv.CHAIN_APPROX_SIMPLE − Compresses horizontal, vertical, and diagonal segments and leaves only their end points.

Draw Contour

在检测到轮廓矢量后,使用 cv.drawContours() 函数在原始图像上绘制轮廓。

After detecting the contour vectors, contours are drawn over the original image by using the cv.drawContours() function.

cv.drawContours()函数的命令如下 −

The command for the cv.drawContours() function is as follows −

cv.drawContours(image, contours, contourIdx, color)

Parameters

drawContours() 函数的参数如下 −

The parameters of the drawContours() function are as follows −

  1. image − Destination image.

  2. contours − All the input contours. Each contour is stored as a point vector.

  3. contourIdx − Parameter indicating a contour to draw. If it is negative, all the contours are drawn.

  4. color − Color of the contours.

Example

以下代码示例中绘制轮廓的输入图像有三个填充有黑色颜色的图形。

Following code is an example of drawing contours on an input image having three shapes filled with black colours.

第一步,我们获取一个灰度图像,然后执行 Canny 边缘检测。

In the first step, we obtain a gray image and then perform the canny edge detection.

然后,我们在生成图像上调用 findContours() 函数。其结果是点矢量。然后,我们调用 drawContours() 函数。

On the resultant image, we then call findContours() function. Its result is a point vector. We then call the drawContours() function.

完整代码如下 −

The complete code is as below −

import cv2
import numpy as np

img = cv2.imread('shapes.png')
cv2.imshow('Original', img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

canny = cv2.Canny(gray, 30, 200)

contours, hierarchy = cv2.findContours(canny,
   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
print("Number of Contours = " ,len(contours))
cv2.imshow('Canny Edges', canny)

cv2.drawContours(img, contours, -1, (0, 255, 0), 3)

cv2.imshow('Contours', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

原始图像在 Canny 边缘检测之后,以及绘制轮廓图像将显示在单独的窗口中,如下所示 −

The original image, after canny edge detection and one with contours drawn will be displayed in separate windows as shown here −

separate windows

执行 canny edge detection 操作后,图像如下 −

After the canny edge detection, the image will be as follows −

canny edge detection

contours are drawn 后,图像如下所示 −

After the contours are drawn, the image will be as follows −

contours drawn

OpenCV Python - Template Matching

模板匹配技术用于检测图像中与样本或模板图像匹配的一个或多个区域。

The technique of template matching is used to detect one or more areas in an image that matches with a sample or template image.

OpenCV 中为此定义了 cv.matchTemplate() 函数,其命令如下所示:

The cv.matchTemplate() function in OpenCV is defined for this purpose and the command for the same is as follows:

cv.matchTemplate(image, templ, method)

其中 image 为输入图像,其中要查找 templ(模板)模式。method 参数采用以下值之一 −

Where image is the input image in which the templ (template) pattern is to be located. The method parameter takes one of the following values −

  1. cv.TM_CCOEFF

  2. cv.TM_CCOEFF_NORMED

  3. cv.TM_CCORR

  4. cv.TM_CCORR_NORMED

  5. cv.TM_SQDIFF

  6. cv.TM_SQDIFF_NORMED

此方法使模板图像在输入图像上滑动。这与卷积的过程相似,并将输入图像下的模板和补丁与模板图像进行比较。

This method slides the template image over the input image. This is a similar process to convolution and compares the template and patch of input image under the template image.

它返回一个灰度图像,其中每个像素都表示它与模板的匹配程度。如果输入图像大小为 (WxH),并且模板图像大小为 (wxh),则输出图像的大小将为 (W-w+1, H-h+1)。因此,该矩形是模板的区域。

It returns a grayscale image, whose each pixel denotes how much it matches with the template. If the input image is of size (WxH) and template image is of size (wxh), the output image will have a size of (W-w+1, H-h+1). Hence, that rectangle is your region of template.
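
To locate only the single best match rather than every location above a threshold, cv2.minMaxLoc() can be applied to the result of matchTemplate(). The following is a self-contained sketch with synthetic data, not part of the original example −

import cv2
import numpy as np

# synthetic data: a random image and a template cut out of it at (x=30, y=40)
image = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
template = image[40:60, 30:50].copy()

result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
# minMaxLoc returns the global extrema of the result map and their locations
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
top_left = max_loc       # best match for the TM_CCOEFF*/TM_CCORR* methods
# (for TM_SQDIFF and TM_SQDIFF_NORMED, min_loc marks the best match instead)
print(top_left)          # expected to print (30, 40)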

Example

在下面的示例中,将印度板球队队员 Virat Kohli 的脸部图像用作模板,与另一张描绘他和另一位印度板球队队员 M.S.Dhoni 合影的图像进行匹配。

In an example below, an image having Indian cricketer Virat Kohli’s face is used as a template to be matched with another image which depicts his photograph with another Indian cricketer M.S.Dhoni.

以下程序使用 80% 的阈值并围绕匹配的脸部绘制一个矩形 −

Following program uses a threshold value of 80% and draws a rectangle around the matching face −

import cv2
import numpy as np

img = cv2.imread('Dhoni-and-Virat.jpg',1)
cv2.imshow('Original',img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

template = cv2.imread('virat.jpg',0)
cv2.imshow('Template',template)
h, w = template.shape[0], template.shape[1]

matched = cv2.matchTemplate(gray,template,cv2.TM_CCOEFF_NORMED)
threshold = 0.8

loc = np.where( matched >= threshold)

for pt in zip(*loc[::-1]):
   cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0,255,255), 2)

cv2.imshow('Matched with Template',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

结果的原图、模板和匹配图像如下 −

The original image, the template and the matched image of the result are as follows −

Original image

template matching

template 如下所示 −

The template is as follows −

templates

matched with template 时,图像如下 −

The image when matched with template is as follows −

matched templates

OpenCV Python - Image Pyramids

有时,我们可能需要将图像转换为不同于其原始大小的大小。为此,您可以将图像放大(放大)或缩小(缩小)。

Occasionally, we may need to convert an image to a size different than its original. For this, you either Upsize the image (zoom in) or Downsize it (zoom out).

图像金字塔是一组图像(由单一原始图像构建),连续向下采样指定次数。

An image pyramid is a collection of images (constructed from a single original image) successively down sampled a specified number of times.

高斯金字塔用于对图像进行下采样,而拉普拉斯金字塔则使用分辨率较低的图像金字塔中的一张图像重建一张上采样的图像。

The Gaussian pyramid is used to down sample images while the Laplacian pyramid reconstructs an up sampled image from an image lower in the pyramid with less resolution.

将金字塔视为一组图层。图像如下所示 −

Consider the pyramid as a set of layers. The image is shown below −

pyramid

金字塔较高层中的图像尺寸较小。要制作高斯金字塔中下一层的图像,我们将较低水平的图像与高斯核进行卷积。

Image at the higher layer of the pyramid is smaller in size. To produce an image at the next layer in the Gaussian pyramid, we convolve a lower level image with a Gaussian kernel.

\frac{1}{256}\begin{bmatrix}1 & 4 & 6 & 4 & 1 \\4 & 16 & 24 & 16 & 4 \\6 & 24 & 36 & 24 & 6 \\4 & 16 & 24 & 16 & 4 \\1 & 4 & 6 & 4 & 1\end{bmatrix}

现在移除所有偶数行的行和列。得到的图像将是其前身面积的 1/4。对原始图像进行此过程的迭代就会产生整个金字塔。

Now remove every even-numbered row and column. Resulting image will be 1/4th the area of its predecessor. Iterating this process on the original image produces the entire pyramid.

为了将图像放大,首先把图像在每个维度上放大到原来的两倍,新增的偶数行和列用零填充,然后用该内核进行卷积,以近似缺失像素的值。

To make the image bigger, it is first upsized to double the original in each dimension, with the new even rows and columns filled with zeros, and then a convolution is performed with the kernel to approximate the values of the missing pixels.

cv.pyrUp() 函数会使原始尺寸加倍; cv.pyrDown() 函数会使原始尺寸减半。

The cv.pyrUp() function doubles the original size and cv.pyrDown() function decreases it to half.

Example

以下程序根据用户输入 'i' 或 'o',分别调用 pyrUp() 和 pyrDown() 函数。

Following program calls the pyrUp() and pyrDown() functions depending on user input 'i' or 'o' respectively.

请注意,当我们减小图像尺寸时,图像的信息就会丢失。一旦缩小尺寸,再重新缩放回原始尺寸时,我们将丢失部分信息,且新图像的分辨率远低于原始图像。

Note that when we reduce the size of an image, information of the image is lost. Once, we scale down and if we rescale it to the original size, we lose some information and the resolution of the new image is much lower than the original one.

import sys
import cv2 as cv

filename = 'chicky_512.png'

src = cv.imread(filename)

while 1:
   print ("press 'i' for zoom in 'o' for zoom out esc to stop")
   rows, cols, _channels = map(int, src.shape)
   cv.imshow('Pyramids', src)
   k = cv.waitKey(0)

   if k == 27:
      break

   elif chr(k) == 'i':
      src = cv.pyrUp(src, dstsize=(2 * cols, 2 * rows))

   elif chr(k) == 'o':
      src = cv.pyrDown(src, dstsize=(cols // 2, rows // 2))

cv.destroyAllWindows()

Output

image pyramids
gaussian pyramids
laplacian pyramids

OpenCV Python - Image Addition

读取图像的图像对象本质上是一个二维或三维矩阵,具体取决于图像是否为灰度图像或 RGB 图像。

When an image is read by imread() function, the resultant image object is really a two or three dimensional matrix depending upon if the image is grayscale or RGB image.

因此, cv2.add() 函数将两个图像矩阵相加,并返回另一个图像矩阵。

Hence, the cv2.add() function adds two image matrices and returns another image matrix.

Example

以下代码读取两张图像并执行其二进制加法:

Following code reads two images and performs their binary addition −

import cv2

kalam = cv2.imread('kalam.jpg')
einst = cv2.imread('einstein.jpg')
img = cv2.add(kalam, einst)
cv2.imshow('addition', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Result

image addition
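
Note that cv2.add() performs a saturated addition, which behaves differently from plain NumPy addition; a tiny illustrative sketch with synthetic values −

import cv2
import numpy as np

x = np.uint8([250])
y = np.uint8([10])
print(cv2.add(x, y))   # [[255]] - OpenCV saturates at 255
print(x + y)           # [4]     - NumPy wraps around (260 % 256)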

OpenCV 有一个 addWeighted() 函数来执行两个数组的加权和,而不是线性二进制加法。对应的命令如下:

Instead of a linear binary addition, OpenCV has an addWeighted() function that performs a weighted sum of two arrays. The command for the same is as follows −

cv2.addWeighted(src1, alpha, src2, beta, gamma)

Parameters

addWeighted() 函数的参数如下:

The parameters of the addWeighted() function are as follows −

  1. src1 − First input array.

  2. alpha − Weight of the first array elements.

  3. src2 − Second input array of the same size and channel number as first

  4. beta − Weight of the second array elements.

  5. gamma − Scalar added to each sum.

此函数根据以下方程式将图像相加:

This function adds the images as per following equation −

\mathrm{g(x)=(1-\alpha)f_{0}(x)+\alpha f_{1}(x)}

在上述示例中获得的图像矩阵用于执行加权和。

The image matrices obtained in the above example are used to perform weighted sum.

通过将 α 从 0 变化到 1,可以平滑地从一张图像过渡到另一张图像,使它们融合在一起。

By varying α from 0 to 1, a smooth transition takes place from one image to another, so that they blend together.

第一张图像的权重为 0.3,第二张图像的权重为 0.7。gamma 因子设为 0。

First image is given a weight of 0.3 and the second image is given 0.7. The gamma factor is taken as 0.

addWeighted() 函数的命令如下:

The command for addWeighted() function is as follows −

img = cv2.addWeighted(kalam, 0.3, einst, 0.7, 0)

可以看出,与二进制加法相比,图像加法更加平滑。

It can be seen that the image addition is smoother compared to binary addition.

gamma factor

OpenCV Python - Image Blending with Pyramids

通过使用图像金字塔可以最大程度减少图像的不连续性。这将产生无缝融合的图像。

The discontinuity of images can be minimised by the use of image pyramids. This results in a seamless blended image.

采取以下步骤来实现最终结果 −

Following steps are taken to achieve the final result −

首先加载图像并为两幅图像寻找高斯金字塔。以下是执行此操作的程序 -

First load the images and find Gaussian pyramids for both. The program for the same is as follows −

import cv2
import numpy as np,sys

kalam = cv2.imread('kalam.jpg')
einst = cv2.imread('einstein.jpg')
### generate Gaussian pyramid for first
G = kalam.copy()
gpk = [G]
for i in range(6):
   G = cv2.pyrDown(G)
   gpk.append(G)
# generate Gaussian pyramid for second
G = einst.copy()
gpe = [G]
for i in range(6):
   G = cv2.pyrDown(G)
   gpe.append(G)

从高斯金字塔获取相应的拉普拉斯金字塔。以下是执行此操作的程序 -

From the Gaussian pyramids, obtain the respective Laplacian Pyramids. The program for the same is as follows −

# generate Laplacian Pyramid for first
lpk = [gpk[5]]
for i in range(5,0,-1):
   GE = cv2.pyrUp(gpk[i])
   L = cv2.subtract(gpk[i-1],GE)
   lpk.append(L)

# generate Laplacian Pyramid for second
lpe = [gpe[5]]
for i in range(5,0,-1):
   GE = cv2.pyrUp(gpe[i])
   L = cv2.subtract(gpe[i-1],GE)
   lpe.append(L)

然后,在金字塔中的每个层中将第一张图像的左半部分与第二张图像的右半部分结合在一起。因此,该程序如下所示 −

Then, join the left half of the first image with the right half of second in each level of pyramids. The program for the same is as follows −

# Now add left and right halves of images in each level
LS = []
for la,lb in zip(lpk,lpe):
   rows,cols,dpt = la.shape
   ls = np.hstack((la[:,0:int(cols/2)], lb[:,int(cols/2):]))
   LS.append(ls)

最后,从这个联合金字塔中重建图像。因此,该程序如下所示 −

Finally, reconstruct the image from this joint pyramid. The program for the same is given below −

ls_ = LS[0]
for i in range(1,6):
   ls_ = cv2.pyrUp(ls_)
   ls_ = cv2.add(ls_, LS[i])
   cv2.imshow('RESULT',ls_)

Output

混合后的结果应如下所示 −

The blended result should be as follows −

blending pyramids

OpenCV Python - Fourier Transform

傅里叶变换通过将其分解为正弦分量和余弦分量来将图像从其空间域转换到其频域。

The Fourier Transform is used to transform an image from its spatial domain to its frequency domain by decomposing it into its sine and cosine components.

在数字图像的情况下,基本的灰度图像值通常在 0 到 255 之间。因此,傅里叶变换也需要是 Discrete Fourier Transform (DFT) 。它用于找到频域。

In the case of digital images, the values of a basic gray scale image usually lie between 0 and 255. Therefore, the Fourier Transform too needs to be a Discrete Fourier Transform (DFT). It is used to find the frequency domain.

从数学上讲,二维图像的傅里叶变换表示如下 −

Mathematically, Fourier Transform of a two dimensional image is represented as follows −

\mathrm{F(k,l)=\displaystyle\sum\limits_{i=0}^{N-1}\: \displaystyle\sum\limits_{j=0}^{N-1} f(i,j)\:e^{-i2\pi (\frac{ki}{N}+\frac{lj}{N})}}

如果幅值在短时间内变化很快,你可以说它是一个高频信号。如果变化缓慢,则它是一个低频信号。

If the amplitude varies so fast in a short time, you can say it is a high frequency signal. If it varies slowly, it is a low frequency signal.

在图像的情况下,幅值在边缘点或噪声处发生急剧变化。因此,边缘和噪声是图像中的高频内容。如果幅值没有太大变化,则它是一个低频分量。

In case of images, the amplitude varies drastically at the edge points, or noises. So edges and noises are high frequency contents in an image. If there are no much changes in amplitude, it is a low frequency component.

OpenCV 为此目的提供了 cv.dft()cv.idft() 函数。

OpenCV provides the functions cv.dft() and cv.idft() for this purpose.

cv.dft() 执行一维或二维浮点数组的离散傅里叶变换。命令如下所示 −

cv.dft() performs a Discrete Fourier transform of a 1D or 2D floating-point array. The command for the same is as follows −

cv.dft(src, dst, flags)

在此,

Here,

  1. src − Input array that could be real or complex.

  2. dst − Output array whose size and type depends on the flags.

  3. flags − Transformation flags, representing a combination of the DftFlags.

cv.idft() 计算一维或二维数组的离散傅里叶逆变换。命令如下所示 −

cv.idft() calculates the inverse Discrete Fourier Transform of a 1D or 2D array. The command for the same is as follows −

cv.idft(src, dst, flags)

为了获得离散傅里叶变换,将输入图像转换为 np.float32 数据类型。然后使用获得的变换将零频率分量移到频谱的中心,从中计算幅度谱。

In order to obtain a discrete fourier transform, the input image is converted to np.float32 datatype. The transform obtained is then used to Shift the zero-frequency component to the center of the spectrum, from which magnitude spectrum is calculated.

Example

下面是使用 Matplotlib 的程序,我们绘制原始图像和幅度谱 −

Given below is the program using Matplotlib, we plot the original image and magnitude spectrum −

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# read the input image in grayscale mode
img = cv.imread('lena.jpg',0)
# forward DFT with a two-channel (real, imaginary) output
dft = cv.dft(np.float32(img),flags = cv.DFT_COMPLEX_OUTPUT)
# shift the zero-frequency component to the centre of the spectrum
dft_shift = np.fft.fftshift(dft)
# magnitude spectrum on a logarithmic scale
magnitude_spectrum = 20*np.log(cv.magnitude(dft_shift[:,:,0],dft_shift[:,:,1]))
plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()

Output

fourier transform
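Edges and noise were described above as the high-frequency content of an image. The following hedged follow-up sketch (reusing the same assumed lena.jpg) zeroes the low frequencies around the centre of the shifted spectrum and inverts the transform with cv.idft(); mostly edge-like structure should remain −

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('lena.jpg', 0)
dft = cv.dft(np.float32(img), flags=cv.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft)

# suppress a 60x60 block of low frequencies around the centre
rows, cols = img.shape
crow, ccol = rows // 2, cols // 2
dft_shift[crow-30:crow+30, ccol-30:ccol+30] = 0

# undo the shift and invert the transform
f_ishift = np.fft.ifftshift(dft_shift)
img_back = cv.idft(f_ishift)
img_back = cv.magnitude(img_back[:, :, 0], img_back[:, :, 1])

plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(img_back, cmap='gray')
plt.title('High-pass (edges)'), plt.xticks([]), plt.yticks([])
plt.show()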

OpenCV Python - Capture Video from Camera

通过在 OpenCV 库中使用 VideoCapture() 函数,可以非常轻松地从相机在 OpenCV 窗口上实时捕捉视频流。

By using the VideoCapture() function in OpenCV library, it is very easy to capture a live stream from a camera on the OpenCV window.

此函数需要设备索引作为参数。你的电脑可能连接了多个相机。它们从内置网络摄像头开始按索引编号排列。此函数返回一个 VideoCapture 对象。

This function needs a device index as the parameter. Your computer may have multiple cameras attached. They are enumerated by an index starting from 0 for the built-in webcam. The function returns a VideoCapture object.

cam = cv.VideoCapture(0)

打开摄像头后,我们可以借助 read() 函数从摄像头连续读取帧。

After the camera is opened, we can read successive frames from it with the help of the read() function.

ret,frame = cam.read()

read() 函数读取下一个可用的帧,并返回一个值(True/False)。此帧现在呈现为 cvtColor() 函数所需的色彩空间,并在 OpenCV 窗口中显示。

The read() function reads the next available frame and returns a value (True/False) indicating whether the read succeeded. The frame can then be converted to the desired color space with the cvtColor() function and displayed in the OpenCV window.

img = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
# Display the resulting frame
cv.imshow('frame', img)

你可以使用 imwrite() 函数将当前帧捕捉到图像文件中。

To capture the current frame to an image file, you can use imwrite() function.

cv.imwrite('capture.png', img)

OpenCV 提供了 VideoWriter() 函数来将摄像头实时流保存到视频文件中。

To save the live stream from camera to a video file, OpenCV provides a VideoWriter() function.

cv.VideoWriter( filename, fourcc, fps, frameSize)

fourcc 参数是视频编解码器的标准化代码。OpenCV 支持各种编解码器,如 DIVX、XVID、MJPG、X264 等。fps 和 framesize 参数取决于视频采集设备。

The fourcc parameter is a standardized code for video codecs. OpenCV supports various codecs such as DIVX, XVID, MJPG, X264 etc. The fps and frameSize parameters depend on the video capture device.

VideoWriter() 函数返回一个 VideoWrite 流对象,捕获的帧会连续写入其中,形成一个循环。最后,释放帧和 VideoWriter 对象,以便最终完成视频的创建。

The VideoWriter() function returns a VideoWriter stream object, to which the grabbed frames are successively written in a loop. Finally, release the VideoCapture and VideoWriter objects to finalize the creation of the video.

Example

以下示例会从内置网络摄像头读取实时视频，并将其保存到 output.avi 文件中。

The following example reads the live feed from the built-in webcam and saves it to the output.avi file.

import cv2 as cv
cam = cv.VideoCapture(0)
cc = cv.VideoWriter_fourcc(*'XVID')
file = cv.VideoWriter('output.avi', cc, 15.0, (640, 480))
if not cam.isOpened():
   print("error opening camera")
   exit()
while True:
   # Capture frame-by-frame
   ret, frame = cam.read()
   # if frame is read correctly ret is True
   if not ret:
      print("error in retrieving frame")
      break
   img = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
   cv.imshow('frame', img)
   # VideoWriter expects BGR frames, so write the original frame, not the RGB copy
   file.write(frame)

   if cv.waitKey(1) == ord('q'):
      break

cam.release()
file.release()
cv.destroyAllWindows()

OpenCV Python - Play Video from File

VideoCapture() 函数也可以从视频文件中而不是摄像机中提取帧。因此,我们只需要把摄像机索引替换为在 OpenCV 窗口中播放的视频文件的名称。

The VideoCapture() function can also retrieve frames from a video file instead of a camera. Hence, we only need to replace the camera index with the name of the video file to be played in the OpenCV window.

video=cv2.VideoCapture(file)
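A minimal playback loop without audio might look like the sketch below (assuming a file named video.mp4 in the working directory) −

import cv2

video = cv2.VideoCapture("video.mp4")   # hypothetical file name
while True:
   ret, frame = video.read()
   if not ret:                          # end of file or read error
      break
   cv2.imshow("Video", frame)
   # wait ~25 ms between frames; press q to quit early
   if cv2.waitKey(25) == ord("q"):
      break

video.release()
cv2.destroyAllWindows()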

虽然这对于渲染视频文件来说已足够,但如果它带有音频,音频不会一同播放。为此,你需要安装 ffpyplayer 模块。

While this is enough to start rendering a video file, any accompanying sound will not play along. For this purpose, you need to install the ffpyplayer module.

FFPyPlayer

FFPyPlayer 是 FFmpeg 库的 Python 绑定,可用于播放和编写媒体文件。要安装,请使用 pip 安装器实用程序,并使用以下命令。

FFPyPlayer is a Python binding for the FFmpeg library for playing and writing media files. To install it, use the pip installer utility with the following command.

pip3 install ffpyplayer

该模块中 MediaPlayer 对象的 get_frame() 方法返回一个音频帧,它将与从视频文件中读取的每个帧一起播放。

The get_frame() method of the MediaPlayer object in this module returns the audio frame which will play along with each frame read from the video file.

以下是播放视频文件及其音频的完整代码 −

Following is the complete code for playing a video file along with its audio −

import cv2

from ffpyplayer.player import MediaPlayer
file="video.mp4"

video=cv2.VideoCapture(file)
player = MediaPlayer(file)
while True:
   ret, frame=video.read()
   audio_frame, val = player.get_frame()
   if not ret:
      print("End of video")
      break
   if cv2.waitKey(1) == ord("q"):
      break
   cv2.imshow("Video", frame)
   if val != 'eof' and audio_frame is not None:
      #audio
      img, t = audio_frame
video.release()
cv2.destroyAllWindows()

OpenCV Python - Extract Images from Video

视频只不过是由一系列帧组成的,且每个帧都是一幅图像。使用 OpenCV 时,可以通过执行 imwrite() 函数直至视频结束来提取组成视频文件的所有帧。

A video is nothing but a sequence of frames and each frame is an image. By using OpenCV, all the frames that compose a video file can be extracted by executing imwrite() function till the end of video.

VideoCapture 对象的 read() 方法会返回下一个可用的帧。该方法还会返回一个布尔值，在流结束前一直为 True。在此处，循环内部会对计数器递增，并将其用作文件名。

The read() method of the VideoCapture object returns the next available frame along with a return value which remains True till the end of the stream. Here, a counter is incremented inside the loop and used as the file name.

以下程序演示如何从视频中提取图像:

Following program demonstrates how to extract images from the video −

import cv2
import os

cam = cv2.VideoCapture("video.avi")

frameno = 0
while(True):
   ret,frame = cam.read()
   if ret:
      # if video is still left continue creating images
      name = str(frameno) + '.jpg'
      print ('new frame captured...' + name)

      cv2.imwrite(name, frame)
      frameno += 1
   else:
      break

cam.release()
cv2.destroyAllWindows()

OpenCV Python - Video from Images

在上一章中,我们使用 VideoWriter() 函数将相机的实时流保存为视频文件。为了将多张图像拼接成一个视频,我们应使用该函数。

In the previous chapter, we have used the VideoWriter() function to save the live stream from a camera as a video file. To stitch multiple images into a video, we shall use the same function.

首先，确保所有必需的图像都放在一个文件夹中。Python 内置 glob 模块中的 glob() 函数会构建一个匹配文件名的列表，便于我们对其进行迭代。

First, ensure that all the required images are in a folder. The glob() function in Python’s built-in glob module builds a list of matching file names so that we can iterate through them.

从文件夹中的图像中读取图像对象,并添加到图像数组。

Read the image object from the images in the folder and append to an image array.

以下程序说明如何将多个图像拼接成一个视频。

The following program explains how to stitch multiple images into a video −

import cv2
import numpy as np
import glob

img_array = []
for filename in glob.glob('*.png'):
   img = cv2.imread(filename)
   height, width, layers = img.shape
   size = (width,height)
   img_array.append(img)

通过使用 VideoWriter() 函数创建一个视频流,将图像数组的内容写入到其中。下面给出了该程序。

Then create a video stream by using the VideoWriter() function and write the contents of the image array to it. Given below is the program for the same.

out = cv2.VideoWriter('video.avi',cv2.VideoWriter_fourcc(*'DIVX'), 15, size)

for i in range(len(img_array)):
   out.write(img_array[i])
out.release()

您应该在当前文件夹中找到名为 ‘video.avi’ 的文件。

You should find the file named ‘video.avi’ in the current folder.
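As an optional quick check (a hedged sketch, not part of the original program), you can open the generated file again with VideoCapture and query its properties to confirm it was written correctly −

import cv2

check = cv2.VideoCapture('video.avi')
print('opened:', check.isOpened())
print('frames:', int(check.get(cv2.CAP_PROP_FRAME_COUNT)))
print('fps:   ', check.get(cv2.CAP_PROP_FPS))
check.release()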

OpenCV Python - Face Detection

OpenCV 使用 Haar 基于特征的级联分类器进行对象检测。它是一种基于机器学习的算法,级联函数由大量正负图像训练。然后,它用于检测其他图像中的对象。该算法使用了级联分类器的概念。

OpenCV uses Haar feature-based cascade classifiers for object detection. It is a machine learning based algorithm, where a cascade function is trained from a large number of positive and negative images. It is then used to detect objects in other images. The algorithm uses the concept of a Cascade of Classifiers.

人脸、眼睛等预训练分类器可从 https://github.com 下载

Pretrained classifiers for face, eye etc. can be downloaded from https://github.com

对于以下示例,请从该 URL 下载并 copy haarcascade_frontalface_default.xmlhaarcascade_eye.xml 。然后,加载将用于灰度模式人脸检测的输入图像。

For the following example, download and copy haarcascade_frontalface_default.xml and haarcascade_eye.xml from this URL. Then, load our input image to be used for face detection in grayscale mode.

CascadeClassifier 类的 detectMultiScale() 方法检测输入图像中的对象。它以矩形形式返回检测到的人脸的位置及其尺寸 (x,y,w,h)。一旦获得这些位置，我们就可以将其用于眼睛检测，因为眼睛始终在人脸上！

The detectMultiScale() method of the CascadeClassifier class detects objects in the input image. It returns the positions of the detected faces as rectangles with their dimensions (x,y,w,h). Once we get these locations, we can use them for eye detection, since eyes are always on the face!

Example

人脸检测的完整代码如下 −

The complete code for face detection is as follows −

import numpy as np
import cv2

# load the pretrained Haar cascade classifiers for face and eye detection
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img = cv2.imread('Dhoni-and-virat.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detect faces with scaleFactor=1.3 and minNeighbors=5
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x,y,w,h) in faces:
   img = cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
   # search for eyes only inside the detected face region
   roi_gray = gray[y:y+h, x:x+w]
   roi_color = img[y:y+h, x:x+w]
   eyes = eye_cascade.detectMultiScale(roi_gray)
   for (ex,ey,ew,eh) in eyes:
      cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)

cv2.imshow('img',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

您将在输入图像中看到围绕人脸绘制的矩形,如下所示 −

You will get rectangles drawn around faces in the input image as shown below −

face detection

OpenCV Python - Meanshift and Camshift

在本章中,我们一起来了解一下 OpenCV-Python 中的均值漂移和 CAMShift。首先,我们来了解一下什么是均值漂移。

In this chapter, let us learn about the meanshift and the camshift in the OpenCV-Python. First, let us understand what is meanshift.

Meanshift

均值漂移算法可以识别数据集中数据点高度集中或集群的位置。该算法会在每个数据点放置一个内核并对它们求和,以做出一个 Kernel Density Estimation (KDE)。

The mean shift algorithm identifies places in the data set with a high concentration of data points, or clusters. The algorithm places a kernel at each data point and sums them together to make a Kernel Density Estimation (KDE).

KDE 会有高和低数据点密度的区域,具有相应的值。均值漂移是一种很有用的方法,可以用来跟踪视频中某个指定对象。

The KDE will have regions of high and low data point density, respectively. Meanshift is a very useful method to keep track of a particular object inside a video.

视频中的每个实例都会以该帧的像素分布形式被检查。初始窗口(即目标区域 (ROI) )通常是正方形或圆形。为此,会通过硬编码指定位置,并标识最高像素分布区域。

Every frame of the video is examined in the form of the pixel distribution in that frame. An initial window, the region of interest (ROI), is generally a square or a circle. For this, the positions are specified by hardcoding and the area of maximum pixel distribution is identified.

随着视频播放,ROI 窗口会向着最高像素分布区域移动。移动方向取决于我们的跟踪窗口中心与该窗口内所有 k 像素的质心之间的差值。

The ROI window moves towards the region of maximum pixel distribution as the video runs. The direction of movement depends upon the difference between the center of our tracking window and the centroid of all the k-pixels inside that window.

要在 OpenCV 中使用均值漂移,首先要查找我们目标的直方图(其中仅考虑色调),然后可以针对每个帧 反投影其目标,以计算均值漂移。我们还需要提供 ROI 窗口的初始位置。

In order to use Meanshift in OpenCV, we first find the histogram of our target (only the Hue channel is considered) and back-project it onto each frame for the Meanshift calculation. We also need to provide an initial location of the ROI window.

我们反复计算直方图的反投影,并计算均值漂移以获得跟踪窗口的新位置。随后,我们会使用它的尺寸在帧上绘制一个矩形。

We repeatedly calculate the back projection of the histogram and calculate the Meanshift to get the new position of track window. Later on, we draw a rectangle using its dimensions on the frame.

Functions

程序中使用的 OpenCV 函数包括:

The OpenCV functions used in the program are −

  1. cv.calcBackProject() − Calculates the back projection of a histogram.

  2. cv.meanShift() − Finds an object on a back projection image, given an initial search window and the stop criteria for the iterative search algorithm.

Example

以下是均值漂移的示例程序:

Here is the example program of Meanshift −

import numpy as np
import cv2 as cv

cap = cv.VideoCapture('traffic.mp4')

ret,frame = cap.read()

# dimensions of initial location of window
x, y, w, h = 300, 200, 100, 50
tracker = (x, y, w, h)

region = frame[y:y+h, x:x+w]
hsv_reg = cv.cvtColor(region, cv.COLOR_BGR2HSV)
mask = cv.inRange(hsv_reg, np.array((0., 60.,32.)), np.array((180.,255.,255.)))
reg_hist = cv.calcHist([hsv_reg],[0],mask,[180],[0,180])
cv.normalize(reg_hist,reg_hist,0,255,cv.NORM_MINMAX)

# Setup the termination criteria
criteria = ( cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1 )

while(1):
   ret, frame = cap.read()

   if ret == True:
      hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
      dst = cv.calcBackProject([hsv],[0],reg_hist,[0,180],1)

      # apply meanshift
      ret, tracker = cv.meanShift(dst, tracker, criteria)

      # Draw it on image
      x,y,w,h = tracker
      img = cv.rectangle(frame, (x,y), (x+w,y+h), 255,2)
      cv.imshow('img',img)

      k = cv.waitKey(30) & 0xff
      if k==115:
         cv.imwrite('capture.png', img)
      if k == 27:
         break
   else:
      # exit the loop when no more frames are available
      break

cap.release()
cv.destroyAllWindows()

在运行程序时,均值漂移算法会将我们的窗口移动到密度最大的新位置。

As the program is run, the Meanshift algorithm moves our window to the new location with maximum density.

Output

以下为移动窗口的快照:

Here’s a snapshot of moving window −

meanshift

Camshift

Meanshift 算法的一个缺点是,跟踪窗口的大小保持不变,与对象到摄像机的距离无关。此外,只有当窗口处于该对象区域内时,窗口才会跟踪该对象。因此,我们必须手动对窗口进行硬编码,并且应该小心进行。

One of the disadvantages of the Meanshift algorithm is that the size of the tracking window remains the same irrespective of the object’s distance from the camera. Also, the window tracks the object only if the object lies within the window’s region. So, the initial window must be hardcoded manually and chosen carefully.

CAMshift(代表 Continuously Adaptive Meanshift )给出了这些问题的解决方案。一旦 meanshift 收敛,Camshift 算法就会更新窗口的大小,使得跟踪窗口的大小可能会改变,甚至旋转以更好地与跟踪对象的动作相关联。

The solution to these problems is given by CAMshift (stands for Continuously Adaptive Meanshift). Once meanshift converges, the Camshift algorithm updates the size of the window such that the tracking window may change in size or even rotate to better correlate to the movements of the tracked object.

在以下代码中,使用了 camshift() 函数,而不是 meanshift() 函数。

In the following code, the CamShift() function is used instead of the meanShift() function.

首先,它使用 meanShift 查找对象中心,然后调整窗口大小并找到最佳旋转。该函数返回对象的位置、大小和方向。使用 polylines() 绘制函数在帧上绘制位置。

First, it finds the object center using meanShift and then adjusts the window size and finds the optimal rotation. The function returns the object position, size, and orientation. The position is drawn on the frame by using the polylines() drawing function.

Example

在较早程序中,使用 CamShift() 函数,而不是 Meanshift() 函数,如下所示:

Instead of the meanShift() function in the earlier program, use the CamShift() function as shown below −

# apply camshift
ret, tracker = cv.CamShift(dst, tracker, criteria)
pts = cv.boxPoints(ret)
pts = np.int0(pts)
img = cv.polylines(frame,[pts],True, 255,2)
cv.imshow('img',img)

Output

以下显示跟踪窗口旋转矩形的修改后的程序的结果快照:

One snapshot of the result of modified program showing rotated rectangle of the tracking window is as follows −

camshift

OpenCV Python - Feature Detection

在图像处理中,特征是图像关键区域的数学描述。它们是图像视觉内容的矢量表示。

In the context of image processing, features are mathematical representations of key areas in an image. They are the vector representations of the visual content from an image.

特征使得能够对它们进行数学运算。各种计算机视觉应用包括目标检测、运动估计、分割、图像对齐等。

Features make it possible to perform mathematical operations on them. They are used in various computer vision applications, including object detection, motion estimation, segmentation, image alignment etc.

任何图像中的突出特征包括边缘、角或图像的部分。OpenCV 支持 Harris corner detection 和 Shi-Tomasi corner detection 算法。OpenCV 库还提供用于实现 SIFT (尺度不变特征变换)、 SURF (加速稳健特征)和 FAST 角点检测算法的功能。

Prominent features in any image include edges, corners or parts of an image. OpenCV supports Harris corner detection and Shi-Tomasi corner detection algorithms. The OpenCV library also provides functionality to implement SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and the FAST algorithm for corner detection.
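Since this chapter only walks through SIFT in detail, here is a small hedged sketch of Shi-Tomasi corner detection using cv.goodFeaturesToTrack(); the file name home.jpg is an assumption (the same image is used in the SIFT example below), and the parameters (25 corners, quality level 0.01, minimum distance 10 px) are illustrative −

import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')             # hypothetical input image
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Shi-Tomasi: pick up to 25 strong corners
corners = cv.goodFeaturesToTrack(gray, 25, 0.01, 10)
corners = np.intp(corners)

# mark each detected corner with a small filled circle
for c in corners:
   x, y = c.ravel()
   cv.circle(img, (int(x), int(y)), 3, (0, 255, 0), -1)

cv.imshow('Shi-Tomasi corners', img)
cv.waitKey(0)
cv.destroyAllWindows()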

Harris 和 Shi-Tomasi 算法是旋转不变的。即使图像旋转,我们也可以找到相同的角。但是当图像被放大时,图像中的某个角可能不再是角。下图描述了这一点。

Harris and Shi-Tomasi algorithms are rotation-invariant. Even if the image is rotated, we can find the same corners. But when an image is scaled up, a corner may no longer appear as a corner. The figure given below depicts the same.

shi tomasi

D.Lowe 的新算法 Scale Invariant Feature Transform (SIFT)提取关键点并计算其描述符。

D. Lowe’s algorithm, Scale-Invariant Feature Transform (SIFT), extracts key points and computes their descriptors.

这是通过以下步骤实现的 −

This is achieved by following steps −

  1. Scale-space Extrema Detection.

  2. Keypoint Localization.

  3. Orientation Assignment.

  4. Keypoint Descriptor.

  5. Keypoint Matching.

至于 OpenCV 中 SIFT 的实现，它从加载图像并将其转换为灰度开始。 cv.SIFT_create() 函数创建 SIFT 对象。

As far as the implementation of SIFT in OpenCV is concerned, it starts with loading an image and converting it into grayscale. The cv.SIFT_create() function creates a SIFT object.

Example

调用其 detect() 方法可获得绘制在原始图像顶部的关键点。以下代码实现了此过程

Calling its detect() method obtains key points, which are drawn on top of the original image. The following code implements this procedure −

import numpy as np
import cv2 as cv
img = cv.imread('home.jpg')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kp = sift.detect(gray,None)
img=cv.drawKeypoints(gray,kp,img)
cv.imwrite('keypoints.jpg',img)

Output

原始图像及其绘制有关键点的图像如下所示 −

The original image and the one with keypoints drawn are shown below −

这是原始图像。

This is the original image.

scale space

下面给出的是绘制了关键点的图像。

The image given below is the one with keypoints.

sift

OpenCV Python - Feature Matching

OpenCV 为特征匹配提供了两种技术。蛮力匹配和 FLANN 匹配器技术。

OpenCV provides two techniques for feature matching: brute-force matching and the FLANN-based matcher.

Example

以下示例使用蛮力方法

The following example uses the brute-force method −

import numpy as np
import cv2

img1 = cv2.imread('lena.jpg')
img2 = cv2.imread('lena-test.jpg')

# Convert it to grayscale
img1_bw = cv2.cvtColor(img1,cv2.COLOR_BGR2GRAY)
img2_bw = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create()

queryKeypoints, queryDescriptors = orb.detectAndCompute(img1_bw,None)
trainKeypoints, trainDescriptors = orb.detectAndCompute(img2_bw,None)

# Hamming distance suits the binary ORB descriptors; crossCheck keeps only mutual matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(queryDescriptors,trainDescriptors)

# sort by distance so that the best 20 matches are drawn
matches = sorted(matches, key=lambda m: m.distance)
img = cv2.drawMatches(img1, queryKeypoints,
img2, trainKeypoints, matches[:20],None)

img = cv2.resize(img, (1000,650))

cv2.imshow("Feature Match", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output

feature matching
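The FLANN-based matcher mentioned at the start of this chapter can be used in much the same way. Below is a hedged sketch, not a definitive recipe, reusing the same assumed image files lena.jpg and lena-test.jpg; the LSH index parameters and the 0.75 ratio threshold are conventional choices for binary ORB descriptors, not values prescribed by this tutorial −

import cv2

img1 = cv2.imread('lena.jpg')        # query image (assumed file name)
img2 = cv2.imread('lena-test.jpg')   # train image (assumed file name)

gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(gray1, None)
kp2, des2 = orb.detectAndCompute(gray2, None)

# LSH index parameters are suitable for binary descriptors such as ORB (algorithm=6 is FLANN_INDEX_LSH)
index_params = dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test; with LSH some queries may return fewer than two candidates
good = []
for pair in matches:
   if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
      good.append(pair[0])

img = cv2.drawMatches(img1, kp1, img2, kp2, good[:20], None)
cv2.imshow("FLANN Match", cv2.resize(img, (1000, 650)))
cv2.waitKey(0)
cv2.destroyAllWindows()

The FLANN matcher is generally preferred over brute force when the number of descriptors is large, since its approximate nearest-neighbour search scales better.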

OpenCV Python - Digit Recognition with KNN

KNN 代表 K-Nearest Neighbour ,是一种基于监督学习的机器学习算法。它试图将新数据点放入与可用类别最相似的类别中。所有可用数据都分类为不同的类别,并根据相似性将新数据点放入其中一个类别。

KNN, which stands for K-Nearest Neighbour, is a Machine Learning algorithm based on supervised learning. It tries to put a new data point into the category that is most similar to the available categories. All the available data is classified into distinct categories and a new data point is put in one of them based on similarity.

KNN 算法遵循以下原理:

The KNN algorithm works on following principle −

  1. Choose an odd number K as the number of neighbours to be checked.

  2. Calculate the Euclidean distance between the new data point and the existing data points.

  3. Take the K nearest neighbours as per the calculated Euclidean distance.

  4. Count the number of data points belonging to each category among these neighbours.

  5. The category with the maximum number of data points is the category to which the new data point is assigned (a small sketch follows this list).
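Before applying this to the digit data set below, here is a hedged toy sketch of the same principle using cv2.ml.KNearest on synthetic 2D points; the point values and labels are random and purely illustrative −

import numpy as np
import cv2

# 25 random 2D points with random 0/1 labels as training data (illustrative only)
trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
labels = np.random.randint(0, 2, (25, 1)).astype(np.float32)

# one new point to classify
newcomer = np.random.randint(0, 100, (1, 2)).astype(np.float32)

knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, labels)

# classify the new point by majority vote among its 3 nearest neighbours
ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
print("predicted label:", results)
print("neighbour labels:", neighbours)
print("squared distances:", dist)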

作为使用 OpenCV 实现 KNN 算法的示例,我们将使用以下包含 5000 幅手写数字图像(每幅图像像素为 20X20)的 digits.png。

As an example of implementation of the KNN algorithm using OpenCV, we shall use the following image, digits.png, consisting of 5000 images of handwritten digits, each of 20x20 pixels.

knn

第一个任务是将该图像分成 5000 个数字。这是我们的特征集。将其转换为 NumPy 数组。该程序如下:

The first task is to split this image into 5000 digit cells. This is our feature set. Convert it to a NumPy array. The program is given below −

import numpy as np
import cv2

image = cv2.imread('digits.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

fset=[]
for i in np.vsplit(gray,50):
   x=np.hsplit(i,100)
   fset.append(x)

NP_array = np.array(fset)

现在,我们将这些数据分成训练集和测试集,每个的大小为 (2500,20x20),如下所示:

Now we divide this data into a training set and a testing set, each containing 2500 samples of 400 pixels (20x20 flattened), as follows −

trainset = NP_array[:,:50].reshape(-1,400).astype(np.float32)
testset = NP_array[:,50:100].reshape(-1,400).astype(np.float32)

接下来,我们必须为每个数字创建 10 个不同的标签,如下所示:

Next, we create the labels (0 to 9, each repeated 250 times) for the training and test samples, as shown below −

k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = np.repeat(k,250)[:,np.newaxis]

我们现在可以开始 KNN 分类。创建分类器对象并训练数据。

We are now in a position to start the KNN classification. Create the classifier object and train the data.

knn = cv2.ml.KNearest_create()
knn.train(trainset, cv2.ml.ROW_SAMPLE, train_labels)

将 k 值选择为 3,获取分类器的输出。

Choosing the value of k as 3, obtain the output of the classifier.

ret, output, neighbours, distance = knn.findNearest(testset, k = 3)

比较输出与测试标签以检查分类器的性能和准确性。

Compare the output with test labels to check the performance and accuracy of the classifier.

程序在准确检测手写数字方面显示了 91.64% 的准确性。

The program shows an accuracy of 91.64% in detecting the handwritten digits.

result = output==test_labels
correct = np.count_nonzero(result)
accuracy = (correct*100.0)/(output.size)
print(accuracy)