SciPy - Quick Guide

SciPy - Introduction

SciPy, pronounced "Sigh Pie", is an open-source scientific library for Python, distributed under the BSD license, for performing mathematical, scientific, and engineering computations.

The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. SciPy is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as those for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install, and are free of charge. NumPy and SciPy are easy to use, yet powerful enough to be relied on by some of the world's leading scientists and engineers.

SciPy Sub-packages

SciPy is organized into sub-packages covering different scientific computing domains. These are summarized in the following table:

Sub-package          Description
scipy.cluster        Vector quantization / K-means
scipy.constants      Physical and mathematical constants
scipy.fftpack        Fourier transform (legacy interface; newer releases also provide scipy.fft)
scipy.integrate      Integration routines
scipy.interpolate    Interpolation
scipy.io             Data input and output
scipy.linalg         Linear algebra routines
scipy.ndimage        n-dimensional image package
scipy.odr            Orthogonal distance regression
scipy.optimize       Optimization
scipy.signal         Signal processing
scipy.sparse         Sparse matrices
scipy.spatial        Spatial data structures and algorithms
scipy.special        Special mathematical functions
scipy.stats          Statistics

Data Structure

The basic data structure used by SciPy is the multidimensional array provided by the NumPy module. NumPy itself provides some functions for linear algebra, Fourier transforms and random number generation, but not with the generality of the equivalent functions in SciPy.

SciPy - Environment Setup

The standard Python distribution does not come bundled with SciPy. A lightweight way to get it is to install SciPy with pip, the popular Python package installer:

pip install scipy

If we install the Anaconda Python distribution, SciPy is installed by default. Following are the distributions and links for installing the SciPy stack on different operating systems.

Windows

Anaconda (https://www.continuum.io) is a free Python distribution for the SciPy stack. It is also available for Linux and Mac.

Canopy (https://www.enthought.com/products/canopy/) is available free of charge, as well as in a commercial edition, with a full SciPy stack for Windows, Linux and Mac.

Python (x,y) is a free Python distribution with the SciPy stack and the Spyder IDE for Windows. (Downloadable from https://python-xy.github.io/)

Linux

The package manager of each Linux distribution can be used to install one or more packages of the SciPy stack.

Ubuntu

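We can use the following command to install the SciPy stack on Ubuntu. The package names below are the original Python 2 era names; recent Ubuntu releases use a python3- prefix instead (for example, python3-numpy and python3-scipy).

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose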

Fedora

We can use the following command to install the SciPy stack on Fedora.

sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel

SciPy - Basic Functionality

In older SciPy releases, the NumPy functions were also exposed through the top-level scipy namespace, so they did not have to be imported separately. That aliasing has since been deprecated, so it is safer to import NumPy explicitly. The main object of NumPy is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank (available as the ndim attribute).

Now, let us review the basic functionality of vectors and matrices in NumPy. Because SciPy is built on top of NumPy arrays, and most of linear algebra deals with matrices, an understanding of the NumPy basics is necessary.

NumPy Vector

A vector can be created in multiple ways. Some of them are described below.

Converting Python array-like objects to NumPy

Let us consider the following example.

import numpy as np
values = [1, 2, 3, 4]   # not named "list", which would shadow the built-in
arr = np.array(values)
print(arr)

The output of the above program will be as follows.

[1 2 3 4]

Intrinsic NumPy Array Creation

NumPy has built-in functions for creating arrays from scratch. Some of these functions are explained below.

Using zeros()

The zeros(shape) function creates an array of the specified shape filled with 0 values. The default dtype is float64. Let us consider the following example.

import numpy as np
print(np.zeros((2, 3)))

The output of the above program will be as follows.

[[0. 0. 0.]
 [0. 0. 0.]]

Using ones()

The ones(shape) function creates an array filled with 1 values. It is identical to zeros in all other respects. Let us consider the following example.

import numpy as np
print(np.ones((2, 3)))

The output of the above program will be as follows.

[[1. 1. 1.]
 [1. 1. 1.]]

Using arange()

The arange() function creates arrays with regularly incrementing values. Let us consider the following example.

import numpy as np
print(np.arange(7))

The above program will generate the following output.

[0 1 2 3 4 5 6]

Defining the data type of the values

The dtype argument sets the data type of the values. Let us consider the following example.

import numpy as np
# np.float has been removed from recent NumPy releases; the built-in float maps to float64
arr = np.arange(2, 10, dtype = float)
print(arr)
print("Array Data Type :", arr.dtype)

The above program will generate the following output.

[2. 3. 4. 5. 6. 7. 8. 9.]
Array Data Type : float64

Using linspace()

The linspace() function creates arrays with a specified number of elements, spaced equally between the specified beginning and end values. Let us consider the following example.

import numpy as np
print(np.linspace(1., 4., 6))

The above program will generate the following output.

[1.  1.6 2.2 2.8 3.4 4. ]
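The difference between arange() and linspace() is easiest to see side by side. The following is a minimal sketch; the variable names by_step and by_count are illustrative only. arange() takes a step and excludes the stop value, whereas linspace() takes a number of points and includes the end value by default.

```python
import numpy as np

# arange: fixed step, the stop value (1.0) is NOT included
by_step = np.arange(0.0, 1.0, 0.25)

# linspace: fixed number of points, the end value (1.0) IS included
by_count = np.linspace(0.0, 1.0, 5)

print(by_step)
print(by_count)
```

When the number of points matters more than the exact step, linspace() is usually the safer choice, because a floating-point step in arange() can make the length of the result hard to predict.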

Matrix

A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power). Note that the NumPy documentation now recommends regular arrays over the matrix class for new code. Let us consider the following example.

import numpy as np
print(np.matrix('1 2; 3 4'))

The above program will generate the following output.

[[1 2]
 [3 4]]

Conjugate Transpose of Matrix

The H attribute returns the (complex) conjugate transpose of the matrix. Let us consider the following example.

import numpy as np
mat = np.matrix('1 2; 3 4')
print(mat.H)

The above program will generate the following output.

[[1 3]
 [2 4]]

Transpose of Matrix

The T attribute returns the transpose of the matrix. Let us consider the following example.

import numpy as np
mat = np.matrix('1 2; 3 4')
print(mat.T)

The above program will generate the following output.

[[1 3]
 [2 4]]

When we transpose a matrix, we make a new matrix whose rows are the columns of the original. A conjugate transpose additionally replaces each element with its complex conjugate; for a real matrix such as the one above, the two results are identical. The inverse of a matrix is a matrix that, when multiplied with the original matrix, gives the identity matrix.
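The distinction only becomes visible with complex entries. The following sketch uses a regular NumPy array rather than np.matrix; the matrix A and the names A_T, A_H and A_inv are made up for illustration, and np.linalg.inv is used for the inverse.

```python
import numpy as np

# A small complex matrix (illustrative values)
A = np.array([[1 + 2j, 2 - 1j],
              [3 + 0j, 4 + 4j]])

A_T = A.T            # transpose: rows and columns swapped
A_H = A.conj().T     # conjugate transpose: swapped and conjugated
A_inv = np.linalg.inv(A)

print(A_T)
print(A_H)
# Multiplying the inverse with the original gives the identity (up to rounding)
print(np.allclose(A_inv @ A, np.eye(2)))
```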

SciPy - Cluster

K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data. Intuitively, we can think of a cluster as a group of data points whose distances to one another are small compared with their distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm iterates the following two steps:

  1. For each center, identify the subset of training points (its cluster) that are closer to it than to any other center.

  2. Compute the mean of each feature over the data points in each cluster; this mean vector becomes the new center for that cluster.

These two steps are iterated until the centers no longer move or the assignments no longer change. A new point x can then be assigned to the cluster of the closest prototype. The SciPy library provides a good implementation of the K-means algorithm through the cluster package. Let us understand how to use it.
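The two steps can be sketched directly in NumPy. This is only an illustration of the iteration described above, not SciPy's implementation; the helper kmeans_step, the sample points and the starting centers are all hypothetical.

```python
import numpy as np

def kmeans_step(points, centers):
    """One pass of the two K-means steps (hypothetical helper)."""
    # Step 1: distance from every point to every center, then pick the nearest
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 2: move each center to the mean of the points assigned to it
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[labels == k]
        if len(members) > 0:   # an empty cluster keeps its old center
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])

# Repeat the two steps until the centers stop moving
while True:
    labels, new_centers = kmeans_step(points, centers)
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print(labels)
print(centers)
```

On this tiny data set the first two points end up in one cluster and the last two in the other, and each center settles on the mean of its pair.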

K-Means Implementation in SciPy

We will now see how to run K-means with SciPy.

Import K-Means

We will see the usage of each imported function.

from scipy.cluster.vq import kmeans, vq, whiten

Data generation

We have to simulate some data to explore the clustering.

from numpy import vstack,array
from numpy.random import rand

# data generation with three features
data = vstack((rand(100,3) + array([.5,.5,.5]),rand(100,3)))

Now, let us inspect the data. Printing data produces output similar to the following (the values are random, so they differ from run to run).

array([[ 1.48598868e+00, 8.17445796e-01, 1.00834051e+00],
       [ 8.45299768e-01, 1.35450732e+00, 8.66323621e-01],
       [ 1.27725864e+00, 1.00622682e+00, 8.43735610e-01],
       …………….

Next, we normalize the observations on a per-feature basis. Before running K-means, it is beneficial to rescale each feature dimension of the observation set by whitening: each feature is divided by its standard deviation across all observations to give it unit variance.
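To make the rescaling concrete, the sketch below compares whiten() with a manual division by the per-feature standard deviation. The seeded random data and the names manual and whitened are assumptions made for this illustration.

```python
import numpy as np
from scipy.cluster.vq import whiten

rng = np.random.default_rng(0)   # seeded so the run is repeatable
# Three features on very different scales (illustrative data)
data = rng.random((100, 3)) * np.array([1.0, 10.0, 100.0])

manual = data / data.std(axis=0)   # divide each column by its standard deviation
whitened = whiten(data)

print(np.allclose(manual, whitened))
print(whitened.std(axis=0))        # every feature now has unit standard deviation
```

Without this step, the feature with the largest scale would dominate the Euclidean distances that K-means relies on.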

Whiten the data

We use the following code to whiten the data.

# whitening of data
data = whiten(data)

Compute K-Means with Three Clusters

Let us now compute K-means with three clusters using the following code.

# computing K-Means with K = 3 (3 clusters)
centroids,_ = kmeans(data,3)

The above code performs K-means on a set of observation vectors, forming K clusters. The K-means algorithm adjusts the centroids until sufficient progress can no longer be made, that is, until the change in distortion since the last iteration is less than some threshold. We can inspect the centroids of the clusters by printing the centroids variable using the code given below.

print(centroids)

The above code will generate output similar to the following (the data is random, so the exact values vary between runs).

[[ 2.26034702  1.43924335  1.3697022 ]
 [ 2.63788572  2.81446462  2.85163854]
 [ 0.73507256  1.30801855  1.44477558]]

Assign each observation to a cluster by using the code given below.

# assign each sample to a cluster
clx,_ = vq(data,centroids)

The vq function compares each observation vector in the M-by-N obs array with the centroids and assigns the observation to the closest cluster. It returns the cluster index of each observation and the distortion, that is, the distance between each observation and its nearest centroid. Let us check the cluster of each observation using the following code.

# check clusters of observation
print(clx)

The above code will generate output similar to the following.

array([1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 2, 0, 2, 0, 1, 1, 1,
0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1,  0, 0, 0, 0, 1, 0, 0, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0,
2, 2, 2, 1, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

The distinct values 0, 1 and 2 in the above array identify the clusters.
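Because the tutorial data is random, it can help to see vq on a tiny hand-made case where the answer is obvious. In this sketch the centroids and observations are invented values, not output from kmeans.

```python
import numpy as np
from scipy.cluster.vq import vq

# Two hand-picked centroids and three observations (illustrative values)
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
obs = np.array([[1.0, 0.0], [9.0, 10.0], [0.0, 2.0]])

codes, dist = vq(obs, centroids)
print(codes)   # index of the nearest centroid for each observation
print(dist)    # Euclidean distance from each observation to that centroid
```

The first and third observations fall to centroid 0 and the second to centroid 1, with distances 1, 1 and 2 respectively.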

SciPy - Constants

The scipy.constants package provides a wide range of constants that are used across the sciences.

SciPy Constants Package

The scipy.constants package provides various constants. We import the required constants and use them as needed. Let us see how these constants are imported and used.

To start with, let us compare the 'pi' value from both packages in the following example.

# Import both modules so that each pi can be referenced by its full name
import math
import scipy.constants

print("sciPy - pi = %.16f" % scipy.constants.pi)
print("math - pi = %.16f" % math.pi)

The above program will generate the following output.

sciPy - pi = 3.1415926535897931
math - pi = 3.1415926535897931

List of Constants Available

The following tables briefly describe the various constants.

Mathematical Constants

Sr. No.   Constant   Description
1         pi         Pi
2         golden     Golden ratio

Physical Constants

The following table lists the most commonly used physical constants.

Sr. No.   Constant               Description
1         c                      Speed of light in vacuum
2         speed_of_light         Speed of light in vacuum
3         h                      Planck constant
4         Planck                 Planck constant h
5         G                      Newton's gravitational constant
6         e                      Elementary charge
7         R                      Molar gas constant
8         Avogadro               Avogadro constant
9         k                      Boltzmann constant
10        electron_mass or m_e   Electron mass
11        proton_mass or m_p     Proton mass
12        neutron_mass or m_n    Neutron mass

Units

The following table lists some of the SI prefixes.

Sr. No.   Prefix   Value
1         milli    0.001
2         micro    1e-06
3         kilo     1000

These prefixes range from yotta, zetta, exa, peta, tera, ... kilo, hecto, ... nano, pico, ... down to zepto.

Other Important Constants

The following table lists other important constants available in SciPy.

Sr. No.   Constant            Description
1         gram                0.001 kg
2         atomic_mass         Atomic mass constant
3         degree              Degree in radians
4         minute              One minute in seconds
5         day                 One day in seconds
6         inch                One inch in meters
7         micron              One micron in meters
8         light_year          One light-year in meters
9         atm                 Standard atmosphere in pascals
10        acre                One acre in square meters
11        liter               One liter in cubic meters
12        gallon              One gallon in cubic meters
13        kmh                 Kilometers per hour in meters per second
14        degree_Fahrenheit   One Fahrenheit degree in kelvins
15        eV                  One electron volt in joules
16        hp                  One horsepower in watts
17        dyn                 One dyne in newtons
18        lambda2nu           Function that converts wavelength to optical frequency

Remembering all of these is a bit tough. An easy way to find out which key to use is the scipy.constants.find() method, which searches the keys of the physical_constants dictionary. Let us consider the following example.

import scipy.constants
res = scipy.constants.find("alpha particle mass")
print(res)

The above program will generate output similar to the following (the exact list depends on the SciPy version).

[
   'alpha particle mass',
   'alpha particle mass energy equivalent',
   'alpha particle mass energy equivalent in MeV',
   'alpha particle mass in u',
   'electron to alpha particle mass ratio'
]

This method returns the list of matching keys, or an empty list if the keyword does not match any key.
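Once a key is known, the constants can be used like ordinary numbers. The sketch below is illustrative only: the alias const, the 500 nm wavelength and the variable names are assumptions. It reads a plain constant, looks up an entry in physical_constants (a tuple of value, unit and uncertainty), and uses the lambda2nu helper from the table above.

```python
import scipy.constants as const

# Plain constants are floats expressed in SI units
print(const.c)      # speed of light in m/s
print(const.kilo)   # SI prefix as a multiplier

# Entries of physical_constants are (value, unit, uncertainty) tuples
value, unit, uncertainty = const.physical_constants["alpha particle mass"]
print(value, unit, uncertainty)

# Convert a wavelength of 500 nm (illustrative value) to a frequency in Hz
wavelength = 500 * const.nano
freq = const.lambda2nu(wavelength)
print(freq)
```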

SciPy - FFTpack

A Fourier transform is computed on a time-domain signal to examine its behavior in the frequency domain. Fourier transforms are applied in disciplines such as signal and noise processing, image processing and audio signal processing. SciPy offers the fftpack module, which lets the user compute fast Fourier transforms.

The examples below start with a short sequence and then move to a noisy sine wave, computing the Fourier transform of each with the fftpack module.

Fast Fourier Transform

Let us look at the fast Fourier transform in detail.

One Dimensional Discrete Fourier Transform

The length-N FFT y[k] of a length-N sequence x[n] is calculated by fft(), and the inverse transform is calculated using ifft(). Let us consider the following example.

# Import NumPy and the fft and inverse fft functions from fftpack
import numpy as np
from scipy.fftpack import fft, ifft

# Create a small array of sample values
x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])

# Apply the fft function
y = fft(x)
print(y)

The above program will generate the following output.

[ 4.50000000+0.j           2.08155948-1.65109876j   -1.83155948+1.60822041j
 -1.83155948-1.60822041j   2.08155948+1.65109876j ]

Let us look at another example, which applies the inverse transform to the result.

# y is already in the workspace from the previous example, so reuse it for the inverse transform
from scipy.fftpack import ifft

yinv = ifft(y)

print(yinv)

The above program will generate the following output.

[ 1.0+0.j   2.0+0.j   1.0+0.j   -1.0+0.j   1.5+0.j ]
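To connect fft() to its definition, the sketch below evaluates the discrete Fourier sum y[k] = sum over n of x[n] * exp(-2j*pi*k*n/N) directly and compares it with fft(). The names y_direct and y_fft are illustrative, and the direct sum is written for clarity, not speed.

```python
import numpy as np
from scipy.fftpack import fft, ifft

x = np.array([1.0, 2.0, 1.0, -1.0, 1.5])
N = len(x)
n = np.arange(N)
k = n.reshape(-1, 1)   # column vector so that k * n forms an N x N grid

# Direct evaluation of the DFT sum, one row per output index k
y_direct = (x * np.exp(-2j * np.pi * k * n / N)).sum(axis=1)
y_fft = fft(x)

print(np.allclose(y_direct, y_fft))       # both routes agree
print(np.allclose(ifft(y_fft).real, x))   # the inverse recovers the input
```

The direct sum costs on the order of N squared operations, whereas the FFT computes the same result in roughly N log N, which is why it is preferred for long signals.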

The scipy.fftpack module allows computing fast Fourier transforms. As an illustration, a (noisy) input signal may look as follows:

import numpy as np
time_step = 0.02
period = 5.
time_vec = np.arange(0, 20, time_step)
sig = np.sin(2 * np.pi / period * time_vec) + 0.5 * np.random.randn(time_vec.size)
print(sig.size)

We are creating a signal with a time step of 0.02 seconds. The last statement prints the size of the signal sig. The output would be as follows:

1000

We do not know the signal frequency; we only know the sampling time step of the signal sig. The signal is supposed to come from a real function, so its Fourier transform is conjugate-symmetric: the coefficient at a negative frequency is the complex conjugate of the one at the matching positive frequency. The scipy.fftpack.fftfreq() function generates the sampling frequencies, and scipy.fftpack.fft() computes the fast Fourier transform.

Let us understand this with the help of an example.

from scipy import fftpack
sample_freq = fftpack.fftfreq(sig.size, d = time_step)
sig_fft = fftpack.fft(sig)
print(sig_fft)

The above program will generate output similar to the following (the noise is random, so the values vary between runs).

array([
   25.45122234 +0.00000000e+00j,   6.29800973 +2.20269471e+00j,
   11.52137858 -2.00515732e+01j,   1.08111300 +1.35488579e+01j,
   …….])
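One common use of the sampling frequencies is to locate the dominant frequency of the signal. The following sketch rebuilds the noisy sine wave with a seeded generator (an assumption made so that the run is repeatable) and picks the positive frequency with the largest power; the names power, positive and peak_freq are illustrative.

```python
import numpy as np
from scipy import fftpack

rng = np.random.default_rng(0)   # seeded noise, for repeatability only
time_step = 0.02
period = 5.0
time_vec = np.arange(0, 20, time_step)
sig = np.sin(2 * np.pi / period * time_vec) + 0.5 * rng.standard_normal(time_vec.size)

sample_freq = fftpack.fftfreq(sig.size, d=time_step)
power = np.abs(fftpack.fft(sig)) ** 2

# The spectrum of a real signal is symmetric, so the positive half is enough
positive = sample_freq > 0
peak_freq = sample_freq[positive][power[positive].argmax()]
print(peak_freq)   # expected to be close to 1 / period
```

Since the sine wave has a period of 5 seconds, the peak should sit near 0.2 Hz, well above the noise floor.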

Discrete Cosine Transform

A Discrete Cosine Transform (DCT) expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies. SciPy provides a DCT with the function dct and a corresponding inverse (IDCT) with the function idct. Let us consider the following example.

import numpy as np
from scipy.fftpack import dct
print(dct(np.array([4., 3., 5., 10., 5., 3.])))

The above program will generate the following output.

array([ 60.,  -3.48476592,  -13.85640646,  11.3137085,  6.,  -6.31319305])

The inverse discrete cosine transform reconstructs a sequence from its discrete cosine transform (DCT) coefficients. The idct function is the inverse of the dct function; with the default (unnormalized) settings, the round trip is scaled by a factor of 2N, where N is the sequence length. Let us understand this with the following example.

import numpy as np
from scipy.fftpack import idct
print(idct(np.array([4., 3., 5., 10., 5., 3.])))

The above program will generate the following output.

array([ 39.15085889, -20.14213562, -6.45392043, 7.13341236,
8.14213562, -3.83035081])
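The relationship between dct and idct can be checked with a round trip. This sketch assumes the scipy.fftpack versions of both functions; the variable names are illustrative. With the default settings the round trip returns the input scaled by 2N, while norm='ortho' makes the pair exact inverses.

```python
import numpy as np
from scipy.fftpack import dct, idct

x = np.array([4., 3., 5., 10., 5., 3.])
N = len(x)

# Default (unnormalized) transforms: the round trip is scaled by 2 * N
roundtrip_default = idct(dct(x))

# Orthonormal transforms: the round trip returns the input unchanged
roundtrip_ortho = idct(dct(x, norm='ortho'), norm='ortho')

print(roundtrip_default / (2 * N))
print(roundtrip_ortho)
```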

SciPy - Integrate

When a function cannot be integrated analytically, or is very difficult to integrate analytically, one generally turns to numerical integration methods. SciPy has a number of routines for performing numerical integration. Most of them are found in the scipy.integrate sub-package. The following table lists some commonly used functions.

Sr. No.   Function     Description
1         quad         Single integration
2         dblquad      Double integration
3         tplquad      Triple integration
4         nquad        n-fold multiple integration
5         fixed_quad   Gaussian quadrature, order n
6         quadrature   Gaussian quadrature to tolerance
7         romberg      Romberg integration of a function
8         trapz        Trapezoidal rule (named trapezoid in newer SciPy releases)
9         cumtrapz     Trapezoidal rule to cumulatively compute the integral (named cumulative_trapezoid in newer releases)
10        simps        Simpson's rule (named simpson in newer releases)
11        romb         Romberg integration of sampled data
12        polyint      Analytical polynomial integration (NumPy)
13        poly1d       Helper function for polyint (NumPy)

Single Integrals

The quad function is the workhorse of SciPy's integration functions. Numerical integration is sometimes called quadrature, hence the name. It is normally the default choice for computing a single integral of a function f(x) over a fixed range from a to b.

\int_{a}^{b} f(x) \, dx

The general form of quad is scipy.integrate.quad(f, a, b), where 'f' is the name of the function to be integrated, and 'a' and 'b' are the lower and upper limits, respectively. Let us see an example of the Gaussian function, integrated from 0 to 1.

We first need to define the function $f(x) = e^{-x^2}$. This can be done using a lambda expression; we then call quad on that function.

import scipy.integrate
from numpy import exp
f = lambda x: exp(-x**2)
i = scipy.integrate.quad(f, 0, 1)
print(i)

The above program will generate the following output.

(0.7468241328124271, 8.291413475940725e-15)

The quad function returns two values: the first is the value of the integral, and the second is an estimate of the absolute error in that value.

Note − Since quad requires a function as its first argument, we cannot pass the expression exp(-x**2) directly; it has to be wrapped in a callable such as the lambda above. The quad function accepts positive and negative infinity as limits. It can also integrate standard predefined NumPy functions of a single variable, such as exp, sin and cos, which can be passed by name.
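Both points in the note can be tried in a few lines. The sketch below integrates the Gaussian over the whole real line, where the exact answer is the square root of pi, and passes np.sin by name over the range 0 to pi, where the exact answer is 2. The variable names are illustrative.

```python
import numpy as np
from scipy.integrate import quad

# Infinite limits: the Gaussian integral over the whole real line is sqrt(pi)
gauss_area, gauss_err = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

# A predefined NumPy function can be passed by name
sin_area, sin_err = quad(np.sin, 0, np.pi)

print(gauss_area, np.sqrt(np.pi))
print(sin_area)
```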

Multiple Integrals

The mechanics of double and triple integration are wrapped up in the functions dblquad and tplquad, with nquad covering the general n-fold case. Apart from the integrand, dblquad and tplquad take four and six limit arguments, respectively. The limits of all inner integrals need to be defined as functions.

Double Integrals

The general form of dblquad is scipy.integrate.dblquad(func, a, b, gfun, hfun), where func is the name of the function to be integrated, 'a' and 'b' are the lower and upper limits of the outer integration variable, and gfun and hfun are the names of the functions that define the lower and upper limits of the inner variable in terms of the outer one.

As an example, let us evaluate the following double integral.

\int_{0}^{1/2} dy \int_{0}^{\sqrt{1-4y^2}} 16xy \, dx

We define the functions f, g, and h using lambda expressions. Here y is the outer variable, so the inner limits g and h are functions of y. Note that even if g and h are constants, as they may be in many cases, they are written as functions here, as we have done for the lower limit.

import scipy.integrate
from math import sqrt
f = lambda x, y : 16*x*y        # integrand; the inner variable x comes first
g = lambda y : 0                # lower limit of x
h = lambda y : sqrt(1-4*y**2)   # upper limit of x, which depends on y
i = scipy.integrate.dblquad(f, 0, 0.5, g, h)
print(i)

The above program will generate the following output.

(0.5, 1.7092350012594845e-14)

In addition to the routines described above, scipy.integrate has a number of other integration routines, including nquad, which performs n-fold multiple integration, as well as other routines that implement various integration algorithms. However, quad and dblquad will meet most of our needs for numerical integration.
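As a rough illustration of nquad, the same double integral can be written as a list of ranges, one per variable, innermost first. This is a sketch under the assumption that a range may be given as a callable of the remaining outer variables; the names integrand, x_range and y_range are made up for the example.

```python
from math import sqrt
from scipy.integrate import nquad

# Integrand with the innermost variable first
integrand = lambda x, y: 16 * x * y

# Range of x depends on y, so it is given as a function of y
x_range = lambda y: [0, sqrt(1 - 4 * y**2)]
# Range of y is fixed
y_range = [0, 0.5]

value, abs_err = nquad(integrand, [x_range, y_range])
print(value, abs_err)
```

The result should agree with the dblquad value of 0.5, which also matches the analytic evaluation: the inner integral gives 8y(1 - 4y^2), and integrating that from 0 to 1/2 gives 1 - 1/2.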

SciPy - Interpolate

In this chapter, we will discuss how SciPy supports interpolation.

What is Interpolation?

Interpolation is the process of finding a value between two points on a line or a curve. To help us remember what it means, we can think of the first part of the word, 'inter', as meaning 'enter', which reminds us to look 'inside' the data we originally had. Interpolation is useful not only in statistics, but also in science, business, or wherever there is a need to predict values that fall between two existing data points.

Let us create some data and see how this interpolation can be done using the scipy.interpolate package.

import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.linspace(0, 4, 12)
y = np.cos(x**2/3+4)
print(x)
print(y)

The above program will generate the following output.

[0.         0.36363636 0.72727273 1.09090909 1.45454545 1.81818182
 2.18181818 2.54545455 2.90909091 3.27272727 3.63636364 4.        ]
[-0.65364362 -0.61966189 -0.51077021 -0.31047698 -0.00715476  0.37976236
  0.76715099  0.99239518  0.85886263  0.27994201 -0.52586509 -0.99582185]

Now, we have two arrays. Treating them as the two coordinates of points in a plane, let us plot them using the following program and see what they look like.

plt.plot(x, y, 'o')
plt.show()

The above program will generate the following output.

[Figure: scatter plot of the sample points (x, y)]

1-D Interpolation

The interp1d class in scipy.interpolate is a convenient way to create a function from fixed data points. The resulting function can be evaluated anywhere within the domain defined by the given data, using linear interpolation by default.

Using the above data, let us create interpolating functions that we can then plot.

f1 = interpolate.interp1d(x, y, kind = 'linear')

f2 = interpolate.interp1d(x, y, kind = 'cubic')

Using interp1d, we created two functions, f1 and f2. For a given input x, each function returns the interpolated y. The third argument, kind, selects the interpolation technique: 'linear', 'nearest', 'zero', 'slinear', 'quadratic' and 'cubic' are a few of the available options.

Now, let us create a new, denser set of inputs to see the difference between the interpolation methods clearly. We evaluate the functions built from the old data on the new inputs.

xnew = np.linspace(0, 4, 30)

plt.plot(x, y, 'o', xnew, f1(xnew), '-', xnew, f2(xnew), '--')

plt.legend(['data', 'linear', 'cubic'], loc = 'best')

plt.show()

The above program will generate the following output.

[Figure: the data points with the linear and cubic interpolants]
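The behaviour of the two interpolants can also be checked numerically, without a plot. The sketch below rebuilds the same data and is illustrative only; the names f_linear, f_cubic and x_mid are assumptions. Both interpolants pass through the original data points, and the linear one returns the average of its two neighbours at a midpoint.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.linspace(0, 4, 12)
y = np.cos(x**2 / 3 + 4)

f_linear = interp1d(x, y, kind='linear')
f_cubic = interp1d(x, y, kind='cubic')

# Midpoint between the first two data points
x_mid = 0.5 * (x[0] + x[1])

print(f_linear(x_mid), 0.5 * (y[0] + y[1]))       # linear: average of the neighbours
print(f_cubic(x_mid), np.cos(x_mid**2 / 3 + 4))   # cubic estimate vs. the true value
```

By default, evaluating either function outside the range of x raises an error, which is a useful reminder that interpolation is only meant for points inside the data.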

Splines

To draw smooth curves through data points, drafters once used thin flexible strips of wood, hard rubber, metal or plastic called mechanical splines. To use a mechanical spline, pins were placed at a judicious selection of points along a curve in a design, and then the spline was bent so that it touched each of these pins.

Clearly, with this construction, the spline interpolates the curve at these pins. It can be used to reproduce the curve in other drawings. The points where the pins are located are called knots. We can change the shape of the curve defined by the spline by adjusting the location of the knots.

Univariate Spline

A one-dimensional smoothing spline fits a given set of data points. The UnivariateSpline class in scipy.interpolate is a convenient way to create such a function from fixed data points. Its signature is scipy.interpolate.UnivariateSpline(x, y, w = None, bbox = [None, None], k = 3, s = None, ext = 0, check_finite = False). It fits a spline y = spl(x) of degree k to the provided x, y data.

Parameters − Following are the parameters of a univariate spline.

  1. 'w' − Specifies the weights for spline fitting. Must be positive. If None (default), all weights are equal.

  2. 's' − Smoothing factor used to choose the number of knots; knots are added until the smoothing condition is satisfied.

  3. 'k' − Degree of the smoothing spline. Must be <= 5. Default is k = 3, a cubic spline.

  4. 'ext' − Controls the extrapolation mode for elements not in the interval defined by the knot sequence.

  5. 'check_finite' − Whether to check that the input arrays contain only finite numbers.

Let us consider the following example.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = np.linspace(-3, 3, 50)
y = np.exp(-x**2) + 0.1 * np.random.randn(50)
plt.plot(x, y, 'ro', ms = 5)
plt.show()

[Figure: the noisy sample points]

Use the default value for the smoothing parameter.

spl = UnivariateSpline(x, y)
xs = np.linspace(-3, 3, 1000)
plt.plot(xs, spl(xs), 'g', lw = 3)
plt.show()

[Figure: spline fitted with the default smoothing]

Manually change the amount of smoothing.

spl.set_smoothing_factor(0.5)
plt.plot(xs, spl(xs), 'b', lw = 3)
plt.show()

[Figure: spline after changing the smoothing factor]

SciPy - Input & Output

The scipy.io (Input and Output) package provides a wide range of functions for working with different file formats. Some of these formats are −

  1. MATLAB

  2. IDL

  3. Matrix Market

  4. WAV

  5. ARFF

  6. NetCDF, etc.

Let us discuss the most commonly used file formats in detail −

MATLAB

Following are the functions used to load and save a .mat file.

  1. loadmat − Loads a MATLAB file

  2. savemat − Saves a MATLAB file

  3. whosmat − Lists variables inside a MATLAB file

Let us consider the following example.

import scipy.io as sio
import numpy as np

# Save a .mat file
vect = np.arange(10)
sio.savemat('array.mat', {'vect': vect})

# Now load the file
mat_file_content = sio.loadmat('array.mat')
print(mat_file_content)

The above program will generate the following output.

{
   'vect': array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]), '__version__': '1.0',
   '__header__': 'MATLAB 5.0 MAT-file Platform: posix, Created on: Sat Sep 30
   09:49:32 2017', '__globals__': []
}

We can see the array along with the meta information. To inspect the contents of a MATLAB file without reading the data into memory, use the whosmat function as shown below.

import scipy.io as sio
mat_file_content = sio.whosmat('array.mat')
print(mat_file_content)

The above program will generate the following output.

[('vect', (1, 10), 'int64')]
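
The (1, 10) shape reported for a 10-element vector can be surprising, so the round trip is sketched below. The temporary directory and the variable names are assumptions made for this illustration; squeeze_me is an existing loadmat option.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Write to a temporary directory so the sketch does not touch the working directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "array.mat")

vect = np.arange(10)
sio.savemat(path, {"vect": vect})

# MATLAB has no 1-D arrays, so the vector comes back as a 1 x 10 matrix.
loaded = sio.loadmat(path)
print(loaded["vect"].shape)

# squeeze_me=True drops the unit dimension and restores the 1-D shape.
squeezed = sio.loadmat(path, squeeze_me=True)
print(squeezed["vect"].shape)

# whosmat reports name, shape and type without loading the data.
print(sio.whosmat(path))
```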

SciPy - Linalg

SciPy is built on the optimized ATLAS, LAPACK and BLAS libraries and has very fast linear algebra capabilities. All of these linear algebra routines expect an object that can be converted into a two-dimensional array. The output of these routines is typically also a two-dimensional array.

SciPy.linalg vs NumPy.linalg

scipy.linalg contains all the functions that are in numpy.linalg. Additionally, scipy.linalg has some advanced functions that are not in numpy.linalg. Another advantage of using scipy.linalg over numpy.linalg is that it is always compiled with BLAS/LAPACK support, while for NumPy this is optional. Therefore, the SciPy version might be faster depending on how NumPy was installed.

Linear Equations

The scipy.linalg.solve function solves the linear system a x = b for the unknown vector x, where a is the coefficient matrix and b holds the right-hand side values.

As an example, assume that it is desired to solve the following simultaneous equations.

x + 3y + 5z = 10

2x + 5y + z = 8

2x + 3y + 8z = 3

To solve the above equations for the x, y, z values, we can find the solution vector using a matrix inverse as shown below.

\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 1 & 3 & 5\\ 2 & 5 & 1\\ 2 & 3 & 8 \end{bmatrix}^{-1} \begin{bmatrix} 10\\ 8\\ 3 \end{bmatrix} = \frac{1}{25} \begin{bmatrix} -232\\ 129\\ 19 \end{bmatrix} = \begin{bmatrix} -9.28\\ 5.16\\ 0.76 \end{bmatrix}.

However, it is better to use the linalg.solve command, which can be faster and more numerically stable because it does not form the inverse explicitly.
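
As a quick check of the worked system above, both routes can be compared side by side. This is a minimal sketch; the variable names are chosen for this illustration only.

```python
import numpy as np
from scipy import linalg

# Coefficient matrix and right-hand side of the three simultaneous equations.
A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]], dtype=float)
b = np.array([10, 8, 3], dtype=float)

x_inv = linalg.inv(A) @ b     # explicit inverse, then a matrix-vector product
x_solve = linalg.solve(A, b)  # factorization-based solve, no explicit inverse

print(x_inv)
print(x_solve)
print(np.allclose(A @ x_solve, b))  # residual check
```

Both approaches agree on this small, well-conditioned system; the difference in speed and accuracy only becomes noticeable for larger or ill-conditioned matrices.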

The solve function takes two inputs ‘a’ and ‘b’, in which ‘a’ represents the coefficients and ‘b’ represents the respective right-hand side values, and returns the solution array.

Let us consider the following example.

# importing the scipy and numpy packages
from scipy import linalg
import numpy as np

# Declaring the numpy arrays
a = np.array([[3, 2, 0], [1, -1, 0], [0, 5, 1]])
b = np.array([2, 4, -1])

# Passing the values to the solve function
x = linalg.solve(a, b)

# printing the result array
print(x)

The above program will generate the following output.

array([ 2., -2., 9.])

Finding a Determinant

The determinant of a square matrix A is often denoted as |A| and is a quantity often used in linear algebra. In SciPy, this is computed using the det() function. It takes a matrix as input and returns a scalar value.

Let us consider the following example.

# importing the scipy and numpy packages
from scipy import linalg
import numpy as np

# Declaring the numpy array
A = np.array([[1,2],[3,4]])

# Passing the values to the det function
x = linalg.det(A)

# printing the result
print(x)

The above program will generate the following output.

-2.0

Eigenvalues and Eigenvectors

The eigenvalue-eigenvector problem is one of the most commonly employed linear algebra operations. We can find the eigenvalues (λ) and the corresponding eigenvectors (v) of a square matrix (A) by considering the following relation −

Av = λv

scipy.linalg.eig computes the eigenvalues of an ordinary or generalized eigenvalue problem. This function returns the eigenvalues and the eigenvectors.

Let us consider the following example.

# importing the scipy and numpy packages
from scipy import linalg
import numpy as np

# Declaring the numpy array
A = np.array([[1,2],[3,4]])

# Passing the values to the eig function
l, v = linalg.eig(A)

# printing the result for eigenvalues
print(l)

# printing the result for eigenvectors
print(v)

The above program will generate the following output.

array([-0.37228132+0.j, 5.37228132+0.j]) #--Eigen Values
array([[-0.82456484, -0.41597356], #--Eigen Vectors
       [ 0.56576746, -0.90937671]])
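
The relation Av = λv can be verified numerically for every returned pair. The sketch below assumes nothing beyond the example matrix; the names eigvals and eigvecs are chosen for this illustration.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [3, 4]])
eigvals, eigvecs = linalg.eig(A)

# Column i of eigvecs pairs with eigvals[i]; broadcasting scales each column.
print(np.allclose(A @ eigvecs, eigvecs * eigvals))

# The returned eigenvectors are normalized to unit Euclidean length.
print(np.linalg.norm(eigvecs, axis=0))
```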

Singular Value Decomposition

A Singular Value Decomposition (SVD) can be thought of as an extension of the eigenvalue problem to matrices that are not square.

scipy.linalg.svd factorizes the matrix ‘a’ into two unitary matrices ‘U’ and ‘Vh’ and a 1-D array ‘s’ of singular values (real, non-negative) such that a == U @ S @ Vh (a matrix product), where ‘S’ is a suitably shaped matrix of zeros with the main diagonal ‘s’.

Let us consider the following example.

# importing the scipy and numpy packages
from scipy import linalg
import numpy as np

# Declaring a random complex 3 x 2 numpy array
a = np.random.randn(3, 2) + 1.j*np.random.randn(3, 2)

# Passing the values to the svd function
U, s, Vh = linalg.svd(a)

# printing the result
print(U, s, Vh)

The above program will generate the following output.

(
   array([
      [ 0.54828424-0.23329795j, -0.38465728+0.01566714j,
      -0.18764355+0.67936712j],
      [-0.27123194-0.5327436j , -0.57080163-0.00266155j,
      -0.39868941-0.39729416j],
      [ 0.34443818+0.4110186j , -0.47972716+0.54390586j,
      0.25028608-0.35186815j]
   ]),

   array([ 3.25745379, 1.16150607]),

   array([
      [-0.35312444+0.j , 0.32400401+0.87768134j],
      [-0.93557636+0.j , -0.12229224-0.33127251j]
   ])
)
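
Because the input is random, the numbers above change on every run. A reproducible check that the three factors rebuild the original matrix can be sketched as follows; the fixed seed and the use of linalg.diagsvd to build ‘S’ are choices made for this illustration.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
a = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

U, s, Vh = linalg.svd(a)

# diagsvd places the singular values on the diagonal of a 3 x 2 zero matrix.
S = linalg.diagsvd(s, a.shape[0], a.shape[1])

print(U.shape, S.shape, Vh.shape)
print(np.allclose(a, U @ S @ Vh))
```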

SciPy - Ndimage

The SciPy ndimage submodule is dedicated to image processing. Here, ndimage means an n-dimensional image.

Some of the most common tasks in image processing are as follows −

  1. Input/Output, displaying images

  2. Basic manipulations − Cropping, flipping, rotating, etc.

  3. Image filtering − De-noising, sharpening, etc.

  4. Image segmentation − Labeling pixels corresponding to different objects

  5. Classification

  6. Feature extraction

  7. Registration

Let us discuss how some of these can be achieved using SciPy.

Opening and Writing to Image Files

The misc package in SciPy comes with some sample images (recent SciPy releases provide them through scipy.datasets instead). We use those images to learn the image manipulations. Let us consider the following example.

from scipy import misc
import matplotlib.pyplot as plt

f = misc.face()
# misc.imsave was removed in SciPy 1.2; matplotlib's imsave writes the same array
plt.imsave('face.png', f)

plt.imshow(f)
plt.show()

The above program will generate the following output.

[Figure: opening and writing to image files]

An image in its raw format is a combination of colors represented by numbers in a matrix. A machine understands and manipulates images based on those numbers only. RGB is a popular way of representing them.

Let us see the statistical information of the above image.

from scipy import misc
face = misc.face(gray = False)
print(face.mean(), face.max(), face.min())

The above program will generate the following output.

110.16274388631184, 255, 0

Now, we know that the image is made out of numbers, so any change in the value of a number alters the original image. Let us perform some geometric transformations on the image. The basic geometric operation is cropping.

from scipy import misc
import matplotlib.pyplot as plt

face = misc.face(gray = True)
lx, ly = face.shape
# Cropping: keep the central half along each axis (slice indices must be integers)
crop_face = face[lx // 4: - lx // 4, ly // 4: - ly // 4]
plt.imshow(crop_face)
plt.show()

The above program will generate the following output.

[Figure: cropping operation on image files]

We can also perform some basic operations such as turning the image upside down as described below.

# up <-> down flip
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt

face = misc.face()
flip_ud_face = np.flipud(face)

plt.imshow(flip_ud_face)
plt.show()

The above program will generate the following output.

[Figure: image turning operation]

Besides this, we have the rotate() function, which rotates the image by a specified angle in degrees.

# rotation
from scipy import misc, ndimage
import matplotlib.pyplot as plt

face = misc.face()
rotate_face = ndimage.rotate(face, 45)

plt.imshow(rotate_face)
plt.show()

The above program will generate the following output.

[Figure: image rotation operation]

Filters

Let us discuss how filters help in image processing.

What is filtering in image processing?

Filtering is a technique for modifying or enhancing an image. For example, you can filter an image to emphasize certain features or remove other features. Image processing operations implemented with filtering include smoothing, sharpening, and edge enhancement.

Filtering is a neighborhood operation, in which the value of any given pixel in the output image is determined by applying some algorithm to the values of the pixels in the neighborhood of the corresponding input pixel. Let us now perform a few operations using SciPy ndimage.
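
The neighborhood idea can be made concrete on a tiny array before moving to real images. The sketch below uses ndimage.uniform_filter, a plain 3 x 3 mean filter; the 5 x 5 test image is an assumption made for this illustration.

```python
import numpy as np
from scipy import ndimage

# A 5 x 5 image that is zero everywhere except a single bright pixel.
image = np.zeros((5, 5))
image[2, 2] = 9.0

# Each output pixel becomes the mean of its 3 x 3 input neighborhood.
smoothed = ndimage.uniform_filter(image, size=3, mode="constant")
print(smoothed)
```

The single bright pixel is spread evenly over the nine output pixels whose neighborhoods contain it, which is the smoothing effect described above.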

Blurring

Blurring is widely used to reduce the noise in the image. We can perform a filter operation and see the change in the image. Let us consider the following example.

from scipy import misc, ndimage
import matplotlib.pyplot as plt

face = misc.face()
# Gaussian blur with a kernel standard deviation of 3 pixels
blurred_face = ndimage.gaussian_filter(face, sigma=3)
plt.imshow(blurred_face)
plt.show()

The above program will generate the following output.

[Figure: image blurring operation]

The sigma value is the standard deviation of the Gaussian kernel, measured in pixels; a larger sigma produces a stronger blur. We can see the change in image quality by tuning the sigma value.
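
The influence of sigma can be sketched without an image file by blurring synthetic noise and measuring how much variation survives. The random test image and the dictionary name noise_std are assumptions made for this illustration.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
noise = rng.standard_normal((128, 128))  # synthetic stand-in for a noisy image

noise_std = {}
for sigma in (1, 3, 5):
    blurred = ndimage.gaussian_filter(noise, sigma=sigma)
    # A wider kernel averages more pixels, so less of the noise survives.
    noise_std[sigma] = blurred.std()
    print(sigma, noise_std[sigma])
```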

Edge Detection

Let us discuss how edge detection helps in image processing.

What is Edge Detection?

Edge detection is an image processing technique for finding the boundaries of objects within images. It works by detecting discontinuities in brightness. Edge detection is used for image segmentation and data extraction in areas such as image processing, computer vision and machine vision.

The most commonly used edge detection algorithms include −

  1. Sobel

  2. Canny

  3. Prewitt

  4. Roberts

  5. Fuzzy Logic methods

Let us consider the following example.

import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

# Two nested squares with different intensities, then a Gaussian blur
im = np.zeros((256, 256))
im[64:-64, 64:-64] = 1
im[90:-90,90:-90] = 2
im = ndimage.gaussian_filter(im, 8)

plt.imshow(im)
plt.show()

The above program will generate the following output.

[Figure: edge detection]

The image looks like a square block of colors. Now, we will detect the edges of those colored blocks. Here, ndimage provides a function called sobel, which computes the gradient along one axis, and NumPy provides the hypot function to combine the two resulting matrices into a single gradient magnitude.

Let us consider the following example.

import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

im = np.zeros((256, 256))
im[64:-64, 64:-64] = 1
im[90:-90,90:-90] = 2
im = ndimage.gaussian_filter(im, 8)

# Sobel gradient along each axis, combined into a magnitude
sx = ndimage.sobel(im, axis = 0, mode = 'constant')
sy = ndimage.sobel(im, axis = 1, mode = 'constant')
sob = np.hypot(sx, sy)

plt.imshow(sob)
plt.show()

The above program will generate the following output.

[Figure: edge detection 2]

SciPy - Optimize

The scipy.optimize package provides several commonly used optimization algorithms. This module contains the following aspects −

  1. Unconstrained and constrained minimization of multivariate scalar functions (minimize()) using a variety of algorithms (e.g. BFGS, Nelder-Mead simplex, Newton Conjugate Gradient, COBYLA or SLSQP)

  2. Global optimization routines (e.g., basinhopping(), differential_evolution())

  3. Least-squares minimization (least_squares()) and curve fitting (curve_fit()) algorithms

  4. Scalar univariate function minimizers (minimize_scalar()) and root finders (newton())

  5. Multivariate equation system solvers (root()) using a variety of algorithms (e.g. hybrid Powell, Levenberg-Marquardt or large-scale methods such as Newton-Krylov)

Unconstrained & Constrained minimization of multivariate scalar functions

The minimize() function provides a common interface to unconstrained and constrained minimization algorithms for multivariate scalar functions in scipy.optimize. To demonstrate the minimization function, consider the problem of minimizing the Rosenbrock function of N variables −

f(x) = \sum_{i = 1}^{N-1} \left[ 100(x_{i+1} - x_{i}^{2})^{2} + (1 - x_{i})^{2} \right]

The minimum value of this function is 0, which is achieved when x_i = 1 for every i.
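
The claim about the minimum can be checked directly. The sketch below writes the standard Rosenbrock sum, 100(x_{i+1} − x_i²)² + (1 − x_i)² over consecutive pairs, by hand and compares it with scipy.optimize.rosen; the helper name rosen_manual and the second test point are assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import rosen

def rosen_manual(x):
    """Rosenbrock function written out term by term."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

x_min = np.ones(5)                                # every x_i equal to 1
x_other = np.array([1.3, 0.7, 0.8, 1.9, 1.2])     # an arbitrary other point

print(rosen_manual(x_min), rosen(x_min))
print(rosen_manual(x_other), rosen(x_other))
```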

Nelder–Mead Simplex Algorithm

In the following example, the minimize() routine is used with the Nelder-Mead simplex algorithm, selected through the method parameter (method = 'Nelder-Mead'). Let us consider the following example.

import numpy as np
from scipy.optimize import minimize

def rosen(x):
   """The Rosenbrock function"""
   return sum(100.0 * (x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
# A tight xatol makes the simplex shrink until x is accurate to about 1e-8
res = minimize(rosen, x0, method='nelder-mead', options={'xatol': 1e-8})

print(res.x)

The above program will generate output close to the following.

[1. 1. 1. 1. 1.]

The simplex algorithm is probably the simplest way to minimize a fairly well-behaved function. It requires only function evaluations and is a good choice for simple minimization problems. However, because it does not use any gradient evaluations, it may take longer to find the minimum.

Another optimization algorithm that needs only function calls to find the minimum is Powell's method, which is available by setting method = 'powell' in the minimize() function.
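
Switching between the two derivative-free methods only requires changing the method string. The following sketch runs both on the Rosenbrock function shipped as scipy.optimize.rosen and reports the number of function evaluations; the results dictionary is an assumption made for this illustration, and the exact counts depend on the SciPy version.

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])

results = {}
for method in ("Nelder-Mead", "Powell"):
    res = minimize(rosen, x0, method=method)
    results[method] = res
    # nfev counts objective evaluations; neither method uses gradients.
    print(method, res.fun, res.nfev)
```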

Least Squares

least_squares solves a nonlinear least-squares problem with bounds on the variables. Given the residuals f(x) (an m-dimensional real function of n real variables) and the loss function rho(s) (a scalar function), least_squares finds a local minimum of the cost function F(x).

In this example, we find a minimum of the Rosenbrock function without bounds on the independent variables.

import numpy as np
from scipy.optimize import least_squares

# Residuals of the two-variable Rosenbrock function
def fun_rosenbrock(x):
   return np.array([10 * (x[1] - x[0]**2), (1 - x[0])])

x0 = np.array([2, 2])
res = least_squares(fun_rosenbrock, x0)

print(res)

Notice that we only provide the vector of the residuals. The algorithm constructs the cost function as a sum of squares of the residuals, which gives the Rosenbrock function. The exact minimum is at x = [1.0, 1.0].

The above program will generate the following output.

active_mask: array([ 0., 0.])
      cost: 9.8669242910846867e-30
      fun: array([ 4.44089210e-15, 1.11022302e-16])
      grad: array([ -8.89288649e-14, 4.44089210e-14])
      jac: array([[-20.00000015,10.],[ -1.,0.]])
   message: '`gtol` termination condition is satisfied.'
      nfev: 3
      njev: 3
   optimality: 8.8928864934219529e-14
      status: 1
      success: True
         x: array([ 1., 1.])
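
The bounded case mentioned in the description can be sketched with the same residual function. The particular bounds below (x[1] may not drop below 1.5) are an assumption chosen so that the unconstrained minimum at [1, 1] becomes infeasible.

```python
import numpy as np
from scipy.optimize import least_squares

def fun_rosenbrock(x):
    # Residuals whose sum of squares is the two-variable Rosenbrock function.
    return np.array([10 * (x[1] - x[0] ** 2), (1 - x[0])])

x0 = np.array([2.0, 2.0])

# Lower bounds (-inf, 1.5) and no upper bounds: x[1] must stay at or above 1.5.
res_bounded = least_squares(fun_rosenbrock, x0, bounds=([-np.inf, 1.5], np.inf))

print(res_bounded.x)
print(res_bounded.cost)
```

The solver stops on the boundary x[1] = 1.5, and the cost stays strictly positive because both residuals can no longer vanish at the same time.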

Root finding

Let us understand how root finding works in SciPy.

Scalar functions

If one has a single-variable equation, there are four different root-finding algorithms that can be tried (brentq, brenth, ridder and bisect). Each of these algorithms requires the endpoints of an interval in which a root is expected (because the function changes sign). In general, brentq is the best choice, but the other methods may be useful in certain circumstances or for academic purposes.
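
A bracketing root finder can be sketched on a simple quadratic. The function f and the interval [0, 2] are assumptions made for this illustration; brentq and bisect are the SciPy routines.

```python
from scipy.optimize import bisect, brentq

def f(x):
    return x ** 2 - 2.0

# f(0) < 0 and f(2) > 0, so the interval [0, 2] brackets the root sqrt(2).
root_brentq = brentq(f, 0.0, 2.0)
root_bisect = bisect(f, 0.0, 2.0)

print(root_brentq, root_bisect)
```

Both routines raise a ValueError if the function has the same sign at the two endpoints, which is why a sign change over the interval is required.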

Fixed-point solving

A problem closely related to finding the zeros of a function is the problem of finding a fixed point of a function. A fixed point of a function is the point at which evaluation of the function returns the point: g(x) = x. Clearly, the fixed point of g is the root of f(x) = g(x) − x. Equivalently, the root of f is the fixed point of g(x) = f(x) + x. The routine fixed_point provides a simple iterative method using Aitken's sequence acceleration to estimate the fixed point of g, if a starting point is given.
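
The equivalence between fixed points and roots can be sketched with g(x) = cos(x). The choice of function, starting point and bracket are assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import brentq, fixed_point

# g(x) = cos(x) has a single fixed point near 0.739.
x_fixed = fixed_point(np.cos, 0.5)

# The same point is a root of f(x) = g(x) - x.
x_root = brentq(lambda x: np.cos(x) - x, 0.0, 1.0)

print(float(x_fixed), x_root)
```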

Sets of equations

Finding a root of a set of non-linear equations can be achieved using the root() function. Several methods are available, amongst which hybr (the default) and lm use, respectively, the hybrid method of Powell and the Levenberg-Marquardt method from MINPACK.

The following example considers the single-variable transcendental equation

2x + 2cos(x) = 0

A root of which can be found as follows −

import numpy as np
from scipy.optimize import root

def func(x):
   # 2x + 2cos(x); its root satisfies x = -cos(x)
   return x*2 + 2 * np.cos(x)

sol = root(func, 0.3)
print(sol)

The above program will generate the following output.

fjac: array([[-1.]])
fun: array([ 2.22044605e-16])
message: 'The solution converged.'
   nfev: 10
   qtf: array([ -2.77644574e-12])
      r: array([-3.34722409])
   status: 1
   success: True
      x: array([-0.73908513])

SciPy - Stats

All of the statistics functions are located in the sub-package scipy.stats, and a fairly complete listing of these functions can be obtained using the info(stats) function or Python's built-in help(stats). A list of the available random variables can also be obtained from the docstring for the stats sub-package. This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each univariate distribution has its own subclass, as described in the following list −

  1. rv_continuous − A generic continuous random variable class meant for subclassing

  2. rv_discrete − A generic discrete random variable class meant for subclassing

  3. rv_histogram − Generates a distribution given by a histogram

Normal Continuous Random Variable

A continuous random variable is one whose value X can be any real number within a range. For the normal distribution, the location (loc) keyword specifies the mean and the scale (scale) keyword specifies the standard deviation.

As an instance of the rv_continuous class, the norm object inherits from it a collection of generic methods and completes them with details specific to this particular distribution.

To compute the CDF at a number of points, we can pass a list or a NumPy array. Let us consider the following example.

from scipy.stats import norm
import numpy as np
print(norm.cdf(np.array([1,-1., 0, 1, 3, 4, -2, 6])))

The above program will generate the following output.

array([ 0.84134475, 0.15865525, 0.5 , 0.84134475, 0.9986501 ,
0.99996833, 0.02275013, 1. ])

To find the median of a distribution, we can use the Percent Point Function (PPF), which is the inverse of the CDF. Let us understand by using the following example.

from scipy.stats import norm
print(norm.ppf(0.5))

The above program will generate the following output.

0.0

To generate a sequence of random variates, we should use the size keyword argument, as shown in the following example.

from scipy.stats import norm
print(norm.rvs(size = 5))

The above program will generate the following output.

array([ 0.20929928, -1.91049255, 0.41264672, -0.7135557 , -0.03833048])

The above output is not reproducible. To generate the same random numbers on every run, fix the seed, for example by calling np.random.seed() beforehand or by passing the random_state argument to rvs.
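
One way to make the draw repeatable is sketched below using the random_state keyword of rvs. The seed value 42 is arbitrary and chosen only for this illustration.

```python
import numpy as np
from scipy.stats import norm

# The same integer seed yields the same sequence of variates.
first = norm.rvs(size=5, random_state=42)
second = norm.rvs(size=5, random_state=42)

print(first)
print(np.array_equal(first, second))
```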

Uniform Distribution

A uniform distribution can be generated using the uniform object. With loc = 1 and scale = 4, the distribution is uniform on the interval [1, 5]. Let us consider the following example.

from scipy.stats import uniform
print(uniform.cdf([0, 1, 2, 3, 4, 5], loc = 1, scale = 4))

The above program will generate the following output.

array([ 0. , 0. , 0.25, 0.5 , 0.75, 1. ])

Build Discrete Distribution

Let us generate a random sample and compare the observed frequencies with the probabilities.

Binomial Distribution

As an instance of the rv_discrete class, the binom object inherits from it a collection of generic methods and completes them with details specific to this particular distribution. Let us consider the following example.

from scipy.stats import binom
# CDF of the number of successes in n = 5 trials with success probability p = 0.5
print(binom.cdf([0, 1, 2, 3, 4, 5], n = 5, p = 0.5))

The above program will generate the following output.

array([ 0.03125, 0.1875 , 0.5 , 0.8125 , 0.96875, 1. ])
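
Comparing observed frequencies with theoretical probabilities can be sketched for a binomial variable as follows. The parameters n = 5 and p = 0.5, the sample size and the seed are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import binom

n, p = 5, 0.5  # illustrative parameters
sample = binom.rvs(n, p, size=10000, random_state=0)

# Relative frequency of each outcome 0..n in the sample.
observed = np.bincount(sample, minlength=n + 1) / sample.size
# Theoretical probability of each outcome from the probability mass function.
expected = binom.pmf(np.arange(n + 1), n, p)

for k in range(n + 1):
    print(k, observed[k], expected[k])
```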

Descriptive Statistics

Basic statistics such as min, max, mean and variance take a NumPy array as input and return the respective results. A few basic statistical functions available in the scipy.stats package are described in the following list.

  1. describe() − Computes several descriptive statistics of the passed array

  2. gmean() − Computes the geometric mean along the specified axis

  3. hmean() − Calculates the harmonic mean along the specified axis

  4. kurtosis() − Computes the kurtosis

  5. mode() − Returns the modal value

  6. skew() − Computes the skewness of the data

  7. f_oneway() − Performs a 1-way ANOVA

  8. iqr() − Computes the interquartile range of the data along the specified axis

  9. zscore() − Calculates the z score of each value in the sample, relative to the sample mean and standard deviation

  10. sem() − Calculates the standard error of the mean (or standard error of measurement) of the values in the input array

Several of these functions have a similar version in scipy.stats.mstats, which works for masked arrays. Let us understand this with the example given below.

from scipy import stats
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9])
print(x.max(), x.min(), x.mean(), x.var())

The above program will generate the following output.

9 1 5.0 6.666666666666667
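
The same array can also be summarized with describe(), and a masked variant can be tried through scipy.stats.mstats. The sketch below is illustrative: masking the value 9 is an arbitrary choice made to show that masked entries are skipped.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

summary = stats.describe(x)
print(summary.nobs, summary.minmax, summary.mean, summary.variance)

# describe() reports the sample variance (ddof=1), while ndarray.var()
# defaults to the population variance (ddof=0).
print(x.var(), x.var(ddof=1))

# The mstats variants skip masked entries; here the value 9 is masked out.
masked = np.ma.masked_array(x, mask=(x == 9))
print(stats.mstats.gmean(masked), stats.gmean(x[:-1]))
```

The difference between 6.67 from x.var() and the variance reported by describe() comes only from the ddof convention, not from a different data set.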

T-test

Let us understand how the T-test is used in SciPy.

ttest_1samp

Calculates the T-test for the mean of one group of scores. This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations ‘a’ is equal to the given population mean, popmean. Let us consider the following example.

from scipy import stats
rvs = stats.norm.rvs(loc = 5, scale = 10, size = (50,2))
print(stats.ttest_1samp(rvs, 5.0))

The above program will generate the following output.

Ttest_1sampResult(statistic = array([-1.40184894, 2.70158009]),
pvalue = array([ 0.16726344, 0.00945234]))

Comparing two samples

In the following examples, there are two samples, which can come either from the same or from different distributions, and we want to test whether these samples have the same statistical properties.

ttest_ind − Calculates the T-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. By default, this test assumes that the populations have identical variances.

We can use this test if we observe two independent samples from the same or different populations. Let us consider the following example.

from scipy import stats
rvs1 = stats.norm.rvs(loc = 5,scale = 10,size = 500)
rvs2 = stats.norm.rvs(loc = 5,scale = 10,size = 500)
print(stats.ttest_ind(rvs1, rvs2))

The above program will generate the following output.

Ttest_indResult(statistic = -0.67406312233650278, pvalue = 0.50042727502272966)

You can repeat the test with a new array of the same length but a different mean: use a different value for loc and run the test again.
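
A sketch of that variation is given below. The shifted mean of 10, the seeds and the name rvs3 are assumptions made for this illustration; with a shift this large relative to the sample size, the test is expected to reject the null hypothesis.

```python
from scipy import stats

# Same scale and size as before, but the second sample has a shifted mean.
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=1)
rvs3 = stats.norm.rvs(loc=10, scale=10, size=500, random_state=2)

result = stats.ttest_ind(rvs1, rvs3)
print(result.statistic, result.pvalue)
```

A small p-value here indicates that the two sample means differ by more than random variation would explain.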

SciPy - CSGraph

CSGraph stands for Compressed Sparse Graph, which focuses on fast graph algorithms based on sparse matrix representations.

Graph Representations

To begin with, let us understand what a sparse graph is and how it helps in graph representations.

What exactly is a sparse graph?

A graph is just a collection of nodes, which have links between them. Graphs can represent nearly anything − social network connections, where each node is a person and is connected to acquaintances; images, where each node is a pixel and is connected to neighboring pixels; points in a high-dimensional distribution, where each node is connected to its nearest neighbors; and practically anything else you can imagine.

One very efficient way to represent graph data is in a sparse matrix: let us call it G. The matrix G is of size N x N, and G[i, j] gives the value of the connection between node ‘i’ and node ‘j’. The matrix of a sparse graph contains mostly zeros − that is, most nodes have only a few connections. This property turns out to be true in most cases of interest.

The creation of the sparse graph submodule was motivated by several algorithms used in scikit-learn, including the following −

  1. Isomap − A manifold learning algorithm, which requires finding the shortest paths in a graph.

  2. Hierarchical clustering − A clustering algorithm based on a minimum spanning tree.

  3. Spectral Decomposition − A projection algorithm based on sparse graph Laplacians.

As a concrete example, imagine that we would like to represent the following undirected graph −

[Figure: undirected graph]

This graph has three nodes, where nodes 0 and 1 are connected by an edge of weight 2, and nodes 0 and 2 are connected by an edge of weight 1. We can construct the dense, masked and sparse representations as shown in the following example, keeping in mind that an undirected graph is represented by a symmetric matrix.

import numpy as np
from scipy.sparse import csr_matrix

# Dense representation: zero means "no edge"
G_dense = np.array([ [0, 2, 1],
                     [2, 0, 0],
                     [1, 0, 0] ])

# Masked and sparse representations of the same graph
G_masked = np.ma.masked_values(G_dense, 0)
G_sparse = csr_matrix(G_dense)
print(G_sparse.data)

The above program will generate the following output.

array([2, 1, 2, 1])
[Figure: undirected graph using symmetric matrix]

This is identical to the previous graph, except that nodes 0 and 2 are connected by an edge of zero weight. In this case, the dense representation above leads to ambiguities − how can non-edges be represented if zero is a meaningful value? Either a masked or a sparse representation must then be used to eliminate the ambiguity.

Let us consider the following example.

import numpy as np
from scipy.sparse.csgraph import csgraph_from_dense

# np.inf marks "no edge", so the zero-weight edge between nodes 0 and 2 is kept
G2_data = np.array([
   [np.inf, 2, 0 ],
   [2, np.inf, np.inf],
   [0, np.inf, np.inf]
])
G2_sparse = csgraph_from_dense(G2_data, null_value = np.inf)
print(G2_sparse.data)

The above program will generate the following output.

array([ 2., 0., 2., 0.])
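
The ambiguity can be made visible by counting stored entries under both conventions. This is a minimal sketch; the array names G2_zero_null and G2_inf_null are assumptions made for this illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import csgraph_from_dense

# Dense matrix in which 0 marks "no edge": the zero-weight edge 0-2 is lost.
G2_zero_null = np.array([[0, 2, 0],
                         [2, 0, 0],
                         [0, 0, 0]])
zero_null_sparse = csr_matrix(G2_zero_null)
print(zero_null_sparse.nnz)

# Using inf for "no edge" keeps the zero-weight edge as a stored entry.
G2_inf_null = np.array([[np.inf, 2, 0],
                        [2, np.inf, np.inf],
                        [0, np.inf, np.inf]])
G2_sparse = csgraph_from_dense(G2_inf_null, null_value=np.inf)
print(G2_sparse.nnz, G2_sparse.data)
```

The first representation stores two entries (one undirected edge), the second stores four (two undirected edges, one of them with weight zero).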

Word ladders using sparse graphs

A word ladder is a game invented by Lewis Carroll, in which words are linked by changing a single letter at each step. For example −

APE → APT → AIT → BIT → BIG → BAG → MAG → MAN

Here, we have gone from "APE" to "MAN" in seven steps, changing one letter each time. The question is: can we find a shorter path between these words using the same rule? This problem is naturally expressed as a sparse graph problem. The nodes will correspond to individual words, and we will create connections between words that differ by exactly one letter.

Obtaining a List of Words

First, of course, we must obtain a list of valid words. I am running Mac, and Mac has a word dictionary at the location given in the following code block. If you are on a different system, you may have to search a bit to find your system dictionary.

# Read the system dictionary: one word per whitespace-separated token.
word_list = open('/usr/share/dict/words').read().split()
print(len(word_list))

The above program will generate the following output.

235886

We now want to look at words of length 3, so let us select just those words of the correct length. We will also eliminate words that start with an upper-case letter (proper nouns) or contain non-alphabetic characters such as apostrophes and hyphens. Finally, we will make sure everything is in lower case for comparison later on.

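# Keep only three-letter, purely alphabetic words that start with a lower-case letter.
word_list = [word for word in word_list if len(word) == 3]
word_list = [word for word in word_list if word[0].islower()]
word_list = [word for word in word_list if word.isalpha()]
# Build a real list (not a lazy map object) so that len() works in Python 3.
word_list = [word.lower() for word in word_list]
print(len(word_list))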

The above program will generate the following output.

1135

Now, we have a list of 1135 valid three-letter words (the exact number may vary depending on the particular list used). Each of these words will become a node in our graph, and we will create an edge between the nodes of every pair of words that differ by only one letter.

import numpy as np
word_list = np.asarray(word_list)
word_list.sort()  # sort for quick searching later

# View the words as raw bytes. In Python 3 each character occupies four bytes,
# so keep only the first byte of each character (enough for ASCII letters).
word_bytes = np.ndarray((word_list.size, word_list.itemsize),
   dtype = 'uint8',
   buffer = word_list.data)
word_bytes = word_bytes[:, ::word_list.itemsize // 3]
print(word_bytes.shape)

The above program will generate the following output.

(1135, 3)

We will use the Hamming distance between each pair of words to determine which pairs are connected. The Hamming distance measures the fraction of entries that differ between two vectors: any two words with a Hamming distance equal to 1/N, where N is the number of letters, are connected in the word ladder.

from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
hamming_dist = pdist(word_bytes, metric = 'hamming')
# Threshold halfway between 1/N and 2/N, where N is the number of letters per word.
graph = csr_matrix(squareform(hamming_dist < 1.5 / word_bytes.shape[1]))
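To make the Hamming distance concrete, here is a small sketch on three hand-picked words. The word list and the names codes, dist and linked are assumptions made for illustration; the character-code encoding stands in for the byte view used above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

words = ["ape", "apt", "opt"]
# Encode each word as a row of character codes, one column per letter.
codes = np.array([[ord(ch) for ch in word] for word in words])

# Hamming distance = fraction of positions that differ.
dist = squareform(pdist(codes, metric="hamming"))

# Two words are linked when exactly one of the N letters differs (distance 1/N).
n_letters = codes.shape[1]
linked = (dist > 0) & (dist < 1.5 / n_letters)
print(dist)
print(linked)
```

Here "ape" and "apt" differ in one of three positions (distance 1/3) and are linked, while "ape" and "opt" differ in two positions (distance 2/3) and are not.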

When comparing the distances, we do not use equality because this can be unstable for floating point values. The inequality produces the desired result as long as no two entries of the word list are identical. Now that our graph is set up, we will use a shortest path search to find the path between any two words in the graph.

# searchsorted returns insertion indices, so print the words to confirm they are present.
i1 = word_list.searchsorted('ape')
i2 = word_list.searchsorted('man')
print(word_list[i1], word_list[i2])

The above program will generate the following output.

ape man

We need to check that these match, because if a word is not in the list, searchsorted still returns the index where it would be inserted and the output will show a different word. Now, all we need is to find the shortest path between these two indices in the graph. We will use Dijkstra's algorithm, because it allows us to find the paths starting from just one node.

from scipy.sparse.csgraph import dijkstra
# Distances from node i1 to every other node, plus each node's predecessor on that path.
distances, predecessors = dijkstra(graph, indices = i1, return_predecessors = True)
print(distances[i2])

The above program will generate the following output.

5.0

Thus, we see that the shortest path between ‘ape’ and ‘man’ contains only five steps. We can use the predecessors returned by the algorithm to reconstruct this path.

path = []
i = i2

# Walk back from the target to the source along the predecessor links.
while i != i1:
   path.append(str(word_list[i]))
   i = predecessors[i]

path.append(str(word_list[i1]))
print(path[::-1])

The above program will generate the following output.

['ape', 'ope', 'opt', 'oat', 'mat', 'man']
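The whole procedure can also be tried without a system dictionary. The following is a minimal sketch on a small hand-picked word list (an assumption for illustration, not the dictionary used above); it builds the adjacency matrix with a plain Python letter comparison instead of pdist.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Hypothetical miniature dictionary containing both ladders discussed above.
words = sorted(["ape", "apt", "ait", "bit", "big", "bag", "mag", "man",
                "ope", "opt", "oat", "mat"])
n = len(words)

# Connect every pair of words that differ in exactly one position.
adj = np.zeros((n, n), dtype=int)
for a in range(n):
    for b in range(a + 1, n):
        diff = sum(c1 != c2 for c1, c2 in zip(words[a], words[b]))
        if diff == 1:
            adj[a, b] = adj[b, a] = 1
graph = csr_matrix(adj)

start, end = words.index("ape"), words.index("man")
dist, pred = dijkstra(graph, indices=start, return_predecessors=True)

# Follow the predecessors back from the target to the source.
path = [end]
while path[-1] != start:
    path.append(pred[path[-1]])
ladder = [words[i] for i in reversed(path)]
print(dist[end], ladder)
```

On this list the shortest ladder has five steps. More than one five-step ladder exists, so the exact words returned may differ from the path shown above.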

SciPy - Spatial

The scipy.spatial package can compute triangulations, Voronoi diagrams and convex hulls of a set of points by leveraging the Qhull library. It also contains KDTree implementations for nearest-neighbor point queries and utilities for distance computations in various metrics.

Delaunay Triangulations

Let us understand what Delaunay Triangulations are and how they are used in SciPy.

What are Delaunay Triangulations?

In mathematics and computational geometry, a Delaunay triangulation for a given set P of discrete points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P).

We can compute the same through SciPy. Let us consider the following example.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Delaunay

points = np.array([[0, 4], [2, 1.1], [1, 3], [1, 2]])
tri = Delaunay(points)

# Draw the triangles first, then the input points on top.
plt.triplot(points[:,0], points[:,1], tri.simplices.copy())
plt.plot(points[:,0], points[:,1], 'o')
plt.show()

The above program will generate the following output.

[Figure: Delaunay triangulations]
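The empty-circumcircle property from the definition can be checked numerically. The sketch below is an illustration only: circumcircle is a hypothetical helper written for this check, not a SciPy function, and the tolerance 1e-9 is an arbitrary choice.

```python
import numpy as np
from scipy.spatial import Delaunay

points = np.array([[0, 4], [2, 1.1], [1, 3], [1, 2]])
tri = Delaunay(points)

def circumcircle(a, b, c):
    # The centre x is equidistant from a, b and c:
    # 2 (b - a) . x = b.b - a.a  and  2 (c - a) . x = c.c - a.a
    A = 2 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    centre = np.linalg.solve(A, rhs)
    return centre, np.linalg.norm(centre - a)

empty = True
for simplex in tri.simplices:
    centre, radius = circumcircle(*points[simplex])
    # Points that are not vertices of this triangle must lie outside its circumcircle.
    others = np.delete(points, simplex, axis=0)
    dists = np.linalg.norm(others - centre, axis=1)
    if np.any(dists < radius - 1e-9):
        empty = False
print(empty)
```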

Coplanar Points

Let us understand what Coplanar Points are and how they are used in SciPy.

What are Coplanar Points?

Coplanar points are three or more points that lie in the same plane. Recall that a plane is a flat surface that extends without end in all directions. It is usually shown in math textbooks as a four-sided figure. In SciPy, the coplanar attribute of a Delaunay object lists the input points that were left out of the triangulation because of numerical precision issues, such as duplicated points.

Let us see how we can find this using SciPy. Let us consider the following example.

import numpy as np
from scipy.spatial import Delaunay

# The last point duplicates the fourth one.
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1]])
tri = Delaunay(points)
print(tri.coplanar)

The above program will generate the following output.

[[4 0 3]]

This means that point 4, which duplicates point 3, resides near triangle 0 and vertex 3, but is not included in the triangulation.

Convex hulls

Let us understand what convex hulls are and how they are used in SciPy.

What are Convex Hulls?

In mathematics, the convex hull or convex envelope of a set of points X in the Euclidean plane or in a Euclidean space (or, more generally, in an affine space over the reals) is the smallest convex set that contains X.

Let us consider the following example to understand it in detail.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

points = np.random.rand(10, 2) # 10 random points in 2-D
hull = ConvexHull(points)

plt.plot(points[:,0], points[:,1], 'o')
# Each simplex is a pair of point indices that forms one edge of the hull.
for simplex in hull.simplices:
   plt.plot(points[simplex,0], points[simplex,1], 'k-')
plt.show()

The above program will generate the following output.

[Figure: convex hulls]
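Without plotting, the hull can also be inspected through its attributes. The following sketch assumes a fixed random seed for repeatability and uses the equations attribute, whose rows hold a facet normal followed by an offset, to confirm that every input point lies inside or on the hull. The tolerance 1e-12 is an arbitrary choice.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points = rng.random((10, 2))
hull = ConvexHull(points)

# Indices of the input points that form the corners of the hull.
print(hull.vertices)

# Each row of hull.equations is [normal, offset]; a point x is inside the hull
# when normal @ x + offset <= 0 holds for every facet.
normals = hull.equations[:, :-1]
offsets = hull.equations[:, -1]
inside = (points @ normals.T + offsets <= 1e-12).all()
print(inside)
```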

SciPy - ODR

ODR stands for Orthogonal Distance Regression, which is used in regression studies. Basic linear regression is often used to estimate the relationship between two variables y and x by drawing the line of best fit on a graph.

The mathematical method used for this is known as Least Squares, which aims to minimize the sum of the squared errors over all points. The key question here is: how do you calculate the error (also known as the residual) for each point?

In a standard linear regression, the aim is to predict the Y value from the X value, so the sensible thing to do is to calculate the error in the Y values (shown as the gray lines in the following image). However, sometimes it is more sensible to take into account the error in both X and Y (as shown by the dotted red lines in the following image).

For example − when you know your measurements of X are uncertain, or when you do not want to focus on the errors of one variable over another.

[Figure: orthogonal distance linear regression]

Orthogonal Distance Regression (ODR) is a method that can do this (orthogonal in this context means perpendicular – so it calculates errors perpendicular to the line, rather than just ‘vertically’).
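The difference between the two kinds of error can be sketched for a straight line y = m*x + c. The helper functions and the sample data below are assumptions made for illustration; the orthogonal distance uses the standard point-to-line formula |y - (m*x + c)| / sqrt(1 + m**2).

```python
import numpy as np

def vertical_residuals(m, c, x, y):
    # Error measured along the y axis only, as in ordinary least squares.
    return y - (m * x + c)

def orthogonal_distances(m, c, x, y):
    # Perpendicular distance from each point to the line y = m*x + c.
    return np.abs(y - (m * x + c)) / np.sqrt(1 + m**2)

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 4.0])
m, c = 1.0, 0.0

print(vertical_residuals(m, c, x, y))
print(orthogonal_distances(m, c, x, y))
```

For a slope of 1, each perpendicular distance is the vertical error divided by sqrt(2), so the two measures weight the same points differently as the slope changes.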

scipy.odr Implementation for Univariate Regression

The following example demonstrates the use of scipy.odr for univariate regression.

import numpy as np
from scipy.odr import Model, RealData, ODR
import random

# Initiate some data, giving some randomness using random.random().
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([i**2 + random.random() for i in x])

# Define a function (linear in our case) to fit the data with.
def linear_func(p, x):
   m, c = p
   return m*x + c

# Create a model for fitting.
linear_model = Model(linear_func)

# Create a RealData object using our initiated data from above.
data = RealData(x, y)

# Set up ODR with the model and data.
odr = ODR(data, linear_model, beta0=[0., 1.])

# Run the regression.
out = odr.run()

# Use the in-built pprint method to give us results.
out.pprint()

The above program will generate output similar to the following; the exact numbers vary between runs because the data contains random noise.

Beta: [ 5.51846098 -4.25744878]
Beta Std Error: [ 0.7786442 2.33126407]

Beta Covariance: [[ 1.93150969 -4.82877433]
 [-4.82877433 17.31417201]]

Residual Variance: 0.313892697582
Inverse Condition #: 0.146618499389
Reason(s) for Halting:
   Sum of squares convergence
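Instead of printing the report, the fitted parameters can be read from the object returned by run(). The sketch below uses fixed, made-up data scattered around y = 2x + 1 so that the result is repeatable, and compares the ODR fit with an ordinary least-squares fit from np.polyfit. The data values and variable names are assumptions for illustration.

```python
import numpy as np
from scipy.odr import Model, RealData, ODR

def linear_func(p, x):
    m, c = p
    return m * x + c

# Fixed illustrative data close to the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.05, 6.95, 9.1, 10.9])

out = ODR(RealData(x, y), Model(linear_func), beta0=[1.0, 0.0]).run()
slope_odr, intercept_odr = out.beta   # fitted parameters
print(out.beta, out.sd_beta)          # estimates and their standard errors

# Ordinary least squares for comparison (errors measured in y only).
slope_ols, intercept_ols = np.polyfit(x, y, 1)
print(slope_ols, intercept_ols)
```

With data this close to a straight line, both methods give nearly the same slope and intercept. They diverge more when the scatter in x is large.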

SciPy - Special Package

The functions available in the special package are universal functions, which follow broadcasting and automatic array looping.
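As a brief illustration of what this means in practice, the sketch below applies one single-argument and one two-argument special function to arrays. The choice of cbrt and binom and the sample values are assumptions made for this example.

```python
import numpy as np
from scipy.special import cbrt, binom

# A universal function is applied element by element and keeps the input shape.
cubes = np.array([[1.0, 8.0], [27.0, 64.0]])
roots = cbrt(cubes)

# Two-argument functions broadcast: a (2, 1) column against a (3,) row gives (2, 3).
n = np.array([[4], [5]])
k = np.array([0, 1, 2])
table = binom(n, k)

print(roots)
print(table)
```

No explicit Python loop is needed; the looping over elements happens inside the function.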

Let us look at some of the most frequently used special functions −

  1. Cubic Root Function

  2. Exponential Function

  3. Relative Error Exponential Function

  4. Log Sum Exponential Function

  5. Lambert Function

  6. Permutations and Combinations Function

  7. Gamma Function

Let us now understand each of these functions in brief.

Cubic Root Function

The syntax of the cubic root function is – scipy.special.cbrt(x). It returns the element-wise cube root of x.

Let us consider the following example.

from scipy.special import cbrt
res = cbrt([10, 9, 0.1254, 234])
print(res)

The above program will generate the following output.

[ 2.15443469 2.08008382 0.50053277 6.16224015]

Exponential Function

The syntax of the exponential function is – scipy.special.exp10(x). It computes 10**x element-wise.

Let us consider the following example.

from scipy.special import exp10
res = exp10([2, 9])
print(res)

The above program will generate the following output.

[1.00000000e+02  1.00000000e+09]

Relative Error Exponential Function

The syntax for this function is – scipy.special.exprel(x). It computes the relative error exponential, (exp(x) - 1)/x.

When x is near zero, exp(x) is near 1, so the numerical calculation of exp(x) - 1 can suffer from catastrophic loss of precision. exprel(x) is implemented to avoid this loss of precision.

Let us consider the following example.

from scipy.special import exprel
res = exprel([-0.25, -0.1, 0, 0.1, 0.25])
print(res)

The above program will generate the following output.

[0.88479687 0.95162582 1.   1.05170918 1.13610167]
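The loss of precision is easy to reproduce. In this small sketch, the input 1e-16 is an illustrative choice: it is small enough that exp(x) rounds to exactly 1.0 in double precision, so the direct formula returns 0 while exprel returns the correct limit of about 1.

```python
import numpy as np
from scipy.special import exprel

x = 1e-16
naive = (np.exp(x) - 1) / x   # exp(x) rounds to 1.0, so the numerator becomes 0
stable = exprel(x)            # evaluated without the cancellation

print(naive, stable)
```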

Log Sum Exponential Function

The syntax for this function is – scipy.special.logsumexp(x). It computes the log of the sum of exponentials of the input elements.

Let us consider the following example.

from scipy.special import logsumexp
import numpy as np
a = np.arange(10)
res = logsumexp(a)
print(res)

The above program will generate the following output.

9.45862974443
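One reason to use logsumexp instead of writing the formula directly is numerical stability for large inputs. The following sketch, with illustrative input values, shows the direct formula overflowing while logsumexp returns the expected value 1000 + log(2).

```python
import numpy as np
from scipy.special import logsumexp

a = np.array([1000.0, 1000.0])

# Direct evaluation: exp(1000) overflows to inf in double precision.
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(a)))

stable = logsumexp(a)
print(naive, stable)
```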

Lambert Function

The syntax for this function is – scipy.special.lambertw(x). It is also called the Lambert W function. The Lambert W function W(z) is defined as the inverse function of w * exp(w). In other words, the value of W(z) is such that z = W(z) * exp(W(z)) for any complex number z.

The Lambert W function is a multivalued function with infinitely many branches. Each branch gives a separate solution of the equation z = w exp(w). Here, the branches are indexed by the integer k.

Let us consider the following example. Here, the Lambert W function is the inverse of w exp(w).

import numpy as np
from scipy.special import lambertw
w = lambertw(1)
print(w)
# Applying w * exp(w) to the result should recover the input value 1.
print(w * np.exp(w))

The above program will generate the following output.

(0.56714329041+0j)
(1+0j)
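The branches mentioned above can be selected with the k argument of lambertw. As a small sketch, with the input value -0.1 chosen for illustration, the branches k = 0 and k = -1 give two different solutions of the same equation z = w exp(w).

```python
import numpy as np
from scipy.special import lambertw

z = -0.1
w0 = lambertw(z, k=0)     # principal branch
wm1 = lambertw(z, k=-1)   # branch with index k = -1

print(w0, wm1)
# Both values solve z = w * exp(w).
print(w0 * np.exp(w0), wm1 * np.exp(wm1))
```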

Permutations & Combinations

Let us discuss permutations and combinations separately to understand them clearly.

Combinations − The syntax for the combinations function is – scipy.special.comb(N, k). Let us consider the following example −

from scipy.special import comb
# repetition = True counts combinations with repetition.
res = comb(10, 3, exact = False, repetition = True)
print(res)

The above program will generate the following output.

220.0

Note − Array arguments are accepted only when exact = False. If k > N, N < 0, or k < 0, then 0 is returned.
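The effect of these options can be seen side by side in a short sketch. The variable names and input values are chosen for illustration.

```python
import numpy as np
from scipy.special import comb

approx = comb(10, 3)                         # floating-point result
exact = comb(10, 3, exact=True)              # exact Python integer
with_rep = comb(10, 3, repetition=True)      # combinations with repetition
too_many = comb(3, 10)                       # k > N
elementwise = comb(np.array([10, 10]), np.array([3, 4]))  # array arguments

print(approx, exact, with_rep, too_many, elementwise)
```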

Permutations − The syntax for the permutations function is – scipy.special.perm(N, k). It computes the number of permutations of N things taken k at a time, i.e., the k-permutations of N. These are also known as “partial permutations”.

Let us consider the following example.

from scipy.special import perm
res = perm(10, 3, exact = True)
print(res)

The above program will generate the following output.

720

Gamma Function

The gamma function is often referred to as the generalized factorial since z*gamma(z) = gamma(z+1) and gamma(n+1) = n!, for a natural number ‘n’.

The syntax for the gamma function is – scipy.special.gamma(x). Let us consider the following example.

from scipy.special import gamma
res = gamma([0, 0.5, 1, 5])
print(res)

The above program will generate the following output.

[inf  1.77245385  1.  24.]
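The two identities stated for the gamma function can be checked numerically. This is a minimal sketch; the range of n and the value z = 2.5 are arbitrary illustrative choices.

```python
import math
import numpy as np
from scipy.special import gamma

# gamma(n + 1) should equal n! for natural numbers n.
n = np.arange(6)
factorials = np.array([math.factorial(int(i)) for i in n], dtype=float)
matches_factorial = np.allclose(gamma(n + 1), factorials)

# The recurrence z * gamma(z) = gamma(z + 1) also holds for non-integer z.
z = 2.5
recurrence_holds = np.isclose(z * gamma(z), gamma(z + 1))

print(matches_factorial, recurrence_holds)
```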