Concurrency in Python: A Concise Tutorial

Concurrency in Python - Introduction

In this chapter, we will look at the concept of concurrency in Python and learn about threads and processes.

What is Concurrency?

In simple words, concurrency is the occurrence of two or more events at the same time. Concurrency is a natural phenomenon because many events occur simultaneously at any given time.

In terms of programming, concurrency is when two tasks overlap in execution. With concurrent programming, the performance of our applications and software systems can improve, because requests can be handled concurrently rather than each one waiting for the previous one to complete.
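The idea of overlapping tasks can be sketched with the standard `threading` module. This is only an illustration: the function names, the `response-N` strings, and the 0.05-second `time.sleep` standing in for a slow request are all assumptions made for this example.

```python
import threading
import time


def handle_request(request_id, results, delay=0.05):
    """Simulate a request that spends most of its time waiting (hypothetical)."""
    time.sleep(delay)  # stands in for network or disk I/O
    results[request_id] = f"response-{request_id}"


def handle_sequentially(request_ids):
    """Handle each request only after the previous one has finished."""
    results = {}
    for request_id in request_ids:
        handle_request(request_id, results)
    return results


def handle_concurrently(request_ids):
    """Start one thread per request so that the waiting periods overlap."""
    results = {}
    threads = [
        threading.Thread(target=handle_request, args=(request_id, results))
        for request_id in request_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every request has been handled
    return results


if __name__ == "__main__":
    ids = [1, 2, 3, 4]
    for handler in (handle_sequentially, handle_concurrently):
        start = time.perf_counter()
        handler(ids)
        elapsed = time.perf_counter() - start
        print(f"{handler.__name__}: {elapsed:.3f} s")
```

Both functions return the same results; the concurrent version usually finishes sooner because the simulated waits overlap, although exact timings depend on the machine.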

Historical Review of Concurrency

The following points give a brief historical review of concurrency −

From the concept of railroads

Concurrency is closely related to the concept of railroads. With railroads, there was a need to handle multiple trains on the same railroad system in such a way that every train would reach its destination safely.

Concurrent computing in academia

Interest in concurrency within computer science began with the research paper published by Edsger W. Dijkstra in 1965. In this paper, he identified and solved the problem of mutual exclusion, a key property in concurrency control.

High-level concurrency primitives

In recent times, programmers have been getting better concurrent solutions thanks to the introduction of high-level concurrency primitives.

Improved concurrency with programming languages

Programming languages such as Google's Go (Golang), Rust, and Python have made considerable progress in areas that help us build better concurrent solutions.

What is a thread & multithreading?

A thread is the smallest unit of execution that an operating system can schedule. It is not itself a program but runs within a program. In other words, threads are not independent of one another. Each thread shares the code section, data section, and so on with the other threads of the same process. Threads are also known as lightweight processes.

A thread consists of the following components −

  1. A program counter, which holds the address of the next executable instruction

  2. Stack

  3. Set of registers

  4. A unique id
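Some of these components can be observed from Python. The following is a minimal sketch, assuming the standard `threading` module; the helper name `describe_threads` and the `worker-N` thread names are made up for this illustration. Each thread reports its own identifier and a value computed in a local variable, which lives on that thread's own stack.

```python
import threading


def describe_threads(thread_count=3):
    """Start several threads and record each thread's id, name and local result."""
    barrier = threading.Barrier(thread_count)
    records = []
    records_lock = threading.Lock()

    def worker(index):
        # local_total is a local variable, so every thread has its own copy.
        local_total = sum(range(index + 1))
        # Keep all threads alive at the same moment; identifiers may be
        # reused once a thread has exited.
        barrier.wait()
        with records_lock:
            records.append(
                (threading.get_ident(), threading.current_thread().name, local_total)
            )

    threads = [
        threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
        for i in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(records, key=lambda record: record[1])


if __name__ == "__main__":
    for ident, name, total in describe_threads():
        print(f"{name}: id={ident}, local_total={total}")
```

The program counter and registers are managed by the operating system and the interpreter, so they are not shown here.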

Multithreading, on the other hand, is the ability of a CPU, with support from the operating system, to execute multiple threads concurrently. The main idea of multithreading is to achieve parallelism by dividing a process into multiple threads. The concept of multithreading can be understood with the help of the following example.

Example

Suppose we are running a particular process in which we open MS Word and type content into it. One thread will be assigned to open MS Word, and another thread will be required to type content in it. If we then want to edit the existing content, another thread will be required to do the editing task, and so on.
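The word-processor scenario can be approximated in a few lines of Python. This is a loose sketch, not how MS Word is actually implemented: the shared `document` list, the `typist` and `editor` threads, and all function names are assumptions chosen for illustration.

```python
import threading

document = []                      # shared by every thread in this process
document_lock = threading.Lock()   # protects access to the shared document
typing_done = threading.Event()    # signals that typing has finished


def type_content(words):
    """Typing task: append each word to the shared document."""
    for word in words:
        with document_lock:
            document.append(word)
    typing_done.set()


def edit_content():
    """Editing task: wait for the typing to finish, then capitalize each word."""
    typing_done.wait()
    with document_lock:
        document[:] = [word.capitalize() for word in document]


def run_editor_session(words):
    """Run the typing and editing tasks in two threads of the same process."""
    document.clear()
    typing_done.clear()
    typist = threading.Thread(target=type_content, args=(words,), name="typist")
    editor = threading.Thread(target=edit_content, name="editor")
    editor.start()
    typist.start()
    typist.join()
    editor.join()
    return list(document)


if __name__ == "__main__":
    print(run_editor_session(["hello", "concurrent", "world"]))
```

Both threads work on the same `document` object, which reflects the point that threads of one process share its data section.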

What is a process & multiprocessing?

A *process* is defined as an entity that represents the basic unit of work to be implemented in the system. To put it in simple terms, we write our computer programs in a text file, and when we execute a program, it becomes a process that performs all the tasks mentioned in the program. During its life cycle, a process passes through different stages: Start, Ready, Running, Waiting, and Terminating.

The following diagram shows the different stages of a process −

[Figure: stages of a process]

A process can have a single thread, called the primary thread, or multiple threads, each with its own set of registers, program counter, and stack. The following diagram shows the difference −

[Figure: single-threaded process versus multithreaded process]

Multiprocessing, on the other hand, is the use of two or more CPU units within a single computer system. Our primary goal is to get the full potential from our hardware. To achieve this, we need to make use of all the CPU cores available in our computer system, and multiprocessing is a direct way to do so.
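As a rough sketch of using the available cores from Python, the standard `multiprocessing` module can spread work over a pool of worker processes. The `square` task and the helper names below are hypothetical; sizing the pool with `os.cpu_count()` is one reasonable choice, not the only one.

```python
import multiprocessing
import os


def square(n):
    """A small CPU-bound task used only for illustration."""
    return n * n


def available_workers():
    """Number of worker processes to start; falls back to 1 if unknown."""
    return os.cpu_count() or 1


def parallel_squares(numbers):
    """Distribute the work across one worker process per CPU core."""
    with multiprocessing.Pool(processes=available_workers()) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard matters: on some platforms worker processes re-import this file.
    print(f"CPU cores reported: {available_workers()}")
    print(parallel_squares(range(10)))
```

Each worker is a separate process with its own interpreter and memory, so arguments and results are copied between processes rather than shared.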

[Figure: multiprocessing]

Python is one of the most popular programming languages. The following are some reasons that make it suitable for concurrent applications −

Syntactic sugar

Syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an alternative style based on preference. Python comes with magic methods (special methods whose names begin and end with double underscores), which can be defined to act on objects. These magic methods serve as syntactic sugar because they are bound to easier-to-understand keywords and operators.
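One place where magic methods help in concurrent code is the `with` statement, which calls `__enter__` and `__exit__` behind the scenes. The sketch below wraps a standard `threading.Lock` in a made-up class, `CountedLock`, purely to show the mechanism; the class and the helper functions are not part of any library.

```python
import threading


class CountedLock:
    """Hypothetical lock wrapper that counts how many times it was acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        # Called when a "with" block starts.
        self._lock.acquire()
        self.acquisitions += 1  # safe: we hold the lock at this point
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the "with" block ends, even if an exception was raised.
        self._lock.release()
        return False  # do not suppress exceptions


def increment_many(counter, lock, times):
    for _ in range(times):
        with lock:  # sugar for lock.__enter__() ... lock.__exit__(...)
            counter["value"] += 1


def run_demo(thread_count=4, times=1000):
    counter = {"value": 0}
    lock = CountedLock()
    threads = [
        threading.Thread(target=increment_many, args=(counter, lock, times))
        for _ in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"], lock.acquisitions


if __name__ == "__main__":
    print(run_demo())
```

The built-in `threading.Lock` already supports `with` directly; the wrapper only makes the two magic methods visible.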

Large Community

Python has seen massive adoption among data scientists and mathematicians working in the fields of AI, machine learning, deep learning, and quantitative analysis.

Useful APIs for concurrent programming

Python 2 and 3 have a large number of APIs dedicated to parallel/concurrent programming. The most popular of them are threading, concurrent.futures, multiprocessing, asyncio (Python 3 only), gevent, and greenlet. Of these, gevent and greenlet are third-party packages rather than part of the standard library.
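To give a feel for two of these APIs, here is a small sketch using `concurrent.futures` and `asyncio` from the Python 3 standard library. The task (measuring word lengths) and the function names are invented for illustration, and the short `asyncio.sleep` merely stands in for real I/O.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def word_length(word):
    """A trivial task to hand to the thread pool."""
    return len(word)


def lengths_with_thread_pool(words, max_workers=4):
    """Run word_length over the inputs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs
        return list(executor.map(word_length, words))


async def delayed_length(word, delay=0.01):
    """A coroutine that pauses briefly (simulated I/O) before answering."""
    await asyncio.sleep(delay)
    return len(word)


async def lengths_with_asyncio(words):
    """Run all coroutines concurrently on a single thread."""
    return await asyncio.gather(*(delayed_length(word) for word in words))


if __name__ == "__main__":
    sample = ["threading", "asyncio", "gevent"]
    print(lengths_with_thread_pool(sample))
    print(asyncio.run(lengths_with_asyncio(sample)))
```

`ThreadPoolExecutor` uses several threads, whereas `asyncio` interleaves coroutines on one thread; which one fits better depends on the workload.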

Limitations of Python in implementing concurrent applications

Python comes with a limitation for concurrent applications, known as the GIL (Global Interpreter Lock). In CPython, the GIL prevents threads from executing Python bytecode on multiple CPU cores at the same time, so CPU-bound threads do not run in parallel even though they are real operating system threads. We can understand the concept of the GIL as follows −

GIL (Global Interpreter Lock)

The GIL is one of the most controversial topics in the Python world. In CPython, the GIL is a mutex (mutual exclusion lock) that makes the interpreter's internals thread safe. In other words, the GIL prevents multiple threads from executing Python code in parallel. The lock can be held by only one thread at a time, so a thread must acquire the lock before it can run. The diagram below illustrates how the GIL works.
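The effect of the GIL can be explored with a small timing experiment. This is a sketch under assumptions: the `count_down` workload and the helper names are made up, and the measured times vary with the machine and the Python build. On a CPython build with the GIL, the threaded run of this CPU-bound loop is typically no faster than the sequential one.

```python
import sys
import threading
import time


def count_down(n):
    """A pure-Python, CPU-bound loop."""
    while n > 0:
        n -= 1


def time_sequential(n, runs=2):
    """Run the loop several times, one after another."""
    start = time.perf_counter()
    for _ in range(runs):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, runs=2):
    """Run the same loops in separate threads."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(runs)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print(f"sequential: {time_sequential(n):.3f} s")
    print(f"threaded:   {time_threaded(n):.3f} s")
    # How often the interpreter asks the running thread to give up the GIL.
    print(f"switch interval: {sys.getswitchinterval()} s")
```

For I/O-bound work the picture is different, because a thread releases the GIL while it waits.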

[Figure: how the GIL works]

However, some libraries and implementations can sidestep the GIL. NumPy releases the GIL during many of its numerical operations, and alternative implementations such as Jython and IronPython do not have a GIL.