Concurrency in Python - Concise Tutorial

Concurrency vs Parallelism

Both concurrency and parallelism are used in relation to multithreaded programs, but there is a lot of confusion about the similarity and difference between them. The big question in this regard is: is concurrency the same as parallelism or not? Although the two terms appear quite similar, the answer to the above question is NO; concurrency and parallelism are not the same. Now, if they are not the same, then what is the basic difference between them?

In simple terms, concurrency deals with managing access to shared state from different threads, while parallelism deals with utilizing multiple CPUs or their cores to improve the performance of the hardware.

Concurrency in Detail

Concurrency is when two tasks overlap in execution. It could be a situation where an application is progressing on more than one task at the same time. We can understand it diagrammatically: multiple tasks are making progress at the same time, as shown below.

[Diagram: concurrency]

Levels of Concurrency

In this section, we will discuss the three important levels of concurrency in terms of programming −

Low-Level Concurrency

At this level of concurrency, atomic operations are used explicitly. We cannot use this kind of concurrency for application building, as it is very error-prone and difficult to debug. Even Python does not support this kind of concurrency.

Mid-Level Concurrency

At this level of concurrency, there is no use of explicit atomic operations; instead, explicit locks are used. Python and other programming languages support this kind of concurrency, and it is what most application programmers use.
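
As a minimal sketch of lock-based, mid-level concurrency (the counter variable and increment function are names chosen only for this illustration), an explicit threading.Lock can protect a shared value −

import threading

counter = 0
lock = threading.Lock()

def increment():
   global counter
   for _ in range(100000):
      # the explicit lock serializes access to the shared counter
      with lock:
         counter += 1

threads = [threading.Thread(target = increment) for _ in range(2)]
for t in threads:
   t.start()
for t in threads:
   t.join()
print("Counter value : {}".format(counter))

With the lock held around each update, the final value is always 200000; without it, interleaved updates could be lost.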

High-Level Concurrency

At this level of concurrency, neither explicit atomic operations nor explicit locks are used. Python provides the concurrent.futures module to support this kind of concurrency.
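
A minimal sketch of this high-level style (the square function and its inputs are illustrative assumptions) uses concurrent.futures to run tasks without any explicit locks −

import concurrent.futures

def square(n):
   return n * n

# the executor owns the worker threads; results come back in input order
with concurrent.futures.ThreadPoolExecutor(max_workers = 4) as executor:
   results = list(executor.map(square, [1, 2, 3, 4, 5]))
print(results)

This prints [1, 4, 9, 16, 25].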

Properties of Concurrent Systems

For a program or concurrent system to be correct, it must satisfy some properties. Properties related to the termination of the system are as follows −

Correctness property

The correctness property means that the program or the system must provide the desired correct answer. To keep it simple, we can say that the system must map the starting program state to the final state correctly.

Safety property

The safety property means that the program or the system must remain in a “good” or “safe” state and never do anything “bad”.

Liveness property

This property means that a program or system must “make progress” and eventually reach some desirable state.

Actors of Concurrent Systems

This is a common property of concurrent systems: there can be multiple processes and threads that run at the same time to make progress on their own tasks. These processes and threads are called the actors of the concurrent system.

Resources of Concurrent Systems

The actors must utilize resources such as memory, disk, printer, etc. in order to perform their tasks.

Certain set of rules

Every concurrent system must possess a set of rules that define the kind of tasks to be performed by the actors and the timing for each. These tasks could be acquiring locks, sharing memory, modifying state, etc.

Barriers of Concurrent Systems

Sharing of data

An important issue while implementing concurrent systems is the sharing of data among multiple threads or processes. The programmer must ensure that locks protect the shared data so that all accesses to it are serialized and only one thread or process can access the shared data at a time. When multiple threads or processes all try to access the same shared data, all but one of them will be blocked and remain idle. In other words, while the lock is in force we can effectively use only one process or thread at a time. There are some simple solutions to remove the above-mentioned barriers −

Data Sharing Restriction

The simplest solution is not to share any mutable data. In this case, we do not need to use explicit locking, and the concurrency barrier due to shared mutable data is resolved.

Data Structure Assistance

Many times, concurrent processes need to access the same data at the same time. Another solution, other than using explicit locks, is to use a data structure that supports concurrent access. For example, we can use the queue module, which provides thread-safe queues. We can also use the multiprocessing.JoinableQueue class for multiprocessing-based concurrency.
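
The following sketch (the producer and consumer functions, the item count and the sentinel value are assumptions made for this example) shows two threads exchanging data through a thread-safe queue.Queue, with no explicit locks −

import queue
import threading

q = queue.Queue()

def producer():
   for i in range(5):
      q.put(i)          # put() is thread-safe; no explicit lock is needed
   q.put(None)          # sentinel value tells the consumer to stop

def consumer():
   while True:
      item = q.get()    # get() blocks until an item is available
      if item is None:
         break
      print("Consumed : {}".format(item))

t1 = threading.Thread(target = producer)
t2 = threading.Thread(target = consumer)
t1.start()
t2.start()
t1.join()
t2.join()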

Immutable Data Transfer

Sometimes the data structure that we are using, say a concurrent queue, is not suitable; in that case, we can pass immutable data without locking it.

Mutable Data Transfer

In continuation of the above solution, suppose it is required to pass mutable data rather than immutable data; then we can pass mutable data that is read-only.

Sharing of I/O Resources

Another important issue in implementing concurrent systems is the use of I/O resources by threads or processes. The problem arises when one thread or process uses the I/O for a long time while the others sit idle. We can see this kind of barrier while working with an I/O-heavy application. It can be understood with the help of an example: requesting pages from a web browser, which is an I/O-heavy task. Here, if the rate at which the data is requested is slower than the rate at which it is consumed, then we have an I/O barrier in our concurrent system.

The following Python script requests a web page and measures the time our network took to get the requested page −

import urllib.request
import time

# record the start time, fetch the page, then record the end time
ts = time.time()
req = urllib.request.urlopen('https://www.tutorialspoint.com')
pageHtml = req.read()
te = time.time()
print("Page Fetching Time : {} Seconds".format(te - ts))

After executing the above script, we can get the page fetching time as shown below.

Output

Page Fetching Time: 1.0991398811340332 Seconds

We can see that the time to fetch the page is more than one second. Now, if we wanted to fetch thousands of different web pages, you can imagine how much time our network would take.
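
As a hedged sketch of how such I/O waits can be overlapped (the list of URLs is an illustrative assumption), the same fetch can be submitted to a thread pool so that several requests wait on the network at the same time −

import time
import urllib.request
import concurrent.futures

urls = ['https://www.tutorialspoint.com'] * 5   # illustrative list of pages

def fetch(url):
   # each worker thread blocks on its own request, so the waits overlap
   return urllib.request.urlopen(url).read()

ts = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers = 5) as executor:
   pages = list(executor.map(fetch, urls))
te = time.time()
print("Fetched {} pages in {} Seconds".format(len(pages), te - ts))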

What is Parallelism?

Parallelism may be defined as the art of splitting a task into subtasks that can be processed simultaneously. It differs from concurrency, discussed above, in which two or more tasks are making progress at the same time. We can understand it diagrammatically: a task is broken into a number of subtasks that can be processed in parallel, as shown below.

[Diagram: parallelism]

To get a better idea of the distinction between concurrency and parallelism, consider the following points −

Concurrent but not parallel

An application can be concurrent but not parallel, which means that it progresses on more than one task at the same time, but the tasks are not broken down into subtasks.

Parallel but not concurrent

An application can be parallel but not concurrent, which means that it works on only one task at a time, and that task is broken down into subtasks that can be processed in parallel.

Neither parallel nor concurrent

An application can be neither parallel nor concurrent. This means that it works on only one task at a time and the task is never broken into subtasks.

Both parallel and concurrent

An application can be both parallel and concurrent, which means that it works on multiple tasks at a time and each task is broken into subtasks that are executed in parallel.

Necessity of Parallelism

We can achieve parallelism by distributing the subtasks among the different cores of a single CPU or among multiple computers connected within a network.
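
A minimal sketch of distributing subtasks among CPU cores (the cube function and its inputs are illustrative assumptions) uses the multiprocessing module, which runs each worker in a separate process −

import multiprocessing

def cube(n):
   # each worker process can run on its own core, independently of the others
   return n * n * n

if __name__ == '__main__':
   # the guard is required so that worker processes can be spawned safely
   with multiprocessing.Pool(processes = 4) as pool:
      results = pool.map(cube, [1, 2, 3, 4, 5])
   print(results)

This prints [1, 8, 27, 64, 125].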

Consider the following important points to understand why it is necessary to achieve parallelism −

Efficient code execution

With the help of parallelism, we can run our code efficiently. It saves time because different parts of the code run in parallel.

Faster than sequential computing

Sequential computing is constrained by physical and practical factors that make it impossible to obtain faster computing results beyond a point. Parallel computing addresses this issue and gives us faster computing results than sequential computing.

Less execution time

Parallel processing reduces the execution time of program code.

If we talk about a real-life example of parallelism, the graphics card of our computer is an example that highlights the true power of parallel processing, because it has hundreds of individual processing cores that work independently and can execute at the same time. For this reason, we are able to run high-end applications and games as well.

Understanding of the processors for implementation

We know about concurrency, parallelism and the difference between them, but what about the system on which they are to be implemented? It is very necessary to understand the system on which we are going to implement them, because it helps us take informed decisions while designing the software. We have the following two kinds of processors −

Single-core processors

Single-core processors are capable of executing only one thread at any given time. These processors use context switching to store all the necessary information for a thread at a specific time and then restore that information later. The context-switching mechanism helps us make progress on a number of threads within a given second, and it looks as if the system is working on multiple things.

Single-core processors come with many advantages. These processors require less power, and there is no need for a complex communication protocol between multiple cores. On the other hand, the speed of single-core processors is limited, and they are not suitable for larger applications.

Multi-core processors

Multi-core processors have multiple independent processing units, also called cores.
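
As a quick check that needs nothing beyond the standard library, we can ask Python how many cores the current machine exposes before deciding on a design −

import os
import multiprocessing

# both calls report the number of CPUs visible to the interpreter
print("Cores reported by os.cpu_count() :", os.cpu_count())
print("Cores reported by multiprocessing.cpu_count() :", multiprocessing.cpu_count())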

Such processors do not need the context-switching mechanism, as each core contains everything it needs to execute a sequence of stored instructions.

Fetch-Decode-Execute Cycle

The cores of multi-core processors follow a cycle for executing instructions. This cycle is called the Fetch-Decode-Execute cycle. It involves the following steps −

Fetch

This is the first step of the cycle, which involves fetching instructions from the program memory.

Decode

The recently fetched instructions are converted into a series of signals that will trigger other parts of the CPU.

Execute

This is the final step, in which the fetched and decoded instructions are executed. The result of execution is stored in a CPU register.

One advantage here is that execution on multi-core processors is faster than on single-core processors, and they are suitable for larger applications. On the other hand, the complex communication protocol between multiple cores is an issue, and multiple cores require more power than single-core processors.