Postgresql 中文操作指南

34.5. Pipeline Mode #

libpq 管道模式允许应用程序发送查询,而无需读取之前发送查询的结果。利用管道模式,客户端等待服务器的时间更少,因为可以在单个网络事务中发送/接收多个查询/结果。

libpq pipeline mode allows applications to send a query without having to read the result of the previously sent query. Taking advantage of the pipeline mode, a client will wait less for the server, since multiple queries/results can be sent/received in a single network transaction.

虽然管道模式提供了显着的性能提升,但使用管道模式编写客户端更为复杂,因为它涉及管理待处理查询队列并查找队列中哪个结果对应哪个查询。

While pipeline mode provides a significant performance boost, writing clients using the pipeline mode is more complex because it involves managing a queue of pending queries and finding which result corresponds to which query in the queue.

管道模式通常也在客户端和服务器上消耗更多内存,尽管仔细积极地管理发送/接收队列可以缓解这种情况。这适用于连接是否处于阻塞或非阻塞模式。

Pipeline mode also generally consumes more memory on both the client and server, though careful and aggressive management of the send/receive queue can mitigate this. This applies whether or not the connection is in blocking or non-blocking mode.

虽然 libpq 的管道 API 是在 PostgreSQL 14 中引入的,但它是一个客户端特性,不需要特殊服务器支持,并且可以在支持 v3 扩展查询协议的任何服务器上工作。有关更多信息,请参阅 Section 55.2.4

While libpq’s pipeline API was introduced in PostgreSQL 14, it is a client-side feature which doesn’t require special server support and works on any server that supports the v3 extended query protocol. For more information see Section 55.2.4.

34.5.1. Using Pipeline Mode #

若要发出管道,应用程序必须将连接切换到管道模式,此操作可通过 PQenterPipelineMode 完成。 PQpipelineStatus 可用于测试管道模式是否处于活动状态。在管道模式下,仅允许利用扩展查询协议的 asynchronous operations ,不允许包含多个 SQL 命令的命令字符串,也不允许 COPY 。使用同步命令执行函数(如 PQfnPQexecPQexecParamsPQpreparePQexecPreparedPQdescribePreparedPQdescribePortal )是一种错误状态。 PQsendQuery 也不允许,因为它使用简单查询协议。在所有已派发的命令的处理其结果后,且最终管道结果已使用,应用程序可以使用 PQexitPipelineMode 返回到非管道模式。

To issue pipelines, the application must switch the connection into pipeline mode, which is done with PQenterPipelineMode. PQpipelineStatus can be used to test whether pipeline mode is active. In pipeline mode, only asynchronous operations that utilize the extended query protocol are permitted, command strings containing multiple SQL commands are disallowed, and so is COPY. Using synchronous command execution functions such as PQfn, PQexec, PQexecParams, PQprepare, PQexecPrepared, PQdescribePrepared, PQdescribePortal, is an error condition. PQsendQuery is also disallowed, because it uses the simple query protocol. Once all dispatched commands have had their results processed, and the end pipeline result has been consumed, the application may return to non-pipelined mode with PQexitPipelineMode.

Note

最好在 non-blocking mode 中与 libpq 结合使用管道模式。如果在阻塞模式中使用,则会出现客户端/服务器死锁。 [15 ]

It is best to use pipeline mode with libpq in non-blocking mode. If used in blocking mode it is possible for a client/server deadlock to occur. [15]

34.5.1.1. Issuing Queries #

在进入管道模式后,应用程序使用 PQsendQueryParams 或其已准备查询兄弟 PQsendQueryPrepared 派发请求。这些请求在客户端排队,直至刷新到服务器; PQpipelineSync 用于在管道中建立同步点或调用 PQflush 时,就会发生这种情况。函数 PQsendPreparePQsendDescribePreparedPQsendDescribePortal 也可以在管道模式中使用。结果处理在下文中描述。

After entering pipeline mode, the application dispatches requests using PQsendQueryParams or its prepared-query sibling PQsendQueryPrepared. These requests are queued on the client-side until flushed to the server; this occurs when PQpipelineSync is used to establish a synchronization point in the pipeline, or when PQflush is called. The functions PQsendPrepare, PQsendDescribePrepared, and PQsendDescribePortal also work in pipeline mode. Result processing is described below.

服务器按客户端发送的顺序执行语句,并返回结果。服务器将立即开始执行管道中的命令,而不等待管道结束。请注意,结果缓存在服务器端;当使用 PQpipelineSync 建立同步点,或者调用 PQsendFlushRequest 时,服务器将刷新该缓冲区。如果任何语句遇到错误,服务器将中止当前事务,并在下一个同步点之前不会执行队列中的任何后续命令;针对每个此类命令生成 PGRES_PIPELINE_ABORTED 结果。(即使管道中的命令回滚事务,这种情况仍然成立。)查询处理在同步点之后恢复。

The server executes statements, and returns results, in the order the client sends them. The server will begin executing the commands in the pipeline immediately, not waiting for the end of the pipeline. Note that results are buffered on the server side; the server flushes that buffer when a synchronization point is established with PQpipelineSync, or when PQsendFlushRequest is called. If any statement encounters an error, the server aborts the current transaction and does not execute any subsequent command in the queue until the next synchronization point; a PGRES_PIPELINE_ABORTED result is produced for each such command. (This remains true even if the commands in the pipeline would rollback the transaction.) Query processing resumes after the synchronization point.

对于一个操作依赖于前一个操作的结果,这是正常的;例如,一个查询可能定义一个表,管道中下一个查询使用该表。类似地,应用程序可以在管道中使用后面的语句创建命名的已准备语句并执行它。

It’s fine for one operation to depend on the results of a prior one; for example, one query may define a table that the next query in the same pipeline uses. Similarly, an application may create a named prepared statement and execute it with later statements in the same pipeline.

34.5.1.2. Processing Results #

要处理管道中一个查询的结果,应用程序重复调用 PQgetResult 并处理每个结果,直到 PQgetResult 返回 null。然后可以使用 PQgetResult 再次检索管道中下一个查询的结果,并重复该循环。应用程序按正常方式处理各个语句结果。当管道中所有查询的结果都已返回时,PQgetResult 将返回包含状态值 PGRES_PIPELINE_SYNC 的结果

To process the result of one query in a pipeline, the application calls PQgetResult repeatedly and handles each result until PQgetResult returns null. The result from the next query in the pipeline may then be retrieved using PQgetResult again and the cycle repeated. The application handles individual statement results as normal. When the results of all the queries in the pipeline have been returned, PQgetResult returns a result containing the status value PGRES_PIPELINE_SYNC

客户端可以选择延迟结果处理,直到完整管道已发送,或者用发送 pipeline 中的进一步查询与之交叉;请参见 Section 34.5.1.4

The client may choose to defer result processing until the complete pipeline has been sent, or interleave that with sending further queries in the pipeline; see Section 34.5.1.4.

要进入单行模式,请在使用 PQgetResult_检索结果之前调用 _PQsetSingleRowMode。此模式选择仅对当前正在处理的查询有效。要获取有关 _PQsetSingleRowMode_用法的更多信息,请参阅 Section 34.6

To enter single-row mode, call PQsetSingleRowMode before retrieving results with PQgetResult. This mode selection is effective only for the query currently being processed. For more information on the use of PQsetSingleRowMode, refer to Section 34.6.

PQgetResult_的行为与正常的异步处理相同,只是它可能包含新 _PGresult_类型 _PGRES_PIPELINE_SYNCPGRES_PIPELINE_ABORTEDPGRES_PIPELINE_SYNC 对于 pipeline 中相应位置处的每个 PQpipelineSync_仅报告一次。_PGRES_PIPELINE_ABORTED 在第一次错误及其所有后续结果中代替普通的查询结果发出,直到下一个 PGRES_PIPELINE_SYNC;请参见 Section 34.5.1.3

PQgetResult behaves the same as for normal asynchronous processing except that it may contain the new PGresult types PGRES_PIPELINE_SYNC and PGRES_PIPELINE_ABORTED. PGRES_PIPELINE_SYNC is reported exactly once for each PQpipelineSync at the corresponding point in the pipeline. PGRES_PIPELINE_ABORTED is emitted in place of a normal query result for the first error and all subsequent results until the next PGRES_PIPELINE_SYNC; see Section 34.5.1.3.

PQisBusyPQconsumeInput 等在处理管道结果时按正常方式运行。特别是,如果到目前为止已提交所有查询的结果,则在管道中间调用 PQisBusy 时返回 0。

PQisBusy, PQconsumeInput, etc operate as normal when processing pipeline results. In particular, a call to PQisBusy in the middle of a pipeline returns 0 if the results for all the queries issued so far have been consumed.

libpq 不向应用程序提供有关当前正在处理的查询的任何信息(除了 PQgetResult 返回 null 来指示我们开始返回下一个查询的结果)。应用程序必须跟踪它发送查询的顺序,以便将其与对应结果关联起来。应用程序通常将使用状态机或 FIFO 队列来实现此目的。

libpq does not provide any information to the application about the query currently being processed (except that PQgetResult returns null to indicate that we start returning the results of next query). The application must keep track of the order in which it sent queries, to associate them with their corresponding results. Applications will typically use a state machine or a FIFO queue for this.

34.5.1.3. Error Handling #

从客户端视角,在 PQresultStatus 返回 PGRES_FATAL_ERROR 之后,管道就会标记为已中止。对于中止管道中剩余的每个排队操作,PQresultStatus 将报告 PGRES_PIPELINE_ABORTED 结果。PQpipelineSync 的结果以 PGRES_PIPELINE_SYNC 的形式报告,表示中止管道的结束和正常结果处理的恢复。

From the client’s perspective, after PQresultStatus returns PGRES_FATAL_ERROR, the pipeline is flagged as aborted. PQresultStatus will report a PGRES_PIPELINE_ABORTED result for each remaining queued operation in an aborted pipeline. The result for PQpipelineSync is reported as PGRES_PIPELINE_SYNC to signal the end of the aborted pipeline and resumption of normal result processing.

在错误恢复期间,客户端 must 使用 PQgetResult 处理结果。

The client must process results with PQgetResult during error recovery.

如果管道使用隐式事务,那么已经执行的操作将被回滚,而紧跟在失败操作之后排队等待的操作将被完全跳过。如果管道启动并提交单个显式事务(即第一个语句是 BEGIN,最后一个是 COMMIT),则会出现同样的行为,只不过会话在管道结束时仍处于中止的事务状态。如果管道包含 multiple explicit transactions,则在错误发生前提交的所有事务仍然保持提交,当前正在进行的事务被中止,所有后续操作(包括后续事务)都将被完全跳过。如果同步点在一个中止状态的显式事务块中发生,那么除非后续命令使用 ROLLBACK 将事务置于正常模式,否则下一个管道会立即中止。

If the pipeline used an implicit transaction, then operations that have already executed are rolled back and operations that were queued to follow the failed operation are skipped entirely. The same behavior holds if the pipeline starts and commits a single explicit transaction (i.e. the first statement is BEGIN and the last is COMMIT) except that the session remains in an aborted transaction state at the end of the pipeline. If a pipeline contains multiple explicit transactions, all transactions that committed prior to the error remain committed, the currently in-progress transaction is aborted, and all subsequent operations are skipped completely, including subsequent transactions. If a pipeline synchronization point occurs with an explicit transaction block in aborted state, the next pipeline will become aborted immediately unless the next command puts the transaction in normal mode with ROLLBACK.

Note

当该客户端 sends 一个 COMMIT 时,客户端不得假设工作已提交,只有在收到相应结果以确认已完成提交后才有此假设。由于错误会异步到达,因此应用程序需要能够从最近一次 received 提交的更改重新启动,并在出现问题时重新发送此点之后完成的工作。

The client must not assume that work is committed when it sends a COMMIT — only when the corresponding result is received to confirm the commit is complete. Because errors arrive asynchronously, the application needs to be able to restart from the last received committed change and resend work done after that point if something goes wrong.

34.5.1.4. Interleaving Result Processing and Query Dispatch #

为了避免大型管道中的死锁,客户端应围绕非阻塞事件循环进行构建,使用操作系统设施,如 selectpollWaitForMultipleObjectEx 等。

To avoid deadlocks on large pipelines the client should be structured around a non-blocking event loop using operating system facilities such as select, poll, WaitForMultipleObjectEx, etc.

通常而言,客户端应用程序应维护一个待调度任务队列和一个已经被调度但尚未处理其结果的任务队列。当套接字可写时,它应调度更多任务。当套接字可读时,它应读取结果并处理结果,将其与相应结果队列中的下一个条目进行匹配。根据可用内存,应频繁读取套接字中的结果:无需等到管道结束才读取结果。管道应限定在逻辑工作单元中,通常(但不一定)是每个管道一个事务。无需退出管道模式并在管道之间重新进入该模式,或等待一个管道完成再发送下一个管道。

The client application should generally maintain a queue of work remaining to be dispatched and a queue of work that has been dispatched but not yet had its results processed. When the socket is writable it should dispatch more work. When the socket is readable it should read results and process them, matching them up to the next entry in its corresponding results queue. Based on available memory, results from the socket should be read frequently: there’s no need to wait until the pipeline end to read the results. Pipelines should be scoped to logical units of work, usually (but not necessarily) one transaction per pipeline. There’s no need to exit pipeline mode and re-enter it between pipelines, or to wait for one pipeline to finish before sending the next.

关于使用 select() 和一个简单状态机来跟踪已发送和已接收任务的示例,见 PostgreSQL 源代码分发中的 src/test/modules/libpq_pipeline/libpq_pipeline.c

An example using select() and a simple state machine to track sent and received work is in src/test/modules/libpq_pipeline/libpq_pipeline.c in the PostgreSQL source distribution.

34.5.2. Functions Associated with Pipeline Mode #

  • PQpipelineStatus #

    • Returns the current pipeline mode status of the libpq connection.

PGpipelineStatus PQpipelineStatus(const PGconn *conn);
  • PQpipelineStatus can return one of the following values:

    • PQenterPipelineMode #

  • Causes a connection to enter pipeline mode if it is currently idle or already in pipeline mode.

int PQenterPipelineMode(PGconn *conn);
  • Returns 1 for success. Returns 0 and has no effect if the connection is not currently idle, i.e., it has a result ready, or it is waiting for more input from the server, etc. This function does not actually send anything to the server, it just changes the libpq connection state.

    • PQexitPipelineMode #

  • Causes a connection to exit pipeline mode if it is currently in pipeline mode with an empty queue and no pending results.

int PQexitPipelineMode(PGconn *conn);
  • Returns 1 for success. Returns 1 and takes no action if not in pipeline mode. If the current statement isn’t finished processing, or PQgetResult has not been called to collect results from all previously sent query, returns 0 (in which case, use PQerrorMessage to get more information about the failure).

    • PQpipelineSync #

  • Marks a synchronization point in a pipeline by sending a sync message and flushing the send buffer. This serves as the delimiter of an implicit transaction and an error recovery point; see Section 34.5.1.3.

int PQpipelineSync(PGconn *conn);
  • Returns 1 for success. Returns 0 if the connection is not in pipeline mode or sending a sync message failed.

    • PQsendFlushRequest #

  • Sends a request for the server to flush its output buffer.

int PQsendFlushRequest(PGconn *conn);
  • Returns 1 for success. Returns 0 on any failure.

  • The server flushes its output buffer automatically as a result of PQpipelineSync being called, or on any request when not in pipeline mode; this function is useful to cause the server to flush its output buffer in pipeline mode without establishing a synchronization point. Note that the request is not itself flushed to the server automatically; use PQflush if necessary.

    • PQ_PIPELINE_ON

  • The libpq connection is in pipeline mode.

    • PQ_PIPELINE_OFF

  • The libpq connection is not in pipeline mode.

    • PQ_PIPELINE_ABORTED

  • The libpq connection is in pipeline mode and an error occurred while processing the current pipeline. The aborted flag is cleared when PQgetResult returns a result of type PGRES_PIPELINE_SYNC.

34.5.3. When to Use Pipeline Mode #

与异步查询模式非常类似,使用管道模式不会产生有意义的性能开销。它会增加客户端应用程序的复杂性,而且需要格外小心才能防止客户端/服务器死锁,但是管道模式可以大幅提升性能,但会增加内存使用量,因为状态停留的时间更长。

Much like asynchronous query mode, there is no meaningful performance overhead when using pipeline mode. It increases client application complexity, and extra caution is required to prevent client/server deadlocks, but pipeline mode can offer considerable performance improvements, in exchange for increased memory usage from leaving state around longer.

当服务器距离较远(例如网络延迟(“ping时间”)较高),并且当快速连续执行很多小型操作时,管道模式最有用。在每次查询执行很多倍客户端/服务器往返时间时,使用管道化的命令通常收益较小。300毫秒往返时间距离的服务器上,执行一个100语句操作,那么仅在网络延迟上,该服务器将花费30秒而无需进行管道;通过管道化,它可能只需要花费0.3秒来等待服务器发送结果。

Pipeline mode is most useful when the server is distant, i.e., network latency (“ping time”) is high, and also when many small operations are being performed in rapid succession. There is usually less benefit in using pipelined commands when each query takes many multiples of the client/server round-trip time to execute. A 100-statement operation run on a server 300 ms round-trip-time away would take 30 seconds in network latency alone without pipelining; with pipelining it may spend as little as 0.3 s waiting for results from the server.

当应用程序执行大量小的 INSERT、_UPDATE_和 _DELETE_操作(这些操作无法轻松地转换成针对设置的操作或 _COPY_操作)时,请使用管道命令。

Use pipelined commands when your application does lots of small INSERT, UPDATE and DELETE operations that can’t easily be transformed into operations on sets, or into a COPY operation.

当客户端需要来自一个操作的信息来生成下一个操作时,管道模式没有意义。在这种情况下,客户端必须引入一个同步点,并等待完成一个完整的客户端/服务器往返才能获取它需要的结果。但是,通常可以通过调整客户端设计来交换所需的服务器端信息。读写改循环是特别好的候选者;例如:

Pipeline mode is not useful when information from one operation is required by the client to produce the next operation. In such cases, the client would have to introduce a synchronization point and wait for a full client/server round-trip to get the results it needs. However, it’s often possible to adjust the client design to exchange the required information server-side. Read-modify-write cycles are especially good candidates; for example:

BEGIN;
SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
-- result: x=2
-- client adds 1 to x:
UPDATE mytable SET x = 3 WHERE id = 42;
COMMIT;

可以通过以下方式更高效地完成:

could be much more efficiently done with:

UPDATE mytable SET x = x + 1 WHERE id = 42;

当一个管道包含多个事务(请参见 Section 34.5.1.3)时,管道不太有用,并且更复杂。

Pipelining is less useful, and more complex, when a single pipeline contains multiple transactions (see Section 34.5.1.3).

[15 ] 客户端将阻止尝试向服务器发送查询,但服务器会阻止尝试向客户端发送已处理的查询结果。仅在客户端发送足够的查询以在其输出缓冲区和服务器的接收缓冲区中填充数据,但它不会首先处理来自服务器的输入时,才会发生这种情况,但是很难准确预测将会何时发生这种情况。

[15] The client will block trying to send queries to the server, but the server will block trying to send results to the client from queries it has already processed. This only occurs when the client sends enough queries to fill both its output buffer and the server’s receive buffer before it switches to processing input from the server, but it’s hard to predict exactly when that will happen.