Postgresql 中文操作指南

43.11. PL/pgSQL under the Hood #

本节讨论一些实现细节,这些细节通常对 PL/pgSQL 用户了解非常重要。

This section discusses some implementation details that are frequently important for PL/pgSQL users to know.

43.11.1. Variable Substitution #

PL/pgSQL 函数中的 SQL 语句和表达式可以引用函数中的变量和参数。在后台,PL/pgSQL 用查询参数替换此类引用项。查询参数仅会被替换在语法允许的位置。如极端情况,请考虑以下示例的糟糕编程风格:

SQL statements and expressions within a PL/pgSQL function can refer to variables and parameters of the function. Behind the scenes, PL/pgSQL substitutes query parameters for such references. Query parameters will only be substituted in places where they are syntactically permissible. As an extreme case, consider this example of poor programming style:

INSERT INTO foo (foo) VALUES (foo(foo));

foo 的第一次出现语法上必须是一个表名,所以它不会被替换,即使函数有一个名为 foo 的变量。第二次出现必须是该表的列名,所以它也不会被替换。同样,第三次出现必须是一个函数名,所以也不会被替换。只有最后一个出现可能是引用 PL/pgSQL 函数的变量的候选项。

The first occurrence of foo must syntactically be a table name, so it will not be substituted, even if the function has a variable named foo. The second occurrence must be the name of a column of that table, so it will not be substituted either. Likewise the third occurrence must be a function name, so it also will not be substituted for. Only the last occurrence is a candidate to be a reference to a variable of the PL/pgSQL function.

理解此问题还有一种方法,即变量替换只能将数据值插入到 SQL 命令中;它不能动态更改命令引用的数据库对象。(如果您想执行此操作,则必须动态构建命令字符串,如 Section 43.5.4中所述。)

Another way to understand this is that variable substitution can only insert data values into an SQL command; it cannot dynamically change which database objects are referenced by the command. (If you want to do that, you must build a command string dynamically, as explained in Section 43.5.4.)

由于变量的名称在语法上与表列的名称没有区别,因此在也引用表的语句中可能会出现歧义:给定的名称是表示表列还是变量?我们把之前的示例改成

Since the names of variables are syntactically no different from the names of table columns, there can be ambiguity in statements that also refer to tables: is a given name meant to refer to a table column, or a variable? Let’s change the previous example to

INSERT INTO dest (col) SELECT foo + bar FROM src;

此处,destsrc 必须是表名,col 必须是 dest 的列,但 foobar 可能合理地是函数的变量或 src 的列。

Here, dest and src must be table names, and col must be a column of dest, but foo and bar might reasonably be either variables of the function or columns of src.

默认情况下,如果 SQL 语句中的名称可以引用变量或表列,PL/pgSQL 将报告一个错误。你可以通过重命名变量或列、限定模棱两可的引用或告诉 PL/pgSQL 首选哪个解释来解决此类问题。

By default, PL/pgSQL will report an error if a name in an SQL statement could refer to either a variable or a table column. You can fix such a problem by renaming the variable or column, or by qualifying the ambiguous reference, or by telling PL/pgSQL which interpretation to prefer.

最简单的解决方案是重命名变量或列。一种常见的编码规则是,对 PL/pgSQL 变量使用与对列名不同的命名约定。例如,如果你始终将函数变量命名为 v_something while none of your column names start with v_ ,那么不会发生冲突。

The simplest solution is to rename the variable or column. A common coding rule is to use a different naming convention for PL/pgSQL variables than you use for column names. For example, if you consistently name function variables vsomething while none of your column names start with _v_, no conflicts will occur.

或者,您可以限定不明确的引用以使其明确。在上例中,_src.foo_将对表列进行明确引用。要创建一个对变量的明确引用,请在带标签的块中声明它并使用块的标签(参见 Section 43.2)。例如:

Alternatively you can qualify ambiguous references to make them clear. In the above example, src.foo would be an unambiguous reference to the table column. To create an unambiguous reference to a variable, declare it in a labeled block and use the block’s label (see Section 43.2). For example,

<<block>>
DECLARE
    foo int;
BEGIN
    foo := ...;
    INSERT INTO dest (col) SELECT block.foo + bar FROM src;

此处 block.foo 表示变量,即使 src 中存在列 foo。函数参数以及 FOUND 等特殊变量可以通过函数名限定,因为它们在隐式标记为函数名的外部块中声明。

Here block.foo means the variable even if there is a column foo in src. Function parameters, as well as special variables such as FOUND, can be qualified by the function’s name, because they are implicitly declared in an outer block labeled with the function’s name.

有时,在大量的 PL/pgSQL 代码中修复所有模棱两可的引用并不实际。在这种情况下,你可以指定 PL/pgSQL 应该将模棱两可的引用解析为变量(这与 PostgreSQL 9.0 之前的 PL/pgSQL 行为兼容),或者解析为表列(这与 Oracle 等某些其他系统兼容)。

Sometimes it is impractical to fix all the ambiguous references in a large body of PL/pgSQL code. In such cases you can specify that PL/pgSQL should resolve ambiguous references as the variable (which is compatible with PL/pgSQL’s behavior before PostgreSQL 9.0), or as the table column (which is compatible with some other systems such as Oracle).

若要在系统范围内更改此行为,请将配置参数_plpgsql.variable_conflict_设置成_error_、_use_variable_或_use_column_中的一个(其中_error_为出厂默认值)。此参数会影响 PL/pgSQL 函数中后续语句的编译,但不影响当前会话中已编译的语句。由于更改此设置可能会导致 PL/pgSQL 函数的行为发生意外更改,因此只有超级用户可以更改此设置。

To change this behavior on a system-wide basis, set the configuration parameter plpgsql.variable_conflict to one of error, use_variable, or use_column (where error is the factory default). This parameter affects subsequent compilations of statements in PL/pgSQL functions, but not statements already compiled in the current session. Because changing this setting can cause unexpected changes in the behavior of PL/pgSQL functions, it can only be changed by a superuser.

你还可以通过在函数文本的开头插入以下特殊命令之一,根据函数按函数设置行为:

You can also set the behavior on a function-by-function basis, by inserting one of these special commands at the start of the function text:

#variable_conflict error
#variable_conflict use_variable
#variable_conflict use_column

这些命令仅影响它们所在的函数,并覆盖_plpgsql.variable_conflict_的设置。以下是一个示例:

These commands affect only the function they are written in, and override the setting of plpgsql.variable_conflict. An example is

CREATE FUNCTION stamp_user(id int, comment text) RETURNS void AS $$
    #variable_conflict use_variable
    DECLARE
        curtime timestamp := now();
    BEGIN
        UPDATE users SET last_modified = curtime, comment = comment
          WHERE users.id = id;
    END;
$$ LANGUAGE plpgsql;

在_UPDATE_命令中,curtime、_comment_和_id_将引用函数的变量和参数,而不管_users_是否具有这些名称的列。请注意,我们必须在_WHERE_子句中限定对_users.id_的引用,使其引用表格列。但是我们不必将对_comment_的引用限定为_UPDATE_列表中的目标,因为从语法上讲,它必须是_users_的列。我们可以使用这种方式编写相同的函数,而不依赖于_variable_conflict_设置:

In the UPDATE command, curtime, comment, and id will refer to the function’s variable and parameters whether or not users has columns of those names. Notice that we had to qualify the reference to users.id in the WHERE clause to make it refer to the table column. But we did not have to qualify the reference to comment as a target in the UPDATE list, because syntactically that must be a column of users. We could write the same function without depending on the variable_conflict setting in this way:

CREATE FUNCTION stamp_user(id int, comment text) RETURNS void AS $$
    <<fn>>
    DECLARE
        curtime timestamp := now();
    BEGIN
        UPDATE users SET last_modified = fn.curtime, comment = stamp_user.comment
          WHERE users.id = stamp_user.id;
    END;
$$ LANGUAGE plpgsql;

变量替换不会在提供给 EXECUTE_或其变量之一的命令字符串中发生。如果您需要在这样的命令中插入变化的值,请在构造字符串值时这样做或使用 _USING,如 Section 43.5.4中所示。

Variable substitution does not happen in a command string given to EXECUTE or one of its variants. If you need to insert a varying value into such a command, do so as part of constructing the string value, or use USING, as illustrated in Section 43.5.4.

变量替换目前仅适用于_SELECT_、INSERTUPDATEDELETE_以及包含其中一个命令(例如_EXPLAIN_和_CREATE TABLE …​ AS SELECT)的命令,因为主 SQL 引擎只允许在这些命令中使用查询参数。要在其他语句类型(通常称为实用程序语句)中使用非常量名称或值,你必须将实用程序语句构造为一个字符串,然后_EXECUTE_它。

Variable substitution currently works only in SELECT, INSERT, UPDATE, DELETE, and commands containing one of these (such as EXPLAIN and CREATE TABLE …​ AS SELECT), because the main SQL engine allows query parameters only in these commands. To use a non-constant name or value in other statement types (generically called utility statements), you must construct the utility statement as a string and EXECUTE it.

43.11.2. Plan Caching #

PL/pgSQL 解释器在首次调用函数(在每个会话中)时解析函数的源文本并生成一个内部二进制指令树。指令树完全翻译了 PL/pgSQL 语句结构,但不会立即翻译函数中使用的单个 SQL 表达式和 SQL 命令。

The PL/pgSQL interpreter parses the function’s source text and produces an internal binary instruction tree the first time the function is called (within each session). The instruction tree fully translates the PL/pgSQL statement structure, but individual SQL expressions and SQL commands used in the function are not translated immediately.

每次先在函数中执行每个表达式和 SQL 命令时,PL/pgSQL 解释器都会使用 SPI 管理器的 SPI_prepare 函数,解析并分析该命令以创建已准备好的语句。随后访问该表达式或命令时,将重新使用已准备好的语句。因此,具有很少访问条件代码路径的函数将永远不会产生分析从未在当前会话中执行的那些命令的开销。一个缺点是,在执行过程中,直到函数的该部分被访问到,才能检测到特定表达式或命令中的错误。(在初始解析过程中将检测到微不足道的语法错误,但任何更深层次的错误都将在执行前无法检测到。)

As each expression and SQL command is first executed in the function, the PL/pgSQL interpreter parses and analyzes the command to create a prepared statement, using the SPI manager’s SPI_prepare function. Subsequent visits to that expression or command reuse the prepared statement. Thus, a function with conditional code paths that are seldom visited will never incur the overhead of analyzing those commands that are never executed within the current session. A disadvantage is that errors in a specific expression or command cannot be detected until that part of the function is reached in execution. (Trivial syntax errors will be detected during the initial parsing pass, but anything deeper will not be detected until execution.)

此外,PL/pgSQL(或者更确切地说,SPI 管理器)可以尝试缓存与任何特定已准备好的语句相关联的执行计划。如果不使用已缓存的计划,则在每次访问该语句时都会生成一个新的执行计划,并且可以将当前参数值(即 PL/pgSQL 变量值)用于优化所选计划。如果该语句没有参数,或被执行多次,SPI 管理器会考虑创建一个与特定参数值无关的 generic 计划,并对其进行缓存以备重新使用。通常,这种情况只会发生在执行计划不会极度受其中引用的 PL/pgSQL 变量值影响时。如果是这样,每次生成一个计划都是净胜利。参见 PREPARE ,以获取关于已准备好的语句的行为的更多信息。

PL/pgSQL (or more precisely, the SPI manager) can furthermore attempt to cache the execution plan associated with any particular prepared statement. If a cached plan is not used, then a fresh execution plan is generated on each visit to the statement, and the current parameter values (that is, PL/pgSQL variable values) can be used to optimize the selected plan. If the statement has no parameters, or is executed many times, the SPI manager will consider creating a generic plan that is not dependent on specific parameter values, and caching that for re-use. Typically this will happen only if the execution plan is not very sensitive to the values of the PL/pgSQL variables referenced in it. If it is, generating a plan each time is a net win. See PREPARE for more information about the behavior of prepared statements.

由于 PL/pgSQL 以这种方式保存准备语句,有时还保存执行计划,因此直接出现在 PL/pgSQL 函数中的 SQL 命令必须在每次执行时都引用相同的表格和列;也就是说,你不能使用一个参数作为 SQL 命令中的表格或列的名称。要规避此限制,你可以使用 PL/pgSQL _EXECUTE_语句构造动态命令——代价是每次执行时都执行新的解析分析并构造新的执行计划。

Because PL/pgSQL saves prepared statements and sometimes execution plans in this way, SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command. To get around this restriction, you can construct dynamic commands using the PL/pgSQL EXECUTE statement — at the price of performing new parse analysis and constructing a new execution plan on every execution.

记录变量的可变特性会带来另一个方面的问题。当记录变量的字段在表达式或语句中使用时,字段的数据类型不能从函数的一个调用更改到下一次调用,因为每个表达式将使用第一次到达表达式时存在的数据类型进行分析。_EXECUTE_可在必要时用于解决此问题。

The mutable nature of record variables presents another problem in this connection. When fields of a record variable are used in expressions or statements, the data types of the fields must not change from one call of the function to the next, since each expression will be analyzed using the data type that is present when the expression is first reached. EXECUTE can be used to get around this problem when necessary.

如果同一个函数被用作多个表格的触发器,则 PL/pgSQL 会为每个此类表格独立准备并缓存语句——也就是说,对于每个触发器函数和表格组合都有一个缓存,而不仅仅针对每个函数。这缓解了与不同数据类型有关的一些问题;例如,触发器函数将能够成功处理名为_key_的列,即使它碰巧在不同的表格中具有不同的类型。

If the same function is used as a trigger for more than one table, PL/pgSQL prepares and caches statements independently for each such table — that is, there is a cache for each trigger function and table combination, not just for each function. This alleviates some of the problems with varying data types; for instance, a trigger function will be able to work successfully with a column named key even if it happens to have different types in different tables.

同样,具有多态参数类型的函数对于每个调用它们的实际参数类型组合都有一个单独的语句缓存,以便数据类型差异不会导致意外失败。

Likewise, functions having polymorphic argument types have a separate statement cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.

语句缓存有时会对时间敏感值的解释产生出人意料的影响。例如,以下这两个函数之间的区别:

Statement caching can sometimes have surprising effects on the interpretation of time-sensitive values. For example there is a difference between what these two functions do:

CREATE FUNCTION logfunc1(logtxt text) RETURNS void AS $$
    BEGIN
        INSERT INTO logtable VALUES (logtxt, 'now');
    END;
$$ LANGUAGE plpgsql;

及:

and:

CREATE FUNCTION logfunc2(logtxt text) RETURNS void AS $$
    DECLARE
        curtime timestamp;
    BEGIN
        curtime := 'now';
        INSERT INTO logtable VALUES (logtxt, curtime);
    END;
$$ LANGUAGE plpgsql;

对于_logfunc1_,PostgreSQL 主解析器在分析_INSERT_时知道字符串_'now'应该解释为_timestamp,因为_logtable_的目标列是那种类型。因此,当_INSERT_被分析时,_'now'_将被转换为_timestamp_常量,然后在该会话的生命周期内_logfunc1_的所有调用中使用。不用说,这不是程序员想要的。一个更好的主意是使用_now()_或_current_timestamp_函数。

In the case of logfunc1, the PostgreSQL main parser knows when analyzing the INSERT that the string 'now' should be interpreted as timestamp, because the target column of logtable is of that type. Thus, 'now' will be converted to a timestamp constant when the INSERT is analyzed, and then used in all invocations of logfunc1 during the lifetime of the session. Needless to say, this isn’t what the programmer wanted. A better idea is to use the now() or current_timestamp function.

对于_logfunc2_,PostgreSQL 主解析器不知道_'now'_应该变成什么类型,因此它返回包含字符串_now_的类型为_text_的数据值。在接下来的对局部变量_curtime_的赋值中,PL/pgSQL 解析器通过调用_textout_和_timestamp_in_函数进行转换,将此字符串强制转换为_timestamp_类型。因此,计算出的时间戳会像程序员期望的那样在每次执行时更新。即使这恰好按预期工作,也不是很有效,因此最好还是使用_now()_函数。

In the case of logfunc2, the PostgreSQL main parser does not know what type 'now' should become and therefore it returns a data value of type text containing the string now. During the ensuing assignment to the local variable curtime, the PL/pgSQL interpreter casts this string to the timestamp type by calling the textout and timestamp_in functions for the conversion. So, the computed time stamp is updated on each execution as the programmer expects. Even though this happens to work as expected, it’s not terribly efficient, so use of the now() function would still be a better idea.