Postgresql 中文操作指南

38.10. C-Language Functions #

用户定义的函数可以用 C(或可与 C 兼容的语言,如 C++)编写。此类函数被编译成可动态加载的对象(也称为共享库),服务器会根据需要加载这些对象。动态加载功能是“C 语言”函数与“内部”函数的区别所在——两种函数的实际编码约定基本相同。(因此,标准内部函数库对于用户定义的 C 函数来说是一个丰富的编码示例来源。)

User-defined functions can be written in C (or a language that can be made compatible with C, such as C++). Such functions are compiled into dynamically loadable objects (also called shared libraries) and are loaded by the server on demand. The dynamic loading feature is what distinguishes “C language” functions from “internal” functions — the actual coding conventions are essentially the same for both. (Hence, the standard internal function library is a rich source of coding examples for user-defined C functions.)

当前,仅一种调用约定用于 C 函数(“版本 1”)。通过编写函数的 PG_FUNCTION_INFO_V1() 宏调用,如以下所示,来表示对该调用约定的支持。

Currently only one calling convention is used for C functions (“version 1”). Support for that calling convention is indicated by writing a PG_FUNCTION_INFO_V1() macro call for the function, as illustrated below.

38.10.1. Dynamic Loading #

在会话中第一次调用某个可加载对象文件中的用户定义函数时,动态加载器会将该对象文件加载到内存中,以便可以调用该函数。因此,用户定义的 C 函数的 CREATE FUNCTION 必须为该函数指定两条信息:可加载对象文件的名称,以及要在此对象文件中调用的特定函数的 C 名称(链接符号)。如果未明确指定 C 名称,则假设该名称与 SQL 函数名称相同。

The first time a user-defined function in a particular loadable object file is called in a session, the dynamic loader loads that object file into memory so that the function can be called. The CREATE FUNCTION for a user-defined C function must therefore specify two pieces of information for the function: the name of the loadable object file, and the C name (link symbol) of the specific function to call within that object file. If the C name is not explicitly specified then it is assumed to be the same as the SQL function name.

以下算法用于根据 CREATE FUNCTION 命令中给定的名称来查找共享对象文件:

The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION command:

如果此序列不起作用,则将特定于平台的共享库文件名扩展名(通常为 .so)附加到给定名称,然后再次尝试此序列。如果这样做也不行,则加载将失败。

If this sequence does not work, the platform-specific shared library file name extension (often .so) is appended to the given name and this sequence is tried again. If that fails as well, the load will fail.

建议相对于 $libdir 或通过动态库路径找到共享库。如果新安装位于不同位置,这将简化版本升级。可以使用 pg_config --pkglibdir 命令找出 $libdir 实际代表的目录。

It is recommended to locate shared libraries either relative to $libdir or through the dynamic library path. This simplifies version upgrades if the new installation is at a different location. The actual directory that $libdir stands for can be found out with the command pg_config --pkglibdir.

PostgreSQL 服务器所运行的用户 ID 必须能够遍历到您打算加载的文件的路径。使文件或更高级别的目录对 postgres 用户不可读和/或不可执行是一个常见错误。

The user ID the PostgreSQL server runs as must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory not readable and/or not executable by the postgres user is a common mistake.

在任何情况下,在 CREATE FUNCTION 命令中提供的文件名都会在系统目录中逐字记录,因此如果需要再次加载该文件,则会应用相同的过程。

In any case, the file name that is given in the CREATE FUNCTION command is recorded literally in the system catalogs, so if the file needs to be loaded again the same procedure is applied.

Note

PostgreSQL 不会自动编译 C 函数。在 CREATE FUNCTION 命令中引用对象文件之前,必须先编译该文件。有关其他信息,请参见 Section 38.10.5

PostgreSQL will not compile a C function automatically. The object file must be compiled before it is referenced in a CREATE FUNCTION command. See Section 38.10.5 for additional information.

为了确保未将动态加载的对象文件加载到不兼容的服务器中,PostgreSQL 会检查该文件是否包含内容适当的“魔术块”。这允许服务器检测明显的兼容性错误,例如为 PostgreSQL 的不同主要版本编译的代码。要包括魔术块,请在包含头文件 fmgr.h 之后,在一个(且仅一个)模块源文件中编写以下内容:

To ensure that a dynamically loaded object file is not loaded into an incompatible server, PostgreSQL checks that the file contains a “magic block” with the appropriate contents. This allows the server to detect obvious incompatibilities, such as code compiled for a different major version of PostgreSQL. To include a magic block, write this in one (and only one) of the module source files, after having included the header fmgr.h:

PG_MODULE_MAGIC;

在首次使用后,将保留动态加载的对象文件在内存中。在同一个会话中对该文件中的函数进行后续调用,将仅产生很小的符号表查找开销。如果您需要强制重新加载对象文件,例如在重新编译之后,请开始一个新的会话。

After it is used for the first time, a dynamically loaded object file is retained in memory. Future calls in the same session to the function(s) in that file will only incur the small overhead of a symbol table lookup. If you need to force a reload of an object file, for example after recompiling it, begin a fresh session.

或者,动态加载的文件可以包含一个初始化函数。如果文件包含一个名为 _PG_init 的函数,则该函数将在加载文件后立即被调用。函数不接收参数,应返回空值。目前,没有办法卸载动态加载的文件。

Optionally, a dynamically loaded file can contain an initialization function. If the file includes a function named _PG_init, that function will be called immediately after loading the file. The function receives no parameters and should return void. There is presently no way to unload a dynamically loaded file.

38.10.2. Base Types in C-Language Functions #

要了解如何编写 C 语言函数,您需要了解 PostgreSQL 在内部如何表示基本数据类型,以及如何将它们传递到函数和从函数中传递出来。在内部,PostgreSQL 将基本类型视为“内存二进制大对象”。您在类型上定义的用户定义函数,反过来定义了 PostgreSQL 可以对该类型进行操作的方式。也就是说,PostgreSQL 只会从磁盘存储和检索数据,并使用您的用户定义函数来输入、处理和输出数据。

To know how to write C-language functions, you need to know how PostgreSQL internally represents base data types and how they can be passed to and from functions. Internally, PostgreSQL regards a base type as a “blob of memory”. The user-defined functions that you define over a type in turn define the way that PostgreSQL can operate on it. That is, PostgreSQL will only store and retrieve the data from disk and use your user-defined functions to input, process, and output the data.

基础类型有三种内部格式之一:

Base types can have one of three internal formats:

值类型只能有 1、2 或 4 字节的长度(如果您计算机上的 sizeof(Datum) 为 8,还可以有 8 字节)。您应谨慎定义类型,以便它们在所有体系结构上都具有相同大小(以字节为单位)。例如,long 类型很危险,因为它在某些计算机上为 4 字节,而在其他计算机上为 8 字节,而 int 类型在大多数 Unix 计算机上为 4 字节。在 Unix 计算机上对 int4 类型的合理实现可以是:

By-value types can only be 1, 2, or 4 bytes in length (also 8 bytes, if sizeof(Datum) is 8 on your machine). You should be careful to define your types such that they will be the same size (in bytes) on all architectures. For example, the long type is dangerous because it is 4 bytes on some machines and 8 bytes on others, whereas int type is 4 bytes on most Unix machines. A reasonable implementation of the int4 type on Unix machines might be:

/* 4-byte integer, passed by value */
typedef int int4;

(实际 PostgreSQL C 代码调用此类型 int32 ,因为 C 中的约定是 int_XX_ 表示 XX bits 。因此,还请注意 C 类型 int8 大小为 1 字节。在 C 中,SQL 类型 int8 称为 int64 。另请参阅 Table 38.2 。)

(The actual PostgreSQL C code calls this type int32, because it is a convention in C that int_XX_ means XX bits. Note therefore also that the C type int8 is 1 byte in size. The SQL type int8 is called int64 in C. See also Table 38.2.)

另一方面,任何大小的定长类型都可以按引用传递。例如,以下是对 PostgreSQL 类型的一个实现示例:

On the other hand, fixed-length types of any size can be passed by-reference. For example, here is a sample implementation of a PostgreSQL type:

/* 16-byte structure, passed by reference */
typedef struct
{
    double  x, y;
} Point;

只有在将此类类型传进和传出 PostgreSQL 函数时,才能使用它们的指针。要返回此类类型的一个值,可以使用 palloc 分配适当数量的内存,填充已分配的内存,然后返回指向该内存的指针。(另外,如果您只想返回一个与其中一个输入参数相同数据类型的相同值,则可以跳过额外的 palloc,然后返回指向输入值的指针。)

Only pointers to such types can be used when passing them in and out of PostgreSQL functions. To return a value of such a type, allocate the right amount of memory with palloc, fill in the allocated memory, and return a pointer to it. (Also, if you just want to return the same value as one of your input arguments that’s of the same data type, you can skip the extra palloc and just return the pointer to the input value.)

最后,所有可变长类型也必须按引用传递。所有可变长类型都必须以一个完全 4 字节长的长度字段开头,该字段将由 SET_VARSIZE 设置;切勿直接设置此字段!要存储在该类型中的所有数据都必须位于紧跟该长度字段之后的内存中。该长度字段包含结构体的总长度,即它包括长度字段本身的大小。

Finally, all variable-length types must also be passed by reference. All variable-length types must begin with an opaque length field of exactly 4 bytes, which will be set by SET_VARSIZE; never set this field directly! All data to be stored within that type must be located in the memory immediately following that length field. The length field contains the total length of the structure, that is, it includes the size of the length field itself.

另一重要的一点是避免在数据类型值中留下任何未初始化的位;例如,注意将结构体中可能存在的任何对齐填充字节清零。如果没有,您的数据类型的逻辑相等常量可能会被规划器视为不相等,从而导致低效(尽管不正确)的计划。

Another important point is to avoid leaving any uninitialized bits within data type values; for example, take care to zero out any alignment padding bytes that might be present in structs. Without this, logically-equivalent constants of your data type might be seen as unequal by the planner, leading to inefficient (though not incorrect) plans.

Warning

_Never_修改按引用传递的输入值的内容。如果你这样做则有可能会损坏磁盘上的数据,因为你接收到的指针可能直接指向磁盘缓冲区。该规则的唯一例外在 Section 38.12中进行了说明。

Never modify the contents of a pass-by-reference input value. If you do so you are likely to corrupt on-disk data, since the pointer you are given might point directly into a disk buffer. The sole exception to this rule is explained in Section 38.12.

例如,我们可以将 text 类型定义如下:

As an example, we can define the type text as follows:

typedef struct {
    int32 length;
    char data[FLEXIBLE_ARRAY_MEMBER];
} text;

_[FLEXIBLE_ARRAY_MEMBER]_表示 SQL 数据部分的实际长度未通过此声明指定。

The [FLEXIBLE_ARRAY_MEMBER] notation means that the actual length of the data part is not specified by this declaration.

在操作可变长类型时,我们必须小心地分配正确数量的内存,并正确设置长度字段。例如,如果我们希望在 text 结构体中存储 40 字节,我们可以使用如下代码片段:

When manipulating variable-length types, we must be careful to allocate the correct amount of memory and set the length field correctly. For example, if we wanted to store 40 bytes in a text structure, we might use a code fragment like this:

#include "postgres.h"
...
char buffer[40]; /* our source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
SET_VARSIZE(destination, VARHDRSZ + 40);
memcpy(destination->data, buffer, 40);
...

VARHDRSZsizeof(int32) 相同,但使用宏 VARHDRSZ 来引用可变长类型开销的大小被认为是一种良好的风格。另外,长度字段 must 应该使用 SET_VARSIZE 宏设置,而不是简单的赋值。

VARHDRSZ is the same as sizeof(int32), but it’s considered good style to use the macro VARHDRSZ to refer to the size of the overhead for a variable-length type. Also, the length field must be set using the SET_VARSIZE macro, not by simple assignment.

Table 38.2显示与 PostgreSQL 的许多内置 SQL 数据类型相对应的 C 类型。“定义”所述文件需要包含以获取类型定义。(实际定义可能位于由所列文件包含的不同文件中。建议用户坚持已定义的界面。)请注意,你应始终在服务端代码的所有源文件中首先包含_postgres.h_,因为它声明了你将需要的一些内容,因为首先包含其他标头可能会导致可移植性问题。

Table 38.2 shows the C types corresponding to many of the built-in SQL data types of PostgreSQL. The “Defined In” column gives the header file that needs to be included to get the type definition. (The actual definition might be in a different file that is included by the listed file. It is recommended that users stick to the defined interface.) Note that you should always include postgres.h first in any source file of server code, because it declares a number of things that you will need anyway, and because including other headers first can cause portability issues.

Table 38.2. Equivalent C Types for Built-in SQL Types

SQL Type

C Type

Defined In

boolean

bool

postgres.h (maybe compiler built-in)

box

BOX*

utils/geo_decls.h

bytea

bytea*

postgres.h

"char"

char

(compiler built-in)

character

BpChar*

postgres.h

cid

CommandId

postgres.h

date

DateADT

utils/date.h

float4 (real)

float4

postgres.h

float8 (double precision)

float8

postgres.h

int2 (smallint)

int16

postgres.h

int4 (integer)

int32

postgres.h

int8 (bigint)

int64

postgres.h

interval

Interval*

datatype/timestamp.h

lseg

LSEG*

utils/geo_decls.h

name

Name

postgres.h

numeric

Numeric

utils/numeric.h

oid

Oid

postgres.h

oidvector

oidvector*

postgres.h

path

PATH*

utils/geo_decls.h

point

POINT*

utils/geo_decls.h

regproc

RegProcedure

postgres.h

text

text*

postgres.h

tid

ItemPointer

storage/itemptr.h

time

TimeADT

utils/date.h

time with time zone

TimeTzADT

utils/date.h

timestamp

Timestamp

datatype/timestamp.h

timestamp with time zone

TimestampTz

datatype/timestamp.h

varchar

VarChar*

postgres.h

xid

TransactionId

postgres.h

既然我们已经遍历了基础类型的各个可能结构,我们可以展示一些真实函数的示例。

Now that we’ve gone over all of the possible structures for base types, we can show some examples of real functions.

38.10.3. Version 1 Calling Conventions #

版本 1 调用约定依赖于宏来消除传递参数和结果的大部分复杂性。版本 1 函数的 C 声明始终为:

The version-1 calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a version-1 function is always:

Datum funcname(PG_FUNCTION_ARGS)

此外,宏调用:

In addition, the macro call:

PG_FUNCTION_INFO_V1(funcname);

必须出现在同一源文件中。(传统上,它写在了函数本身之前。)此宏调用对于 internal 语言函数来说不需要,因为 PostgreSQL 假设所有内部函数都使用版本 1 约定。但是,对于动态加载的函数是必需的。

must appear in the same source file. (Conventionally, it’s written just before the function itself.) This macro call is not needed for internal-language functions, since PostgreSQL assumes that all internal functions use the version-1 convention. It is, however, required for dynamically-loaded functions.

在版本 1 函数中,每个实际参数都使用 PG_GETARG_xxx()_ macro that corresponds to the argument’s data type. (In non-strict functions there needs to be a previous check about argument null-ness using PG_ARGISNULL(); see below.) The result is returned using a PG_RETURN_xxx()_ 返回类型宏获取。PG_GETARG_xxx()_ takes as its argument the number of the function argument to fetch, where the count starts at 0. PG_RETURN_xxx()_ 将作为其参数取要返回的实际值。

In a version-1 function, each actual argument is fetched using a PG_GETARGxxx()_ macro that corresponds to the argument’s data type. (In non-strict functions there needs to be a previous check about argument null-ness using PG_ARGISNULL(); see below.) The result is returned using a PG_RETURNxxx()_ macro for the return type. PG_GETARGxxx()_ takes as its argument the number of the function argument to fetch, where the count starts at 0. PG_RETURNxxx()_ takes as its argument the actual value to return.

下面是一些使用版本 1 调用约定的示例:

Here are some examples using the version-1 calling convention:

#include "postgres.h"
#include <string.h>
#include "fmgr.h"
#include "utils/geo_decls.h"
#include "varatt.h"

PG_MODULE_MAGIC;

/* by value */

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32   arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}

/* by reference, fixed length */

PG_FUNCTION_INFO_V1(add_one_float8);

Datum
add_one_float8(PG_FUNCTION_ARGS)
{
    /* The macros for FLOAT8 hide its pass-by-reference nature. */
    float8   arg = PG_GETARG_FLOAT8(0);

    PG_RETURN_FLOAT8(arg + 1.0);
}

PG_FUNCTION_INFO_V1(makepoint);

Datum
makepoint(PG_FUNCTION_ARGS)
{
    /* Here, the pass-by-reference nature of Point is not hidden. */
    Point     *pointx = PG_GETARG_POINT_P(0);
    Point     *pointy = PG_GETARG_POINT_P(1);
    Point     *new_point = (Point *) palloc(sizeof(Point));

    new_point->x = pointx->x;
    new_point->y = pointy->y;

    PG_RETURN_POINT_P(new_point);
}

/* by reference, variable length */

PG_FUNCTION_INFO_V1(copytext);

Datum
copytext(PG_FUNCTION_ARGS)
{
    text     *t = PG_GETARG_TEXT_PP(0);

    /*
     * VARSIZE_ANY_EXHDR is the size of the struct in bytes, minus the
     * VARHDRSZ or VARHDRSZ_SHORT of its header.  Construct the copy with a
     * full-length header.
     */
    text     *new_t = (text *) palloc(VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
    SET_VARSIZE(new_t, VARSIZE_ANY_EXHDR(t) + VARHDRSZ);

    /*
     * VARDATA is a pointer to the data region of the new struct.  The source
     * could be a short datum, so retrieve its data through VARDATA_ANY.
     */
    memcpy(VARDATA(new_t),          /* destination */
           VARDATA_ANY(t),          /* source */
           VARSIZE_ANY_EXHDR(t));   /* how many bytes */
    PG_RETURN_TEXT_P(new_t);
}

PG_FUNCTION_INFO_V1(concat_text);

Datum
concat_text(PG_FUNCTION_ARGS)
{
    text  *arg1 = PG_GETARG_TEXT_PP(0);
    text  *arg2 = PG_GETARG_TEXT_PP(1);
    int32 arg1_size = VARSIZE_ANY_EXHDR(arg1);
    int32 arg2_size = VARSIZE_ANY_EXHDR(arg2);
    int32 new_text_size = arg1_size + arg2_size + VARHDRSZ;
    text *new_text = (text *) palloc(new_text_size);

    SET_VARSIZE(new_text, new_text_size);
    memcpy(VARDATA(new_text), VARDATA_ANY(arg1), arg1_size);
    memcpy(VARDATA(new_text) + arg1_size, VARDATA_ANY(arg2), arg2_size);
    PG_RETURN_TEXT_P(new_text);
}

假设上述代码已在文件_funcs.c_中准备就绪并已编译为共享对象,我们可以通过如下命令为 PostgreSQL 定义函数:

Supposing that the above code has been prepared in file funcs.c and compiled into a shared object, we could define the functions to PostgreSQL with commands like this:

CREATE FUNCTION add_one(integer) RETURNS integer
     AS 'DIRECTORY/funcs', 'add_one'
     LANGUAGE C STRICT;

-- note overloading of SQL function name "add_one"
CREATE FUNCTION add_one(double precision) RETURNS double precision
     AS 'DIRECTORY/funcs', 'add_one_float8'
     LANGUAGE C STRICT;

CREATE FUNCTION makepoint(point, point) RETURNS point
     AS 'DIRECTORY/funcs', 'makepoint'
     LANGUAGE C STRICT;

CREATE FUNCTION copytext(text) RETURNS text
     AS 'DIRECTORY/funcs', 'copytext'
     LANGUAGE C STRICT;

CREATE FUNCTION concat_text(text, text) RETURNS text
     AS 'DIRECTORY/funcs', 'concat_text'
     LANGUAGE C STRICT;

这里,DIRECTORY_代表共享库文件的目录(例如 PostgreSQL 教程目录,其中包含本节中所用示例的代码)。(更好的方法是在_AS_子句中仅使用'funcs',并在将_DIRECTORY_添加到搜索路径之后。在任何情况下,我们都可以省略共享库的特定于系统的扩展名,通常为.so_。)

Here, DIRECTORY stands for the directory of the shared library file (for instance the PostgreSQL tutorial directory, which contains the code for the examples used in this section). (Better style would be to use just 'funcs' in the AS clause, after having added DIRECTORY to the search path. In any case, we can omit the system-specific extension for a shared library, commonly .so.)

请注意,我们已将这些函数指定为“严格”,这意味着如果任何输入值为 null,系统应自动假定一个 null 结果。通过此操作,我们避免了必须在函数代码中检查 null 输入。如果没有此操作,我们就必须使用_PG_ARGISNULL()_显示检查 null 值。

Notice that we have specified the functions as “strict”, meaning that the system should automatically assume a null result if any input value is null. By doing this, we avoid having to check for null inputs in the function code. Without this, we’d have to check for null values explicitly, using PG_ARGISNULL().

宏_PG_ARGISNULL(n)允许函数测试每个输入是否是 null。(当然,仅在未声明为“严格”的函数中执行此操作时有必要。)与_PG_GETARG_xxx()_ macros, the input arguments are counted beginning at zero. Note that one should refrain from executing PG_GETARG_xxx()一样,直到已验证参数不是 null。要返回 null 结果,请执行_PG_RETURN_NULL();这适用于严格和非严格函数。

The macro PG_ARGISNULL(_n)_ allows a function to test whether each input is null. (Of course, doing this is only necessary in functions not declared “strict”.) As with the PG_GETARGxxx()_ macros, the input arguments are counted beginning at zero. Note that one should refrain from executing PG_GETARGxxx()_ until one has verified that the argument isn’t null. To return a null result, execute PG_RETURN_NULL(); this works in both strict and nonstrict functions.

乍一看,与使用普通的_C_调用约定相比,版本 1 编码约定似乎只是毫无意义的晦涩难懂。但是,它们确实允许我们处理可为_NULL_的参数/返回值,以及“烤”的(压缩或非直接)值。

At first glance, the version-1 coding conventions might appear to be just pointless obscurantism, compared to using plain C calling conventions. They do however allow us to deal with _NULL_able arguments/return values, and “toasted” (compressed or out-of-line) values.

版本 1 接口提供的其他选项是 PG_GETARG_xxx()_ macros. The first of these, PG_GETARG_xxx_COPY(), guarantees to return a copy of the specified argument that is safe for writing into. (The normal macros will sometimes return a pointer to a value that is physically stored in a table, which must not be written to. Using the PG_GETARG_xxx_COPY() macros guarantees a writable result.) The second variant consists of the PG_GETARG_xxx_SLICE() macros which take three arguments. The first is the number of the function argument (as above). The second and third are the offset and length of the segment to be returned. Offsets are counted from zero, and a negative length requests that the remainder of the value be returned. These macros provide more efficient access to parts of large values in the case where they have storage type “external”. (The storage type of a column can be specified using ALTER TABLE _tablename ALTER COLUMN colname SET STORAGE storagetype. storagetype 的两种变体(可选值为 plainexternalextendedmain 。)

Other options provided by the version-1 interface are two variants of the PG_GETARGxxx()_ macros. The first of these, PG_GETARGxxxCOPY(), guarantees to return a copy of the specified argument that is safe for writing into. (The normal macros will sometimes return a pointer to a value that is physically stored in a table, which must not be written to. Using the PG_GETARGxxxCOPY() macros guarantees a writable result.) The second variant consists of the PG_GETARGxxxSLICE() macros which take three arguments. The first is the number of the function argument (as above). The second and third are the offset and length of the segment to be returned. Offsets are counted from zero, and a negative length requests that the remainder of the value be returned. These macros provide more efficient access to parts of large values in the case where they have storage type “external”. (The storage type of a column can be specified using ALTER TABLE _tablename ALTER COLUMN colname SET STORAGE storagetype. _storagetype is one of plain, external, extended, or main.)

最后,版本 1 函数调用约定使得返回结果集( Section 38.10.8)和实现触发器函数( Chapter 39)以及过程语言调用处理程序( Chapter 58)变为可能。有关更多详细信息,请参阅源发行中的 src/backend/utils/fmgr/README

Finally, the version-1 function call conventions make it possible to return set results (Section 38.10.8) and implement trigger functions (Chapter 39) and procedural-language call handlers (Chapter 58). For more details see src/backend/utils/fmgr/README in the source distribution.

38.10.4. Writing Code #

在我们讨论更高级的话题之前,我们应该讨论一些用于 PostgreSQL C 语言函数的编码规则。虽然将用除 C 语言以外的其他语言编写的函数加载到 PostgreSQL 中可能是可能的,但这通常很困难(如果可能的话),因为其他语言(例如 C++、FORTRAN 或 Pascal)通常不遵循与 C 相同的调用约定。也就是说,其他语言不能用相同的方式在函数之间传递参数和返回值。因此,我们将假设你的 C 语言函数实际上是用 C 编写的。

Before we turn to the more advanced topics, we should discuss some coding rules for PostgreSQL C-language functions. While it might be possible to load functions written in languages other than C into PostgreSQL, this is usually difficult (when it is possible at all) because other languages, such as C++, FORTRAN, or Pascal often do not follow the same calling convention as C. That is, other languages do not pass argument and return values between functions in the same way. For this reason, we will assume that your C-language functions are actually written in C.

用于编写和构建 C 函数的基本规则如下:

The basic rules for writing and building C functions are as follows:

38.10.5. Compiling and Linking Dynamically-Loaded Functions #

在能够使用以 C 语言编写的 PostgreSQL 扩展函数之前,必须以特殊的方式对其进行编译和链接,以生成一个可以由服务器动态加载的文件。确切地说,需要创建一个_shared library_。

Before you are able to use your PostgreSQL extension functions written in C, they must be compiled and linked in a special way to produce a file that can be dynamically loaded by the server. To be precise, a shared library needs to be created.

如若要获取本节中所包含的更多信息,则应阅读操作系统的文档,尤其是 C 编译器、_cc_和链接编辑器_ld_的手册页。此外,PostgreSQL 源代码在_contrib_目录中包含若干可用的示例。但是,如果你依赖这些示例,则你的模块将依赖于 PostgreSQL 源代码的可用性。

For information beyond what is contained in this section you should read the documentation of your operating system, in particular the manual pages for the C compiler, cc, and the link editor, ld. In addition, the PostgreSQL source code contains several working examples in the contrib directory. If you rely on these examples you will make your modules dependent on the availability of the PostgreSQL source code, however.

创建共享库通常类似于链接可执行文件:首先将源文件编译为对象文件,然后将对象文件链接在一起。对象文件需要创建为_position-independent code_(PIC),在概念上,这意味着当它们被可执行文件加载时,它们可以放在内存中的任意位置。(打算用于可执行文件的对象文件通常不会以这种方式编译。)链接共享库的命令包含特殊标志,以便将其与链接可执行文件区分开来(至少在理论上如此——在某些系统上,实践更糟糕)。

Creating shared libraries is generally analogous to linking executables: first the source files are compiled into object files, then the object files are linked together. The object files need to be created as position-independent code (PIC), which conceptually means that they can be placed at an arbitrary location in memory when they are loaded by the executable. (Object files intended for executables are usually not compiled that way.) The command to link a shared library contains special flags to distinguish it from linking an executable (at least in theory — on some systems the practice is much uglier).

在以下示例中,我们假设你的源代码位于文件_foo.c_中,并且我们将创建一个共享库_foo.so_。除非另有说明,否则中间对象文件将被称为_foo.o_。共享库可以包含多个对象文件,但我们仅在此处使用一个。

In the following examples we assume that your source code is in a file foo.c and we will create a shared library foo.so. The intermediate object file will be called foo.o unless otherwise noted. A shared library can contain more than one object file, but we only use one here.

  • FreeBSD

    • The compiler flag to create PIC is -fPIC. To create shared libraries the compiler flag is -shared.

gcc -fPIC -c foo.c
gcc -shared -o foo.so foo.o
  • This is applicable as of version 3.0 of FreeBSD.

    • Linux

  • The compiler flag to create PIC is -fPIC. The compiler flag to create a shared library is -shared. A complete example looks like this:

cc -fPIC -c foo.c
cc -shared -o foo.so foo.o
  • macOS

    • Here is an example. It assumes the developer tools are installed.

cc -c foo.c
cc -bundle -flat_namespace -undefined suppress -o foo.so foo.o
  • NetBSD

    • The compiler flag to create PIC is -fPIC. For ELF systems, the compiler with the flag -shared is used to link shared libraries. On the older non-ELF systems, ld -Bshareable is used.

gcc -fPIC -c foo.c
gcc -shared -o foo.so foo.o
  • OpenBSD

    • The compiler flag to create PIC is -fPIC. ld -Bshareable is used to link shared libraries.

gcc -fPIC -c foo.c
ld -Bshareable -o foo.so foo.o
  • Solaris

    • The compiler flag to create PIC is -KPIC with the Sun compiler and -fPIC with GCC. To link shared libraries, the compiler option is -G with either compiler or alternatively -shared with GCC.

cc -KPIC -c foo.c
cc -G -o foo.so foo.o
  • or

gcc -fPIC -c foo.c
gcc -G -o foo.so foo.o

Tip

如果这对你来说过于复杂,则应考虑使用 GNU Libtool ,它使用统一的接口隐藏平台差异。

If this is too complicated for you, you should consider using GNU Libtool, which hides the platform differences behind a uniform interface.

然后可以将生成的共享库文件加载到 PostgreSQL 中。在向_CREATE FUNCTION_命令指定文件名时,必须给定共享库文件的名称,而不是中间目标文件。请注意,可以从 CREATE FUNCTION_命令中省略系统标准共享库扩展名(通常为 _.so_或.sl_),并且通常应该省略以获得最佳的可移植性。

The resulting shared library file can then be loaded into PostgreSQL. When specifying the file name to the CREATE FUNCTION command, one must give it the name of the shared library file, not the intermediate object file. Note that the system’s standard shared-library extension (usually .so or .sl) can be omitted from the CREATE FUNCTION command, and normally should be omitted for best portability.

请参阅 Section 38.10.1,了解服务端预期在哪里找到共享库文件。

Refer back to Section 38.10.1 about where the server expects to find the shared library files.

38.10.6. Composite-Type Arguments #

复合类型没有像 C 结构那样的固定布局。复合类型的实例可以包含空字段。此外,作为继承层次结构一部分的复合类型与同一继承层次结构中的其他成员可以具有不同的字段。因此,PostgreSQL 提供了一个函数界面,用于从 C 访问复合类型的字段。

Composite types do not have a fixed layout like C structures. Instances of a composite type can contain null fields. In addition, composite types that are part of an inheritance hierarchy can have different fields than other members of the same inheritance hierarchy. Therefore, PostgreSQL provides a function interface for accessing fields of composite types from C.

假设我们要编写一个函数来回答查询:

Suppose we want to write a function to answer the query:

SELECT name, c_overpaid(emp, 1500) AS overpaid
    FROM emp
    WHERE name = 'Bill' OR name = 'Sam';

使用版本 1 调用约定,我们可以按以下方式定义 c_overpaid

Using the version-1 calling conventions, we can define c_overpaid as:

#include "postgres.h"
#include "executor/executor.h"  /* for GetAttributeByName() */

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(c_overpaid);

Datum
c_overpaid(PG_FUNCTION_ARGS)
{
    HeapTupleHeader  t = PG_GETARG_HEAPTUPLEHEADER(0);
    int32            limit = PG_GETARG_INT32(1);
    bool isnull;
    Datum salary;

    salary = GetAttributeByName(t, "salary", &isnull);
    if (isnull)
        PG_RETURN_BOOL(false);
    /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */

    PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
}

GetAttributeByName_是 PostgreSQL 系统函数,它从指定的行中返回属性。它有三个参数:类型为 _HeapTupleHeader_的参数传递到函数中、所需属性的名称,以及返回参数(指示属性是否为 null)。_GetAttributeByName_返回一个 _Datum_值,你可以使用相应的 _DatumGet_XXX()_ 函数将其转换为正确的数据类型。请注意,如果 null 标志设置,则返回值没有意义;在尝试对结果进行任何操作之前,始终检查 null 标志。

GetAttributeByName is the PostgreSQL system function that returns attributes out of the specified row. It has three arguments: the argument of type HeapTupleHeader passed into the function, the name of the desired attribute, and a return parameter that tells whether the attribute is null. GetAttributeByName returns a Datum value that you can convert to the proper data type by using the appropriate DatumGet_XXX()_ function. Note that the return value is meaningless if the null flag is set; always check the null flag before trying to do anything with the result.

还有_GetAttributeByNum_,它按列号而非名称选择目标属性。

There is also GetAttributeByNum, which selects the target attribute by column number instead of name.

以下命令在 SQL 中声明函数 c_overpaid

The following command declares the function c_overpaid in SQL:

CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean
    AS 'DIRECTORY/funcs', 'c_overpaid'
    LANGUAGE C STRICT;

请注意,我们使用了_STRICT_,这样我们不必检查输入参数是否是 NULL。

Notice we have used STRICT so that we did not have to check whether the input arguments were NULL.

38.10.7. Returning Rows (Composite Types) #

要从 C 语言函数返回行或复合类型值,可以使用一个特殊 API,它提供宏和函数来隐藏构建复合数据类型的大多数复杂性。要使用此 API,源文件必须包括:

To return a row or composite-type value from a C-language function, you can use a special API that provides macros and functions to hide most of the complexity of building composite data types. To use this API, the source file must include:

#include "funcapi.h"

有两种方法可以构建复合数据值(以下称为“元组”):你可以根据 Datum 值数组来构建它,或者根据可以传递给元组列数据类型的输入转换函数的 C 字符串数组来构建它。在任一种情况下,首先需要获取或构造元组结构的_TupleDesc_描述符。当处理 Datum 时,你将 TupleDesc_传递给 _BlessTupleDesc,然后为每一行调用_heap_form_tuple_。当处理 C 字符串时,你将 TupleDesc_传递给 _TupleDescGetAttInMetadata,然后为每一行调用_BuildTupleFromCStrings_。对于返回元组集的函数,可以在函数的第一次调用期间完成所有设置步骤。

There are two ways you can build a composite data value (henceforth a “tuple”): you can build it from an array of Datum values, or from an array of C strings that can be passed to the input conversion functions of the tuple’s column data types. In either case, you first need to obtain or construct a TupleDesc descriptor for the tuple structure. When working with Datums, you pass the TupleDesc to BlessTupleDesc, and then call heap_form_tuple for each row. When working with C strings, you pass the TupleDesc to TupleDescGetAttInMetadata, and then call BuildTupleFromCStrings for each row. In the case of a function returning a set of tuples, the setup steps can all be done once during the first call of the function.

对于设置必需的 TupleDesc,提供有几个帮助器函数。在返回复合值的大多函数中,推荐的做法是调用:

Several helper functions are available for setting up the needed TupleDesc. The recommended way to do this in most functions returning composite values is to call:

TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
                                   Oid *resultTypeId,
                                   TupleDesc *resultTupleDesc)

传递与调用函数本身传递的相同 fcinfo_结构。(当然,这要求您使用版本 1 调用约定。)_resultTypeId_可以指定为 _NULL_或作为接收函数结果类型 OID 的局部变量的地址。_resultTupleDesc_应为局部 _TupleDesc_变量的地址。检查结果是否为 _TYPEFUNC_COMPOSITE;如果是,则 resultTupleDesc_已经填写了必需的 _TupleDesc。(如果不是,则可以报告错误,如“在不能接受记录类型的上下文中调用返回记录的函数”。)

passing the same fcinfo struct passed to the calling function itself. (This of course requires that you use the version-1 calling conventions.) resultTypeId can be specified as NULL or as the address of a local variable to receive the function’s result type OID. resultTupleDesc should be the address of a local TupleDesc variable. Check that the result is TYPEFUNC_COMPOSITE; if so, resultTupleDesc has been filled with the needed TupleDesc. (If it is not, you can report an error along the lines of “function returning record called in context that cannot accept type record”.)

Tip

_get_call_result_type_可以解决多态函数结果的实际类型;因此它对于返回标量多态结果的函数很有用,而不仅仅是返回复合结果的函数。通常,_resultTypeId_输出对于返回多态标量的函数是有用的。

get_call_result_type can resolve the actual type of a polymorphic function result; so it is useful in functions that return scalar polymorphic results, not only functions that return composites. The resultTypeId output is primarily useful for functions returning polymorphic scalars.

Note

get_call_result_type 有一个兄弟 get_expr_result_type,它可用于解析表达式树表示的函数调用的预期输出类型。当尝试从函数本身外部确定结果类型时,可以使用它。还有 get_func_result_type,它可在仅函数的 OID 可用时使用。但是,这些函数无法处理声明为返回 record 的函数,而 get_func_result_type 无法解析多态类型,所以你应优先使用 get_call_result_type

get_call_result_type has a sibling get_expr_result_type, which can be used to resolve the expected output type for a function call represented by an expression tree. This can be used when trying to determine the result type from outside the function itself. There is also get_func_result_type, which can be used when only the function’s OID is available. However these functions are not able to deal with functions declared to return record, and get_func_result_type cannot resolve polymorphic types, so you should preferentially use get_call_result_type.

获取 _TupleDesc_s 的较旧且现已弃用的函数为:

Older, now-deprecated functions for obtaining _TupleDesc_s are:

TupleDesc RelationNameGetTupleDesc(const char *relname)

获取已命名关系的行类型的 TupleDesc,以及:

to get a TupleDesc for the row type of a named relation, and:

TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)

以此方式获取基于类型 OID 的 TupleDesc。可用于为基本类型或组合类型获取 TupleDesc。然而,它将不会对返回 record 的函数起作用,也不能解析多态类型。

to get a TupleDesc based on a type OID. This can be used to get a TupleDesc for a base or composite type. It will not work for a function that returns record, however, and it cannot resolve polymorphic types.

获取 TupleDesc 后,调用:

Once you have a TupleDesc, call:

TupleDesc BlessTupleDesc(TupleDesc tupdesc)

计划操作数据值时使用;或

if you plan to work with Datums, or:

AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)

计划操作 C 字符串时使用。如果编写返回集合的函数,您可以在 FuncCallContext 结构中保存这些函数的结果,分别使用 tuple_descattinmeta 字段。

if you plan to work with C strings. If you are writing a function returning set, you can save the results of these functions in the FuncCallContext structure — use the tuple_desc or attinmeta field respectively.

操作数据值时,使用:

When working with Datums, use:

HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull)

以数据值形式给定用户数据以构建 HeapTuple

to build a HeapTuple given user data in Datum form.

操作 C 字符串时,使用:

When working with C strings, use:

HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)

以 C 字符串形式给定用户数据以构建 HeapTuplevalues 是个 C 字符串数组,针对返回行中的每个属性一个字符串。每个 C 字符串应从属性数据类型的输入函数预期形式。为其中一个属性返回空值,values 数组中的对应指针应设置为 NULL。每次返回行需要再次调用此函数。

to build a HeapTuple given user data in C string form. values is an array of C strings, one for each attribute of the return row. Each C string should be in the form expected by the input function of the attribute data type. In order to return a null value for one of the attributes, the corresponding pointer in the values array should be set to NULL. This function will need to be called again for each row you return.

从函数返回构建元组后,必须将其转换为 Datum。使用:

Once you have built a tuple to return from your function, it must be converted into a Datum. Use:

HeapTupleGetDatum(HeapTuple tuple)

HeapTuple 转换为有效的数据值。如果您打算仅仅返回一行,可以直接返回此 Datum,也可以将其用作集合返回函数中的当前返回值。

to convert a HeapTuple into a valid Datum. This Datum can be returned directly if you intend to return just a single row, or it can be used as the current return value in a set-returning function.

下一部分将给出示例。

An example appears in the next section.

38.10.8. Returning Sets #

C 语言函数有两种返回集合(多行)的选项。在一种称为 ValuePerCall 模式的方法中,会重复调用集合返回函数(每次传递相同参数),且在没有更多行返回并通过返回 NULL 产生信号时每次返回新行。因此,集合返回函数 (SRF) 必须在各个调用中保存足够的状态以记住它在做什么,并在每次调用时返回正确的项目。在另一种称为 Materialize 模式的方法中,SRF 填充并返回包含其整个结果的元组存储对象;之后,针对整个结果仅进行一次调用,且不需要任何调用间状态。

C-language functions have two options for returning sets (multiple rows). In one method, called ValuePerCall mode, a set-returning function is called repeatedly (passing the same arguments each time) and it returns one new row on each call, until it has no more rows to return and signals that by returning NULL. The set-returning function (SRF) must therefore save enough state across calls to remember what it was doing and return the correct next item on each call. In the other method, called Materialize mode, an SRF fills and returns a tuplestore object containing its entire result; then only one call occurs for the whole result, and no inter-call state is needed.

使用 ValuePerCall 模式时,务必记住无法保证查询一定运行至完成;即,由于 LIMIT 等选项,执行程序可能会在检索所有行前停止向集合返回函数进行调用。这意味着在最后一个调用中,对其执行清理活动不安全,因为这种情况可能永远不会发生。对于需要访问外部资源的函数,例如文件描述符,建议使用 Materialize 模式。

When using ValuePerCall mode, it is important to remember that the query is not guaranteed to be run to completion; that is, due to options such as LIMIT, the executor might stop making calls to the set-returning function before all rows have been fetched. This means it is not safe to perform cleanup activities in the last call, because that might not ever happen. It’s recommended to use Materialize mode for functions that need access to external resources, such as file descriptors.

本部分的剩余文稿记录了一组在 ValuePerCall 模式下使用 SRF 时常用的帮助程序宏(虽然不是必需的)。可在 src/backend/utils/fmgr/README 中找到关于 Materialize 模式的其他详情。此外,PostgreSQL 源代码分发中的 contrib 模块包含许多使用 ValuePerCall 和 Materialize 模式的 SRF 示例。

The remainder of this section documents a set of helper macros that are commonly used (though not required to be used) for SRFs using ValuePerCall mode. Additional details about Materialize mode can be found in src/backend/utils/fmgr/README. Also, the contrib modules in the PostgreSQL source distribution contain many examples of SRFs using both ValuePerCall and Materialize mode.

如需使用此描述的 ValuePerCall 支持宏,请包含 funcapi.h。这些宏使用结构 FuncCallContext,其中包含需要在各个调用中保存的状态。在调用 SRF 中,fcinfo→flinfo→fn_extra 用于在调用中保存 FuncCallContext 中的指针。宏会在首次使用时自动填写该字段,并预期在后续使用中找到相同的指针。

To use the ValuePerCall support macros described here, include funcapi.h. These macros work with a structure FuncCallContext that contains the state that needs to be saved across calls. Within the calling SRF, fcinfo→flinfo→fn_extra is used to hold a pointer to FuncCallContext across calls. The macros automatically fill that field on first use, and expect to find the same pointer there on subsequent uses.

typedef struct FuncCallContext
{
    /*
     * Number of times we've been called before
     *
     * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
     * incremented for you every time SRF_RETURN_NEXT() is called.
     */
    uint64 call_cntr;

    /*
     * OPTIONAL maximum number of calls
     *
     * max_calls is here for convenience only and setting it is optional.
     * If not set, you must provide alternative means to know when the
     * function is done.
     */
    uint64 max_calls;

    /*
     * OPTIONAL pointer to miscellaneous user-provided context information
     *
     * user_fctx is for use as a pointer to your own data to retain
     * arbitrary context information between calls of your function.
     */
    void *user_fctx;

    /*
     * OPTIONAL pointer to struct containing attribute type input metadata
     *
     * attinmeta is for use when returning tuples (i.e., composite data types)
     * and is not used when returning base data types. It is only needed
     * if you intend to use BuildTupleFromCStrings() to create the return
     * tuple.
     */
    AttInMetadata *attinmeta;

    /*
     * memory context used for structures that must live for multiple calls
     *
     * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
     * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
     * context for any memory that is to be reused across multiple calls
     * of the SRF.
     */
    MemoryContext multi_call_memory_ctx;

    /*
     * OPTIONAL pointer to struct containing tuple description
     *
     * tuple_desc is for use when returning tuples (i.e., composite data types)
     * and is only needed if you are going to build the tuples with
     * heap_form_tuple() rather than with BuildTupleFromCStrings().  Note that
     * the TupleDesc pointer stored here should usually have been run through
     * BlessTupleDesc() first.
     */
    TupleDesc tuple_desc;

} FuncCallContext;

使用基础架构的 SRF 要使用的宏包括:

The macros to be used by an SRF using this infrastructure are:

SRF_IS_FIRSTCALL()

用此确定是首次还是后续时间调用您的函数。在首次调用(仅首次)中调用:

Use this to determine if your function is being called for the first or a subsequent time. On the first call (only), call:

SRF_FIRSTCALL_INIT()

以初始化 FuncCallContext。在每次函数调用中,包括首次,调用:

to initialize the FuncCallContext. On every function call, including the first, call:

SRF_PERCALL_SETUP()

以设置以使用 FuncCallContext

to set up for using the FuncCallContext.

如果您的函数在当前调用中有要返回的数据,请使用:

If your function has data to return in the current call, use:

SRF_RETURN_NEXT(funcctx, result)

将其返回给调用方。(result 必须是 Datum 类型,是单个值或如上所述准备好的元组。)最后,当您的函数完成返回数据时,请使用:

to return it to the caller. (result must be of type Datum, either a single value or a tuple prepared as described above.) Finally, when your function is finished returning data, use:

SRF_RETURN_DONE(funcctx)

清理结束 SRF。

to clean up and end the SRF.

SRF 被调用时当前的内存上下文是将在调用之间清除的瞬时上下文。这意味着您不必对使用 palloc 分配的所有内容调用 pfree;不管怎样它都会消失。但是,如果您想分配任何数据结构在调用中使用,则需要将它们放在其他地方。由 multi_call_memory_ctx 引用的内存上下文是任何需要保留直到 SRF 完成运行的数据的合适位置。在大多数情况下,这意味着您应该在进行首次调用设置时切换至 multi_call_memory_ctx。使用 funcctx→user_fctx 持有对任何此类跨调用数据结构的指针。(在 multi_call_memory_ctx 中分配的数据将在查询结束时自动消失,因此也不必手动释放该数据。)

The memory context that is current when the SRF is called is a transient context that will be cleared between calls. This means that you do not need to call pfree on everything you allocated using palloc; it will go away anyway. However, if you want to allocate any data structures to live across calls, you need to put them somewhere else. The memory context referenced by multi_call_memory_ctx is a suitable location for any data that needs to survive until the SRF is finished running. In most cases, this means that you should switch into multi_call_memory_ctx while doing the first-call setup. Use funcctx→user_fctx to hold a pointer to any such cross-call data structures. (Data you allocate in multi_call_memory_ctx will go away automatically when the query ends, so it is not necessary to free that data manually, either.)

Warning

虽然函数的实际参数在每次调用之间保持不变,但如果表取消了参数值(通常由 PG_GETARG_xxx macro) in the transient context then the detoasted copies will be freed on each cycle. Accordingly, if you keep references to such values in your user_fctx 以透明方式完成),则必须在表取消之后将它们复制到 multi_call_memory_ctx 中,或确保仅在此上下文中表取消值。

While the actual arguments to the function remain unchanged between calls, if you detoast the argument values (which is normally done transparently by the PG_GETARGxxx macro) in the transient context then the detoasted copies will be freed on each cycle. Accordingly, if you keep references to such values in your _user_fctx, you must either copy them into the multi_call_memory_ctx after detoasting, or ensure that you detoast the values only in that context.

一个完整的伪代码示例如下所示:

A complete pseudo-code example looks like the following:

Datum
my_set_returning_function(PG_FUNCTION_ARGS)
{
    FuncCallContext  *funcctx;
    Datum             result;
    further declarations as needed

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;

        funcctx = SRF_FIRSTCALL_INIT();
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        /* One-time setup code appears here: */
        user code
        if returning composite
            build TupleDesc, and perhaps AttInMetadata
        endif returning composite
        user code
        MemoryContextSwitchTo(oldcontext);
    }

    /* Each-time setup code appears here: */
    user code
    funcctx = SRF_PERCALL_SETUP();
    user code

    /* this is just one way we might test whether we are done: */
    if (funcctx->call_cntr < funcctx->max_calls)
    {
        /* Here we want to return another item: */
        user code
        obtain result Datum
        SRF_RETURN_NEXT(funcctx, result);
    }
    else
    {
        /* Here we are done returning items, so just report that fact. */
        /* (Resist the temptation to put cleanup code here.) */
        SRF_RETURN_DONE(funcctx);
    }
}

返回复合类型的简单 SRF 的完整示例如下所示:

A complete example of a simple SRF returning a composite type looks like:

PG_FUNCTION_INFO_V1(retcomposite);

Datum
retcomposite(PG_FUNCTION_ARGS)
{
    FuncCallContext     *funcctx;
    int                  call_cntr;
    int                  max_calls;
    TupleDesc            tupdesc;
    AttInMetadata       *attinmeta;

    /* stuff done only on the first call of the function */
    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext   oldcontext;

        /* create a function context for cross-call persistence */
        funcctx = SRF_FIRSTCALL_INIT();

        /* switch to memory context appropriate for multiple function calls */
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        /* total number of tuples to be returned */
        funcctx->max_calls = PG_GETARG_INT32(0);

        /* Build a tuple descriptor for our result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("function returning record called in context "
                            "that cannot accept type record")));

        /*
         * generate attribute metadata needed later to produce tuples from raw
         * C strings
         */
        attinmeta = TupleDescGetAttInMetadata(tupdesc);
        funcctx->attinmeta = attinmeta;

        MemoryContextSwitchTo(oldcontext);
    }

    /* stuff done on every call of the function */
    funcctx = SRF_PERCALL_SETUP();

    call_cntr = funcctx->call_cntr;
    max_calls = funcctx->max_calls;
    attinmeta = funcctx->attinmeta;

    if (call_cntr < max_calls)    /* do when there is more left to send */
    {
        char       **values;
        HeapTuple    tuple;
        Datum        result;

        /*
         * Prepare a values array for building the returned tuple.
         * This should be an array of C strings which will
         * be processed later by the type input functions.
         */
        values = (char **) palloc(3 * sizeof(char *));
        values[0] = (char *) palloc(16 * sizeof(char));
        values[1] = (char *) palloc(16 * sizeof(char));
        values[2] = (char *) palloc(16 * sizeof(char));

        snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
        snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
        snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));

        /* build a tuple */
        tuple = BuildTupleFromCStrings(attinmeta, values);

        /* make the tuple into a datum */
        result = HeapTupleGetDatum(tuple);

        /* clean up (this is not really necessary) */
        pfree(values[0]);
        pfree(values[1]);
        pfree(values[2]);
        pfree(values);

        SRF_RETURN_NEXT(funcctx, result);
    }
    else    /* do when there is no more left */
    {
        SRF_RETURN_DONE(funcctx);
    }
}

在 SQL 中声明此函数的一种方法是:

One way to declare this function in SQL is:

CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer);

CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
    RETURNS SETOF __retcomposite
    AS 'filename', 'retcomposite'
    LANGUAGE C IMMUTABLE STRICT;

另一种方法是使用 OUT 参数:

A different way is to use OUT parameters:

CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer,
    OUT f1 integer, OUT f2 integer, OUT f3 integer)
    RETURNS SETOF record
    AS 'filename', 'retcomposite'
    LANGUAGE C IMMUTABLE STRICT;

请注意,在此方法中函数的输出类型正式为匿名 record 类型。

Notice that in this method the output type of the function is formally an anonymous record type.

38.10.9. Polymorphic Arguments and Return Types #

可以声明 C 语言函数接受并返回 Section 38.2.5中描述的多态类型。当某个函数的参数或返回类型被定义为多态类型时,函数作者无法预先知道它将调用何种数据类型,或需要返回何种数据类型。fmgr.h_中提供了两个例程,以允许版本 1 C 函数发现其参数的实际数据类型和它期望返回的类型。这些例程称为 _get_fn_expr_rettype(FmgrInfo *flinfo)_和 _get_fn_expr_argtype(FmgrInfo *flinfo, int argnum)。它们返回结果或参数类型 OID,如果信息不可用则返回 InvalidOid。结构 flinfo_通常作为 _fcinfo→flinfo_访问。参数 _argnum_以零为基。_get_call_result_type_也可以作为 _get_fn_expr_rettype_的替代品。同时还有 _get_fn_expr_variadic,它可以用来找出变长参数是否已合并到一个数组中。对于 _VARIADIC "any"_函数,这非常有用,因为对于采用普通数组类型的变长函数,总是会发生这种合并。

C-language functions can be declared to accept and return the polymorphic types described in Section 38.2.5. When a function’s arguments or return types are defined as polymorphic types, the function author cannot know in advance what data type it will be called with, or need to return. There are two routines provided in fmgr.h to allow a version-1 C function to discover the actual data types of its arguments and the type it is expected to return. The routines are called get_fn_expr_rettype(FmgrInfo *flinfo) and get_fn_expr_argtype(FmgrInfo *flinfo, int argnum). They return the result or argument type OID, or InvalidOid if the information is not available. The structure flinfo is normally accessed as fcinfo→flinfo. The parameter argnum is zero based. get_call_result_type can also be used as an alternative to get_fn_expr_rettype. There is also get_fn_expr_variadic, which can be used to find out whether variadic arguments have been merged into an array. This is primarily useful for VARIADIC "any" functions, since such merging will always have occurred for variadic functions taking ordinary array types.

例如,假设我们要编写一个函数来接受任何类型的单个元素,并返回该类型的单维数组:

For example, suppose we want to write a function to accept a single element of any type, and return a one-dimensional array of that type:

PG_FUNCTION_INFO_V1(make_array);
Datum
make_array(PG_FUNCTION_ARGS)
{
    ArrayType  *result;
    Oid         element_type = get_fn_expr_argtype(fcinfo->flinfo, 0);
    Datum       element;
    bool        isnull;
    int16       typlen;
    bool        typbyval;
    char        typalign;
    int         ndims;
    int         dims[MAXDIM];
    int         lbs[MAXDIM];

    if (!OidIsValid(element_type))
        elog(ERROR, "could not determine data type of input");

    /* get the provided element, being careful in case it's NULL */
    isnull = PG_ARGISNULL(0);
    if (isnull)
        element = (Datum) 0;
    else
        element = PG_GETARG_DATUM(0);

    /* we have one dimension */
    ndims = 1;
    /* and one element */
    dims[0] = 1;
    /* and lower bound is 1 */
    lbs[0] = 1;

    /* get required info about the element type */
    get_typlenbyvalalign(element_type, &typlen, &typbyval, &typalign);

    /* now build the array */
    result = construct_md_array(&element, &isnull, ndims, dims, lbs,
                                element_type, typlen, typbyval, typalign);

    PG_RETURN_ARRAYTYPE_P(result);
}

以下命令在 SQL 中声明了函数 make_array

The following command declares the function make_array in SQL:

CREATE FUNCTION make_array(anyelement) RETURNS anyarray
    AS 'DIRECTORY/funcs', 'make_array'
    LANGUAGE C IMMUTABLE;

只针对 C 语言函数存在多态性的一种变体:可以将它们声明为接受 "any" 类型的参数。(请注意,此类型名称必须加双引号,因为它也是 SQL 保留字。)这与 anyelement 的工作方式类似,只不过它不会将不同的 "any" 参数限定为相同类型,也不会帮助确定函数的结果类型。C 语言函数还可以将最终参数声明为 VARIADIC "any"。这将匹配一种或多种任何类型的实际参数(不一定相同类型)。这些参数将 not 被收集到数组中,就像使用普通变长函数一样;它们只会被分别传递给函数。当使用此功能时,必须使用 PG_NARGS() 宏和上面描述的方法来确定实际参数的数量及其类型。此外,此类函数的用户可能希望在其函数调用中使用 VARIADIC 关键字,并期望函数将数组元素作为单独的参数进行处理。如果需要,函数本身必须在使用 get_fn_expr_variadic 检测到实际参数已用 VARIADIC 标记后,实现该行为。

There is a variant of polymorphism that is only available to C-language functions: they can be declared to take parameters of type "any". (Note that this type name must be double-quoted, since it’s also an SQL reserved word.) This works like anyelement except that it does not constrain different "any" arguments to be the same type, nor do they help determine the function’s result type. A C-language function can also declare its final parameter to be VARIADIC "any". This will match one or more actual arguments of any type (not necessarily the same type). These arguments will not be gathered into an array as happens with normal variadic functions; they will just be passed to the function separately. The PG_NARGS() macro and the methods described above must be used to determine the number of actual arguments and their types when using this feature. Also, users of such a function might wish to use the VARIADIC keyword in their function call, with the expectation that the function would treat the array elements as separate arguments. The function itself must implement that behavior if wanted, after using get_fn_expr_variadic to detect that the actual argument was marked with VARIADIC.

38.10.10. Shared Memory and LWLocks #

在服务器启动时,加载项可以保留 LWLock 和共享内存分配。必须在 shared_preload_libraries中指定加载项的共享库才能加载它。共享库应在其 _PG_init_函数中注册一个 _shmem_request_hook。此 _shmem_request_hook_可以保留 LWLock 或共享内存。通过调用 _pg_ReserveLWLock 来保留 LWLock,通过调用:

Add-ins can reserve LWLocks and an allocation of shared memory on server startup. The add-in’s shared library must be preloaded by specifying it in shared_preload_libraries. The shared library should register a shmem_request_hook in its _PG_init function. This shmem_request_hook can reserve LWLocks or shared memory. Shared memory is reserved by calling:

void RequestAddinShmemSpace(int size)

从您的 shmem_request_hook 中。

from your shmem_request_hook.

通过调用来保留 LWLock:

LWLocks are reserved by calling:

void RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)

从您的 shmem_request_hook 中。这将确保一个 num_lwlocks LWLock 的数组在名称 tranche_name 下可用。使用 GetNamedLWLockTranche 获取指向此数组的指针。

from your shmem_request_hook. This will ensure that an array of num_lwlocks LWLocks is available under the name tranche_name. Use GetNamedLWLockTranche to get a pointer to this array.

您可以在 PostgreSQL 源代码树的 contrib/pg_stat_statements/pg_stat_statements.c 中找到 shmem_request_hook 的示例。

An example of a shmem_request_hook can be found in contrib/pg_stat_statements/pg_stat_statements.c in the PostgreSQL source tree.

为了避免可能的竞争条件,每个后端使用 LWLock AddinShmemInitLock 连接到和初始化它自己共享内存的分配,如下所示:

To avoid possible race-conditions, each backend should use the LWLock AddinShmemInitLock when connecting to and initializing its allocation of shared memory, as shown here:

static mystruct *ptr = NULL;

if (!ptr)
{
        bool    found;

        LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
        ptr = ShmemInitStruct("my struct name", size, &found);
        if (!found)
        {
                initialize contents of shmem area;
                acquire any requested LWLocks using:
                ptr->locks = GetNamedLWLockTranche("my tranche name");
        }
        LWLockRelease(AddinShmemInitLock);
}

38.10.11. Using C++ for Extensibility #

虽然 PostgreSQL 后端是用 C 编写的,但是只要遵循这些准则,就可以用 C++ 来编写扩展:

Although the PostgreSQL backend is written in C, it is possible to write extensions in C++ if these guidelines are followed:

总而言之,最好将 C++ 代码放在与后端接口的 extern C 函数的墙后面,并且避免异常、内存和调用堆栈泄漏。

In summary, it is best to place C++ code behind a wall of extern C functions that interface to the backend, and avoid exception, memory, and call stack leakage.