Postgresql 中文操作指南

4.1. Lexical Structure #

SQL 输入由 _commands_序列组成。命令由 _tokens_序列组成,以分号 (“;”) 终止。输入流的结尾也终止命令。哪些标记是有效的取决于特定命令的语法。

SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (“;”). The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command.

标记可以是 key wordidentifierquoted identifierliteral(或常量)或特殊字符符号。标记通常由空白(空格、制表符、换行符)分隔,但如果不存在歧义,则不必如此(通常仅在特殊字符与其他标记类型相邻时才会出现这种情况)。

A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type).

例如,以下为(语法上)有效的 SQL 输入:

For example, the following is (syntactically) valid SQL input:

SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');

这是一系列三个命令,每行一个(虽然这不是必需的;一行可以有多个命令,并且命令可以按行拆分)。

This is a sequence of three commands, one per line (although this is not required; more than one command can be on a line, and commands can usefully be split across lines).

此外,_comments_可以在 SQL 输入中出现。它们不是标记,它们实际上等同于空白。

Additionally, comments can occur in SQL input. They are not tokens, they are effectively equivalent to whitespace.

在哪些标记标识命令,哪些是操作数或参数方面,SQL 语法并不十分一致。前几个标记通常是命令名称,因此,在以上示例中,我们通常会讨论“SELECT”、“UPDATE”和“INSERT”命令。但例如, _UPDATE_命令总是需要 _SET_标记出现在特定位置,而 _INSERT_的这个特定变体还需要 _VALUES_才能完整。每个命令的精确语法规则在 Part VI中进行说明。

The SQL syntax is not very consistent regarding what tokens identify commands and which are operands or parameters. The first few tokens are generally the command name, so in the above example we would usually speak of a “SELECT”, an “UPDATE”, and an “INSERT” command. But for instance the UPDATE command always requires a SET token to appear in a certain position, and this particular variation of INSERT also requires a VALUES in order to be complete. The precise syntax rules for each command are described in Part VI.

4.1.1. Identifiers and Key Words #

以上示例中的 SELECT、_UPDATE_或 _VALUES_等标记是 _key words_的示例,即在 SQL 语言中有固定含义的词。标记 _MY_TABLE_和 _A_是 _identifiers_的示例。它们标识表名、列名或其他数据库对象,具体取决于它们在其中使用的命令。因此,它们有时干脆称为“名称”。关键词和标识符具有相同的词法结构,这意味着人们无法知道一个标记是标识符还是不了解语言的关键词。可以在 Appendix C中找到关键词的完整列表。

Tokens such as SELECT, UPDATE, or VALUES in the example above are examples of key words, that is, words that have a fixed meaning in the SQL language. The tokens MY_TABLE and A are examples of identifiers. They identify names of tables, columns, or other database objects, depending on the command they are used in. Therefore they are sometimes simply called “names”. Key words and identifiers have the same lexical structure, meaning that one cannot know whether a token is an identifier or a key word without knowing the language. A complete list of key words can be found in Appendix C.

SQL 标识符和关键字必须以字母(a-z,还有带重音符号的字母和非拉丁字母)或下划线(_)开头。标识符或关键字中的后续字符可以是字母、下划线、数字(0-9)或美元符号($)。请注意,根据 SQL 标准的措辞,美元符号不允许出现在标识符中,因此使用它们可能会使应用程序的可移植性降低。SQL 标准不会定义包含数字或以下划线开头或结尾的关键字,因此这种形式的标识符对标准的未来扩展可能存在冲突是安全的。

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.

系统不会使用标识符 NAMEDATALEN -1 个以上字节;较长的名称可以在命令中编写,但会进行截断。默认情况下, NAMEDATALEN 为 64,因此最大标识符长度为 63 个字节。如果此限制存在问题,可通过更改 src/include/pg_config_manual.h 中的 NAMEDATALEN 常量来增加限制。

The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.

关键字和未加引号的标识符不区分大小写。因此:

Key words and unquoted identifiers are case-insensitive. Therefore:

UPDATE MY_TABLE SET A = 5;

可以等效写为:

can equivalently be written as:

uPDaTE my_TabLE SeT a = 5;

一种经常使用的约定是以大写写关键字,以小写写名称,例如:

A convention often used is to write key words in upper case and names in lower case, e.g.:

UPDATE my_table SET a = 5;

还有第二种标识符: delimited identifierquoted identifier 。它是将任意字符序列用双引号 ( " ) 括起来形成的。有界标识符始终是标识符,而绝不是关键字。因此 "select" 可用于引用名为“select”的列或表,而未加引号的 select 将被视为关键字,因此在需要使用表或列名称的地方使用它会导致解析错误。该示例可以用带引号的标识符这样编写:

There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double-quotes ("). A delimited identifier is always an identifier, never a key word. So "select" could be used to refer to a column or table named “select”, whereas an unquoted select would be taken as a key word and would therefore provoke a parse error when used where a table or column name is expected. The example can be written with quoted identifiers like this:

UPDATE "my_table" SET "a" = 5;

带引号的标识符可以包含任何字符,除了代码为零的字符。(要包含双引号,请写两个双引号。)这允许构造原本不可能的表名或列名,例如包含空格或和号的表名或列名。长度限制仍然适用。

Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands. The length limitation still applies.

给标识符加上引号也会使其区分大小写,而未加引号的名称总是折叠为小写。例如,标识符 FOOfoo"foo" 被 PostgreSQL 视为相同,但 "Foo""FOO" 则与这三个标识符和彼此不同。(PostgreSQL 中将不带引号的名称折叠为小写这与 SQL 标准不兼容,该标准规定应将不带引号的名称折叠为大写。因此,foo 应等同于 "FOO",而不是 "foo",根据标准。如果您想编写可移植的应用程序,建议您始终对某个名称加上引号或从不加上引号。)

Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)

有变体的引号标识符允许在转义的 Unicode 字符前用其代码点对其进行标识。此变体从 U& 开始(大写或小写 U 加上和符号),注意空格,位于开头双引号之前,例如 U&"foo"。(请注意,这会与运算符 & 产生歧义。在运算符周围使用空格可避免此问题。)在引号内,Unicode 字符可通过写入反斜杠后跟四位十六进制代码点编号或反斜杠后跟加号后跟六位十六进制代码点编号的形式进行转义指定。例如,标识符 "data" 可写为

A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the identifier "data" could be written as

U&"d\0061t\+000061"

以下较不简单的示例会使用西里尔字母编写俄语单词“slon”(大象):

The following less trivial example writes the Russian word “slon” (elephant) in Cyrillic letters:

U&"\0441\043B\043E\043D"

如果需要反斜杠以外的不同转义字符,则可在字符串后使用 UESCAPE 子句指定,例如:

If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example:

U&"d!0061t!+000061" UESCAPE '!'

转义字符可以是除了十六进制数字、加号、单引号、双引号或空格字符以外的任何单一字符。请注意,在 UESCAPE 之后,转义字符使用单引号编写,而不是双引号。

The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character. Note that the escape character is written in single quotes, not double quotes, after UESCAPE.

要按照字面意思在标识符中包含转义字符,请写两次。

To include the escape character in the identifier literally, write it twice.

可以同时使用 4 位数或 6 位数转义形式来指定 UTF-16 代理对,以便组合代码点大于 U+FFFF 的字符,虽然在技术上 6 位数形式的可用性使之不必要。(不会直接存储代理对,而是将其组合为一个代码点。)

Either the 4-digit or the 6-digit escape form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate pairs are not stored directly, but are combined into a single code point.)

如果服务器编码不是 UTF-8,则这些转义序列之一标识的 Unicode 代码点将被转换为实际的服务器编码;如果无法转换,则会报告错误。

If the server encoding is not UTF-8, the Unicode code point identified by one of these escape sequences is converted to the actual server encoding; an error is reported if that’s not possible.

4.1.2. Constants #

PostgreSQL 中有三种 implicitly-typed constants:字符串、位字符串和数字。常量也可以使用明确的类型指定,这可以使系统对表示进行更准确的处理和更有效的操作。以下小节将讨论这些替代方法。

There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers. Constants can also be specified with explicit types, which can enable more accurate representation and more efficient handling by the system. These alternatives are discussed in the following subsections.

4.1.2.1. String Constants #

SQL 中的字符串常量是以单引号 ( ' ) 为界任意字符序列,例如 'This is a string' 。要包含在字符串常量中,请编写两个相邻的单引号,例如 'Dianne''s horse' 。请注意,它 not 等于双引号字符 ( " )。

A string constant in SQL is an arbitrary sequence of characters bounded by single quotes ('), for example 'This is a string'. To include a single-quote character within a string constant, write two adjacent single quotes, e.g., 'Dianne''s horse'. Note that this is not the same as a double-quote character (").

仅由空格 with at least one newline 分隔的两个字符串常量被连接起来,并有效地被视为一个常量编写的字符串。例如:

Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant. For example:

SELECT 'foo'
'bar';

等效于:

is equivalent to:

SELECT 'foobar';

但:

but:

SELECT 'foo'      'bar';

不是有效的语法。(此有点奇怪的行为是由 SQL 指定的;PostgreSQL 遵从此标准。)

is not valid syntax. (This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.)

4.1.2.2. String Constants with C-Style Escapes #

PostgreSQL 还接受“转义”字符串常量,它是对 SQL 标准的扩展。通过在开头的单引号前编写字母 E(大写或小写)来指定转义字符串常量,例如,E’foo'。(当跨行继续转义字符串常量时,仅在第一个开头的引号前编写 E。)在转义字符串中,反斜杠字符 (\) 开始类似于 C 的 _backslash escape_序列,其中反斜杠和以下字符的组合表示特殊字节值,如 Table 4.1中所示。

PostgreSQL also accepts “escape” string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g., E’foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represent a special byte value, as shown in Table 4.1.

Table 4.1. Backslash Escape Sequences

Backslash Escape Sequence

Interpretation

\b

backspace

\f

form feed

\n

newline

\r

carriage return

\t

tab

o, oo, ooo (o = 0–7)

octal byte value

\x_h, _\x_hh (_h = 0–9, A–F)

hexadecimal byte value

\u_xxxx, _\U_xxxxxxxx (_x = 0–9, A–F)

16 or 32-bit hexadecimal Unicode character value

反斜杠后的任何其他字符按照字面意义理解。因此,要包含反斜杠字符,请写两个反斜杠 (\\)。此外,除了 '' 的正常方式,还可以通过编写 \' 在转义字符串中包含单引号。

Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''.

您有责任创建的字节序列(尤其是在使用八进制或十六进制转义时)在服务器字符集编码中构成有效字符。一个有用的替代方法是使用 Unicode 转义或 Section 4.1.2.3中说明的替代 Unicode 转义语法;然后,服务器将检查是否可以执行字符转换。

It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding. A useful alternative is to use Unicode escapes or the alternative Unicode escape syntax, explained in Section 4.1.2.3; then the server will check that the character conversion is possible.

Caution

如果配置参数 standard_conforming_stringsoff,则 PostgreSQL 将在常规和转义字符串常量中识别反斜杠转义。然而,在 PostgreSQL 9.1 中,默认值为 on,这意味着仅在转义字符串常量中识别反斜杠转义。这种行为更符合标准,但可能会破坏依赖于历史行为的应用程序,其中总是识别反斜杠转义。作为一种解决方法,您可以将此参数设置为 off,但最好迁移到不使用反斜杠转义。如果您需要使用反斜杠转义来表示特殊字符,则使用 _E_编写字符串常量。

If the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants. This behavior is more standards-compliant, but might break applications which rely on the historical behavior, where backslash escapes were always recognized. As a workaround, you can set this parameter to off, but it is better to migrate away from using backslash escapes. If you need to use a backslash escape to represent a special character, write the string constant with an E.

除了 standard_conforming_strings,配置参数 escape_string_warningbackslash_quote还管理字符串常量中的反斜杠处理。

In addition to standard_conforming_strings, the configuration parameters escape_string_warning and backslash_quote govern treatment of backslashes in string constants.

代码为零的字符不能在字符串常量中。

The character with the code zero cannot be in a string constant.

4.1.2.3. String Constants with Unicode Escapes #

PostgreSQL 还支持另一种转义语法,允许通过代码点指定任意 Unicode 字符。Unicode 转义字符串常量从 U& 开始(大写或小写字母 U 加上和符号),注意空格,位于开头引号之前,例如 U&'foo'。(请注意,这会与运算符 & 产生歧义。在运算符周围使用空格可避免此问题。)在引号内,Unicode 字符可通过写入反斜杠后跟四位十六进制代码点编号或反斜杠后跟加号后跟六位十六进制代码点编号的形式进行转义指定。例如,字符串 'data' 可以写为

PostgreSQL also supports another type of escape syntax for strings that allows specifying arbitrary Unicode characters by code point. A Unicode escape string constant starts with U& (upper or lower case letter U followed by ampersand) immediately before the opening quote, without any spaces in between, for example U&'foo'. (Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number. For example, the string 'data' could be written as

U&'d\0061t\+000061'

以下较不简单的示例会使用西里尔字母编写俄语单词“slon”(大象):

The following less trivial example writes the Russian word “slon” (elephant) in Cyrillic letters:

U&'\0441\043B\043E\043D'

如果需要反斜杠以外的不同转义字符,则可在字符串后使用 UESCAPE 子句指定,例如:

If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example:

U&'d!0061t!+000061' UESCAPE '!'

转义字符可以是任何单个字符,但不能是十六进制数字、加号、单引号、双引号或空格字符。

The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character.

要将转义字符直接包含在字符串中,请写两次该字符。

To include the escape character in the string literally, write it twice.

可以同时使用 4 位数或 6 位数转义形式来指定 UTF-16 代理对,以便组合代码点大于 U+FFFF 的字符,虽然在技术上 6 位数形式的可用性使之不必要。(不会直接存储代理对,而是将其组合为一个代码点。)

Either the 4-digit or the 6-digit escape form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate pairs are not stored directly, but are combined into a single code point.)

如果服务器编码不是 UTF-8,则这些转义序列之一标识的 Unicode 代码点将被转换为实际的服务器编码;如果无法转换,则会报告错误。

If the server encoding is not UTF-8, the Unicode code point identified by one of these escape sequences is converted to the actual server encoding; an error is reported if that’s not possible.

此外,只有在打开配置参数 standard_conforming_strings时,字符串常量的 Unicode 转义语法才能正常工作。这是因为,否则,该语法可能会混淆客户端,后者将 SQL 语句解析到可能导致 SQL 注入和类似安全问题的地步。如果该参数设置为关闭,则该语法将被拒绝并显示错误消息。

Also, the Unicode escape syntax for string constants only works when the configuration parameter standard_conforming_strings is turned on. This is because otherwise this syntax could confuse clients that parse the SQL statements to the point that it could lead to SQL injections and similar security issues. If the parameter is set to off, this syntax will be rejected with an error message.

4.1.2.4. Dollar-Quoted String Constants #

虽然用于指定字符串常量的标准语法通常很方便,但当所需的字符串包含许多单引号时可能难以理解,因为必须对每个单引号进行加倍。为了在这种情况下允许更具可读性的查询,PostgreSQL 提供了另一种称为“美元引用”的方式来编写字符串常量。用美元引用的字符串常量由一个美元符号 ($)、一个零个或多个字符的可选“标签”、另一个美元符号、构成字符串内容的任意字符序列、一个美元符号、与开始这个美元引用的相同的标签以及一个美元符号组成。例如,以下是用美元引用指定字符串“Dianne’s horse”的两种不同方式:

While the standard syntax for specifying string constants is usually convenient, it can be difficult to understand when the desired string contains many single quotes, since each of those must be doubled. To allow more readable queries in such situations, PostgreSQL provides another way, called “dollar quoting”, to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. For example, here are two different ways to specify the string “Dianne’s horse” using dollar quoting:

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

请注意,在用美元引用的字符串内部,可以不转义就使用单引号。实际上,用美元引用的字符串内部的任何字符都不会被转义:字符串内容总是直接写入。反斜杠不是特殊的,美元符号也不是,除非它们是与开始标签匹配的序列的一部分。

Notice that inside the dollar-quoted string, single quotes can be used without needing to be escaped. Indeed, no characters inside a dollar-quoted string are ever escaped: the string content is always written literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence matching the opening tag.

可以通过在每个嵌套级别选择不同的标签来嵌套用美元引用的字符串常量。这最常用于编写函数定义。例如:

It is possible to nest dollar-quoted string constants by choosing different tags at each nesting level. This is most commonly used in writing function definitions. For example:

$function$
BEGIN
    RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$

在此,序列 $q$[\t\r\n\v\\]$q$_表示以美元符号引用的文本字符串 _[\t\r\n\v\\],该字符串将在 PostgreSQL 执行函数主体时被识别。但是,由于序列不匹配外部美元引用限定符 $function$,所以对于外部字符串而言,它只是常量中的一些其他字符。

Here, the sequence $q$[\t\r\n\v\\]$q$ represents a dollar-quoted literal string [\t\r\n\v\\], which will be recognized when the function body is executed by PostgreSQL. But since the sequence does not match the outer dollar quoting delimiter $function$, it is just some more characters within the constant so far as the outer string is concerned.

用美元引用的字符串的标签(如果有)遵循与未引用的标识符相同的规则,不同之处在于它不能包含美元符号。标签区分大小写,因此 $tag$String content$tag$ 是正确的,但 $TAG$String content$tag$ 是不正确的。

The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it cannot contain a dollar sign. Tags are case sensitive, so $tag$String content$tag$ is correct, but $TAG$String content$tag$ is not.

紧跟在关键字或标识符之后的用美元引用的字符串必须用空格与之分隔;否则,美元引用定界符将被视为前一个标识符的一部分。

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace; otherwise the dollar quoting delimiter would be taken as part of the preceding identifier.

美元引用不是 SQL 标准的一部分,但它通常是编写复杂的字符串字面值(比符合标准的单引号语法)更方便的方法。它在过程函数定义中经常需要在其他常量中表示字符串常量时特别有用。使用单引号语法,在上述示例中,每个反斜杠都必须写成四个反斜杠,在解析原始字符串常量时将缩减为两个反斜杠,然后在函数执行期间重新解析内部字符串常量时缩减为一个反斜杠。

Dollar quoting is not part of the SQL standard, but it is often a more convenient way to write complicated string literals than the standard-compliant single quote syntax. It is particularly useful when representing string constants inside other constants, as is often needed in procedural function definitions. With single-quote syntax, each backslash in the above example would have to be written as four backslashes, which would be reduced to two backslashes in parsing the original string constant, and then to one when the inner string constant is re-parsed during function execution.

4.1.2.5. Bit-String Constants #

位字符串常量看起来像在开始引号之前立即带有 B(大写或小写)的常规字符串常量(没有中间空格),例如 B'1001'。位字符串常量中只允许使用字符 01

Bit-string constants look like regular string constants with a B (upper or lower case) immediately before the opening quote (no intervening whitespace), e.g., B'1001'. The only characters allowed within bit-string constants are 0 and 1.

或者,可以使用前导 X(大写或小写)以十六进制记法指定位字符串常量,例如 X'1FF'。此表示法等效于每个十六进制数字有四个二进制位的位字符串常量。

Alternatively, bit-string constants can be specified in hexadecimal notation, using a leading X (upper or lower case), e.g., X'1FF'. This notation is equivalent to a bit-string constant with four binary digits for each hexadecimal digit.

两种形式的位字符串常量都可以像常规字符串常量一样续行。美元引用不能用于位字符串常量。

Both forms of bit-string constant can be continued across lines in the same way as regular string constants. Dollar quoting cannot be used in a bit-string constant.

4.1.2.6. Numeric Constants #

数值常量以以下通用形式接受:

Numeric constants are accepted in these general forms:

digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits

其中 digits 是一个或多个十进制数字(0 到 9)。如果使用了小数点,则小数点前后至少必须有一个数字。如果存在指数标记(e),则指数标记后面至少必须有一个数字。常量中不能包含任何空格或其他字符,但可以使用下划线,如下面所述进行可视分组。请注意,任何前导加号或减号实际上都不被视为常量的一部分;它是一个应用于常量的运算符。

where digits is one or more decimal digits (0 through 9). At least one digit must be before or after the decimal point, if one is used. At least one digit must follow the exponent marker (e), if one is present. There cannot be any spaces or other characters embedded in the constant, except for underscores, which can be used for visual grouping as described below. Note that any leading plus or minus sign is not actually considered part of the constant; it is an operator applied to the constant.

以下是一些有效的数值常量的示例:

These are some examples of valid numeric constants:

42 3.5 4. .001 5e2 1.925e-3

42 3.5 4. .001 5e2 1.925e-3

此外,非十进制整数常量以以下形式接受:

Additionally, non-decimal integer constants are accepted in these forms:

0xhexdigits
0ooctdigits
0bbindigits

其中 hexdigits 是一个或多个十六进制数字(0-9、A-F),octdigits 是一个或多个八进制数字(0-7),bindigits 是一个或多个二进制数字(0 或 1)。十六进制数字和基数前缀可以是大写或小写。请注意,只有整数可以有非十进制形式,而不能有分数部分的数字。

where hexdigits is one or more hexadecimal digits (0-9, A-F), octdigits is one or more octal digits (0-7), and bindigits is one or more binary digits (0 or 1). Hexadecimal digits and the radix prefixes can be in upper or lower case. Note that only integers can have non-decimal forms, not numbers with fractional parts.

以下是一些有效的非十进制整数常量的示例:

These are some examples of valid non-decimal integer constants:

0b100101 0B10011001 0o273 0O755 0x42f 0XFFFF

0b100101 0B10011001 0o273 0O755 0x42f 0XFFFF

对于可视化分组,可在数字之间插入下划线。它们对常量的值没有进一步影响。例如:

For visual grouping, underscores can be inserted between digits. These have no further effect on the value of the constant. For example:

1_500_000_000 0b10001000_00000000 0o_1_755 0xFFFF_FFFF 1.618_034

1_500_000_000 0b10001000_00000000 0o_1_755 0xFFFF_FFFF 1.618_034

下划线不允许出现在数字常量或数字组的开头或结尾(即小数点或指数标记之前或之后),并且不允许多个连续下划线。

Underscores are not allowed at the start or end of a numeric constant or a group of digits (that is, immediately before or after the decimal point or the exponent marker), and more than one underscore in a row is not allowed.

不包含小数点和指数的数字常量最初假定为类型 integer ,如果其值适合类型 integer (32 位);否则,假定类型为 bigint ,如果其值适合类型 bigint (64 位);否则,它被认为是类型 numeric 。包含小数点和/或指数的常量最初总是假定为类型 numeric

A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal points and/or exponents are always initially presumed to be type numeric.

最初分配给数字常量的 data (数据)类型只是类型解析算法的起点。大多数情况下,常量将根据上下文自动强制转换为最合适的类型。在必要时,你可以通过强制转换来强制将数字值解释为特定的数据类型。例如,你可以通过编写以下内容强制将数字值视为类型 real (float4):

The initially assigned data type of a numeric constant is just a starting point for the type resolution algorithms. In most cases the constant will be automatically coerced to the most appropriate type depending on context. When necessary, you can force a numeric value to be interpreted as a specific data type by casting it. For example, you can force a numeric value to be treated as type real (float4) by writing:

REAL '1.23'  -- string style
1.23::REAL   -- PostgreSQL (historical) style

实际上,这些只是接下来讨论的一般强制转换符号的特殊情况。

These are actually just special cases of the general casting notations discussed next.

4.1.2.7. Constants of Other Types #

arbitrary 类型的一个常量可以使用以下任何一种符号输入:

A constant of an arbitrary type can be entered using any one of the following notations:

type 'string'
'string'::type
CAST ( 'string' AS type )

字符串常量的文本传递给称为 type 的类型的输入转换例程。结果是指示类型的一个常量。如果对常量必须是什么类型没有歧义(例如,当它直接分配给表格列时),则可以省略显式类型转换,在这种情况下,它会自动强制转换为该类型。

The string constant’s text is passed to the input conversion routine for the type called type. The result is a constant of the indicated type. The explicit type cast can be omitted if there is no ambiguity as to the type the constant must be (for example, when it is assigned directly to a table column), in which case it is automatically coerced.

可以使用常规的 SQL 符号或美元引用来编写字符串常量。

The string constant can be written using either regular SQL notation or dollar-quoting.

还可以使用函数式的语法指定类型强制转换:

It is also possible to specify a type coercion using a function-like syntax:

typename ( 'string' )

但并非所有类型名称都可以这样使用;有关详细信息,请参见 Section 4.2.9

but not all type names can be used in this way; see Section 4.2.9 for details.

也可以使用 ::CAST() 和函数调用语法来指定任意表达式的运行时类型转换,如 Section 4.2.9 中所讨论的。为避免语法歧义, type 'string'_ syntax can only be used to specify the type of a simple literal constant. Another restriction on the _ type_ ' string '_ 语法是它对数组类型不起作用;使用 ::CAST() 来指定数组常量的类型。

The ::, CAST(), and function-call syntaxes can also be used to specify run-time type conversions of arbitrary expressions, as discussed in Section 4.2.9. To avoid syntactic ambiguity, the type_ 'string'_ syntax can only be used to specify the type of a simple literal constant. Another restriction on the type_ 'string'_ syntax is that it does not work for array types; use :: or CAST() to specify the type of an array constant.

CAST() 语法符合 SQL。 _type ' string '_ 语法是对该标准的概括:SQL 仅为少数数据类型指定此语法,但 PostgreSQL 允许所有类型都使用此语法。带 :: 的语法是 PostgreSQL 的历史用法,函数调用语法也是如此。

The CAST() syntax conforms to SQL. The _type 'string'_ syntax is a generalization of the standard: SQL specifies this syntax only for a few data types, but PostgreSQL allows it for all types. The syntax with :: is historical PostgreSQL usage, as is the function-call syntax.

4.1.3. Operators #

运算符名称最多由以下列表中的 NAMEDATALEN-1(默认情况下为 63)个字符序列组成:

An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:

+ - * / < > = ~ ! @ # % ^ & | ` ?

+ - * / < > = ~ ! @ # % ^ & | ` ?

但是,对运算符名称有一些限制:

There are a few restrictions on operator names, however:

~ ! @ # % ^ & | ` ?

~ ! @ # % ^ & | ` ?

使用非 SQL 标准的运算符名称时,通常需要空格分隔相邻的运算符以避免歧义。例如,如果你定义了一个名为 @ 的前缀运算符,便不能编写 X*@Y;你必须编写 X* @Y 来确保 PostgreSQL 将其读成两个运算符名称,而不是一个。

When working with non-SQL-standard operator names, you will usually need to separate adjacent operators with spaces to avoid ambiguity. For example, if you have defined a prefix operator named @, you cannot write X*@Y; you must write X* @Y to ensure that PostgreSQL reads it as two operator names not one.

4.1.4. Special Characters #

一些非字母数字字符具有与作为运算符不同的特殊含义。可在描述相应语法元素的位置中找到用法详情。本部分仅用于通知这些字符的存在并总结这些字符的用途。

Some characters that are not alphanumeric have a special meaning that is different from being an operator. Details on the usage can be found at the location where the respective syntax element is described. This section only exists to advise the existence and summarize the purposes of these characters.

4.1.5. Comments #

注释是以双破折号开头并延伸到行尾的一系列字符,例如:

A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.:

-- This is a standard SQL comment

或者,可以使用 C 风格的块注释:

Alternatively, C-style block comments can be used:

/* multiline comment
 * with nesting: /* nested block comment */
 */

注释以 / and extends to the matching occurrence of / 开始。这些块注释会嵌套,如 SQL 标准中指定的那样,但与 C 不同,以便人们可以注释掉可能包含现有块注释的大块代码。

where the comment begins with / and extends to the matching occurrence of /. These block comments nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code that might contain existing block comments.

在进一步进行语法分析之前,注释将从输入流中删除,并有效地被空白替换。

A comment is removed from the input stream before further syntax analysis and is effectively replaced by whitespace.

4.1.6. Operator Precedence #

Table 4.2显示了 PostgreSQL 中运算符的优先级和结合性。大多数运算符具有相同的优先级且具有左结合性。运算符的优先级和结合性被固化到解析器中。如果您希望包含多个运算符的表达式以优先级规则暗示的方式以外的方式进行解析,请添加括号。

Table 4.2 shows the precedence and associativity of the operators in PostgreSQL. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired into the parser. Add parentheses if you want an expression with multiple operators to be parsed in some other way than what the precedence rules imply.

Table 4.2. Operator Precedence (highest to lowest)

Operator/Element

Associativity

Description

.

left

table/column name separator

::

left

PostgreSQL-style typecast

[ ]

left

array element selection

+ -

right

unary plus, unary minus

COLLATE

left

collation selection

AT

left

AT TIME ZONE

^

left

exponentiation

* / %

left

multiplication, division, modulo

+ -

left

addition, subtraction

(any other operator)

left

all other native and user-defined operators

BETWEEN IN LIKE ILIKE SIMILAR

range containment, set membership, string matching

< > = >= <>

comparison operators

IS ISNULL NOTNULL

IS TRUE, IS FALSE, IS NULL, IS DISTINCT FROM, etc.

NOT

right

logical negation

AND

left

logical conjunction

OR

left

logical disjunction

请注意,运算符优先级规则也适用于与上述内置运算符同名的用户定义运算符。例如,如果您为某个自定义数据类型定义了一个“”运算符,它将具有与内置““运算符相同的优先级,无论您的运算符做什么。

Note that the operator precedence rules also apply to user-defined operators that have the same names as the built-in operators mentioned above. For example, if you define a “” operator for some custom data type it will have the same precedence as the built-in “” operator, no matter what yours does.

当模式限定运算符名称用于 OPERATOR 语法时,例如在中:

When a schema-qualified operator name is used in the OPERATOR syntax, as for example in:

SELECT 3 OPERATOR(pg_catalog.+) 4;

对于“任何其他运算符”,_OPERATOR_构造取 Table 4.2中显示的默认优先级。无论 _OPERATOR()_内部出现哪种特定运算符,这都是正确的。

the OPERATOR construct is taken to have the default precedence shown in Table 4.2 for “any other operator”. This is true no matter which specific operator appears inside OPERATOR().

Note

9.5 版本之前的 PostgreSQL 使用略有不同的运算符优先级规则。尤其是, >=_和<>_过去被视为泛型运算符;_IS_测试过去具有更高的优先级;_NOT BETWEEN_和相关结构作用不一致,在某些情况下被认为具有 _NOT_而不是_BETWEEN_的优先级。这些规则已更改,以更好地符合 SQL 标准并减少因逻辑等价结构处理不一致而造成的混乱。在大多数情况下,这些更改不会导致行为更改,或者可能导致通过添加括号即可解决的“没有此类运算符”故障。但是,存在一些极端情况,其中查询可能会更改行为,而不会报告任何解析错误。

PostgreSQL versions before 9.5 used slightly different operator precedence rules. In particular, >= and <> used to be treated as generic operators; IS tests used to have higher priority; and NOT BETWEEN and related constructs acted inconsistently, being taken in some cases as having the precedence of NOT rather than BETWEEN. These rules were changed for better compliance with the SQL standard and to reduce confusion from inconsistent treatment of logically equivalent constructs. In most cases, these changes will result in no behavioral change, or perhaps in “no such operator” failures which can be resolved by adding parentheses. However there are corner cases in which a query might change behavior without any parsing error being reported.