Postgresql 中文操作指南

8.4. Binary Data Types #

bytea 数据类型允许存储二进制字符串;请参阅 Table 8.6.

The bytea data type allows storage of binary strings; see Table 8.6.

Table 8.6. Binary Data Types

Name

Storage Size

Description

bytea

1 or 4 bytes plus the actual binary string

variable-length binary string

二进制字符串是八位组(或字节)序列。二进制字符串以两种方式区别于字符字符串。首先,二进制字符串特别允许存储值零的八位组和其他“不可打印”的八位组(通常情况,是范围为 32 到 126 以外的八位组)。字符字符串不允许有零八位组,且同样不允许任何其他八位组值和根据数据库所选字符集编码无效的八位组值序列。其次,对二进制字符串的操作处理实际字节,而对字符字符串的处理取决于区域设置。简而言之,二进制字符串适合存储程序员认为是“原始字节”的数据,而字符字符串适合存储文本。

A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character strings in two ways. First, binary strings specifically allow storing octets of value zero and other “non-printable” octets (usually, octets outside the decimal range 32 to 126). Character strings disallow zero octets, and also disallow any other octet values and sequences of octet values that are invalid according to the database’s selected character set encoding. Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are appropriate for storing text.

bytea 类型支持两种输入和输出格式:“hex” 格式和 PostgreSQL 的历史“escape”格式。在输入中始终接受这两种格式。输出格式取决于配置参数 bytea_output;默认值为 hex。(请注意,hex 格式是在 PostgreSQL 9.0 中引入的;早期版本和某些工具不理解它。)

The bytea type supports two formats for input and output: “hex” format and PostgreSQL’s historical “escape” format. Both of these are always accepted on input. The output format depends on the configuration parameter bytea_output; the default is hex. (Note that the hex format was introduced in PostgreSQL 9.0; earlier versions and some tools don’t understand it.)

SQL 标准定义了不同的二进制字符串类型,称为 BLOBBINARY LARGE OBJECT。输入格式与 bytea 不同,但所提供的函数和运算符大多相同。

The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The input format is different from bytea, but the provided functions and operators are mostly the same.

8.4.1. bytea Hex Format #

“hex”格式将二进制数据编码为每字节 2 个十六进制数字,最显着的字节在前。整个字符串之前有序列 \x (以区别于逃逸格式)。在某些情况下,初始反斜杠可能需要通过加倍来转义(参见 Section 4.1.2.1)。对于输入,十六进制数字可以是大写或小写,并且在数字对之间允许空格(但在数字对内或在起始 \x 序列中不允许)。hex 格式与广泛的外部应用程序和协议兼容,并且它比转义格式更快,因此较好使用。

The “hex” format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it (see Section 4.1.2.1). For input, the hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.

示例:

Example:

SET bytea_output = 'hex';

SELECT '\xDEADBEEF'::bytea;
   bytea
------------
 \xdeadbeef

8.4.2. bytea Escape Format #

“转义”格式是 bytea 类型传统的 PostgreSQL 格式。它采用了一种将二进制字符串表示为一连串 ASCII 字符的方法,同时将那些无法表示为 ASCII 字符的字节转换成特殊的转义序列。如果从应用程序的角度来看,将字节表示为字符是有意义的,那么这种表示就可能会很方便。但在实践中,这通常会产生混淆,因为它模糊了二进制字符串和字符字符串之间的区别,而且所选择的特定转义机制有些难以操作。因此,对于大多数新应用程序,可能应该避免使用这种格式。

The “escape” format is the traditional PostgreSQL format for the bytea type. It takes the approach of representing a binary string as a sequence of ASCII characters, while converting those bytes that cannot be represented as an ASCII character into special escape sequences. If, from the point of view of the application, representing bytes as characters makes sense, then this representation can be convenient. But in practice it is usually confusing because it fuzzes up the distinction between binary strings and character strings, and also the particular escape mechanism that was chosen is somewhat unwieldy. Therefore, this format should probably be avoided for most new applications.

在以转义格式输入 bytea 值时,某些值 must 的八位字节必须转义,而所有八位字节值 can 都必须转义。通常,要转义八位字节,请将其转换为它三位八进制值并在前面加上反斜杠。反斜杠本身(八进制十进制值 92)也可以用双反斜杠表示。 Table 8.7 显示了必须转义的字符,并在适用时提供了其他转义序列。

When entering bytea values in escape format, octets of certain values must be escaped, while all octet values can be escaped. In general, to escape an octet, convert it into its three-digit octal value and precede it by a backslash. Backslash itself (octet decimal value 92) can alternatively be represented by double backslashes. Table 8.7 shows the characters that must be escaped, and gives the alternative escape sequences where applicable.

Table 8.7. bytea Literal Escaped Octets

Table 8.7. bytea Literal Escaped Octets

Decimal Octet Value

Description

Escaped Input Representation

Example

Hex Representation

0

zero octet

'\000'

'\000'::bytea

\x00

39

single quote

'''' or '\047'

''''::bytea

\x27

92

backslash

'\\' or '\134'

'\\'::bytea

\x5c

0 to 31 and 127 to 255

“non-printable” octets

'xxx' (octal value)

'\001'::bytea

\x01

转义 non-printable 八位字节的要求因区域设置而异。在某些情况下,你可以不转义它们。

The requirement to escape non-printable octets varies depending on locale settings. In some instances you can get away with leaving them unescaped.

必须将单引号加倍的原因(如 Table 8.7 中所示)是因为这适用于 SQL 命令中的任何字符串文字。通用字符串文字解析器消耗最外侧单引号并将任何一对单引号缩减为一个数据字符。bytea 输入函数看到的只是单引号,它将它视为普通数据字符。然而,bytea 输入函数将反斜杠视为特殊字符, Table 8.7 中显示的其他行为由此函数实现。

The reason that single quotes must be doubled, as shown in Table 8.7, is that this is true for any string literal in an SQL command. The generic string-literal parser consumes the outermost single quotes and reduces any pair of single quotes to one data character. What the bytea input function sees is just one single quote, which it treats as a plain data character. However, the bytea input function treats backslashes as special, and the other behaviors shown in Table 8.7 are implemented by that function.

在某些情况下,反斜杠必须比上面显示的加倍,因为通用字符串文字解析器还会将一对反斜杠缩减为一个数据字符;参见 Section 4.1.2.1

In some contexts, backslashes must be doubled compared to what is shown above, because the generic string-literal parser will also reduce pairs of backslashes to one data character; see Section 4.1.2.1.

默认情况下,Bytea 八位字节以 hex 格式输出。如果您将 bytea_output 更改为 escape,则“不可打印”八位字节将被转换为其等效的三位八进制值,并加一个反斜杠。大多数“可打印”八位字节都通过它们在客户端字符集中的标准表示来输出,例如:

Bytea octets are output in hex format by default. If you change bytea_output to escape, “non-printable” octets are converted to their equivalent three-digit octal value and preceded by one backslash. Most “printable” octets are output by their standard representation in the client character set, e.g.:

SET bytea_output = 'escape';

SELECT 'abc \153\154\155 \052\251\124'::bytea;
     bytea
----------------
 abc klm *\251T

十进制值为 92(反斜杠)的八位字节在输出中加倍。详细信息见 Table 8.8

The octet with decimal value 92 (backslash) is doubled in the output. Details are in Table 8.8.

Table 8.8. bytea Output Escaped Octets

Table 8.8. bytea Output Escaped Octets

Decimal Octet Value

Description

Escaped Output Representation

Example

Output Result

92

backslash

\\

'\134'::bytea

\\

0 to 31 and 127 to 255

“non-printable” octets

xxx (octal value)

'\001'::bytea

\001

32 to 126

“printable” octets

client character set representation

'\176'::bytea

~

根据你使用的 PostgreSQL 前端,可能需要针对转义和取消转义 bytea 字符串执行其他操作。例如,如果你的界面会自动翻译换行符和回车符,你可能还需要转义它们。

Depending on the front end to PostgreSQL you use, you might have additional work to do in terms of escaping and unescaping bytea strings. For example, you might also have to escape line feeds and carriage returns if your interface automatically translates these.