Postgresql 中文操作指南

9.5. Binary String Functions and Operators #

此部分描述了用于检查和操作二进制字符串(即 bytea 类型的值)的函数和运算符。其中的许多函数在目的和语法上等同于上一部分中描述的文本字符串函数。

This section describes functions and operators for examining and manipulating binary strings, that is values of type bytea. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section.

SQL 定义了一些字符串函数,该函数使用关键字而不是逗号来分隔参数。详细信息在 Table 9.11 中。PostgreSQL 还提供了使用常规函数调用语法版本的这些函数(请参见 Table 9.12)。

SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9.11. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9.12).

Table 9.11. SQL Binary String Functions and Operators

Table 9.11. SQL Binary String Functions and Operators

Function/Operator

Description

Example(s)

bytea _

_ byteabytea

Concatenates the two binary strings.

_'\x123456'::bytea

'\x789a00bcde'::bytea_ → \x123456789a00bcde

bit_length ( bytea ) → integer

Returns number of bits in the binary string (8 times the octet_length).

bit_length('\x123456'::bytea)24

btrim ( bytes bytea, bytesremoved bytea ) → bytea

Removes the longest string containing only bytes appearing in bytesremoved from the start and end of bytes.

btrim('\x1234567890'::bytea, '\x9012'::bytea)\x345678

ltrim ( bytes bytea, bytesremoved bytea ) → bytea

Removes the longest string containing only bytes appearing in bytesremoved from the start of bytes.

ltrim('\x1234567890'::bytea, '\x9012'::bytea)\x34567890

octet_length ( bytea ) → integer

Returns number of bytes in the binary string.

octet_length('\x123456'::bytea)3

overlay ( bytes bytea PLACING newsubstring bytea FROM start integer [ FOR count integer ] ) → bytea

Replaces the substring of bytes that starts at the start'th byte and extends for count bytes with newsubstring. If count is omitted, it defaults to the length of newsubstring.

overlay('\x1234567890'::bytea placing '\002\003'::bytea from 2 for 3)\x12020390

position ( substring bytea IN bytes bytea ) → integer

Returns first starting index of the specified substring within bytes, or zero if it’s not present.

position('\x5678'::bytea in '\x1234567890'::bytea)3

rtrim ( bytes bytea, bytesremoved bytea ) → bytea

Removes the longest string containing only bytes appearing in bytesremoved from the end of bytes.

rtrim('\x1234567890'::bytea, '\x9012'::bytea)\x12345678

substring ( bytes bytea [ FROM start integer ] [ FOR count integer ] ) → bytea

Extracts the substring of bytes starting at the start'th byte if that is specified, and stopping after count bytes if that is specified. Provide at least one of start and count.

substring('\x1234567890'::bytea from 3 for 2)\x5678

trim ( [ LEADING

TRAILING

BOTH ] bytesremoved bytea FROM bytes bytea ) → bytea

Removes the longest string containing only bytes appearing in bytesremoved from the start, end, or both ends (BOTH is the default) of bytes.

trim('\x9012'::bytea from '\x1234567890'::bytea)\x345678

trim ( [ LEADING

TRAILING

BOTH ] [ FROM ] bytes bytea, bytesremoved bytea ) → bytea

This is a non-standard syntax for trim().

trim(both from '\x1234567890'::bytea, '\x9012'::bytea)\x345678

其他二进制字符串操作函数可用,并列在 Table 9.12 中。其中一些内部用于实现 Table 9.11 中列出的 SQL 标准字符串函数。

Additional binary string manipulation functions are available and are listed in Table 9.12. Some of them are used internally to implement the SQL-standard string functions listed in Table 9.11.

Table 9.12. Other Binary String Functions

Function

Description

Example(s)

bit_count ( bytes bytea ) → bigint

Returns the number of bits set in the binary string (also known as “popcount”).

bit_count('\x1234567890'::bytea)15

get_bit ( bytes bytea, n bigint ) → integer

Extracts n’th bit from binary string.

get_bit('\x1234567890'::bytea, 30)1

get_byte ( bytes bytea, n integer ) → integer

Extracts n’th byte from binary string.

get_byte('\x1234567890'::bytea, 4)144

length ( bytea ) → integer

Returns the number of bytes in the binary string.

length('\x1234567890'::bytea)5

length ( bytes bytea, encoding name ) → integer

Returns the number of characters in the binary string, assuming that it is text in the given encoding.

length('jose'::bytea, 'UTF8')4

md5 ( bytea ) → text

Computes the MD5 hash of the binary string, with the result written in hexadecimal.

md5('Th\000omas'::bytea)8ab2d3c9689aaf18​b4958c334c82d8b1

set_bit ( bytes bytea, n bigint, newvalue integer ) → bytea

Sets n’th bit in binary string to newvalue.

set_bit('\x1234567890'::bytea, 30, 0)\x1234563890

set_byte ( bytes bytea, n integer, newvalue integer ) → bytea

Sets n’th byte in binary string to newvalue.

set_byte('\x1234567890'::bytea, 4, 64)\x1234567840

sha224 ( bytea ) → bytea

Computes the SHA-224 hash of the binary string.

sha224('abc'::bytea)\x23097d223405d8228642a477bda2​55b32aadbce4bda0b3f7e36c9da7

sha256 ( bytea ) → bytea

Computes the SHA-256 hash of the binary string.

sha256('abc'::bytea)\xba7816bf8f01cfea414140de5dae2223​b00361a396177a9cb410ff61f20015ad

sha384 ( bytea ) → bytea

Computes the SHA-384 hash of the binary string.

sha384('abc'::bytea)\xcb00753f45a35e8bb5a03d699ac65007​272c32ab0eded1631a8b605a43ff5bed​8086072ba1e7cc2358baeca134c825a7

sha512 ( bytea ) → bytea

Computes the SHA-512 hash of the binary string.

sha512('abc'::bytea)\xddaf35a193617abacc417349ae204131​12e6fa4e89a97ea20a9eeee64b55d39a​2192992a274fc1a836ba3c23a3feebbd​454d4423643ce80e2a9ac94fa54ca49f

substr ( bytes bytea, start integer [, count integer ] ) → bytea

Extracts the substring of bytes starting at the start'th byte, and extending for count bytes if that is specified. (Same as substring(_bytes from start for count).)

_substr('\x1234567890'::bytea, 3, 2)\x5678

函数 get_byteset_byte 将二进制字符串的第一个字节编号为 0 字节。函数 get_bitset_bit 从每个字节内的右侧对位进行编号;例如,第 0 位是第一个字节的最低有效位,而第 15 位是第二个字节的最高有效位。

Functions get_byte and set_byte number the first byte of a binary string as byte 0. Functions get_bit and set_bit number bits from the right within each byte; for example bit 0 is the least significant bit of the first byte, and bit 15 is the most significant bit of the second byte.

出于历史原因,函数 md5 返回类型 text 的十六进制编码值,而SHA-2函数返回类型 bytea 。使用函数 encodedecode 在两者之间转换。例如,写 encode(sha256('abc'), 'hex') 以获取十六进制编码的文本表示形式,或写 decode(md5('abc'), 'hex') 以获取 bytea 值。

For historical reasons, the function md5 returns a hex-encoded value of type text whereas the SHA-2 functions return type bytea. Use the functions encode and decode to convert between the two. For example write encode(sha256('abc'), 'hex') to get a hex-encoded text representation, or decode(md5('abc'), 'hex') to get a bytea value.

Table 9.13 中显示了用于在不同字符集(编码)之间转换字符串以及以文本形式表示任意二进制数据的函数。对于这些函数,类型 text 的参数或结果表示为数据库的默认编码,而类型 bytea 的参数或结果则表示为由另一个参数命名的编码。

Functions for converting strings between different character sets (encodings), and for representing arbitrary binary data in textual form, are shown in Table 9.13. For these functions, an argument or result of type text is expressed in the database’s default encoding, while arguments or results of type bytea are in an encoding named by another argument.

Table 9.13. Text/Binary String Conversion Functions

Function

Description

Example(s)

convert ( bytes bytea, src_encoding name, dest_encoding name ) → bytea

Converts a binary string representing text in encoding src_encoding to a binary string in encoding dest_encoding (see Section 24.3.4 for available conversions).

convert('text_in_utf8', 'UTF8', 'LATIN1')\x746578745f696e5f75746638

convert_from ( bytes bytea, src_encoding name ) → text

Converts a binary string representing text in encoding src_encoding to text in the database encoding (see Section 24.3.4 for available conversions).

convert_from('text_in_utf8', 'UTF8')text_in_utf8

convert_to ( string text, dest_encoding name ) → bytea

Converts a text string (in the database encoding) to a binary string encoded in encoding dest_encoding (see Section 24.3.4 for available conversions).

convert_to('some_text', 'UTF8')\x736f6d655f74657874

encode ( bytes bytea, format text ) → text

Encodes binary data into a textual representation; supported format values are: base64, escape, hex.

encode('123\000\001', 'base64')MTIzAAE=

decode ( string text, format text ) → bytea

Decodes binary data from a textual representation; supported format values are the same as for encode.

decode('MTIzAAE=', 'base64')\x3132330001

encodedecode 函数支持以下文本格式:

The encode and decode functions support the following textual formats:

  • base64 #

    • The base64 format is that of RFC 2045 Section 6.8. As per the RFC, encoded lines are broken at 76 characters. However instead of the MIME CRLF end-of-line marker, only a newline is used for end-of-line. The decode function ignores carriage-return, newline, space, and tab characters. Otherwise, an error is raised when decode is supplied invalid base64 data — including when trailing padding is incorrect.

  • escape #

    • The escape format converts zero bytes and bytes with the high bit set into octal escape sequences (_nnn), and it doubles backslashes. Other byte values are represented literally. The _decode function will raise an error if a backslash is not followed by either a second backslash or three octal digits; it accepts other byte values unchanged.

  • hex #

    • The hex format represents each 4 bits of data as one hexadecimal digit, 0 through f, writing the higher-order digit of each byte first. The encode function outputs the a-f hex digits in lower case. Because the smallest unit of data is 8 bits, there are always an even number of characters returned by encode. The decode function accepts the a-f characters in either upper or lower case. An error is raised when decode is given invalid hex data — including when given an odd number of characters.

另请参见 Section 9.21中的聚合函数 _string_agg_和 Section 35.4中的大对象函数。

See also the aggregate function string_agg in Section 9.21 and the large object functions in Section 35.4.