PostgreSQL Chinese Operation Guide

8.3. Character Types

Table 8.4. Character Types

| Name | Description |
| --- | --- |
| character varying(_n_), varchar(_n_) | variable-length with limit |
| character(_n_), char(_n_), bpchar(_n_) | fixed-length, blank-padded |
| bpchar | variable unlimited length, blank-trimmed |
| text | variable unlimited length |

Table 8.4 shows the general-purpose character types available in PostgreSQL.

SQL defines two primary character types: character varying(_n_) and character(_n_), where _n_ is a positive integer. Both of these types can store strings up to _n_ characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) However, if one explicitly casts a value to character varying(_n_) or character(_n_), then an over-length value will be truncated to _n_ characters without raising an error. (This too is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
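The assignment and cast rules above can be tried in a short session. This is only a minimal sketch: the table name demo_limit is hypothetical, and the byte count in the first comment assumes a database that uses the UTF8 encoding.

```sql
-- Hypothetical scratch table, used only for this illustration.
CREATE TABLE demo_limit (v varchar(3));

-- Three characters but six bytes in UTF8: accepted, because n counts characters.
INSERT INTO demo_limit VALUES ('äöü');

-- Over-length, but the excess is all spaces: stored as 'abc'.
INSERT INTO demo_limit VALUES ('abc   ');

-- Over-length with non-space excess: rejected with
-- "value too long for type character varying(3)".
INSERT INTO demo_limit VALUES ('abcd');

-- Explicit cast: truncated to 'abc' without raising an error.
SELECT 'abcd'::varchar(3);
```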

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the text type is not in the SQL standard, several other SQL database management systems have it as well. text is PostgreSQL's native string data type, in that most built-in functions operating on strings are declared to take or return text, not character varying. For many purposes, character varying acts as though it were a domain over text.
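One way to observe that built-in string functions work in terms of text is to inspect result types with pg_typeof. The two queries below are an illustrative sketch, not an exhaustive check.

```sql
-- upper() is declared on text, so the varchar argument is implicitly
-- converted and the result type is text.
SELECT pg_typeof(upper('abc'::varchar(10)));            -- text

-- The concatenation operator is likewise defined on text.
SELECT pg_typeof('a'::varchar(5) || 'b'::varchar(5));   -- text
```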

The type name varchar is an alias for character varying, while bpchar (with length specifier) and char are aliases for character. The varchar and char aliases are defined in the SQL standard; bpchar is a PostgreSQL extension.

If specified, the length _n_ must be greater than zero and cannot exceed 10,485,760. If character varying (or varchar) is used without a length specifier, the type accepts strings of any length. If bpchar lacks a length specifier, it also accepts strings of any length, but trailing spaces are semantically insignificant. If character (or char) lacks a specifier, it is equivalent to character(1).
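The effect of omitting the length specifier can be sketched with a few casts. The table name demo_bad is hypothetical and exists only to show the lower bound on the length.

```sql
-- char without a length means character(1); the explicit cast truncates to 'a'.
SELECT 'abc'::char;

-- varchar without a length accepts strings of any length.
SELECT char_length(repeat('x', 20000)::varchar);   -- 20000

-- bpchar without a length: trailing spaces are insignificant in comparisons.
SELECT 'abc   '::bpchar = 'abc'::bpchar;           -- true

-- A length of zero is rejected: the length must be at least 1.
CREATE TABLE demo_bad (v varchar(0));
```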

Values of type character are physically padded with spaces to the specified width _n_, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example, SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is, LIKE and regular expressions.
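The difference between comparison, conversion, and pattern matching is easiest to see side by side. The following queries are a small sketch of the behavior described above.

```sql
-- Comparison of character values ignores trailing spaces.
SELECT 'a  '::char(3) = 'a'::char(3);         -- true

-- In character varying and text, trailing spaces are significant.
SELECT 'a  '::varchar(3) = 'a'::varchar(3);   -- false
SELECT 'a  '::text = 'a'::text;               -- false

-- Converting character to another string type removes the padding.
SELECT ('a'::char(3))::text || 'x';           -- ax

-- Pattern matching sees the padded value.
SELECT 'a'::char(3) LIKE 'a';                 -- false
SELECT 'a'::char(3) LIKE 'a%';                -- true
```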

The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored. For more information refer to Section 24.3.
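To check which character set a database uses, and to see the NUL restriction in action, something like the following can be run. It is an illustrative sketch only.

```sql
-- Character set (encoding) of the current database, shown two ways.
SHOW server_encoding;
SELECT pg_encoding_to_char(encoding)
FROM pg_database
WHERE datname = current_database();

-- The character with code zero cannot be produced or stored;
-- this fails with "null character not permitted".
SELECT chr(0);
```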

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for _n_ in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
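The storage cost of padding can be inspected with pg_column_size on stored values. This is a rough sketch with a hypothetical table name, demo_size; exact numbers are deliberately not shown, since they depend on the encoding and on how the value is stored.

```sql
-- Hypothetical scratch table, used only for this illustration.
CREATE TABLE demo_size (t text, c character(10));
INSERT INTO demo_size VALUES ('abc', 'abc');

-- pg_column_size reports the bytes used to store each value, including
-- the length overhead. The character(10) column is expected to report
-- more bytes than the text column because of its space padding.
SELECT pg_column_size(t) AS text_stored_bytes,
       pg_column_size(c) AS char_stored_bytes
FROM demo_size;
```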

Tip

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(_n_) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(_n_) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

Refer to Section 4.1.2.1 for information about the syntax of string literals, and to Chapter 9 for information about available operators and functions.

Example 8.1. Using the Character Types

CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- char_length is discussed in Section 9.4

  a   | char_length
------+-------------
 ok   |           2


CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good      ');
INSERT INTO test2 VALUES ('too long');
ERROR:  value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;

   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5

There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. These are not intended for general-purpose use, only for use in the internal system catalogs. The name type is used to store identifiers. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage, and therefore can store only a single ASCII character. It is used in the system catalogs as a simplistic enumeration type.

Table 8.5. Special Character Types

| Name | Storage Size | Description |
| --- | --- | --- |
| "char" | 1 byte | single-byte internal type |
| name | 64 bytes | internal type for object names |
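Both special types can be seen in the system catalog pg_class, where relname has type name and relkind has type "char". The queries below are a small sketch; the identifier limit shown in the comment assumes a default build in which NAMEDATALEN has not been changed.

```sql
-- Column types used by the system catalog.
SELECT pg_typeof(relname) AS relname_type,   -- name
       pg_typeof(relkind) AS relkind_type    -- "char"
FROM pg_class
LIMIT 1;

-- Maximum identifier length in characters (NAMEDATALEN - 1);
-- 63 in a default build.
SHOW max_identifier_length;
```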