PostgreSQL Operations Guide (Chinese Edition)

24.1. Locale Support

Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, etc. PostgreSQL uses the standard ISO C and POSIX locale facilities provided by the server operating system. For additional information, refer to the documentation of your system.

24.1.1. Overview

Locale support is automatically initialized when a database cluster is created using initdb. By default, initdb initializes the database cluster with the locale setting of its execution environment, so if your system is already set to use the locale that you want in your database cluster, there is nothing else you need to do. If you want to use a different locale (or you are not sure which locale your system is set to), you can tell initdb exactly which locale to use with the --locale option. For example:

initdb --locale=sv_SE

This example for Unix systems sets the locale to Swedish (sv) as spoken in Sweden (SE). Other possibilities might include en_US (U.S. English) and fr_CA (French Canadian). If more than one character set can be used for a locale, the specification can take the form language_territory.codeset. For example, fr_BE.UTF-8 represents the French language (fr) as spoken in Belgium (BE), with a UTF-8 character set encoding.
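
The language_territory.codeset convention can be taken apart mechanically. The Python sketch below is illustrative only: split_locale_name is a hypothetical helper, it handles just the forms mentioned here, and it does not validate the codes or understand @modifier suffixes.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


def split_locale_name(name: str) -> LocaleName:
    """Split a Unix-style name of the form language[_territory][.codeset].

    Hypothetical helper for illustration: no validation of the codes and
    no support for '@modifier' suffixes.
    """
    base, dot, codeset = name.partition(".")
    language, underscore, territory = base.partition("_")
    return LocaleName(
        language=language,
        territory=territory if underscore else None,
        codeset=codeset if dot else None,
    )


for example in ("sv_SE", "fr_CA", "fr_BE.UTF-8", "C"):
    print(example, "->", split_locale_name(example))
```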

Which locales are available on your system, and under what names, depends on what the operating system vendor provided and what was installed. On most Unix systems, the command locale -a lists the available locales. Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same.

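Occasionally it is useful to mix rules from several locales, e.g., use English collation rules but Spanish messages. To support that, a set of locale subcategories exist that control only certain aspects of the localization rules:

LC_COLLATE: String sort order
LC_CTYPE: Character classification (What is a letter? Its upper-case equivalent?)
LC_MESSAGES: Language of messages
LC_MONETARY: Formatting of currency amounts
LC_NUMERIC: Formatting of numbers
LC_TIME: Formatting of dates and times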

The category names translate into names of initdb options that override the locale choice for a specific category. For instance, to set the locale to French Canadian but use U.S. rules for formatting currency, use initdb --locale=fr_CA --lc-monetary=en_US.
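
The naming rule is mechanical: lowercase the LC_ name and turn the underscore into a hyphen. The Python sketch below only builds an argument list and never runs initdb; initdb_command is a hypothetical helper, and the category set is the standard six LC_* categories.

```python
import shlex
from typing import List, Optional

# The locale categories initdb can override individually.
CATEGORIES = ("LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME")


def initdb_command(locale: Optional[str] = None, **overrides: str) -> List[str]:
    """Build (but do not run) an initdb argument list.

    `overrides` maps category names such as LC_MONETARY to locale names.
    """
    args = ["initdb"]
    if locale is not None:
        args.append(f"--locale={locale}")
    for category, value in overrides.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown locale category: {category}")
        option = category.lower().replace("_", "-")  # LC_MONETARY -> lc-monetary
        args.append(f"--{option}={value}")
    return args


cmd = initdb_command("fr_CA", LC_MONETARY="en_US")
print(shlex.join(cmd))  # initdb --locale=fr_CA --lc-monetary=en_US
```

Passing the list to subprocess.run would avoid shell quoting issues; shlex.join is used here only to display it.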

If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.
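
To see what "no locale support" means for sorting, Python's standard locale module (not PostgreSQL itself) can switch its collation category to C. This sketch assumes a POSIX-like system, where the C locale is always available.

```python
import locale

words = ["banana", "Apple", "apple", "Banana", "cherry"]

# Switch only the collation category of this Python process to C.
# (setlocale changes process-wide state; acceptable in a demo script.)
locale.setlocale(locale.LC_COLLATE, "C")

# Under C, strxfrm produces keys that compare in plain code-point order,
# so all upper-case ASCII letters sort before all lower-case ones.
c_order = sorted(words, key=locale.strxfrm)
print(c_order)  # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
```

With a language-aware locale such as en_US.UTF-8 (if installed), the same list would typically interleave upper- and lower-case words instead.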

Some locale categories must have their values fixed when the database is created: LC_COLLATE and LC_CTYPE. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. These categories affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 24.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
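
A toy model shows why the sort order has to stay fixed: an ordered index can only be searched with the comparison rule it was built with. Here a sorted Python list stands in for a B-tree; build_index, lookup, and code_point_order are hypothetical names, not PostgreSQL internals, and str.casefold plays the role of a different (case-insensitive) collation.

```python
from typing import Callable, List

SortKey = Callable[[str], str]


def code_point_order(s: str) -> str:
    """Strict character-by-character order, standing in for the C locale."""
    return s


def build_index(values: List[str], key: SortKey) -> List[str]:
    """A sorted list stands in for a B-tree built under collation `key`."""
    return sorted(values, key=key)


def lookup(index: List[str], target: str, key: SortKey) -> bool:
    """Binary search that assumes `index` is ordered according to `key`."""
    wanted = key(target)
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < wanted:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and key(index[lo]) == wanted


values = ["b", "B", "a"]
index = build_index(values, code_point_order)  # ['B', 'a', 'b']

print(lookup(index, "a", code_point_order))  # True: same rule as at build time
print(lookup(index, "a", str.casefold))      # False: rule changed, 'a' is "lost"
```

The value is still stored; the search simply walks the wrong way because the order it relies on no longer holds, which is the practical meaning of a corrupt index.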

The other locale categories can be changed whenever desired by setting the server configuration parameters that have the same name as the locale categories (see Section 20.11.2 for details). The values chosen by initdb are actually only written into the configuration file postgresql.conf to serve as defaults when the server is started. If you remove these assignments from postgresql.conf, the server will inherit the settings from its execution environment.

Note that the locale behavior of the server is determined by the environment variables seen by the server, not by the environment of any client. Therefore, be careful to configure the correct locale settings before starting the server. One consequence is that if client and server are set up in different locales, messages might appear in different languages depending on where they originated.

Note

When we speak of inheriting the locale from the execution environment, this means the following on most operating systems: for a given locale category, say the collation, the following environment variables are consulted in this order until one is found to be set: LC_ALL, LC_COLLATE (or the variable corresponding to the respective category), LANG. If none of these environment variables is set, the locale defaults to C.
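
The lookup order can be written as a small function. This is a sketch of the rule as stated, not of any particular C library; effective_locale is a hypothetical name, and treating an empty value as unset is an assumption that matches common POSIX behavior.

```python
import os
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Resolve one locale category from environment variables.

    Checks LC_ALL, then the category's own variable (e.g. LC_COLLATE),
    then LANG, and falls back to "C". Empty values count as unset
    (an assumption; check your operating system's documentation).
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# What a server started from this shell would inherit for collation:
print(effective_locale("LC_COLLATE", os.environ))
```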

Some message localization libraries also look at the environment variable LANGUAGE, which overrides all other locale settings for the purpose of setting the language of messages. If in doubt, refer to the documentation of your operating system, in particular the documentation about gettext.
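
As one concrete case, GNU gettext treats LANGUAGE as a colon-separated priority list of languages and ignores it when the message locale is C. The sketch below models that behavior under those assumptions; message_language_preferences is a hypothetical name, treating POSIX like C is a simplification, and other libraries may behave differently.

```python
from typing import List, Mapping


def message_language_preferences(env: Mapping[str, str]) -> List[str]:
    """Sketch of a gettext-style choice of message languages.

    Assumptions (modeled on GNU gettext): LANGUAGE is a colon-separated
    priority list that wins over the LC_* variables, but it is ignored
    when LC_MESSAGES resolves to C (treated the same as POSIX here).
    """
    # Resolve LC_MESSAGES with the usual LC_ALL > LC_MESSAGES > LANG order.
    resolved = next(
        (env[name] for name in ("LC_ALL", "LC_MESSAGES", "LANG") if env.get(name)),
        "C",
    )
    language = env.get("LANGUAGE", "")
    if language and resolved not in ("C", "POSIX"):
        return [part for part in language.split(":") if part]
    return [resolved]


print(message_language_preferences({"LANG": "de_DE.UTF-8", "LANGUAGE": "fr:en"}))
```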

To enable messages to be translated to the user's preferred language, NLS must have been selected at build time (configure --enable-nls). All other locale support is built in automatically.

24.1.2. Behavior

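The locale settings influence the following SQL features:

Sort order in queries using ORDER BY or the standard comparison operators on textual data
The upper, lower, and initcap functions
Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case-insensitive matching and the classification of characters by character-class regular expressions
The to_char family of functions
The ability to use indexes with LIKE clauses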

The drawback of using locales other than C or POSIX in PostgreSQL is the performance impact: it slows character handling and prevents ordinary indexes from being used by LIKE. For this reason, use locales only if you actually need them.

As a workaround to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.10 for more information. Another approach is to create indexes using the C collation, as discussed in Section 24.2.
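
One way to see why a locale-ordered index cannot answer LIKE 'c%': a prefix match becomes an index range scan only if every string starting with the prefix sorts contiguously between the prefix and its successor. In the sketch below, toy_czech_key is a deliberately simplified, hypothetical collation in which "ch" sorts as one letter after "h" (loosely as in Czech), and c_key stands in for the strict character-by-character order of the C collation or the pattern operator classes.

```python
import bisect
from typing import Callable, List


def toy_czech_key(s: str) -> str:
    """Toy sort key: treat the digraph 'ch' as one letter between 'h' and 'i'.

    '\\x7f' sorts after every ASCII letter, so 'h\\x7f...' lands after any
    'h...' word and before 'i'. Real locale rules are far richer.
    """
    return s.replace("ch", "h\x7f")


def c_key(s: str) -> str:
    """Code-point order, i.e. strict character-by-character comparison."""
    return s


def prefix_range_scan(words: List[str], prefix: str,
                      key: Callable[[str], str]) -> List[str]:
    """Emulate an index range scan for LIKE 'prefix%'.

    The "index" is the word list sorted by `key`; the scan returns entries
    whose key lies in [key(prefix), key(prefix with last char incremented)).
    """
    index = sorted(words, key=key)
    keys = [key(w) for w in index]
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(keys, key(prefix))
    hi = bisect.bisect_left(keys, key(upper))
    return index[lo:hi]


words = ["cesta", "chata", "dum", "hrad", "ivo"]
expected = [w for w in words if w.startswith("c")]  # what LIKE 'c%' must return

print(prefix_range_scan(words, "c", c_key))          # ['cesta', 'chata']
print(prefix_range_scan(words, "c", toy_czech_key))  # ['cesta'] (misses 'chata')
```

Under the character-by-character order the range scan returns exactly the matching rows; under the toy locale order "chata" sorts between "hrad" and "ivo", outside the scanned range.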

24.1.3. Selecting Locales

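Locales can be selected in different scopes depending on requirements. The overview above showed how locales are specified using initdb to set the defaults for the entire cluster. The following list shows where locales can be selected. Each item provides the defaults for the subsequent items, and each lower item allows overriding the defaults at a finer granularity:

1. The environment of the operating system provides the defaults for the locales of a newly initialized database cluster. In many cases this is enough: if the operating system is configured for the desired language and territory, PostgreSQL will by default also behave according to that locale.
2. Command-line options for initdb specify the locale settings for a newly initialized database cluster. Use this if the operating system does not have the locale configuration you want for your database system.
3. A locale can be selected separately for each database. The SQL command CREATE DATABASE and its command-line equivalent createdb have options for that.
4. Locale settings can be made for individual table columns, using SQL objects called collations, as explained in Section 24.2.
5. Finally, locales can be selected for an individual query, again using collation objects.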
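
The "each finer scope overrides the coarser one" rule can be modeled in a few lines. In this hypothetical sketch, each argument stands for one scope (the cluster default coming from the OS environment or initdb, then the database, a column collation, and a COLLATE clause in a query); the actual rules for deriving the collation of an expression, described in Section 24.2, are more involved.

```python
from typing import Optional


def effective_collation_locale(
    cluster_default: str,
    database: Optional[str] = None,
    column: Optional[str] = None,
    query: Optional[str] = None,
) -> str:
    """Return the locale from the most specific scope that sets one.

    Scopes run from coarse to fine: cluster -> database -> column -> query.
    None means "not set at this scope, inherit from the one above".
    """
    chosen = cluster_default
    for override in (database, column, query):
        if override is not None:
            chosen = override
    return chosen


print(effective_collation_locale("en_US.UTF-8", database="sv_SE.UTF-8", query="C"))
```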

24.1.4. Locale Providers

PostgreSQL supports multiple locale providers, which specify the library that supplies the locale data. One standard provider is libc, which uses the locales provided by the operating system's C library; these are the locales used by most tools provided by the operating system. Another provider is icu, which uses the external ICU library. ICU locales can only be used if support for ICU was configured when PostgreSQL was built.

The commands and tools that select the locale settings, as described above, each have an option to select the locale provider. The examples shown earlier all use the libc provider, which is the default. Here is an example that initializes a database cluster using the ICU provider:

initdb --locale-provider=icu --icu-locale=en

See the description of the respective commands and programs for details. Note that you can mix locale providers at different granularities: for example, use libc by default for the cluster but have one database that uses the icu provider, and then have collation objects using either provider within those databases.

Which locale provider to use depends on individual requirements. For most basic uses, either provider will give adequate results. For the libc provider, the results depend on what the operating system offers; some operating systems are better than others. For advanced uses, ICU offers more locale variants and customization options.

24.1.5. ICU Locales

24.1.5.1. ICU Locale Names

The ICU format for the locale name is a language tag. For example:

CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');

24.1.5.2. Locale Canonicalization and Validation

When you define a new ICU collation object, or a new database with ICU as the provider, the given locale name is transformed ("canonicalized") into a language tag if it is not already in that form. For instance:

CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
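
To make the two rewrites in these NOTICE lines concrete, here is a toy Python imitation. It is explicitly not ICU's canonicalization algorithm (which also normalizes case, aliases, and much more); toy_canonicalize is a hypothetical name that reproduces only the two cases shown: a libc-style name loses its codeset and switches "_" to "-", and an explicit "true" after a -u key is dropped because true is the default for boolean keys.

```python
def toy_canonicalize(locale_name: str) -> str:
    """Very rough imitation of the two rewrites shown in the NOTICE output.

    NOT ICU's algorithm; use ICU itself, or simply write canonical
    language tags, for real work.
    """
    name = locale_name.split(".", 1)[0]  # drop a libc codeset such as '.utf8'
    name = name.replace("_", "-")        # libc '_' separator -> BCP 47 '-'
    subtags = name.split("-")
    if "u" in subtags:
        ext = subtags.index("u")
        # 'true' is the default value of a boolean -u key, so omit it.
        subtags = subtags[: ext + 1] + [t for t in subtags[ext + 1:] if t != "true"]
    return "-".join(subtags)


for name in ("en-US-u-kn-true", "de_DE.utf8"):
    print(name, "->", toy_canonicalize(name))
```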

If you see this notice, ensure that the provider and locale are the expected result. For consistent results when using the ICU provider, specify the canonical language tag instead of relying on the transformation.

A locale with no language name, or with the special language name root, is transformed to have the language und ("undefined").

ICU can transform most libc locale names, as well as some other formats, into language tags for easier transition to ICU. If a libc locale name is used in ICU, it may not have precisely the same behavior as in libc.

If there is a problem interpreting the locale name, or if the locale name represents a language or region that ICU does not recognize, you will see the following warning:

CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
WARNING:  ICU locale "nonsense" has unknown language "nonsense"
HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE COLLATION

icu_validation_level controls how the message is reported. Unless it is set to ERROR, the collation will still be created, but its behavior may not be what the user intended.

24.1.5.3. Language Tag

A language tag, defined in BCP 47, is a standardized identifier used to identify languages, regions, and other information about a locale.

Basic language tags are simply language-region, or even just language. The language is a language code (e.g., fr for French), and region is a region code (e.g., CA for Canada). Examples: ja-JP, de, or fr-CA.

Collation settings may be included in the language tag to customize collation behavior. ICU allows extensive customization, such as sensitivity (or insensitivity) to accents, case, and punctuation; treatment of digits within text; and many other options to satisfy a variety of uses.

To include this additional collation information in a language tag, append -u, which indicates there are additional collation settings, followed by one or more -key-value pairs. The key is the key for a collation setting and value is a valid value for that setting. For boolean settings, the -key may be specified without a corresponding -value, which implies a value of true.
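
The structure described above (language, optional region, then -u followed by key and value subtags) can be parsed with a short function. This is a simplified sketch: parse_language_tag is a hypothetical name, script subtags (such as Hant), variants, and extensions other than -u are rejected, and it relies on BCP 47 -u keys being exactly two characters long.

```python
from typing import Dict, List, NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    region: Optional[str]
    settings: Dict[str, str]


def parse_language_tag(tag: str) -> LanguageTag:
    """Parse language[-region][-u-key[-value]...] (simplified sketch)."""
    subtags = tag.split("-")
    language, rest = subtags[0], subtags[1:]
    region = None
    if rest and rest[0] != "u":
        region, rest = rest[0], rest[1:]
    values: Dict[str, List[str]] = {}
    if rest:
        if rest[0] != "u":
            raise ValueError(f"unsupported subtag {rest[0]!r} in {tag!r}")
        key = None
        for subtag in rest[1:]:
            if len(subtag) == 2:        # -u keys are two characters long
                key = subtag
                values[key] = []
            elif key is None:
                raise ValueError(f"value {subtag!r} without a key in {tag!r}")
            else:
                values[key].append(subtag)
    # A key given without a value means "true".
    settings = {k: "-".join(v) if v else "true" for k, v in values.items()}
    return LanguageTag(language, region, settings)


print(parse_language_tag("en-US-u-kn-ks-level2"))
```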

For example, the language tag en-US-u-kn-ks-level2 means the locale with the English language in the US region, with collation settings kn set to true and ks set to level2. Those settings mean the collation will be case-insensitive and will treat a sequence of digits as a single number:

CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = 'en-US-u-kn-ks-level2');
SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
 result
--------
 t
(1 row)

SELECT 'N-45' < 'N-123' COLLATE mycollation5 as result;
 result
--------
 t
(1 row)

See Section 24.2.3 for details and additional examples of using language tags with custom collation information for the locale.

24.1.6. Problems

If locale support doesn't work as explained above, check that the locale support in your operating system is correctly configured. To check which locales are installed on your system, you can use the command locale -a if your operating system provides it.

Check that PostgreSQL is actually using the locale that you think it is. The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database. Other locale settings, including LC_MESSAGES and LC_MONETARY, are initially determined by the environment the server is started in, but can be changed on the fly. You can check the active locale settings using the SHOW command.

The directory src/test/locale in the source distribution contains a test suite for PostgreSQL's locale support.

Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to use the error code scheme instead.
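
The error code scheme means matching on the five-character SQLSTATE rather than on message text. In this driver-neutral sketch, ServerError is a hypothetical stand-in for a driver's exception object (real drivers expose the code under their own attribute, for example pgcode in psycopg2 and sqlstate in psycopg 3); 23505 is the SQLSTATE for unique_violation and 42P01 for undefined_table.

```python
from dataclasses import dataclass


@dataclass
class ServerError:
    """Hypothetical stand-in for a driver's error object."""
    sqlstate: str
    message: str


def is_unique_violation(err: ServerError) -> bool:
    # Fragile alternative: `"duplicate key" in err.message` breaks as soon
    # as the server's messages (lc_messages) are in another language.
    return err.sqlstate == "23505"  # SQLSTATE for unique_violation


err = ServerError("23505", "localized message text")
print(is_unique_violation(err))  # True, whatever language the message is in
```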

Maintaining catalogs of message translations requires the ongoing efforts of many volunteers who want to see PostgreSQL speak their preferred language well. If messages in your language are currently not available or not fully translated, your assistance would be appreciated. If you want to help, refer to Chapter 57 or write to the developers' mailing list.