Postgresql 中文操作指南

F.10. citext — a case-insensitive character string type #

citext 模块提供了一个不区分大小写的字符串类型 citext。从本质上来说,在比较值时,它在内部调用 lower。否则,它几乎完全像 text 一样运行。

The citext module provides a case-insensitive character string type, citext. Essentially, it internally calls lower when comparing values. Otherwise, it behaves almost exactly like text.

Tip

考虑使用 nondeterministic collations (参见 Section 24.2.2.4)代替此模块。它们可用于大小写不敏感比较、重音不敏感比较和其他组合,并且可以正确处理更多 Unicode 特殊情况。

Consider using nondeterministic collations (see Section 24.2.2.4) instead of this module. They can be used for case-insensitive comparisons, accent-insensitive comparisons, and other combinations, and they handle more Unicode special cases correctly.

此模块被认为是“受信任的”,也就是说,它可以由在当前数据库上具有 CREATE 权限的非超级用户安装。

This module is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database.

F.10.1. Rationale #

在 PostgreSQL 中执行不区分大小写匹配的标准方法是在比较值时使用 lower 函数,例如

The standard approach to doing case-insensitive matches in PostgreSQL has been to use the lower function when comparing values, for example

SELECT * FROM tab WHERE lower(col) = LOWER(?);

这相当不错,但有若干缺点:

This works reasonably well, but has a number of drawbacks:

citext 数据类型允许您在 SQL 查询中消除对 lower 的调用,并且允许将主键设为不区分大小写。citext 是对区域设置敏感的,就像 text 一样,这意味着匹配大写字符和小写字符取决于数据库 LC_CTYPE 设置的规则。同样,此行为与在查询中使用 lower 完全相同。但由于数据类型以透明的方式完成此操作,因此您不必在查询中记住做任何特殊的事情。

The citext data type allows you to eliminate calls to lower in SQL queries, and allows a primary key to be case-insensitive. citext is locale-aware, just like text, which means that the matching of upper case and lower case characters is dependent on the rules of the database’s LC_CTYPE setting. Again, this behavior is identical to the use of lower in queries. But because it’s done transparently by the data type, you don’t have to remember to do anything special in your queries.

F.10.2. How to Use It #

下面是一个简单的使用示例:

Here’s a simple example of usage:

CREATE TABLE users (
    nick CITEXT PRIMARY KEY,
    pass TEXT   NOT NULL
);

INSERT INTO users VALUES ( 'larry',  sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'Tom',    sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'Damian', sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'NEAL',   sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'Bjørn',  sha256(random()::text::bytea) );

SELECT * FROM users WHERE nick = 'Larry';

SELECT 语句将返回一个元组,即使 nick 列已设为 larry 且查询为 Larry

The SELECT statement will return one tuple, even though the nick column was set to larry and the query was for Larry.

F.10.3. String Comparison Behavior #

citext 通过将每个字符串转换为小写(就如同调用了 lower)来执行比较,然后正常比较结果。因此,例如,如果 lower 会对它们产生相同的结果,那么两个字符串将被视为相等。

citext performs comparisons by converting each string to lower case (as though lower were called) and then comparing the results normally. Thus, for example, two strings are considered equal if lower would produce identical results for them.

为了尽可能模拟不区分大小写的归类,有大量字符串处理运算符和函数的 citext 专用版本。因此,例如,当应用于 citext 时,正则表达式运算符 ~~* 会表现出相同行为:它们都匹配不区分大小写。!~!~* 也是如此,还有 LIKE 运算符 ~~~~*,以及 !~~!~~*。如果您想匹配区分大小写,则可以将运算符的参数强制转换为 text

In order to emulate a case-insensitive collation as closely as possible, there are citext-specific versions of a number of string-processing operators and functions. So, for example, the regular expression operators ~ and ~* exhibit the same behavior when applied to citext: they both match case-insensitively. The same is true for !~ and !~*, as well as for the LIKE operators ~~ and ~~*, and !~~ and !~~*. If you’d like to match case-sensitively, you can cast the operator’s arguments to text.

同样,如果其参数为 citext,则所有以下函数的功能都执行不区分大小写的匹配:

Similarly, all of the following functions perform matching case-insensitively if their arguments are citext:

对于正则表达式函数,如果希望进行区分大小写的匹配,可以指定“c”标志来强制进行区分大小写的匹配。否则,如果你希望对这些函数中的一个进行区分大小写的处理,必须在使用这些函数之前将其强制转换为 text

For the regexp functions, if you want to match case-sensitively, you can specify the “c” flag to force a case-sensitive match. Otherwise, you must cast to text before using one of these functions if you want case-sensitive behavior.

F.10.4. Limitations #

F.10.5. Author #

受到唐纳德·弗雷泽的最初 citext 模块的启发。

Inspired by the original citext module by Donald Fraser.