F.48. unaccent - a text search dictionary which removes diacritics
unaccent is a text search dictionary that removes accents (diacritic signs) from lexemes. It’s a filtering dictionary, which means its output is always passed to the next dictionary (if any), unlike the normal behavior of dictionaries. This allows accent-insensitive processing for full text search.
The current implementation of unaccent cannot be used as a normalizing dictionary for the thesaurus dictionary.
This module is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database.
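As a minimal sketch, assuming a role that has CREATE privilege on the current database and a server installation that ships the contrib modules, the extension could be installed and checked like this:

```sql
-- Install the module into the current database. Because the extension
-- is marked as trusted, this does not require superuser rights.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Confirm that the extension is now registered in this database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'unaccent';
```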
F.48.1. Configuration
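An unaccent dictionary accepts the following options:
RULES is the base name of the file containing the list of translation rules. This file must be stored in $SHAREDIR/tsearch_data/ (where $SHAREDIR means the PostgreSQL installation's shared-data directory). Its name must end in .rules (which is not to be included in the RULES parameter).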
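The rules file has the following format:
Each line represents one translation rule, consisting of a character with an accent followed by a character without an accent, separated by whitespace. The first is translated into the second; for example, a line containing À and A translates À to A.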
A more complete example, which is directly useful for most European languages, can be found in unaccent.rules, which is installed in $SHAREDIR/tsearch_data/ when the unaccent module is installed. This rules file translates characters with accents to the same characters without accents, and it also expands ligatures into the equivalent series of simple characters (for example, Æ to AE).
F.48.2. Usage
Installing the unaccent extension creates a text search template unaccent and a dictionary unaccent based on it. The unaccent dictionary has the default parameter setting RULES='unaccent', which makes it immediately usable with the standard unaccent.rules file. If you wish, you can alter the parameter, for example:
mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
or create new dictionaries based on the template.
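Creating a separate dictionary from the template might look like the sketch below. The dictionary name my_unaccent is hypothetical, and my_rules assumes that you have supplied a file my_rules.rules in $SHAREDIR/tsearch_data/:

```sql
-- Hypothetical dictionary name. 'my_rules' refers to a rules file
-- $SHAREDIR/tsearch_data/my_rules.rules that you must provide yourself.
CREATE TEXT SEARCH DICTIONARY my_unaccent (
    TEMPLATE = unaccent,
    RULES = 'my_rules'
);
```

Defining a new dictionary this way leaves the default unaccent dictionary and its standard rules untouched.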
To test the dictionary, you can try:
mydb=# select ts_lexize('unaccent','Hôtel');
ts_lexize
-----------
{Hotel}
(1 row)
Here is an example showing how to insert the unaccent dictionary into a text search configuration:
mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
mydb=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
mydb=# select to_tsvector('fr','Hôtels de la Mer');
to_tsvector
-------------------
'hotel':1 'mer':4
(1 row)
mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
?column?
----------
t
(1 row)
mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
ts_headline
------------------------
<b>Hôtel</b> de la Mer
(1 row)
F.48.3. Functions
The unaccent() function removes accents (diacritic signs) from a given string. Basically, it’s a wrapper around unaccent-type dictionaries, but it can be used outside normal text search contexts.
unaccent([dictionary regdictionary, ] string text) returns text
If the dictionary argument is omitted, the text search dictionary named unaccent and appearing in the same schema as the unaccent() function itself is used.
For example:
SELECT unaccent('unaccent', 'Hôtel');
SELECT unaccent('Hôtel');
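To illustrate use outside of text search, here is a minimal sketch of an accent-insensitive pattern match. The places table and its contents are hypothetical, and the example assumes the unaccent extension is installed in the current database:

```sql
-- Hypothetical table used only for illustration.
CREATE TEMP TABLE places (name text);
INSERT INTO places (name)
VALUES ('Hôtel de la Mer'), ('Hotel California'), ('Café Noir');

-- Strip accents from both sides so that 'Hôtel' and 'Hotel' compare alike;
-- ILIKE additionally makes the match case-insensitive.
SELECT name
FROM places
WHERE unaccent(name) ILIKE unaccent('%hotel%');
```

Applying unaccent() to both the column and the search pattern keeps the comparison symmetric, so it does not matter which side carries the diacritics.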