PostgreSQL Operation Guide (Chinese Edition)
12.7. Configuration Example
A text search configuration specifies all options necessary to transform a document into a tsvector: the parser to use to break text into tokens, and the dictionaries to use to transform each token into a lexeme. Every call of to_tsvector or to_tsquery needs a text search configuration to perform its processing. The configuration parameter default_text_search_config specifies the name of the default configuration, which is the one used by text search functions if an explicit configuration parameter is omitted. It can be set in postgresql.conf, or set for an individual session using the SET command.
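The rule for choosing a configuration can be pictured with a small Python model. This is only a sketch: the Session class and its methods are hypothetical stand-ins for SET, SHOW, and a text search function call, and the starting value pg_catalog.english is an assumed postgresql.conf setting (the real default depends on how the cluster was initialized).

```python
from typing import Optional

# Toy model, not PostgreSQL source. All names are hypothetical.
class Session:
    def __init__(self, server_default: str) -> None:
        # Value of default_text_search_config taken from postgresql.conf.
        self.server_default = server_default
        # Session-level override installed by SET; None until SET runs.
        self.session_value: Optional[str] = None

    def set(self, name: str) -> None:
        # Models: SET default_text_search_config = '<name>';
        self.session_value = name

    def show(self) -> str:
        # Models: SHOW default_text_search_config;
        if self.session_value is not None:
            return self.session_value
        return self.server_default

    def config_for_call(self, explicit: Optional[str] = None) -> str:
        # An explicit configuration argument (e.g. the first argument of
        # to_tsvector) always wins; otherwise the current default is used.
        return explicit if explicit is not None else self.show()


s = Session(server_default="pg_catalog.english")  # assumed starting value
print(s.config_for_call())                     # pg_catalog.english
s.set("public.pg")
print(s.config_for_call())                     # public.pg
print(s.config_for_call("pg_catalog.simple"))  # pg_catalog.simple
```

The point the model captures is precedence: an explicit configuration argument always wins, and SET only changes the fallback for the current session.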
Several predefined text search configurations are available, and you can create custom configurations easily. To facilitate management of text search objects, a set of SQL commands is available, and there are several psql commands that display information about text search objects (Section 12.10).
As an example, we will create a configuration named pg by duplicating the built-in english configuration:
CREATE TEXT SEARCH CONFIGURATION public.pg ( COPY = pg_catalog.english );
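COPY gives the new configuration the same parser and the same token-type mappings as the source, after which the two are independent. The toy Python model below illustrates that idea; the catalog dictionary and its abbreviated english entry are hypothetical, not the actual contents of pg_catalog.english.

```python
import copy

# Toy model, not PostgreSQL internals: a configuration is a parser name plus a
# mapping from token type to an ordered list of dictionary names.
catalog = {
    # Hypothetical, heavily abbreviated stand-in for pg_catalog.english.
    "pg_catalog.english": {
        "parser": "default",
        "mapping": {
            "asciiword": ["english_stem"],
            "email": ["simple"],
        },
    },
}


def create_config_copy(catalog: dict, new_name: str, source: str) -> None:
    # Models CREATE TEXT SEARCH CONFIGURATION new_name (COPY = source):
    # same parser, plus an independent copy of every mapping.
    catalog[new_name] = copy.deepcopy(catalog[source])


create_config_copy(catalog, "public.pg", "pg_catalog.english")

# Changing a mapping in the copy leaves the source untouched.
catalog["public.pg"]["mapping"]["asciiword"] = ["pg_dict", "english_ispell", "english_stem"]
print(catalog["pg_catalog.english"]["mapping"]["asciiword"])  # ['english_stem']
```

A deep copy is used deliberately: a shallow copy would share the inner mapping dictionary, so edits to the new configuration would leak into the original.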
We will use a PostgreSQL-specific synonym list and store it in $SHAREDIR/tsearch_data/pg_dict.syn ($SHAREDIR is the installation's shared-data directory; run pg_config --sharedir to find it). The file contents look like this:
postgres pg
pgsql pg
postgresql pg
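Each line pairs a word with the lexeme that should replace it. A minimal Python sketch of how such a file might be loaded and consulted follows; the function names are hypothetical, matching is assumed to be case-insensitive (the synonym template's default), and optional features such as the trailing * prefix marker are not modeled.

```python
from typing import Dict, List, Optional

# Toy model of a synonym dictionary, not the PostgreSQL implementation.
PG_DICT_SYN = """\
postgres pg
pgsql pg
postgresql pg
"""


def load_synonyms(text: str) -> Dict[str, str]:
    table: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank (or malformed) lines
        word, synonym = fields[0], fields[1]
        # Case-insensitive matching: fold the lookup key to lower case.
        table[word.lower()] = synonym
    return table


def synonym_lexize(table: Dict[str, str], token: str) -> Optional[List[str]]:
    # Return the replacement lexeme, or None if the word is not listed
    # (None lets the next dictionary in the chain try the token).
    synonym = table.get(token.lower())
    return [synonym] if synonym is not None else None


table = load_synonyms(PG_DICT_SYN)
print(synonym_lexize(table, "PostgreSQL"))  # ['pg']
print(synonym_lexize(table, "database"))    # None
```

Returning None rather than the unchanged word matters: it signals "not recognized", which is what allows a later dictionary to handle ordinary words.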
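We define the synonym dictionary like this. The SYNONYMS parameter gives the base name of the file, so pg_dict refers to $SHAREDIR/tsearch_data/pg_dict.syn: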
CREATE TEXT SEARCH DICTIONARY pg_dict (
TEMPLATE = synonym,
SYNONYMS = pg_dict
);
Next we register the Ispell dictionary english_ispell, which has its own configuration files. DictFile, AffFile, and StopWords give the base names of its dictionary, affix, and stop-word files in $SHAREDIR/tsearch_data:
CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile = english,
StopWords = english
);
Now we can set up the mappings for words in configuration pg:
ALTER TEXT SEARCH CONFIGURATION pg
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH pg_dict, english_ispell, english_stem;
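For each listed token type, the dictionaries are consulted in order until one recognizes the token; a token identified as a stop word (empty result), or one that no dictionary recognizes, is discarded. That is why the narrow pg_dict comes first and the catch-all english_stem last. The Python sketch below models this rule under stated assumptions: the three dictionaries are hypothetical stand-ins with tiny made-up word lists, and the "stemmer" only strips a trailing s.

```python
from typing import Callable, Dict, List, Optional

# A "dictionary" takes a token and returns:
#   None -> not recognized, try the next dictionary
#   []   -> recognized as a stop word, emit nothing
#   [..] -> the lexeme(s) to index
Dictionary = Callable[[str], Optional[List[str]]]


def run_chain(chain: List[Dictionary], token: str) -> List[str]:
    for lexize in chain:
        result = lexize(token)
        if result is not None:
            return result  # first dictionary that recognizes the token wins
    return []  # no dictionary recognized it: the token is not indexed


# Hypothetical stand-ins for pg_dict, english_ispell and english_stem.
SYNONYMS: Dict[str, str] = {"postgres": "pg", "pgsql": "pg", "postgresql": "pg"}
ISPELL_WORDS: Dict[str, List[str]] = {"databases": ["database"], "the": []}


def pg_dict(tok: str) -> Optional[List[str]]:
    syn = SYNONYMS.get(tok.lower())
    return [syn] if syn else None


def english_ispell(tok: str) -> Optional[List[str]]:
    return ISPELL_WORDS.get(tok.lower())


def english_stem(tok: str) -> Optional[List[str]]:
    # Crude suffix stripping; real Snowball stemming is far more involved.
    # It accepts every token, so it acts as the catch-all at the end.
    word = tok.lower()
    return [word[:-1] if word.endswith("s") else word]


chain = [pg_dict, english_ispell, english_stem]
print(run_chain(chain, "PostgreSQL"))  # ['pg']        (synonym dictionary)
print(run_chain(chain, "databases"))   # ['database']  (Ispell stand-in)
print(run_chain(chain, "the"))         # []            (stop word)
print(run_chain(chain, "testings"))    # ['testing']   (stemmer fallback)
```

Putting english_stem anywhere but last would hide the other dictionaries, because a dictionary that accepts every token stops the search immediately.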
We choose not to index or search some token types that the built-in configuration does handle:
ALTER TEXT SEARCH CONFIGURATION pg
DROP MAPPING FOR email, url, url_path, sfloat, float;
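Once a token type has no mapping, tokens of that type are neither indexed nor searched. A toy Python model of that effect follows; the parser output and the mapping table are hypothetical, and each mapped type simply lower-cases its token instead of running real dictionaries.

```python
from typing import Dict, List, Tuple

# Toy model: token types without a mapping are skipped. Every mapped type is
# handled by plain lower-casing here instead of a real dictionary list.
mapping: Dict[str, List[str]] = {
    "asciiword": ["english_stem"],
    "email": ["simple"],
    "url": ["simple"],
    "url_path": ["simple"],
    "sfloat": ["simple"],
    "float": ["simple"],
}


def drop_mapping(mapping: Dict[str, List[str]], token_types: List[str]) -> None:
    # Models ALTER TEXT SEARCH CONFIGURATION ... DROP MAPPING FOR ...
    for token_type in token_types:
        del mapping[token_type]  # like the SQL command, fails if no mapping exists


def indexed_tokens(mapping: Dict[str, List[str]], tokens: List[Tuple[str, str]]) -> List[str]:
    return [tok.lower() for ttype, tok in tokens if ttype in mapping]


# Hypothetical parser output: (token type, token text) pairs.
tokens = [("asciiword", "Mail"), ("blank", " "), ("email", "foo@example.com")]
print(indexed_tokens(mapping, tokens))  # ['mail', 'foo@example.com']

drop_mapping(mapping, ["email", "url", "url_path", "sfloat", "float"])
print(indexed_tokens(mapping, tokens))  # ['mail']
```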
Now we can test our configuration:
SELECT * FROM ts_debug('public.pg', '
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software.
');
The next step is to set the session to use the new configuration, which was created in the public schema:
=> \dF
   List of text search configurations
 Schema  | Name | Description
---------+------+-------------
 public  | pg   |

SET default_text_search_config = 'public.pg';
SET

SHOW default_text_search_config;
 default_text_search_config
----------------------------
 public.pg