Postgresql 中文操作指南

8.13. XML Type #

xml 的数据类型可用于存储 XML 数据。它优于在 text 字段中存储 XML 数据,因为它会检查输入值是否格式正确,并且有一些支持函数可以对其执行类型安全操作;请参见 Section 9.15。使用此数据类型要求已用 configure --with-libxml 构建安装程序。

The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it; see Section 9.15. Use of this data type requires the installation to have been built with configure --with-libxml.

xml 类型可存储 XML 标准定义的格式良好的“文档”,以及“内容”片段,即通过引用 XQuery 和 XPath 数据模型的更宽松规范来定义的内容片段。从大致上说,这意味着内容片段可以有多个顶级元素或字符节点。表达式 _xmlvalue IS DOCUMENT_ 可用于评估特定 xml 值是完整文档还是仅是内容片段。

The xml type can store well-formed “documents”, as defined by the XML standard, as well as “content” fragments, which are defined by reference to the more permissive “document node” of the XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression _xmlvalue IS DOCUMENT_ can be used to evaluate whether a particular xml value is a full document or only a content fragment.

xml 的数据类型的限制和兼容性注释可以在 Section D.3 中找到。

Limits and compatibility notes for the xml data type can be found in Section D.3.

8.13.1. Creating XML Values #

要通过字符数据生成 xml 类型的值,请使用以下函数 xmlparse

To produce a value of type xml from character data, use the function xmlparse:

XMLPARSE ( { DOCUMENT | CONTENT } value)

示例:

Examples:

XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')

虽然这是根据 SQL 标准将字符串转换为 XML 值的唯一方法,但 PostgreSQL 特有的语法为:

While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:

xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml

也可以使用。

can also be used.

xml 类型不会针对文档类型声明(DTD)验证输入值,即使输入值指定 DTD 也是如此。目前也没有针对 XML 模式这类其他 XML 模式语言进行验证的内置支持。

The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.

反向运算(即通过 xml 制作字符字符串值)使用了 xmlserialize 函数:

The inverse operation, producing a character string value from xml, uses the function xmlserialize:

XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] )

type 可以是 charactercharacter varyingtext(或其中一个的别名)。同样,根据 SQL 标准,这是 xml 类型和字符类型间唯一转换方式,但 PostgreSQL 允许您简单地转换该值。

type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.

INDENT 选项会将结果作为漂亮格式输出,而 NO INDENT(这是默认选择)只会发出原始输入字符串。同样,转换到字符类型会产生原始字符串。

The INDENT option causes the result to be pretty-printed, while NO INDENT (which is the default) just emits the original input string. Casting to a character type likewise produces the original string.

当将字符字符串值转换为或转换自 xml 类型且没有分别经过 XMLPARSEXMLSERIALIZE 时,DOCUMENTCONTENT 的选择将由“XML 选项”会话配置参数决定,该参数可使用标准命令设置:

When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the “XML option” session configuration parameter, which can be set using the standard command:

SET XML OPTION { DOCUMENT | CONTENT };

或更类似 PostgreSQL 的语法

or the more PostgreSQL-like syntax

SET xmloption TO { DOCUMENT | CONTENT };

默认值为 CONTENT,因此允许所有形式的 XML 数据。

The default is CONTENT, so all forms of XML data are allowed.

8.13.2. Encoding Handling #

在处理客户端、服务器和通过它们的 XML 数据上的多个字符编码时,必须小心。使用文本模式将查询传递给服务器并将查询结果传递给客户端(这是正常模式)时,PostgreSQL 将客户端和服务器之间传递的所有字符数据以及反之亦然都转换为各个端的字符编码;请参见 Section 24.3。这包括 XML 值的字符串表示,例如以上示例中的值。通常,这意味着 XML 数据中包含的编码声明可能会在字符数据在客户端和服务器之间传输时转换为其他编码而变得无效,因为嵌入的编码声明不会改变。为了应对此行为,xml 所输入的字符中包含的编码声明的类型被 ignored,并且内容被假定为当前服务器编码。因此,为了正确处理,XML 数据的字符必须从客户端以当前客户端编码发送。客户端负责在将文档发送到服务器之前将其转换为当前客户端编码,或适当地调整客户端编码。在输出时,xml 类型的值不会有编码声明,并且客户端应假设所有数据都在当前客户端编码中。

Care must be taken when dealing with multiple character encodings on the client, server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server and vice versa to the character encoding of the respective end; see Section 24.3. This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data can become invalid as the character data is converted to other encodings while traveling between client and server, because the embedded encoding declaration is not changed. To cope with this behavior, encoding declarations contained in character strings presented for input to the xml type are ignored, and content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the responsibility of the client to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and clients should assume all data is in the current client encoding.

当使用二进制模式将查询参数传递到服务器并将查询结果传回客户端时,不会执行编码转换,因此情况不同。在这种情况下,将会识别 XML 数据中的编码声明,如果没有,则将假设数据采用 UTF-8 格式(XML 标准要求如此;请注意,PostgreSQL 不支持 UTF-16)。在输出中,数据将具有指定客户端编码的编码声明,但如果客户端编码是 UTF-8,则将被忽略。

When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.

毋庸置疑,如果 XML 数据编码、客户端编码和服务器编码相同,那么在 PostgreSQL 中处理 XML 数据出错的可能性更小,而且效率更高。由于 XML 数据在内部以 UTF-8 处理,因此当服务器编码也是 UTF-8 时,计算的效率最高。

Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.

Caution

某些与 XML 相关的函数在服务器编码不是 UTF-8 时可能完全无法处理非 ASCII 数据。众所周知,xmltable()xpath() 特别存在此问题。

Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.

8.13.3. Accessing XML Values #

xml 数据类型很特别,因为它不提供任何比较运算符。这是因为对于 XML 数据没有定义清晰且普遍有用的比较算法。这造成的一个后果是,您无法通过将 xml 列与搜索值进行比较来检索行。因此,XML 值通常应包含单独的主键字段,例如 ID。比较 XML 值的另一种解决方案是先将它们转换为字符字符串,但请注意,字符字符串比较与有用的 XML 比较方法几乎没有关系。

The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.

由于 xml 数据类型没有比较运算符,因此无法直接在该类型的一列上创建索引。如果需要在 XML 数据中进行快速搜索,可能的解决方法包括将表达式转换为字符字符串类型并对其编制索引,或索引 XPath 表达式。当然,实际查询必须调整为按索引表达式进行搜索。

Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.

PostgreSQL 中的文本搜索功能也可以用于加速 XML 数据的全文档搜索。但相关的前置处理支持尚不可用在 PostgreSQL 发行版中。

The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.