Postgresql 中文操作指南

F.50. xml2 — XPath querying and XSLT functionality #

xml2 模块提供 XPath 查询和 XSLT 功能。

The xml2 module provides XPath querying and XSLT functionality.

F.50.1. Deprecation Notice #

从 PostgreSQL 8.3 开始，核心服务器中提供了基于 SQL/XML 标准的 XML 相关功能。此功能涵盖了 XML 语法检查和 XPath 查询，这是此模块所执行的操作，但 API 完全不兼容。计划在未来版本的 PostgreSQL 中删除此模块，转而使用较新的标准 API，因此鼓励您尝试转换应用程序。如果您发现此模块的某些功能在较新的 API 中没有提供适当的形式，请向 < link:mailto:pgsql-hackers@lists.postgresql.org[pgsql-hackers@lists.postgresql.org]> 解释您的问题，以便解决不足之处。

From PostgreSQL 8.3 on, there is XML-related functionality based on the SQL/XML standard in the core server. That functionality covers XML syntax checking and XPath queries, which is what this module does, and more, but the API is not at all compatible. It is planned that this module will be removed in a future version of PostgreSQL in favor of the newer standard API, so you are encouraged to try converting your applications. If you find that some of the functionality of this module is not available in an adequate form with the newer API, please explain your issue to <link:mailto:pgsql-hackers@lists.postgresql.org[pgsql-hackers@lists.postgresql.org]> so that the deficiency can be addressed.

F.50.2. Description of Functions #

Table F.36 显示此模块提供的函数。这些函数提供了直接的 XML 解析和 XPath 查询。

Table F.36 shows the functions provided by this module. These functions provide straightforward XML parsing and XPath queries.

Table F.36. xml2 Functions

Function

Description

xml_valid ( document text ) → boolean

Parses the given document and returns true if the document is well-formed XML. (Note: this is an alias for the standard PostgreSQL function xml_is_well_formed(). The name xml_valid() is technically incorrect since validity and well-formedness have different meanings in XML.)

xpath_string ( document text, query text ) → text

Evaluates the XPath query on the supplied document, and casts the result to text.

xpath_number ( document text, query text ) → real

Evaluates the XPath query on the supplied document, and casts the result to real.

xpath_bool ( document text, query text ) → boolean

Evaluates the XPath query on the supplied document, and casts the result to boolean.

xpath_nodeset ( document text, query text, toptag text, itemtag text ) → text

Evaluates the query on the document and wraps the result in XML tags. If the result is multivalued, the output will look like: <toptag> <itemtag>Value 1 which could be an XML fragment</itemtag> <itemtag>Value 2….</itemtag> </toptag>

If either toptag or itemtag is an empty string, the relevant tag is omitted.

xpath_nodeset ( document text, query text, itemtag text ) → text

Like xpath_nodeset(document, query, toptag, itemtag) but result omits toptag.

xpath_nodeset ( document text, query text ) → text

Like xpath_nodeset(document, query, toptag, itemtag) but result omits both tags.

xpath_list ( document text, query text, separator text ) → text

Evaluates the query on the document and returns multiple values separated by the specified separator, for example Value 1,Value 2,Value 3 if separator is ,.

xpath_list ( document text, query text ) → text

This is a wrapper for the above function that uses , as the separator.

F.50.3. xpath_table #

xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record

xpath_table 是一个表函数，它针对一组文档中的每组文档评估一系列 XPath 查询，并将结果作为表返回。结果的第一列中返回原始文档表中的主键字段，以便结果集可以很容易地用于联接。参数将在 Table F.37 中描述。

xpath_table is a table function that evaluates a set of XPath queries on each of a set of documents and returns the results as a table. The primary key field from the original document table is returned as the first column of the result so that the result set can readily be used in joins. The parameters are described in Table F.37.

Table F.37. xpath_table Parameters

Parameter

Description

key

the name of the “key” field — this is just a field to be used as the first column of the output table, i.e., it identifies the record from which each output row came (see note below about multiple values)

document

the name of the field containing the XML document

relation

the name of the table or view containing the documents

xpaths

one or more XPath expressions, separated by _

criteria

除了 XPath 字符串以外，这些参数都刚代入一条简单的 SQL SELECT 语句中，这样便可以获得一些灵活性 — 语句是

These parameters (except the XPath strings) are just substituted into a plain SQL SELECT statement, so you have some flexibility — the statement is

SELECT <key>, <document> FROM <relation> WHERE <criteria>

因此这些参数可以是 anything，在那些特定位置中有效。此 SELECT 的结果需要返回恰好两列（除非尝试列出键或文档的多个字段，否则它将始终返回）。请注意，此简单方法要求验证任何由用户提供的数值，以避免 SQL 注入攻击。

so those parameters can be anything valid in those particular locations. The result from this SELECT needs to return exactly two columns (which it will unless you try to list multiple fields for key or document). Beware that this simplistic approach requires that you validate any user-supplied values to avoid SQL injection attacks.

必须在 FROM 表达式中使用此函数，并使用 AS 子句指定输出列；例如

The function has to be used in a FROM expression, with an AS clause to specify the output columns; for example

SELECT * FROM
xpath_table('article_id',
            'article_xml',
            'articles',
            '/article/author|/article/pages|/article/title',
            'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);

AS 子句定义输出表中列的名称和类型。第一列是“键”字段，其余列与 XPath 查询对应。如果 XPath 查询多于结果列，将忽略额外的查询。如果结果列多于 XPath 查询，额外的列将为 NULL。

The AS clause defines the names and types of the columns in the output table. The first is the “key” field and the rest correspond to the XPath queries. If there are more XPath queries than result columns, the extra queries will be ignored. If there are more result columns than XPath queries, the extra columns will be NULL.

请注意，此示例将 page_count 结果列定义为整数。此函数在内部以字符串表示形式进行处理，因此当您表示希望在输出中显示整数时，它将采用 XPath 结果的字符串表示形式，并使用 PostgreSQL 输入函数将其转换为整数（或 AS 子句请求的任何类型）。如果无法执行此操作（例如结果为空），将产生错误 — 因此，如果您认为数据有任何问题，可能希望仅仅将 text 保留为列类型。

Notice that this example defines the page_count result column as an integer. The function deals internally with string representations, so when you say you want an integer in the output, it will take the string representation of the XPath result and use PostgreSQL input functions to transform it into an integer (or whatever type the AS clause requests). An error will result if it can’t do this — for example if the result is empty — so you may wish to just stick to text as the column type if you think your data has any problems.

调用 SELECT 语句不一定要仅为 SELECT * — 它可以按名称引用输出列，或将它们连接到其他表。此函数生成虚拟表，您可以使用此表执行任何希望的操作（例如聚合、联结、排序等等）。因此，我们也可以有：

The calling SELECT statement doesn’t necessarily have to be just SELECT * — it can reference the output columns by name or join them to other tables. The function produces a virtual table with which you can perform any operation you wish (e.g., aggregation, joining, sorting etc.). So we could also have:

SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id', 'article_xml', 'articles',
                 '/article/title|/article/author/@id',
                 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
       AS t(article_id integer, title text, author_id integer),
     tblPeopleInfo AS p
WHERE t.author_id = p.person_id;

作为更复杂的示例。当然，您可以出于方便起见，将所有这些包装到一个视图中。

as a more complicated example. Of course, you could wrap all of this in a view for convenience.

F.50.3.1. Multivalued Results #

xpath_table 函数假定每个 XPath 查询的结果可能是多值的，因此此函数返回的行数可能与输入文档的数目不同。返回的第一行包含每个查询的第一个结果，第二行包含每个查询的第二个结果。如果其中一个查询比其他查询的值少，则会返回 null 值。

The xpath_table function assumes that the results of each XPath query might be multivalued, so the number of rows returned by the function may not be the same as the number of input documents. The first row returned contains the first result from each query, the second row the second result from each query. If one of the queries has fewer values than the others, null values will be returned instead.

在某些情况下，如果与可能会返回其他结果的 XPath 查询一并使用，用户会知道给定的 XPath 查询仅会返回一个结果（可能是一个唯一的文档标识符），那么该单个值的结果仅会显示在结果的第一行。解决这一问题的方法，是使用密钥字段与一个更简单的 XPath 查询连接。例如：

In some cases, a user will know that a given XPath query will return only a single result (perhaps a unique document identifier) — if used alongside an XPath query returning multiple results, the single-valued result will appear only on the first row of the result. The solution to this is to use the key field as part of a join against a simpler XPath query. As an example:

CREATE TABLE test (
    id int PRIMARY KEY,
    xml text
);

INSERT INTO test VALUES (1, '<doc num="C1">
<line num="L1"><a>1</a><b>2</b><c>3</c></line>
<line num="L2"><a>11</a><b>22</b><c>33</c></line>
</doc>');

INSERT INTO test VALUES (2, '<doc num="C2">
<line num="L1"><a>111</a><b>222</b><c>333</c></line>
<line num="L2"><a>111</a><b>222</b><c>333</c></line>
</doc>');

SELECT * FROM
  xpath_table('id','xml','test',
              '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
              'true')
  AS t(id int, doc_num varchar(10), line_num varchar(10), val1 int, val2 int, val3 int)
WHERE id = 1 ORDER BY doc_num, line_num

 id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
  1 | C1      | L1       |    1 |    2 |    3
  1 |         | L2       |   11 |   22 |   33

要在每行获取 doc_num，解决方案是使用 xpath_table 的两个调用并连接结果：

To get doc_num on every line, the solution is to use two invocations of xpath_table and join the results:

SELECT t.*,i.doc_num FROM
  xpath_table('id', 'xml', 'test',
              '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
              'true')
    AS t(id int, line_num varchar(10), val1 int, val2 int, val3 int),
  xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
    AS i(id int, doc_num varchar(10))
WHERE i.id=t.id AND i.id=1
ORDER BY doc_num, line_num;

 id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
  1 | L1       |    1 |    2 |    3 | C1
  1 | L2       |   11 |   22 |   33 | C1
(2 rows)

F.50.4. XSLT Functions #

如果已安装 libxslt，以下功能可用：

The following functions are available if libxslt is installed:

F.50.4.1. xslt_process #

xslt_process(text document, text stylesheet, text paramlist) returns text

该功能将 XSL 样式表应用于文档并返回转换后的结果。paramlist 是要用于转换中的参数分配列表，按 a=1,b=2 中的形式指定。请注意，参数解析非常简单：参数值不能包含逗号！

This function applies the XSL stylesheet to the document and returns the transformed result. The paramlist is a list of parameter assignments to be used in the transformation, specified in the form a=1,b=2. Note that the parameter parsing is very simple-minded: parameter values cannot contain commas!

xslt_process 也有一个不向转换传递任何参数的两个参数版本。

There is also a two-parameter version of xslt_process which does not pass any parameters to the transformation.

F.50.5. Author #

约翰·格雷 < link:mailto:jgray@azuli.co.uk[jgray@azuli.co.uk]>

John Gray <link:mailto:jgray@azuli.co.uk[jgray@azuli.co.uk]>

此模块的开发由 Torchbox Ltd.（www.torchbox.com）赞助。它具有与 PostgreSQL 相同的 BSD 许可证。

Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com). It has the same BSD license as PostgreSQL.