Postgresql 中文操作指南

41.3. Materialized Views #

PostgreSQL 中的物化视图像视图一样使用规则系统,但以表形式保存结果。两者之间的主要区别:

Materialized views in PostgreSQL use the rule system like views do, but persist the results in a table-like form. The main differences between:

CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM mytab;

及:

and:

CREATE TABLE mymatview AS SELECT * FROM mytab;

在于物化视图随后不能直接更新,且用于创建物化视图的查询与视图查询的存储方式完全相同,因此可以使用物化视图为其生成最新数据:

are that the materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view’s query is stored, so that fresh data can be generated for the materialized view with:

REFRESH MATERIALIZED VIEW mymatview;

PostgreSQL 系统目录中关于物化视图的信息与表或视图中的完全相同。所以对于解析器来说,物化视图就像表或视图一样的关系。在查询中引用物化视图时,数据会直接从该视图中返回,就像从表中返回一样;规则仅用于填充物化视图。

The information about a materialized view in the PostgreSQL system catalogs is exactly the same as it is for a table or view. So for the parser, a materialized view is a relation, just like a table or a view. When a materialized view is referenced in a query, the data is returned directly from the materialized view, like from a table; the rule is only used for populating the materialized view.

虽然访问存储在物化视图中的数据通常比直接或通过视图访问底层表快得多,但数据并不总是最新的;但有时也不需要最新数据。考虑一下记录销量的表:

While access to the data stored in a materialized view is often much faster than accessing the underlying tables directly or through a view, the data is not always current; yet sometimes current data is not needed. Consider a table which records sales:

CREATE TABLE invoice (
    invoice_no    integer        PRIMARY KEY,
    seller_no     integer,       -- ID of salesperson
    invoice_date  date,          -- date of sale
    invoice_amt   numeric(13,2)  -- amount of sale
);

如果人们希望能够快速绘制历史销售数据,他们可能会进行汇总,并且可能不在乎当前日期不完整的数据:

If people want to be able to quickly graph historical sales data, they might want to summarize, and they may not care about the incomplete data for the current date:

CREATE MATERIALIZED VIEW sales_summary AS
  SELECT
      seller_no,
      invoice_date,
      sum(invoice_amt)::numeric(13,2) as sales_amt
    FROM invoice
    WHERE invoice_date < CURRENT_DATE
    GROUP BY
      seller_no,
      invoice_date;

CREATE UNIQUE INDEX sales_summary_seller
  ON sales_summary (seller_no, invoice_date);

此物化视图可能对创建用于销售人员的仪表板的图表有用。可以使用以下 SQL 语句,计划在每天晚上更新统计信息:

This materialized view might be useful for displaying a graph in the dashboard created for salespeople. A job could be scheduled to update the statistics each night using this SQL statement:

REFRESH MATERIALIZED VIEW sales_summary;

物化视图的另一种用途是允许通过 foreign data wrapper 更快地访问从远程系统引入的数据。下面是一个使用 file_fdw 的示例,带有时序,但由于这是使用本地系统上的高速缓存,因此与访问远程系统相比,性能差异通常会比此处所示更大。请注意,我们也利用了为物化视图设置索引的能力,而 file_fdw 不支持索引;此优势可能不适用于其他类型的 foreign data 访问。

Another use for a materialized view is to allow faster access to data brought across from a remote system through a foreign data wrapper. A simple example using file_fdw is below, with timings, but since this is using cache on the local system the performance difference compared to access to a remote system would usually be greater than shown here. Notice we are also exploiting the ability to put an index on the materialized view, whereas file_fdw does not support indexes; this advantage might not apply for other sorts of foreign data access.

设置:

Setup:

CREATE EXTENSION file_fdw;
CREATE SERVER local_file FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE words (word text NOT NULL)
  SERVER local_file
  OPTIONS (filename '/usr/share/dict/words');
CREATE MATERIALIZED VIEW wrd AS SELECT * FROM words;
CREATE UNIQUE INDEX wrd_word ON wrd (word);
CREATE EXTENSION pg_trgm;
CREATE INDEX wrd_trgm ON wrd USING gist (word gist_trgm_ops);
VACUUM ANALYZE wrd;

现在,让我们对单词进行拼写检查。直接使用 file_fdw

Now let’s spell-check a word. Using file_fdw directly:

SELECT count(*) FROM words WHERE word = 'caterpiler';

 count
-------
     0
(1 row)

使用 EXPLAIN ANALYZE,我们看到:

With EXPLAIN ANALYZE, we see:

 Aggregate  (cost=21763.99..21764.00 rows=1 width=0) (actual time=188.180..188.181 rows=1 loops=1)
   ->  Foreign Scan on words  (cost=0.00..21761.41 rows=1032 width=0) (actual time=188.177..188.177 rows=0 loops=1)
         Filter: (word = 'caterpiler'::text)
         Rows Removed by Filter: 479829
         Foreign File: /usr/share/dict/words
         Foreign File Size: 4953699
 Planning time: 0.118 ms
 Execution time: 188.273 ms

如果使用物化视图,则查询速度会快很多:

If the materialized view is used instead, the query is much faster:

 Aggregate  (cost=4.44..4.45 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1)
   ->  Index Only Scan using wrd_word on wrd  (cost=0.42..4.44 rows=1 width=0) (actual time=0.039..0.039 rows=0 loops=1)
         Index Cond: (word = 'caterpiler'::text)
         Heap Fetches: 0
 Planning time: 0.164 ms
 Execution time: 0.117 ms

无论哪种方式,此单词拼写错误,所以让我们看看我们想要什么。再次使用 file_fdwpg_trgm

Either way, the word is spelled wrong, so let’s look for what we might have wanted. Again using file_fdw and pg_trgm:

SELECT word FROM words ORDER BY word <-> 'caterpiler' LIMIT 10;

     word
---------------
 cater
 caterpillar
 Caterpillar
 caterpillars
 caterpillar's
 Caterpillar's
 caterer
 caterer's
 caters
 catered
(10 rows)
 Limit  (cost=11583.61..11583.64 rows=10 width=32) (actual time=1431.591..1431.594 rows=10 loops=1)
   ->  Sort  (cost=11583.61..11804.76 rows=88459 width=32) (actual time=1431.589..1431.591 rows=10 loops=1)
         Sort Key: ((word <-> 'caterpiler'::text))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Foreign Scan on words  (cost=0.00..9672.05 rows=88459 width=32) (actual time=0.057..1286.455 rows=479829 loops=1)
               Foreign File: /usr/share/dict/words
               Foreign File Size: 4953699
 Planning time: 0.128 ms
 Execution time: 1431.679 ms

使用物化视图:

Using the materialized view:

 Limit  (cost=0.29..1.06 rows=10 width=10) (actual time=187.222..188.257 rows=10 loops=1)
   ->  Index Scan using wrd_trgm on wrd  (cost=0.29..37020.87 rows=479829 width=10) (actual time=187.219..188.252 rows=10 loops=1)
         Order By: (word <-> 'caterpiler'::text)
 Planning time: 0.196 ms
 Execution time: 198.640 ms

如果您能容忍将远程数据周期性更新到本地数据库,那么性能优势将非常明显。

If you can tolerate periodic update of the remote data to the local database, the performance benefit can be substantial.