Postgresql 中文操作指南
Synopsis
CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER ( option [, ...] ) table_name [ USING index_name ]
CLUSTER [VERBOSE]
where option can be one of:
VERBOSE [ boolean ]
Description
CLUSTER 指示 PostgreSQL 根据 index_name 指定的索引对 table_name 指定的表进行集群。索引必须已在 table_name 上定义。
CLUSTER instructs PostgreSQL to cluster the table specified by table_name based on the index specified by index_name. The index must already have been defined on table_name.
当一张表格被聚类时,它将根据索引信息进行物理重新排序。聚类是一次性操作:当表格随后被更新时,这些更改不会被聚类。也就是说,不会尝试根据索引顺序存储新行或更新行。(如果希望的话,可以通过再次发布该命令来定期重新聚类。此外,将表格的 fillfactor 存储参数设定为低于 100% 可以帮助在更新期间保留聚类顺序,因为如果那里有足够的空间,则更新的行将保留在同一页面上。)
When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. Also, setting the table’s fillfactor storage parameter to less than 100% can aid in preserving cluster ordering during updates, since updated rows are kept on the same page if enough space is available there.)
当一张表格被聚类时,PostgreSQL 会记住它是根据哪个索引进行聚类的。形式 CLUSTER _table_name_ 使用与之前相同的索引重新聚类表格。你还可以使用 ALTER TABLE 的 CLUSTER 或 SET WITHOUT CLUSTER 形式来设定要用于未来聚类操作的索引,或清除任何以前的设定。
When a table is clustered, PostgreSQL remembers which index it was clustered by. The form CLUSTER _table_name_ reclusters the table using the same index as before. You can also use the CLUSTER or SET WITHOUT CLUSTER forms of ALTER TABLE to set the index to be used for future cluster operations, or to clear any previous setting.
没有 table_name 的 CLUSTER 重新聚类当前数据库中由调用用户所有的所有先前聚类表格,或者在超级用户调用时重新聚类所有此类表格。此形式的 CLUSTER 无法在事务块内执行。
CLUSTER without a table_name reclusters all the previously-clustered tables in the current database that the calling user owns, or all such tables if called by a superuser. This form of CLUSTER cannot be executed inside a transaction block.
当一张表格正在被聚类时,一个 ACCESS EXCLUSIVE 锁将被获取。这将阻止任何其他数据库操作(包括读取和写入)在 CLUSTER 完成之前对表格进行操作。
When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.
Parameters
-
table_name
-
The name (possibly schema-qualified) of a table.
-
-
index_name
-
The name of an index.
-
-
VERBOSE
-
Prints a progress report as each table is clustered.
-
-
boolean
-
Specifies whether the selected option should be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed.
-
Notes
在你在一张表格中随机访问单行的情况下,表格中数据的实际顺序无关紧要。但是,如果你倾向于比其他数据更多地访问某些数据,并且有一个将这些数据分组在一起的索引,那么你将受益于使用 CLUSTER 。如果你正在请求一张表格的已索引值范围,或一个有多个匹配行的已索引值, CLUSTER 将有所帮助,因为一旦索引识别出与匹配的首行的表格页面,所有其他匹配行可能已经都在同一表格页面上了,因此你节省了磁盘访问,并且加快了查询速度。
In cases where you are accessing single rows randomly within a table, the actual order of the data in the table is unimportant. However, if you tend to access some data more than others, and there is an index that groups them together, you will benefit from using CLUSTER. If you are requesting a range of indexed values from a table, or a single indexed value that has multiple rows that match, CLUSTER will help because once the index identifies the table page for the first row that matches, all other rows that match are probably already on the same table page, and so you save disk accesses and speed up the query.
CLUSTER 可以使用指定索引上的索引扫描或(如果索引是一棵 b 树)顺序扫描后跟排序来重新对表格进行排序。它将尝试根据规划器成本参数和可用的统计信息选择更快捷的方法。
CLUSTER can re-sort the table using either an index scan on the specified index, or (if the index is a b-tree) a sequential scan followed by sorting. It will attempt to choose the method that will be faster, based on planner cost parameters and available statistical information.
在使用索引扫描时,将创建一个表格的临时副本,其中包含按索引顺序排列的表格数据。还将创建表格上每个索引的临时副本。因此,你需要至少等于表格大小和索引大小之和的磁盘可用空间。
When an index scan is used, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes.
在使用顺序扫描和排序时,还会创建一个临时排序文件,因此峰值临时空间需求可能高达表格大小的两倍,加上索引大小。此方法通常比索引扫描方法更快,但如果磁盘空间需求不可忍受,你可以通过将 enable_sort 暂时设定为 off 来禁用此选择。
When a sequential scan and sort is used, a temporary sort file is also created, so that the peak temporary space requirement is as much as double the table size, plus the index sizes. This method is often faster than the index scan method, but if the disk space requirement is intolerable, you can disable this choice by temporarily setting enable_sort to off.
建议在聚类前将 maintenance_work_mem 设定为一个相当大的值(但不得超过你能专用于 CLUSTER 操作的 RAM 量)。
It is advisable to set maintenance_work_mem to a reasonably large value (but not more than the amount of RAM you can dedicate to the CLUSTER operation) before clustering.
由于规划器记录有关表格排序的统计信息,因此建议对新聚类表格运行 ANALYZE 。否则,规划器可能会对查询计划做出错误的选择。
Because the planner records statistics about the ordering of tables, it is advisable to run ANALYZE on the newly clustered table. Otherwise, the planner might make poor choices of query plans.
由于 CLUSTER 记住哪些索引被聚类,因此可以手动首次对想要聚类的表格进行聚类,然后建立一个定期维护脚本来执行没有参数的 CLUSTER ,以便定期重新聚类所需的表格。
Because CLUSTER remembers which indexes are clustered, one can cluster the tables one wants clustered manually the first time, then set up a periodic maintenance script that executes CLUSTER without any parameters, so that the desired tables are periodically reclustered.
每个运行 CLUSTER 的后端都将在 pg_stat_progress_cluster 视图中报告其进度。有关详细信息,请参阅 Section 28.4.2 。
Each backend running CLUSTER will report its progress in the pg_stat_progress_cluster view. See Section 28.4.2 for details.
聚类分区表会使用指定分区化索引的分区来对它的每个分区进行聚类。在对分区表进行聚类时,该索引不可省略。分区表上的 CLUSTER 不可执行在事务块内。
Clustering a partitioned table clusters each of its partitions using the partition of the specified partitioned index. When clustering a partitioned table, the index may not be omitted. CLUSTER on a partitioned table cannot be executed inside a transaction block.
Examples
在索引 employees_ind 的基础上对表格 employees 进行聚类:
Cluster the table employees on the basis of its index employees_ind:
CLUSTER employees USING employees_ind;
使用之前使用的相同索引对 employees 表进行聚类:
Cluster the employees table using the same index that was used before:
CLUSTER employees;
对数据库中之前已经聚类过的所有表格进行聚类:
Cluster all tables in the database that have previously been clustered:
CLUSTER;
Compatibility
SQL 标准中没有 CLUSTER 语句。
There is no CLUSTER statement in the SQL standard.
语法
The syntax
CLUSTER index_name ON table_name
也受支持,以兼容于 8.3 之前的 PostgreSQL 版本。
is also supported for compatibility with pre-8.3 PostgreSQL versions.