Postgresql 中文操作指南
5.11. Table Partitioning #
PostgreSQL 支持基本表分区。本节介绍作为数据库设计一部分实现分区的理由和方法。
PostgreSQL supports basic table partitioning. This section describes why and how to implement partitioning as part of your database design.
5.11.1. Overview #
分区是指将一个逻辑上很大的表拆分为较小的物理部分。分区可以提供几个好处:
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
这些好处通常仅在表本来非常大的情况下才有用。表何时会受益于分区的确切点取决于应用程序,尽管经验法则是表的规模应超过数据库服务器的物理内存。
These benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
PostgreSQL 提供对以下分区形式的内置支持:
PostgreSQL offers built-in support for the following forms of partitioning:
-
Range Partitioning #
-
The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range’s bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition’s range is from 1 to 10, and the next one’s range is from 10 to 20, then value 10 belongs to the second partition not the first.
-
-
List Partitioning #
-
The table is partitioned by explicitly listing which key value(s) appear in each partition.
-
-
Hash Partitioning #
-
The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
-
如果您的应用程序需要使用上述未列出的其他分区形式,则可以使用诸如继承和 UNION ALL 视图等替代方法。这些方法提供了灵活性,但不会像内置声明性分区那样具有一些性能优势。
If your application needs to use other forms of partitioning not listed above, alternative methods such as inheritance and UNION ALL views can be used instead. Such methods offer flexibility but do not have some of the performance benefits of built-in declarative partitioning.
5.11.2. Declarative Partitioning #
PostgreSQL 允许您声明一个表已划分为分区。被划分的表被称为 partitioned table。声明包含如上所述的 partitioning method,加上用作 partition key 的列或表达式的列表。
PostgreSQL allows you to declare that a table is divided into partitions. The table that is divided is referred to as a partitioned table. The declaration includes the partitioning method as described above, plus a list of columns or expressions to be used as the partition key.
分区表本身是一个没有自己存储的“虚拟”表。相反,存储属于 partitions,它们是与分区表关联的普通表。每个分区都会存储由其 partition bounds 定义的数据子集。插入分区表中的所有行都将根据分区键列的值路由到适当的分区之一。更新行的分区键将导致其移动到不同的分区(如果它不再满足其原始分区的分区范围)。
The partitioned table itself is a “virtual” table having no storage of its own. Instead, the storage belongs to partitions, which are otherwise-ordinary tables associated with the partitioned table. Each partition stores a subset of the data as defined by its partition bounds. All rows inserted into a partitioned table will be routed to the appropriate one of the partitions based on the values of the partition key column(s). Updating the partition key of a row will cause it to be moved into a different partition if it no longer satisfies the partition bounds of its original partition.
分区本身可定义为分区表,导致 sub-partitioning 为结果。尽管所有分区必须与其所分区的父级拥有相同的列,但分区可以拥有自身索引、约束和默认值,与其他分区不同。有关创建分区表和分区的详细信息,请参阅 CREATE TABLE 。
Partitions may themselves be defined as partitioned tables, resulting in sub-partitioning. Although all partitions must have the same columns as their partitioned parent, partitions may have their own indexes, constraints and default values, distinct from those of other partitions. See CREATE TABLE for more details on creating partitioned tables and partitions.
无法将常规表变为分区表,反之亦然。但是,将现有常规表或分区表添加为分区表分区,或从分区表中移除分区并将其转换成独立表是可行的;这可以简化和加速许多维护进程。有关 ATTACH PARTITION 和 DETACH PARTITION 子命令的详细信息,请参阅 ALTER TABLE 。
It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to add an existing regular or partitioned table as a partition of a partitioned table, or remove a partition from a partitioned table turning it into a standalone table; this can simplify and speed up many maintenance processes. See ALTER TABLE to learn more about the ATTACH PARTITION and DETACH PARTITION sub-commands.
分区也可以是 foreign tables ,但需要极度小心,因为这意味着用户有责任确保外部表的内容满足分区规则。还有一些其他限制条件。有关详细信息,请参阅 CREATE FOREIGN TABLE 。
Partitions can also be foreign tables, although considerable care is needed because it is then the user’s responsibility that the contents of the foreign table satisfy the partitioning rule. There are some other restrictions as well. See CREATE FOREIGN TABLE for more information.
5.11.2.1. Example #
假设我们正在为一家大型冰淇淋公司构建数据库。该公司每天测量峰值温度和各个地区冰淇淋的销售情况。从概念上讲,我们希望获得以下表格:
Suppose we are constructing a database for a large ice cream company. The company measures peak temperatures every day as well as ice cream sales in each region. Conceptually, we want a table like:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
我们知道大多数查询都只会访问过去一周、一个月或一个季度的,因为此表的用途是为主管生成在线报告。为了减少需要存储的旧数据量,我们决定只保留最近3年的数据。在每月开始时,将会删除最旧月份的数据。在此情况下,我们可以利用分区来帮助我们满足衡量表的所有不同要求。
We know that most queries will access just the last week’s, month’s or quarter’s data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that needs to be stored, we decide to keep only the most recent 3 years worth of data. At the beginning of each month we will remove the oldest month’s data. In this situation we can use partitioning to help us meet all of our different requirements for the measurements table.
在本例中,如果要使用声明性分区,请使用以下步骤:
To use declarative partitioning in this case, use the following steps:
在上述示例中,我们每个月都会创建一个新分区,因此最好编写一个脚本自动生成所需的 DDL。
In the above example we would be creating a new partition each month, so it might be wise to write a script that generates the required DDL automatically.
5.11.2.2. Partition Maintenance #
通常,在最初定义表时建立的分区集合并不打算保持静态。通常,要删除保存旧数据的和定期为新数据添加新分区。分区的其中一个最重要的优点就是它允许通过操作分区结构而不是实际移动大量数据来近乎立即执行此项原本十分繁琐的任务。
Normally the set of partitions established when initially defining the table is not intended to remain static. It is common to want to remove partitions holding old data and periodically add new partitions for new data. One of the most important advantages of partitioning is precisely that it allows this otherwise painful task to be executed nearly instantaneously by manipulating the partition structure, rather than physically moving large amounts of data around.
删除旧数据的最简单选项是删除不再必要的分区:
The simplest option for removing old data is to drop the partition that is no longer necessary:
DROP TABLE measurement_y2006m02;
这可以非常快速地删除数百万条记录,因为它不必逐个删除每条记录。但请注意,上述命令需要对父表设置 ACCESS EXCLUSIVE 锁。
This can very quickly delete millions of records because it doesn’t have to individually delete every record. Note however that the above command requires taking an ACCESS EXCLUSIVE lock on the parent table.
另一个通常较好的选项是将分区从分区表中删除,但仍可以作为其自己的表进行访问。这有两种形式:
Another option that is often preferable is to remove the partition from the partitioned table but retain access to it as a table in its own right. This has two forms:
ALTER TABLE measurement DETACH PARTITION measurement_y2006m02;
ALTER TABLE measurement DETACH PARTITION measurement_y2006m02 CONCURRENTLY;
这些功能允许在删除数据之前对数据执行进一步操作。例如,这通常是使用 COPY 、pg_dump 或类似工具备份数据的有用时机。这可能还是将数据聚合成较小格式、执行其他数据操作或运行报告的有用时机。命令的第一种形式要求对父表进行 ACCESS EXCLUSIVE 锁定。在第二种形式中,添加 CONCURRENTLY 限定符允许分离操作仅要求对父表进行 SHARE UPDATE EXCLUSIVE 锁定,但有关限制条件的详细信息,请参阅 ALTER TABLE … DETACH PARTITION 。
These allow further operations to be performed on the data before it is dropped. For example, this is often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports. The first form of the command requires an ACCESS EXCLUSIVE lock on the parent table. Adding the CONCURRENTLY qualifier as in the second form allows the detach operation to require only SHARE UPDATE EXCLUSIVE lock on the parent table, but see ALTER TABLE … DETACH PARTITION for details on the restrictions.
同样,我们可以添加新分区来处理新数据。我们可以像上面创建原始分区一样,在分区表中创建一个空分区:
Similarly we can add a new partition to handle new data. We can create an empty partition in the partitioned table just as the original partitions were created above:
CREATE TABLE measurement_y2008m02 PARTITION OF measurement
FOR VALUES FROM ('2008-02-01') TO ('2008-03-01')
TABLESPACE fasttablespace;
作为替代,有时在分区结构外部创建一个新表会更方便,稍后将其附加为一个分区。这允许在添加到分区表中之前对新数据进行加载、检查和转换。而且, ATTACH PARTITION 操作只需对分区表设置 SHARE UPDATE EXCLUSIVE 锁,而不是 CREATE TABLE … PARTITION OF 所需的 ACCESS EXCLUSIVE 锁,因此它更有利于对分区表的并发操作。 CREATE TABLE … LIKE 选项有助于避免繁冗地重复父表的定义:
As an alternative, it is sometimes more convenient to create the new table outside the partition structure, and attach it as a partition later. This allows new data to be loaded, checked, and transformed prior to it appearing in the partitioned table. Moreover, the ATTACH PARTITION operation requires only SHARE UPDATE EXCLUSIVE lock on the partitioned table, as opposed to the ACCESS EXCLUSIVE lock that is required by CREATE TABLE … PARTITION OF, so it is more friendly to concurrent operations on the partitioned table. The CREATE TABLE … LIKE option is helpful to avoid tediously repeating the parent table’s definition:
CREATE TABLE measurement_y2008m02
(LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS)
TABLESPACE fasttablespace;
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
FOR VALUES FROM ('2008-02-01') TO ('2008-03-01' );
在运行 ATTACH PARTITION 命令之前,建议按照上面展示的那样,对要附加的表创建 CHECK 约束,使其与预期的分区约束相匹配。这样,系统将能够跳过扫描,而扫描本来是验证隐式分区约束所必需的。如果没有 CHECK 约束,系统将在持有该分区上的 ACCESS EXCLUSIVE 锁的同时扫描该表以验证分区约束。建议在 ATTACH PARTITION 完成后,删除现在多余的 CHECK 约束。如果要附加的表本身是一个分区表,则将递归锁定并扫描其每个子分区,直到遇到合适的 CHECK 约束或到达叶分区。
Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached that matches the expected partition constraint, as illustrated above. That way, the system will be able to skip the scan which is otherwise needed to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on that partition. It is recommended to drop the now-redundant CHECK constraint after the ATTACH PARTITION is complete. If the table being attached is itself a partitioned table, then each of its sub-partitions will be recursively locked and scanned until either a suitable CHECK constraint is encountered or the leaf partitions are reached.
类似地,如果分区表有 DEFAULT 分区,建议创建一个 CHECK 约束,该约束不包括要附加的分区的约束。如果不这样做,将扫描 DEFAULT 分区以验证它不包含应位于正附加的分区中的任何记录。此操作将在持有 DEFAULT 分区上的 ACCESS EXCLUSIVE 锁时执行。如果 DEFAULT 分区本身是一个分区表,则将以上述方式递归检查其每个分区,就像要附加的表一样。
Similarly, if the partitioned table has a DEFAULT partition, it is recommended to create a CHECK constraint which excludes the to-be-attached partition’s constraint. If this is not done then the DEFAULT partition will be scanned to verify that it contains no records which should be located in the partition being attached. This operation will be performed whilst holding an ACCESS EXCLUSIVE lock on the DEFAULT partition. If the DEFAULT partition is itself a partitioned table, then each of its partitions will be recursively checked in the same way as the table being attached, as mentioned above.
如上所述,可以对分区表创建索引,以便将它们自动应用到整个层次结构中。这非常方便,因为现有的分区不仅会变得索引化,而且将来创建的任何分区也会索引化。一个限制是,在创建此类分区索引时不能使用 CONCURRENTLY 限定符。为了避免长时间锁定,可以使用 CREATE INDEX ON ONLY 分区表;此索引被标记为无效,而且索引不会自动应用于这些分区。可以使用 CONCURRENTLY 分别创建分区上的索引,然后使用 ALTER INDEX .. ATTACH PARTITION 将其 attached 到父索引上。一旦所有分区的索引附加到父索引后,父索引会自动被标记为有效。示例:
As explained above, it is possible to create indexes on partitioned tables so that they are applied automatically to the entire hierarchy. This is very convenient, as not only will the existing partitions become indexed, but also any partitions that are created in the future will. One limitation is that it’s not possible to use the CONCURRENTLY qualifier when creating such a partitioned index. To avoid long lock times, it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index is marked invalid, and the partitions do not get the index applied automatically. The indexes on partitions can be created individually using CONCURRENTLY, and then attached to the index on the parent using ALTER INDEX .. ATTACH PARTITION. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically. Example:
CREATE INDEX measurement_usls_idx ON ONLY measurement (unitsales);
CREATE INDEX CONCURRENTLY measurement_usls_200602_idx
ON measurement_y2006m02 (unitsales);
ALTER INDEX measurement_usls_idx
ATTACH PARTITION measurement_usls_200602_idx;
...
此技术也可以与 UNIQUE 和 PRIMARY KEY 约束一起使用;在创建约束时会隐式创建这些索引。示例:
This technique can be used with UNIQUE and PRIMARY KEY constraints too; the indexes are created implicitly when the constraint is created. Example:
ALTER TABLE ONLY measurement ADD UNIQUE (city_id, logdate);
ALTER TABLE measurement_y2006m02 ADD UNIQUE (city_id, logdate);
ALTER INDEX measurement_city_id_logdate_key
ATTACH PARTITION measurement_y2006m02_city_id_logdate_key;
...
5.11.2.3. Limitations #
以下限制适用于分区表:
The following limitations apply to partitioned tables:
单独的分区使用幕后的继承链接到它们的分区表。但是,如以下所述,不可能将继承的所有通用功能与声明式分区表或其分区一起使用。值得注意的是,分区不能有除它所属分区表以外的其他父母表,并且表也不能同时从分区表和常规表中继承。这意味着分区表及其分区永远不会与常规表共享继承层次结构。
Individual partitions are linked to their partitioned table using inheritance behind-the-scenes. However, it is not possible to use all of the generic features of inheritance with declaratively partitioned tables or their partitions, as discussed below. Notably, a partition cannot have any parents other than the partitioned table it is a partition of, nor can a table inherit from both a partitioned table and a regular table. That means partitioned tables and their partitions never share an inheritance hierarchy with regular tables.
由于由分区表及其分区组成的分区层次结构仍然是继承层次结构,因此 tableoid 和继承的所有一般规则均适用,如 Section 5.10 所述,但有一些例外:
Since a partition hierarchy consisting of the partitioned table and its partitions is still an inheritance hierarchy, tableoid and all the normal rules of inheritance apply as described in Section 5.10, with a few exceptions:
5.11.3. Partitioning Using Inheritance #
虽然内置的声明式分区适用于大多数常见用例,但有些情况下更灵活的方法可能有用。可以使用表继承来实现分区,这允许声明式分区不支持的几个功能,例如:
While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible approach may be useful. Partitioning can be implemented using table inheritance, which allows for several features not supported by declarative partitioning, such as:
5.11.3.1. Example #
此示例构建了一个与上述声明式分区示例等效的分区结构。按照以下步骤操作:
This example builds a partitioning structure equivalent to the declarative partitioning example above. Use the following steps:
Note
实际上,如果大多数插入进入该子项,则最好先检查最新的子项。为简单起见,我们已按照本示例其他部分中所示的顺序显示触发器的测试。
In practice, it might be best to check the newest child first, if most inserts go into that child. For simplicity, we have shown the trigger’s tests in the same order as in other parts of this example.
正如我们所见,一个复杂表层次结构可能需要相当数量的 DDL。在上述示例中,我们每个月都会创建一个新的子表,因此最好编写一个自动生成所需 DDL 的脚本。
As we can see, a complex table hierarchy could require a substantial amount of DDL. In the above example we would be creating a new child table each month, so it might be wise to write a script that generates the required DDL automatically.
5.11.3.2. Maintenance for Inheritance Partitioning #
要快速删除旧数据,只需删除不再需要的子表:
To remove old data quickly, simply drop the child table that is no longer necessary:
DROP TABLE measurement_y2006m02;
要从继承层次结构表中删除子表,但仍保留对其作为单独表的访问权限:
To remove the child table from the inheritance hierarchy table but retain access to it as a table in its own right:
ALTER TABLE measurement_y2006m02 NO INHERIT measurement;
要添加一个新子表来处理新数据,请创建空的子表,就像上面创建原始子表一样:
To add a new child table to handle new data, create an empty child table just as the original children were created above:
CREATE TABLE measurement_y2008m02 (
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' )
) INHERITS (measurement);
或者,可能希望在将新子表添加到表层次结构中之前创建并填充它。这允许在向父表上的查询显示之前加载、检查和转换数据。
Alternatively, one may want to create and populate the new child table before adding it to the table hierarchy. This could allow data to be loaded, checked, and transformed before being made visible to queries on the parent table.
CREATE TABLE measurement_y2008m02
(LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02
CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' );
\copy measurement_y2008m02 from 'measurement_y2008m02'
-- possibly some other data preparation work
ALTER TABLE measurement_y2008m02 INHERIT measurement;
5.11.3.3. Caveats #
以下警告适用于使用继承实现的分区:
The following caveats apply to partitioning implemented using inheritance:
5.11.4. Partition Pruning #
Partition pruning 是一种查询优化技术,它可以提高声明式分区表的性能。例如:
Partition pruning is a query optimization technique that improves performance for declaratively partitioned tables. As an example:
SET enable_partition_pruning = on; -- the default
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
如果未进行分区剪裁,上述查询将扫描 measurement 表的每个分区。启用了分区剪裁时,计划程序将检查每个分区的定义,并证明不需要扫描分区,因为它不可能包含满足查询的 WHERE 子句的任何行。当计划程序能够证明这一点时,它会将分区从查询计划中排除 (prunes)。
Without partition pruning, the above query would scan each of the partitions of the measurement table. With partition pruning enabled, the planner will examine the definition of each partition and prove that the partition need not be scanned because it could not contain any rows meeting the query’s WHERE clause. When the planner can prove this, it excludes (prunes) the partition from the query plan.
通过使用 EXPLAIN 命令和 enable_partition_pruning 配置参数,可以显示已修剪分区与未修剪分区的计划之间的区别。对于此类表设置,典型的未优化计划为:
By using the EXPLAIN command and the enable_partition_pruning configuration parameter, it’s possible to show the difference between a plan for which partitions have been pruned and one for which they have not. A typical unoptimized plan for this type of table setup is:
SET enable_partition_pruning = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate (cost=188.76..188.77 rows=1 width=8)
-> Append (cost=0.00..181.05 rows=3085 width=0)
-> Seq Scan on measurement_y2006m02 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
-> Seq Scan on measurement_y2006m03 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
...
-> Seq Scan on measurement_y2007m11 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
-> Seq Scan on measurement_y2007m12 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
一些或所有分区可能使用索引扫描而不是全表顺序扫描,但这里的重点是根本不需要扫描较旧的分区来回答此查询。启用分区剪裁时,我们将获得一个更便宜的计划,该计划将提供相同的答案:
Some or all of the partitions might use index scans instead of full-table sequential scans, but the point here is that there is no need to scan the older partitions at all to answer this query. When we enable partition pruning, we get a significantly cheaper plan that will deliver the same answer:
SET enable_partition_pruning = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate (cost=37.75..37.76 rows=1 width=8)
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
Filter: (logdate >= '2008-01-01'::date)
请注意,分区剪裁仅由分区键明确定义的约束驱动,而非由索引的存在驱动。因此不必对键列定义索引。是否需要为给定分区创建索引取决于您是否期望扫描分区的查询一般扫描分区的很大一部分还是仅扫描很小的一部分。在后一种情况下索引将有用,而在前一种情况下则无用。
Note that partition pruning is driven only by the constraints defined implicitly by the partition keys, not by the presence of indexes. Therefore it isn’t necessary to define indexes on the key columns. Whether an index needs to be created for a given partition depends on whether you expect that queries that scan the partition will generally scan a large part of the partition or just a small part. An index will be helpful in the latter case but not the former.
不仅在给定查询的计划阶段,分区剪裁还可以在查询执行期间执行。这很有用,因为它可以在子句包含值在查询计划时未知的表达式的分区剪裁时,剪裁更多分区,例如在 PREPARE 语句中定义的参数、使用从子查询获取的值,或在嵌套循环联接的内部使用参数化的值。执行过程中的分区剪裁可以在下列任何时间执行:
Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time, for example, parameters defined in a PREPARE statement, using a value obtained from a subquery, or using a parameterized value on the inner side of a nested loop join. Partition pruning during execution can be performed at any of the following times:
可以使用 enable_partition_pruning 设置禁用分区修剪。
Partition pruning can be disabled using the enable_partition_pruning setting.
5.11.5. Partitioning and Constraint Exclusion #
Constraint exclusion 是一种与分区剪裁类似的查询优化技术。虽然它主要用于通过旧继承方法实现的分区,但它也可用于其他目的,包括声明式分区。
Constraint exclusion is a query optimization technique similar to partition pruning. While it is primarily used for partitioning implemented using the legacy inheritance method, it can be used for other purposes, including with declarative partitioning.
除外约束的工作方式与分区剪裁极其相似,不同之处在于,它使用每张表 CHECK 约束(因此才得名),而分区剪裁使用表的表分区边界,而表分区边界仅出现在声明式分区的情况中。另一个不同之处在于,除外约束仅在计划时应用;不会尝试在执行时移除分区。
Constraint exclusion works in a very similar way to partition pruning, except that it uses each table’s CHECK constraints — which gives it its name — whereas partition pruning uses the table’s partition bounds, which exist only in the case of declarative partitioning. Another difference is that constraint exclusion is only applied at plan time; there is no attempt to remove partitions at execution time.
由于除外约束使用 CHECK 约束,这使得它与分区剪裁相比速度较慢,有时可以将其用作一项优势:除内部分区边界外,由于可以在声明式分区表上定义约束,除外约束可能能够从查询计划中删去其他分区。
The fact that constraint exclusion uses CHECK constraints, which makes it slow compared to partition pruning, can sometimes be used as an advantage: because constraints can be defined even on declaratively-partitioned tables, in addition to their internal partition bounds, constraint exclusion may be able to elide additional partitions from the query plan.
constraint_exclusion 的默认(建议的)设置既不是 on 也不是 off,而是一个称为 partition 的中间设置,该设置只对可能在继承分区表上工作的查询应用该技术。on 设置使计划程序检查所有查询中的 CHECK 约束,即使是那些不太可能受益的简单查询也是如此。
The default (and recommended) setting of constraint_exclusion is neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on inheritance partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.
除外约束适用于以下注意事项:
The following caveats apply to constraint exclusion:
5.11.6. Best Practices for Declarative Partitioning #
应仔细选择对表分区的划分方式,因为查询计划和执行的性能可能会受到不良设计的不利影响。
The choice of how to partition a table should be made carefully, as the performance of query planning and execution can be negatively affected by poor design.
最关键的设计决策之一将是根据其对数据分区的列或列集。通常最好的选择是根据最常出现于对分区表执行的查询 WHERE 子句中的列或列集对分区进行分区。与分区边界约束兼容的 WHERE 子句可用于剪裁不必要的分区。但是,您可能被迫通过 PRIMARY KEY 或 UNIQUE 约束要求做出其他决策。计划分区策略时删除不需要的数据也是一个需要考虑的因素。可以相当快地分离整个分区,所以最好以一种方式设计分区策略以便要一次移除的所有数据都位于单个分区中。
One of the most critical design decisions will be the column or columns by which you partition your data. Often the best choice will be to partition by the column or set of columns which most commonly appear in WHERE clauses of queries being executed on the partitioned table. WHERE clauses that are compatible with the partition bound constraints can be used to prune unneeded partitions. However, you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE constraint. Removal of unwanted data is also a factor to consider when planning your partitioning strategy. An entire partition can be detached fairly quickly, so it may be beneficial to design the partition strategy in such a way that all data to be removed at once is located in a single partition.
选择应对表进行划分的目标分区数也是一个至关重要的决策。分区不够可能有意味着索引仍然太大,数据局部性也同样很差,可能导致低缓存命中率。然而,将表划分为过多分区也会导致问题。过多分区可能有意味着查询计划时间过长,并且查询计划和执行过程中的内存消耗更高,如下进一步描述。在选择对表分区的方法时,考虑将来可能发生的更改也很重要。例如,如果您选择为每个客户分配一个分区,并且目前的大客户数量较少,请考虑如果几年后发现自己拥有大量的客户会产生的影响。在这种情况下,最好选择按 HASH 分区,并选择合理数量的分区,而不是尝试按 LIST 分区并希望客户数量不会增加到超出实际数据分区能力的程度。
Choosing the target number of partitions that the table should be divided into is also a critical decision to make. Not having enough partitions may mean that indexes remain too large and that data locality remains poor which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both query planning and execution, as further described below. When choosing how to partition your table, it’s also important to consider what changes may occur in the future. For example, if you choose to have one partition per customer and you currently have a small number of large customers, consider the implications if in several years you instead find yourself with a large number of small customers. In this case, it may be better to choose to partition by HASH and choose a reasonable number of partitions rather than trying to partition by LIST and hoping that the number of customers does not increase beyond what it is practical to partition the data by.
子分区对于进一步划分预期大于其他分区的那些分区很有用。另一个选择是将多个列与分区键一起用于范围分区。其中任何一种都容易导致过多的分区,所以建议保持克制。
Sub-partitioning can be useful to further divide partitions that are expected to become larger than other partitions. Another option is to use range partitioning with multiple columns in the partition key. Either of these can easily lead to excessive numbers of partitions, so restraint is advisable.
在查询计划和执行过程中考虑分区开销很重要。一般来说,查询计划器能够很好地处理最多几千个分区的分区层次结构,假设典型查询允许查询计划器剪裁所有分区,但留下少量分区。计划时间过长并且在计划器执行分区剪裁后仍保留更多分区时内存消耗也会更高。需要关注分区数量的一个原因在于,服务器的内存消耗可能会随着时间的推移而大幅增长,特别是在许多会话触及大量分区时。这是因为每个分区都需要将它的元数据加载到触及它的每个会话的本地内存中。
It is important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning. Another reason to be concerned about having a large number of partitions is that the server’s memory consumption may grow significantly over time, especially if many sessions touch large numbers of partitions. That’s because each partition requires its metadata to be loaded into the local memory of each session that touches it.
对于数据仓库类型的工作负载而言,使用比 OLTP 类型的工作负载更多数量的分区可能是合理的。一般而言,在数据仓库中,查询计划时间的影响较小,因为大多数处理时间是在查询执行过程中花费的。对于这两种类型的任何工作负载来说,尽早做出正确决策很重要,因为重新分区大量数据可能会非常缓慢。对预期工作负载进行模拟通常有利于优化分区策略。切勿仅仅假设更多分区比更少分区更好,反之亦然。
With data warehouse type workloads, it can make sense to use a larger number of partitions than with an OLTP type workload. Generally, in data warehouses, query planning time is less of a concern as the majority of processing time is spent during query execution. With either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial for optimizing the partitioning strategy. Never just assume that more partitions are better than fewer partitions, nor vice-versa.