Postgresql 中文操作指南

F.2. amcheck — tools to verify table and index consistency #

amcheck 模块提供了允许验证关系结构逻辑一致性的函数。

The amcheck module provides functions that allow you to verify the logical consistency of the structure of relations.

B 树检查函数验证了特定关系的表示结构中的各种 invariants。索引扫描和其他重要操作后访问方法函数的正确性依赖于始终维持这些不变量。例如,某些函数验证的所有内容中包括,所有 B 树页面按“逻辑”顺序包含项目(例如,对于 text 上的 B 树索引,索引元组应按对照词法顺序排列)。如果该特定不变量以某种方式无法维持,我们可以预期对受影响页面的二进制搜索会错误地引导索引扫描,从而导致 SQL 查询错误的答案。如果结构显示有效,则不引发错误。

The B-Tree checking functions verify various invariants in the structure of the representation of particular relations. The correctness of the access method functions behind index scans and other important operations relies on these invariants always holding. For example, certain functions verify, among other things, that all B-Tree pages have items in “logical” order (e.g., for B-Tree indexes on text, index tuples should be in collated lexical order). If that particular invariant somehow fails to hold, we can expect binary searches on the affected page to incorrectly guide index scans, resulting in wrong answers to SQL queries. If the structure appears to be valid, no error is raised.

验证使用与索引扫描本身使用的相同过程执行,该过程可以是用户定义的操作员类代码。例如,B 树索引验证依赖于与一个或多个 B 树支持函数 1 例程进行比较。有关操作员类支持函数的详细信息,请参见 Section 38.16.3

Verification is performed using the same procedures as those used by index scans themselves, which may be user-defined operator class code. For example, B-Tree index verification relies on comparisons made with one or more B-Tree support function 1 routines. See Section 38.16.3 for details of operator class support functions.

与通过引发错误来报告损坏的 B 树检查函数不同,堆检查函数 verify_heapam 检查一个表并尝试返回一组行,每检测到一个损坏返回一行。尽管如此,如果 verify_heapam 依赖的设施自身已损坏,则该函数可能无法继续,并且可能引发错误。

Unlike the B-Tree checking functions which report corruption by raising errors, the heap checking function verify_heapam checks a table and attempts to return a set of rows, one row per corruption detected. Despite this, if facilities that verify_heapam relies upon are themselves corrupted, the function may be unable to continue and may instead raise an error.

可以向非超级用户授予执行 amcheck 函数的权限,但在授予此类权限之前,应仔细考虑数据安全和隐私问题。尽管这些函数生成的损坏报告主要关注损坏数据的结构和发现的损坏性质,但少于关注损坏数据的内容,某个获得执行这些函数的权限的攻击者,特别是如果攻击者还可以引发损坏,可能能够从这样的消息中推断出数据本身的部分信息。

Permission to execute amcheck functions may be granted to non-superusers, but before granting such permissions careful consideration should be given to data security and privacy concerns. Although the corruption reports generated by these functions do not focus on the contents of the corrupted data so much as on the structure of that data and the nature of the corruptions found, an attacker who gains permission to execute these functions, particularly if the attacker can also induce corruption, might be able to infer something of the data itself from such messages.

F.2.1. Functions #

  • bt_index_check(index regclass, heapallindexed boolean) returns void

    • bt_index_check tests that its target, a B-Tree index, respects a variety of invariants. Example usage:

test=# SELECT bt_index_check(index => c.oid, heapallindexed => i.indisunique),
               c.relname,
               c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
 bt_index_check |             relname             | relpages
----------------+---------------------------------+----------
                | pg_depend_reference_index       |       43
                | pg_depend_depender_index        |       40
                | pg_proc_proname_args_nsp_index  |       31
                | pg_description_o_c_o_index      |       21
                | pg_attribute_relid_attnam_index |       14
                | pg_proc_oid_index               |       10
                | pg_attribute_relid_attnum_index |        9
                | pg_amproc_fam_proc_index        |        5
                | pg_amop_opr_fam_index           |        5
                | pg_amop_fam_strat_index         |        5
(10 rows)
  • This example shows a session that performs verification of the 10 largest catalog indexes in the database “test”. Verification of the presence of heap tuples as index tuples is requested for the subset that are unique indexes. Since no error is raised, all indexes tested appear to be logically consistent. Naturally, this query could easily be changed to call bt_index_check for every index in the database where verification is supported.

  • bt_index_check acquires an AccessShareLock on the target index and the heap relation it belongs to. This lock mode is the same lock mode acquired on relations by simple SELECT statements. bt_index_check does not verify invariants that span child/parent relationships, but will verify the presence of all heap tuples as index tuples within the index when heapallindexed is true. When a routine, lightweight test for corruption is required in a live production environment, using bt_index_check often provides the best trade-off between thoroughness of verification and limiting the impact on application performance and availability.

    • bt_index_parent_check(index regclass, heapallindexed boolean, rootdescend boolean) returns void

  • bt_index_parent_check tests that its target, a B-Tree index, respects a variety of invariants. Optionally, when the heapallindexed argument is true, the function verifies the presence of all heap tuples that should be found within the index. When the optional rootdescend argument is true, verification re-finds tuples on the leaf level by performing a new search from the root page for each tuple. The checks that can be performed by bt_index_parent_check are a superset of the checks that can be performed by bt_index_check. bt_index_parent_check can be thought of as a more thorough variant of bt_index_check: unlike bt_index_check, bt_index_parent_check also checks invariants that span parent/child relationships, including checking that there are no missing downlinks in the index structure. bt_index_parent_check follows the general convention of raising an error if it finds a logical inconsistency or other problem.

  • A ShareLock is required on the target index by bt_index_parent_check (a ShareLock is also acquired on the heap relation). These locks prevent concurrent data modification from INSERT, UPDATE, and DELETE commands. The locks also prevent the underlying relation from being concurrently processed by VACUUM, as well as all other utility commands. Note that the function holds locks only while running, not for the entire transaction.

  • bt_index_parent_check's additional verification is more likely to detect various pathological cases. These cases may involve an incorrectly implemented B-Tree operator class used by the index that is checked, or, hypothetically, undiscovered bugs in the underlying B-Tree index access method code. Note that bt_index_parent_check cannot be used when hot standby mode is enabled (i.e., on read-only physical replicas), unlike bt_index_check.

Tip

bt_index_checkbt_index_parent_check 都在 DEBUG1DEBUG2 严重级别输出关于验证过程的日志消息。这些消息提供关于验证过程的详细信息,对 PostgreSQL 开发者来说可能很有趣。高级用户也可能发现此信息有帮助,因为它在验证确实检测到不一致时提供了额外的上下文。运行:

bt_index_check and bt_index_parent_check both output log messages about the verification process at DEBUG1 and DEBUG2 severity levels. These messages provide detailed information about the verification process that may be of interest to PostgreSQL developers. Advanced users may also find this information helpful, since it provides additional context should verification actually detect an inconsistency. Running:

SET client_min_messages = DEBUG1;

在运行验证查询之前在交互式 psql 会话中,将以一个可管理的详细程度显示关于验证进度的消息。

in an interactive psql session before running a verification query will display messages about the progress of verification with a manageable level of detail.

  • verify_heapam(relation regclass, on_error_stop boolean, check_toast boolean, skip text, startblock bigint, endblock bigint, blkno OUT bigint, offnum OUT integer, attnum OUT integer, msg OUT text) returns setof record

    • Checks a table, sequence, or materialized view for structural corruption, where pages in the relation contain data that is invalidly formatted, and for logical corruption, where pages are structurally valid but inconsistent with the rest of the database cluster.

    • The following optional arguments are recognized:

    • For each corruption detected, verify_heapam returns a row with the following columns:

  • on_error_stop

    • If true, corruption checking stops at the end of the first block in which any corruptions are found.

    • Defaults to false.

  • check_toast

    • If true, toasted values are checked against the target relation’s TOAST table.

    • This option is known to be slow. Also, if the toast table or its index is corrupt, checking it against toast values could conceivably crash the server, although in many cases this would just produce an error.

    • Defaults to false.

  • skip

    • If not none, corruption checking skips blocks that are marked as all-visible or all-frozen, as specified. Valid options are all-visible, all-frozen and none.

    • Defaults to none.

  • startblock

    • If specified, corruption checking begins at the specified block, skipping all previous blocks. It is an error to specify a startblock outside the range of blocks in the target table.

    • By default, checking begins at the first block.

  • endblock

    • If specified, corruption checking ends at the specified block, skipping all remaining blocks. It is an error to specify an endblock outside the range of blocks in the target table.

    • By default, all blocks are checked.

  • blkno

    • The number of the block containing the corrupt page.

  • offnum

    • The OffsetNumber of the corrupt tuple.

  • attnum

    • The attribute number of the corrupt column in the tuple, if the corruption is specific to a column and not the tuple as a whole.

  • msg

    • A message describing the problem detected.

F.2.2. Optional heapallindexed Verification #

当 B-Tree 验证函数的 heapallindexed 参数为 true 时,将基于与目标索引关系关联的表执行附加验证阶段。此操作包括一个“虚拟”CREATE INDEX 操作,它会基于一个临时的内存总结结构(这是在验证的第一个基本阶段按需构建的)检查是否存在所有假设的新索引元组。总结结构会“指纹识别”目标索引中找到的每个元组。heapallindexed 验证背后的高级原理是,与现有的目标索引相当的新索引只允许包含可在现有结构中找到的项。

When the heapallindexed argument to B-Tree verification functions is true, an additional phase of verification is performed against the table associated with the target index relation. This consists of a “dummy” CREATE INDEX operation, which checks for the presence of all hypothetical new index tuples against a temporary, in-memory summarizing structure (this is built when needed during the basic first phase of verification). The summarizing structure “fingerprints” every tuple found within the target index. The high level principle behind heapallindexed verification is that a new index that is equivalent to the existing, target index must only have entries that can be found in the existing structure.

附加的 heapallindexed 阶段会增加显著的开销:验证通常会花费好几倍的时间。但是,执行 heapallindexed 验证时,不会更改已获取的关系级别锁定。

The additional heapallindexed phase adds significant overhead: verification will typically take several times longer. However, there is no change to the relation-level locks acquired when heapallindexed verification is performed.

总结结构的大小受 maintenance_work_mem 约束。为了确保索引中应表示的每个堆元组,不存在不一致检测失败的概率超过 2%,每个元组需要大约 2 字节的内存。每个元组分配的内存越少,漏检不一致的概率会逐渐升高。这种方法显著限制了验证的开销,同时只略微降低了检测问题的概率,特别是对于将验证视为例行维护任务的安装。每个缺失或畸形的元组在每次新的验证尝试中都会有新的机会被检测到。

The summarizing structure is bound in size by maintenance_work_mem. In order to ensure that there is no more than a 2% probability of failure to detect an inconsistency for each heap tuple that should be represented in the index, approximately 2 bytes of memory are needed per tuple. As less memory is made available per tuple, the probability of missing an inconsistency slowly increases. This approach limits the overhead of verification significantly, while only slightly reducing the probability of detecting a problem, especially for installations where verification is treated as a routine maintenance task. Any single absent or malformed tuple has a new opportunity to be detected with each new verification attempt.

F.2.3. Using amcheck Effectively #

amcheck 可有效检测 data checksums 未能捕获的各种类型的故障模式。其中包括:

amcheck can be effective at detecting various types of failure modes that data checksums will fail to catch. These include:

结构损坏可能是由于存储硬件出现故障,或者相关文件被无关软件覆盖或修改而发生的。这种损坏也可以用 data page checksums 检测到。

Structural corruption can happen due to faulty storage hardware, or relation files being overwritten or modified by unrelated software. This kind of corruption can also be detected with data page checksums.

格式正确、内部一致且相对于其自身的内部校验和正确的关系页面仍然可能包含逻辑损坏。因此,使用校验和无法检测到这类损坏。示例包括主表中有烤面包的值,而烤面包表中缺少相对应的条目,以及带有事务 ID 的主表中的元组,该事务 ID 早于数据库或群集中最早的有效事务 ID。

Relation pages which are correctly formatted, internally consistent, and correct relative to their own internal checksums may still contain logical corruption. As such, this kind of corruption cannot be detected with checksums. Examples include toasted values in the main table which lack a corresponding entry in the toast table, and tuples in the main table with a Transaction ID that is older than the oldest valid Transaction ID in the database or cluster.

在生产系统中观察到了多种逻辑损坏的原因,包括 PostgreSQL 服务器软件中的错误、错误且不合理的备份和恢复工具以及用户错误。

Multiple causes of logical corruption have been observed in production systems, including bugs in the PostgreSQL server software, faulty and ill-conceived backup and restore tools, and user error.

受损关系最令人担忧的是在实时生产环境中,而恰恰是在此类环境中,高风险活动最不受欢迎。出于这个原因,verify_heapam 旨在诊断故障,而无需承担过度的风险。它不能防止后端崩溃的所有原因,因为即使执行调用查询在严重损坏的系统上也可能是不可靠的。访问 catalog tables 会被执行,如果目录本身已损坏,可能会出现问题。

Corrupt relations are most concerning in live production environments, precisely the same environments where high risk activities are least welcome. For this reason, verify_heapam has been designed to diagnose corruption without undue risk. It cannot guard against all causes of backend crashes, as even executing the calling query could be unsafe on a badly corrupted system. Access to catalog tables is performed and could be problematic if the catalogs themselves are corrupted.

一般来说,amcheck 只能证明损坏的存在;它无法证明损坏的缺失。

In general, amcheck can only prove the presence of corruption; it cannot prove its absence.

F.2.4. Repairing Corruption #

amcheck 产生的任何与损坏有关的错误都不应该是误报。amcheck 在定义上永远不应该发生的条件下会产生错误,因此通常需要仔细分析 amcheck 错误。

No error concerning corruption raised by amcheck should ever be a false positive. amcheck raises errors in the event of conditions that, by definition, should never happen, and so careful analysis of amcheck errors is often required.

没有解决 amcheck 检测到的问题的一般方法。应寻求关于不变量冲突的根本原因的解释。 pageinspect 可能在诊断 amcheck 检测到的故障时发挥有用作用。REINDEX 可能无法有效修复故障。

There is no general method of repairing problems that amcheck detects. An explanation for the root cause of an invariant violation should be sought. pageinspect may play a useful role in diagnosing corruption that amcheck detects. A REINDEX may not be effective in repairing corruption.