PostgreSQL 中文操作指南

pg_test_timing

pg_test_timing — measure timing overhead

Synopsis

pg_test_timing [option...]

Description

pg_test_timing is a tool to measure the timing overhead on your system and confirm that the system time never moves backwards. Systems that are slow to collect timing data can give less accurate EXPLAIN ANALYZE results.

Options

pg_test_timing accepts the following command-line options:

  • -d duration
    --duration=duration

    • Specifies the test duration, in seconds. Longer durations give slightly better accuracy, and are more likely to discover problems with the system clock moving backwards. The default test duration is 3 seconds.

  • -V
    --version

    • Print the pg_test_timing version and exit.

  • -?
    --help

    • Show help about pg_test_timing command line arguments, and exit.

Usage

Interpreting Results

Good results will show most (>90%) individual timing calls take less than one microsecond. Average per loop overhead will be even lower, below 100 nanoseconds. This example from an Intel i7-860 system using a TSC clock source shows excellent performance:

Testing timing overhead for 3 seconds.
Per loop time including overhead: 35.96 ns
Histogram of timing durations:
  < us   % of total      count
     1     96.40465   80435604
     2      3.59518    2999652
     4      0.00015        126
     8      0.00002         13
    16      0.00000          2

Note that different units are used for the per loop time than the histogram. The loop can have resolution within a few nanoseconds (ns), while the individual timing calls can only resolve down to one microsecond (us).
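
pg_test_timing itself is a small C program; as a rough illustration of the same measurement idea, here is a minimal Python sketch. The function name, test duration, and bucket scheme are illustrative assumptions, not the actual implementation: call the clock in a tight loop, check it never runs backwards, and bucket each call-to-call delta into a power-of-two histogram of microseconds.

```python
import time

def measure_timing_overhead(duration_s=0.2):
    """Call the clock in a tight loop for duration_s seconds, bucketing
    the delta between consecutive calls into a power-of-two histogram
    of microseconds, and checking the clock never moves backwards."""
    buckets = {}              # bucket upper bound in us -> count
    loops = 0
    start = prev = time.perf_counter()
    while True:
        now = time.perf_counter()
        delta_us = (now - prev) * 1_000_000
        if delta_us < 0:
            raise RuntimeError("system clock moved backwards")
        bound = 1             # smallest power-of-two bound above delta
        while delta_us >= bound:
            bound *= 2
        buckets[bound] = buckets.get(bound, 0) + 1
        loops += 1
        prev = now
        if now - start >= duration_s:
            break
    per_loop_ns = (now - start) * 1e9 / loops
    return per_loop_ns, buckets
```

On a fast clock source, almost all counts land in the `< 1 us` bucket, mirroring the `% of total` column above (though a Python loop adds interpreter overhead that the C tool does not have).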

Measuring Executor Timing Overhead

When the query executor runs a statement with EXPLAIN ANALYZE, individual operations are timed in addition to the summary being shown. The overhead of your system can be checked by counting rows with the psql program:

CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
\timing
SELECT COUNT(*) FROM t;
EXPLAIN ANALYZE SELECT COUNT(*) FROM t;

The i7-860 system measured runs the count query in 9.8 ms while the EXPLAIN ANALYZE version takes 16.6 ms, each processing just over 100,000 rows. That 6.8 ms difference means the timing overhead per row is 68 ns, about twice what pg_test_timing estimated it would be. Even that relatively small amount of overhead is making the fully timed count statement take almost 70% longer. On more substantial queries, the timing overhead would be less problematic.
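
The arithmetic behind those figures can be checked directly; a quick sketch using the numbers quoted above:

```python
plain_ms = 9.8      # COUNT(*) query without instrumentation
timed_ms = 16.6     # same query under EXPLAIN ANALYZE
rows = 100_000      # rows processed by each run

# Extra elapsed time divided by row count gives per-row timing overhead.
overhead_ns_per_row = (timed_ms - plain_ms) * 1e6 / rows
# Relative slowdown of the instrumented run.
slowdown = (timed_ms - plain_ms) / plain_ms

print(round(overhead_ns_per_row))   # 68 ns per row
print(f"{slowdown:.0%}")            # 69%, i.e. almost 70% longer
```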

Changing Time Sources

On some newer Linux systems, it’s possible to change the clock source used to collect timing data at any time. A second example shows the slowdown possible from switching to the slower acpi_pm time source, on the same system used for the fast results above:

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# pg_test_timing
Per loop time including overhead: 722.92 ns
Histogram of timing durations:
  < us   % of total      count
     1     27.84870    1155682
     2     72.05956    2990371
     4      0.07810       3241
     8      0.01357        563
    16      0.00007          3

In this configuration, the sample EXPLAIN ANALYZE above takes 115.9 ms. That's 1061 ns of timing overhead, again a small multiple of what's measured directly by this utility. That much timing overhead means the actual query itself is only taking a tiny fraction of the accounted-for time; most of it is being consumed in overhead instead. In this configuration, any EXPLAIN ANALYZE totals involving many timed operations would be inflated significantly by timing overhead.
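
The same back-of-envelope check, using the acpi_pm numbers quoted above against the earlier untimed baseline:

```python
plain_ms = 9.8      # untimed COUNT(*) baseline from the earlier example
timed_ms = 115.9    # EXPLAIN ANALYZE under the slower acpi_pm clock source
rows = 100_000

# Per-row timing overhead, and the fraction of the timed run it accounts for.
overhead_ns_per_row = (timed_ms - plain_ms) * 1e6 / rows
overhead_share = (timed_ms - plain_ms) / timed_ms

print(round(overhead_ns_per_row))   # 1061 ns per row
print(f"{overhead_share:.0%}")      # 92% of the timed run is overhead
```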

FreeBSD also allows changing the time source on the fly, and it logs information about the timer selected during boot:

# dmesg | grep "Timecounter"
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounters tick every 10.000 msec
Timecounter "TSC" frequency 2531787134 Hz quality 800
# sysctl kern.timecounter.hardware=TSC
kern.timecounter.hardware: ACPI-fast -> TSC

Other systems may only allow setting the time source on boot. On older Linux systems the "clock" kernel setting is the only way to make this sort of change. And even on some more recent ones, the only option you’ll see for a clock source is "jiffies". Jiffies are the older Linux software clock implementation, which can have good resolution when it’s backed by fast enough timing hardware, as in this example:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies
$ dmesg | grep time.c
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2400.153 MHz processor.
$ pg_test_timing
Testing timing overhead for 3 seconds.
Per timing duration including loop overhead: 97.75 ns
Histogram of timing durations:
  < us   % of total      count
     1     90.23734   27694571
     2      9.75277    2993204
     4      0.00981       3010
     8      0.00007         22
    16      0.00000          1
    32      0.00000          1

Clock Hardware and Timing Accuracy

Collecting accurate timing information is normally done on computers using hardware clocks with various levels of accuracy. With some hardware the operating system can pass the system clock time almost directly to programs. A system clock can also be derived from a chip that simply provides timing interrupts, periodic ticks at some known time interval. In either case, operating system kernels provide a clock source that hides these details. But the accuracy of that clock source and how quickly it can return results varies based on the underlying hardware.

Inaccurate time keeping can result in system instability. Test any change to the clock source very carefully. Operating system defaults are sometimes made to favor reliability over best accuracy. And if you are using a virtual machine, look into the recommended time sources compatible with it. Virtual hardware faces additional difficulties when emulating timers, and there are often per operating system settings suggested by vendors.

The Time Stamp Counter (TSC) clock source is the most accurate one available on current generation CPUs. It’s the preferred way to track the system time when it’s supported by the operating system and the TSC clock is reliable. There are several ways that TSC can fail to provide an accurate timing source, making it unreliable. Older systems can have a TSC clock that varies based on the CPU temperature, making it unusable for timing. Trying to use TSC on some older multicore CPUs can give a reported time that’s inconsistent among multiple cores. This can result in the time going backwards, a problem this program checks for. And even the newest systems can fail to provide accurate TSC timing with very aggressive power saving configurations.

Newer operating systems may check for the known TSC problems and switch to a slower, more stable clock source when they are seen. If your system supports TSC time but doesn’t default to that, it may be disabled for a good reason. And some operating systems may not detect all the possible problems correctly, or will allow using TSC even in situations where it’s known to be inaccurate.

The High Precision Event Timer (HPET) is the preferred timer on systems where it’s available and TSC is not accurate. The timer chip itself is programmable to allow up to 100 nanosecond resolution, but you may not see that much accuracy in your system clock.

Advanced Configuration and Power Interface (ACPI) provides a Power Management (PM) Timer, which Linux refers to as the acpi_pm. The clock derived from acpi_pm will at best provide 300 nanosecond resolution.

Timers used on older PC hardware include the 8254 Programmable Interval Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt Controller (APIC) timer, and the Cyclone timer. These timers aim for millisecond resolution.

See Also