Postgresql 中文操作指南

62.1. Query Handling as a Complex Optimization Problem #

在所有关系运算符当中,join 最难处理和优化。可能查询计划的数量会随着查询中的连接数呈指数级增长。支持处理单个连接的各种 join methods(例如,PostgreSQL 中的嵌套循环、哈希连接、归并连接)和关系的各种 indexes(例如,PostgreSQL 中的 B 树、哈希、GiST 和 GIN)作为访问路径,造成了进一步的优化工作。

Among all relational operators the most difficult one to process and optimize is the join. The number of possible query plans grows exponentially with the number of joins in the query. Further optimization effort is caused by the support of a variety of join methods (e.g., nested loop, hash join, merge join in PostgreSQL) to process individual joins and a diversity of indexes (e.g., B-tree, hash, GiST and GIN in PostgreSQL) as access paths for relations.

普通 PostgreSQL 查询优化器对替代策略的空间执行 near-exhaustive search。此算法首先在 IBM 的 System R 数据库中引入,它生成了接近最优的连接顺序,但在查询中的连接数变大时,可能占用大量的内存和时间空间。这使得普通的 PostgreSQL 查询优化器不适用于连接大量表的查询。

The normal PostgreSQL query optimizer performs a near-exhaustive search over the space of alternative strategies. This algorithm, first introduced in IBM’s System R database, produces a near-optimal join order, but can take an enormous amount of time and memory space when the number of joins in the query grows large. This makes the ordinary PostgreSQL query optimizer inappropriate for queries that join a large number of tables.

德国弗莱贝格矿业与技术大学的自动控制学院在希望将 PostgreSQL 用作电网维护决策支持知识系统的后端时遇到了些问题。DBMS 需要处理基于知识系统的推理引擎的大型连接查询。这些查询中的连接数使使用普通查询优化器变得不可行。

The Institute of Automatic Control at the University of Mining and Technology, in Freiberg, Germany, encountered some problems when it wanted to use PostgreSQL as the backend for a decision support knowledge based system for the maintenance of an electrical power grid. The DBMS needed to handle large join queries for the inference machine of the knowledge based system. The number of joins in these queries made using the normal query optimizer infeasible.

在下面,我们描述了 genetic algorithm 的实现,它以一种对涉及大量连接的查询高效的方式解决连接排序问题。

In the following we describe the implementation of a genetic algorithm to solve the join ordering problem in a manner that is efficient for queries involving large numbers of joins.