Postgresql 中文操作指南
59.4. Foreign Data Wrapper Query Planning #
FDW 回调函数 GetForeignRelSize、GetForeignPaths、GetForeignPlan、PlanForeignModify、GetForeignJoinPaths、GetForeignUpperPaths 和 PlanDirectModify 必须适合 PostgreSQL 规划器的操作。以下是关于它们必须做什么的一些说明。
The FDW callback functions GetForeignRelSize, GetForeignPaths, GetForeignPlan, PlanForeignModify, GetForeignJoinPaths, GetForeignUpperPaths, and PlanDirectModify must fit into the workings of the PostgreSQL planner. Here are some notes about what they must do.
root 和 baserel 中的信息可用于减少从外键表中必须获取的信息量(因此降低成本)。baserel→baserestrictinfo 特别有趣,因为它包含应用于过滤要获取的行的数据限制限定词 (WHERE 条款)。(FDW 本身不需要执行这些限定词,因为核心执行器可以检查它们。)baserel→reltarget→exprs 可用于确定要获取哪些列;但请注意,它仅列出 ForeignScan 计划节点必须发出的列,而不是用于限定词评估但查询未输出的列。
The information in root and baserel can be used to reduce the amount of information that has to be fetched from the foreign table (and therefore reduce the cost). baserel→baserestrictinfo is particularly interesting, as it contains restriction quals (WHERE clauses) that should be used to filter the rows to be fetched. (The FDW itself is not required to enforce these quals, as the core executor can check them instead.) baserel→reltarget→exprs can be used to determine which columns need to be fetched; but note that it only lists columns that have to be emitted by the ForeignScan plan node, not columns that are used in qual evaluation but not output by the query.
FDW 规划函数可以使用各种专用字段来保存信息。通常,你存储在 FDW 专用字段中的任何内容都应分配空间,以便在规划结束时回收它。
Various private fields are available for the FDW planning functions to keep information in. Generally, whatever you store in FDW private fields should be palloc’d, so that it will be reclaimed at the end of planning.
baserel→fdw_private 是一个 void 指针,可供 FDW 规划函数存储与特定外部表相关的信息。核心规划器不会接触它,除了在创建 RelOptInfo 节点时将其初始化为 NULL。它可用于从 GetForeignRelSize 到 GetForeignPaths 和/或从 GetForeignPaths 到 GetForeignPlan 传递信息,从而避免重新计算。
baserel→fdw_private is a void pointer that is available for FDW planning functions to store information relevant to the particular foreign table. The core planner does not touch it except to initialize it to NULL when the RelOptInfo node is created. It is useful for passing information forward from GetForeignRelSize to GetForeignPaths and/or GetForeignPaths to GetForeignPlan, thereby avoiding recalculation.
GetForeignPaths 可通过将专用信息存储在 ForeignPath 节点的 fdw_private 字段中来识别不同访问路径的含义。fdw_private 声明为 List 指针,但实际上可以包含任何内容,因为核心规划器不会触及它。然而,最佳做法是使用 nodeToString 可以转储的表示,以便利用后端提供的调试支持。
GetForeignPaths can identify the meaning of different access paths by storing private information in the fdw_private field of ForeignPath nodes. fdw_private is declared as a List pointer, but could actually contain anything since the core planner does not touch it. However, best practice is to use a representation that’s dumpable by nodeToString, for use with debugging support available in the backend.
GetForeignPlan 可以检查所选 ForeignPath 节点的 fdw_private 字段,并可以生成 fdw_exprs 和 fdw_private 列表以放置在 ForeignScan 计划节点中,这些列表将在执行时可用。这两个列表都必须表示为 copyObject 知道如何复制的形式。fdw_private 列表没有其他限制,也不会以任何方式被核心后端解释。fdw_exprs 列表(如果非空)预计包含要在运行时执行的表达式树。这些树将由规划器进行后处理以使其完全可执行。
GetForeignPlan can examine the fdw_private field of the selected ForeignPath node, and can generate fdw_exprs and fdw_private lists to be placed in the ForeignScan plan node, where they will be available at execution time. Both of these lists must be represented in a form that copyObject knows how to copy. The fdw_private list has no other restrictions and is not interpreted by the core backend in any way. The fdw_exprs list, if not NIL, is expected to contain expression trees that are intended to be executed at run time. These trees will undergo post-processing by the planner to make them fully executable.
在 GetForeignPlan 中,通常可以按原样将传入目标列表复制到计划节点。传入 scan_clauses 列表包含与 baserel→baserestrictinfo 相同的子句,但为了提高执行效率,可以对其进行重新排序。在简单情况下,FDW 可以在 scan_clauses 列表中剔除仅 RestrictInfo 节点(使用 extract_actual_clauses),并将所有子句放入计划节点的限定词列表中,这意味着所有子句将在执行时由执行器检查。更复杂的 FDW 可能能够在内部检查某些子句,在这种情况下,可以从计划节点的限定词列表中删除这些子句,以便执行器不必浪费时间重新检查它们。
In GetForeignPlan, generally the passed-in target list can be copied into the plan node as-is. The passed scan_clauses list contains the same clauses as baserel→baserestrictinfo, but may be re-ordered for better execution efficiency. In simple cases the FDW can just strip RestrictInfo nodes from the scan_clauses list (using extract_actual_clauses) and put all the clauses into the plan node’s qual list, which means that all the clauses will be checked by the executor at run time. More complex FDWs may be able to check some of the clauses internally, in which case those clauses can be removed from the plan node’s qual list so that the executor doesn’t waste time rechecking them.
例如,FDW 可能会识别某些形式 foreign_variable = sub_expression 的限制子句,它确定可以在远程服务器上执行,给定 sub_expression 的本地评估值。这样的子句的实际识别应该在 GetForeignPaths 期间发生,因为它会影响路径的成本估算。路径的 fdw_private 字段可能包含指向已识别子句的 RestrictInfo 节点的指针。然后 GetForeignPlan 将从 scan_clauses 中删除该子句,但将 sub_expression 添加到 fdw_exprs,以确保它被揉捏成可执行形式。它还可能将控制信息放入计划节点的 fdw_private 字段中,以告诉执行函数在运行时执行什么操作。传输到远程服务器的查询将涉及 WHERE _foreign_variable = $1_ 之类的内容,其中参数值是在运行时从 fdw_exprs 表达式树的评估中获得的。
As an example, the FDW might identify some restriction clauses of the form foreign_variable = sub_expression, which it determines can be executed on the remote server given the locally-evaluated value of the sub_expression. The actual identification of such a clause should happen during GetForeignPaths, since it would affect the cost estimate for the path. The path’s fdw_private field would probably include a pointer to the identified clause’s RestrictInfo node. Then GetForeignPlan would remove that clause from scan_clauses, but add the sub_expression to fdw_exprs to ensure that it gets massaged into executable form. It would probably also put control information into the plan node’s fdw_private field to tell the execution functions what to do at run time. The query transmitted to the remote server would involve something like WHERE _foreign_variable = $1_, with the parameter value obtained at run time from evaluation of the fdw_exprs expression tree.
从计划节点的限定词列表中删除的任何子句都必须改为添加到 fdw_recheck_quals 中或由 RecheckForeignScan 重新检查,以确保在 READ COMMITTED 隔离级别下行为正确。当对查询中涉及的某个其他表进行并发更新时,执行器可能需要验证所有原始限定词是否仍然满足该元组,甚至可能针对不同的参数值集。使用 fdw_recheck_quals 通常比在 RecheckForeignScan 内实现检查更容易,但当外部连接被下推时,此方法将不够用,因为在这种情况下,连接元组可能会有一些字段变为 NULL,而不会完全拒绝元组。
Any clauses removed from the plan node’s qual list must instead be added to fdw_recheck_quals or rechecked by RecheckForeignScan in order to ensure correct behavior at the READ COMMITTED isolation level. When a concurrent update occurs for some other table involved in the query, the executor may need to verify that all of the original quals are still satisfied for the tuple, possibly against a different set of parameter values. Using fdw_recheck_quals is typically easier than implementing checks inside RecheckForeignScan, but this method will be insufficient when outer joins have been pushed down, since the join tuples in that case might have some fields go to NULL without rejecting the tuple entirely.
FDW 可以填充的另一个 ForeignScan 字段是 fdw_scan_tlist,它描述了 FDW 为此计划节点返回的元组。对于简单的外部表扫描,这可以设置为 NIL,这意味着返回的元组具有为外部表声明的行类型。一个非 NIL 值必须是一个目标列表(TargetEntry_s) containing Vars and/or expressions representing the returned columns. This might be used, for example, to show that the FDW has omitted some columns that it noticed won’t be needed for the query. Also, if the FDW can compute expressions used by the query more cheaply than can be done locally, it could add those expressions to _fdw_scan_tlist 的列表)。请注意,连接计划(由 GetForeignJoinPaths 制作的路径创建)必须始终提供 fdw_scan_tlist 来描述它们将返回的列集。
Another ForeignScan field that can be filled by FDWs is fdw_scan_tlist, which describes the tuples returned by the FDW for this plan node. For simple foreign table scans this can be set to NIL, implying that the returned tuples have the row type declared for the foreign table. A non-NIL value must be a target list (list of TargetEntry_s) containing Vars and/or expressions representing the returned columns. This might be used, for example, to show that the FDW has omitted some columns that it noticed won’t be needed for the query. Also, if the FDW can compute expressions used by the query more cheaply than can be done locally, it could add those expressions to _fdw_scan_tlist. Note that join plans (created from paths made by GetForeignJoinPaths) must always supply fdw_scan_tlist to describe the set of columns they will return.
FDW 应始终构造至少一条仅依赖于表限制子句的路径。在连接查询中,它还可以选择构造依赖于连接子句的路径,例如 foreign_variable = local_variable。此类子句不会出现在 baserel→baserestrictinfo 中,但必须在关系的连接列表中查找。使用此类子句的路径称为“参数化路径”。它必须通过适当的值 param_info 标识所选连接子句中使用的其他关系;使用 get_baserel_parampathinfo 来计算该值。在 GetForeignPlan 中,将连接子句的 local_variable 部分添加到 fdw_exprs 中,然后在运行时,用例与普通限制子句相同。
The FDW should always construct at least one path that depends only on the table’s restriction clauses. In join queries, it might also choose to construct path(s) that depend on join clauses, for example foreign_variable = local_variable. Such clauses will not be found in baserel→baserestrictinfo but must be sought in the relation’s join lists. A path using such a clause is called a “parameterized path”. It must identify the other relations used in the selected join clause(s) with a suitable value of param_info; use get_baserel_parampathinfo to compute that value. In GetForeignPlan, the local_variable portion of the join clause would be added to fdw_exprs, and then at run time the case works the same as for an ordinary restriction clause.
如果 FDW 支持远程连接,则 GetForeignJoinPaths 应该产生 ForeignPath_s for potential remote joins in much the same way as _GetForeignPaths 适用于基本表。有关预期连接的信息可以通过上述相同的方式传递到 GetForeignPlan。但是,baserestrictinfo 与连接关系无关;相反,特定连接的相关连接子句作为单独的参数 (extra→restrictlist) 传递给 GetForeignJoinPaths。
If an FDW supports remote joins, GetForeignJoinPaths should produce ForeignPath_s for potential remote joins in much the same way as _GetForeignPaths works for base tables. Information about the intended join can be passed forward to GetForeignPlan in the same ways described above. However, baserestrictinfo is not relevant for join relations; instead, the relevant join clauses for a particular join are passed to GetForeignJoinPaths as a separate parameter (extra→restrictlist).
此外,FDW 可能还支持直接执行某些高于扫描和连接级别的计划操作,例如分组或聚合。为了提供这样的选项,FDW 应该生成路径并将其插入到适当的 upper relation 中。例如,表示远程聚合的路径应插入到 UPPERREL_GROUP_AGG 关系中,使用 add_path。该路径将基于成本与通过读取外部关系的简单扫描路径执行的本地聚合进行比较(请注意,还必须提供这样的路径,否则在计划时会出现错误)。如果远程聚合路径获胜,这通常会发生,它将通过调用 GetForeignPlan 以通常的方式转换为计划。生成此类路径的推荐位置是在 GetForeignUpperPaths 回调函数中,该函数对于每个上层关系(即每个扫描/连接后处理步骤)调用一次,前提是查询的所有基本关系都来自同一 FDW。
An FDW might additionally support direct execution of some plan actions that are above the level of scans and joins, such as grouping or aggregation. To offer such options, the FDW should generate paths and insert them into the appropriate upper relation. For example, a path representing remote aggregation should be inserted into the UPPERREL_GROUP_AGG relation, using add_path. This path will be compared on a cost basis with local aggregation performed by reading a simple scan path for the foreign relation (note that such a path must also be supplied, else there will be an error at plan time). If the remote-aggregation path wins, which it usually would, it will be converted into a plan in the usual way, by calling GetForeignPlan. The recommended place to generate such paths is in the GetForeignUpperPaths callback function, which is called for each upper relation (i.e., each post-scan/join processing step), if all the base relations of the query come from the same FDW.
PlanForeignModify 和 Section 59.2.4 中描述的其他回调功能的基础假设是:将按常规方式扫描外部关系,然后由本地 ModifyTable 计划节点来驱动各个行更新。对于需要读取本地表和外部表来进行更新的情况,这是必要的常规方法。但是,如果某个操作完全可以由外部服务器执行,那么 FDW 可以生成表示该操作的路径,并将其插入到 UPPERREL_FINAL 上层关系中,这将在 ModifyTable 方法的竞争下进行。此方法还可用来实现远程 SELECT FOR UPDATE,而不是使用 Section 59.2.6 中描述的行锁定回调。请记住,插入到 UPPERREL_FINAL 中的路径负责实现查询的 all 行为。
PlanForeignModify and the other callbacks described in Section 59.2.4 are designed around the assumption that the foreign relation will be scanned in the usual way and then individual row updates will be driven by a local ModifyTable plan node. This approach is necessary for the general case where an update requires reading local tables as well as foreign tables. However, if the operation could be executed entirely by the foreign server, the FDW could generate a path representing that and insert it into the UPPERREL_FINAL upper relation, where it would compete against the ModifyTable approach. This approach could also be used to implement remote SELECT FOR UPDATE, rather than using the row locking callbacks described in Section 59.2.6. Keep in mind that a path inserted into UPPERREL_FINAL is responsible for implementing all behavior of the query.
在计划 UPDATE 或 DELETE 时,PlanForeignModify 和 PlanDirectModify 可以查找外部表 RelOptInfo 结构并利用扫描规划函数先前创建的 baserel→fdw_private 数据。然而,在 INSERT 中,目标表不被扫描,因此没有 RelOptInfo。List 返回的 PlanForeignModify 有与 ForeignScan 计划节点的 fdw_private 列表相同的限制,即它必须仅包含 copyObject 知道如何复制的结构。
When planning an UPDATE or DELETE, PlanForeignModify and PlanDirectModify can look up the RelOptInfo struct for the foreign table and make use of the baserel→fdw_private data previously created by the scan-planning functions. However, in INSERT the target table is not scanned so there is no RelOptInfo for it. The List returned by PlanForeignModify has the same restrictions as the fdw_private list of a ForeignScan plan node, that is it must contain only structures that copyObject knows how to copy.
具有 INSERT 子句的 ON CONFLICT 不支持指定冲突目标,因为对远程表的唯一约束或排除约束在本地是未知的。这反过来意味着不支持 ON CONFLICT DO UPDATE,因为该规范在那里是强制性的。
INSERT with an ON CONFLICT clause does not support specifying the conflict target, as unique constraints or exclusion constraints on remote tables are not locally known. This in turn implies that ON CONFLICT DO UPDATE is not supported, since the specification is mandatory there.