Make Vars be outer-join-aware.

Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner.

To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason.

In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join.

This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner.

Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.)

Patch by me; thanks to Richard Guo for review.

Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
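
To make the core idea of the message concrete, here is a minimal sketch of a column reference that carries the set of outer joins that may have nulled it. Everything in it is a hypothetical stand-in, not PostgreSQL's actual code: `ToyVar`, `toy_var_above_outer_join` and `toy_var_equal` are invented for illustration, and a 64-bit mask takes the place of a real Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Var: one column of one range-table entry. */
typedef struct ToyVar
{
    int      varno;        /* RTE index of the table */
    int      varattno;     /* column number within that table */
    uint64_t nulling_ojs;  /* bit i set: the outer join with RTE index i
                            * may have nulled the value at this point */
} ToyVar;

/*
 * Return a copy of v as it would be labeled above the outer join whose
 * RTE index is ojrelid.  (Assumption: relids fit in 1..63 for this toy.)
 */
static ToyVar
toy_var_above_outer_join(ToyVar v, int ojrelid)
{
    assert(ojrelid > 0 && ojrelid < 64);
    v.nulling_ojs |= UINT64_C(1) << ojrelid;
    return v;
}

/* Field-by-field comparison, in the spirit of equal(). */
static bool
toy_var_equal(const ToyVar *a, const ToyVar *b)
{
    return a->varno == b->varno &&
        a->varattno == b->varattno &&
        a->nulling_ojs == b->nulling_ojs;
}
```

For a query shaped like `t1 LEFT JOIN t2`, suppose t2 has RTE index 2 and the join has RTE index 3. A reference to a t2 column below the join has an empty set; the same column referenced above the join carries {3}. The two no longer compare equal, which is the property the message relies on.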
author: Tom Lane <tgl@sss.pgh.pa.us> 2023-01-30 13:16:20 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2023-01-30 13:16:20 -0500
commit: 2489d76c4906f4461a364ca8ad7e0751ead8aa0d (patch)
tree: 145ebc28d5ea8f5a5ba340b9e353a11de786adae /src/backend/optimizer/path/clausesel.c
parent: ec7e053a98f39a9e3c7e6d35f0d2e83933882399 (diff)
download: postgresql-2489d76c4906f4461a364ca8ad7e0751ead8aa0d.tar.gz
postgresql-2489d76c4906f4461a364ca8ad7e0751ead8aa0d.zip
1 files changed, 13 insertions, 32 deletions
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 929a2311121..61db6ad951b 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -218,7 +218,7 @@ clauselist_selectivity_ext(PlannerInfo *root,
 
 			if (rinfo)
 			{
-				ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
+				ok = (rinfo->num_base_rels == 1) &&
 					(is_pseudo_constant_clause_relids(lsecond(expr->args),
 													  rinfo->right_relids) ||
 					 (varonleft = false,
@@ -580,30 +580,6 @@ find_single_rel_for_clauses(PlannerInfo *root, List *clauses)
 }
 
 /*
- * bms_is_subset_singleton
- *
- * Same result as bms_is_subset(s, bms_make_singleton(x)),
- * but a little faster and doesn't leak memory.
- *
- * Is this of use anywhere else?  If so move to bitmapset.c ...
- */
-static bool
-bms_is_subset_singleton(const Bitmapset *s, int x)
-{
-	switch (bms_membership(s))
-	{
-		case BMS_EMPTY_SET:
-			return true;
-		case BMS_SINGLETON:
-			return bms_is_member(x, s);
-		case BMS_MULTIPLE:
-			return false;
-	}
-	/* can't get here... */
-	return false;
-}
-
-/*
  * treat_as_join_clause -
  *	  Decide whether an operator clause is to be handled by the
  *	  restriction or join estimator.  Subroutine for clause_selectivity().
@@ -631,17 +607,20 @@ treat_as_join_clause(PlannerInfo *root, Node *clause, RestrictInfo *rinfo,
 	else
 	{
 		/*
-		 * Otherwise, it's a join if there's more than one relation used. We
-		 * can optimize this calculation if an rinfo was passed.
+		 * Otherwise, it's a join if there's more than one base relation used.
+		 * We can optimize this calculation if an rinfo was passed.
 		 *
 		 * XXX	Since we know the clause is being evaluated at a join, the
 		 * only way it could be single-relation is if it was delayed by outer
-		 * joins.  Although we can make use of the restriction qual estimators
-		 * anyway, it seems likely that we ought to account for the
-		 * probability of injected nulls somehow.
+		 * joins.  We intentionally count only baserels here, not OJs that
+		 * might be present in rinfo->clause_relids, so that we direct such
+		 * cases to the restriction qual estimators not join estimators.
+		 * Eventually some notice should be taken of the possibility of
+		 * injected nulls, but we'll likely want to do that in the restriction
+		 * estimators rather than starting to treat such cases as join quals.
 		 */
 		if (rinfo)
-			return (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+			return (rinfo->num_base_rels > 1);
 		else
 			return (NumRelids(root, clause) > 1);
 	}
@@ -754,7 +733,9 @@ clause_selectivity_ext(PlannerInfo *root,
 		 * for all non-JOIN_INNER cases.
 		 */
 		if (varRelid == 0 ||
-			bms_is_subset_singleton(rinfo->clause_relids, varRelid))
+			rinfo->num_base_rels == 0 ||
+			(rinfo->num_base_rels == 1 &&
+			 bms_is_member(varRelid, rinfo->clause_relids)))
 		{
 			/* Cacheable --- do we already have the result? */
 			if (jointype == JOIN_INNER)
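
The hunks above swap set-membership tests on clause_relids for tests on num_base_rels. The sketch below illustrates why that matters once clause_relids may contain outer-join relids as well as base relids. It is a toy model under stated assumptions: `ToyRestrictInfo`, `toy_bit` and the `toy_old_*`/`toy_new_*` functions are hypothetical names, and a 64-bit mask stands in for a Bitmapset. The "old" functions mimic the removed tests applied to the new-style relid sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two RestrictInfo fields used in the diff. */
typedef struct ToyRestrictInfo
{
    uint64_t clause_relids;  /* base relids plus outer-join relids */
    int      num_base_rels;  /* number of base relations only */
} ToyRestrictInfo;

/* Mask with only bit relid set (assumption: relids fit in 1..63). */
static uint64_t
toy_bit(int relid)
{
    assert(relid > 0 && relid < 64);
    return UINT64_C(1) << relid;
}

/* Removed test: a join clause is one whose relid set has several members. */
static bool
toy_old_is_join_clause(const ToyRestrictInfo *ri)
{
    uint64_t s = ri->clause_relids;

    return s != 0 && (s & (s - 1)) != 0;
}

/* Replacement test: count base relations only. */
static bool
toy_new_is_join_clause(const ToyRestrictInfo *ri)
{
    return ri->num_base_rels > 1;
}

/* Removed cacheability test, modeled on bms_is_subset_singleton(). */
static bool
toy_old_is_cacheable(const ToyRestrictInfo *ri, int varRelid)
{
    uint64_t s = ri->clause_relids;

    if (varRelid == 0 || s == 0)
        return true;            /* empty set is a subset of anything */
    if ((s & (s - 1)) != 0)
        return false;           /* more than one member */
    return (s & toy_bit(varRelid)) != 0;
}

/* Replacement cacheability test from the last hunk. */
static bool
toy_new_is_cacheable(const ToyRestrictInfo *ri, int varRelid)
{
    return varRelid == 0 ||
        ri->num_base_rels == 0 ||
        (ri->num_base_rels == 1 &&
         (ri->clause_relids & toy_bit(varRelid)) != 0);
}
```

Take a clause that mentions only base relation 2 but is evaluated above the outer join with relid 3, so its clause_relids is {2, 3} and num_base_rels is 1. The membership-based tests would call it a join clause and refuse to cache its selectivity; the count-based tests route it to the restriction estimators and allow caching for varRelid 2.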