| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating a new PlannerGlobal node in standard_planner(), most
fields are explicitly initialized, but a few are not. This doesn't
cause any functional issues, as makeNode() zeroes all fields by
default. However, the inconsistency is undesirable from a clarity and
maintenance perspective.
This patch explicitly initializes the remaining fields to improve
consistency and readability.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-TgQHNOiouqGcuHoBqbJjWyx4UxGKxUY3FrF4trGbcPA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating an explicit Sort node for the outer path of a mergejoin,
we need to determine the number of presorted keys of the outer path to
decide whether explicit incremental sort can be applied. Currently,
this is done by repeatedly calling pathkeys_count_contained_in.
This patch caches the number of presorted outer pathkeys in MergePath,
allowing us to save several calls to pathkeys_count_contained_in. It
can be considered a complement to the changes in commit 828e94c9d.
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqvBireB_w6x8BN5txdvBEHxVgZBt=rUnpf5ww5P_E_ww@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implemented Self-Join Elimination (SJE) and put related logic
to ChangeVarNodes_walker(). This commit provides refactoring to remove the
SJE-related logic from ChangeVarNodes_walker() but adds a custom callback to
ChangeVarNodesExtended(), which has a chance to process a node before
ChangeVarNodes_walker(). Passing this callback to ChangeVarNodesExtended()
allows SJE-related node handling to be kept within the analyzejoins.c.
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
| |
This reverts commit 250a718aadad68793e82103282247556a46a3cfc.
It shouldn't be pushed during the release freeze.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/E1uBIbY-000owH-0O%40gemulon.postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implemented Self-Join Elimination (SJE) and put related logic
to ChangeVarNodes_walker(). This commit provides refactoring to remove the
SJE-related logic from ChangeVarNodes_walker() but adds a custom callback to
ChangeVarNodesExtended(), which has a chance to process a node before
ChangeVarNodes_walker(). Passing this callback to ChangeVarNodesExtended()
allows SJE-related node handling to be kept within the analyzejoins.c.
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
relations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation -- thanks to the b3ff6c742f6c for
detecting that.
This commit skips removal of PHVs during SJE. This might also happen that
we skip the removal of some PHVs that could be removed. However, the overhead
of extra PHVs is small compared to the complexity of analysis needed to remove
them.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1349d2790 added support so that aggregate functions with an ORDER BY or
DISTINCT clause could make use of presorted inputs to avoid an implicit
sort within nodeAgg.c. That commit failed to consider that a FILTER
clause may exist that filters rows before the aggregate function
arguments are evaluated. That can be problematic if an aggregate
argument contains an expression which could error out during evaluation.
It's perfectly valid to want to have a FILTER clause which eliminates
such values, and with the pre-sorted path added in 1349d2790, it was
possible that the planner would produce a plan with a Sort node above
the Aggregate to perform the sort on the aggregate's arguments long before
the Aggregate node would filter out the non-matching values.
Here we fix this by inspecting ORDER BY / DISTINCT aggregate functions
which have a FILTER clause to see if the aggregate's arguments are
anything more complex than a Var or a Const. Evaluating these isn't
going to cause an error. If we find any non-Var, non-Const parameters
then the planner will now opt to perform the sort in the Aggregate node
for these aggregates, i.e. disable the presorted aggregate optimization.
An alternative fix would have been to completely disallow the presorted
optimization for Aggrefs with any FILTER clause, but that wasn't done as
that could cause large performance regressions for queries that see
significant gains from 1349d2790 due to presorted results coming in from
an Index Scan.
Backpatch to 16, where 1349d2790 was introduced
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Kaimeh <kkaimeh@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAK-%2BJz9J%3DQ06-M7cDJoPNeYbz5EZDqkjQbJnmRyQyzkbRGsYkA%40mail.gmail.com
Backpatch-through: 16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memoize typically marks cache entries as complete after fully scanning
the inner side of a join. However, in the case of unique joins, we
skip to the next outer tuple as soon as the first matching inner tuple
is found, leaving no opportunity to scan the inner side to completion.
To work around that, we mark cache entries as complete after fetching
the first matching inner tuple in unique joins.
This approach is only safe when all of the join's restriction clauses
are parameterized; otherwise, there is no guarantee that reading just
one tuple from the inner side is sufficient.
Currently, we check for this by verifying that the number of clauses
in ppi_clauses is no less than the number of the join's restriction
clauses. However, this check isn't entirely reliable, as ppi_clauses
includes join clauses available from all outer rels, not just the
current outer rel. This means the check could pass even if a
restriction clause isn't parameterized, as long as another join
clause, which doesn't belong to the current join, is included in
ppi_clauses.
To fix this, we explicitly check whether each restriction clause of
the current join is present in ppi_clauses.
While we're here, remove the XXX comment from the modified code, as
it's not justified; in certain cases, it's not possible to move a join
clause to the inner side.
This is arguably a bugfix, but no backpatch given the lack of field
reports.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-8JPouj=wBDj4DhK-WO4+Xdx=A2jbjvvyyTBQneJ1=BQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a GENERATED column is declared to have a domain data type where
the domain's constraints disallow null values, INSERT commands failed
because we built a targetlist that included coercing a null constant
to the domain's type. The failure occurred even when the generated
value would have been perfectly OK. This is adjacent to the issues
fixed in 0da39aa76, but we didn't notice for lack of testing a domain
with such a constraint.
We aren't going to use the result of the targetlist entry for the
generated column --- ExecComputeStoredGenerated will overwrite it.
So it's not really necessary that it have the exact datatype of
the generated column. This patch fixes the problem by changing
the targetlist entry to be a null Const of the domain's base type,
which should be sufficiently legal. (We do have to tweak
ExecCheckPlanOutput to accept the situation, though.)
This has been broken since we implemented generated columns.
However, this patch only applies easily as far back as v14, partly
because I (tgl) only carried 0da39aa76 back that far, but mostly
because v14 significantly refactored the handling of INSERT/UPDATE
targetlists. Given the lack of field complaints and the short
remaining support lifetime of v13, I judge the cost-benefit ratio
not good for devising a version that would work in v13.
Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxG59tip2+9h=rEv-ykOFjt0cbsPVchhi0RTij8bABBA0Q@mail.gmail.com
Backpatch-through: 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d69d45a5a changed how em_is_child members are stored in
EquivalenceClasses. Children are no longer stored in the ec_members
list. optimizer/README mentioned that most operations "should ignore
child members", but that felt a little untrue now since child members
are now stored in a separate place, they simply won't be found by the
normal means of looking (a foreach loop over ec_members), and if you don't
find them, there's technically no need to "ignore" them.
Here we tweak the wording slightly to reflect the new storage location
for child members.
Reported-by: Amit Langote <amitlangote09@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqE8v=EuAP_3F_A2xn8zWx+nG_etW_Fe_DvKO-Fkx=+DdQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
These examples fail to account for join clauses generated by
EquivalenceClasses, but since we haven't mentioned EquivalenceClasses
yet it seems like it'd just add confusion to make them fully accurate.
Instead, parenthetically note that they're oversimplified.
Reported-by: Zeyuan Hu <ferrishu3886@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACvHWmYFo+60yMqKJajDDvKN5EM41YHrCT3oxukwXmGAqpWvyw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition. For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow. Effectively,
the more partitions which were present, the more searches needed to be
performed for operations such as find_ec_member_matching_expr() during
create_plan() and the more partitions present, the longer these searches
would take, i.e., a quadratic slowdown.
To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members. Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid. When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.
To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required. Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.
The actual performance increases here are highly dependent on the number
of partitions and the query being planned. Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions. The speedups become much more visible
with a few dozen to hundreds of partitions. With some tested queries
using 56 partitions, the planner was around 3x faster than before. For
use cases with thousands of partitions, these are likely to become
significantly faster. Some testing has shown planner speedups of 60x or
more with 8192 partitions.
Author: Yuya Watari <watari.yuya@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Tested-by: Thom Brown <thom@linux.com>
Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A PG 17 optimization allowed columns with NOT NULL constraints to skip
table scans for IS NULL queries, and to skip IS NOT NULL checks for IS
NOT NULL queries. This didn't work for domain types, since domain types
don't follow the IS NULL/IS NOT NULL constraint logic. To fix, disable
this optimization for domains for PG 17+.
Reported-by: Jan Behrens
Diagnosed-by: Tom Lane
Discussion: https://postgr.es/m/Z37p0paENWWUarj-@momjian.us
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows them to be added without scanning the table, and validating
them afterwards without holding access exclusive lock on the table after
any violating rows have been deleted or fixed.
Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid
not-null constraint validates that constraint. ALTER TABLE .. VALIDATE
CONSTRAINT is also supported. There are various checks on whether an
invalid constraint is allowed in a child table when the parent table has
a valid constraint; this should match what we do for enforced/not
enforced constraints.
pg_attribute.attnotnull is now only an indicator for whether a not-null
constraint exists for the column; whether it's valid or invalid must be
queried in pg_constraint. Applications can continue to query
pg_attribute.attnotnull as before, but now it's possible that NULL rows
are present in the column even when that's set to true.
For backend internal purposes, we cache the nullability status in
CompactAttribute->attnullability that each tuple descriptor carries
(replacing CompactAttribute.attnotnull, which was a mirror of
Form_pg_attribute.attnotnull). During the initial tuple descriptor
creation, based on the pg_attribute scan, we set this to UNRESTRICTED if
pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we
update the latter to VALID or INVALID depending on the pg_constraint
scan. This flag is also copied when tupledescs are copied.
Comparing tuple descs for equality must also compare the
CompactAttribute.attnullability flag and return false in case of a
mismatch.
pg_dump deals with these constraints by storing the OIDs of invalid
not-null constraints in a separate array, and running a query to obtain
their properties. The regular table creation SQL omits them entirely.
They are then dealt with in the same way as "separate" CHECK
constraints, and dumped after the data has been loaded. Because no
additional pg_dump infrastructure was required, we don't bump its
version number.
I decided not to bump catversion either, because the old catalog state
works perfectly in the new world. (Trying to run with new catalog state
and the old server version would likely run into issues, however.)
System catalogs do not support invalid not-null constraints (because
commit 14e87ffa5c54 didn't allow them to have pg_constraint rows
anyway.)
Author: Rushabh Lathia <rushabh.lathia@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were several places in ordering-related planning where a
requirement for btree was hardcoded but an amcanorder index could
suffice. This fixes that. We just need to do the necessary mapping
between strategy numbers and compare types and adjust some related
APIs so that this works independent of btree strategy numbers. For
instance, non-btree amcanorder indexes can now be used to support
sorting and merge joins. Also, predtest.c works independent of btree
strategy numbers now.
To avoid performance regressions, some details on btree and other
built-in index types are still hardcoded as shortcuts, but other index
types now have access to the same features by providing the required
flags and callbacks.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit implements the automatic conversion of 'x IN (VALUES ...)' into
ScalarArrayOpExpr. That simplifies the query tree, eliminating the appearance
of an unnecessary join.
Since VALUES describes a relational table, and the value of such a list is
a table row, the optimizer will likely face an underestimation problem due to
the inability to estimate cardinality through MCV statistics. The cardinality
evaluation mechanism can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform a statistical
evaluation element by element.
We perform the transformation in the convert_ANY_sublink_to_join() if VALUES
RTE is proper and the transformation is convertible. The conversion is only
possible for operations on scalar values, not rows. Also, we currently
support the transformation only when it ends up with a constant array.
Otherwise, the evaluation of non-hashed SAOP might be slower than the
corresponding Hash Join with VALUES.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit extracts the code to generate ScalarArrayOpExpr on top of the list
of expressions from match_orclause_to_indexcol() into a separate function
make_SAOP_expr(). This function was extracted to be used in optimization for
conversion of 'x IN (VALUES ...)' to 'x = ANY ...'. make_SAOP_expr() is
placed in clauses.c file as only two additional headers were needed there
compared with other places.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the PathKey struct to use CompareType to record the sort
direction instead of hardcoding btree strategy numbers. The
CompareType is then converted to the index-type-specific strategy when
the plan is created.
This reduces the number of places btree strategy numbers are
hardcoded, and it's a self-contained subset of a larger effort to
allow non-btree indexes to behave like btrees.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are later looked up by matching the left and right
EquivalenceMembers along with the clause's parent EC.
This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.
To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).
Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key places NULL in the first slot and the
non-constant EM in the second.
The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.
The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.
The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when large lists are discarded.
An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.
This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and the initial sizing approach to
balance memory usage and CPU efficiency.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangote09@gmail.com>
Tested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant.
It is only called for EquivalenceClasses with ec_has_const set, and
with a non-constant member the EquivalenceMember to search for.
The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.
Assert that the RHS is indeed a constant, to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAExHW5scMxyFRqOFE6ODmBiW2rnVBEmeEcA-p4W_CyuEikURdA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
The check for non-inheritable constraints is performed later, and the
same comment is included at that point.
While we're here, remove one extraneous blank line.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CACJufxETi6x86S8EkH8mRfOcm2AenoE9t1pyCFVMpU34gVhF3w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the historical implementation of SQL functions (if they don't get
inlined), we built plans for all the contained queries at first call
within an outer query, and then re-used those plans for the duration
of the outer query, and then forgot everything. This was not ideal,
not least because the plans could not be customized to specific values
of the function's parameters. Our plancache infrastructure seems
mature enough to be used here. That will solve both the problem with
not being able to build custom plans and the problem with not being
able to share work across successive outer queries.
Aside from those performance concerns, this change fixes a
longstanding bugaboo with SQL functions: you could not write DDL that
would affect later statements in the same function. That's mostly
still true with new-style SQL functions, since the results of parse
analysis are baked into the stored query trees (and protected by
dependency records). But for old-style SQL functions, it will now
work much as it does with PL/pgSQL functions, because we delay parse
analysis and planning of each query until we're ready to run it.
Some edge cases that require replanning are now handled better too;
see for example the new rowsecurity test, where we now detect an RLS
context change that was previously missed.
One other edge-case change that might be worthy of a release note
is that we now insist that a SQL function's result be generated
by the physically-last query within it. Previously, if the last
original query was deleted by a DO INSTEAD NOTHING rule, we'd be
willing to take the result from the preceding query instead.
This behavior was undocumented except in source-code comments,
and it seems hard to believe that anyone's relying on it.
Along the way to this feature, we needed a few infrastructure changes:
* The plancache can now take either a raw parse tree or an
analyzed-but-not-rewritten Query as the starting point for a
CachedPlanSource. If given a Query, it is caller's responsibility
that nothing will happen to invalidate that form of the query.
We use this for new-style SQL functions, where what's in pg_proc is
serialized Query(s) and we trust the dependency mechanism to disallow
DDL that would break those.
* The plancache now offers a way to invoke a post-rewrite callback
to examine/modify the rewritten parse tree when it is rebuilding
the parse trees after a cache invalidation. We need this because
SQL functions sometimes adjust the parse tree to make its output
exactly match the declared result type; if the plan gets rebuilt,
that has to be re-done.
* There is a new backend module utils/cache/funccache.c that
abstracts the idea of caching data about a specific function
usage (a particular function and set of input data types).
The code in it is moved almost verbatim from PL/pgSQL, which
has done that for a long time. We use that logic now for
SQL-language functions too, and maybe other PLs will have use
for it in the future.
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/8216639.NyiUUSuA9g@aivenlaptop
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This expands the NOT ENFORCED constraint flag, previously only
supported for CHECK constraints (commit ca87c415e2f), to foreign key
constraints.
Normally, when a foreign key constraint is created on a table, action
and check triggers are added to maintain data integrity. With this
patch, if a constraint is marked as NOT ENFORCED, integrity checks are
no longer required, making these triggers unnecessary. Consequently,
when creating a NOT ENFORCED foreign key constraint, triggers will not
be created, and the constraint will be marked as NOT VALID.
Similarly, if an existing foreign key constraint is changed to NOT
ENFORCED, the associated triggers will be dropped, and the constraint
will also be marked as NOT VALID. Conversely, if a NOT ENFORCED
foreign key constraint is changed to ENFORCED, the necessary triggers
will be created, and the will be changed to VALID by performing
necessary validation.
Since not-enforced foreign key constraints have no triggers, the
shortcut used for example in psql and pg_dump to skip looking for
foreign keys if the relation is known not to have triggers no longer
applies. (It already didn't work for partitioned tables.)
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
50e17ad28 (v14) and 29f45e299 (v15) made it so the planner could identify
IN and NOT IN clauses which have Const lists as right-hand arguments and
when an appropriate hash function is available for the data types, mark
the ScalarArrayOpExpr as hashable so the executor could execute it more
optimally by building and probing a hash table during expression
evaluation.
These commits both worked correctly when there was only a single
ScalarArrayOpExpr in the given expression being processed by the
planner, but when there were multiple, only the first was checked and any
subsequent ones were not identified, which resulted in less optimal
expression evaluation during query execution for all but the first found
ScalarArrayOpExpr.
Backpatch to 14, where 50e17ad28 was introduced.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/29a76f51-97b0-4c07-87b7-ec8e3b5345c9@gmail.com
Backpatch-through: 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, group_similar_or_args() permutes original positions of clauses
independently on whether it manages to find any groups of similar clauses.
While we are not providing any strict warranties on saving the original order
of OR-clauses, it is preferred that the original order be modified as little
as possible.
This commit changes the reordering algorithm of group_similar_or_args() in
the following way. We reorder each group of similar clauses so that the
first item of the group stays in place, but all the other items are moved
after it. So, if there are no similar clauses, the order of clauses stays
the same. When there are some groups, only required reordering happens while
the rest of the clauses remain in their places.
Reported-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/3ac7c436-81e1-4191-9caf-b0dd70b51511%40gmail.com
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This field can be optionally set in a PlannedStmt through the planner
hook, giving extensions the possibility to assign an identifier related
to a computed plan. The backend is changed to report it in the backend
entry of a process running (including the extended query protocol), with
semantics and APIs to set or get it similar to what is used for the
existing query ID (introduced in the backend via 4f0b0966c8). The plan
ID is reset at the same timing as the query ID. Currently, this
information is not added to the system view pg_stat_activity; extensions
can access it through PgBackendStatus.
Some patches have been proposed to provide some features in the planning
area, where a plan identifier is used as a key to know the plan involved
(for statistics, plan storage and manipulations, etc.), and the point of
this commit is to provide an anchor in the backend that extensions can
rely on for future work. The reset of the plan identifier is
controlled by core and follows the same pattern as the query identifier
added in 4f0b0966c8.
The contents of this commit are extracted from a larger set proposed
originally by Lukas Fittl, that Sami Imseih has proposed as an
independent change, with a few tweaks sprinkled by me.
Author: Lukas Fittl <lukas@fittl.com>
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg@mail.gmail.com
Discussion: https://postgr.es/m/CAA5RZ0vyWd4r35uUBUmhngv8XqeiJUkJDDKkLf5LCoWxv-t_pw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have collected several instances of a workaround for GCC bug 53119,
which caused false-positive compiler warnings. This bug has long been
fixed, but was still seen on the buildfarm, most recently on lapwing
with gcc (Debian 4.7.2-5). (The GCC bug tracker mentions that a fix
was backported to 4.7.4 and 4.8.3.)
That compiler no longer runs warning-free since commit 6fdd5d95634, so
we don't need to keep these workarounds. And furthermore, the
consensus appears to be that we don't want to keep supporting that era
of platform anymore at all.
This reverts the following commits:
d937904cce6a3d82e4f9c2127de7b59105a134b3
506428d091760650971433f6bc083531c307b368
b449afb582bb9015bfbb85abc10ce122aef9ec70
6392f2a0968c20ecde4d27b6652703ad931fce92
bad0763a4d7be3005eae35d460c73ac4bc7ebaad
5e0c761d0a13c7b4f7c5de618ac38560d74d74d0
and makes a few similar fixes to newer code.
Discussion: https://www.postgresql.org/message-id/flat/e170d61f-01ab-4cf9-ab68-91cd1fac62c5%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmoYEAm-KKZibAP3hSqbTFTjUd47XtVcf3xSFDpyecXX9uQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit cbc127917e introduced tracking of unpruned relids to avoid
processing pruned relations, and changed ExecInitModifyTable() to
initialize only unpruned result relations. As a result, MERGE
statements that prune all target partitions can now lead to crashes
or incorrect behavior during execution.
The crash occurs because some executor code paths rely on
ModifyTableState.resultRelInfo[0] being present and initialized,
even when no result relations remain after pruning. For example,
ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo
to determine the appropriate action. Similarly,
ExecInitPartitionInfo() assumes that at least one result relation
exists.
To preserve these assumptions, ExecInitModifyTable() now includes the
first result relation in the initialized result relation list if all
result relations for that ModifyTable were pruned. To enable that,
ExecDoInitialPruning() ensures the first relation is locked if it was
pruned and locking is necessary.
To support this exception to the pruning logic, PlannedStmt now
includes a list of RT indexes identifying the first result relation
of each ModifyTable node in the plan. This allows
ExecDoInitialPruning() to check whether each such relation was
pruned and, if so, lock it if necessary.
Bug: #18830
Reported-by: Robins Tharakan <tharakan@gmail.com>
Diagnozed-by: Tender Wang <tndrwang@gmail.com>
Diagnozed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After pushing the bitmap iterator into table-AM specific code (as part
of making bitmap heap scan use the read stream API in 2b73a8cd33b7),
scan_bitmap_next_block() no longer returns the current block number.
Since scan_bitmap_next_block() isn't returning any relevant information
to bitmap table scan code, it makes more sense to get rid of it.
Now, bitmap table scan code only calls table_scan_bitmap_next_tuple(),
and the heap AM implementation of scan_bitmap_next_block() is a local
helper in heapam_handler.c.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When pulling up a subquery, if the subquery's target list items are
used in grouping set columns, we need to wrap them in PlaceHolderVars.
This ensures that expressions retain their separate identity so that
they will match grouping set columns when appropriate.
In 90947674f, we decided to wrap subquery outputs that are non-var
expressions in PlaceHolderVars. This prevents const-simplification
from merging them into the surrounding expressions after subquery
pullup, which could otherwise lead to failing to match those
subexpressions to grouping set columns, with the effect that they'd
not go to null when expected.
However, that left some loose ends. If the subquery's target list
contains two or more identical Var expressions, we can still fail to
match the Var expression to the expected grouping set expression.
This is not related to const-simplification, but rather to how we
match expressions to lower target items in setrefs.c.
For sort/group expressions, we use ressortgroupref matching, which
works well. For other expressions, we primarily rely on comparing the
expressions to determine if they are the same. Therefore, we need a
way to prevent setrefs.c from matching the expression to some other
identical ones.
To fix, wrap all subquery outputs in PlaceHolderVars if the parent
query uses grouping sets, ensuring that they preserve their separate
identity throughout the whole planning process.
Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-meSahaanKskpBn0KKxdHAXC1_EJCVWHxEodqirrGJnw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In pull_up_simple_subquery and pull_up_constant_function, there is
code that sets wrap_non_vars to true when dealing with an appendrel
member. The goal is to wrap subquery outputs that are not simple Vars
in PlaceHolderVars, ensuring that what we pull up doesn't get merged
into a surrounding expression during later processing, which could
cause it to fail to match the expression actually available from the
appendrel.
However, this is unnecessary. When pulling up an appendrel child
subquery, the only part of the upper query that could reference the
appendrel child yet is the translated_vars list of the associated
AppendRelInfo that we just made for this child. Furthermore, we do
not want to force use of PHVs in the AppendRelInfo, as there is no
outer join between. In fact, perform_pullup_replace_vars always sets
wrap_non_vars to false before performing pullup_replace_vars on the
AppendRelInfo.
This patch simply removes the code that sets wrap_non_vars to true for
UNION ALL subqueries.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-VXDEi1v+hZYLxpOv0riJxHsCkCH1f46tLnhonEAyGCQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
makeWholeRowVar() has different rules for constructing a
whole-row Var depending on the kind of RTE it's representing.
This turns out to be problematic because the rewriter and planner
can convert view RTEs and set-returning-function RTEs into
subquery RTEs; so a whole-row Var made during planning might
look different from one made by the parser. In isolation this
doesn't cause any problem, but if a query contains Vars made
both ways for the same varno, there are cross-checks in the
executor that will complain. This manifests for UPDATE, DELETE,
and MERGE queries that use whole-row table references.
To fix, we need makeWholeRowVar() to produce the same result
from an inlined RTE as it would have for the original. For
an inlined view, we can use RangeTblEntry.relid to detect
that this had been a view RTE. For inlined SRFs, make a
data structure definition change akin to commit 47bb9db75,
and say that we won't clear RangeTblEntry.functions until
the end of planning. That allows makeWholeRowVar() to
repeat what it would have done with the unmodified RTE.
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to now we just punted on showing the window definitions used
in a plan, with window function calls represented as "OVER (?)".
To improve that, show the window definition implemented by each
WindowAgg plan node, and reference their window names in OVER.
For nameless window clauses generated by "OVER (...)", assign
unique names w1, w2, etc.
In passing, re-order the properties shown for a WindowAgg node
so that the Run Condition (if any) appears after the Window
property and before the Filter (if any). This seems more
sensible since the Run Condition is associated with the Window
and acts before the Filter.
Thanks to David G. Johnston and Álvaro Herrera for design
suggestions.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recognizing the real-life complexity where columns in the table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or much rarely,
overestimated) when dealing with multi-clause JOIN. In the case of hash
join, it can end up with a small number of predicted hash buckets and, as
a result, picking non-optimal merge join.
To improve the situation, we introduce one additional stage of bucket size
estimation - having two or more join clauses estimator lookup for extended
statistics and use it for multicolumn estimation. Clauses are grouped into
lists, each containing expressions referencing the same relation. The result
of the multicolumn estimation made over such a list is combined with others
according to the caller's logic. Clauses that are not estimated are returned
to the caller for further estimation.
Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change is dedicated to more active usage of IndexScan and parameterized
NestLoop paths in partitioned cases under an Append node, as it already works
with plain tables. As newly added regression tests demonstrate, it should
provide more smartness to the partitionwise technique.
With an indication of how many tuples are needed, it may be more meaningful
to use the 'fractional branch' subpaths of the Append path list, which are
more optimal for this specific number of tuples. Planning on a higher level,
if the optimizer needs all the tuples, it will choose non-fractional paths.
In the case when, during execution, Append needs to return fewer tuples than
declared by tuple_fraction, it would not be harmful to use the 'intermediate'
variant of paths. However, it will earn a considerable profit if a sensible
set of tuples is selected.
The change of the existing regression test demonstrates the positive outcome
of this feature: instead of scanning the whole table, the optimizer prefers
to use a parameterized scan, being aware of the only single tuple the join
has to produce to perform the query.
Discussion: https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com
Author: Nikita Malakhov <hukutoc@gmail.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihuifan1213@163.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
After more discussion about commit ce62f2f2a0a, rename the index AM
property amcancrosscompare to two separate properties
amconsistentequality and amconsistentordering. Also improve the
documentation and update some comments that were previously missed.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit b262ad440, we introduced an optimization that reduces an IS
NOT NULL qual on a column defined as NOT NULL to constant true, and an
IS NULL qual on a NOT NULL column to constant false, provided we can
prove that the input expression of the NullTest is not nullable by any
outer join. This deduction happens after we have generated multiple
clones of the same qual condition to cope with commuted-left-join
cases.
However, performing the NullTest deduction for clone clauses can be
unsafe, because we don't have a reliable way to determine if the input
expression of a NullTest is non-nullable: nullingrel bits in clone
clauses may not reflect reality, so we dare not draw conclusions from
clones about whether Vars are guaranteed not-null.
To fix, we check whether the given RestrictInfo is a clone clause in
restriction_is_always_true and restriction_is_always_false, and avoid
performing any reduction if it is.
There are several ensuing plan changes in predicate.out, and we have
to modify the tests to ensure that they continue to test what they are
intended to. Additionally, this fix causes the test case added in
f00ab1fd1 to no longer trigger the bug that commit fixed, so we also
remove that test case.
Back-patch to v17 where this bug crept in.
Reported-by: Ronald Cruz <cruz@rentec.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/f5320d3d-77af-4ce8-b9c3-4715ff33f213@rentec.com
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
| |
As spotted by Coverity, the calculation of ojrelid mixes signed and unsigned
types causes possible overflow and undefined behavior. Instead of trying to
fix the expression, this commit eliminates the relied local variable. The
explicit branching is used to replace the -1 value. That, in turn, requires
changing the signature of the remove_rel_from_eclass() function.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/914330.1740330169%40sss.pgh.pa.us
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The callback functions ReplaceVarsFromTargetList_callback and
pullup_replace_vars_callback are both used to replace Vars in an
expression tree that reference a particular RTE with items from a
targetlist, and they both need to expand whole-tuple references and
deal with OLD/NEW RETURNING list Vars. As a result, currently there
is significant code duplication between these two functions.
This patch introduces a new function, ReplaceVarFromTargetList, to
perform the replacement and calls it from both callback functions,
thereby eliminating code duplication.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWhr=FM4X5kCPvVs-g2XEk+ceLsNtBK_zZMkqFn9vUjsw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 83ea6c540 added support for virtual generated columns that are
computed on read. All Var nodes in the query that reference virtual
generated columns must be replaced with the corresponding generation
expressions. Currently, this replacement occurs in the rewriter.
However, this approach has several issues. If a Var referencing a
virtual generated column has any varnullingrels, those varnullingrels
need to be propagated into the generation expression. Failing to do
so can lead to "wrong varnullingrels" errors and improper outer-join
removal.
Additionally, if such a Var comes from the nullable side of an outer
join, we may need to wrap the generation expression in a
PlaceHolderVar to ensure that it is evaluated at the right place and
hence is forced to null when the outer join should do so. In certain
cases, such as when the query uses grouping sets, we also need a
PlaceHolderVar for anything that is not a simple Var to isolate
subexpressions. Failure to do so can result in incorrect results.
To fix these issues, this patch expands the virtual generated columns
in the planner rather than in the rewriter, and leverages the
pullup_replace_vars architecture to avoid code duplication. The
generation expressions will be correctly marked with nullingrel bits
and wrapped in PlaceHolderVars when needed by the pullup_replace_vars
callback function. This requires handling the OLD/NEW RETURNING list
Vars in pullup_replace_vars_callback, as it may now deal with Vars
referencing the result relation instead of a subquery.
The "wrong varnullingrels" error was reported by Alexander Lakhin.
The incorrect result issue and the improper outer-join removal issue
were reported by Richard Guo.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In try_partitionwise_join, we try to break down the join between two
partitioned relations into joins between matching partitions. To
achieve this, we iterate through each pair of partitions from the two
joining relations and create child join relations for them. To reduce
memory accumulation during each iteration, one step we take is freeing
the SpecialJoinInfos created for the child joins.
A child join's SpecialJoinInfo is a copy of the parent join's
SpecialJoinInfo, with some members being translated copies of their
counterparts in the parent. However, when freeing the bitmapset
members in a child join's SpecialJoinInfo, we failed to check whether
they were translated copies. As a result, we inadvertently freed the
members that were still in use by the parent SpecialJoinInfo, leading
to crashes when those freed members were accessed.
To fix, check if each member of the child join's SpecialJoinInfo is a
translated copy and free it only if that's the case. This requires
passing the parent join's SpecialJoinInfo as a parameter to
free_child_join_sjinfo.
Back-patch to v17 where this bug crept in.
Bug: #18806
Reported-by: 孟令彬 <m_lingbin@126.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/18806-d70b0c9fdf63dcbf@postgresql.org
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In set_append_rel_size(), we currently set rel->tuples to rel->rows
for an appendrel. Generally, rel->tuples is the raw number of tuples
in the relation and rel->rows is the estimated number of tuples after
the relation's restriction clauses have been applied. Although an
appendrel itself doesn't directly enforce any quals today, its child
relations may. Therefore, setting rel->tuples equal to rel->rows for
an appendrel isn't always appropriate.
Doing so can lead to issues in cost estimates in some cases. For
instance, when estimating the number of distinct values from an
appendrel, we would not be able to adjust the estimate based on the
restriction selectivity.
This patch addresses this by setting an appendrel's tuples to the
total number of tuples accumulated from each live child, which better
aligns with reality.
This is arguably a bug, but nobody has complained about that until
now, so no back-patch.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces changes to track unpruned relations explicitly,
making it possible for top-level plan nodes, such as ModifyTable and
LockRows, to avoid processing partitions pruned during initial
pruning. Scan-level nodes, such as Append and MergeAppend, already
avoid the unnecessary processing by accessing partition pruning
results directly via part_prune_index. In contrast, top-level nodes
cannot access pruning results directly and need to determine which
partitions remain unpruned.
To address this, this commit introduces a new bitmapset field,
es_unpruned_relids, which the executor uses to track the set of
unpruned relations. This field is referenced during plan
initialization to skip initializing certain nodes for pruned
partitions. It is initialized with PlannedStmt.unprunableRelids,
a new field that the planner populates with RT indexes of relations
that cannot be pruned during runtime pruning. These include relations
not subject to partition pruning and those required for execution
regardless of pruning.
PlannedStmt.unprunableRelids is computed during set_plan_refs() by
removing the RT indexes of runtime-prunable relations, identified
from PartitionPruneInfos, from the full set of relation RT indexes.
ExecDoInitialPruning() then updates es_unpruned_relids by adding
partitions that survive initial pruning.
To support this, PartitionedRelPruneInfo and PartitionedRelPruningData
now include a leafpart_rti_map[] array that maps partition indexes to
their corresponding RT indexes. The former is used in set_plan_refs()
when constructing unprunableRelids, while the latter is used in
ExecDoInitialPruning() to convert partition indexes returned by
get_matching_partitions() into RT indexes, which are then added to
es_unpruned_relids.
These changes make it possible for ModifyTable and LockRows nodes to
process only relations that remain unpruned after initial pruning.
ExecInitModifyTable() trims lists, such as resultRelations,
withCheckOptionLists, returningLists, and updateColnosLists, to
consider only unpruned partitions. It also creates ResultRelInfo
structs only for these partitions. Similarly, child RowMarks for
pruned relations are skipped.
By avoiding unnecessary initialization of structures for pruned
partitions, these changes improve the performance of updates and
deletes on partitioned tables during initial runtime pruning.
Due to ExecInitModifyTable() changes as described above, EXPLAIN on a
plan for UPDATE and DELETE that uses runtime initial pruning no longer
lists partitions pruned during initial pruning.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit allows transformation of OR-clauses into SAOP's for index scans
within nested loop joins. That required the following changes.
1. Make match_orclause_to_indexcol() and group_similar_or_args() understand
const-ness in the same way as match_opclause_to_indexcol(). This
generally makes our approach more uniform.
2. Make match_join_clauses_to_index() pass OR-clauses to
match_clause_to_index().
3. Also switch match_join_clauses_to_index() to use list_append_unique_ptr()
for adding clauses to *joinorclauses. That avoids possible duplicates
when processing the same clauses with different indexes. Previously such
duplicates were elimited in match_clause_to_index(), but now
group_similar_or_args() each time generates distinct copies of grouped
OR clauses.
Discussion: https://postgr.es/m/CAPpHfdv%2BjtNwofg-p5z86jLYZUTt6tR17Wy00ta0dL%3DwHQN3ZA%40mail.gmail.com
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
|
|
|
|
|
|
|
| |
Since d4378c0005e6, match_clause_to_indexcol() doesn't always return NULL
for an OR clause. This commit reflects that in the function header comment.
Reported-by: Pavel Borisov <pashkin.elfe@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consistently use "Size" (or size_t, or in some places int64 or double)
as the type for variables holding memory allocation sizes. In most
places variables' data types were fine already, but we had an ancient
habit of computing bytes from kilobytes-units GUCs with code like
"work_mem * 1024L". That risks overflow on Win64 where they did not
make "long" as wide as "size_t". We worked around that by restricting
such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB
on Win64. This patch removes that restriction, after replacing such
calculations with "work_mem * (Size) 1024" or variants of that.
It should be noted that this patch was constructed by searching
outwards from the GUCs that have MAX_KILOBYTES as upper limit.
So I can't positively guarantee there are no other places doing
memory-size arithmetic in int or long variables. I do however feel
pretty confident that increasing MAX_KILOBYTES on Win64 is safe now.
Also, nothing in our code should be dealing in multiple-gigabyte
allocations without authorization from a relevant GUC, so it seems
pretty likely that this search caught everything that could be at
risk of overflow.
Author: Vladlen Popolitov <v.popolitov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves PartitionPruneInfo from plan nodes to PlannedStmt,
simplifying traversal by centralizing all PartitionPruneInfo
structures in a single list in it, which holds all instances for the
main query and its subqueries. Instead of plan nodes (Append or
MergeAppend) storing PartitionPruneInfo pointers, they now reference
an index in this list.
A bitmapset field is added to PartitionPruneInfo to store the RT
indexes corresponding to the apprelids field in Append or MergeAppend.
This allows execution pruning logic to verify that it operates on the
correct plan node, mainly to facilitate debugging.
Duplicated code in set_append_references() and
set_mergeappend_references() is refactored into a new function,
register_pruneinfo(). This updates RT indexes by applying rtoffet
and adds PartitionPruneInfo to the global list in PlannerGlobal.
By allowing pruning to be performed without traversing the plan tree,
this change lays the groundwork for runtime initial pruning to occur
independently of plan tree initialization.
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a column is omitted in an INSERT, and there's no column default,
the code in preptlist.c generates a NULL Const to be inserted.
Furthermore, if the column is of a domain type, we wrap the Const
in CoerceToDomain, so as to throw a run-time error if the domain
has a NOT NULL constraint. That's fine as far as it goes, but
there are two problems:
1. We're being sloppy about the type/typmod that the Const is
labeled with. It really should have the domain's base type/typmod,
since it's the input to CoerceToDomain not the output. This can
result in coerce_to_domain inserting a useless length-coercion
function (useless because it's being applied to a null). The
coercion would typically get const-folded away later, but it'd
be better not to create it in the first place.
2. We're not applying expression preprocessing (specifically,
eval_const_expressions) to the resulting expression tree.
The planner's primary expression-preprocessing pass already happened,
so that means the length coercion step and CoerceToDomain node miss
preprocessing altogether.
This is at the least inefficient, since it means the length coercion
and CoerceToDomain will actually be executed for each inserted row,
though they could be const-folded away in most cases. Worse, it
seems possible that missing preprocessing for the length coercion
could result in an invalid plan (for example, due to failing to
perform default-function-argument insertion). I'm not aware of
any live bug of that sort with core datatypes, and it might be
unreachable for extension types as well because of restrictions of
CREATE CAST, but I'm not entirely convinced that it's unreachable.
Hence, it seems worth back-patching the fix (although I only went
back to v14, as the patch doesn't apply cleanly at all in v13).
There are several places in the rewriter that are building null
domain constants the same way as preptlist.c. While those are
before the planner and hence don't have any reachable bug, they're
still applying a length coercion that will be const-folded away
later, uselessly wasting cycles. Hence, make a utility routine
that all of these places can call to do it right.
Making this code more careful about the typmod assigned to the
generated NULL constant has visible but cosmetic effects on some
of the plans shown in contrib/postgres_fdw's regression tests.
Discussion: https://postgr.es/m/1865579.1738113656@sss.pgh.pa.us
Backpatch-through: 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should run the expression subtrees of PartitionedRelPruneInfo
structs through fix_scan_expr. Failure to do so means that
AlternativeSubPlans within those expressions won't be cleaned up
properly, resulting in "unrecognized node type" errors since v14.
It seems fairly likely that at least some of the other steps done
by fix_scan_expr are important here as well, resulting in as-yet-
undetected bugs. Therefore, I've chosen to back-patch this to
all supported branches including v13, even though the known
symptom doesn't manifest in v13.
Per bug #18778 from Alexander Lakhin.
Discussion: https://postgr.es/m/18778-24cd399df6c806af@postgresql.org
|