diff options
author | Peter Geoghegan <pg@bowt.ie> | 2024-11-08 13:10:10 -0500 |
---|---|---|
committer | Peter Geoghegan <pg@bowt.ie> | 2024-11-08 13:10:10 -0500 |
commit | b5ee4e52026b03b19ba47cde27bed9cf740576cc (patch) | |
tree | ba868ef9228a936cd44f84e008cb8e674e3c5806 /src/backend/access/nbtree/nbtutils.c | |
parent | 14e87ffa5c543b5f30ead7413084c25f7735039f (diff) | |
download | postgresql-b5ee4e52026b03b19ba47cde27bed9cf740576cc.tar.gz postgresql-b5ee4e52026b03b19ba47cde27bed9cf740576cc.zip |
Avoid nbtree parallel scan currPos confusion.
Commit 1bd4bc85, which refactored nbtree sibling link traversal, made
_bt_parallel_seize reset the scan's currPos so that things were
consistent with the state of a serial backend moving between pages.
This overlooked the fact that _bt_readnextpage relied on the existing
currPos state to decide when to end the scan -- even though it came from
before the scan was seized. As a result of all this, parallel nbtree
scans could needlessly behave like full index scans.
To fix, teach _bt_readnextpage to explicitly allow the use of an already
read page's so->currPos when deciding whether to end the scan -- even
during parallel index scans (allow it consistently now). This requires
moving _bt_readnextpage's seizure of the scan to earlier in its loop.
That way _bt_readnextpage either deals with the true so->currPos state,
or an initialized-by-_bt_parallel_seize currPos state set from when the
scan was seized. Now _bt_steppage (the most important _bt_readnextpage
caller) takes the same uniform approach to setting up its call using
details taken from so->currPos -- regardless of whether the scan happens
to be parallel or serial.
The new loop structure in _bt_readnextpage is prone to getting confused
by P_NONE blknos set when the rightmost or leftmost page was reached.
We could avoid that by adding an explicit check, but that would be ugly.
Avoid this problem by teaching _bt_parallel_seize to end the parallel
scan instead of returning a P_NONE next block/blkno. Doing things this
way was arguably a missed opportunity for commit 1bd4bc85. It allows us
to remove a similar "blkno == P_NONE" check from _bt_first.
Oversight in commit 1bd4bc85, which refactored sibling link traversal
(as part of optimizing nbtree backward scan locking).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Diagnosed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Discussion: https://postgr.es/m/f8efb9c0f8d1a71b44fd7f8e42e49c25@oss.nttdata.com
Diffstat (limited to 'src/backend/access/nbtree/nbtutils.c')
-rw-r--r-- | src/backend/access/nbtree/nbtutils.c | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c index 1b75066fb51..d7603250279 100644 --- a/src/backend/access/nbtree/nbtutils.c +++ b/src/backend/access/nbtree/nbtutils.c @@ -2402,6 +2402,8 @@ _bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate, new_prim_scan: + Assert(pstate->finaltup); /* not on rightmost/leftmost page */ + /* * End this primitive index scan, but schedule another. * |