| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, there were three copies of this code, two of them
identical. There's no good reason for that.
This change is nice on its own, but the main motivation is the AIO patchset,
which needs to add extra checks the deduplicated code, which of course is
easier if there is only one version.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If PinLocalBuffer() were to modify the buf_state, the buf_state in
GetLocalVictimBuffer() would be out of date. Currently that does not happen,
as PinLocalBuffer() only modifies the buf_state if adjust_usagecount=true and
GetLocalVictimBuffer() passes false.
However, it's easy to make this not the case anymore - it cost me a few hours
to debug the consequences.
The minimal fix would be to just refetch the buf_state after after calling
PinLocalBuffer(), but the same danger exists in later parts of the
function. Instead, declare buf_state in the narrower scopes and re-read the
state in conditional branches. Besides being safer, it also fits well with
an upcoming set of cleanup patches that move the contents of the conditional
branches in GetLocalVictimBuffer() into helper functions.
I "broke" this in 794f2594479.
Arguably this should be backpatched, but as the relevant functions are not
exported and there is no actual misbehaviour, I chose to not backpatch, at
least for now.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the distance control heuristics simpler and more aggressive in
preparation for asynchronous I/O.
The v17 version of read_stream.c made a conservative choice to limit the
look-ahead distance when streaming sequential blocks, because it
couldn't benefit very much from looking ahead further yet. It had a
three-behavior model where only random I/O would rapidly increase the
look-ahead distance, to support read-ahead advice. Sequential I/O would
move it towards the io_combine_limit setting, just enough to build one
full-sized synchronous I/O at a time, and then expect kernel read-ahead
to avoid I/O stalls.
That already left I/O performance on the table with advice-based I/O
concurrency, since sequential blocks could be followed by random jumps,
eg with the proposed streaming Bitmap Heap Scan patch.
It is time to delete the cautious middle option and adjust the distance
based on recent I/O needs only, since asynchronous reads will need to be
started ahead of time whether random or sequential. It is still limited
by io_combine_limit, *_io_concurrency, buffer availability and
strategy ring size, as before.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Tested-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
read_stream.c tries not to issue read-ahead advice when it thinks the
kernel's own read-ahead should be active, ie when using buffered I/O and
reading sequential blocks. It previously gave up too easily, and issued
advice only for the first read of up to io_combine_limit blocks in a
larger range of sequential blocks after random jump. The following read
could suffer an avoidable I/O stall.
Fix, by continuing to issue advice until the corresponding preadv()
calls catch up with the start of the region we're currently issuing
advice for, if ever. That's when the kernel actually sees the
sequential pattern. Advice is now disabled only when the stream is
entirely sequential as far as we can see in the look-ahead window, or
in other words, when a sequential region is larger than we can cover
with the current io_concurrency and io_combine_limit settings.
While refactoring the advice control logic, also get rid of the
"suppress_advice" argument that was passed around between functions to
skip useless posix_fadvise() calls immediately followed by preadv().
read_stream_start_pending_read() can figure that out, so let's
concentrate knowledge of advice heuristics in fewer places (our goal
being to make advice-based I/O concurrency a legacy mode soon).
The problem cases were revealed by Tomas Vondra's extensive regression
testing with many different disk access patterns using Melanie
Plageman's streaming Bitmap Heap Scan patch, in a battle against the
venerable always-issue-advice-and-always-one-block-at-a-time code.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Tomas Vondra <tomas@vondra.me>
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGJ3HSWciQCz8ekP1Zn7N213RfA4nbuotQawfpq23%2Bw-5Q%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces a new GUC, log_lock_failure, which controls whether
a detailed log message is produced when a lock acquisition fails. Currently,
it only supports logging lock failures caused by SELECT ... NOWAIT.
The log message includes information about all processes holding or
waiting for the lock that couldn't be acquired, helping users analyze and
diagnose the causes of lock failures.
Currently, this option does not log failures from SELECT ... SKIP LOCKED,
as that could generate excessive log messages if many locks are skipped,
causing unnecessary noise.
This mechanism can be extended in the future to support for logging
lock failures from other commands, such as LOCK TABLE ... NOWAIT.
Author: Yuki Seino <seinoyu@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/411280a186cc26ef7034e0f2dfe54131@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit improves efficiency in FastPathTransferRelationLocks()
and GetLockConflicts(), which iterate over PGPROCs to search for
fast-path locks.
Previously, these functions recalculated the fast-path group during
every loop iteration, even though it remained constant. This update
optimizes the process by calculating the group once and reusing it
throughout the loop.
The functions also now skip empty fast-path groups, avoiding
unnecessary scans of their slots. Additionally, groups belonging to
inactive backends (with pid=0) are always empty, so checking
the group is sufficient to bypass these backends, further enhancing
performance.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/07d5fd6a-71f1-4ce8-8602-4cc6883f4bd1@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To avoid pinning too much of the buffer pool at once, read_stream.c
previously used LimitAdditionalPins(). The coding was naive, and only
considered the available buffers at stream construction time.
This commit checks before each StartReadBuffers() call with
GetAdditionalPinLimit(). The result might change over time due to pins
acquired outside this stream by the same backend. No extra CPU cycles
are added to the all-buffered fast-path code, but the I/O-starting path
now considers the up-to-date remaining buffer limit.
In practice it was quite difficult to exceed limits and cause any real
problems in v17, so no back-patch for now, but proposed changes will
make it easier.
Per code review from Andres, in the course of testing his AIO patches.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the support functions assumed that the caller needed one pin
to make progress, and could optionally use some more, allowing enough
for every connection to do the same. Add a couple more functions for
callers that want to know:
* what the maximum possible number could be, irrespective of currently
held pins, for space planning purposes
* how many additional pins they could acquire right now, without the
special case allowing one pin, for callers that already hold pins and
could already make progress even if no extra pins are available
The pin limit logic began in commit 31966b15. This refactoring is
better suited to read_stream.c, which will be adjusted to respect the
remaining limit as it changes over time in a follow-up commit. It also
computes MaxProportionalPins up front, to avoid performing divisions
whenever a caller needs to check the balance.
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
We can make the output look a bit better by aligning each lock's
definition, so add some padding space to achieve that. This change
makes no practical difference, but casual onlookers will be less
distracted by (lack of) whitespace.
Author: Gurjeet Singh <gurjeet@singh.im>
Discussion: https://postgr.es/m/CABwTF4VxfwDtRV-H22_XK4XeDogaV-Vaobu+af5U=8ZAZn9ZZQ@mail.gmail.com
|
|
|
|
| |
Introduced in commit a68159ff2b3.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a couple of places, read_stream.c assumed that io_combine_limit would
be stable during the lifetime of a stream. That is not true in at least
one unusual case: streams held by CURSORs where you could change the GUC
between FETCH commands, with unpredictable results.
Fix, by storing stream->io_combine_limit and referring only to that
after construction. This mirrors the treatment of the other important
setting {effective,maintenance}_io_concurrency, which is stored in
stream->max_ios.
One of the cases was the queue overflow space, which was sized for
io_combine_limit and could be overrun if the GUC was increased. Since
that coding was a little hard to follow, also introduce a variable for
better readability instead of open-coding the arithmetic. Doing so
revealed an off-by-one thinko while clamping max_pinned_buffers to
INT16_MAX, though that wasn't a live bug due to the current limits on
GUC values.
Back-patch to 17.
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach parallel nbtree index scans to use an LWLock (not a spinlock) to
protect the scan's shared descriptor state.
Preparation for an upcoming patch that will add skip scan optimizations
to nbtree. That patch will create the need to occasionally allocate
memory while the scan descriptor is locked, while copying datums that
were serialized by another backend.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
latch.c now only contains the Latch related functions, which build on
the WaitEventSet abstraction. Most of the platform-dependent stuff is
now in waiteventset.c.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
|
|
|
|
|
|
|
|
|
|
| |
This is in preparation for splitting WaitEventSet related functions to
a separate source file. That will hide the details of WaitEventSet
from WaitLatch, so it must use an exposed function instead of
modifying WaitEventSet->exit_on_postmaster_death directly.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
|
|
|
|
|
|
|
|
|
| |
The only caller was removed in commit 80a8f95b3b. I don't foresee
needing it any time soon, and I'm working on some big changes in this
area, so let's remove it out of the way.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The FP_LOCK_SLOTS_PER_BACKEND macro looks like a constant, but it
depends on the max_locks_per_transaction GUC, and thus can change. This
is non-obvious and confusing, so make it look more like a function by
renaming it to FastPathLockSlotsPerBackend().
While at it, use the macro when initializing fast-path shared memory,
instead of using the formula.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/ffiwtzc6vedo6wb4gbwelon5nefqg675t5c7an2ta7pcz646cg%40qwmkdb3l4ett
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the internal queue of buffers was capped at max_ios * 4,
though not less than io_combine_limit, at allocation time. That was
done in the first version based on conservative theories about resource
usage and heuristics pending later work. The configured I/O depth could
not always be reached with dense random streams generated by ANALYZE,
VACUUM, the proposed Bitmap Heap Scan patch, and also sequential streams
with the proposed AIO subsystem to name some examples.
The new formula is (max_ios + 1) * io_combine_limit, enough buffers for
the full configured I/O concurrency level using the full configured I/O
combine size, plus the buffers from one finished but not yet consumed
full-sized I/O. Significantly more memory would be needed for high GUC
values if the client code requests a large per-buffer data size, but
that is discouraged (existing and proposed stream users try to keep it
under a few words, if not zero).
With this new formula, an intermediate variable could have overflowed
under maximum GUC values, so its data type is adjusted to cope.
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
9d9b9d46f3c5 has added spinlocks to protect the fields in ProcSignal
flags, introducing a code path in ProcSignalInit() where a spinlock
could be released twice if the pss_pid field of a ProcSignalSlot is
found as already set. Multiple spinlock releases have no effect with
most spinlock implementations, but this could cause the code to run into
issues when the spinlock is acquired concurrently by a different
process.
This sanity check on pss_pid generates a LOG that can be delayed until
after the spinlock is released as, like older versions up to v17, the
code expects the initialization of the ProcSignalSlot to happen even if
pss_pid is found incorrect. The code is changed so as the old pss_pid
is read while holding the slot's spinlock, with the LOG from the sanity
check generated after releasing the spinlock, preventing the double
release.
Author: Maksim Melnikov <m.melnikov@postgrespro.ru>
Co-authored-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/dca47527-2d8b-4e3b-b5a0-e2deb73371a4@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is a rework of 2421e9a51d20, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads. Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths. The opposite can be also true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).
track_wal_io_timing is added back in this commit, controlling if WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal. pg_stat_wal previously
tracked only the sync and write parts (now removed), read stats is new
data tracked in pg_stat_io, all three are aggregated if
track_wal_io_timing is enabled. The read part matters during recovery
or if a XLogReader is used.
Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
|
|
|
|
|
|
|
|
|
| |
This basically mirrors the changes done in the predecessor commit. While there
isn't currently a need to get these paths in critical sections, it seems a
shame to unnecessarily allocate memory in these paths now that relpath()
doesn't allocate anymore.
Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.
The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.
We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.
To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.
As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.
Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
|
|
|
|
|
|
|
| |
So far the various dependencies were documented in the comment above
MAX_BACKENDS, but not checked.
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jacob reported that comments for LW_SHARED_MASK referenced a MAX_BACKENDS
limit of 2^23-1, but that MAX_BACKENDS is actually limited to 2^18-1. The
limit was lowered in 48354581a49c, but the comment in lwlock.c wasn't updated.
Instead of just fixing the comment, it seems better to directly base the
lwlock defines on MAX_BACKENDS and add static assertions to ensure that there
is enough space. That way there's no comment that can go out of sync in the
future.
As part of that change I noticed that for some reason the high bit wasn't used
for flags, which seems somewhat odd. Redefine the flag values to start at the
highest bit.
Reported-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MAX_BACKENDS influences many things besides postmaster. I e.g. noticed that we
don't have static assertions ensuring BUF_REFCOUNT_MASK is big enough for
MAX_BACKENDS, adding them would require including postmaster.h in
buf_internals.h which doesn't seem right.
While at that, add MAX_BACKENDS_BITS, as that's useful in various places for
static assertions (to be added in subsequent commits).
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time
a051e71e28a1 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects. So, keeping the same information in pg_stat_wal
has little benefits.
Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.
The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To implement AIO writes, the backend initiating writes needs to transfer the
lock ownership to the AIO subsystem, so the lock held during the write can be
released in another backend.
Other backends need to be able to "complete" an asynchronously started IO to
avoid deadlocks (consider e.g. one backend starting IO for a buffer and then
waiting for a heavyweight lock held by another relation followed by the
current holder of the heavyweight lock waiting for the IO to complete).
To that end, this commit adds LWLockDisown() and LWLockReleaseDisowned(). If
code uses LWLockDisown() it's the code's responsibility to ensure that the
lock is released in case of errors.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
|
|
|
|
|
|
|
|
| |
Remove a number of (char *) casts that are unnecessary. Or in some
cases, rewrite the code to make the purpose of the cast clearer.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When considering a local buffer, the GetBufferDescriptor() call in
BufferGetLSNAtomic() would be retrieving a shared buffer with a bad
buffer ID. Since the code checks whether the buffer is shared before
using the retrieved BufferDesc, this issue did not lead to any
malfunction. Nonetheless this seems like trouble waiting to happen,
so fix it by ensuring that GetBufferDescriptor() is only called when
we know the buffer is shared.
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
| |
Builds configured with Valgrind but without assertions would fail due to
a typo in the recent change. This should be included when back-patching
2a8a0067 into v17.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By calling wipe_mem() on per-buffer data memory that has been released,
we are also telling Valgrind that the memory is "noaccess". We need to
set it to "undefined" before giving it to the registered callback to
fill in, when a slot is reused.
As discovered by build farm animal skink when the VACUUM streamification
patches landed (the first users of per-buffer data).
Pushing to master only for now, to clear the error on skink. It's also
possible that external code might discover the per-buffer data feature
in v17, and reasonable to expect Valgrind not to produce spurious
memcheck reports, but the back-patch is deferred until after the
imminent minor release is out of the way.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bg6aXpi2FEHqeLOzE%2BxYw%3DOV%2B-N5jhOEnnV%2BF0USM9xA%40mail.gmail.com
|
|
|
|
|
|
| |
Commit 755a4c10d19d prevented StartReadBuffers() from crossing md.c
segment boundaries in one operation, but a comment about that
possibility remained.
|
|
|
|
|
|
|
|
| |
Remove (char *) casts no longer needed after XLogRegisterData() and
XLogRegisterBufData() argument type change.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
| |
Remove some (char *) casts related to uses of the pg_checksum_page()
function. These casts are useless, because everything involved
already has the right type. Moreover, these casts actually silently
discarded a const qualifier. The declaration of a higher-level
function needs to be adjusted to fix that.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
| |
Remove (char *) casts around memory functions such as memcmp(),
memcpy(), or memset() where the cast is useless. Since these
functions don't take char * arguments anyway, these casts are at best
complicated casts to (void *), about which see commit 7f798aca1d5.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
| |
The top comment of StrategySyncStart() mentions BufferSync(), but this
function calls BgBufferSync(), not BufferSync().
Oversight in 9cd00c457e6a.
Author: Ashutosh Bapat
Discussion: https://postgr.es/m/CAExHW5tgkjag8i-s=RFrCn5KAWDrC4zEPPkfUKczfccPOxBRQQ@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions tried to check the state of the buffer's content lock
even for local buffers. Since we don't use the content lock for a
local buffer, that would lead to a "false" result from
LWLockHeldByMeInMode, which would mean a misleading "false" answer
from BufferIsExclusiveLocked (we'd rather that case always return
"true") or an assertion failure in BufferIsDirty.
The core code never applies these two functions to local buffers,
and apparently no extensions do either, since we've not heard
complaints. Still, in the name of future-proofing, let's fix
them to act as though a pinned local buffer is content-locked.
Author: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://postgr.es/m/19396ef77f8.1098c4a1810508.2255483659262451647@zohocorp.com
|
|
|
|
|
|
|
|
| |
Some places spelled it "it's", which is short for "it is".
In passing, fix a couple other nearby grammatical errors.
Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sizeof() call should reference buffer.data, because that's the
buffer we're reading data into, not the whole PGAlignedBuffer union.
This was introduced by 44cac93464, which replaced the simple buffer
with a PGAlignedBuffer field.
It's benign, because the buffer is the largest field of the union, so
the sizes are the same. But it's easy to trip over this in a patch, so
fix and backpatch. Commit 44cac93464 went into 12, but that's EOL.
Backpatch-through: 13
Discussion: https://postgr.es/m/928bdab1-6567-449f-98c4-339cd2203b87@vondra.me
|
|
|
|
|
|
|
|
|
|
| |
This makes use of the new PageData type.
PageGetSpecialPointer() had to be turned back into a macro, because it
is used in a way that sometimes it takes const and returns const and
sometimes takes non-const and returns non-const.
Discussion: https://www.postgresql.org/message-id/flat/692ee0da-49da-4d32-8dca-da224cc2800e@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a WaitEventSetWait() caller asks for multiple events, an already set
latch would previously prevent other events from being reported at the
same time. Now, we'll also poll the kernel for other events that would
fit in the caller's output buffer with a zero wait time. This policy
change doesn't affect callers that ask for only one event.
The main caller affected is the postmaster. If its latch is set
extremely frequently by backends launching workers and workers exiting,
we don't want it to handle only those jobs and ignore incoming client
connections.
Back-patch to 16 where the postmaster began using the API. The
fast-return policy changed here is older than that, but doesn't cause
any known problems in earlier releases.
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/Z1n5UpAiGDmFcMmd%40nathan
|
|
|
|
|
|
|
|
|
|
| |
These have been #ifdef'd out for a long time, and in fact have
been uncompilable since commit 48354581a of 2016-04-10. The
fact that nobody noticed for so long demonstrates their lack of
usefulness, so let's remove them rather than fix them.
Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaB+9CN_f63PPRoVhHjYmCwwmb_9CWLxqCJdMWDqs1a-JA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently in pg_stat_io view, IOs are counted as blocks of size
BLCKSZ. There are two limitations with this design:
* The actual number of I/O requests sent to the kernel is lower because
I/O requests may be merged before being sent. Additionally, it gives
the impression that all I/Os are done in block size, which shadows the
benefits of merging I/O requests.
* Some patches are under work to extend pg_stat_io for the tracking of
operations that may not be linked to the block size. For example, WAL
read IOs are done in variable bytes and it is not possible to correctly
show these IOs in pg_stat_io view, and we want to keep all this data in
a single system view rather than spread it across multiple relations to
ease monitoring.
WaitReadBuffers() can now be tracked as a single read operation
worth N blocks. Same for ExtendBufferedRelShared() and
ExtendBufferedRelLocal() for extensions.
Three columns are added to pg_stat_io for reads, writes and extensions
for the byte calculations. op_bytes, which was always hardcoded to
BLCKSZ, is removed. IO backend statistics are updated to reflect these
changes.
Bump catalog version.
Author: Nazir Bilal Yavuz
Reviewed-by: Bertrand Drouvot, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ0oqxBaaHAEsj=xFqkzE3n5P=3RA1V_igXwL-RV7QRzyw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pgstat_count_io_op() function, which counts a single I/O operation,
wraps pgstat_count_io_op_n() with a counter value of 1. The latter is
declared in pgstat.h and used nowhere in the code, so let's remove it in
favor of the former.
This change makes also the code more symmetric with
pgstat_count_io_op_time(), that already uses a similar set of arguments,
except that it counts also the I/O time. This will ease a bit the
integration of a follow-up patch that adds byte-level tracking in
pg_stat_io for some of its attributes, lifting the current restriction
based on BLCKSZ as all I/O operations are assumed to be block-based.
Author: Nazir Bilal Yavuz
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CAN55FZ32ze812=yjyZg1QeXhKvACUM_Nu0_gyPQcUKKuVHL5xA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces a new parameter named
autovacuum_worker_slots that controls how many autovacuum worker
slots to reserve during server startup. Modifying this new
parameter's value does require a server restart, but it should
typically be set to the upper bound of what you might realistically
need to set autovacuum_max_workers. With that new parameter in
place, autovacuum_max_workers can now be changed with a SIGHUP
(e.g., pg_ctl reload).
If autovacuum_max_workers is set higher than
autovacuum_worker_slots, a WARNING is emitted, and the server will
only start up to autovacuum_worker_slots workers at a given time.
If autovacuum_max_workers is set to a value less than the number of
currently-running autovacuum workers, the existing workers will
continue running, but no new workers will be started until the
number of running autovacuum workers drops below
autovacuum_max_workers.
Reviewed-by: Sami Imseih, Justin Pryzby, Robert Haas, Andres Freund, Yogesh Sharma
Discussion: https://postgr.es/m/20240410212344.GA1824549%40nathanxps13
|
|
|
|
|
| |
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/5812a0b9-b0cf-4151-9a14-d9f00e4f2858@gmail.com
|
|
|
|
| |
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
| |
Commit 34486b609 effectively redefined isBackgroundWorker as meaning
"not a regular backend", whereas before it had the narrower
meaning of AmBackgroundWorkerProcess(). For clarity, rename the
field to isRegularBackend and invert its sense.
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges. The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role). Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized. We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.
Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached. Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well. The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends. Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.
While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE. The previous behavior was fairly accidental and I have
no desire to return to it.
This patch includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases. It wasn't complete,
as shown by a recent bug report from Laurenz Albe. We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.
In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.
Like 5a2fed911, back-patch to supported branches (which sadly no
longer includes v12).
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The need for this was missed in commit 93db6cbda, with the result
being that if we launch a slotsync worker it would consume one of
the PGPROCs in the max_connections pool. That could lead to inability
to launch the worker, or to subsequent failures of connection requests
that should have succeeded according to the configured settings.
Rather than create some one-off infrastructure to support this,
let's group the slotsync worker with the existing autovac launcher
in a new category of "special worker" processes. These are kind of
like auxiliary processes, but they cannot use that infrastructure
because they need to be able to run transactions.
For the moment, make these processes share the PGPROC freelist
used for autovac workers (which previously supplied the autovac
launcher too). This is partly to avoid an ABI change in v17,
and partly because it seems silly to have a freelist with
at most two members. This might be worth revisiting if we grow
enough workers in this category.
Tom Lane and Hou Zhijie. Back-patch to v17.
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:
1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.
Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs. All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done. When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it). When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).
Changes:
* WAL is now unconditionally flushed first
* smgrtruncate() is now called in a critical section, preventing
interrupts and causing PANIC on file API failure
* smgrtruncate() has a new parameter for existing fork sizes,
because it can't call smgrnblocks() itself inside a critical section
The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map(). That last is also brought up to date with
other evolutions of the truncation protocol.
The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-patch. The more recently invented cancellation bug was
diagnosed by Alexander Lakhin. Other corruption scenarios were spotted
by me while iterating on this patch and earlier commit 75818b3a.
Back-patch to all supported releases.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reported-by: rootcause000@gmail.com
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org
|