| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
Some places spelled it "it's", which is short for "it is".
In passing, fix a couple other nearby grammatical errors.
Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds four fields to the statistics of relations, aggregating
the amount of time spent for each operation on a relation:
- total_vacuum_time, for manual vacuum.
- total_autovacuum_time, for vacuum done by the autovacuum daemon.
- total_analyze_time, for manual analyze.
- total_autoanalyze_time, for analyze done by the autovacuum daemon.
This gives users the option to derive the average time spent for these
operations with the help of the related "count" fields.
Bump catalog version (for the catalog changes) and PGSTAT_FILE_FORMAT_ID
(for the additions in PgStat_StatTabEntry).
Author: Sami Imseih
Reviewed-by: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/CAA5RZ0uVOGBYmPEeGF2d1B_67tgNjKx_bKDuL+oUftuoz+=Y1g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The right mix of DDL and VACUUM could corrupt a catalog page header such
that PageIsVerified() durably fails, requiring a restore from backup.
This affects only catalogs that both have a syscache and have DDL code
that uses syscache tuples to construct updates. One of the test
permutations shows a variant not yet fixed.
This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with
TM_Deleted. I think core and PGXN are indifferent to that.
Per bug #17821 from Alexander Lakhin. Back-patch to v13 (all supported
versions). The test case is v17+, since it uses INJECTION_POINT.
Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
| |
Assume twophase.c is the performance-sensitive caller, and preserve its
choice of unlikely() branch hint. Add some retrospective rationale for
that choice. Back-patch to v17, for the next commit to use it.
Reviewed (in earlier versions) by Michael Paquier.
Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
|
|
|
|
| |
Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517@enterprisedb.com
|
|
|
|
| |
Discussion: https://www.postgresql.org/message-id/flat/692ee0da-49da-4d32-8dca-da224cc2800e@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XLogPageRead() checks immediately for an invalid WAL record header on a
standby, to be able to handle the case of continuation records that need
to be read across two different sources. As written, the check was too
generic, applying to any target LSN. Based on an analysis by Kyotaro
Horiguchi, what really matters is to make sure that the page header is
checked when attempting to read a LSN at the boundary of a segment, to
handle the case of a continuation record that spawns across multiple
pages when dealing with multiple segments, as WAL receivers are spawned
they request WAL from the beginning of a segment. This fix has been
proposed by Kyotaro Horiguchi.
This could cause standbys to loop infinitely when dealing with a
continuation record during a timeline jump, in the case where the
contents of the record in the follow-up page are invalid.
Some regression tests are added to check such scenarios, able to
reproduce the original problem. In the test, the contents of a
continuation record are overwritten with junk zeros on its follow-up
page, and replayed on standbys. This is inspired by 039_end_of_wal.pl,
and is enough to show how standbys should react on promotion by not
being stuck. Without the fix, the test would fail with a timeout. The
test to reproduce the problem has been written by Alexander Kukushkin.
The original check has been introduced in 066871980183, for a similar
problem.
Author: Kyotaro Horiguchi, Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit reverts 8f67f994e8ea (down to v13) and c3de0f9eed38 (down to
v17), as these are proving to not be completely correct regarding two
aspects:
- In v17 and newer branches, c3de0f9eed38's check for epoch handling is
incorrect, and does not correctly handle frozen epochs. A logic closer
to widen_snapshot_xid() should be used. The 2PC code should try to
integrate deeper with FullTransactionIds, 5a1dfde8334b being not enough.
- In v13 and newer branches, 8f67f994e8ea is a workaround for the real
issue, which is that we should not attempt CLOG lookups without reaching
consistency. This exists since 728bd991c3c4, and this is reachable with
ProcessTwoPhaseBuffer() called by restoreTwoPhaseData() at the beginning
of recovery.
Per discussion with Noah Misch.
Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the several members of HeapScanDescData which are specific to
Bitmap Heap Scans into a new struct, BitmapHeapScanDescData, which
inherits from HeapScanDescData.
This reduces the size of the HeapScanDescData for other types of scans
and will allow us to add additional bitmap heap scan-specific members in
the future without fear of bloating the HeapScanDescData.
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
|
|
|
|
|
| |
Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add more comments at the top of vacuumlazy.c on heap relation vacuuming
implementation.
Previously vacuumlazy.c only had details related to dead TID storage.
This commit adds a more general summary to help future developers
understand the heap relation vacuum design and implementation at a high
level.
Reviewed-by: Alena Rybakina, Robert Haas, Andres Freund, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes commit 7406ab623fe in that the gist strategy number
mapping support function is changed to use the CompareType enum as
input, instead of the "well-known" RT*StrategyNumber strategy numbers.
This is a bit cleaner, since you are not dealing with two sets of
strategy numbers. Also, this will enable us to subsume this system
into a more general system of using CompareType to define operator
semantics across index methods.
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function expects an "int64" as result and stores the number of
pages to add to the index scan bitmap as an "int", multiplying its final
result by 10. For a relation large enough, this can theoretically
overflow if counting more than (INT32_MAX / 10) pages, knowing that the
number of pages is upper-bounded by MaxBlockNumber.
To avoid the overflow, this commit redefines "totalpages", used to
calculate the result, to be an "int64" rather than an "int".
Reported-by: Evgeniy Gorbanyov
Author: James Hunter
Discussion: https://www.postgresql.org/message-id/07704817-6fa0-460c-b1cf-cd18f7647041@basealt.ru
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quite a bit of code within nbtutils.c is only called during nbtree
preprocessing. Move that code into a new .c file, nbtpreprocesskeys.c.
Also reorder some of the functions within the new file for clarity.
This commit has no functional impact. It is strictly mechanical.
Author: Peter Geoghegan <pg@bowt.ie>
Suggested-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-WznwNn1BDOpWxHBUK1f3Rdw8pO9UCenWXnvT=n9GO8GnLA@mail.gmail.com
Discussion: https://postgr.es/m/86930045-5df5-494a-b4f1-815bc3fbcce0%40iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the NOT ENFORCED/ENFORCED flag for constraints,
with support for check constraints.
The plan is to eventually support this for foreign key constraints,
where it is typically more useful.
Note that CHECK constraints do not currently support ALTER operations,
so changing the enforceability of an existing constraint isn't
possible without dropping and recreating it. This could be added
later.
Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
6f3820f37 adjusted the assert-enabled validation of the CompactAttribute
to call a new external function to perform the validation. That commit
made it so the function was only available when building with
USE_ASSERT_CHECKING, and because TupleDescCompactAttr() is a static
inline function, the call to verify_compact_attribute() was compiled
into any extension which uses TupleDescCompactAttr(). This caused issues
for such extensions when loading the assert-enabled extension into
PostgreSQL versions without asserts enabled due to that function being
unavailable in core.
To fix this, make verify_compact_attribute() available unconditionally,
but make it do nothing unless building with USE_ASSERT_CHECKING.
Author: Andrew Kane <andrew@ankane.org>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOdR5yHfMEMW00XGo=v1zCVUS6Huq2UehXdvKnwtXPTcZwXhmg@mail.gmail.com
|
|
|
|
|
|
| |
Oversight in commit 1bd4bc85, which made nbtree backwards scans operate
off of a copy of each page's left link as of the time of its call to
_bt_readpage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The originally submitted code (using bit masking) was correct when the
number of slots was restricted to be a power of two -- but that
limitation was removed during development that led to commit
53c2a97a9266, which made the bank selection code incorrect. This led to
always using a smaller number of banks than available. Change said code
to use integer modulo instead, which works correctly with an arbitrary
number of banks.
It's likely that we could improve on this to avoid runtime use of
integer division. But with this change we're, at least, not wasting
memory on unused banks, and more banks mean less contention, which is
likely to have a much higher performance impact than a single
instruction's latency.
Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/9444dc46-ca47-43ed-9058-89c456316306@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move nbtree's detection of RowCompare quals that are unsatisfiable due
to having a NULL in their first row element: rather than detecting these
cases at the point where _bt_first builds its insertion scan key, do so
earlier, during preprocessing proper. This brings the RowCompare case
in line every other case involving an unsatisfiable-due-to-NULL qual.
nbtree now consistently detects such unsatisfiable quals -- even when
they happen to involve a key that isn't examined by _bt_first at all.
Affected cases thereby avoid useless full index scans that cannot
possibly return any matching rows.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmySVXst2hFrOATC-zw1Byg1XC-jYUS314=mzuqsNwk+Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new structure relieves _bt_first from having separate calls to
_bt_start_array_keys for the serial case and parallel case. This saves
code, and seems clearer.
Follow-up to work from commits 4e6e375b and b5ee4e52.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=XjUZjBjHJdhTvuH5MwoJObWGoM2RG2LyFg5WUdWyk=A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces a new parameter named
autovacuum_worker_slots that controls how many autovacuum worker
slots to reserve during server startup. Modifying this new
parameter's value does require a server restart, but it should
typically be set to the upper bound of what you might realistically
need to set autovacuum_max_workers. With that new parameter in
place, autovacuum_max_workers can now be changed with a SIGHUP
(e.g., pg_ctl reload).
If autovacuum_max_workers is set higher than
autovacuum_worker_slots, a WARNING is emitted, and the server will
only start up to autovacuum_worker_slots workers at a given time.
If autovacuum_max_workers is set to a value less than the number of
currently-running autovacuum workers, the existing workers will
continue running, but no new workers will be started until the
number of running autovacuum workers drops below
autovacuum_max_workers.
Reviewed-by: Sami Imseih, Justin Pryzby, Robert Haas, Andres Freund, Yogesh Sharma
Discussion: https://postgr.es/m/20240410212344.GA1824549%40nathanxps13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, this was notionally used only for the entry point of the
tree and as a convenient parent for other contexts.
For shared memory, the creator previously allocated the entry point
in this context, but attaching backends didn't have access to that,
so they just used the caller's context. For the sake of consistency,
allocate every instance of an entry point in the caller's context.
For local memory, allocate the control object in the caller's context
as well. This commit also makes the "leaf context" the notional parent
of the child contexts used for nodes, so it's a bit of a misnomer,
but a future commit will make the node contexts independent rather
than children, so leave it this way for now to avoid code churn.
The memory context parameter for RT_CREATE is now unused in the case
of shared memory, so remove it and adjust callers to match.
In passing, remove unused "context" member from struct TidStore,
which seems to have been an oversight.
Reviewed by Masahiko Sawada
Discussion: https://postgr.es/m/CANWCAZZDCo4k5oURg_pPxM6+WZ1oiG=sqgjmQiELuyP0Vtrwig@mail.gmail.com
|
|
|
|
|
| |
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/5812a0b9-b0cf-4151-9a14-d9f00e4f2858@gmail.com
|
|
|
|
| |
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the beginning of recovery, an orphaned two-phase file in an epoch
different than the one defined in the checkpoint record could not be
removed based on the assumptions that AdjustToFullTransactionId() relies
on, assuming that all files would be either from the current epoch or
from the previous epoch.
If the checkpoint epoch was 0 while the 2PC file was orphaned and in the
future, AdjustToFullTransactionId() would underflow the epoch used to
build the 2PC file path. In non-assert builds, this would create a
WARNING message referring to a 2PC file with an epoch of "FFFFFFFF" (or
UINT32_MAX), as an effect of the underflow calculation, leaving the
orphaned file around.
Some tests are added with dummy 2PC files in the past and the future,
checking that these are properly removed.
Issue introduced by 5a1dfde8334b, that has switched two-phase state
files to use FullTransactionIds.
Reported-by: Vitaly Davydov
Author: Michael Paquier
Reviewed-by: Vitaly Davydov
Discussion: https://postgr.es/m/13b5b6-676c3080-4d-531db900@47931709
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before 728bd991c3c4, that has improved the support for 2PC files during
recovery, the initial logic scanning files in pg_twophase was done so as
files in the future of the transaction ID horizon were checked first,
followed by a check if a transaction ID is aborted or committed which
could involve a pg_xact lookup. After this commit, these checks have
been done in reverse order.
Files detected as in the future do not have a state that can be checked
in pg_xact, hence this caused recovery to fail abruptly should an
orphaned 2PC file in the future of the transaction ID horizon exist in
pg_twophase at the beginning of recovery.
A test is added to check for this scenario, using an empty 2PC with a
transaction ID large enough to be in the future when running the test.
This test is added in 16 and older versions for now. 17 and newer
versions are impacted by a second bug caused by the addition of the
epoch in the 2PC file names. An equivalent test will be added in these
branches in a follow-up commit, once the second set of issues reported
are fixed.
Author: Vitaly Davydov, Michael Paquier
Discussion: https://postgr.es/m/11e597-676ab680-8d-374f23c0@145466129
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
| |
Commit 34486b609 effectively redefined isBackgroundWorker as meaning
"not a regular backend", whereas before it had the narrower
meaning of AmBackgroundWorkerProcess(). For clarity, rename the
field to isRegularBackend and invert its sense.
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges. The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role). Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized. We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.
Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached. Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well. The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends. Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.
While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE. The previous behavior was fairly accidental and I have
no desire to return to it.
This patch includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases. It wasn't complete,
as shown by a recent bug report from Laurenz Albe. We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.
In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.
Like 5a2fed911, back-patch to supported branches (which sadly no
longer includes v12).
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
|
|
|
|
| |
Oversight in commit 5bf748b86b.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per git log, the last time someone tried to do something with
pgrminclude was around 2011. Many (not all) of the "pgrminclude
ignore" annotations are of a newer date but seem to have just been
copied around during refactorings and file moves and don't seem to
reflect an actual need anymore.
There have been some parallel experiments with include-what-you-use
(IWYU) annotations, but these don't seem to correspond very strongly
to pgrminclude annotations, so there is no value in keeping the
existing ones even for that kind of thing.
So, wipe them all away. We can always add new ones in the future
based on actual needs.
Discussion: https://www.postgresql.org/message-id/flat/2d4dc7b2-cb2e-49b1-b8ca-ba5f7024f05b%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
5983a4cff added CompactAttribute as an abbreviated alternative to
FormData_pg_attribute to allow more cache-friendly processing in tasks
related to TupleDescs. That commit contained some assert-only code to
check that the CompactAttribute had been populated correctly, however,
the method used to do that checking caused the TupleDesc's
CompactAttribute to be zeroed before it was repopulated and compared to
the snapshot taken before the memset call. This caused issues as the type
cache caches TupleDescs in shared memory which can be used by multiple
backend processes at the same time. There was a window of time between
the zero and repopulation of the CompactAttribute where another process
would mistakenly think that the CompactAttribute is invalid due to the
memset.
To fix this, instead of taking a snapshot of the CompactAttribute and
calling populate_compact_attribute() and comparing the snapshot to the
freshly populated TupleDesc's CompactAttribute, refactor things so we
can just populate a temporary CompactAttribute on the stack. This way
we don't touch the TupleDesc's memory.
Reported-by: Alexander Lakhin, SQLsmith
Discussion: https://postgr.es/m/ca3a256a-5d12-42db-aabe-a75a030d9fb9@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reset btpo_cycleid to 0 in btree_xlog_vacuum for consistency with
_bt_delitems_vacuum (the corresponding original execution code). This
makes things neater.
There might be some performance benefit to being consistent like this.
When btvacuumpage doesn't call _bt_delitems_vacuum, it can still
proactively reset btpo_cycleid to 0 via a separate hint-like update
mechanism (it does so whenever it sees that it isn't already set to 0).
And so it's possible that being consistent about resetting btpo_cycleid
like this will save work later on, after standby promotion: subsequent
VACUUMs won't need to clear btpo_cycleid using the hint-like update
mechanism as often as they otherwise would.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAH2-Wz=+LDFxn9NZyEsCo8ifcyKt6+n-VLyygySEHgMz+oynqw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we convert CompactAttribute.attalign from a char, which is directly
derived from pg_attribute.attalign into a uint8, which stores the number
of bytes to align the column's value by in the tuple.
This allows tuple deformation and tuple size calculations to move away
from using the inefficient att_align_nominal() macro, which manually
checks each TYPALIGN_* char to translate that into the alignment bytes
for the given type. Effectively, this commit changes those to TYPEALIGN
calls, which are branchless and only perform some simple arithmetic with
some bit-twiddling.
The removed branches were often mispredicted by CPUs, especially so in
real-world tables which often contain a mishmash of different types
with different alignment requirements.
Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
28328ec87b45725 addressed one overflow danger in
SampleHeapTupleVisible() but introduced another, albeit a less likely
one. Modify the binary search code to remove this danger.
Reported-by: Richard Guo
Reviewed-by: Richard Guo, Ranier Vilela
Discussion: https://postgr.es/m/CAMbWs4_bE%2BNscChbKWzw6HZOipCUyXfA5133qvoXQ654D3B2gQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
The column is no longer needed as the offset is now cached in the
CompactAttribute struct per commit 5983a4cff.
Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses. Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.
For some workloads, tuple deformation can be the most CPU intensive part
of processing the query. Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column. However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.
This also makes pg_attribute.attcacheoff redundant. A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.
Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
|
|
|
|
|
|
|
| |
b7493e1ab35 removed leftover mentions of XLOG_HEAP2_FREEZE_PAGE records
from comments but neglected to remove one mention of FREEZE_PAGE.
Reported off-list by Alexander Lakhin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consistently reset so->scanBehind at the beginning of nbtree array
advancement, even during sktrig_required=false calls (calls where array
advancement is triggered by an unsatisfied non-required array scan key).
Otherwise, it's possible for queries to fail to return all relevant
tuples to the scan given a low-order required scan key that was
previously deemed "satisfied" by a truncated high key attribute value.
This only happened at the point where a later non-required array scan
key needed to be "advanced" once on the next leaf page (that is, once
the right sibling of the truncated high key page was reached).
The underlying issue was that later code within _bt_advance_array_keys
assumed that the so->scanBehind flag must have been set using the
current page's high key (not the previous page's high key). Any later
successful recheck call to _bt_check_compare would therefore spuriously
be prevented from making _bt_advance_array_keys return true, based on
the faulty belief that the truncated attribute must be from the scan's
current tuple (i.e. the non-pivot tuple at the start of the next page).
_bt_advance_array_keys would return false for the tuple, ultimately
resulting in _bt_checkkeys failing to return a matching tuple.
Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkJKncfqyAUTeuB5GgRhT1vhsWO2q11dbZNqKmvjopP_g@mail.gmail.com
Backpatch: 17-, where commit 5bf748b8 first appears.
|
|
|
|
|
|
|
|
|
|
|
| |
f83d709760d merged the separate XLOG_HEAP2_FREEZE_PAGE records into a
new combined prune, freeze, and vacuum record with opcode
XLOG_HEAP2_PRUNE_VACUUM_SCAN. Remove the last few references to
XLOG_HEAP2_FREEZE_PAGE records which were accidentally left behind.
Reported-by: Tomas Vondra
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA%2BTgmoY1tYff-1CEn8kYt5FsOrynTbtr%3DUZw%3D7mTC1Hv1HpeBQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the repurposing of TBMIterator as an interface for both parallel
and serial iteration through TIDBitmaps in commit 7f9d4187e7bab10329cc,
bitmap table scans may now use it.
Modify bitmap table scan code to use the TBMIterator. This requires
moving around a bit of code, so a few variables are initialized
elsewhere.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add and use TBMPrivateIterator, which replaces the current TBMIterator
for serial use cases, and repurpose TBMIterator to be a unified
interface for both the serial ("private") and parallel ("shared") TID
Bitmap iterator interfaces. This encapsulation simplifies call sites for
callers supporting both parallel and serial TID Bitmap access.
TBMIterator is not yet used in this commit.
Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
|
|
|
|
|
|
|
|
|
|
|
| |
68d9662be1c4b70 made HeapScanDesc->rs_ntuples unsigned but neglected to
change how it was being used in SampleHeapTupleVisible().
Return early if rs_ntuples is 0 to avoid overflowing and incorrectly
executing the loop code in SampleHeapTupleVisible().
Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAot_xQoZyPZjpj1aBUPrPykY5mOPHGyvfe%3Djz%2BWowdA3A%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HeapScanDescData.rs_cindex and rs_ntuples can't be less than 0. All scan
types using the heap scan descriptor expect these values to be >= 0.
Make that expectation clear by making rs_cindex and rs_ntuples unsigned.
Also remove the test in heapam_scan_bitmap_next_tuple() that checks if
rs_cindex < 0. This was never true, but now that rs_cindex is unsigned,
it makes even less sense.
While we are at it, initialize both rs_cindex and rs_ntuples to 0 in
initscan().
Author: Melanie Plageman
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/CAAKRu_ZxF8cDCM_BFi_L-t%3DRjdCZYP1usd1Gd45mjHfZxm0nZw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Heap vacuum already counts and logs pages with newly frozen tuples. Now
count and log the number of pages newly set all-visible and all-frozen
in the visibility map.
Pages that are all-visible but not all-frozen are debt for future
aggressive vacuums. The counts of newly all-visible and all-frozen pages
give us insight into the rate at which this debt is being accrued and
paid down.
Author: Melanie Plageman
Reviewed-by: Masahiko Sawada, Alastair Turner, Nitin Jadhav, Andres Freund, Bilal Yavuz, Tomas Vondra
Discussion: https://postgr.es/m/flat/CAAKRu_ZQe26xdvAqo4weHLR%3DivQ8J4xrSfDDD8uXnh-O-6P6Lg%40mail.gmail.com#6d8d2b4219394f774889509bf3bdc13d,
https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu
|
|
|
|
|
|
|
|
|
|
|
|
| |
It can be useful to know the state of a relation page's VM bits before
visibilitymap_set(). visibilitymap_set() has the old value on hand, so
returning it is simple. This commit does not use visibilitymap_set()'s
new return value.
Author: Melanie Plageman
Reviewed-by: Masahiko Sawada, Andres Freund, Nitin Jadhav, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZQe26xdvAqo4weHLR%3DivQ8J4xrSfDDD8uXnh-O-6P6Lg%40mail.gmail.com#6d8d2b4219394f774889509bf3bdc13d,
https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename frozen_pages to new_frozen_tuple_pages in LVRelState, the struct
used for tracking state during vacuuming of a heap relation.
frozen_pages sounds like it tracks pages set all-frozen. That is a
misnomer. It only includes pages with at least one newly frozen tuple.
It also includes pages that are not all-frozen.
Author: Melanie Plageman
Reviewed-by: Andres Freund, Masahiko Sawada, Nitin Jadhav, Bilal Yavuz
Discussion: https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the 'whenTaken' and 'lsn' fields from SnapshotData. After the
removal of the "snapshot too old" feature, they were never set to a
non-zero value.
This largely reverts commit 3e2f3c2e423, which added the
OldestActiveSnapshot tracking, and the init_toast_snapshot()
function. That was only required for setting the 'whenTaken' and 'lsn'
fields. SnapshotToast is now a constant again, like SnapshotSelf and
SnapshotAny. I kept a thin get_toast_snapshot() wrapper around
SnapshotToast though, to check that you have a registered or active
snapshot. That's still a useful sanity check.
Reviewed-by: Nathan Bossart, Andres Freund, Tom Lane
Discussion: https://www.postgresql.org/message-id/cd4b4f8c-e63a-41c0-95f6-6e6cd9b83f6d@iki.fi
|
|
|
|
| |
The term "buffer context lock" is outdated as of commit 5d5087363d.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parallel_vacuum_reset_dead_items used a local variable to hold a
pointer from the passed vacrel, purely as a shorthand. This pointer
was later freed and a new allocation was made and stored to the
struct. Then the local pointer was mistakenly referenced again.
This apparently happened not to break anything since the freed chunk
would have been put on the context's freelist, so it was accidentally
the same pointer anyway, in which case the DSA handle was correctly
updated. The minimal fix is to change two places so they access
dead_items through the vacrel. This coding style is a maintenance
hazard, so while at it get rid of most other similar usages, which
were inconsistently used anyway.
Analysis and patch by Vallimaharajan G, with further defensive coding
by me
Backpath to v17, when TidStore came in
Discussion: https://postgr.es/m/1936493cc38.68cb2ef27266.7456585136086197135@zohocorp.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 811af9786b introduced palloc() calls into systable_beginscan()
and systable_beginscan_ordered(). But there was no pfree(), as is the
usual style.
It turns out that an ANALYZE of a partitioned table can invoke many
thousand system table index scans, and this memory is not cleaned up
until the end of the command, so this can temporarily leak quite a bit
of memory. Maybe there are improvements to be made at a higher level
about this, but for now, insert a couple of corresponding pfree()
calls to fix this particular issue.
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://www.postgresql.org/message-id/Z0XTfIq5xUtbkiIh@pryzbyj2023
|