| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
| |
This can result in buffers failing to be properly flushed at
checkpoint time, leading to data loss.
Report, diagnosis, and patch by Jeff Davis.
|
|
|
|
|
|
|
|
| |
Give the correct name of the GUC parameter being complained of.
Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE,
not the default INTERNAL_ERROR).
Gurjeet Singh, errcode adjustment by me
|
|
|
|
| |
Dave Kerr, backpatched by Simon Riggs
|
|
|
|
|
|
|
| |
This bug was introduced by commit 20d98ab6e4110087d1816cd105a40fcc8ce0a307,
so backpatch this to 9.0-9.2 like that one.
This fixes bug #6710, reported by Tarvi Pillessaar
|
|
|
|
|
|
|
|
|
| |
WALSender now woken up after each background flush by WALwriter, avoiding
multi-second replication delay for an all-async commit workload.
Replication delay reduced from 7s with default settings to 200ms, allowing
significantly reduced data loss at failover.
Andres Freund and Simon Riggs
|
|
|
|
|
|
|
|
| |
Per discussion, it does not seem like a good idea to change the behavior of
age(xid) in a minor release, even though the old definition causes the
function to fail on hot standby slaves. Therefore, revert commit
5829387381d2e4edf84652bb5a712f6185860670 and follow-on commits in the back
branches only.
|
|
|
|
|
|
|
|
|
|
|
|
| |
AbortOutOfAnyTransaction failed to do anything if the state it saw on
entry corresponded to failing partway through StartTransaction. I fixed
AbortCurrentTransaction to cope with that case way back in commit
60b2444cc3ba037630c9b940c3c9ef01b954b87b, but evidently overlooked that
AbortOutOfAnyTransaction should do likewise.
Back-patch to all supported branches. It's not clear that this omission
has any more-than-cosmetic consequences, but it's also not clear that it
doesn't, so back-patching seems the least risky choice.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
When using synchronous replication, we waited for the commit record to be
replicated, but if we our transaction didn't write any other WAL records,
that's not required because we don't even flush the WAL locally to disk in
that case. This lead to long waits when committing a transaction that only
modified a temporary table. Bug spotted by Thom Brown.
|
|
|
|
|
|
|
|
|
| |
Initialise ckptXidEpoch from starting checkpoint and maintain the correct
value as we roll forwards. This allows GetNextXidAndEpoch() to return the
correct epoch when executed during recovery. Backpatch to 9.0 when the
problem is first observable by a user.
Bug report from Daniel Farina
|
|
|
|
|
|
|
|
| |
Previously we used ReadRecPtr rather than EndRecPtr, which was
not a serious error but caused pg_stat_replication to report
incorrect replay_location until at least one WAL record is replayed.
Fujii Masao
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a longstanding thinko in replay of NEXTOID and checkpoint records: we
tried to advance nextOid only if it was behind the value in the WAL record,
but the comparison would draw the wrong conclusion if OID wraparound had
occurred since the previous value. Better to just unconditionally assign
the new value, since OID assignment shouldn't be happening during replay
anyway.
The consequences of a failure to update nextOid would be pretty minimal,
since we have long had the code set up to obtain another OID and try again
if the generated value is already in use. But in the worst case there
could be significant performance glitches while such loops iterate through
many already-used OIDs before finding a free one.
The odds of a wraparound happening during WAL replay would be small in a
crash-recovery scenario, and the length of any ensuing OID-assignment stall
quite limited anyway. But neither of these statements hold true for a
replication slave that follows a WAL stream for a long period; its behavior
upon going live could be almost unboundedly bad. Hence it seems worth
back-patching this fix into all supported branches.
Already fixed in HEAD in commit c6d76d7c82ebebb7210029f7382c0ebe2c558bca.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RestoreBkpBlocks was in the habit of zeroing and refilling the target
buffer; which was perfectly safe when the code was written, but is unsafe
during Hot Standby operation. The reason is that we have coding rules
that allow backends to continue accessing a tuple in a heap relation while
holding only a pin on its buffer. Such a backend could see transiently
zeroed data, if WAL replay had occasion to change other data on the page.
This has been shown to be the cause of bug #6425 from Duncan Rance (who
deserves kudos for developing a sufficiently-reproducible test case) as
well as Bridget Frey's re-report of bug #6200. It most likely explains the
original report as well, though we don't yet have confirmation of that.
To fix, change the code so that only bytes that are supposed to change will
change, even transiently. This actually saves cycles in RestoreBkpBlocks,
since it's not writing the same bytes twice.
Also fix seq_redo, which has the same disease, though it has to work a bit
harder to meet the requirement.
So far as I can tell, no other WAL replay routines have this type of bug.
In particular, the index-related replay routines, which would certainly be
broken if they had to meet the same standard, are not at risk because we
do not have coding rules that allow access to an index page when not
holding a buffer lock on it.
Back-patch to 9.0 where Hot Standby was added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
smgrdounlink takes care to not throw an ERROR if it fails to unlink
something, but that caution was rendered useless by commit
3396000684b41e7e9467d1abc67152b39e697035, which put an smgrexists call in
front of it; smgrexists *does* throw error if anything looks funny, such
as getting a permissions error from trying to open the file. If that
happens post-commit, you get a PANIC, and what's worse the same logic
appears in the WAL replay code, so the database even fails to restart.
Restore the intended behavior by removing the smgrexists call --- it isn't
accomplishing anything that we can't do better by adjusting mdunlink's
ideas of whether it ought to warn about ENOENT or not.
Per report from Joseph Shraibman of unrecoverable crash after trying to
drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions.
Backpatch to 8.4, where the bogus coding was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
we don't reach consistency before replaying all of the WAL. Rename the
variable to reachedConsistency, to make its intention clearer.
In master, that was an active bug because of the recent patch to
immediately PANIC if a reference to a missing page is found in WAL after
reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0,
the only consequence was a misleading "consistent recovery state reached at
%X/%X" message in the log at the beginning of crash recovery (the database
is not consistent at that point yet). In 8.4, the log message was not
printed in crash recovery, even though there was a similar
reachedMinRecoveryPoint local variable that was also set early. So,
backpatch to 9.1 and 9.0.
|
|
|
|
|
|
|
|
|
| |
There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.
Bug report by Chris Redekop
|
|
|
|
| |
Patch by me, bug report by Chris Redekop, analysis by Florian Pflug
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
streamed backup, throw an error and refuse to start up. The restore has not
finished correctly in that case and the data directory is possibly corrupt.
We already errored out in case of archive recovery, but could not during
crash recovery because we couldn't distinguish between the case that
pg_start_backup() was called and the database then crashed (must not error,
data is OK), and the case that we're restoring from a backup and not all
the needed WAL was replayed (data can be corrupt).
To distinguish those cases, add a line to backup_label to indicate
whether the backup was taken with pg_start/stop_backup(), or by streaming
(ie. pg_basebackup).
This is a different implementation than what I committed to 9.2 a week ago.
That implementation was not back-patchable because it required re-initdb.
Fujii Masao
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale. The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.
Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages. This is more straightforward, and might even be a bit
faster since only one unlink call is needed.
This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
|
|
|
|
|
|
|
|
|
|
| |
Fix a whole bunch of signal handlers that had been hacked to do things that
might change errno, without adding the necessary save/restore logic for
errno. Also make some minor fixes in unix_latch.c, and clean up bizarre
and unsafe scheme for disowning the process's latch. While at it, rename
the PGPROC latch field to procLatch for consistency with 9.2.
Issues noted while reviewing a patch by Peter Geoghegan.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original definition had the problem that timeouts exceeding about 2100
seconds couldn't be specified on 32-bit machines. Milliseconds seem like
sufficient resolution, and finer grain than that would be fantasy anyway
on many platforms.
Back-patch to 9.1 so that this aspect of the latch API won't change between
9.1 and later releases.
Peter Geoghegan
|
|
|
|
|
| |
We had previously (af26857a2775e7ceb0916155e931008c2116632f)
established the U.S. spellings as standard.
|
|
|
|
| |
Kevin Grittner
|
|
|
|
| |
renumbered the resource managers. This should fix the buildfarm..
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ReadRecord's habit of using both direct references to tmpRecPtr and
references to *RecPtr (which is pointing at tmpRecPtr) triggers an
optimization bug in gcc 4.6.0, which apparently has forgotten about
aliasing rules. Avoid the compiler bug, and make the code more readable
to boot, by getting rid of the direct references. Improve the comments
while at it.
Back-patch to all supported versions, in case they get built with 4.6.0.
Tom Lane, with some cosmetic suggestions from Alex Hunsaker
|
| |
|
| |
|
|
|
|
|
| |
just check that it's not running and PANIC if it was, but that can rightfully
happen if recovery stops at recovery target.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SSI patch inserted a call of RegisterPredicateLockingXid into
GetNewTransactionId, which was a bad idea on a couple of grounds. First,
it's not necessary to hold XidGenLock while manipulating that shared
memory, and doing so is bad because XidGenLock is a high-contention lock
that should be held for as short a time as possible. (Not to mention that
it adds an entirely unnecessary deadlock hazard, since we must take
SerializableXactHashLock as well.) Second, the specific place where it was
put was between extending CLOG and advancing nextXid, which could result in
unpleasant behavior in case of a failure there. Pull the call out to
AssignTransactionId, which is much safer and arguably better from a
modularity standpoint too.
There is more work to do to clean up the failure-before-advancing-nextXid
issue, but that is a separate change that will need to be back-patched.
So for the moment I just want to make GetNewTransactionId look the same as
it did in prior versions.
|
|
|
|
|
|
|
|
|
|
| |
Before commit c016ce728139be95bb0dc7c4e5640507334c2339, this wasn't
needed, but now that multiple resource manager IDs can percolate down
through here, we have to make sure we know which one we've got.
Otherwise, we can confuse (for example) an XLOG_XACT_COMMIT record
with an XLOG_CHECKPOINT_SHUTDOWN record.
Review by Jaime Casanova
|
|
|
|
|
|
|
|
|
| |
crash recovery, and throw an error if not. hubert depesz lubaczewski pointed
out that that situation also happens in the crash recovery following a
system crash that happens during an online backup.
We might want to do something smarter in 9.1, like put the check back for
backups taken with pg_basebackup, but that's for another patch.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment. And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server". There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead). This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
|
|
|
|
|
|
| |
Also avoid hardcoding the current default state by giving it the name
"on" and replace with a meaningful name that reflects its behaviour.
Coding only, no change in behaviour.
|
|
|
|
|
|
|
|
| |
This means one less thing to configure when setting up synchronous
replication, and also avoids some ambiguity around what the behavior
should be when the settings of these variables conflict.
Fujii Masao, with additional hacking by me.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
archive recovery.
It's possible to restore an online backup without recovery.conf, by simply
copying all the necessary WAL files to pg_xlog. "pg_basebackup -x" does that
too. That's the use case where this cross-check is useful.
Backpatch to 9.0. We used to do this in earlier versins, but in 9.0 the code
was inadvertently changed so that the check is only performed after archive
recovery.
Fujii Masao.
|
|
|
|
|
|
|
|
|
| |
Change location LOG message so it works each time we pause, not
just for final pause.
Ensure that we pause only if we are in Hot Standby and can connect
to allow us to run resume function. This change supercedes the
code to override parameter recoveryPauseAtTarget to false if not
attempting to enter Hot Standby, which is now removed.
|
|
|
|
|
|
| |
Startup process waited for cleanup lock but when hot_standby = off
the pid was not registered, so that the bgwriter would not wake
the waiting process as intended.
|
|
|
|
|
|
|
|
| |
ensure that they use different checkpoints as the starting point. We use
the checkpoint redo location as a unique identifier for the base backup in
the end-of-backup record, and in the backup history file name.
Bug spotted by Fujii Masao.
|
|
|
|
|
| |
Without this, the startup process goes into a tight loop, consuming
100% of one CPU and failing to respond to interrupts.
|
|
|
|
|
| |
Fujii Masao, but with the proposed behavior change reverted, and the
rest adjusted accordingly.
|
|
|
|
| |
opposed to O_DSYNC.
|
|
|
|
| |
Fujii Masao
|
|
|
|
|
|
|
|
| |
than doing it aggressively whenever the tail-XID pointer is advanced, because
this way we don't need to do it while holding SerializableXactHashLock.
This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an
obsolete comment spotted by Kevin Grittner.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
periodically rescan the archive for new timelines, while waiting for new WAL
segments to arrive. This allows you to set up a standby server that follows
the TLI change if another standby server is promoted to master. Before this,
you had to restart the standby server to make it notice the new timeline.
This patch only scans the archive for TLI changes, it won't follow a TLI
change in streaming replication. That is much needed too, but it would be a
much bigger patch than I dare to sneak in this late in the release cycle.
There was discussion on improving the sanity checking of the WAL segments so
that the system would notice more reliably if the new timeline isn't an
ancestor of the current one, but that is not included in this patch.
Reviewed by Fujii Masao.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
|