author Michael Paquier <michael@paquier.xyz> 2021-03-22 08:30:53 +0900
committer Michael Paquier <michael@paquier.xyz> 2021-03-22 08:30:53 +0900
commit 595b9cba2ab0cdd057e02d3c23f34a8bcfd90a2d
tree ffa6c941c8cf788ba502076dd3d2f13677b8cbfd /src/backend/access/transam/twophase.c
parent ac897c483485d3858ada23ca49650a0f2742a50f
Fix timeline assignment in checkpoints with 2PC transactions
Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page, a callback that is shared across various parts of the system. Since 1148e22a, this callback updates ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders.

This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record: by changing ThisTimeLineID, any follow-up WAL records would be written to a timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would fail to find a valid checkpoint record, resulting in a PANIC.

This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to avoid messing with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that led to 1148e22a, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that currently work.

A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked.

Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself.

Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10
Diffstat (limited to 'src/backend/access/transam/twophase.c')
-rw-r--r-- src/backend/access/transam/twophase.c | 15
1 file changed, 14 insertions, 1 deletion
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 6023e7c16fb..89335b64a24 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1316,7 +1316,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
  * twophase files and ReadTwoPhaseFile should be used instead.
  *
  * Note clearly that this function can access WAL during normal operation,
- * similarly to the way WALSender or Logical Decoding would do.
+ * similarly to the way WALSender or Logical Decoding would do. While
+ * accessing WAL, read_local_xlog_page() may change ThisTimeLineID,
+ * particularly if this routine is called for the end-of-recovery checkpoint
+ * in the checkpointer itself, so save the current timeline number value
+ * and restore it once done.
  */
 static void
 XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
@@ -1324,6 +1328,7 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 	XLogRecord *record;
 	XLogReaderState *xlogreader;
 	char	   *errormsg;
+	TimeLineID	save_currtli = ThisTimeLineID;
 
 	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
 									XL_ROUTINE(.page_read = &read_local_xlog_page,
@@ -1338,6 +1343,14 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 
 	XLogBeginRead(xlogreader, lsn);
 	record = XLogReadRecord(xlogreader, &errormsg);
+
+	/*
+	 * Restore immediately the timeline where it was previously, as
+	 * read_local_xlog_page() could have changed it if the record was read
+	 * while recovery was finishing or if the timeline has jumped in-between.
+	 */
+	ThisTimeLineID = save_currtli;
+
 	if (record == NULL)
 		ereport(ERROR,
 				(errcode_for_file_access(),