diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2015-05-28 17:33:03 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2015-05-28 17:33:03 -0400 |
commit | d8179b001ae574da00c8f4549577095bf90f3337 (patch) | |
tree | 7ab6e7ba9d73d166723d0740e686abb7615d66c7 /src/backend/access/transam/xlog.c | |
parent | d5442cb2434c303fa2afc747cdac65df958ff8ac (diff) | |
download | postgresql-d8179b001ae574da00c8f4549577095bf90f3337.tar.gz postgresql-d8179b001ae574da00c8f4549577095bf90f3337.zip |
Fix fsync-at-startup code to not treat errors as fatal.
Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious
regression, namely that if its scan of the data directory came across any
un-fsync-able files, it would fail and thereby prevent database startup.
Worse yet, symlinks to such files also caused the problem, which meant that
crash restart was guaranteed to fail on certain common installations such
as older Debian.
After discussion, we agreed that (1) failure to start is worse than any
consequence of not fsync'ing is likely to be, therefore treat all errors
in this code as nonfatal; (2) we should not chase symlinks other than
those that are expected to exist, namely pg_xlog/ and tablespace links
under pg_tblspc/. The latter restriction avoids possibly fsync'ing a
much larger part of the filesystem than intended, if the user has left
random symlinks hanging about in the data directory.
This commit takes care of that and also does some code beautification,
mainly moving the relevant code into fd.c, which seems a much better place
for it than xlog.c, and making sure that the conditional compilation for
the pre_sync_fname pass has something to do with whether pg_flush_data
works.
I also relocated the call site in xlog.c down a few lines; it seems a
bit silly to be doing this before ValidateXLOGDirectoryStructure().
The similar logic in initdb.c ought to be made to match this, but that
change is noncritical and will be dealt with separately.
Back-patch to all active branches, like the prior commit.
Abhijit Menon-Sen and Tom Lane
Diffstat (limited to 'src/backend/access/transam/xlog.c')
-rw-r--r-- | src/backend/access/transam/xlog.c | 54 |
1 files changed, 12 insertions, 42 deletions
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 087b6be084d..666fa37e2a3 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -866,8 +866,6 @@ static void WALInsertLockAcquireExclusive(void); static void WALInsertLockRelease(void); static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt); -static void fsync_pgdata(char *datadir); - /* * Insert an XLOG record represented by an already-constructed chain of data * chunks. This is a low-level routine; to construct the WAL record header @@ -5951,18 +5949,6 @@ StartupXLOG(void) (errmsg("database system was interrupted; last known up at %s", str_time(ControlFile->time)))); - /* - * If we previously crashed, there might be data which we had written, - * intending to fsync it, but which we had not actually fsync'd yet. - * Therefore, a power failure in the near future might cause earlier - * unflushed writes to be lost, even though more recent data written to - * disk from here on would be persisted. To avoid that, fsync the entire - * data directory. - */ - if (ControlFile->state != DB_SHUTDOWNED && - ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) - fsync_pgdata(data_directory); - /* This is just to allow attaching to startup process with a debugger */ #ifdef XLOG_REPLAY_DELAY if (ControlFile->state != DB_SHUTDOWNED) @@ -5977,6 +5963,18 @@ StartupXLOG(void) ValidateXLOGDirectoryStructure(); /* + * If we previously crashed, there might be data which we had written, + * intending to fsync it, but which we had not actually fsync'd yet. + * Therefore, a power failure in the near future might cause earlier + * unflushed writes to be lost, even though more recent data written to + * disk from here on would be persisted. To avoid that, fsync the entire + * data directory. + */ + if (ControlFile->state != DB_SHUTDOWNED && + ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) + SyncDataDirectory(); + + /* * Initialize on the assumption we want to recover to the latest timeline * that's active according to pg_control. */ @@ -11602,31 +11600,3 @@ SetWalWriterSleeping(bool sleeping) XLogCtl->WalWriterSleeping = sleeping; SpinLockRelease(&XLogCtl->info_lck); } - -/* - * Issue fsync recursively on PGDATA and all its contents. - */ -static void -fsync_pgdata(char *datadir) -{ - if (!enableFsync) - return; - - /* - * If possible, hint to the kernel that we're soon going to fsync the data - * directory and its contents. - */ -#if defined(HAVE_SYNC_FILE_RANGE) || \ - (defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)) - walkdir(datadir, pre_sync_fname); -#endif - - /* - * Now we do the fsync()s in the same order. - * - * It's important to fsync the destination directory itself as individual - * file fsyncs don't guarantee that the directory entry for the file is - * synced. - */ - walkdir(datadir, fsync_fname); -} |