aboutsummaryrefslogtreecommitdiff
path: root/src/backend/storage/file/copydir.c
Commit message (Collapse)AuthorAge
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Refactor dir/file permissionsStephen Frost2018-04-07
| | | | | | | | | | | | | | | | | | | Consolidate directory and file create permissions for tools which work with the PG data directory by adding a new module (common/file_perm.c) that contains variables (pg_file_create_mode, pg_dir_create_mode) and constants to initialize them (0600 for files and 0700 for directories). Convert mkdir() calls in the backend to MakePGDirectory() if the original call used default permissions (always the case for regular PG directories). Add tests to make sure permissions in PGDATA are set correctly by the tools which modify the PG data directory. Authors: David Steele <david@pgmasters.net>, Adam Brightwell <adam.brightwell@crunchydata.com> Reviewed-By: Michael Paquier, with discussion amongst many others. Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Clean up assorted messiness around AllocateDir() usage.Tom Lane2017-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a couple of low-probability bugs that could lead to reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE) concerning directory-open or file-open failures. It also fixes places where we took shortcuts in reporting such errors, either by using elog instead of ereport or by using ereport but forgetting to specify an errcode. And it eliminates a lot of just plain redundant error-handling code. In service of all this, export fd.c's formerly-static function ReadDirExtended, so that external callers can make use of the coding pattern dir = AllocateDir(path); while ((de = ReadDirExtended(dir, path, LOG)) != NULL) if they'd like to treat directory-open failures as mere LOG conditions rather than errors. Also fix FreeDir to be a no-op if we reach it with dir == NULL, as such a coding pattern would cause. Then, remove code at many call sites that was throwing an error or log message for AllocateDir failure, as ReadDir or ReadDirExtended can handle that job just fine. Aside from being a net code savings, this gets rid of a lot of not-quite-up-to-snuff reports, as mentioned above. (In some places these changes result in replacing a custom error message such as "could not open tablespace directory" with more generic wording "could not open directory", but it was agreed that the custom wording buys little as long as we report the directory name.) In some other call sites where we can't just remove code, change the error reports to be fully project-style-compliant. Also reorder code in restoreTwoPhaseData that was acquiring a lock between AllocateDir and ReadDir; in the unlikely but surely not impossible case that LWLockAcquire changes errno, AllocateDir failures would be misreported. There is no great value in opening the directory before acquiring TwoPhaseStateLock, so just do it in the other order. Also fix CheckXLogRemoved to guarantee that it preserves errno, as quite a number of call sites are implicitly assuming. (Again, it's unlikely but I think not impossible that errno could change during a SpinLockAcquire. If so, this function was broken for its own purposes as well as breaking callers.) And change a few places that were using not-per-project-style messages, such as "could not read directory" when "could not open directory" is more correct. Back-patch the exporting of ReadDirExtended, in case we have occasion to back-patch some fix that makes use of it; it's not needed right now but surely making it global is pretty harmless. Also back-patch the restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is essentially cosmetic and need not get back-patched. Michael Paquier, with a bit of additional work by me Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
* Increase distance between flush requests during bulk file copies.Tom Lane2017-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | copy_file() reads and writes data 64KB at a time (with default BLCKSZ), and historically has issued a pg_flush_data request after each write. This turns out to interact really badly with macOS's new APFS file system: a large file copy takes over 100X longer than it ought to on APFS, as reported by Brent Dearth. While that's arguably a macOS bug, it's not clear whether Apple will do anything about it in the near future, and in any case experimentation suggests that issuing flushes a bit less often can be helpful on other platforms too. Hence, rearrange the logic in copy_file() so that flush requests are issued once per N writes rather than every time through the loop. I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still results in a noticeable speed degradation on APFS), but 1MB elsewhere. In limited testing on Linux and FreeBSD, this seems slightly faster than the previous code, and certainly no worse. It helps noticeably on macOS even with the older HFS filesystem. A simpler change would have been to just increase the size of the copy buffer without changing the loop logic, but that seems likely to trash the processor cache without really helping much. Back-patch to 9.6 where we introduced msync() as an implementation option for pg_flush_data(). The problem seems specific to APFS's mmap/msync support, so I don't think we need to go further back. Discussion: https://postgr.es/m/CADkxhTNv-j2jw2g8H57deMeAbfRgYBoLmVuXkC=YCFBXRuCOww@mail.gmail.com
* Refactor new file permission handlingPeter Eisentraut2017-09-23
| | | | | | | | | | | | | | | | | | | The file handling functions from fd.c were called with a diverse mix of notations for the file permissions when they were opening new files. Almost all files created by the server should have the same permissions set. So change the API so that e.g. OpenTransientFile() automatically uses the standard permissions set, and OpenTransientFilePerm() is a new function that takes an explicit permissions set for the few cases where it is needed. This also saves an unnecessary argument for call sites that are just opening an existing file. While we're reviewing these APIs, get rid of the FileName typedef and use the standard const char * for the file name and mode_t for the file mode. This makes these functions match other file handling functions and removes an unnecessary layer of mysteriousness. We can also get rid of a few casts that way. Author: David Steele <david@pgmasters.net>
* Fix new warnings from GCC 7Peter Eisentraut2017-04-17
| | | | | This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
* Create and use wait events for read, write, and fsync operations.Robert Haas2017-03-18
| | | | | | | | | | | | | Previous commits, notably 53be0b1add7064ca5db3cd884302dfc3268d884e and 6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa, made it possible to see from pg_stat_activity when a backend was stuck waiting for another backend, but it's also fairly common for a backend to be stuck waiting for an I/O. Add wait events for those operations, too. Rushabh Lathia, with further hacking by me. Reviewed and tested by Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed. Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Allow to trigger kernel writeback after a configurable number of writes.Andres Freund2016-03-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed. On linux it is possible to control this by reducing the global dirty limits significantly, reducing the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always generally want this behavior, e.g. for temporary files it's undesirable. Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2), several posix systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache. Thus the feature is enabled by default only on linux, but can be enabled on all systems that have any of the above APIs. While desirable and likely possible this patch does not contain an implementation for windows. With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes controlled by a separate GUC, checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively; they're separate because the number of flushes that are good are separate, and because the performance considerations of controlled flushing for each of these are different. A later patch will add checkpoint sorting - after that flushes from the ckeckpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences. This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so has sorting without flush control. Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Expose fsync_fname as a public API.Robert Haas2013-09-04
| | | | Andres Freund
* Create libpgcommon, and move pg_malloc et al to itAlvaro Herrera2013-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | libpgcommon is a new static library to allow sharing code among the various frontend programs and backend; this lets us eliminate duplicate implementations of common routines. We avoid libpgport, because that's intended as a place for porting issues; per discussion, it seems better to keep them separate. The first use case, and the only implemented by this patch, is pg_malloc and friends, which many frontend programs were already using. At the same time, we can use this to provide palloc emulation functions for the frontend; this way, some palloc-using files in the backend can also be used by the frontend cleanly. To do this, we change palloc() in the backend to be a function instead of a macro on top of MemoryContextAlloc(). This was previously believed to cause loss of performance, but this implementation has been tweaked by Tom and Andres so that on modern compilers it provides a slight improvement over the previous one. This lets us clean up some places that were already with localized hacks. Most of the pg_malloc/palloc changes in this patch were authored by Andres Freund. Zoltán Böszörményi also independently provided a form of that. libpgcommon infrastructure was authored by Álvaro.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Add OpenTransientFile, with automatic cleanup at end-of-xact.Heikki Linnakangas2012-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | Files opened with BasicOpenFile or PathNameOpenFile are not automatically cleaned up on error. That puts unnecessary burden on callers that only want to keep the file open for a short time. There is AllocateFile, but that returns a buffered FILE * stream, which in many cases is not the nicest API to work with. So add function called OpenTransientFile, which returns a unbuffered fd that's cleaned up like the FILE* returned by AllocateFile(). This plugs a few rare fd leaks in error cases: 1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't use OpenTransientFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenTransientFile instead of PathNameOpenFile. In addition to plugging those leaks, this replaces many BasicOpenFile() calls with OpenTransientFile() that were not leaking, because the code meticulously closed the file on error. That wasn't strictly necessary, but IMHO it's good for robustness. The same leaks exist in older versions, but given the rarity of the issues, I'm not backpatching this. Not yet, anyway - it might be good to backpatch later, after this mechanism has had some more testing in master branch.
* Improve copydir() code for the case that fsync is off.Tom Lane2012-07-21
| | | | | | | | | | We should avoid calling sync_file_range or posix_fadvise in this case, since (a) we don't really care if the data gets synced, and might as well save the kernel calls; (b) at least on Linux we know that the kernel might block us until it's scheduled the write. Also, avoid making a useless second traversal of the directory tree if we're not actually going to call fsync(2) after all.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Support unlogged tables.Robert Haas2010-12-29
| | | | | | | The contents of an unlogged table are WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
* Use symbolic names not octal constants for file permission flags.Tom Lane2010-12-10
| | | | | | | | Purely cosmetic patch to make our coding standards more consistent --- we were doing symbolic some places and octal other places. This patch fixes all C-coded uses of mkdir, chmod, and umask. There might be some other calls I missed. Inconsistency noted while researching tablespace directory permissions issue.
* Move copydir() prototype into its own header file.Robert Haas2010-11-12
| | | | | | | Having this in src/include/port.h makes no sense, now that copydir.c lives in src/backend/strorage rather than src/port. Along the way, remove an obsolete comment from contrib/pg_upgrade that makes reference to the old location.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* pgindent run for 9.0, second runBruce Momjian2010-07-06
|
* Move copydir.c from src/port to src/backend/storage/fileRobert Haas2010-07-02
The previous commit to make copydir() interruptible prevented postgres.exe from linking on MinGW and Cygwin, because on those platforms libpgport_srv.a can't freely reference symbols defined by the backend. Since that code is already backend-specific anyway, just move the whole file into the backend rather than adding further kludges to deal with the symbols needed by CHECK_FOR_INTERRUPTS(). This probably needs some further cleanup, but this commit just moves the file as-is, which should hopefully be enough to turn the buildfarm green again.