aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication/basebackup.c
Commit message (Collapse)AuthorAge
* Move basebackup code to new directory src/backend/backupRobert Haas2022-08-10
| | | | | | Reviewed by David Steele and Justin Pryzby Discussion: http://postgr.es/m/CA+TgmoafqboATDSoXHz8VLrSwK_MDhjthK4hEpYjqf9_1Fmczw%40mail.gmail.com
* Replace pgwin32_is_junction() with lstat().Thomas Munro2022-08-06
| | | | | | | | | | | | | | | | | | | Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and unlink() can unlink them, there is no need for conditional code for Windows in a few places. That was expressed by testing for WIN32 or S_ISLNK, which we can now constant-fold. The coding around pgwin32_is_junction() was a bit suspect anyway, as we never checked for errors, and we also know that errors can be spuriously reported because of transient sharing violations on this OS. The lstat()-based code has handling for that. This also reverts 4fc6b6ee on master only. That was done because lstat() didn't previously work for symlinks (junction points), but now it does. Tested-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
* Remove dead pread and pwrite replacement code.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | | | | | pread() and pwrite() are in SUSv2, and all targeted Unix systems have them. Previously, we defined pg_pread and pg_pwrite to emulate these function with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes. Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h. No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Remove configure probes for symlink/readlink, and dead code.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | symlink() and readlink() are in SUSv2 and all targeted Unix systems have them. We have partial emulation on Windows. Code that raised runtime errors on systems without it has been dead for years, so we can remove that and also references to such systems in the documentation. Define HAVE_READLINK and HAVE_SYMLINK macros on Unix. Our Windows replacement functions based on junction points can't be used for relative paths or for non-directories, so the macros can be used to check for full symlink support. The places that deal with tablespaces can just use symlink functions without checking the macros. (If they did check the macros, they'd need to provide an #else branch with a runtime or compile time error, and it'd be dead code.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Clean up some residual confusion between OIDs and RelFileNumbers.Robert Haas2022-07-28
| | | | | | | | | | | | Commit b0a55e43299c4ea2a9a8c757f9c26352407d0ccc missed a few places where we are referring to the number used as a part of the relation filename as an "OID". We now want to call that a "RelFileNumber". Some of these places actually made it sound like the OID in question is pg_class.oid rather than pg_class.relfilenode, which is especially good to clean up. Dilip Kumar with some editing by me.
* Prevent BASE_BACKUP in the middle of another backup in the same session.Fujii Masao2022-07-20
| | | | | | | | | | | | | | | | | | | | | | | | Multiple non-exclusive backups are able to be run conrrently in different sessions. But, in the same session, only one non-exclusive backup can be run at the same moment. If pg_backup_start (pg_start_backup in v14 or before) is called in the middle of another non-exclusive backup in the same session, an error is thrown. However, previously, in logical replication walsender mode, even if that walsender session had already called pg_backup_start and started a non-exclusive backup, it could execute BASE_BACKUP command and start another non-exclusive backup. Which caused subsequent pg_backup_stop to throw an error because BASE_BACKUP unexpectedly reset the session state marked by pg_backup_start. This commit prevents BASE_BACKUP command in the middle of another non-exclusive backup in the same session. Back-patch to all supported branches. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
* Fix code comments still referring to pg_start/stop_backup()Michael Paquier2022-07-01
| | | | | | | | | pg_start_backup() and pg_stop_backup() have been respectively renamed to pg_backup_start() and pg_backup_stop() as of 39969e2, but a few comments did not get the call. Reviewed-by: Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/YrqGlj1+4DF3dbZ/@paquier.xyz
* Remove duplicated word in comment of basebackup.cMichael Paquier2022-04-20
| | | | | | | Oversight in 39969e2. Author: Martín Marqués Discussion: https://postgr.es/m/CABeG9LviA01oHC5h=ksLUuhMyXxmZR_tftRq6q3341CMT=j=4g@mail.gmail.com
* Rename backup_compression.{c,h} to compression.{c,h}Michael Paquier2022-04-12
| | | | | | | | | | | | | | | | | | Compression option handling (level, algorithm or even workers) can be used across several parts of the system and not only base backups. Structures, objects and routines are renamed in consequence, to remove the concept of base backups from this part of the code making this change straight-forward. pg_receivewal, that has gained support for LZ4 since babbbb5, will make use of this infrastructure for its set of compression options, bringing more consistency with pg_basebackup. This cleanup needs to be done before releasing a beta of 15. pg_dump is a potential future target, as well, and adding more compression options to it may happen in 16~. Author: Michael Paquier Reviewed-by: Robert Haas, Georgios Kokolatos Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
* pgstat: remove stats_temp_directory.Andres Freund2022-04-06
| | | | | | | | | | | | | | | | With stats now being stored in shared memory, the GUC isn't needed anymore. However, the pg_stat_tmp directory and PG_STAT_TMP_DIR define are kept, as pg_stat_statements (and some out-of-core extensions) store data in it. Docs will be updated in a subsequent commit, together with the other pending docs updates due to shared memory stats. Author: Andres Freund <andres@anarazel.de> Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220330233550.eiwsbearu6xhuqwe@alap3.anarazel.de Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
* pgstat: stats collector references in comments.Andres Freund2022-04-06
| | | | | | | | | | | | | | | | | | Soon the stats collector will be no more, with statistics instead getting stored in shared memory. There are a lot of references to the stats collector in comments. This commit replaces most of these references with "cumulative statistics system", with the remaining ones getting replaced as part of subsequent commits. This is done separately from the - quite large - shared memory statistics patch to make review easier. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
* Remove exclusive backup modeStephen Frost2022-04-06
| | | | | | | | | | | | | | | | | | | | | | Exclusive-mode backups have been deprecated since 9.6 (when non-exclusive backups were introduced) due to the issues they can cause should the system crash while one is running and generally because non-exclusive provides a much better interface. Further, exclusive backup mode wasn't really being tested (nor was most of the related code- like being able to log in just to stop an exclusive backup and the bits of the state machine related to that) and having to possibly deal with an exclusive backup and the backup_label file existing during pg_basebackup, pg_rewind, etc, added other complexities that we are better off without. This patch removes the exclusive backup mode, the various special cases for dealing with it, and greatly simplifies the online backup code and documentation. Authors: David Steele, Nathan Bossart Reviewed-by: Chapman Flack Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net https://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com
* Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL.Robert Haas2022-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are more compression parameters that can be specified than just an integer compression level, so rename the new COMPRESSION_LEVEL option to COMPRESSION_DETAIL before it gets released. Introduce a flexible syntax for that option to allow arbitrary options to be specified without needing to adjust the main replication grammar, and common code to parse it that is shared between the client and the server. This commit doesn't actually add any new compression parameters, so the only user-visible change is that you can now type something like pg_basebackup --compress gzip:level=5 instead of writing just pg_basebackup --compress gzip:5. However, it should make it easy to add new options. If for example gzip starts offering fries, we can support pg_basebackup --compress gzip:level=5,fries=true for the benefit of users who want fries with that. Along the way, this fixes a few things in pg_basebackup so that the pg_basebackup can be used with a server-side compression algorithm that pg_basebackup itself does not understand. For example, pg_basebackup --compress server-lz4 could still succeed even if only the server and not the client has LZ4 support, provided that the other options to pg_basebackup don't require the client to decompress the archive. Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker. Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
* Allow extensions to add new backup targets.Robert Haas2022-03-15
| | | | | | | | | | | | Commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc allowed for base backup targets, meaning that we could do something with the backup other than send it to the client, but all of those targets had to be baked in to the core code. This commit makes it possible for extensions to define additional backup targets. Patch by me, reviewed by Abhijit Menon-Sen. Discussion: http://postgr.es/m/CA+TgmoaqvdT-u3nt+_kkZ7bgDAyqDB0i-+XOMmr5JN2Rd37hxw@mail.gmail.com
* Add support for zstd base backup compression.Robert Haas2022-03-08
| | | | | | | | | | | Both client-side compression and server-side compression are now supported for zstd. In addition, a backup compressed by the server using zstd can now be decompressed by the client in order to accommodate the use of -Fp. Jeevan Ladhe, with some edits by me. Discussion: http://postgr.es/m/CA+Tgmobyzfbz=gyze2_LL1ZumZunmaEKbHQxjrFkOR7APZGu-g@mail.gmail.com
* Add suport for server-side LZ4 base backup compression.Robert Haas2022-02-11
| | | | | | | | | | | | | LZ4 compression can be a lot faster than gzip compression, so users may prefer it even if the compression ratio is not as good. We will want pg_basebackup to support LZ4 compression and decompression on the client side as well, and there is a pending patch for that, but it's by a different author, so I am committing this part separately for that reason. Jeevan Ladhe, reviewed by Tushar Ahuja and by me. Discussion: http://postgr.es/m/CANm22Cg9cArXEaYgHVZhCnzPLfqXCZLAzjwTq7Fc0quXRPfbxA@mail.gmail.com
* Remove server support for the previous base backup protocol.Robert Haas2022-02-10
| | | | | | | | | | Commit cc333f32336f5146b75190f57ef587dff225f565 added a new COPY sub-protocol for taking base backups, but retained support for the previous protocol. For the same reasons articulated in the message for commit 9cd28c2e5f11dfeef64a14035b82e70acead65fd, remove support for the previous protocol from the server. Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com
* Server-side gzip compression.Robert Haas2022-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_basebackup's --compression option now lets you write either "client-gzip" or "server-gzip" instead of just "gzip" to specify where the compression should be performed. If you write simply "gzip" it's taken to mean "client-gzip" unless you also use --target, in which case it is interpreted to mean "server-gzip", because that's the only thing that makes any sense in that case. To make this work, the BASE_BACKUP command now takes new COMPRESSION and COMPRESSION_LEVEL options. At present, pg_basebackup cannot decompress .gz files, so server-side compression will cause a failure if (1) -Ft is not used or (2) -R is used or (3) -D- is used without --no-manifest. Along the way, I removed the information message added by commit 5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you specified no compression level and told you that the default level had been used instead. That seemed like more output than most people would want. Also along the way, this adds a check to the server for unrecognized base backup options. This repairs a bug introduced by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454. This commit also adds some new test cases for pg_verifybackup. They take a server-side backup with and without compression, and then extract the backup if we have the OS facilities available to do so, and then run pg_verifybackup on the extracted directory. That is a good test of the functionality added by this commit and also improves test coverage for the backup target patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for pg_verifybackup itself. Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which this is a part has also had review and/or testing from Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
* Support base backup targets.Robert Haas2022-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied, it is sent to the server as the value of the TARGET option to the BASE_BACKUP command. If DETAIL is included, it is sent as the value of the new TARGET_DETAIL option to the BASE_BACKUP command. If the target is anything other than 'client', pg_basebackup assumes that it will now be the server's job to write the backup in a location somehow defined by the target, and that it therefore needs to write nothing locally. However, the server will still send messages to the client for progress reporting purposes. On the server side, we now support two additional types of backup targets. There is a 'blackhole' target, which just throws away the backup data without doing anything at all with it. Naturally, this should only be used for testing and debugging purposes, since you will not actually have a backup when it finishes running. More usefully, there is also a 'server' target, so you can now use something like 'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some location on the server. We can extend this to more types of targets in the future, and might even want to create an extensibility mechanism for adding new target types. Since WAL fetching is handled with separate client-side logic, it's not part of this mechanism; thus, backups with non-default targets must use -Xnone or -Xfetch. Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which this is a part has also had review and/or testing from Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
* Modify pg_basebackup to use a new COPY subprotocol for base backups.Robert Haas2022-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the new approach, all files across all tablespaces are sent in a single COPY OUT operation. The CopyData messages are no longer raw archive content; rather, each message is prefixed with a type byte that describes its purpose, e.g. 'n' signifies the start of a new archive and 'd' signifies archive or manifest data. This protocol is significantly more extensible than the old approach, since we can later create more message types, though not without concern for backward compatibility. The new protocol sends a few things to the client that the old one did not. First, it sends the name of each archive explicitly, instead of letting the client compute it. This is intended to make it easier to write future patches that might send archives in a format other that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress messages rather than allowing the client to assume that progress is defined by the number of bytes received. This will help with future features where the server compresses the data, or sends it someplace directly rather than transmitting it to the client. The old protocol is still supported for compatibility with previous releases. The new protocol is selected by means of a new TARGET option to the BASE_BACKUP command. Currently, the only supported target is 'client'. Support for additional targets will be added in a later commit. Patch by me. The patch set of which this is a part has had review and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Fix thinko in assertion in basebackup.c.Robert Haas2021-11-10
| | | | | | | Commit 5a1007a5088cd6ddf892f7422ea8dbaef362372f tried to introduce an assertion that the block size was at least twice the size of a tar block, but I got the math wrong. My error was reported to me off-list.
* Have the server properly terminate tar archives.Robert Haas2021-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | Earlier versions of PostgreSQL featured a version of pg_basebackup that wanted to edit tar archives but was too dumb to parse them properly. The server made things easier for the client by failing to add the two blocks of zero bytes that ought to end a tar file, leaving it up to the client to do that. But since commit 23a1c6578c87fca0e361c4f5f9a07df5ae1f9858, we don't need this hack any more, because pg_basebackup is now smarter and can parse tar files even if they are properly terminated! So change the server to always properly terminate the tar files. Older versions of pg_basebackup can't talk to new servers anyway, so there's no compatibility break. On the pg_basebackup side, we see still need to add the terminating zero bytes if we're talking to an older server, but not when the server is v15+. Hopefully at some point we'll be able to remove some of this compatibility cruft, but it seems best to hang on to it for now. In passing, add a file header comment to bbstreamer_tar.c, to make it clearer what's going on here. Discussion: http://postgr.es/m/CA+TgmoZbNzsWwM4BE5Jb_qHncY817DYZwGf+2-7hkMQ27ZwsMQ@mail.gmail.com
* Introduce 'bbsink' abstraction to modularize base backup code.Robert Haas2021-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The base backup code has accumulated a healthy number of new features over the years, but it's becoming increasingly difficult to maintain and further enhance that code because there's no real separation of concerns. For example, the code that understands knows the details of how we send data to the client using the libpq protocol is scattered throughout basebackup.c, rather than being centralized in one place. To try to improve this situation, introduce a new 'bbsink' object which acts as a recipient for archives generated during the base backup progress and also for the backup manifest. This commit introduces three types of bbsink: a 'copytblspc' bbsink forwards the backup to the client using one COPY OUT operation per tablespace and another for the manifest, a 'progress' bbsink performs command progress reporting, and a 'throttle' bbsink performs rate-limiting. The 'progress' and 'throttle' bbsink types also forward the data to a successor bbsink; at present, the last bbsink in the chain will always be of type 'copytblspc'. There are plans to add more types of 'bbsink' in future commits. This abstraction is a bit leaky in the case of progress reporting, but this still seems cleaner than what we had before. Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, and Jeevan Ladhe. Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
* When fetching WAL for a basebackup, report errors with a sensible TLI.Robert Haas2021-10-29
| | | | | | | | | | | | | | | | | | | | The previous code used ThisTimeLineID, which need not even be initialized here, although it usually was in practice, because pg_basebackup issues IDENTIFY_SYSTEM before calling BASE_BACKUP, and that initializes ThisTimeLineID as a side effect. That's not really good enough, though, not only because we shoudn't be counting on side effects like that, but also because the TLI could change meanwhile. Fortunately, we have convenient access to more meaningful TLI values, so use those instead. Because of the way this logic is coded, the consequences of using a possibly-incorrect TLI here are no worse than a slightly confusing error message, I don't want to take any risk here, so no back-patch at least for now. Patch by me, reviewed by Kyotaro Horiguchi and Michael Paquier Discussion: http://postgr.es/m/CA+TgmoZRNWGWYDX9RgTXMG6_nwSdB=PB-PPRUbvMUTGfmL2sHQ@mail.gmail.com
* Refactor basebackup.c's _tarWriteDir() function.Robert Haas2021-10-12
| | | | | | | | | | | | | | | Sometimes, we replace a symbolic link that we find in the data directory with an actual directory within the tarfile that we create. _tarWriteDir was responsible both for making this substitution and also for writing the tar header for the resulting directory into the tar file. Make it do only the first of those things, and rename to convert_link_to_directory. Substantially larger refactoring of this source file is planned, but this little bit seemed to make sense to commit independently. Discussion: http://postgr.es/m/CA+Tgmobz6tuv5tr-WxURe5JA1vVcGz85k4kkvoWxcyHvDpEqFA@mail.gmail.com
* Flexible options for BASE_BACKUP.Robert Haas2021-10-05
| | | | | | | | | | | | | | | | | | | | | | | | Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's hard to extend. Instead, adopt the same kind of syntax we've used for SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's not necessary for all of the option names to be parser keywords. In the new syntax, most of the options now take an optional Boolean argument. To match our practice in other in places, the options which the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the new syntax called WAIT and VERIFY_CHECKUMS, and the default value is false. In the new syntax, the FAST option has been replaced by a CHECKSUM option whose value may be 'fast' or 'spread'. This commit does not remove support for the old syntax. It just adds the new one as an additional option, and makes pg_basebackup prefer the new syntax when the server is new enough to support it. Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov, Fujii Masao, and Tushar Ahuja. Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Use correct format placeholder for block numbersPeter Eisentraut2021-04-17
| | | | Should be %u rather than %d.
* Code review for server's handling of "tablespace map" files.Tom Lane2021-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.
* Simplify printing of LSNsPeter Eisentraut2021-02-23
| | | | | | | | | | Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Revert "Add key management system" (978f869b99) & later commitsBruce Momjian2020-12-27
| | | | | | | | | | The patch needs test cases, reorganization, and cfbot testing. Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive) and 08db7c63f3..ccbe34139b. Reported-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
* Add key management systemBruce Momjian2020-12-25
| | | | | | | | | | | | | | | This adds a key management system that stores (currently) two data encryption keys of length 128, 192, or 256 bits. The data keys are AES256 encrypted using a key encryption key, and validated via GCM cipher mode. A command to obtain the key encryption key must be specified at initdb time, and will be run at every database server start. New parameters allow a file descriptor open to the terminal to be passed. pg_upgrade support has also been added. Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us Author: Masahiko Sawada, me, Stephen Frost
* Change SHA2 implementation based on OpenSSL to use EVP digest routinesMichael Paquier2020-12-04
| | | | | | | | | | | | | | | | | | | | | | | | The use of low-level hash routines is not recommended by upstream OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67. This takes advantage of the refactoring done in 87ae969 that has introduced the allocation and free routines for cryptographic hashes. Since 1.1.0, OpenSSL does not publish the contents of the cryptohash contexts, forcing any consumers to rely on OpenSSL for all allocations. Hence, the resource owner callback mechanism gains a new set of routines to track and free cryptohash contexts when using OpenSSL, preventing any risks of leaks in the backend. Nothing is needed in the frontend thanks to the refactoring of 87ae969, and the resowner knowledge is isolated into cryptohash_openssl.c. Note that this also fixes a failure with SCRAM authentication when using FIPS in OpenSSL, but as there have been few complaints about this problem and as this causes an ABI breakage, no backpatch is done. Author: Michael Paquier Reviewed-by: Daniel Gustafsson, Heikki Linnakangas Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
* Move SHA2 routines to a new generic API layer for crypto hashesMichael Paquier2020-12-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two new routines to allocate a hash context and to free it are created, as these become necessary for the goal behind this refactoring: switch the all cryptohash implementations for OpenSSL to use EVP (for FIPS and also because upstream does not recommend the use of low-level cryptohash functions for 20 years). Note that OpenSSL hides the internals of cryptohash contexts since 1.1.0, so it is necessary to leave the allocation to OpenSSL itself, explaining the need for those two new routines. This part is going to require more work to properly track hash contexts with resource owners, but this not introduced here. Still, this refactoring makes the move possible. This reduces the number of routines for all SHA2 implementations from twelve (SHA{224,256,386,512} with init, update and final calls) to five (create, free, init, update and final calls) by incorporating the hash type directly into the hash context data. The new cryptohash routines are moved to a new file, called cryptohash.c for the fallback implementations, with SHA2 specifics becoming a part internal to src/common/. OpenSSL specifics are part of cryptohash_openssl.c. This infrastructure is usable for more hash types, like MD5 or HMAC. Any code paths using the internal SHA2 routines are adapted to report correctly errors, which are most of the changes of this commit. The zones mostly impacted are checksum manifests, libpq and SCRAM. Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it lacked the refactoring needed for libpq, as done here. This patch has been tested on Linux and Windows, with and without OpenSSL, and down to 1.0.1, the oldest version supported on HEAD. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
* Message fixes and style improvementsPeter Eisentraut2020-09-14
|
* code: replace 'master' with 'primary' where appropriate.Andres Freund2020-07-08
| | | | | | | | | Also changed "in the primary" to "on the primary", and added a few "the" before "primary". Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
* Improve server code to read files as part of a base backup.Robert Haas2020-06-17
| | | | | | | | | | | | | | Don't use fread(), since that doesn't necessarily set errno. We could use read() instead, but it's even better to use pg_pread(), which allows us to avoid some extra calls to seek to the desired location in the file. Also, advertise a wait event while reading from a file, as we do for most other places where we're reading data from files. Patch by me, reviewed by Hamid Akhtar. Discussion: http://postgr.es/m/CA+TgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA@mail.gmail.com
* Minor code cleanup for perform_base_backup().Robert Haas2020-06-17
| | | | | | | | | | | | Merge two calls to sendDir() that are exactly the same except for the fifth argument. Adjust comments to match. Also, don't bother checking whether tblspc_map_file is NULL. We initialize it in all cases, so it can't be. Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi. Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
* Don't export basebackup.c's sendTablespace().Robert Haas2020-06-17
| | | | | | | | | | | | | Commit 72d422a5227ef6f76f412486a395aba9f53bf3f0 made xlog.c call sendTablespace() with the 'sizeonly' argument set to true, which required basebackup.c to export sendTablespace(). However, that's kind of ugly, so instead defer the call to sendTablespace() until basebackup.c regains control. That way, it can still be a static function. Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi. Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
* Assorted cleanup of tar-related code.Robert Haas2020-06-15
| | | | | | | | | | | | | Introduce TAR_BLOCK_SIZE and replace many instances of 512 with the new constant. Introduce function tarPaddingBytesRequired and use it to replace numerous repetitions of (x + 511) & ~511. Add preprocessor guards against multiple inclusion to pgtar.h. Reformat the prototype for tarCreateHeader so it doesn't extend beyond 80 characters. Discussion: http://postgr.es/m/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com
* Rename SLRU structures and associated LWLocks.Tom Lane2020-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | Originally, the names assigned to SLRUs had no purpose other than being shmem lookup keys, so not a lot of thought went into them. As of v13, though, we're exposing them in the pg_stat_slru view and the pg_stat_reset_slru function, so it seems advisable to take a bit more care. Rename them to names based on the associated on-disk storage directories (which fortunately we *did* think about, to some extent; since those are also visible to DBAs, consistency seems like a good thing). Also rename the associated LWLocks, since those names are likewise user-exposed now as wait event names. For the most part I only touched symbols used in the respective modules' SimpleLruInit() calls, not the names of other related objects. This renaming could have been taken further, and maybe someday we will do so. But for now it seems undesirable to change the names of any globally visible functions or structs, so some inconsistency is unavoidable. (But I *did* terminate "oldserxid" with prejudice, as I found that name both unreadable and not descriptive of the SLRU's contents.) Table 27.12 needs re-alphabetization now, but I'll leave that till after the other LWLock renamings I have in mind. Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
* Rename exposed identifiers to say "backup manifest".Robert Haas2020-04-23
| | | | | | | | | | Function names declared "extern" now use BackupManifest in the name rather than just Manifest, and data types use backup_manifest rather than just manifest. Per note from Michael Paquier. Discussion: http://postgr.es/m/20200418125713.GG350229@paquier.xyz
* Move the server's backup manifest code to a separate file.Robert Haas2020-04-20
| | | | | | | | basebackup.c is already a pretty big and complicated file, so it makes more sense to keep the backup manifest support routines in a separate file, for clarity and ease of maintenance. Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com
* Fix collection of typos and grammar mistakes in the treeMichael Paquier2020-04-10
| | | | | | | This fixes some comments and documentation new as of Postgres 13. Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
* Exclude backup_manifest file that existed in database, from BASE_BACKUP.Fujii Masao2020-04-09
| | | | | | | | | | | | | If there is already a backup_manifest file in the database cluster, it belongs to the past backup that was used to start this server. It is not correct for the backup being taken now. So this commit changes pg_basebackup so that it always skips such backup_manifest file. The backup_manifest file for the current backup will be injected separately if users want it. Author: Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/78f76a3d-1a28-a97d-0394-5c96985dd1c0@oss.nttdata.com
* Be more careful about time_t vs. pg_time_t in basebackup.c.Robert Haas2020-04-03
| | | | | | | | | | | | | | | lapwing is complaining that about a call to pg_gmtime, saying that it "expected 'const pg_time_t *' but argument is of type 'time_t *'". I at first thought that the problem had someting to do with const, but Thomas Munro suggested that it might be just because time_t and pg_time_t are different identifers. lapwing is i686 rather than x86_64, and pg_time_t is always int64, so that seems like a good guess. There is other code that just casts time_t to pg_time_t without any conversion function, so try that approach here. Introduced in commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661.
* Generate backup manifests for base backups, and validate them.Robert Haas2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A manifest is a JSON document which includes (1) the file name, size, last modification time, and an optional checksum for each file backed up, (2) timelines and LSNs for whatever WAL will need to be replayed to make the backup consistent, and (3) a checksum for the manifest itself. By default, we use CRC-32C when checksumming data files, because we are trying to detect corruption and user error, not foil an adversary. However, pg_basebackup and the server-side BASE_BACKUP command now have options to select a different algorithm, so users wanting a cryptographic hash function can select SHA-224, SHA-256, SHA-384, or SHA-512. Users not wanting file checksums at all can disable them, or disable generating of the backup manifest altogether. Using a cryptographic hash function in place of CRC-32C consumes significantly more CPU cycles, which may slow down backups in some cases. A new tool called pg_validatebackup can validate a backup against the manifest. If no checksums are present, it can still check that the right files exist and that they have the expected sizes. If checksums are present, it can also verify that each file has the expected checksum. Additionally, it calls pg_waldump to verify that the expected WAL files are present and parseable. Only plain format backups can be validated directly, but tar format backups can be validated after extracting them. Robert Haas, with help, ideas, review, and testing from David Steele, Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan Chalke, Amit Kapila, Andres Freund, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
* Report NULL as total backup size if it's not estimated.Fujii Masao2020-03-24
| | | | | | | | | | | | | | Previously 0 was reported in pg_stat_progress_basebackup.total_backup if the total backup size was not estimated. Per discussion, our consensus is that NULL is better choise as the value in total_backup in that case. So this commit makes pg_stat_progress_basebackup view report NULL in total_backup column if the estimation is disabled. Bump catversion. Author: Fujii Masao Reviewed-by: Amit Langote, Magnus Hagander, Alvaro Herrera Discussion: https://postgr.es/m/CABUevExnhOD89zBDuPvfAAh243RzNpwCPEWNLtMYpKHMB8gbAQ@mail.gmail.com