path: root/src
...
* Update comment in heapgetpage() regarding PD_ALL_VISIBLE vs. Hot Standby. (Robert Haas, 2012-12-14)

  Pavan Deolasee, slightly modified by me

* NLS: Use msgmerge --previous option (Peter Eisentraut, 2012-12-13)

  It provides some additional help to translators.

* Allow a streaming replication standby to follow a timeline switch. (Heikki Linnakangas, 2012-12-13)

  Before this patch, streaming replication would refuse to start replicating
  if the timeline in the primary doesn't exactly match the standby. The
  situation where it doesn't match is when you have a master, and two
  standbys, and you promote one of the standbys to become new master.
  Promoting bumps up the timeline ID, and after that bump, the other standby
  would refuse to continue.

  There's significantly more timeline related logic in streaming replication
  now. First of all, when a standby connects to primary, it will ask the
  primary for any timeline history files that are missing from the standby.
  The missing files are sent using a new replication command
  TIMELINE_HISTORY, and stored in standby's pg_xlog directory. Using the
  timeline history files, the standby can follow the latest timeline present
  in the primary (recovery_target_timeline='latest'), just as it can follow
  new timelines appearing in an archive directory.

  START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
  timeline to stream WAL from. This allows the standby to request the
  primary to send over WAL that precedes the promotion. The replication
  protocol is changed slightly (in a backwards-compatible way although
  there's little hope of streaming replication working across major versions
  anyway), to allow replication to stop when the end of timeline is reached,
  putting the walsender back into accepting a replication command.

  Many thanks to Amit Kapila for testing and reviewing various versions of
  this patch.

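  The new commands can be exercised from any libpq client over a replication
  connection; a minimal sketch (the host "primary" and role "repl" are
  placeholders, not part of the commit):

      #include <stdio.h>
      #include <libpq-fe.h>

      int
      main(void)
      {
          /* "replication=true" opens a walsender connection, which is
           * where TIMELINE_HISTORY and START_REPLICATION are accepted. */
          PGconn     *conn = PQconnectdb("host=primary user=repl replication=true");
          PGresult   *res;

          if (PQstatus(conn) != CONNECTION_OK)
          {
              fprintf(stderr, "%s", PQerrorMessage(conn));
              PQfinish(conn);
              return 1;
          }

          /* Fetch a timeline history file missing on the standby. */
          res = PQexec(conn, "TIMELINE_HISTORY 2");
          if (PQresultStatus(res) == PGRES_TUPLES_OK)
              printf("got history file: %s\n", PQgetvalue(res, 0, 0));
          PQclear(res);

          /* Stream WAL from an explicitly named timeline. */
          res = PQexec(conn, "START_REPLICATION 0/3000000 TIMELINE 2");
          if (PQresultStatus(res) == PGRES_COPY_BOTH)
          {
              /* ... consume WAL with PQgetCopyData(conn, &buf, 0) ... */
          }
          PQclear(res);

          PQfinish(conn);
          return 0;
      }
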
* Make xlog_internal.h includable in frontend context. (Heikki Linnakangas, 2012-12-13)

  This makes unnecessary the ugly hack used to #include postgres.h in
  pg_basebackup.

  Based on Alvaro Herrera's patch

* In multi-insert, don't go into infinite loop on a huge tuple and fillfactor. (Heikki Linnakangas, 2012-12-12)

  If a tuple is larger than page size minus space reserved for fillfactor,
  heap_multi_insert would never find a page that it fits in and repeatedly
  ask for a new page from RelationGetBufferForTuple. If a tuple is too large
  to fit on any page, taking fillfactor into account,
  RelationGetBufferForTuple will always expand the relation. In a normal
  insert, heap_insert will accept that and put the tuple on the new page.
  heap_multi_insert, however, does a fillfactor check of its own, and
  doesn't accept the newly-extended page RelationGetBufferForTuple returns,
  even though there is no other choice to make the tuple fit.

  Fix that by making the logic in heap_multi_insert more like the
  heap_insert logic. The first tuple is always put on the page
  RelationGetBufferForTuple gives us, and the fillfactor check is only
  applied to the subsequent tuples.

  Report from David Gould, although I didn't use his patch.

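  A toy standalone model of the fixed loop, with made-up page geometry
  rather than real PostgreSQL internals:

      #include <stdio.h>

      #define PAGE_FREE_SPACE 8000    /* free space on an empty page */
      #define SAVE_FREE_SPACE 2000    /* space reserved by fillfactor */

      static void
      multi_insert(const int *sizes, int ntuples)
      {
          int         i = 0;

          while (i < ntuples)
          {
              /* Pretend RelationGetBufferForTuple gave us a fresh page. */
              int         free_space = PAGE_FREE_SPACE;
              int         nthispage;

              /* The first tuple is accepted unconditionally, so an
               * oversized tuple can no longer loop forever asking for
               * yet another page. */
              free_space -= sizes[i];
              nthispage = 1;

              /* Only subsequent tuples get the fillfactor check. */
              while (i + nthispage < ntuples &&
                     sizes[i + nthispage] + SAVE_FREE_SPACE <= free_space)
              {
                  free_space -= sizes[i + nthispage];
                  nthispage++;
              }

              printf("page gets %d tuple(s)\n", nthispage);
              i += nthispage;
          }
      }

      int
      main(void)
      {
          int         sizes[] = {7000, 200, 300, 7500};

          multi_insert(sizes, 4);     /* prints 1, 2 and 1 tuples per page */
          return 0;
      }
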
* Add defenses against integer overflow in dynahash numbuckets calculations. (Tom Lane, 2012-12-11)

  The dynahash code requires the number of buckets in a hash table to fit in
  an int; but since we calculate the desired hash table size dynamically,
  there are various scenarios where we might calculate too large a value.
  The resulting overflow can lead to infinite loops, division-by-zero
  crashes, etc. I (tgl) had previously installed some defenses against that
  in commit 299d1716525c659f0e02840e31fbe4dea3, but that covered only one
  call path. Moreover it worked by limiting the request size to work_mem,
  but in a 64-bit machine it's possible to set work_mem high enough that the
  problem appears anyway. So let's fix the problem at the root by installing
  limits in the dynahash.c functions themselves.

  Trouble report and patch by Jeff Davis.

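  The idea of clamping at the root can be sketched in a few lines of
  standalone C (illustrative only, not the actual dynahash code):

      #include <limits.h>
      #include <stdio.h>

      /* Cap a requested element count so that the derived bucket count,
       * a power of two, still fits in a signed int. */
      static long long
      clamp_hash_size(long long nelem)
      {
          /* Largest power of two an int can hold: 2^30 for 32-bit int. */
          const long long max_nbuckets = 1LL << (sizeof(int) * CHAR_BIT - 2);

          return (nelem > max_nbuckets) ? max_nbuckets : nelem;
      }

      int
      main(void)
      {
          /* A request sized from a huge work_mem gets clamped. */
          printf("%lld\n", clamp_hash_size(5LL * 1024 * 1024 * 1024));
          return 0;
      }
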
* Disable event triggers in standalone mode. (Tom Lane, 2012-12-11)

  Per discussion, this seems necessary to allow recovery from broken event
  triggers, or broken indexes on pg_event_trigger.

  Dimitri Fontaine

* Fix performance problems with autovacuum truncation in busy workloads. (Kevin Grittner, 2012-12-11)

  In situations where there are over 8MB of empty pages at the end of a
  table, the truncation work for trailing empty pages takes longer than
  deadlock_timeout, and there is frequent access to the table by processes
  other than autovacuum, there was a problem with the autovacuum worker
  process being canceled by the deadlock checking code. The truncation work
  done by autovacuum up to that point was lost, and the attempt was tried
  again by a later autovacuum worker. The attempts could continue
  indefinitely without making progress, consuming resources and blocking
  other processes for up to deadlock_timeout each time.

  This patch has the autovacuum worker checking whether it is blocking any
  other thread at 20ms intervals. If such a condition develops, the
  autovacuum worker will persist the work it has done so far, release its
  lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping
  to be able to re-acquire the lock and try again. If it is unable to get
  the lock in that time, it moves on and a worker will try to continue later
  from the point this one left off.

  While this patch doesn't change the rules about when and what to truncate,
  it does cause the truncation to occur sooner, with less blocking, and with
  the consumption of fewer resources when there is contention for the
  table's lock.

  The only user-visible change other than improved performance is that the
  table size during truncation may change incrementally instead of just
  once.

  This problem exists in all supported versions but is infrequently
  reported, although some reports of performance problems when autovacuum
  runs might be caused by this. Initial commit is just the master branch,
  but this should probably be backpatched once the build farm and general
  developer usage confirm that there are no surprising effects.

  Jan Wieck

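  The yield-and-retry behaviour can be modelled in standalone C; every
  function below is a stub standing in for PostgreSQL internals:

      #include <stdbool.h>
      #include <stdio.h>

      #define CHECK_INTERVAL_MS   20      /* how often to test for waiters */
      #define RETRY_INTERVAL_MS   50      /* sleep between re-lock attempts */
      #define RETRY_BUDGET_MS   5000      /* give up after this long */

      static bool truncate_a_batch(void)   { return true; }   /* stub: work left */
      static bool lock_has_waiters(void)   { return true; }   /* stub */
      static bool try_reacquire_lock(void) { return false; }  /* stub */
      static void sleep_ms(int ms)         { (void) ms; }     /* stub */

      static void
      truncate_with_yield(void)
      {
          for (;;)
          {
              /* Truncate a slice of trailing pages; progress persists. */
              if (!truncate_a_batch())
                  return;                 /* all empty pages gone */

              sleep_ms(CHECK_INTERVAL_MS);
              if (!lock_has_waiters())
                  continue;               /* nobody blocked; keep going */

              /* Yield the lock, then try to win it back for 5 seconds. */
              bool        got_it = false;

              printf("releasing lock so waiters can proceed\n");
              for (int waited = 0; waited < RETRY_BUDGET_MS && !got_it;
                   waited += RETRY_INTERVAL_MS)
              {
                  sleep_ms(RETRY_INTERVAL_MS);
                  got_it = try_reacquire_lock();
              }
              if (!got_it)
              {
                  printf("lock not regained; a later worker resumes\n");
                  return;
              }
          }
      }

      int
      main(void)
      {
          truncate_with_yield();
          return 0;
      }
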
* Consistency check should compare last record replayed, not last record read. (Heikki Linnakangas, 2012-12-11)

  EndRecPtr is the last record that we've read, but not necessarily yet
  replayed. CheckRecoveryConsistency should compare minRecoveryPoint with
  the last replayed record instead. This caused recovery to think it's
  reached consistency too early.

  Now that we do the check in CheckRecoveryConsistency correctly, we have to
  move the call of that function to after redoing a record. The current
  place, after reading a record but before replaying it, is wrong. In
  particular, if there are no more records after the one ending at
  minRecoveryPoint, we don't enter hot standby until one extra record is
  generated and read by the standby, and CheckRecoveryConsistency is called.
  These two bugs conspired to make the code appear to work correctly, except
  for the small window between reading the last record that reaches
  minRecoveryPoint, and replaying it.

  In passing, rename recoveryLastRecPtr, which is the last record replayed,
  to lastReplayedEndRecPtr. This makes it slightly less confusing with
  replayEndRecPtr, which is the last record read that we're about to replay.

  Original report from Kyotaro HORIGUCHI, further diagnosis by Fujii Masao.
  Backpatch to 9.0, where Hot Standby subtly changed the test from
  "minRecoveryPoint < EndRecPtr" to "minRecoveryPoint <= EndRecPtr". The
  former works because where the test is performed, we have always read one
  more record than we've replayed.

* Add mode where contrib installcheck runs each module in a separately named database. (Andrew Dunstan, 2012-12-11)

  Normally each module is tested in a database named contrib_regression,
  which is dropped and recreated at the beginning of each pg_regress run.
  This new mode, enabled by adding USE_MODULE_DB=1 to the make command line,
  runs most modules in a database with the module name embedded in it.

  This will make testing pg_upgrade on clusters with the contrib modules a
  lot easier.

  Second attempt at this, this time accommodating make versions older than
  3.82.

  Still to be done: adapt to the MSVC build system.

  Backpatch to 9.0, which is the earliest version it is reasonably possible
  to test upgrading from.

* Update minimum recovery point on truncation. (Heikki Linnakangas, 2012-12-10)

  If a file is truncated, we must update minRecoveryPoint. Once a file is
  truncated, there's no going back; it would not be safe to stop recovery at
  a point earlier than that anymore.

  Per report from Kyotaro HORIGUCHI. Backpatch to 8.4. Before that,
  minRecoveryPoint was not updated during recovery at all.

* Fix the tracking of min recovery point timeline. (Heikki Linnakangas, 2012-12-10)

  Forgot to update it at the right place. Also, consider a checkpoint record
  that switches to a new timeline to be on the new timeline. This fixes
  erroneous "requested timeline 2 does not contain minimum recovery point"
  errors, pointed out by Amit Kapila while testing another patch.

* Fix assorted bugs in privileges-for-types patch. (Tom Lane, 2012-12-09)

  Commit 729205571e81b4767efc42ad7beb53663e08d1ff added privileges on data
  types, but there were a number of oversights. The implementation of
  default privileges for types missed a few places, and pg_dump was utterly
  innocent of the whole concept. Per bug #7741 from Nathan Alden, and
  subsequent wider investigation.

* Support automatically-updatable views. (Tom Lane, 2012-12-08)

  This patch makes "simple" views automatically updatable, without the need
  to create either INSTEAD OF triggers or INSTEAD rules. "Simple" views are
  those classified as updatable according to SQL-92 rules. The rewriter
  transforms INSERT/UPDATE/DELETE commands on such views directly into an
  equivalent command on the underlying table, which will generally have
  noticeably better performance than is possible with either triggers or
  user-written rules. A view that has INSTEAD OF triggers or INSTEAD rules
  continues to operate the same as before.

  For the moment, security_barrier views are not considered simple. Also, we
  do not support WITH CHECK OPTION. These features may be added in future.

  Dean Rasheed, reviewed by Amit Kapila

* Correct xmax test for COPY FREEZE (Simon Riggs, 2012-12-07)

* Optimize COPY FREEZE with CREATE TABLE also. (Simon Riggs, 2012-12-07)

  Jeff Davis, additional test by me

* Clarify that COPY FREEZE is not a hard rule. (Simon Riggs, 2012-12-07)

  Remove message when FREEZE not honoured, clarify reasons in comments and
  docs.

* Improve pl/pgsql to support composite-type expressions in RETURN. (Tom Lane, 2012-12-06)

  For some reason lost in the mists of prehistory, RETURN was only coded to
  allow a simple reference to a composite variable when the function's
  return type is composite. Allow an expression instead, while preserving
  the efficiency of the original code path in the case where the expression
  is indeed just a composite variable's name. Likewise for RETURN NEXT.

  As is true in various other places, the supplied expression must yield
  exactly the number and data types of the required columns. There was some
  discussion of relaxing that for pl/pgsql, but no consensus yet, so this
  patch doesn't address that.

  Asif Rehman, reviewed by Pavel Stehule

* Background worker processes (Alvaro Herrera, 2012-12-06)

  Background workers are postmaster subprocesses that run arbitrary
  user-specified code. They can request shared memory access as well as
  backend database connections; or they can just use plain libpq frontend
  database connections.

  Modules listed in shared_preload_libraries can register background workers
  in their _PG_init() function; this is early enough that it's not necessary
  to provide an extra GUC option, because the necessary extra resources can
  be allocated early on. Modules can install more than one bgworker, if
  necessary.

  Care is taken that these extra processes do not interfere with other
  postmaster tasks: only one such process is started on each ServerLoop
  iteration. This means a large number of them could be waiting to be
  started up and postmaster is still able to quickly service external
  connection requests. Also, the shutdown sequence should not be impacted by
  a worker process that's reasonably well behaved (i.e. promptly responds to
  termination signals.)

  The current implementation lets worker processes specify their start time,
  i.e. at what point in the server startup process they are to be started:
  right after postmaster start (in which case they mustn't ask for shared
  memory access), when consistent state has been reached (useful during
  recovery in a HOT standby server), or when recovery has terminated (i.e.
  when normal backends are allowed).

  In case of a bgworker crash, actions to take depend on registration data:
  if shared memory was requested, then all other connections are taken down
  (as well as other bgworkers), just as if it were a regular backend
  crashing. The bgworker itself is restarted, too, within a configurable
  timeframe (which can be configured to be never).

  More features to add to this framework can be imagined without much
  effort, and have been discussed, but this seems good enough as a useful
  unit already. An elementary sample module is supplied.

  Author: Álvaro Herrera

  This patch is loosely based on prior patches submitted by KaiGai Kohei,
  and unsubmitted code by Simon Riggs. Reviewed by: KaiGai Kohei, Markus
  Wanner, Andres Freund, Heikki Linnakangas, Simon Riggs, Amit Kapila

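  A registration sketch modeled on the elementary sample module mentioned
  above; the field names and flags follow the initial version of this API
  (they changed in later releases), so treat this as illustrative:

      #include "postgres.h"
      #include "fmgr.h"
      #include "postmaster/bgworker.h"

      PG_MODULE_MAGIC;

      void        _PG_init(void);

      static void
      my_worker_main(void *main_arg)
      {
          (void) main_arg;
          /* Set up signal handling, optionally connect to a database,
           * then loop doing work until asked to shut down. */
      }

      void
      _PG_init(void)
      {
          BackgroundWorker worker;

          memset(&worker, 0, sizeof(worker));
          worker.bgw_name = "my worker";
          /* Request shared memory plus a backend database connection. */
          worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
              BGWORKER_BACKEND_DATABASE_CONNECTION;
          /* Start once recovery has ended and normal backends run. */
          worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
          worker.bgw_restart_time = 60;   /* seconds between restarts;
                                           * BGW_NEVER_RESTART disables */
          worker.bgw_main = my_worker_main;
          worker.bgw_main_arg = NULL;
          RegisterBackgroundWorker(&worker);
      }
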
* Fix intermittent crash in DROP INDEX CONCURRENTLY. (Tom Lane, 2012-12-05)

  When deleteOneObject closes and reopens the pg_depend relation, we must
  see to it that the relcache pointer held by the calling function
  (typically performMultipleDeletions) is updated. Usually the relcache
  entry is retained so that the pointer value doesn't change, which is why
  the problem had escaped notice ... but after a cache flush event there's
  no guarantee that the same memory will be reassigned. To fix, change the
  recursive functions' APIs so that we pass around a "Relation *" not just
  "Relation".

  Per investigation of occasional buildfarm failures. This is trivial to
  reproduce with -DCLOBBER_CACHE_ALWAYS, which points up the sad lack of any
  buildfarm member running that way on a regular basis.

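  A toy illustration of why the API had to change from "Relation" to
  "Relation *" (plain C model, not PostgreSQL code):

      #include <stdio.h>
      #include <stdlib.h>

      typedef struct RelationData { int generation; } RelationData;
      typedef RelationData *Relation;

      /* After a cache flush, close/reopen can hand back a relcache entry
       * at a different address. */
      static Relation
      reopen_relation(Relation rel)
      {
          Relation    fresh = malloc(sizeof(RelationData));

          fresh->generation = rel->generation + 1;
          free(rel);
          return fresh;
      }

      /* Before the fix this took a plain Relation, leaving the caller
       * with a dangling pointer; taking Relation * lets the callee
       * refresh the caller's copy. */
      static void
      deleteOneObject(Relation *depRel)
      {
          *depRel = reopen_relation(*depRel);
      }

      int
      main(void)
      {
          Relation    rel = malloc(sizeof(RelationData));

          rel->generation = 1;
          deleteOneObject(&rel);
          printf("generation now %d\n", rel->generation);   /* 2 */
          free(rel);
          return 0;
      }
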
* Update comment at top of index_create (Alvaro Herrera, 2012-12-05)

  I neglected to update it in commit f4c4335.

  Michael Paquier

* Ensure recovery pause feature doesn't pause unless users can connect. (Tom Lane, 2012-12-05)

  If we're not in hot standby mode, then there's no way for users to connect
  to reset the recoveryPause flag, so we shouldn't pause. The code was aware
  of this but the test to see if pausing was safe was seriously inadequate:
  it wasn't paying attention to reachedConsistency, and besides what it was
  testing was that we could legally enter hot standby, not that we have done
  so. Get rid of that in favor of checking LocalHotStandbyActive, which
  because of the coding in CheckRecoveryConsistency is tantamount to
  checking that we have told the postmaster to enter hot standby.

  Also, move the recoveryPausesHere() call that reacts to asynchronous
  recoveryPause requests so that it's not in the middle of application of a
  WAL record. I put it next to the recoveryStopsHere() call --- in future
  those are going to need to interact significantly, so this seems like a
  good waystation.

  Also, don't bother trying to read another WAL record if we've already
  decided not to continue recovery. This was no big deal when the code was
  written originally, but now that reading a record might entail actions
  like fetching an archive file, it seems a bit silly to do it like that.

  Per report from Jeff Janes and subsequent discussion. The pause feature
  needs quite a lot more work, but this gets rid of some indisputable bugs,
  and seems safe enough to back-patch.

* Oops, meant to change the comment in writeTimeLineHistory. (Heikki Linnakangas, 2012-12-05)

* Must not reach consistency before XLOG_BACKUP_RECORD (Simon Riggs, 2012-12-05)

  When waiting for an XLOG_BACKUP_RECORD the minRecoveryPoint will be
  incorrect, so we must not declare recovery as consistent before we have
  seen the record. Major bug allowing recovery to end too early in some
  cases, allowing people to see an inconsistent database. This patch to HEAD
  and 9.2, other fix required for 9.1 and 9.0.

  Simon Riggs and Andres Freund, bug report by Jeff Janes

* Attempt to un-break Windows builds with USE_LDAP. (Tom Lane, 2012-12-04)

  The buildfarm shows this case is entirely broken, and I'm betting the
  reason is lack of any include file.

* Include isinf.o in libecpg if isinf() is not available on the system. (Michael Meskes, 2012-12-04)

  Patch done by Jiang Guiqing <jianggq@cn.fujitsu.com>.

* Downgrade a status message from LOG to DEBUG2. (Heikki Linnakangas, 2012-12-04)

  I never intended this to be anything other than a debugging aid, but
  forgot to change the level before committing.

* Write exact xlog position of timeline switch in the timeline history file. (Heikki Linnakangas, 2012-12-04)

  This allows us to do some more rigorous sanity checking for various
  incorrect point-in-time recovery scenarios, and provides more information
  for debugging purposes. It will also come in handy in the upcoming patch
  to allow timeline switches to be replicated by streaming replication.

* In initdb.c, move auth warning code into main() from secondary function. (Bruce Momjian, 2012-12-04)

* Fix build of LDAP URL feature (Peter Eisentraut, 2012-12-04)

  Some code was not ifdef'ed out for non-LDAP builds.

  Patch from Bruce Momjian

* Track the timeline associated with minRecoveryPoint, for more sanity checks. (Heikki Linnakangas, 2012-12-04)

  This allows recovery to notice certain incorrect recovery scenarios. If a
  server has recovered to point X on timeline 5, and you restart recovery,
  it better be on timeline 5 when it reaches point X again, not on some
  timeline with a higher ID. This can happen e.g. if a standby server is
  shut down, a new timeline appears in the WAL archive, and the standby
  server is restarted. It will try to follow the new timeline, which is
  wrong because some WAL on the old timeline was already replayed before
  shutdown.

  Requires an initdb (or at least pg_resetxlog), because this adds a field
  to the control file.

* Add support for LDAP URLs (Peter Eisentraut, 2012-12-03)

  Allow specifying LDAP authentication parameters as RFC 4516 LDAP URLs.

* In initdb.c, rename some newly created functions, and move the directory creation and xlog symlink creation to separate functions. (Bruce Momjian, 2012-12-03)

  Per suggestions from Andrew Dunstan.

* Add initdb --sync-only option to sync the data directory to durable storage. (Bruce Momjian, 2012-12-03)

  Have pg_upgrade use it, and enable server options fsync=off and
  full_page_writes=off.

  Document that users turning fsync from off to on should run initdb
  --sync-only.

  [ Previous commit was incorrectly applied as a git merge. ]

* Revert initdb --sync-only patch that had incorrect commit messages. (Bruce Momjian, 2012-12-03)

* dummy commit (Bruce Momjian, 2012-12-03)

* In pg_upgrade, fix bug where no users were dumped in pg_dumpall binary-upgrade mode; instead only skip dumping the current user. (Bruce Momjian, 2012-12-03)

  This bug was introduced during the removal of split_old_dump(). Bug
  discovered during local testing.

* Revert "Add mode where contrib installcheck runs each module in a separately ↵Andrew Dunstan2012-12-03
| | | | | | named database." This reverts commit e2b3c21b05c78c3a726b189242e41d4aa4422bf1.
* Avoid holding vmbuffer pin after VACUUM. (Simon Riggs, 2012-12-03)

  During VACUUM if we pause to perform a cycle of index cleanup we drop the
  vmbuffer pin, so we should do the same thing when heap scan completes.
  This avoids holding vmbuffer pin across the main index cleanup in VACUUM,
  which could be minutes or hours longer than necessary for correctness.

  Bug report and suggested fix from Pavan Deolasee

* Attempt to unbreak MSVC builds broken by f21bb9cfb5646e1793dcc9c0ea697bab99afa523. (Andrew Dunstan, 2012-12-03)

  We can't use type uint, so use uint32.

* Refactor inCommit flag into generic delayChkpt flag. (Simon Riggs, 2012-12-03)

  Rename PGXACT->inCommit flag into delayChkpt flag, and generalise comments
  to allow use in other situations, such as the forthcoming potential use in
  the checksum patch. Replace wait loop to look for VXIDs with delayChkpt
  set. No user-visible changes, no behaviour changes at present.

  Simon Riggs, reviewed and rebased by Jeff Davis

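  The checkpoint-side wait then looks roughly like this fragment, sketched
  from the commit description using the function names this commit
  introduces (not a verbatim excerpt):

      /* Before writing the checkpoint record, wait out any backends
       * that currently have delayChkpt set, identified by vxid. */
      VirtualTransactionId *vxids;
      int         nvxids;

      vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
      if (nvxids > 0)
      {
          do
          {
              pg_usleep(10000L);  /* recheck every 10ms */
          } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
      }
      pfree(vxids);
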
* Clarify locking for PageGetLSN() in XLogCheckBuffer() (Simon Riggs, 2012-12-03)

* Clarify when to use PageSetLSN/PageGetLSN(). (Simon Riggs, 2012-12-03)

  Update README to explain prerequisites for correct access to LSN fields of
  a page. Independent chunk removed from checksums patch to reduce size of
  patch.

* Refactor the code implementing standby-mode logic. (Heikki Linnakangas, 2012-12-03)

  It is now easier to see that it's a state machine, making the code easier
  to understand overall.

* Add mode where contrib installcheck runs each module in a separately named database. (Andrew Dunstan, 2012-12-02)

  Normally each module is tested in a database named contrib_regression,
  which is dropped and recreated at the beginning of each pg_regress run.
  This mode, enabled by adding USE_MODULE_DB=1 to the make command line,
  runs most modules in a database with the module name embedded in it.

  This will make testing pg_upgrade on clusters with the contrib modules a
  lot easier.

  Still to be done: adapt to the MSVC build system.

  Backpatch to 9.0, which is the earliest version it is reasonably possible
  to test upgrading from.

* Update time zone data files to tzdata release 2012j. (Tom Lane, 2012-12-02)

  DST law changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa,
  and portions of Brazil.

* Reduce scope of changes for COPY FREEZE. (Simon Riggs, 2012-12-02)

  Allow support only for freezing tuples by explicit command. Previous
  coding mistakenly extended slightly beyond what was agreed as correct on
  -hackers. So essentially a partial revoke of earlier work, leaving just
  the COPY FREEZE command.

* Don't advance checkPoint.nextXid near the end of a checkpoint sequence. (Tom Lane, 2012-12-02)

  This reverts commit c11130690d6dca64267201a169cfb38c1adec5ef in favor of
  actually fixing the problem: namely, that we should never have been
  modifying the checkpoint record's nextXid at this point to begin with. The
  nextXid should match the state as of the checkpoint's logical WAL position
  (ie the redo point), not the state as of its physical position. It's
  especially bogus to advance it in some wal_levels and not others. In any
  case there is no need for the checkpoint record to carry the same nextXid
  shown in the XLOG_RUNNING_XACTS record just emitted by LogStandbySnapshot,
  as any replay operation will already have adopted that value as current.

  This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug
  #6291 from Daniel Farina, in that if a checkpoint were in progress at the
  instant of XID wraparound, the epoch bump would be lost as reported. (And,
  of course, these days there's at least a 50-50 chance of a checkpoint
  being in progress at any given instant.)

  Diagnosed by me and independently by Andres Freund. Back-patch to all
  branches supporting hot standby.

* Rearrange storage of data in xl_running_xacts. (Simon Riggs, 2012-12-02)

  Previously we stored all xids mixed together. Now we store top-level xids
  first, followed by all subxids. Also skip logging any subxids if the
  snapshot is suboverflowed, since there are potentially large numbers of
  them and they are not useful in that case anyway.

  Has value in the envisaged design for decoding of WAL. No planned effect
  on Hot Standby.

  Andres Freund, reviewed by me

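  A standalone model of the new record layout (types simplified; not the
  actual xl_running_xacts code):

      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>

      typedef unsigned int TransactionId;

      /* Pack top-level xids first, then all subxids; omit subxids
       * entirely when the snapshot suboverflowed. */
      static int
      pack_running_xacts(TransactionId *out,
                         const TransactionId *xids, int nxids,
                         const TransactionId *subxids, int nsubxids,
                         bool suboverflowed)
      {
          memcpy(out, xids, nxids * sizeof(TransactionId));
          if (suboverflowed)
              return nxids;     /* subxids carry no useful info here */
          memcpy(out + nxids, subxids, nsubxids * sizeof(TransactionId));
          return nxids + nsubxids;
      }

      int
      main(void)
      {
          TransactionId xids[] = {100, 105};
          TransactionId subxids[] = {101, 102, 106};
          TransactionId buf[8];
          int         n = pack_running_xacts(buf, xids, 2, subxids, 3, false);

          for (int i = 0; i < n; i++)
              printf("%u ", buf[i]);
          printf("\n");         /* prints: 100 105 101 102 106 */
          return 0;
      }
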
* XidEpoch++ if wraparound during checkpoint. (Simon Riggs, 2012-12-02)

  If wal_level = hot_standby we update the checkpoint nextxid, though in the
  case where a wraparound occurred half-way through a checkpoint we would
  neglect updating the epoch also. Updating the nextxid is arguably the
  wrong thing to do, but changing that may introduce subtle bugs into hot
  standby startup, while updating the value doesn't cause any known bugs
  yet. Minimal fix now to HEAD and backbranches, wider fix later in HEAD.

  Bug reported in #6291 by Daniel Farina and slightly differently in

  Cause analysis and recommended fixes from Tom Lane and Andres Freund.
  Applied patch is minimal version of Andres Freund's work.