path: root/src/backend/access
Commit message (author, date)
* Adjust comment in .history file to match recovery target specified. Comment present since 8.0 was never fully meaningful, since two recovery targets cannot be specified. Refactor recovery target type to make this change and associated code easier to understand. No change in function. Bug report arising from internal support question. (Simon Riggs, 2010-03-19)
* Reset btpo.xact following recovery of btree delete page. Add btpo_xact field into WAL record and reset it from there, rather than using FrozenTransactionId which can lead to some corner case bugs. Problem report and suggested route to a fix from Heikki, details by me. (Simon Riggs, 2010-03-19)
* Add restartpoint_command option to recovery.conf. Fix bug in %r handling in recovery_end_command: it always came out as 0 because InRedo was cleared before recovery_end_command was executed. Also, always take ControlFileLock when reading checkpoint location for %r. The recovery_end_command bug and the missing locking were present in 8.4 as well; that part of this patch will be backported separately. (Heikki Linnakangas, 2010-03-18)
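The %r handling mentioned in the entry above can be pictured as plain placeholder substitution. The following is a minimal Python sketch, not the server's C code: the function name `expand_recovery_command` and the example file name are hypothetical, and only the `%r` and `%%` placeholders are modelled.

```python
def expand_recovery_command(template: str, last_restartpoint_file: str) -> str:
    """Expand %r and %% in a command template (illustrative stand-in only)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "r":
                # %r: name of the file holding the last valid restartpoint
                out.append(last_restartpoint_file)
                i += 2
                continue
            if nxt == "%":
                # %%: a literal percent sign
                out.append("%")
                i += 2
                continue
        # Anything else, including an unknown %x sequence, is copied as-is
        out.append(ch)
        i += 1
    return "".join(out)
```

In this picture, the bug described above corresponds to the caller passing a zeroed file name because the value was read after the recovery state had already been cleared.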
* Remove incorrect comment from GetWriteRecPtr(): the return value is always correct, as described in comments at start of xlog.c (Simon Riggs, 2010-03-15)
* Add missing reset of need_initialization in reloptions code. This resulted in useless extra work during every call of parseRelOptions, but no bad effects other than that. Noted by Alvaro. (Tom Lane, 2010-03-11)
* pg_start_backup() can use a share lock to lock ControlFileLock instead of an exclusive lock. The change is mostly code cleanup. Since there seems to be no performance benefit from it, backports should not be needed. Fujii Masao (Itagaki Takahiro, 2010-03-10)
* pgindent run for 9.0 (Bruce Momjian, 2010-02-26)
* Make pg_stop_backup's reporting a bit more verbose in hopes of making error cases less intimidating for novices. Per discussion. Greg Smith (Tom Lane, 2010-02-25)
* Clean up handling of XactReadOnly and RecoveryInProgress checks.
  Add some checks that seem logically necessary, in particular let's make real sure that HS slave sessions cannot create temp tables. (If they did they would think that temp tables belonging to the master's session with the same BackendId were theirs. We *must* not allow myTempNamespace to become set in a slave session.)
  Change setval() and nextval() so that they are only allowed on temp sequences in a read-only transaction. This seems consistent with what we allow for table modifications in read-only transactions. Since an HS slave can't have a temp sequence, this also provides a nicer cure for the setval PANIC reported by Erik Rijkers.
  Make the error messages more uniform, and have them mention the specific command being complained of. This seems worth the trifling amount of extra code, since people are likely to see such messages a lot more than before. (Tom Lane, 2010-02-20)
* Don't use O_DIRECT when writing WAL files if archiving or streaming is enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it's written, and if it's not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive. Also, walreceiver process does unaligned writes, so disable O_DIRECT in walreceiver process for that reason too. (Heikki Linnakangas, 2010-02-19)
* Fix STOP WAL LOCATION in backup history files not to return the next segment of the XLOG_BACKUP_END record even if the record is placed at a segment boundary. Furthermore, the previous implementation could return a nonexistent segment file name when the boundary is in a segment that has the "FE" suffix; we never use segments with the "FF" suffix. Backpatch to 8.0, where hot backup was introduced. Reported by Fujii Masao. (Itagaki Takahiro, 2010-02-19)
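The segment-boundary problem in the entry above comes down to which way an end-of-record position is rounded. The Python sketch below illustrates the arithmetic under stated assumptions: a 16 MB segment size, a WAL position split into a 32-bit log id and a 32-bit offset, and file names made of three 8-digit hex fields. The function names are hypothetical and do not mirror the server's macros exactly.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024               # assumed default segment size
SEGS_PER_LOGID = 0xFFFFFFFF // XLOG_SEG_SIZE   # 255, so suffixes run 00..FE

def seg_containing(xlogid: int, xrecoff: int):
    """Segment that contains the byte AT this position (rounds a boundary up)."""
    return xlogid, xrecoff // XLOG_SEG_SIZE

def seg_ending_at(xlogid: int, xrecoff: int):
    """Segment that contains the byte just BEFORE this position."""
    if xrecoff == 0:
        # Not modelled here: a record end at offset 0 would belong to the
        # previous log id; this sketch simply refuses that input.
        raise ValueError("xrecoff must be positive")
    return xlogid, (xrecoff - 1) // XLOG_SEG_SIZE

def wal_file_name(tli: int, xlogid: int, seg: int) -> str:
    """Timeline, log id and segment number as three 8-digit hex fields."""
    return "%08X%08X%08X" % (tli, xlogid, seg)
```

With an end position exactly at the end of the last usable segment of a log id, `seg_containing` yields segment number 255 (an "FF" name that is never created), while `seg_ending_at` yields 254 (the "FE" segment that actually holds the record).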
* Stamp HEAD as 9.0devel, and update various places that were referring to 8.5 (hope I got 'em all). Per discussion, this release will be 9.0 not 8.5. (Tom Lane, 2010-02-17)
* When updating ShmemVariableCache from a checkpoint record, be sure to set all the values derived from oldestXid, not just that field. Brain fade in one of my patches associated with flat file removal, exposed by a report from Fujii Masao. With this change, xidVacLimit should always be valid, so remove a couple of bits of complexity associated with the previous assumption that sometimes it wouldn't get set right away. (Tom Lane, 2010-02-17)
* Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue. In addition, add support for a "payload" string to be passed along with each notify event. This implementation should be significantly more efficient than the old one, and is also more compatible with Hot Standby usage. There is not yet any facility for HS slaves to receive notifications generated on the master, although such a thing is possible in future. Joachim Wieland, reviewed by Jeff Davis; also hacked on by me. (Tom Lane, 2010-02-16)
* Wrap calls to SearchSysCache and related functions using macros. The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane. (Robert Haas, 2010-02-14)
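The idea behind the wrapper macros in the entry above can be shown with a small Python analogue. This is only a sketch: the cache is modelled as a plain dict, unused key slots are padded with 0, and the lowercase function names are hypothetical stand-ins rather than the C macros themselves.

```python
MAX_KEYS = 4  # "currently 4", per the commit message

def search_syscache(cache: dict, key1, key2, key3, key4):
    """Full-width lookup: the caller must fill every key slot."""
    return cache.get((key1, key2, key3, key4))

def search_syscache1(cache: dict, key1):
    """One-key lookup; the padding is hidden from the caller."""
    return search_syscache(cache, key1, 0, 0, 0)

def search_syscache2(cache: dict, key1, key2):
    """Two-key lookup; the padding is hidden from the caller."""
    return search_syscache(cache, key1, key2, 0, 0)
```

Callers written against `search_syscache1` or `search_syscache2` never mention the unused slots, so raising `MAX_KEYS` later would only require touching the wrappers.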
* Fix relcache init file invalidation during Hot Standby for the case where a database has a non-default tablespaceid. Pass thru MyDatabaseId and MyDatabaseTableSpace to allow file path to be re-created in standby and correct invalidation to take place in all cases. Update and rework xact_commit_desc() debug messages. Bug report from Tom by code inspection. Fix by me. (Simon Riggs, 2010-02-13)
* Introduce WAL records to log reuse of btree pages, allowing conflict resolution during Hot Standby. Page reuse interlock requested by Tom. Analysis and patch by me. (Simon Riggs, 2010-02-13)
* Reduce the chatter to the log when starting a standby server. Don't echo all the recovery.conf options. Don't emit the "initializing recovery connections" message, which doesn't mean anything to a user. Remove the "starting archive recovery" message and replace the "automatic recovery in progress" message with a more informative message saying whether the server is doing PITR, normal archive recovery, or standby mode. (Heikki Linnakangas, 2010-02-12)
* If primary_conninfo is not set, don't try to establish streaming connection. (Heikki Linnakangas, 2010-02-12)
* Check for partial WAL files in standby mode. If restore_command restores a partial WAL file, assume it's because the file is just being copied to the archive and treat it the same as "file not found" in standby mode. pg_standby has a similar check, so it seems reasonable to have the same level of protection in the built-in standby mode. (Heikki Linnakangas, 2010-02-12)
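The partial-file check described in the entry above amounts to comparing the restored file's size with the expected segment size. A minimal Python sketch follows, assuming a 16 MB segment size; the function name `restored_segment_usable` is hypothetical and the real check lives in the server's C code.

```python
import os

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size

def restored_segment_usable(path: str, expected_size: int = XLOG_SEG_SIZE) -> bool:
    """Treat a missing or wrong-sized restored file the same as "file not found"."""
    try:
        return os.path.getsize(path) == expected_size
    except OSError:
        # File does not exist (or cannot be examined): not usable
        return False
```

A caller in standby mode would simply retry later when this returns False, on the assumption that the archive copy is still in progress.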
* Generic implementation of a red-black binary tree. It is planned for use in several places, but for now only GIN uses it, during index creation. Using a self-balancing tree greatly speeds up index creation in corner cases with preordered data. (Teodor Sigaev, 2010-02-11)
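To see why preordered data is a corner case, consider what happens without rebalancing. The Python sketch below is not the red-black tree from the commit; it inserts keys into a plain, unbalanced binary search tree and reports the resulting height, alongside the commonly quoted approximate red-black height bound of 2*log2(n+1). Both function names are hypothetical.

```python
import math

def unbalanced_height(keys) -> int:
    """Insert keys into a plain BST (no rebalancing); return its height in nodes."""
    root = None      # each node is a list: [key, left, right]
    height = 0
    for k in keys:
        if root is None:
            root = [k, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            idx = 1 if k < node[0] else 2   # smaller keys go left, others right
            if node[idx] is None:
                node[idx] = [k, None, None]
                height = max(height, depth + 1)
                break
            node = node[idx]
            depth += 1
    return height

def red_black_height_bound(n: int) -> float:
    """Approximate upper bound on the height of a red-black tree with n keys."""
    return 2 * math.log2(n + 1)
```

With sorted input the plain tree degenerates into a chain as tall as the number of keys, so each insertion walks the whole chain; a self-balancing tree keeps the height logarithmic regardless of input order.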
* Now that streaming replication switches between streaming mode and restoring from archive, the last WAL segment is not necessarily open at the end of recovery. Fix assertion that assumed that. Fujii Masao, fixing the assertion failure reported by Martin Pihlak. (Heikki Linnakangas, 2010-02-10)
* Fix up rickety handling of relation-truncation interlocks.
  Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state.
  In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date. (Tom Lane, 2010-02-09)
* Fix bug in GIN WAL redo cleanup function: don't free fake relcache entry while it's still being used. Backpatch to 8.4, where the fake relcache method was introduced. (Heikki Linnakangas, 2010-02-09)
* Remove piece of code to zero out minRecoveryPoint when starting crash recovery. It's zeroed out whenever a checkpoint is written, so the only scenario where the removed code did anything is when you kill archive recovery, remove recovery.conf, and start up the server, so that it goes into crash recovery instead. That's a "don't do that" scenario, but it seems better to not clear minRecoveryPoint but instead update it like we do in archive recovery, which is what will now happen. (Heikki Linnakangas, 2010-02-08)
* Remove some more dead VACUUM-FULL-only code. (Tom Lane, 2010-02-08)
* Remove old-style VACUUM FULL (which was known for a little while as VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby.
  Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby).
  We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases. (Tom Lane, 2010-02-08)
* Create a "relation mapping" infrastructure to support changing the relfilenodes of shared or nailed system catalogs. This has two key benefits:
  * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
  * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs.
  CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database.
  Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared.
  This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests. (Tom Lane, 2010-02-07)
* Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping of old and new toast tables can be done either at the logical level (by swapping the heaps' reltoastrelid links) or at the physical level (by swapping the relfilenodes of the toast tables and their indexes). This is necessary infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared system catalogs, where we cannot change reltoastrelid. The physical swap saves a few catalog updates too.
  We unfortunately have to keep the logical-level swap logic because in some cases we will be adding or deleting a toast table, so there's no possibility of a physical swap. However, that only happens as a consequence of schema changes in the table, which we do not need to support for system catalogs, so such cases aren't an obstacle for that.
  In passing, refactor the cluster support functions a little bit to eliminate unnecessarily-duplicated code; and fix the problem that while CLUSTER had been taught to rename the final toast table at need, ALTER TABLE had not. (Tom Lane, 2010-02-04)
* Move the responsibility of writing an "unlogged WAL operation" record from heap_sync() to the callers, because heap_sync() is sometimes called even if the operation itself is WAL-logged. This eliminates the bogus unlogged records from CLUSTER that Simon Riggs reported, patch by Fujii Masao. (Heikki Linnakangas, 2010-02-03)
* Revoke augmentation of WAL records for btree delete, per discussion. (Simon Riggs, 2010-02-01)
* Augment WAL records for btree delete with GetOldestXmin() to reduce false positives during Hot Standby conflict processing. Simple patch to enhance conflict processing, following previous discussions. Controlled by parameter minimize_standby_conflicts = on | off; the default of off allows measurement of the performance impact to see whether it should be set on all the time. (Simon Riggs, 2010-01-29)
* Filter recovery conflicts based upon dboid from relfilenode of WAL records for heap and btree. Minor change, mostly API changes to pass through the required values. This is a simple change though also provides the refactoring required for further enhancements to conflict processing using the relOid. Changes only have effect during Hot Standby. (Simon Riggs, 2010-01-29)
* Fix crashing bug at the end of recovery in Streaming Replication, when restore_command is not given. Fujii Masao. (Heikki Linnakangas, 2010-01-28)
* Fix bug in walsender's xlogid boundary handling, reported by Erik Rijkers. LogwrtRqst.Write can be set to non-existent FF log segment, we mustn't try to send that in XLogSend(). Also fix similar bug in ReadRecord(), which I just introduced in the ReadRecord() refactoring patch. (Heikki Linnakangas, 2010-01-27)
* Make standby server continuously retry restoring the next WAL segment with restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically, if the connection is lost for a long time and standby falls behind so much that the required WAL segments have been archived and deleted in the master.
  This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles.
  To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from primary, as required.
  This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails, or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary. (Heikki Linnakangas, 2010-01-27)
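The retry behaviour described in the entry above can be sketched as a loop over WAL sources. The Python below is an illustrative model only, not the logic of XLogReadPage(): the function name, the callable parameters, the order in which sources are tried, and the 5-second default interval are all assumptions made for the example.

```python
import time

def read_next_wal(restore_from_archive, stream_from_primary,
                  trigger_file_exists, retry_interval=5.0, sleep=time.sleep):
    """Return ("archive" | "stream", data), or None once the trigger file appears.

    Each source callable returns the WAL data, or None if nothing is
    available yet. All names here are hypothetical.
    """
    while True:
        data = restore_from_archive()       # stand-in for running restore_command
        if data is not None:
            return ("archive", data)
        data = stream_from_primary()        # stand-in for WAL received by streaming
        if data is not None:
            return ("stream", data)
        if trigger_file_exists():           # standby is being told to stop waiting
            return None
        sleep(retry_interval)               # wait a few seconds, then try again
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays, for example by supplying a function that only records the requested intervals.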
* Fix longstanding gripe that we check for 0000000001.history at start of archive recovery, even when we know it is never present. (Simon Riggs, 2010-01-26)
* Fix assorted core dumps and Assert failures that could occur during AbortTransaction or AbortSubTransaction, when trying to clean up after an error that prevented (sub)transaction start from completing:
  * access to TopTransactionResourceOwner that might not exist
  * assert failure in AtEOXact_GUC, if AtStart_GUC not called yet
  * assert failure or core dump in AfterTriggerEndSubXact, if AfterTriggerBeginSubXact not called yet
  Per testing by injecting elog(ERROR) at successive steps in StartTransaction and StartSubTransaction. It's not clear whether all of these cases could really occur in the field, but at least one of them is easily exposed by simple stress testing, as per my accidental discovery yesterday. (Tom Lane, 2010-01-24)
* In HS, Startup process sets SIGALRM when waiting for buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. (Simon Riggs, 2010-01-23)
* Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism. Attributes can now have options, just as relations and tablespaces do, and the reloptions code is used to parse, validate, and store them. For simplicity and because these options are not performance critical, we store them in a separate cache rather than the main relcache. Thanks to Alex Hunsaker for the review. (Robert Haas, 2010-01-22)
* Write a WAL record whenever we perform an operation without WAL-logging that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me. (Heikki Linnakangas, 2010-01-20)
* Fix incorrect comparison of scan key in GIN. Per report from Vyacheslav Kalinin <vka@mgcp.com> (Teodor Sigaev, 2010-01-18)
* Teach standby conflict resolution to use SIGUSR1. Conflict reason is passed through directly to the backend, so we can take decisions about the effect of the conflict based upon the local state. No specific changes, as yet, though this prepares for later work. CancelVirtualTransaction() sends signals while holding ProcArrayLock. Introduce errdetail_abort() to give message detail explaining that the abort was caused by conflict processing. Remove CONFLICT_MODE states in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity. (Simon Riggs, 2010-01-16)
* Introduce Streaming Replication.
  This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client.
  Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceiver.
  Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication.
  Fujii Masao, with additional hacking by me (Heikki Linnakangas, 2010-01-15)
* Add point_ops opclass for GiST. (Teodor Sigaev, 2010-01-14)
* First part of refactoring of code for ResolveRecoveryConflict. Purposes of this are to centralise the conflict code to allow further change, as well as to allow passing through the full reason for the conflict through to the conflicting backends. Backend state alters how we can handle different types of conflict so this is now required. As originally suggested by Heikki, no longer optional. (Simon Riggs, 2010-01-14)
* Remove partial, broken support for NULL pointers when fetching attributes.
  Previously, fastgetattr() and heap_getattr() tested their fourth argument against a null pointer, but any attempt to use them with a literal-NULL fourth argument evaluated to *(void *)0, resulting in a compiler error. Remove these NULL tests to avoid leading future readers of this code to believe that this has a chance of working. Also clean up related legacy code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().
  The new coding standard is that any code which calls a getattr-type function or macro which takes an isnull argument MUST pass a valid boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro Herrera. (Robert Haas, 2010-01-10)
* During Hot Standby, set DatabasePath correctly during relcache init file deletion, so that we attempt to unlink the correct filepath. unlink() errors are ignorable there, so lack of a DatabasePath initialization step did not cause visible problems until a related bug showed up on Solaris.
  Code refactored from xact_redo_commit() to ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay shared invalidation messages for many databases, so we cannot SetDatabasePath() once as we do in normal backends. Read the databaseid from the shared invalidation messages, then set DatabasePath temporarily before calling RelationCacheInitFileInvalidate().
  Problem report by Robert Treat, analysis and fix by me. (Simon Riggs, 2010-01-09)
* Remove all the special-case code for INT64_IS_BUSTED, per decision that we're not going to support that anymore. I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a performance excuse to live. It's a bit moot since that's all ifdef'd out, of course. (Tom Lane, 2010-01-07)
* Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). This patch only supports seq_page_cost and random_page_cost as parameters, but it provides the infrastructure to scalably support many more. In particular, we may want to add support for effective_io_concurrency, but I'm leaving that as future work for now. Thanks to Tom Lane for design help and Alvaro Herrera for the review. (Robert Haas, 2010-01-05)