aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Code review for LIKE ... INCLUDING INDEXES patch. Fix failure to propagateTom Lane2007-12-01
| | | | | | | | | | constraint status of copied indexes (bug #3774), as well as various other small bugs such as failure to pstrdup when needed. Allow INCLUDING INDEXES indexes to be merged with identical declared indexes (perhaps not real useful, but the code is there and having it not apply to LIKE indexes seems pretty unorthogonal). Avoid useless work in generateClonedIndexStmt(). Undo some poorly chosen API changes, and put a couple of routines in modules that seem to be better places for them.
* Avoid incrementing the CommandCounter when CommandCounterIncrement is calledTom Lane2007-11-30
| | | | | | | | | | | | | | | | | | | | but no database changes have been made since the last CommandCounterIncrement. This should result in a significant improvement in the number of "commands" that can typically be performed within a transaction before hitting the 2^32 CommandId size limit. In particular this buys back (and more) the possible adverse consequences of my previous patch to fix plan caching behavior. The implementation requires tracking whether the current CommandCounter value has been "used" to mark any tuples. CommandCounter values stored into snapshots are presumed not to be used for this purpose. This requires some small executor changes, since the executor used to conflate the curcid of the snapshot it was using with the command ID to mark output tuples with. Separating these concepts allows some small simplifications in executor APIs. Something for the TODO list: look into having CommandCounterIncrement not do AcceptInvalidationMessages. It seems fairly bogus to be doing it there, but exactly where to do it instead isn't clear, and I'm disinclined to mess with asynchronous behavior during late beta.
* Improve GIN index build's tracking of memory usage by usingTom Lane2007-11-16
| | | | | | | | | GetMemoryChunkSpace, not just the palloc request size. This brings the allocatedMemory counter close enough to reality (as measured by MemoryContextStats printouts) that I think we can get rid of the arbitrary factor-of-2 adjustment that was put into the code initially. Given the sensitivity of GIN build to work memory size, not using as much of work memory as we're allowed to seems a pretty bad idea.
* Repair still another bug in the btree page split WAL reduction patch:Tom Lane2007-11-16
| | | | | | | it failed for splits of non-leaf pages because in such pages the first data key on a page is suppressed, and so we can't just copy the first key from the right page to reconstitute the left page's high key. Problem found by Koichi Suzuki, patch by Heikki.
* Small comment spacing improvement.Bruce Momjian2007-11-16
|
* Fix pgindent to properly handle 'else' and single-line comments on theBruce Momjian2007-11-15
| | | | | same line; previous fix was only partial. Re-run pgindent on files that need it.
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* When logging the recovery.conf parameters, show them quoted as they wouldPeter Eisentraut2007-11-15
| | | | appear in the configuration file.
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Prevent re-use of a deleted relation's relfilenode until after the nextTom Lane2007-11-15
| | | | | | | | | | checkpoint. This guards against an unlikely data-loss scenario in which we re-use the relfilenode, then crash, then replay the deletion and recreation of the file. Even then we'd be OK if all insertions into the new relation had been WAL-logged ... but that's not guaranteed given all the no-WAL-logging optimizations that have recently been added. Patch by Heikki Linnakangas, per a discussion last month.
* Clean up some stray references to tsearch2.Tom Lane2007-11-13
|
* Ensure that typmod decoration on a datatype name is validated in all cases,Tom Lane2007-11-11
| | | | | | | | | | | | | | even in code paths where we don't pay any subsequent attention to the typmod value. This seems needed in view of the fact that 8.3's generalized typmod support will accept a lot of bogus syntax, such as "timestamp(foo)" or "record(int, 42)" --- if we allow such things to pass without comment, users will get confused. Per a recent example from Greg Stark. To implement this in a way that's not very vulnerable to future bugs-of-omission, refactor the API of parse_type.c's TypeName lookup routines so that typmod validation is folded into the base lookup operation. Callers can still choose not to receive the encoded typmod, but we'll check the decoration anyway if it's present.
* Reduce error level of ROLLBACK outside a transaction from WARNING toBruce Momjian2007-11-10
| | | | NOTICE.
* Use "alternative" instead of "alternate" where it is clearer.Peter Eisentraut2007-11-07
|
* - Add check of already changed page while replay WAL. This touches onlyTeodor Sigaev2007-10-29
| | | | | | | | | | | | | ginRedoInsert(), because other ginRedo* functions rewrite whole page or make changes which could be applied several times without consistent's loss - Remove check of identifying of corresponding split record: it's possible that replaying of WAL starts after actual page split, but before removing of that split from incomplete splits list. In this case, that check cause FATAL error. Per stress test which reproduces bug reported by Craig McElroy <craig.mcelroy@contegix.com>
* Fix coredump during replay WAL after crash. Change entrySplitPage() to preventTeodor Sigaev2007-10-29
| | | | | | | | usage of any information from system catalog, because it could be called during replay of WAL. Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't change on-disk storage.
* Rearrange vacuum-related bits in PGPROC as a bitmask, to better supportAlvaro Herrera2007-10-24
| | | | | | | | | having several of them. Add two more flags: whether the process is executing an ANALYZE, and whether a vacuum is for Xid wraparound (which is obviously only set by autovacuum). Sneakily move the worker's recently-acquired PostAuthDelay to a more useful place.
* Keep heap_page_prune from marking the buffer dirty when it didn'tTom Lane2007-10-24
| | | | | really change anything. Per report from Itagaki Takahiro. Fix by Pavan Deolasee.
* Tweak toast-related logic in heapam.c so that the toaster is only invokedTom Lane2007-10-16
| | | | | | | when relkind = RELKIND_RELATION. This syncs these tests with the Asserts in tuptoaster.c, and ensures that we won't ever try to, for example, compress a sequence's tuple. Problem found by Greg Stark while stress-testing with much-smaller-than-normal page sizes.
* When telling the bgwriter that we need a checkpoint because too much xlogTom Lane2007-10-12
| | | | | | | | | has been consumed, recheck against the latest value of RedoRecPtr before really sending the signal. This avoids useless checkpoint activity if XLogWrite is executed when we have a very stale local copy of RedoRecPtr. The potential for useless checkpoint is very much worse in 8.3 because of the walwriter process (which never does XLogInsert), so while this behavior was intentional, it needs to be changed. Per report from Itagaki Takahiro.
* Remove incorrect use of VARSIZE() on a toasted datum. We can just remove itTom Lane2007-10-11
| | | | | instead of fix it, since once we've set toast_action[i] to 'p' it no longer matters what toast_sizes[i] is. Greg Stark
* Avoid assuming that struct varattrib_pointer doesn't get padded by theTom Lane2007-10-01
| | | | | | | | | | | compiler --- at least on ARM, it does. I suspect that the varvarlena patch has been creating larger-than-intended toast pointers all along on ARM, but it wasn't exposed until the latest tweak added some Asserts that calculated the expected size in a different way. We could probably have fixed this by adding __attribute__((packed)) as is done for ItemPointerData, but struct varattrib_pointer isn't really all that useful anyway, so it seems cleanest to just get rid of it and have only struct varattrib_1b_e. Per results from buildfarm member quagga.
* Add an extra header byte to TOAST-pointer datums to represent their sizeTom Lane2007-09-30
| | | | | | | explicitly. This means a TOAST pointer takes 18 bytes instead of 17 --- still smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have painted ourselves into a corner if we want to support multiple types of TOAST pointer later on. Per discussion with Greg Stark.
* Adjust recovery PS display as agreed with Simon: 'waiting for XXX'Tom Lane2007-09-30
| | | | | | while the restore_command does its thing, then 'recovering XXX' while processing the segment file. These operations are heavyweight enough that an extra PS display set shouldn't bother anyone.
* Make recovery show the current input WAL segment name in the startupTom Lane2007-09-29
| | | | | process' PS display. After a suggestion by Simon (not exactly his patch though).
* Make archive recovery always start a new timeline, rather than only when aTom Lane2007-09-29
| | | | | | | recovery stop time was used. This avoids a corner-case risk of trying to overwrite an existing archived copy of the last WAL segment, and seems simpler and cleaner all around than the original definition. Per example from Jon Colverson and subsequent analysis by Simon.
* Some small tuptoaster improvements from Greg Stark. Avoid unnecessaryTom Lane2007-09-26
| | | | | | decompression of an already-compressed external value when we have to copy it; save a few cycles when a value is too short for compression; and annotate various lines that are currently unreachable.
* Minor improvements in backup and recovery:Tom Lane2007-09-26
| | | | | | | | | | | | | | | | | | | | | - create a separate archive_mode GUC, on which archive_command is dependent - %r option in recovery.conf sends last restartpoint to recovery command - %r used in pg_standby, updated README - minor other code cleanup in pg_standby - doc on Warm Standby now mentions pg_standby and %r - log_restartpoints recovery option emits LOG message at each restartpoint - end of recovery now displays last transaction end time, as requested by Warren Little; also shown at each restartpoint - restart archiver if needed to carry away WAL files at shutdown Simon Riggs
* Fix regex, LIKE, and some other second-rank text-manipulation functionsTom Lane2007-09-21
| | | | | | to not cause needless copying of text datums that have 1-byte headers. Greg Stark, in response to performance gripe from Guillaume Smet and ITAGAKI Takahiro.
* Improve handling of prune/no-prune decisions by storing a page's oldestTom Lane2007-09-21
| | | | | | unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us from performing heap_page_prune when there's no chance of pruning anything. Seems to be necessary per Heikki's preliminary performance testing.
* Fix comments that misspelled TransactionIdIsInProgress, per Heikki.Tom Lane2007-09-21
|
* HOT updates. When we update a tuple without changing any of its indexedTom Lane2007-09-20
| | | | | | | | | | | | columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.
* Remove GIN interface section, which is now documented in SGML.Bruce Momjian2007-09-14
| | | | Heikki Linnakangas
* Redefine the lp_flags field of item pointers as having four states, ratherTom Lane2007-09-12
| | | | | | | | | than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.
* Rename recently-added pg_stat_activity column from txn_start to xact_start,Tom Lane2007-09-11
| | | | for consistency with other column names such as in pg_stat_database.
* Replace the former method of determining snapshot xmax --- to wit, callingTom Lane2007-09-08
| | | | | | | | | | | | | | | ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid" variable that is updated during transaction commit or abort. Since latestCompletedXid is written only in places that had to lock ProcArrayLock exclusively anyway, and is read only in places that had to lock ProcArrayLock shared anyway, it adds no new locking requirements to the system despite being cluster-wide. Moreover, removing ReadNewTransactionId from snapshot acquisition eliminates the need to take both XidGenLock and ProcArrayLock at the same time. Since XidGenLock is sometimes held across I/O this can be a significant win. Some preliminary benchmarking suggested that this patch has no effect on average throughput but can significantly improve the worst-case transaction times seen in pgbench. Concept by Florian Pflug, implementation by Tom Lane.
* Don't take ProcArrayLock while exiting a transaction that has no XID; there isTom Lane2007-09-07
| | | | | | | | | | no need for serialization against snapshot-taking because the xact doesn't affect anyone else's snapshot anyway. Per discussion. Also, move various info about the interlocking of transactions and snapshots out of code comments and into a hopefully-more-cohesive discussion in access/transam/README. Also, remove a couple of now-obsolete comments about having to force some WAL to be written to persuade RecordTransactionCommit to do its thing.
* Improve page split in rtree emulation. Now if splitted result hasTeodor Sigaev2007-09-07
| | | | | | | | | big misalignement, then it tries to split page basing on distribution of boxe's centers. Per report from Dolafi, Tom <dolafit@janelia.hhmi.org> Backpatch is needed, change doesn't affect on-disk storage.
* Quick hack to make the VXID of a prepared transaction be -1/XID,Tom Lane2007-09-05
| | | | | so that different prepared xacts can be told apart in the pg_locks view. Per suggestion from Florian.
* Implement lazy XID allocation: transactions that do not modify any databaseTom Lane2007-09-05
| | | | | | | | | | | | | | | rows will normally never obtain an XID at all. We already did things this way for subtransactions, but this patch extends the concept to top-level transactions. In applications where there are lots of short read-only transactions, this should improve performance noticeably; not so much from removal of the actual XID-assignments, as from reduction of overhead that's driven by the rate of XID consumption. We add a concept of a "virtual transaction ID" so that active transactions can be uniquely identified even if they don't have a regular XID. This is a much lighter-weight concept: uniqueness of VXIDs is only guaranteed over the short term, and no on-disk record is made about them. Florian Pflug, with some editorialization by Tom.
* Implement function-local GUC parameter settings, as per recent discussion.Tom Lane2007-09-03
| | | | | | | There are still some loose ends: I didn't do anything about the SET FROM CURRENT idea yet, and it's not real clear whether we are happy with the interaction of SET LOCAL with function-local settings. The documentation is a bit spartan, too.
* Add a debug logging message when a resource manager rejects an attemptedTom Lane2007-08-28
| | | | restart point. Per suggestion from Simon Riggs.
* Tsearch2 functionality migrates to core. The bulk of this work is byTom Lane2007-08-21
| | | | | | | | Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.
* Fix oversight in async-commit patch: there were some places in heapam.cTom Lane2007-08-14
| | | | | | that still thought they could set HEAP_XMAX_COMMITTED immediately after seeing the other transaction commit. Make them use the same logic as tqual.c does to determine if the hint bit can be set yet.
* Fix two bugs induced in VACUUM FULL by async-commit patch.Tom Lane2007-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be settable, because clog.c's inexact LSN bookkeeping results in windows where a previously flushed transaction is considered unhintable because it shares an LSN slot with a later unflushed transaction. But repair_frag requires XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the current vacuum. Since not being able to set the bit is an uncommon corner case, the most practical way of dealing with it seems to be to abandon shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose XMIN_COMMITTED bit couldn't be set. Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag examines the tuple it might be possible to set the bit. We therefore must take buffer content lock when calling HeapTupleSatisfiesVacuum a second time, else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This latter bug is latent in existing releases, but I think it cannot actually occur without async commit, since the first HeapTupleSatisfiesVacuum call should always have set the bit. So I'm not going to back-patch it. In passing, reduce the existing "cannot shrink relation" messages from NOTICE to LOG level. The new message must be no higher than LOG if we don't want unpredictable regression test failures, and consistency seems like a good idea. Also arrange that only one such message is reported per VACUUM FULL; in typical scenarios you could get spammed with many such messages, which seems a bit useless.
* Switch over to using the src/timezone functions for formatting timestampsTom Lane2007-08-04
| | | | | | | | | | | | | | displayed in the postmaster log. This avoids Windows-specific problems with localized time zone names that are in the wrong encoding, and generally seems like a good idea to forestall other potential platform-dependent issues. To preserve the existing behavior that all backends will log in the same time zone, create a new GUC variable log_timezone that can only be changed on a system-wide basis, and reference log-related calculations to that zone instead of the TimeZone variable. This fixes the issue reported by Hiroshi Saito that timestamps printed by xlog.c startup could be improperly localized on Windows. We still need a simpler patch for that problem in the back branches, however.
* Support an optional asynchronous commit mode, in which we don't flush WALTom Lane2007-08-01
| | | | | | before reporting a transaction committed. Data consistency is still guaranteed (unlike setting fsync = off), but a crash may lose the effects of the last few transactions. Patch by Simon, some editorialization by Tom.
* Create a new dedicated Postgres process, "wal writer", which exists to writeTom Lane2007-07-24
| | | | | | | | | | | | | | and fsync WAL at convenient intervals. For the moment it just tries to offload this work from backends, but soon it will be responsible for guaranteeing a maximum delay before asynchronously-committed transactions will be flushed to disk. This is a portion of Simon Riggs' async-commit patch, committed to CVS separately because a background WAL writer seems like it might be a good idea independently of the async-commit feature. I rebased walwriter.c on bgwriter.c because it seemed like a more appropriate way of handling signals; while the startup/shutdown logic in postmaster.c is more like autovac because we want walwriter to quit before we start the shutdown checkpoint.
* Improve logging of checkpoints. Patch by Greg Smith, worked overTom Lane2007-06-30
| | | | by Heikki and a little bit by me.
* Implement "distributed" checkpoints in which the checkpoint I/O is spreadTom Lane2007-06-28
| | | | | | | | | | | | | over a fairly long period of time, rather than being spat out in a burst. This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled. Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.