path: root/src/backend/access
* Fix more instances of "the the" in comments. (Heikki Linnakangas, 2013-12-13)
  Plus one instance of "to to" in the docs.
* Add GUC to enable WAL-logging of hint bits, even with checksums disabled. (Heikki Linnakangas, 2013-12-13)
  WAL records of hint bit updates are useful to tools that want to examine
  which pages have been modified. In particular, this is required to make the
  pg_rewind tool safe (without checksums).

  This can also be used to test how much extra WAL-logging would occur if you
  enabled checksums, without actually enabling them (which you can't
  currently do without re-initdb'ing).

  Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
  further changes by me.
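  As a minimal sketch of the resulting rule, assuming the new boolean GUC is
  exposed to C code as wal_log_hints and that DataChecksumsEnabled() reports
  the checksum state (both names are assumptions here, not quoted from the
  commit): hint-bit changes need WAL whenever either feature is on.

    extern bool wal_log_hints;              /* assumed GUC variable */
    extern bool DataChecksumsEnabled(void); /* assumed checksum probe */

    static bool
    hint_bits_need_wal(void)
    {
        /* WAL-log hint-bit updates if checksums or the new GUC demand it */
        return DataChecksumsEnabled() || wal_log_hints;
    }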
* Fix ancient docs/comments thinko: XID comparison is mod 2^32, not 2^31. (Tom Lane, 2013-12-12)
  Pointed out by Gianni Ciolli.
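  A standalone illustration of mod-2^32 comparison (the idea behind
  TransactionIdPrecedes, minus its special cases for permanent XIDs): the
  signed 32-bit difference orders XIDs on a circle, giving each XID 2^31
  logical predecessors and 2^31 successors.

    #include <stdbool.h>
    #include <stdint.h>

    static bool
    xid_precedes(uint32_t id1, uint32_t id2)
    {
        /* wraparound-aware: interpret the difference as signed */
        int32_t diff = (int32_t) (id1 - id2);

        return diff < 0;
    }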
* Allow time-delayed standbys and recovery. (Simon Riggs, 2013-12-12)
  Set min_recovery_apply_delay to force a delay in recovery apply for commit
  and restore point WAL records. Other records are replayed immediately. The
  delay is measured between the WAL record timestamp and the local standby
  time.

  Robert Haas, Fabrízio de Royes Mello and Simon Riggs
  Detailed review by Mitsumasa Kondo
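  Schematically, the rule is just clock arithmetic; the names below are
  illustrative, not the backend's (a sketch, assuming microsecond
  timestamps):

    #include <stdint.h>

    /* Earliest local time at which a delayed WAL record may be applied. */
    static int64_t
    apply_not_before(int64_t record_time_us, int64_t apply_delay_ms)
    {
        return record_time_us + apply_delay_ms * 1000;
    }
    /* The replay loop would then wait until now() >= apply_not_before(...). */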
* Remove bogus executable permissions on xlog.c. (Tom Lane, 2013-12-11)
  Apparently fat-fingered in 1a3d104475ce01326fc00601ed66ac4d658e37e5.
  Noted by Peter Geoghegan.
* Under wal_level=logical, when saving old tuples, always save OID. (Robert Haas, 2013-12-11)
  There's no real point in not doing this. It doesn't cost anything in
  performance or space. So let's go wild.

  Andres Freund, with substantial editing as to style by me.
* Add a new reloption, user_catalog_table. (Robert Haas, 2013-12-10)
  When this reloption is set and wal_level=logical is configured, we'll
  record the CIDs stamped by inserts, updates, and deletes to the table just
  as we would for an actual catalog table. This will allow logical decoding
  to use historical MVCC snapshots to access such tables just as they access
  ordinary catalog tables.

  Replication solutions built around the logical decoding machinery will
  likely need to set this option for their configuration tables; it might
  also be needed by extensions which perform table access in their output
  functions.

  Andres Freund, reviewed by myself and others.
* Add new wal_level, logical, sufficient for logical decoding. (Robert Haas, 2013-12-10)
  When wal_level=logical, we'll log columns from the old tuple as configured
  by the REPLICA IDENTITY facility added in commit
  07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65. This makes it possible for a
  properly-configured logical replication solution to correctly follow table
  updates even if they change the chosen key columns, or, with REPLICA
  IDENTITY FULL, even if the table has no key at all. Note that updates
  which do not modify the replica identity column won't log anything extra,
  making the choice of a good key (i.e. one that will rarely be changed)
  important to performance when wal_level=logical is configured.

  Each insert, update, or delete to a catalog table will also log the CMIN
  and/or CMAX values stamped by the current transaction. This is necessary
  because logical decoding will require access to historical snapshots of
  the catalog in order to decode some data types, and the CMIN/CMAX values
  that we may need in order to judge row visibility may have been
  overwritten by the time we need them.

  Andres Freund, reviewed in various versions by myself, Heikki Linnakangas,
  KONDO Mitsumasa, and many others.
* Fix improper abort during update chain locking. (Alvaro Herrera, 2013-12-05)
  In 247c76a98909, I added some code to do fine-grained checking of
  MultiXact status of locking/updating transactions when traversing an
  update chain. There was a thinko in that patch which would have the
  traversal abort, that is, return HeapTupleUpdated, when the other
  transaction is a committed lock-only transaction. In this case we should
  ignore it and return success instead. Of course, in the case where there
  is a committed update, HeapTupleUpdated is the correct return value.

  A user-visible symptom of this bug is that in REPEATABLE READ and
  SERIALIZABLE transaction isolation modes, spurious serializability errors
  can occur:
    ERROR: could not serialize access due to concurrent update

  In order for this to happen, there needs to be a tuple that's
  key-share-locked and also updated, and the update must abort; a subsequent
  transaction trying to acquire a new lock on that tuple would abort with
  the above error. The reason is that the initial FOR KEY SHARE is seen as
  committed by the new locking transaction, which triggers this bug. (If the
  UPDATE commits, then the serialization error is correctly reported.)

  When running a query in READ COMMITTED mode, what happens is that the
  locking is aborted by the HeapTupleUpdated return value, then EvalPlanQual
  fetches the newest version of the tuple, which is then the only version
  that gets locked. (The second time the tuple is checked there is no
  misbehavior on the committed lock-only locker, because it's not checked by
  the code that traverses update chains; so no bug.) Only the newest version
  of the tuple is locked, not older ones, but this is harmless.

  The isolation test added by this commit illustrates the desired behavior,
  including the proper serialization errors that get thrown.

  Backpatch to 9.3.
* Don't include unused space in LOG_NEWPAGE records. (Heikki Linnakangas, 2013-12-04)
  This is the same trick we use when taking a full page image of a buffer
  passed to XLogInsert.
* Fix full-page writes of internal GIN pages. (Heikki Linnakangas, 2013-12-03)
  Insertion to a non-leaf GIN page didn't make a full-page image of the
  page, which is wrong. The code used to do it correctly, but was changed
  (commit 853d1c3103fa961ae6219f0281885b345593d101) because the redo-routine
  didn't track incomplete splits correctly when the page was restored from a
  full page image. Of course, that was not the right way to fix it; the
  redo-routine should've been fixed instead. The redo-routine was
  surreptitiously fixed in 2010 (commit
  4016bdef8aded77b4903c457050622a5a1815c16), so all we need to do now is
  revert the code that creates the record to its original form. This doesn't
  change the format of the WAL record.

  Backpatch to all supported versions.
* Report exit code from external recovery commands properly. (Peter Eisentraut, 2013-12-02)
  When an external recovery command such as restore_command or
  archive_cleanup_command fails, report the exit code properly,
  distinguishing signals and normal exits, using the existing
  wait_result_to_str() facility, instead of just reporting the return value
  from system().

  Reviewed-by: Peter Geoghegan <pg@heroku.com>
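  The distinction being drawn, as a standalone example of decoding
  system()'s raw status (plain POSIX, not the backend's wait_result_to_str
  itself):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    int
    main(void)
    {
        int rc = system("exit 3");

        if (WIFEXITED(rc))
            printf("command exited with code %d\n", WEXITSTATUS(rc));
        else if (WIFSIGNALED(rc))
            printf("command was terminated by signal %d\n", WTERMSIG(rc));
        return 0;
    }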
* Fix a couple of bugs in MultiXactId freezing. (Alvaro Herrera, 2013-11-29)
  Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
  into a multixact to check the members against cutoff_xid. This means that
  a very old Xid could survive hidden within a multi, possibly outliving its
  CLOG storage. In the distant future, this would cause clog lookup
  failures:
    ERROR: could not access status of transaction 3883960912
    DETAIL: Could not open file "pg_clog/0E78": No such file or directory.

  This was mostly problematic when the updating transaction aborted, since
  in that case the row wouldn't get pruned away earlier in vacuum and the
  multixact could possibly survive for a long time. In many cases, data that
  is inaccessible for this reason can be brought back heuristically.

  As a second bug, heap_freeze_tuple() didn't properly handle multixacts
  that need to be frozen according to cutoff_multi, but whose updater xid is
  still alive. Instead of preserving the update Xid, it just set Xmax
  invalid, which leads to both old and new tuple versions becoming visible.
  This is pretty rare in practice, but a real threat nonetheless. Existing
  corrupted rows, unfortunately, cannot be repaired in an automated fashion.

  Existing physical replicas might have already incorrectly frozen tuples
  because of different behavior than in master, which might only become
  apparent in the future once pg_multixact/ is truncated; it is recommended
  that all clones be rebuilt after upgrading.

  Following code analysis prompted by a bug report by J Smith in message
  CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com and
  privately by F-Secure.

  Backpatch to 9.3, where freezing of MultiXactIds was introduced.

  Analysis and patch by Andres Freund, with some tweaks by Álvaro.
* Don't TransactionIdDidAbort in HeapTupleGetUpdateXid. (Alvaro Herrera, 2013-11-29)
  It is dangerous to do so, because some code expects to be able to see the
  true Xmax even if it is aborted (particularly while traversing HOT
  chains). So don't do it, and instead rely on the callers to verify for
  abortedness, if necessary.

  Several race conditions and bugs fixed in the process. One isolation test
  changes the expected output due to these.

  This also reverts commit c235a6a589b, which is no longer necessary.

  Backpatch to 9.3, where this function was introduced.

  Andres Freund
* Truncate pg_multixact/'s contents during crash recovery. (Alvaro Herrera, 2013-11-29)
  Commit 9dc842f08 of 8.2 era prevented MultiXact truncation during crash
  recovery, because there was no guarantee that enough state had been set
  up, and because it wasn't deemed to be a good idea to remove data during
  crash recovery anyway. Since then, due to Hot-Standby, streaming
  replication and PITR, the amount of time a cluster can spend doing crash
  recovery has increased significantly, to the point that a cluster may even
  never come out of it. This has made not truncating the contents of
  pg_multixact/ indefensible.

  To fix, take care to set up enough state for multixact truncation before
  crash recovery starts (easy since checkpoints contain the required
  information), and move the current end-of-recovery actions to a new
  TrimMultiXact() function, analogous to TrimCLOG().

  At some later point, this should probably be done similarly to the way
  clog.c is doing it, which is to just WAL-log truncations, but we can't do
  that for the back branches.

  Back-patch to 9.0. 8.4 also has the problem, but since there's no hot
  standby there, it's much less pressing. In 9.2 and earlier, this patch is
  simpler than in newer branches, because multixact access during recovery
  isn't required. Add appropriate checks to make sure that's not happening.

  Andres Freund
* Fix full-table-vacuum request mechanism for MultiXactIds. (Alvaro Herrera, 2013-11-29)
  While autovacuum dutifully launched anti-multixact-wraparound vacuums when
  the multixact "age" was reached, the vacuum code was not aware that it
  needed to make them full-table vacuums. As the resulting partial-table
  vacuums aren't capable of actually increasing relminmxid, autovacuum
  continued to launch anti-wraparound vacuums that didn't have the intended
  effect, until the age of relfrozenxid caused the vacuum to finally be a
  full-table one via vacuum_freeze_table_age.

  To fix, introduce logic for multixacts similar to that for plain
  TransactionIds, using the same GUCs.

  Backpatch to 9.3, where permanent MultiXactIds were introduced.

  Andres Freund, some cleanup by Álvaro
* Replace hardcoded 200000000 with autovacuum_freeze_max_age. (Alvaro Herrera, 2013-11-29)
  Parts of the code used autovacuum_freeze_max_age to determine whether
  anti-multixact-wraparound vacuums are necessary, while others used a
  hardcoded 200000000 value. This leads to problems when
  autovacuum_freeze_max_age is set to a non-default value. Use
  autovacuum_freeze_max_age everywhere.

  Backpatch to 9.3, where vacuuming of multixacts was introduced.

  Andres Freund
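  The shape of the fix, schematically; the age variable and the comparison
  below are stand-ins for the real call sites, not quoted code:

    #include <stdbool.h>

    extern int autovacuum_freeze_max_age;   /* the GUC */

    static bool
    needs_wraparound_vacuum(int mxid_age)
    {
        /* before: mxid_age > 200000000, ignoring the GUC entirely */
        return mxid_age > autovacuum_freeze_max_age;
    }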
* Fix assorted race conditions in the new timeout infrastructure. (Tom Lane, 2013-11-29)
  Prevent handle_sig_alarm from losing control partway through due to a
  query cancel (either an asynchronous SIGINT, or a cancel triggered by one
  of the timeout handler functions). That would at least result in failure
  to schedule any required future interrupt, and might result in actual
  corruption of timeout.c's data structures, if the interrupt happened while
  we were updating those.

  We could still lose control if an asynchronous SIGINT arrives just as the
  function is entered. This wouldn't break any data structures, but it would
  have the same effect as if the SIGALRM interrupt had been silently lost:
  we'd not fire any currently-due handlers, nor schedule any new interrupt.
  To forestall that scenario, forcibly reschedule any pending timer
  interrupt during AbortTransaction and AbortSubTransaction. We can avoid
  any extra kernel call in most cases by not doing that until we've allowed
  LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events.

  Another hazard is that some platforms (at least Linux and *BSD) block a
  signal before calling its handler and then unblock it on return. When we
  longjmp out of the handler, the unblock doesn't happen, and the signal is
  left blocked indefinitely. Again, we can fix that by forcibly unblocking
  signals during AbortTransaction and AbortSubTransaction.

  These latter two problems do not manifest when the longjmp reaches
  postgres.c, because the error recovery code there kills all pending
  timeout events anyway, and it uses sigsetjmp(..., 1) so that the
  appropriate signal mask is restored. So errors thrown outside any
  transaction should be OK already, and cleaning up in AbortTransaction and
  AbortSubTransaction should be enough to fix these issues. (We're assuming
  that any code that catches a query cancel error and doesn't re-throw it
  will do at least a subtransaction abort to clean up; but that was pretty
  much required already by other subsystems.)

  Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when
  disabling that event: if a lock timeout interrupt happened after the lock
  was granted, the ensuing query cancel is still going to happen at the next
  CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a
  user cancel.

  Per reports from Dan Wood. Back-patch to 9.3 where the new timeout
  handling infrastructure was introduced. We may at some point decide to
  back-patch the signal unblocking changes further, but I'll desist from
  that until we hear actual field complaints about it.
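  The "forcibly unblock" part of the fix boils down to plain POSIX signal
  masking; a sketch of the idea, not the committed code:

    #include <signal.h>

    /* After longjmp'ing out of a handler, SIGALRM may still be blocked. */
    static void
    unblock_sigalrm(void)
    {
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, SIGALRM);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
    }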
* Refine our definition of what constitutes a system relation. (Robert Haas, 2013-11-28)
  Although user-defined relations can't be directly created in pg_catalog,
  it's possible for them to end up there, because you can create them in
  some other schema and then use ALTER TABLE .. SET SCHEMA to move them
  there. Previously, such relations couldn't afterwards be manipulated,
  because IsSystemRelation()/IsSystemClass() rejected all attempts to modify
  objects in the pg_catalog schema, regardless of their origin. With this
  patch, they now reject only those objects in pg_catalog which were created
  at initdb-time, allowing most operations on user-created tables in
  pg_catalog to proceed normally.

  This patch also adds new functions IsCatalogRelation() and
  IsCatalogClass(), which are similar to IsSystemRelation() and
  IsSystemClass() but with a slightly narrower definition: only TOAST tables
  of system catalogs are included, rather than *all* TOAST tables. This is
  currently used only for making decisions about when invalidation messages
  need to be sent, but upcoming logical decoding patches will find other
  uses for this information.

  Andres Freund, with some modifications by me.
* Another gin_desc fix. (Heikki Linnakangas, 2013-11-28)
  The number of items inserted was incorrectly printed as if it were a
  boolean.
* Fix gin_desc routine to match the WAL format. (Heikki Linnakangas, 2013-11-28)
  In the GIN incomplete-splits patch, I used BlockIdDatas to store the block
  number of left and right children, when inserting a downlink after a split
  into an internal posting list page. But gin_desc thought they were stored
  as BlockNumbers.
* Unbreak buildfarm. (Alvaro Herrera, 2013-11-28)
  I removed an intermediate commit before pushing and forgot to test the
  resulting tree :-(
* Use a more granular approach to follow update chains. (Alvaro Herrera, 2013-11-28)
  Instead of simply checking the KEYS_UPDATED bit, we need to check whether
  each lock held on the future version of the tuple conflicts with the lock
  we're trying to acquire.

  Per bug report #8434 by Tomonari Katsumata
* Compare Xmin to previous Xmax when locking an update chain. (Alvaro Herrera, 2013-11-28)
  Not doing so causes us to traverse an update chain that has been broken by
  concurrent page pruning. All other code that traverses update chains uses
  this check as one of the cases in which to stop iterating, so replicate it
  here too. Failure to do so leads to erroneous CLOG, subtrans or multixact
  lookups.

  Per discussion following the bug report by J Smith in
  CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com as
  diagnosed by Andres Freund.
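  The added check is essentially one comparison; schematic form with
  hypothetical names, not the committed code:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;     /* stand-in for the backend typedef */

    /* True if the next tuple version really continues the chain followed. */
    static bool
    update_chain_intact(TransactionId next_xmin, TransactionId followed_xmax)
    {
        /* after pruning breaks a chain, these no longer match: stop here */
        return next_xmin == followed_xmax;
    }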
* Don't try to set InvalidXid as page pruning hint. (Alvaro Herrera, 2013-11-28)
  If a transaction updates/deletes a tuple just before aborting, and a
  concurrent transaction tries to prune the page concurrently, the pruner
  may see HeapTupleSatisfiesVacuum return HEAPTUPLE_DELETE_IN_PROGRESS, but
  a later call to HeapTupleGetUpdateXid() return InvalidXid. This would
  cause an assertion failure in development builds, but would be otherwise
  Mostly Harmless.

  Fix by checking whether the updater Xid is valid before trying to apply it
  as the page prune point.

  Reported by Andres in 20131124000203.GA4403@alap2.anarazel.de
* Cope with heap_fetch failure while locking an update chain. (Alvaro Herrera, 2013-11-28)
  The reason for the fetch failure is that the tuple was removed because it
  was dead; so the failure is innocuous and can be ignored. Moreover,
  there's no need for further work and we can return success to the caller
  immediately. EvalPlanQualFetch is doing something very similar to this
  already.

  Report and test case from Andres Freund in
  20131124000203.GA4403@alap2.anarazel.de
* Get rid of the post-recovery cleanup step of GIN page splits. (Heikki Linnakangas, 2013-11-27)
  Replace it with an approach similar to what GiST uses: when a page is
  split, the left sibling is marked with a flag indicating that the parent
  hasn't been updated yet. When the parent is updated, the flag is cleared.
  If an insertion steps on a page with the flag set, it will finish the
  split before proceeding with the insertion.

  The post-recovery cleanup mechanism was never totally reliable, as
  insertion to the parent could fail e.g. because of running out of memory
  or disk space, leaving the tree in an inconsistent state.

  This also divides the responsibility of WAL-logging more clearly between
  the generic ginbtree.c code and the parts specific to entry and posting
  trees. There is now a common WAL record format for insertions and
  deletions, which is written by ginbtree.c, followed by tree-specific
  payload, which is returned by the placetopage and split callbacks.
* More GIN refactoring. (Heikki Linnakangas, 2013-11-27)
  Separate the insertion payload from the more static portions of GinBtree.
  GinBtree now only contains information related to searching the tree, and
  the information on what to insert is passed separately.

  Add the root block number to GinBtree, instead of passing it around to all
  the functions as an argument.

  Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is
  responsible for finding the parent and inserting the downlink to it.
* Change SET LOCAL/CONSTRAINTS/TRANSACTION and ABORT behavior. (Bruce Momjian, 2013-11-25)
  Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a transaction
  block from error (post-9.3) to warning. (In 9.3 and earlier there was no
  message at all.) Also change ABORT outside of a transaction block from
  notice to warning.
* Fix Hot-Standby initialization of clog and subtrans. (Heikki Linnakangas, 2013-11-22)
  These bugs can cause data loss on standbys started with hot_standby=on at
  the moment they start to accept read-only queries, by marking committed
  transactions as uncommitted. The likelihood of such corruptions is small
  unless the primary has a high transaction rate.

  5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup logic
  by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state
  was reached, missing the fact that both clog and subtrans are written to
  before that. This only failed to fail in common cases because the usage of
  ExtendCLOG in procarray.c was superfluous, since clog extensions are
  actually WAL-logged.

  In f44eedc3f0f347a856eea8590730769125964597 I then tried to fix the
  missing extensions of pg_subtrans due to the former commit's changes -
  which are not WAL-logged - by performing the extensions when switching to
  a state > STANDBY_INITIALIZED and not performing xid assignments before
  that - again missing the fact that ExtendCLOG is unnecessary - but screwed
  up twice: once because latestObservedXid wasn't updated anymore in that
  state due to the earlier commit, and once by having an off-by-one error in
  the loop performing extensions. This means that whenever a
  CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed
  between the start of the checkpoint recovery started from and the first
  xl_running_xact record, old transactions' commit bits in pg_clog could be
  overwritten if they started and committed in that window.

  Fix this mess by not performing ExtendCLOG() in HS at all anymore, since
  it's unneeded and evidently dangerous, and by performing subtrans
  extensions even before reaching STANDBY_SNAPSHOT_PENDING.

  Analysis and patch by Andres Freund. Reported by Christophe Pettus.
  Backpatch down to 9.0, like the previous commit that caused this.
* Avoid acquiring spinlock when checking if recovery has finished, for speed. (Heikki Linnakangas, 2013-11-22)
  RecoveryIsInProgress() can be called very frequently. During normal
  operation, it just checks a backend-local variable and returns quickly,
  but during hot standby, it checks a spinlock-protected shared variable.
  Those spinlock acquisitions can become a point of contention on a busy hot
  standby system.

  Replace the spinlock acquisition with a memory barrier.

  Per discussion with Andres Freund, Ants Aasma and Merlin Moncure.
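  A sketch of the pattern (not the committed code; names are illustrative):
  once the local cache has seen the shared flag go false it can never flip
  back, so only a one-time barrier is needed rather than a spinlock on every
  call.

    #include <stdbool.h>

    extern void memory_barrier(void);   /* stand-in for pg_memory_barrier() */

    static bool local_recovery_in_progress = true;

    static bool
    recovery_in_progress(volatile bool *shared_flag)
    {
        if (!local_recovery_in_progress)
            return false;               /* fast path: no shared access */

        local_recovery_in_progress = *shared_flag;
        if (!local_recovery_in_progress)
            memory_barrier();           /* keep later reads after this load */
        return local_recovery_in_progress;
    }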
* Support multi-argument UNNEST(), and TABLE() syntax for multiple functions. (Tom Lane, 2013-11-21)
  This patch adds the ability to write TABLE( function1(), function2(), ...)
  as a single FROM-clause entry. The result is the concatenation of the
  first row from each function, followed by the second row from each
  function, etc., with NULLs inserted if any function produces fewer rows
  than others. This is believed to be a much more useful behavior than what
  Postgres currently does with multiple SRFs in a SELECT list.

  This syntax also provides a reasonable way to combine use of column
  definition lists with WITH ORDINALITY: put the column definition list
  inside TABLE(), where it's clear that it doesn't control the ordinality
  column as well.

  Also implement SQL-compliant multiple-argument UNNEST(), by turning
  UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).

  The SQL standard specifies TABLE() with only a single function, not
  multiple functions, and it seems to require an implicit UNNEST() which is
  not what this patch does. There may be something wrong with that reading
  of the spec, though, because if it's right then the spec's TABLE() is just
  a pointless alternative spelling of UNNEST(). After further review of
  that, we might choose to adopt a different syntax for what this patch
  does, but in any case this functionality seems clearly worthwhile.

  Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
  significantly revised by me
* More GIN refactoring. (Heikki Linnakangas, 2013-11-20)
  Split off the portion of ginInsertValue that inserts the tuple to the
  current level into a separate function, ginPlaceToPage. ginInsertValue's
  charter is now to recurse up the tree to insert the downlink, when a page
  split is required.

  This is in preparation for a patch to change the way incomplete splits are
  handled, which will need to do these operations separately. And IMHO it
  makes the code more readable anyway.
* Refactor the internal GIN B-tree interface for forming a downlink. (Heikki Linnakangas, 2013-11-20)
  This creates a new gin-btree callback function for creating a downlink for
  a page. Previously, ginxlog.c duplicated the logic used during normal
  operation.
* Further GIN refactoring. (Heikki Linnakangas, 2013-11-20)
  Merge some functions that were always called together. Makes the code a
  little bit more readable.
* Fix bug in GIN posting tree root creation. (Heikki Linnakangas, 2013-11-13)
  The root page is filled with as many items as fit, and the rest are
  inserted using normal insertions. However, I fumbled the variable names,
  and the code actually memcpy'd all the items on the page, overflowing the
  buffer. While at it, rename the variable to make the distinction clearer.

  Reported by Teodor Sigaev. This bug was introduced by my recent
  refactorings, so no backpatching required.
* Fix race condition in GIN posting tree page deletion. (Heikki Linnakangas, 2013-11-08)
  If a page is deleted, and reused for something else, just as a search is
  following a rightlink to it from its left sibling, the search would
  continue scanning whatever the new contents of the page are. That could
  lead to incorrect query results, or even something more curious if the
  page is reused for a different kind of page.

  To fix, modify the search algorithm to lock the next page before releasing
  the previous one, and refrain from deleting pages from the leftmost branch
  of the tree.

  Add a new Concurrency section to the README, explaining why this works.
  There is a lot more one could say about concurrency in GIN, but that's for
  another patch.

  Backpatch to all supported versions.
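  The locking rule adopted is classic lock coupling ("crabbing"); here is a
  generic, self-contained rendering of the idea using pthreads rather than
  GIN's buffer API:

    #include <pthread.h>
    #include <stddef.h>

    typedef struct Node
    {
        pthread_mutex_t lock;
        struct Node    *right;              /* the "rightlink" */
    } Node;

    /* Step to the right sibling without ever being lockless in between. */
    static Node *
    step_right(Node *cur)                   /* caller holds cur->lock */
    {
        Node *next = cur->right;

        if (next)
            pthread_mutex_lock(&next->lock);    /* acquire next first */
        pthread_mutex_unlock(&cur->lock);       /* only then release cur */
        return next;
    }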
* Fix setting of right bound at GIN page split. (Heikki Linnakangas, 2013-11-07)
  Broken by my refactoring.
* Fix missing argument and function prototypes. (Heikki Linnakangas, 2013-11-06)
  Not sure how I missed these in the previous commit.
* Misc GIN refactoring. (Heikki Linnakangas, 2013-11-06)
  Merge the isEnoughSpace and placeToPage functions in the b-tree interface
  into one function that tries to put a tuple on the page, and returns false
  if it doesn't fit.

  Move the createPostingTree function to gindatapage.c, and change its
  contract so that it can be passed more items than fit on the root page.
  It's in a better position than the callers to know how many items fit.

  Move ginMergeItemPointers out of gindatapage.c, into a separate file.

  These changes make no difference now, but reduce the footprint of
  Alexander Korotkov's upcoming patch to pack item pointers more tightly.
* Prevent memory leaks from accumulating across printtup() calls. (Tom Lane, 2013-11-03)
  Historically, printtup() has assumed that it could prevent memory leakage
  by pfree'ing the string result of each output function and manually
  managing detoasting of toasted values. This amounts to assuming that
  datatype output functions never leak any memory internally; an assumption
  we've already decided to be bogus elsewhere, for example in COPY OUT.
  range_out in particular is known to leak multiple kilobytes per call, as
  noted in bug #8573 from Godfried Vanluffelen. While we could go in and fix
  that leak, it wouldn't be very notationally convenient, and in any case
  there have been and undoubtedly will again be other leaks in other output
  functions. So what seems like the best solution is to run the output
  functions in a temporary memory context that can be reset after each row,
  as we're doing in COPY OUT. Some quick experimentation suggests this is
  actually a tad faster than the retail pfree's anyway.

  This patch fixes all the variants of printtup, except for debugtup() which
  is used in standalone mode. It doesn't seem worth worrying about
  query-lifespan leaks in standalone mode, and fixing that case would be a
  bit tedious since debugtup() doesn't currently have any startup or
  shutdown functions.

  While at it, remove manual detoast management from several other
  output-function call sites that had copied it from printtup(). This
  doesn't make a lot of difference right now, but in view of recent
  discussions about supporting "non-flattened" Datums, we're going to want
  that code gone eventually anyway.

  Back-patch to 9.2 where range_out was introduced. We might eventually
  decide to back-patch this further, but in the absence of known major leaks
  in older output functions, I'll refrain for now.
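  The pattern adopted, as a sketch against PostgreSQL's memory-context API
  of that era (the context name and the loop helper are assumptions, not the
  literal printtup code):

    static void
    emit_all_rows(void)
    {
        MemoryContext rowcxt;

        rowcxt = AllocSetContextCreate(CurrentMemoryContext,
                                       "row output context",
                                       ALLOCSET_DEFAULT_MINSIZE,
                                       ALLOCSET_DEFAULT_INITSIZE,
                                       ALLOCSET_DEFAULT_MAXSIZE);

        while (have_more_rows())            /* hypothetical iteration helper */
        {
            MemoryContext oldcxt = MemoryContextSwitchTo(rowcxt);

            /* ... call the datatype output functions for one row here;
             * anything they leak is allocated in rowcxt ... */

            MemoryContextSwitchTo(oldcxt);
            MemoryContextReset(rowcxt);     /* frees whatever the row leaked */
        }
    }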
* Retry after buffer locking failure during SPGiST index creation. (Tom Lane, 2013-11-02)
  The original coding thought this case was impossible, but it can happen if
  the bgwriter or checkpointer processes decide to write out an index page
  while creation is still proceeding, leading to a bogus "unexpected
  spgdoinsert() failure" error. Problem reported by Jonathan S. Katz.

  Teodor Sigaev
* Use appendStringInfoString instead of appendStringInfo where possible. (Robert Haas, 2013-10-31)
  This shaves a few cycles, and generally seems like good programming
  practice.

  David Rowley
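  The substitution, schematically (StringInfo API from lib/stringinfo.h):
  when the format is just "%s", the no-format variant skips the printf
  machinery entirely.

    appendStringInfo(buf, "%s", label);     /* before: goes through a format */
    appendStringInfoString(buf, label);     /* after: plain string append */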
* Prevent using strncpy with src == dest in TupleDescInitEntry. (Tom Lane, 2013-10-28)
  The C and POSIX standards state that strncpy's behavior is undefined when
  source and destination areas overlap. While it remains dubious whether any
  implementations really misbehave when the pointers are exactly equal, some
  platforms are now starting to force the issue by complaining when an
  undefined call occurs. (In particular OS X 10.9 has been seen to dump core
  here, though the exact set of circumstances needed to trigger that remain
  elusive. Similar behavior can be expected to be optional on Linux and
  other platforms in the near future.) So tweak the code to explicitly do
  nothing when nothing need be done.

  Back-patch to all active branches. In HEAD, this also lets us get rid of
  an exception in valgrind.supp.

  Per discussion of a report from Matthias Schmitt.
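  The defensive pattern, standalone: skip the call when source and
  destination are the same pointer, since an overlapping strncpy is
  undefined.

    #include <string.h>

    static void
    copy_padded(char *dst, const char *src, size_t n)
    {
        if (dst != src)         /* self-copy would be undefined behavior */
            strncpy(dst, src, n);
    }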
* Fix typos in comments. (Heikki Linnakangas, 2013-10-24)
* Consistently use unsigned arithmetic for alignment calculations. (Noah Misch, 2013-10-20)
  This avoids an assumption about the signed number representation. It is
  anticipated to have no functional changes on supported configurations;
  many two's complement assumptions remain elsewhere.

  Per a suggestion from Andres Freund.
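  The point, as a standalone example: doing the round-up in unsigned
  arithmetic keeps the bit masking well-defined regardless of how signed
  numbers are represented.

    #include <stdint.h>

    /* Round len up to a multiple of alignval (a power of two). */
    static uintptr_t
    align_up(uintptr_t len, uintptr_t alignval)
    {
        return (len + alignval - 1) & ~(alignval - 1);
    }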
* TYPEALIGN doesn't work on int64 on 32-bit platforms. (Heikki Linnakangas, 2013-10-08)
  The TYPEALIGN macro, and the related ones like MAXALIGN, don't work with
  values larger than intptr_t, because TYPEALIGN casts the argument to
  intptr_t to do the arithmetic. That's not a problem when dealing with
  pointers or lengths or offsets related to pointers, but the XLogInsert
  scaling patch added a call to MAXALIGN with an XLogRecPtr argument.

  To fix, add wider variants of the macros, called TYPEALIGN64 and
  MAXALIGN64, which are just like the existing variants but work with uint64
  instead of intptr_t.

  Report and patch by David Rowley, analysis by Andres Freund.
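  A standalone sketch of the width problem and the fix (macro bodies
  simplified from the pattern the message describes, not copied from the
  tree; TYPEALIGN32 is a made-up name to avoid clashing with the real
  macro):

    #include <stdint.h>

    #define TYPEALIGN32(a, len) \
        (((intptr_t) (len) + ((a) - 1)) & ~((intptr_t) ((a) - 1)))
    #define TYPEALIGN64(a, len) \
        (((uint64_t) (len) + ((a) - 1)) & ~((uint64_t) ((a) - 1)))

    /*
     * On a 32-bit build (4-byte intptr_t), TYPEALIGN32(8, UINT64_C(1) << 32)
     * truncates the value to 0, while TYPEALIGN64 preserves all 64 bits.
     */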
* Fix bugs in SSI tuple locking. (Heikki Linnakangas, 2013-10-08)
  1. In heap_hot_search_buffer(), the PredicateLockTuple() call is passed
  the wrong offset number. heapTuple->t_self is set to the tid of the first
  tuple in the chain that's visited, not the one actually being read.

  2. CheckForSerializableConflictIn() uses the tuple's t_ctid field instead
  of t_self to check for existing predicate locks on the tuple. If the tuple
  was updated, but the updater rolled back, t_ctid points to the aborted
  dead tuple.

  Reported by Hannu Krosing. Backpatch to 9.1.
* Minor GIN code refactoring. (Heikki Linnakangas, 2013-10-03)
  It makes for cleaner code to have separate Get/Add functions for
  PostingItems and ItemPointers. A few callsites that have to deal with both
  types need to be duplicated because of this, but all the callers have to
  know which one they're dealing with anyway. Overall, this reduces the
  amount of casting required.

  Extracted from Alexander Korotkov's larger patch to change the data page
  format.
* Fix pgindent comment breakage. (Alvaro Herrera, 2013-09-24)