postgresql - postgresql mirror

	Commit message	Author	Age
*	Update copyrights for 2013	Bruce Momjian	2013-01-01
	Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
*	Fix multiple problems in WAL replay.	Tom Lane	2012-11-12
	Most of the replay functions for WAL record types that modify more than one page failed to lock those pages in a way that prevents concurrent queries from seeing inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods.
The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another.
Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page.
To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time.
This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specific functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code.
In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image, because it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that.
Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten.
Back-patch to 9.0, where hot standby was introduced.
Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
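To make the ordering argument concrete, here is a toy model in C, not PostgreSQL code: a record lists the pages it touches in the order the redo routine must update them, and each page may or may not carry a full-page image. `ToyRecord`, `Step`, `replay_old` and `replay_new` are hypothetical names. The example encodes the bug #7648 scenario under the simplifying assumption that the new right sibling (block 11) must be written before the original page (block 10), and that only the original page has a full-page image.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 4

/* Toy WAL record: pages listed in the order they must be updated. */
typedef struct ToyRecord
{
    int  nblocks;
    int  blkno[MAX_BLOCKS];
    bool has_fpi[MAX_BLOCKS];   /* true if a full-page image is attached */
} ToyRecord;

/* One physical page update performed during replay. */
typedef struct Step
{
    int  blkno;
    char how;                   /* 'F' = restored from image, 'R' = redo applied */
} Step;

/* Old scheme: restore every full-page image first (as RestoreBkpBlocks did),
 * then let the redo routine handle the remaining pages. */
static int
replay_old(const ToyRecord *rec, Step *out)
{
    int n = 0;

    assert(rec->nblocks <= MAX_BLOCKS);
    for (int i = 0; i < rec->nblocks; i++)
        if (rec->has_fpi[i])
            out[n++] = (Step) {rec->blkno[i], 'F'};
    for (int i = 0; i < rec->nblocks; i++)
        if (!rec->has_fpi[i])
            out[n++] = (Step) {rec->blkno[i], 'R'};
    return n;
}

/* New scheme: walk the pages in the required order and, at each one, either
 * restore that single image (the RestoreBackupBlock idea) or apply the
 * logged change, so images never jump the queue. */
static int
replay_new(const ToyRecord *rec, Step *out)
{
    int n = 0;

    assert(rec->nblocks <= MAX_BLOCKS);
    for (int i = 0; i < rec->nblocks; i++)
        out[n++] = (Step) {rec->blkno[i], rec->has_fpi[i] ? 'F' : 'R'};
    return n;
}
```

With the record `{2, {11, 10}, {false, true}}`, `replay_old` updates block 10 before block 11, while `replay_new` keeps the required order 11, 10. Holding a lock on one page while updating another is a separate concern that this model does not capture.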
*	Fix GiST buffering build bug, which caused "failed to re-find parent" errors.	Heikki Linnakangas	2012-08-16
	We use a hash table to track the parents of inner pages, but when inserting to a leaf page, the caller of gistbufferinginserttuples() must pass a correct block number of the leaf's parent page. Before gistProcessItup() descends to a child page, it checks if the downlink needs to be adjusted to accommodate the new tuple, and updates the downlink if necessary. However, updating the downlink might require splitting the page, which might move the downlink to a page to the right. gistProcessItup() doesn't realize that, so when it descends to the leaf page, it might pass an out-of-date parent block number as a result. Fix that by returning the block a tuple was inserted to from gistbufferinginserttuples(). This fixes the bug reported by Zdeněk Jílovec.
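The fix amounts to letting the insertion routine report where a downlink actually ended up, instead of having the caller assume it is still on the page it started from. A toy C sketch under simplifying assumptions: pages hold at most four downlinks, a full page is split by moving its upper half to a new block, and the new downlink always goes to the right half. `ToyIndex` and `toy_place_downlink` are hypothetical and do not reflect the real `gistbufferinginserttuples()` signature.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_CAP  4
#define MAX_PAGES 8

typedef struct ToyPage
{
    int      nitems;
    uint32_t items[PAGE_CAP];   /* child block numbers (downlinks) */
} ToyPage;

typedef struct ToyIndex
{
    ToyPage  pages[MAX_PAGES];
    uint32_t npages;
} ToyIndex;

/* Add a downlink for `child` to page `parent`, splitting if it is full.
 * Returns the block that now holds the downlink; the caller must use this,
 * not `parent`, as the child's parent from here on. */
static uint32_t
toy_place_downlink(ToyIndex *idx, uint32_t parent, uint32_t child)
{
    ToyPage *p = &idx->pages[parent];

    if (p->nitems < PAGE_CAP)
    {
        p->items[p->nitems++] = child;
        return parent;
    }

    /* Page is full: move the upper half to a new right sibling. */
    assert(idx->npages < MAX_PAGES);
    uint32_t right = idx->npages++;
    ToyPage *r = &idx->pages[right];
    int      keep = PAGE_CAP / 2;

    r->nitems = 0;
    for (int i = keep; i < PAGE_CAP; i++)
        r->items[r->nitems++] = p->items[i];
    p->nitems = keep;

    r->items[r->nitems++] = child;  /* toy rule: the new downlink goes right */
    return right;
}
```

The bug described above corresponds to a caller that keeps using block 0 as the leaf's parent after a call that returned block 1.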
*	Run pgindent on 9.2 source tree in preparation for first 9.3	Bruce Momjian	2012-06-10
	commit-fest.
*	Change the way parent pages are tracked during buffered GiST build.	Heikki Linnakangas	2012-05-30
	We used to mimic the way a stack is constructed when descending the tree during normal GiST inserts, but that was quite complicated during a buffered build. It was also wrong: in GiST, the left-to-right relationships on different levels might not match each other, so that when you know the parent of a child page, you won't necessarily find the parent of the page to the right of the child page by following the rightlinks at the parent level. This sometimes led to "could not re-find parent" errors while building a GiST index. We now use a simple hash table to track the parent of every internal page. Whenever a page is split, and downlinks are moved from one page to another, we update the hash table accordingly. This is also better for performance than the old method, as we never need to move right to re-find the parent page, which could take a significant amount of time for buffers that were created much earlier in the index build.
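The hash-table approach can be sketched in C as follows. This is a self-contained toy, not the PostgreSQL code (which uses the backend's own hash table facilities): a fixed-size, linear-probing map from child block number to parent block number. All names (`ParentMap`, `parent_map_set`, `parent_map_get`, `parent_map_move_downlinks`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNum;

#define INVALID_BLOCK ((BlockNum) 0xFFFFFFFF)
#define MAP_SIZE 64             /* power of two; this toy table never grows */

typedef struct ParentEntry
{
    bool     used;
    BlockNum child;
    BlockNum parent;
} ParentEntry;

typedef struct ParentMap
{
    ParentEntry slots[MAP_SIZE];
} ParentMap;

static void
parent_map_init(ParentMap *map)
{
    assert((MAP_SIZE & (MAP_SIZE - 1)) == 0);   /* masking needs a power of two */
    memset(map, 0, sizeof(*map));
}

/* Linear-probing lookup; optionally claims an empty slot for `child`. */
static ParentEntry *
parent_map_find(ParentMap *map, BlockNum child, bool insert)
{
    uint32_t h = (child * 2654435761u) & (MAP_SIZE - 1);

    for (int i = 0; i < MAP_SIZE; i++)
    {
        ParentEntry *e = &map->slots[(h + i) & (MAP_SIZE - 1)];

        if (e->used && e->child == child)
            return e;
        if (!e->used)
        {
            if (!insert)
                return NULL;
            e->used = true;
            e->child = child;
            e->parent = INVALID_BLOCK;
            return e;
        }
    }
    return NULL;                /* table full */
}

/* Remember that `parent` holds the downlink to `child`. */
static bool
parent_map_set(ParentMap *map, BlockNum child, BlockNum parent)
{
    ParentEntry *e = parent_map_find(map, child, true);

    if (e == NULL)
        return false;
    e->parent = parent;
    return true;
}

static BlockNum
parent_map_get(ParentMap *map, BlockNum child)
{
    ParentEntry *e = parent_map_find(map, child, false);

    return e ? e->parent : INVALID_BLOCK;
}

/* After a split moved these downlinks to `new_parent`, update the map. */
static void
parent_map_move_downlinks(ParentMap *map, const BlockNum *children, int n,
                          BlockNum new_parent)
{
    for (int i = 0; i < n; i++)
        parent_map_set(map, children[i], new_parent);
}
```

Finding a parent is then a single lookup with no moving right along the parent level; correctness only requires that every split report the downlinks it moved.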
*	Fix bug in gistRelocateBuildBuffersOnSplit().	Heikki Linnakangas	2012-05-18
	When we create a temporary copy of the old node buffer on the stack, we mustn't leak it into any of the long-lived data structures. Before this patch, when we called gistPopItupFromNodeBuffer(), it got added to the array of "loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the pointer added to the loaded buffers array points to garbage. Often that goes unnoticed, because when we go through the array of loaded buffers to unload them, buffers with a NULL pageBuffer are ignored, which can often happen by accident even if the pointer points to garbage. This patch fixes that by explicitly marking the temporary copy on the stack as temporary, and refraining from adding buffers marked as temporary to the array of loaded buffers. While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber and improve comments a bit. This isn't strictly necessary, but makes debugging easier.
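The shape of the fix, as a toy C sketch: a flag on the buffer struct tells the bookkeeping code to leave temporary copies out of the long-lived array. `ToyNodeBuffer`, `ToyBuildState` and `remember_loaded_buffer` are hypothetical simplifications, and the field name `isTemp` is chosen here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOADED 16

/* Toy stand-in for a node buffer during a buffered build. */
typedef struct ToyNodeBuffer
{
    void *pageBuffer;           /* in-memory page, NULL when not loaded */
    bool  isTemp;               /* short-lived copy, e.g. on the stack */
} ToyNodeBuffer;

typedef struct ToyBuildState
{
    ToyNodeBuffer *loaded[MAX_LOADED];  /* buffers to unload later */
    int            nloaded;
} ToyBuildState;

/* Record a buffer whose page was just loaded, so it can be unloaded later.
 * Temporary copies are skipped: a pointer to them would dangle once the
 * function that created them returns. */
static void
remember_loaded_buffer(ToyBuildState *st, ToyNodeBuffer *nb)
{
    if (nb->isTemp)
        return;
    assert(st->nloaded < MAX_LOADED);
    st->loaded[st->nloaded++] = nb;
}
```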
*	Update copyright notices for year 2012.	Bruce Momjian	2012-01-01
*	Support GiST index support functions that want to cache data across calls.	Tom Lane	2011-09-30
	pg_trgm was already doing this unofficially, but the implementation hadn't been thought through very well and leaked memory. Restructure the core GiST code so that it actually works, and document it. Ordinarily this would have required an extra memory context creation/destruction for each GiST index search, but I was able to avoid that in the normal case of a non-rescanned search by finessing the handling of the RBTree. It used to have its own context always, but now shares a context with the scan-lifespan data structures, unless there is more than one rescan call. This should make the added overhead unnoticeable in typical cases.
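The usual fmgr idiom for caching across calls is to hang a cache off the `fn_extra` pointer of the function's `FmgrInfo`, which persists across calls made through that same `FmgrInfo`. The C sketch below imitates that idiom with a hypothetical `ToyFmgrInfo` holding only `fn_extra`, and uses `malloc` where backend code would allocate in a suitably long-lived memory context. The "preprocessing" step (taking the query length) is a stand-in for expensive work such as extracting trigrams; `toy_consistent` and `QueryCache` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for FmgrInfo: only the field this idiom needs. */
typedef struct ToyFmgrInfo
{
    void *fn_extra;             /* per-call-site cache, NULL on first call */
} ToyFmgrInfo;

typedef struct QueryCache
{
    const char *query;          /* query the cache was built for */
    size_t      prepared;       /* result of the "expensive" step */
} QueryCache;

static int prepare_count = 0;   /* how often the expensive step ran */

/* Toy consistent-style support function: true if key_len >= prepared size. */
static bool
toy_consistent(ToyFmgrInfo *flinfo, const char *query, size_t key_len)
{
    QueryCache *cache = flinfo->fn_extra;

    assert(query != NULL);
    if (cache == NULL)
    {
        cache = malloc(sizeof(*cache));
        if (cache == NULL)
            abort();
        cache->query = NULL;
        flinfo->fn_extra = cache;
    }
    if (cache->query != query)
    {
        /* first call, or the query changed (e.g. after a rescan) */
        cache->query = query;
        cache->prepared = strlen(query);
        prepare_count++;
    }
    return key_len >= cache->prepared;
}
```

Here the caller frees the cache explicitly. In backend code there would be no explicit `free`; the cache would live in a memory context whose lifetime the GiST scan code manages.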
*	Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.	Tom Lane	2011-09-09
	As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.
*	Buffering GiST index build algorithm.	Heikki Linnakangas	2011-09-08
	When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to find the right leaf page for each tuple. Alexander Korotkov
*	Change the way the offset of downlink is stored in GISTInsertStack.	Heikki Linnakangas	2011-07-15
	GISTInsertStack.childoffnum used to mean "offset of the downlink in this node, pointing to the child node in the stack". It's now replaced with downlinkoffnum, which means "offset of the downlink in the parent of this node". gistFindPath() already used childoffnum with this new meaning, and had an extra step at the end to pull all the childoffnum values down one node in the stack, to adjust the stack for the meaning that childoffnum had elsewhere. That's no longer required. The reason to do this now is this new representation is more convenient for the GiST fast build patch that Alexander Korotkov is working on. While we're at it, replace the linked list used in gistFindPath with a standard List, and make gistFindPath() static. Alexander Korotkov, with some changes by me.
*	Make GIN and GIST pass the index collation to all their support functions.	Tom Lane	2011-04-22
	Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.
*	pgindent run before PG 9.1 beta 1.	Bruce Momjian	2011-04-10
*	Stamp copyrights for year 2011.	Bruce Momjian	2011-01-01
*	Support unlogged tables.	Robert Haas	2010-12-29
	The contents of an unlogged table are not WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
*	Rewrite the GiST insertion logic so that we don't need the post-recovery	Heikki Linnakangas	2010-12-23
	cleanup stage to finish incomplete inserts or splits anymore. There were two reasons for the cleanup step:
1. When a new tuple was inserted into a leaf page, the downlink in the parent needed to be updated to contain (i.e., to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finished quickly so that there was no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with it in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.
*	Update comment to match later code changes.	Tom Lane	2010-12-04
*	KNNGIST, otherwise known as order-by-operator support for GIST.	Tom Lane	2010-12-03
	This commit represents a rather heavily editorialized version of Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1 patches. I redid the opclass API to add a separate Distance method instead of turning the Consistent method into an illogical mess, fixed some bit-rot in the rbtree interfaces, and generally worked over the code style and comments. There's still no non-code documentation to speak of, but I'll work on that separately. Some contrib-module changes are also yet to come (right now, point <-> point is the only KNN-ified operator). Teodor Sigaev and Tom Lane
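Order-by-operator support rests on best-first search: keep a priority queue of index pages (keyed by a lower bound on their distance) and individual items (keyed by their exact distance), and always expand the closest entry. Below is a toy C sketch over a flat list of leaf pages with bounding boxes. It uses a binary heap where the commit mentions an RBTree, and compares squared distances, which gives the same order as Euclidean distance without needing `<math.h>`. All names are hypothetical.

```c
#include <assert.h>

typedef struct Point { double x, y; } Point;
typedef struct Box   { Point lo, hi; } Box;

#define MAX_PER_PAGE 4
#define MAX_QUEUE    64

typedef struct LeafPage
{
    Box   bbox;                 /* covers every point on the page */
    int   npoints;
    Point pts[MAX_PER_PAGE];
} LeafPage;

/* Queue entry: a whole page (point < 0) or one point on a page. */
typedef struct QItem { double dist; int page; int point; } QItem;
typedef struct Queue { QItem items[MAX_QUEUE]; int n; } Queue;

static double
dist2(Point a, Point b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Squared distance from q to the nearest point of box b (0 if inside). */
static double
box_dist2(Box b, Point q)
{
    double dx = q.x < b.lo.x ? b.lo.x - q.x : (q.x > b.hi.x ? q.x - b.hi.x : 0);
    double dy = q.y < b.lo.y ? b.lo.y - q.y : (q.y > b.hi.y ? q.y - b.hi.y : 0);
    return dx * dx + dy * dy;
}

static void
q_push(Queue *q, QItem it)
{
    assert(q->n < MAX_QUEUE);
    int i = q->n++;

    while (i > 0 && q->items[(i - 1) / 2].dist > it.dist)
    {
        q->items[i] = q->items[(i - 1) / 2];   /* move parent down */
        i = (i - 1) / 2;
    }
    q->items[i] = it;
}

static QItem
q_pop(Queue *q)
{
    QItem top = q->items[0];
    QItem last = q->items[--q->n];
    int   i = 0;

    for (;;)
    {
        int c = 2 * i + 1;

        if (c >= q->n)
            break;
        if (c + 1 < q->n && q->items[c + 1].dist < q->items[c].dist)
            c++;
        if (last.dist <= q->items[c].dist)
            break;
        q->items[i] = q->items[c];
        i = c;
    }
    q->items[i] = last;
    return top;
}

/* Write up to k points into out[], nearest to `query` first. */
static int
knn(const LeafPage *pages, int npages, Point query, int k, Point *out)
{
    Queue queue = {.n = 0};
    int   found = 0;

    for (int p = 0; p < npages; p++)
        q_push(&queue, (QItem) {box_dist2(pages[p].bbox, query), p, -1});

    while (queue.n > 0 && found < k)
    {
        QItem it = q_pop(&queue);

        if (it.point < 0)       /* a page: expand it into its points */
            for (int j = 0; j < pages[it.page].npoints; j++)
                q_push(&queue, (QItem) {dist2(pages[it.page].pts[j], query),
                                        it.page, j});
        else                    /* a point: nothing closer remains queued */
            out[found++] = pages[it.page].pts[it.point];
    }
    return found;
}
```

Correctness hinges on the page key being a lower bound: no point on a page can be closer than the page's bounding box, so when a point reaches the front of the queue, nothing still queued can beat it.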
*	The GiST scan algorithm uses LSNs to detect concurrent page splits, but	Heikki Linnakangas	2010-11-16
	temporary indexes are not WAL-logged. We used a constant LSN for temporary indexes, on the assumption that we don't need to worry about concurrent page splits in temporary indexes because they're only visible to the current session. But that assumption is wrong: it's possible to insert rows and split pages in the same session while a scan is in progress, for example by opening a cursor and fetching some rows, then INSERTing new rows before fetching some more. Fix by generating fake increasing LSNs, used in place of real LSNs in temporary GiST indexes.
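Why a constant LSN fails, and why any increasing counter fixes it, fits in a few lines of C. This is a sketch with hypothetical names (`toy_get_fake_lsn`, `split_detected`), not the function PostgreSQL actually uses; it assumes the scan compares the parent LSN it remembered against the NSN a split stamps on the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLSN;

/* Per-session counter standing in for WAL positions in a temporary index.
 * Starts at 1 so that 0 can still mean "invalid". */
static ToyLSN fake_lsn_counter = 1;

static ToyLSN
toy_get_fake_lsn(void)
{
    assert(fake_lsn_counter != UINT64_MAX);   /* toy: no wraparound handling */
    return fake_lsn_counter++;
}

/* Scan-side test: the page was split after we read its parent if its NSN
 * (stamped at split time) is newer than the parent LSN we remembered. */
static bool
split_detected(ToyLSN parent_lsn_seen, ToyLSN page_nsn)
{
    return parent_lsn_seen < page_nsn;
}
```

With a constant LSN, `split_detected(1, 1)` is false, so a split made by the same session in the middle of a scan goes unnoticed; with the counter, a split performed after the parent was read always gets a strictly larger NSN.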
*	Remove cvs keywords from all files.	Magnus Hagander	2010-09-20
*	Update copyright for the year 2010.	Bruce Momjian	2010-01-02
*	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list	Bruce Momjian	2009-06-11
	provided by Andrew.
*	Update copyright for 2009.	Bruce Momjian	2009-01-01
*	Fix GiST's tuple killing: GISTScanOpaque->curpos wasn't	Teodor Sigaev	2008-10-22
	correctly set. As a result, killtuple() marked the wrong tuple on the page as dead. The bug was introduced by me while fixing possible duplicates during GiST index scans.
*	Remove mark/restore support in GIN and GiST indexes.	Teodor Sigaev	2008-10-20
	Per Tom's comment. Also remove useless GISTScanOpaque->flags field.
*	During repeated rescans of a GiST index it's possible that a scan key	Teodor Sigaev	2008-10-17
	is NULL but SK_SEARCHNULL is not set. Add a check for NULL keys during key initialization: if a key is NULL and SK_SEARCHNULL is not set, then nothing can be satisfied. In assert-enabled builds this case caused a core dump. The bug was introduced in 8.3 by the support for IS NULL index scans.
*	Fix possible duplicate tuples during GiST scans. Now each page is processed	Teodor Sigaev	2008-08-23
	at once and its ItemPointers are collected in memory. Remove tuple killing by killtuple() if the tuple was moved to another page, since that could produce unacceptable overhead. Back-patch to 8.1, because the bug was introduced by GiST's concurrency support.
*	Improve our #include situation by moving pointer types away from the	Alvaro Herrera	2008-06-19
	corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause fewer unnecessary dependencies.
*	Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation	Heikki Linnakangas	2008-06-12
	forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.
*	Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole	Tom Lane	2008-04-10
	indexscan always occurs in one call, and the results are returned in a TIDBitmap instead of a limited-size array of TIDs. This should improve speed a little by reducing AM entry/exit overhead, and it is necessary infrastructure if we are ever to support bitmap indexes. In an only slightly related change, add support for TIDBitmaps to preserve (somewhat lossily) the knowledge that particular TIDs reported by an index need to have their quals rechecked when the heap is visited. This facility is not really used yet; we'll need to extend the forced-recheck feature to plain indexscans before it's useful, and that hasn't been coded yet. The intent is to use it to clean up 8.3's horrid @@@ kluge for text search with weighted queries. There might be other uses in future, but that one alone is sufficient reason. Heikki Linnakangas, with some adjustments by me.
*	Update copyrights in source tree to 2008.	Bruce Momjian	2008-01-01
*	Support an optional asynchronous commit mode, in which we don't flush WAL	Tom Lane	2007-08-01
	before reporting a transaction committed. Data consistency is still guaranteed (unlike setting fsync = off), but a crash may lose the effects of the last few transactions. Patch by Simon, some editorialization by Tom.
*	Refactor the index AM API slightly: move currentItemData and	Neil Conway	2007-01-20
	currentMarkData from IndexScanDesc to the opaque structs for the AMs that need this information (currently gist and hash). Patch from Heikki Linnakangas, fixes by Neil Conway.
*	Update CVS HEAD for 2007 copyright. Back branches are typically not	Bruce Momjian	2007-01-05
	back-stamped for this.
*	pgindent run for 8.2.	Bruce Momjian	2006-10-04
*	Make recovery from WAL be restartable, by executing a checkpoint-like	Tom Lane	2006-08-07
	operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.
*	Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that	Tom Lane	2006-07-11
	it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.
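How a fillfactor turns into free space is simple arithmetic: pages are filled to fillfactor percent of the block, and the rest is left free for later insertions. A C sketch, assuming the default 8 kB block size; the names are hypothetical, and the sketch rejects out-of-range values rather than clamping them.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLCKSZ         8192  /* assumption: default PostgreSQL block size */
#define TOY_MIN_FILLFACTOR 10    /* the minimum stated in the commit above */

/* Accept fillfactor percentages in [TOY_MIN_FILLFACTOR, 100]. */
static bool
toy_fillfactor_valid(int fillfactor)
{
    return fillfactor >= TOY_MIN_FILLFACTOR && fillfactor <= 100;
}

/* Bytes to leave free on each page filled during an index build. */
static int
toy_target_free_space(int fillfactor)
{
    assert(toy_fillfactor_valid(fillfactor));
    return TOY_BLCKSZ * (100 - fillfactor) / 100;
}
```

With a fillfactor of 90, integer division leaves 819 of 8192 bytes free per page; at the 10% minimum, 7372 bytes stay free.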
*	Alphabetically order reference to include files, "G" - "M".	Bruce Momjian	2006-07-11
*	Code review for FILLFACTOR patch. Change WITH grammar as per earlier	Tom Lane	2006-07-03
	discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.
*	Add FILLFACTOR to CREATE INDEX.	Bruce Momjian	2006-07-02
	ITAGAKI Takahiro
*	Changes	Teodor Sigaev	2006-06-28
	* new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php)
	* possible call of pickSplit() for the second and subsequent columns
	* add spl_(l|r)datum_exists to GIST_SPLITVEC: pickSplit should check these values to decide whether to use the already-defined spl_(l|r)datum for splitting, and should set spl_(l|r)datum_exists to 'false' (if they were 'true') to signal to the caller that spl_(l|r)datum were used
	* support for old pickSplit(): not very optimal, but a correct split
	* remove the 'bytes' field from GISTENTRY: in any case, the size of a value is defined by its type
	* split GIST_SPLITVEC into two structures: one for use in picksplit and a second for internal use
	* some code refactoring
	* support of subsplit in rtree opclasses
	TODO: add support of subsplit to contrib modules
*	Some improvement of page split in multicolumn GiST indexes.	Teodor Sigaev	2006-05-29
	If the user's picksplit on the n-th column generates equal left and right unions, then picksplit is called on the (n+1)-th column.
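In other words: try the first column's picksplit, and only if it cannot distinguish the two halves, consult the next column. A toy C sketch with integer columns, where a "union" is the min/max range and the per-column picksplit simply sorts and halves; real opclasses supply their own picksplit and union functions, and all names here are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NCOLS       2
#define MAX_ENTRIES 16

typedef struct Range { int lo, hi; } Range;     /* union of a set of ints */
typedef struct Entry { int col[NCOLS]; } Entry;

/* Toy picksplit for column c: sort entries by that column and put the lower
 * half on the left. Fills side[] (0 = left, 1 = right) and reports whether
 * the left and right unions differ. */
static bool
toy_split_column(const Entry *e, int n, int c, int *side)
{
    int   order[MAX_ENTRIES];
    Range l = {INT_MAX, INT_MIN}, r = {INT_MAX, INT_MIN};

    assert(n <= MAX_ENTRIES);
    for (int i = 0; i < n; i++)
        order[i] = i;
    for (int i = 1; i < n; i++)         /* insertion sort on column c */
    {
        int key = order[i], j = i - 1;

        while (j >= 0 && e[order[j]].col[c] > e[key].col[c])
        {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = key;
    }
    for (int k = 0; k < n; k++)
    {
        int    idx = order[k];
        int    v = e[idx].col[c];
        Range *u = (k < n / 2) ? &l : &r;

        side[idx] = (k < n / 2) ? 0 : 1;
        if (v < u->lo)
            u->lo = v;
        if (v > u->hi)
            u->hi = v;
    }
    return l.lo != r.lo || l.hi != r.hi;
}

/* Split on the first column whose unions differ, falling back to the next
 * column when they are equal. Returns the column that decided the split. */
static int
toy_multicolumn_split(const Entry *e, int n, int *side)
{
    for (int c = 0; c < NCOLS - 1; c++)
        if (toy_split_column(e, n, c, side))
            return c;
    toy_split_column(e, n, NCOLS - 1, side);    /* last column: take its split */
    return NCOLS - 1;
}
```

When every entry has the same value in column 0, both halves have the union [7, 7], so the sketch falls through to column 1 and splits on that instead.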
*	* Add NULL support to GiST.	Teodor Sigaev	2006-05-24
	* some refactoring and simplification of code in gistutil.c and gist.c
	* in some cases a user-defined picksplit method can now be called for a non-first column of the index, but there is room to do more here
	* small fix of docs related to NULL support
*	Simplify gistSplit() and some refactoring related code.	Teodor Sigaev	2006-05-19
*	Reduce size of critical section during VACUUM FULL; critical	Teodor Sigaev	2006-05-17
	sections are no longer nested. All user-defined functions are now called outside critical sections. Small improvements in WAL protocol. TODO: improve XLOG replay
*	Reduce size of critical section and remove call of user-defined functions in	Teodor Sigaev	2006-05-10
	insertion and deletion, and modify gistSplit() to not use buffers. TODO: gistvacuumcleanup and XLOG
*	Improve gist XLOG code to follow the coding rules needed to prevent	Tom Lane	2006-03-30
	torn-page problems. This introduces some issues of its own, mainly that there are now some critical sections of unreasonably broad scope, but it's a step forward anyway. Further cleanup will require some code refactoring that I'd prefer to get Oleg and Teodor involved in.
*	Arrange to emit a description of the current XLOG record as error context	Tom Lane	2006-03-24
	when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou
*	Update copyright for 2006. Update scripts.	Bruce Momjian	2006-03-05
*	Add simple sanity checks on newly-read pages to GiST, too.	Tom Lane	2005-11-06