aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/gin
Commit message (Collapse)AuthorAge
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Remove obsolete XLogRecPtr macrosAlvaro Herrera2012-12-28
| | | | | | | | | | | | | | | | | This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera
* Split out rmgr rm_desc functions into their own filesAlvaro Herrera2012-11-28
| | | | | This is necessary (but not sufficient) to have them compilable outside of a backend environment.
* Fix multiple problems in WAL replay.Tom Lane2012-11-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
* Split heapam_xlog.h from heapam.hAlvaro Herrera2012-08-28
| | | | | | | | | | | | The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
* Remove unreachable codePeter Eisentraut2012-07-16
| | | | | | | The Solaris Studio compiler warns about these instances, unlike more mainstream compilers such as gcc. But manual inspection showed that the code is clearly not reachable, and we hope no worthy compiler will complain about removing this code.
* Cosmetic cleanup of ginInsertValue().Tom Lane2012-07-13
| | | | | | | | Make it clearer that the passed stack mustn't be empty, and that we are not supposed to fall off the end of the stack in the main loop. Tighten the loop that extracts the root block number, too. Markus Wanner and Tom Lane
* Add new function log_newpage_buffer.Robert Haas2012-06-14
| | | | | | | | When I implemented the ginbuildempty() function as part of implementing unlogged tables, I falsified the note in the header comment for log_newpage. Although we could fix that up by changing the comment, it seems cleaner to add a new function which is specifically intended to handle this case. So do that.
* Lots of doc corrections.Robert Haas2012-04-23
| | | | Josh Kupershmidt
* Fix misleading output from gin_desc().Tom Lane2012-04-06
| | | | | | | | | | | XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.
* Fix some more bugs in GIN's WAL replay logic.Tom Lane2012-02-26
| | | | | | | | | | | | | | | | | | In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.
* Improve comment.Tom Lane2012-02-04
|
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Fix erroneous replay of GIN_UPDATE_META_PAGE WAL records.Tom Lane2011-11-25
| | | | | | | | | | | | | A simple thinko in ginRedoUpdateMetapage, namely failing to increment a loop counter, led to inserting records into the last pending-list page in the wrong order (the opposite of that intended). So far as I can tell, this would not upset the code that eventually flushes pending items into the main part of the GIN index. But it did break the code that searched the pending list for matches, resulting in transient failure to find matching entries during index lookups, as illustrated in bug #6307 from Maksym Boguk. Back-patch to 8.4 where the incorrect code was introduced.
* Clean up the #include mess a little.Tom Lane2011-09-04
| | | | | | | | | | | | | | | | | walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.hAlvaro Herrera2011-07-04
| | | | | This lets us stop including rel.h into execnodes.h, which is a widely used header.
* Pgindent run before 9.1 beta2.Bruce Momjian2011-06-09
|
* Make GIN and GIST pass the index collation to all their support functions.Tom Lane2011-04-22
| | | | | | | Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.
* Pass collations to functions in FunctionCallInfoData, not FmgrInfo.Tom Lane2011-04-12
| | | | | | | | | | | Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Tweak collation setup for GIN index comparison functions.Tom Lane2011-04-08
| | | | | | | Honor index column's collation spec if there is one, don't go to the expense of calling get_typcollation when we can reasonably assume that all GIN storage types will use default collation, and be sure to set a collation for the comparePartialFn too.
* Clean up cruft around collation initialization for tupdescs and scankeys.Tom Lane2011-03-26
| | | | | I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.
* Revise collation derivation method and expression-tree representation.Tom Lane2011-03-19
| | | | | | | | | | | | | | | | | | | All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.
* Add backwards-compatible declarations of some core GIN support functions.Tom Lane2011-02-16
| | | | | | | | | These are needed to support reloading dumps of 9.0 installations containing contrib/intarray or contrib/tsearch2. Since not only regular dump/reload but binary upgrade would fail, it seems worth the trouble to carry these stubs for awhile. Note that the contrib opclasses referencing these functions will still work fine, since GIN doesn't actually pay any attention to the declared signature of a support function.
* Per-column collation supportPeter Eisentraut2011-02-08
| | | | | | | | This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
* Refactor GIN's handling of duplicate search entries.Tom Lane2011-01-08
| | | | | | | | The original coding could combine duplicate entries only when they originated from the same qual condition. In particular it could not combine cases where multiple qual conditions all give rise to full-index scan requests, which is an expensive case well worth optimizing. Refactor so that duplicates are recognized across all the quals.
* Fix GIN to support null keys, empty and null items, and full index scans.Tom Lane2011-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per my recent proposal(s). Null key datums can now be returned by extractValue and extractQuery functions, and will be stored in the index. Also, placeholder entries are made for indexable items that are NULL or contain no keys according to extractValue. This means that the index is now always complete, having at least one entry for every indexed heap TID, and so we can get rid of the prohibition on full-index scans. A full-index scan is implemented much the same way as partial-match scans were already: we build a bitmap representing all the TIDs found in the index, and then drive the results off that. Also, introduce a concept of a "search mode" that can be requested by extractQuery when the operator requires matching to empty items (this is just as cheap as matching to a single key) or requires a full index scan (which is not so cheap, but it sure beats failing or giving wrong answers). The behavior remains backward compatible for opclasses that don't return any null keys or request a non-default search mode. Using these features, we can now make the GIN index opclass for anyarray behave in a way that matches the actual anyarray operators for &&, <@, @>, and = ... which it failed to do before in assorted corner cases. This commit fixes the core GIN code and ginarrayprocs.c, updates the documentation, and adds some simple regression test cases for the new behaviors using the array operators. The tsearch and contrib GIN opclass support functions still need to be looked over and probably fixed. Another thing I intend to fix separately is that this is pretty inefficient for cases where more than one scan condition needs a full-index search: we'll run duplicate GinScanEntrys, each one of which builds a large bitmap. There is some existing logic to merge duplicate GinScanEntrys but it needs refactoring to make it work for entries belonging to different scan keys. Note that most of gin.h has been split out into a new file gin_private.h, so that gin.h doesn't export anything that's not supposed to be used by GIN opclasses or the rest of the backend. I did quite a bit of other code beautification work as well, mostly fixing comments and choosing more appropriate names for things.
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Support unlogged tables.Robert Haas2010-12-29
| | | | | | | The contents of an unlogged table are WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
* Generalize concept of temporary relations to "relation persistence".Robert Haas2010-12-13
| | | | | | | | | | | | | | | This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.
* Create core infrastructure for KNNGIST.Tom Lane2010-12-02
| | | | | | | | | | | | | | | | | | | This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane
* Remove useless whitespace at end of linesPeter Eisentraut2010-11-23
|
* Cleanup various comparisons with the constant "true".Robert Haas2010-11-14
| | | | Itagaki Takahiro, with slight modifications.
* Fix a passel of inappropriately-named global functions in GIN.Tom Lane2010-10-17
| | | | | | | | | | | | | | | | The GIN code has absolutely no business exporting GIN-specific functions with names as generic as compareItemPointers() or newScanKey(); that's just trouble waiting to happen. I got annoyed about this again just now and decided to fix it. This commit ensures that all global symbols defined in access/gin/ have names including "gin" or "Gin". There were a couple of cases, like names involving "PostingItem", where arguably the names were already sufficiently nongeneric; but I figured as long as I was risking creating merge problems for unapplied GIN patches I might as well impose a uniform policy. I didn't touch any static symbol names. There might be some places where it'd be appropriate to rename some static functions to match siblings that are exported, but I'll leave that for another time.
* Improve GIN indexscan cost estimation.Tom Lane2010-10-17
| | | | | | | | | | | | | The better estimate requires more statistics than we previously stored: in particular, counts of "entry" versus "data" pages within the index, as well as knowledge of the number of distinct key values. We collect this information during initial index build and update it during VACUUM, storing the info in new fields on the index metapage. No initdb is required because these fields will read as zeroes in a pre-existing index, and the new gincostestimate code is coded to behave (reasonably) sanely if they are zeroes. Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.
* Fix assorted bugs in GIN's WAL replay logic.Tom Lane2010-10-11
| | | | | | | | | | | | | | The original coding was quite sloppy about handling the case where XLogReadBuffer fails (because the page has since been deleted). This would result in either "bad buffer id: 0" or an Assert failure during replay, if indeed the page were no longer there. In a couple of places it also neglected to check whether the change had already been applied, which would probably result in corrupted index contents. I believe that bug #5703 is an instance of the first problem. These issues could show up without replication, but only if you were unfortunate enough to crash between modification of a GIN index and the next checkpoint. Back-patch to 8.2, which is as far back as GIN has WAL support.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Fix an additional set of problems in GIN's handling of lossy page pointers.Tom Lane2010-08-01
| | | | | | | | | | | | | | | | | Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before. Per followup investigation of a gripe from Artur Dabrowski. An example of a query that failed given his data set is select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*')); Back-patch to 8.4 where the lossy pointer code was introduced.
* Rewrite the rbtree routines so that an RBNode is the first field of theTom Lane2010-08-01
| | | | | | | | | | | | | | | | | | | | struct representing a tree entry, rather than being a separately allocated piece of storage. This API is at least as clean as the old one (if not more so --- there were some bizarre choices in there) and it permits a very substantial memory savings, on the order of 2X in ginbulk.c's usage. Also, fix minor memory leaks in code called by ginEntryInsert, in particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert itself. These leaks resulted in the GIN index build context continuing to bloat even after we'd filled it to maintenance_work_mem and started to dump data out to the index. In combination these fixes restore the GIN index build code to honoring the maintenance_work_mem limit about as well as it did in 8.4. Speed seems on par with 8.4 too, maybe even a bit faster, for a non-pathological case in which HEAD was formerly slower. Back-patch to 9.0 so we don't have a performance regression from 8.4.
* Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem()Tom Lane2010-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | routines to make them behave better in the presence of "lossy" index pointers. The previous coding was outright incorrect for some cases, as recently reported by Artur Dabrowski: scanGetItem would fail to return index entries in cases where one index key had multiple exact pointers on the same page as another key had a lossy pointer. Also, keyGetItem was extremely inefficient for cases where a single index key generates multiple "entry" streams, such as an @@ operator with a multiple-clause tsquery. The presence of a lossy page pointer in any one stream defeated its ability to use the opclass consistentFn, resulting in probing many heap pages that didn't really need to be visited. In Artur's example case, a query like WHERE tsvector @@ to_tsquery('a & b') was about 50X slower than the theoretically equivalent WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b') The way that I chose to fix this was to have GIN call the consistentFn twice with both TRUE and FALSE values for the in-doubt entry stream, returning a hit if either call produces TRUE, but not if they both return FALSE. The code handles this for the case of a single in-doubt entry stream, but punts (falling back to the stupid behavior) if there's more than one lossy reference to the same page. The idea could be scaled up to deal with multiple lossy references, but I think that would probably be wasted complexity. At least to judge by Artur's example, such cases don't occur often enough to be worth trying to optimize. Back-patch to 8.4. 8.3 did not have lossy GIN index pointers, so not subject to these problems.
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Generic implementation of red-black binary tree. It's planned to use inTeodor Sigaev2010-02-11
| | | | | | several places, but for now only GIN uses it during index creation. Using self-balanced tree greatly speeds up index creation in corner cases with preordered data.
* Fix bug in GIN WAL redo cleanup function: don't free fake relcache entryHeikki Linnakangas2010-02-09
| | | | | | while it's still being used. Backpatch to 8.4, where the fake relcache method was introduced.
* Remove old-style VACUUM FULL (which was known for a little while asTom Lane2010-02-08
| | | | | | | | | | | | | | | | | VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
* Fix incorrect comparison of scan key in GIN. Per report fromTeodor Sigaev2010-01-18
| | | | Vyacheslav Kalinin <vka@mgcp.com>
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Allow read only connections during recovery, known as Hot Standby.Simon Riggs2009-12-19
| | | | | | | | | | | | Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
* Fix multicolumn GIN's wrong results with fastupdate enabled.Teodor Sigaev2009-11-13
| | | | | | | | User-defined consistent functions believes the check array contains at least one true element which was not a true for scanning pending list. Per report from Yury Don <yura@vpcit.ru>
* Make sure that GIN fast-insert and regular code paths enforce the sameTom Lane2009-10-02
| | | | | | | | | | | tuple size limit. Improve the error message for index-tuple-too-large so that it includes the actual size, the limit, and the index name. Sync with the btree occurrences of the same error. Back-patch to 8.4 because it appears that the out-of-sync problem is occurring in the field. Teodor and Tom