aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/index/indexam.c
Commit message (Collapse)AuthorAge
...
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Add support for index-only scans in GiST.Heikki Linnakangas2015-03-26
| | | | | | | | | | | | | | This adds a new GiST opclass method, 'fetch', which is used to reconstruct the original Datum from the value stored in the index. Also, the 'canreturn' index AM interface function gains a new 'attno' argument. That makes it possible to use index-only scans on a multi-column index where some of the opclasses support index-only scans but some do not. This patch adds support in the box and point opclasses. Other opclasses can added later as follow-on patches (btree_gist would be particularly interesting). Anastasia Lubennikova, with additional fixes and modifications by me.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Introduce logical decoding.Robert Haas2014-03-03
| | | | | | | | | | | | | | | | | | | | | | This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the effected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Remove some unused and seldom used fields from RelationAmInfo.Heikki Linnakangas2013-04-16
| | | | | | | This saves some memory from each index relcache entry. At least on a 64-bit machine, it saves just enough to shrink a typical relcache entry's memory usage from 2k to 1k. That's nice if you have a lot of backends and a lot of indexes.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Replace simple constant pg_am.amcanreturn with an AM support function.Tom Lane2011-12-18
| | | | | | | | | The need for this was debated when we put in the index-only-scan feature, but at the time we had no near-term expectation of having AMs that could support such scans for only some indexes; so we kept it simple. However, the SP-GiST AM forces the issue, so let's fix it. This patch only installs the new API; no behavior actually changes.
* Improve index-only scans to avoid repeated access to the index page.Tom Lane2011-10-09
| | | | | | | | | | | | We copy all the matched tuples off the page during _bt_readpage, instead of expensively re-locking the page during each subsequent tuple fetch. This costs a bit more local storage, but not more than 2*BLCKSZ worth, and the reduction in LWLock traffic is certainly worth that. What's more, this lets us get rid of the API wart in the original patch that said an index AM could randomly decline to supply an index tuple despite having asserted pg_am.amcanreturn. That will be important for future improvements in the index-only-scan feature, since the executor will now be able to rely on having the index data available.
* Support index-only scans using the visibility map to avoid heap fetches.Tom Lane2011-10-07
| | | | | | | | | | | | | When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Avoid having two copies of the HOT-chain search logic.Robert Haas2011-06-27
| | | | | | | | | | | | | It's been like this since HOT was originally introduced, but the logic is complex enough that this is a recipe for bugs, as we've already found out with SSI. So refactor heap_hot_search_buffer() so that it can satisfy the needs of index_getnext(), and make index_getnext() use that rather than duplicating the logic. This change was originally proposed by Heikki Linnakangas as part of a larger refactoring oriented towards allowing index-only scans. I extracted and adjusted this part, since it seems to have independent merit. Review by Jeff Davis.
* Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCCHeikki Linnakangas2011-06-15
| | | | | | | | snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there's separate functions for that. Kevin Grittner and Dan Ports, with some changes by me.
* Pgindent run before 9.1 beta2.Bruce Momjian2011-06-09
|
* The row-version chaining in Serializable Snapshot Isolation was still wrong.Heikki Linnakangas2011-05-30
| | | | | | | | | | On further analysis, it turns out that it is not needed to duplicate predicate locks to the new row version at update, the lock on the version that the transaction saw as visible is enough. However, there was a different bug in the code that checks for dangerous structures when a new rw-conflict happens. Fix that bug, and remove all the row-version chaining related code. Kevin Grittner & Dan Ports, with some comment editorialization by me.
* Add an Assert that indexam.c isn't used on an index awaiting reindexing.Tom Lane2011-04-16
| | | | | | | This might have caught the recent embarrassment over trying to modify pg_index while its indexes were being rebuilt. Noah Misch
* Pass collations to functions in FunctionCallInfoData, not FmgrInfo.Tom Lane2011-04-12
| | | | | | | | | | | Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Revise collation derivation method and expression-tree representation.Tom Lane2011-03-19
| | | | | | | | | | | | | | | | | | | All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.
* Fix incorrect access to pg_index.indcollation.Tom Lane2011-03-06
| | | | | | | | | Since this field is after a variable-length field, it can't simply be accessed via the C struct for pg_index. Fortunately, the relcache already did the dirty work of pulling the information out to where it can be accessed easily, so this is a one-line fix. Andres Freund
* Per-column collation supportPeter Eisentraut2011-02-08
| | | | | | | | This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
* Implement genuine serializable isolation level.Heikki Linnakangas2011-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Create core infrastructure for KNNGIST.Tom Lane2010-12-02
| | | | | | | | | | | | | | | | | | | This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Allow read only connections during recovery, known as Hot Standby.Simon Riggs2009-12-19
| | | | | | | | | | | | Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
* Support deferrable uniqueness constraints.Tom Lane2009-07-29
| | | | | | | | | | The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Implement "fastupdate" support for GIN indexes, in which we try to accumulateTom Lane2009-03-24
| | | | | | | | | | | | | | | | | | multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Fix small query-lifespan memory leak introduced by 8.4 change in index AM APITom Lane2008-10-10
| | | | for bitmap index scans. Per report and test case from Kevin Grittner.
* Initialize the minimum frozen Xid in vac_update_datfrozenxid usingAlvaro Herrera2008-09-11
| | | | | | | | | | | | | | | | | | | | GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not depend on the latter being correctly set elsewhere, and while it is more expensive, this code path is not performance-critical. This is a real risk for autovacuum, because it can execute whole cycles without doing a single vacuum, which would mean that RecentGlobalXmin would stay at its initialization value, FirstNormalTransactionId, causing a bogus value to be inserted in pg_database. This bug could explain some recent reports of failure to truncate pg_clog. At the same time, change the initialization of RecentGlobalXmin to InvalidTransactionId, and ensure that it's set to something else whenever it's going to be used. Using it as FirstNormalTransactionId in HOT page pruning could incur in data loss. InitPostgres takes care of setting it to a valid value, but the extra checks are there to prevent "special" backends from behaving in unusual ways. Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
* Improve our #include situation by moving pointer types away from theAlvaro Herrera2008-06-19
| | | | | | | corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.
* Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera2008-05-12
| | | | | | | | | | | | unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Phase 2 of project to make index operator lossiness be determined at runtimeTom Lane2008-04-13
| | | | | | | | | | | | instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.
* Create new routines systable_beginscan_ordered, systable_getnext_ordered,Tom Lane2008-04-12
| | | | | | | | | | | | | | systable_endscan_ordered that have API similar to systable_beginscan etc (in particular, the passed-in scankeys have heap not index attnums), but guarantee ordered output, unlike the existing functions. For the moment these are just very thin wrappers around index_beginscan/index_getnext/etc. Someday they might need to get smarter; but for now this is just a code refactoring exercise to reduce the number of direct callers of index_getnext, in preparation for changing that function's API. In passing, remove index_getnext_indexitem, which has been dead code for quite some time, and will have even less use than that in the presence of run-time-lossy indexes.
* Replace "amgetmulti" AM functions with "amgetbitmap", in which the wholeTom Lane2008-04-10
| | | | | | | | | | | | | | | | | | indexscan always occurs in one call, and the results are returned in a TIDBitmap instead of a limited-size array of TIDs. This should improve speed a little by reducing AM entry/exit overhead, and it is necessary infrastructure if we are ever to support bitmap indexes. In an only slightly related change, add support for TIDBitmaps to preserve (somewhat lossily) the knowledge that particular TIDs reported by an index need to have their quals rechecked when the heap is visited. This facility is not really used yet; we'll need to extend the forced-recheck feature to plain indexscans before it's useful, and that hasn't been coded yet. The intent is to use it to clean up 8.3's horrid @@@ kluge for text search with weighted queries. There might be other uses in future, but that one alone is sufficient reason. Heikki Linnakangas, with some adjustments by me.
* Move the HTSU_Result enum definition into snapshot.h, to avoid includingAlvaro Herrera2008-03-26
| | | | | | tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.
* Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.Alvaro Herrera2008-03-26
| | | | Per complaint from Tom Lane.
* Separate snapshot management code from tuple visibility code, create aAlvaro Herrera2008-03-26
| | | | | | | | | | | | | snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* HOT updates. When we update a tuple without changing any of its indexedTom Lane2007-09-20
| | | | | | | | | | | | columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.
* Fix up pgstats counting of live and dead tuples to recognize that committedTom Lane2007-05-27
| | | | | | | | | | | and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.