aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
* tableam: bitmap table scan.Andres Freund2019-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | This moves bitmap heap scan support to below an optional tableam callback. It's optional as the whole concept of bitmap heapscans is fairly block specific. This basically moves the work previously done in bitgetpage() into the new scan_bitmap_next_block callback, and the direct poking into the buffer done in BitmapHeapNext() into the new scan_bitmap_next_tuple() callback. The abstraction is currently somewhat leaky because nodeBitmapHeapscan.c's prefetching and visibilitymap based logic remains - it's likely that we'll later have to move more into the AM. But it's not trivial to do so without introducing a significant amount of code duplication between the AMs, so that's a project for later. Note that now nodeBitmapHeapscan.c and the associated node types are a bit misnamed. But it's not clear whether renaming wouldn't be a cure worse than the disease. Either way, that'd be best done in a separate commit. Author: Andres Freund Reviewed-By: Robert Haas (in an older version) Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* tableam: sample scan.Andres Freund2019-03-31
| | | | | | | | | | | | | | | | | | | | This moves sample scan support to below tableam. It's not optional as there is, in contrast to e.g. bitmap heap scans, no alternative way to perform tablesample queries. If an AM can't deal with the block based API, it will have to throw an ERROR. The tableam callbacks for this are block based, but given the current TsmRoutine interface, that seems to be required. The new interface doesn't require TsmRoutines to perform visibility checks anymore - that requires the TsmRoutine to know details about the AM, which we want to avoid. To continue to allow taking the returned number of tuples account SampleScanState now has a donetuples field (which previously e.g. existed in SystemRowsSamplerData), which is only incremented after the visibility check succeeds. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* tableam: Formatting and other minor cleanups.Andres Freund2019-03-31
| | | | | The superflous heapam_xlog.h includes were reported by Peter Geoghegan.
* Fix nbtree high key "continuescan" row compare bug.Peter Geoghegan2019-03-31
| | | | | | | | | | | | | Commit 29b64d1d mishandled skipping over truncated high key attributes during row comparisons. The row comparison key matching loop would loop forever when a truncated attribute was encountered for a row compare subkey. Fix by following the example of other code in the loop: advance the current subkey, or break out of the loop when the last subkey is reached. Add test coverage for the relevant _bt_check_rowcompare() code path. The new test case is somewhat tied to nbtree implementation details, which isn't ideal, but seems unavoidable.
* Add test case exercising formerly-unreached code in inheritance_planner.Tom Lane2019-03-31
| | | | | | | | There was some debate about whether the code I'd added to remap AppendRelInfos obtained from the initial SELECT planning run is actually necessary. Add a test case demonstrating that it is. Discussion: https://postgr.es/m/23831.1553873385@sss.pgh.pa.us
* Compute root->qual_security_level in a less random place.Tom Lane2019-03-31
| | | | | | | | | | | | We can set this up once and for all in subquery_planner's initial survey of the flattened rangetable, rather than incrementally adjusting it in build_simple_rel. The previous approach made it rather hard to reason about exactly when the value would be available, and we were definitely using it in some places before the final value was computed. Noted while fooling around with Amit Langote's patch to delay creation of inheritance child rels. That didn't break this code, but it made it even more fragile, IMO.
* Skip redundant anti-wraparound vacuumsMichael Paquier2019-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | An anti-wraparound vacuum has to be by definition aggressive as it needs to work on all the pages of a relation. However it can happen that due to some concurrent activity an anti-wraparound vacuum is marked as non-aggressive, which makes it redundant with a previous run, and it is actually useless as an anti-wraparound vacuum should process all the pages of a relation. This commit makes such vacuums to be skipped. An anti-wraparound vacuum not aggressive can be found easily by mixing low values of autovacuum_freeze_max_age (to control anti-wraparound) and autovacuum_freeze_table_age (to control the aggressiveness). 28a8fa9 has added some extra logging printing all the possible combinations of anti-wraparound and aggressive vacuums, which now gets simplified as an anti-wraparound vacuum also non-aggressive gets skipped. Per discussion mainly between Andrew Dunstan, Robert Haas, Álvaro Herrera, Kyotaro Horiguchi, Masahiko Sawada, and myself. Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Andrew Dunstan, Álvaro Herrera Discussion: https://postgr.es/m/20180914153554.562muwr3uwujno75@alvherre.pgsql
* Have pg_upgrade's Makefile honor NO_TEMP_INSTALLAndrew Dunstan2019-03-31
| | | | | | Backpatch to 9.5, when pg_upgrade's location changed. Discussion: https://postgr.es/m/5506b8fa-7dad-8483-053c-7ca7ef04f01a@2ndQuadrant.com
* tableam: Move heap specific logic from estimate_rel_size below tableam.Andres Freund2019-03-30
| | | | | | | | | | | | | This just moves the table/matview[/toast] determination of relation size to a callback, and uses a copy of the existing logic to implement that callback for heap. It probably would make sense to also move the index specific logic into a callback, so the metapage handling (and probably more) can be index specific. But that's a separate task. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* tableam: VACUUM and ANALYZE support.Andres Freund2019-03-30
| | | | | | | | | | | | This is a relatively straightforward move of the current implementation to sit below tableam. As the current analyze sampling implementation is pretty inherently block based, the tableam analyze interface is as well. It might make sense to generalize that at some point, but that seems like a larger project that shouldn't be undertaken at the same time as the introduction of tableam. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Fix typoTomas Vondra2019-03-31
| | | | Author: John Naylor
* Speed up planning when partitions can be pruned at plan time.Tom Lane2019-03-30
| | | | | | | | | | | | | | | | | | | | | | Previously, the planner created RangeTblEntry and RelOptInfo structs for every partition of a partitioned table, even though many of them might later be deemed uninteresting thanks to partition pruning logic. This incurred significant overhead when there are many partitions. Arrange to postpone creation of these data structures until after we've processed the query enough to identify restriction quals for the partitioned table, and then apply partition pruning before not after creation of each partition's data structures. In this way we need not open the partition relations at all for partitions that the planner has no real interest in. For queries that can be proven at plan time to access only a small number of partitions, this patch improves the practical maximum number of partitions from under 100 to perhaps a few thousand. Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen, Yoshikazu Imai, and David Rowley Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Fix compiler warnings in multivariate MCV codeTomas Vondra2019-03-30
| | | | | | | | Compiler warnings were observed on gcc 3.4.6 (on gaur). The assert is unnecessary, as the indexes are uint16 and so always >= 0. Reported-by: Tom Lane
* Additional fixes of memory alignment in pg_mcv_list codeTomas Vondra2019-03-30
| | | | | | | | | Commit d85e0f366a tried to fix memory alignment issues in serialization and deserialization of pg_mcv_list values, but it was a few bricks shy. The arrays of uint16 indexes in serialized items was not aligned, and the both the values and isnull flags were using the same pointer. Per investigation by Tom Lane on gaur.
* Avoid crash in partitionwise join planning under GEQO.Tom Lane2019-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While trying to plan a partitionwise join, we may be faced with cases where one or both input partitions for a particular segment of the join have been pruned away. In HEAD and v11, this is problematic because earlier processing didn't bother to make a pruned RelOptInfo fully valid. With an upcoming patch to make partition pruning more efficient, this'll be even more problematic because said RelOptInfo won't exist at all. The existing code attempts to deal with this by retroactively making the RelOptInfo fully valid, but that causes crashes under GEQO because join planning is done in a short-lived memory context. In v11 we could probably have fixed this by switching to the planner's main context while fixing up the RelOptInfo, but that idea doesn't scale well to the upcoming patch. It would be better not to mess with the base-relation data structures during join planning, anyway --- that's just a recipe for order-of-operations bugs. In many cases, though, we don't actually need the child RelOptInfo, because if the input is certainly empty then the join segment's result is certainly empty, so we can skip making a join plan altogether. (The existing code ultimately arrives at the same conclusion, but only after doing a lot more work.) This approach works except when the pruned-away partition is on the nullable side of a LEFT, ANTI, or FULL join, and the other side isn't pruned. But in those cases the existing code leaves a lot to be desired anyway --- the correct output is just the result of the unpruned side of the join, but we were emitting a useless outer join against a dummy Result. Pending somebody writing code to handle that more nicely, let's just abandon the partitionwise-join optimization in such cases. When the modified code skips making a join plan, it doesn't make a join RelOptInfo either; this requires some upper-level code to cope with nulls in part_rels[] arrays. We would have had to have that anyway after the upcoming patch. Back-patch to v11 since the crash is demonstrable there. Discussion: https://postgr.es/m/8305.1553884377@sss.pgh.pa.us
* Generated columnsPeter Eisentraut2019-03-30
| | | | | | | | | | | | | | This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
* Small code simplification for REINDEX CONCURRENTLYPeter Eisentraut2019-03-30
| | | | This was left over from an earlier code structure.
* Tweak some nbtree-related code comments.Peter Geoghegan2019-03-29
|
* Fix memory alignment in pg_mcv_list serializationTomas Vondra2019-03-29
| | | | | | | | | | | | | Blind attempt at fixing ia64, hppa an sparc builds. The serialized representation of MCV lists did not enforce proper memory alignment for internal fields, resulting in deserialization issues on platforms that are more sensitive to this (ia64, sparc and hppa). This forces a catalog version bump, because the layout of serialized pg_mcv_list changes. Broken since 7300a699.
* Show table access methods as such in psql's \dA.Andres Freund2019-03-29
| | | | | | | Previously we didn't display a type for table access methods. Author: Haribabu Kommi Discussion: CAJrrPGeeYOqP3hkZyohDx_8dot4zvPuPMDBmhJ=iC85cTBNeYw@mail.gmail.com
* tableam: Comment fixes.Andres Freund2019-03-29
| | | | | Author: Haribabu Kommi Discussion: CAJrrPGeeYOqP3hkZyohDx_8dot4zvPuPMDBmhJ=iC85cTBNeYw@mail.gmail.com
* Allow existing VACUUM options to take a Boolean argument.Robert Haas2019-03-29
| | | | | | | | | | | | This makes VACUUM work more like EXPLAIN already does without changing the meaning of any commands that already work. It is intended to facilitate the addition of future VACUUM options that may take non-Boolean parameters or that default to false. Masahiko Sawada, reviewed by me. Discussion: http://postgr.es/m/CA+TgmobpYrXr5sUaEe_T0boabV0DSm=utSOZzwCUNqfLEEm8Mw@mail.gmail.com Discussion: http://postgr.es/m/CAD21AoBaFcKBAeL5_++j+Vzir2vBBcF4juW7qH8b3HsQY=Q6+w@mail.gmail.com
* Warn more strongly about the dangers of exclusive backup mode.Robert Haas2019-03-29
| | | | | | | | | | | Especially, warn about the hazards of mishandling the backup_label file. Adjust a couple of server messages to be more clear about the hazards associated with removing backup_label files, too. David Steele and Robert Haas, reviewed by Laurenz Albe, Martín Marqués, Peter Eisentraut, and Magnus Hagander. Discussion: http://postgr.es/m/7d85c387-000e-16f0-e00b-50bf83c22127@pgmasters.net
* Fix incorrect code in new REINDEX CONCURRENTLY codePeter Eisentraut2019-03-29
| | | | | | | The previous code was adding pointers to transient variables to a list, but by the time the list was read, the variable might be gone, depending on the compiler. Fix it by making copies in the proper memory context.
* REINDEX CONCURRENTLYPeter Eisentraut2019-03-29
| | | | | | | | | | | | | | | | This adds the CONCURRENTLY option to the REINDEX command. A REINDEX CONCURRENTLY on a specific index creates a new index (like CREATE INDEX CONCURRENTLY), then renames the old index away and the new index in place and adjusts the dependencies, and then drops the old index (like DROP INDEX CONCURRENTLY). The REINDEX command also has the capability to run its other variants (TABLE, DATABASE) with the CONCURRENTLY option (but not SYSTEM). The reindexdb command gets the --concurrently option. Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
* tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.Andres Freund2019-03-28
| | | | | | | | | | | | | | | | This moves the responsibility for: - creating the storage necessary for a relation, including creating a new relfilenode for a relation with existing storage - non-transactional truncation of a relation - VACUUM FULL / CLUSTER's rewrite of a table below tableam. This is fairly straight forward, with a bit of complexity smattered in to move the computation of xid / multixid horizons below the AM, as they don't make sense for every table AM. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Fix typo.Thomas Munro2019-03-29
| | | | Author: Masahiko Sawada
* Fix a few comment copy & pastos.Andres Freund2019-03-28
|
* Fix deserialization of pg_mcv_list valuesTomas Vondra2019-03-28
| | | | | | | | | | | | | | | | | | There were multiple issues in deserialization of pg_mcv_list values. Firstly, the data is loaded from syscache, but the deserialization was performed after ReleaseSysCache(), at which point the data might have already disappeared. Fixed by moving the calls in statext_mcv_load, and using the same NULL-handling code as existing stats. Secondly, the deserialized representation used pointers into the serialized representation. But that is also unsafe, because the data may disappear at any time. Fixed by reworking and simplifying the deserialization code to always copy all the data. And thirdly, when deserializing values for types passed by value, the code simply did memcpy(d,s,typlen) which however does not work on bigendian machines. Fixed by using fetch_att/store_att_byval.
* Use FullTransactionId for the transaction stack.Thomas Munro2019-03-28
| | | | | | | | | | | | | | | | Provide GetTopFullTransactionId() and GetCurrentFullTransactionId(). The intended users of these interfaces are access methods that use xids for visibility checks but don't want to have to go back and "freeze" existing references some time later before the 32 bit xid counter wraps around. Use a new struct to serialize the transaction state for parallel query, because FullTransactionId doesn't fit into the previous serialization scheme very well. Author: Thomas Munro Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
* Add basic infrastructure for 64 bit transaction IDs.Thomas Munro2019-03-28
| | | | | | | | | | | | | | | | | | | | | | | | Instead of inferring epoch progress from xids and checkpoints, introduce a 64 bit FullTransactionId type and use it to track xid generation. This fixes an unlikely bug where the epoch is reported incorrectly if the range of active xids wraps around more than once between checkpoints. The only user-visible effect of this commit is to correct the epoch used by txid_current() and txid_status(), also visible with pg_controldata, in those rare circumstances. It also creates some basic infrastructure so that later patches can use 64 bit transaction IDs in more places. The new type is a struct that we pass by value, as a form of strong typedef. This prevents the sort of accidental confusion between TransactionId and FullTransactionId that would be possible if we were to use a plain old uint64. Author: Thomas Munro Reported-by: Amit Kapila Reviewed-by: Andres Freund, Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
* tableam: Support for an index build's initial table scan(s).Andres Freund2019-03-27
| | | | | | | | | | | | | To support building indexes over tables of different AMs, the scans to do so need to be routed through the table AM. While moving a fair amount of code, nearly all the changes are just moving code to below a callback. Currently the range based interface wouldn't make much sense for non block based table AMs. But that seems aceptable for now. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Minor improvements for the multivariate MCV listsTomas Vondra2019-03-27
| | | | | | | | | | | | | | The MCV build should always call get_mincount_for_mcv_list(), as the there is no other logic to decide whether the MCV list represents all the data. So just remove the (ngroups > nitems) condition. Also, when building MCV lists, the number of items was limited by the statistics target (i.e. up to 10000). But when deserializing the MCV list, a different value (8192) was used to check the input, causing an error. Simply ensure that the same value is used in both places. This should have been included in 7300a69950, but I forgot to include it in that commit.
* Add support for multivariate MCV listsTomas Vondra2019-03-27
| | | | | | | | | | | | | | | | Introduce a third extended statistic type, supported by the CREATE STATISTICS command - MCV lists, a generalization of the statistic already built and used for individual columns. Compared to the already supported types (n-distinct coefficients and functional dependencies), MCV lists are more complex, include column values and allow estimation of much wider range of common clauses (equality and inequality conditions, IS NULL, IS NOT NULL etc.). Similarly to the other types, a new pseudo-type (pg_mcv_list) is used. Author: Tomas Vondra Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
* Avoid passing query tlist around separately from root->processed_tlist.Tom Lane2019-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the dim past, the planner kept the fully-processed version of the query targetlist (the result of preprocess_targetlist) in grouping_planner's local variable "tlist", and only grudgingly passed it to individual other routines as needed. Later we discovered a need to still have it available after grouping_planner finishes, and invented the root->processed_tlist field for that purpose, but it wasn't used internally to grouping_planner; the tlist was still being passed around separately in the same places as before. Now comes a proposed patch to allow appendrel expansion to add entries to the processed tlist, well after preprocess_targetlist has finished its work. To avoid having to pass around the tlist explicitly, it's proposed to allow appendrel expansion to modify root->processed_tlist. That makes aliasing the tlist with assorted parameters and local variables really scary. It would accidentally work as long as the tlist is initially nonempty, because then the List header won't move around, but it's not exactly hard to think of ways for that to break. Aliased values are poor programming practice anyway. Hence, get rid of local variables and parameters that can be identified with root->processed_tlist, in favor of just using that field directly. And adjust comments to match. (Some of the new comments speak as though it's already possible for appendrel expansion to modify the tlist; that's not true yet, but will happen in a later patch.) Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* pgbench: doExecuteCommand -> executeMetaCommandAlvaro Herrera2019-03-27
| | | | | | | | | | | The new function is only in charge of meta commands, not SQL commands. This change makes the code a little clearer: now all the state changes are effected by advanceConnectionState. It also removes one indent level, which makes the diff look bulkier than it really is. Author: Fabien Coelho Reviewed-by: Kirk Jamison Discussion: https://postgr.es/m/alpine.DEB.2.21.1811240904500.12627@lancre
* Suppress uninitialized-variable warning.Tom Lane2019-03-27
| | | | | Apparently Andres' compiler is smart enough to see that hpage must be initialized before use ... but mine isn't.
* Improve error handling of column references in expression transformationMichael Paquier2019-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Column references are not allowed in default expressions and partition bound expressions, and are restricted as such once the transformation of their expressions is done. However, trying to use more complex column references can lead to confusing error messages. For example, trying to use a two-field column reference name for default expressions and partition bounds leads to "missing FROM-clause entry for table", which makes no sense in their respective context. In order to make the errors generated more useful, this commit adds more verbose messages when transforming column references depending on the context. This has a little consequence though: for example an expression using an aggregate with a column reference as argument would cause an error to be generated for the column reference, while the aggregate was the problem reported before this commit because column references get transformed first. The confusion exists for default expressions for a long time, and the problem is new as of v12 for partition bounds. Still per the lack of complaints on the matter no backpatch is done. The patch has been written by Amit Langote and me, and Tom Lane has provided the improvement of the documentation for default expressions on the CREATE TABLE page. Author: Amit Langote, Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20190326020853.GM2558@paquier.xyz
* Fix off-by-one error in txid_status().Thomas Munro2019-03-27
| | | | | | | | | | | | | The transaction ID returned by GetNextXidAndEpoch() is in the future, so we can't attempt to access its status or we might try to read a CLOG page that doesn't exist. The > vs >= confusion probably stemmed from the choice of a variable name containing the word "last" instead of "next", so fix that too. Back-patch to 10 where the function arrived. Author: Thomas Munro Discussion: https://postgr.es/m/CA%2BhUKG%2Buua_BV5cyfsioKVN2d61Lukg28ECsWTXKvh%3DBtN2DPA%40mail.gmail.com
* Switch some palloc/memset calls to palloc0Michael Paquier2019-03-27
| | | | | | | | | | | Some code paths have been doing some allocations followed by an immediate memset() to initialize the allocated area with zeros, this is a bit overkill as there are already interfaces to do both things in one call. Author: Daniel Gustafsson Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/vN0OodBPkKs7g2Z1uyk3CUEmhdtspHgYCImhlmSxv1Xn6nY1ZnaaGHL8EWUIQ-NEv36tyc4G5-uA3UXUF2l4sFXtK_EQgLN1hcgunlFVKhA=@yesql.se
* Switch function current_schema[s]() to be parallel-unsafeMichael Paquier2019-03-27
| | | | | | | | | | | | | | | | | | | | | When invoked for the first time in a session, current_schema() and current_schemas() can finish by creating a temporary schema. Currently those functions are parallel-safe, however if for a reason or another they get launched across multiple parallel workers, they would fail when attempting to create a temporary schema as temporary contexts are not supported in this case. The original issue has been spotted by buildfarm members crake and lapwing, after commit c5660e0 has introduced the first regression tests based on current_schema() in the tree. After that, 396676b has introduced a workaround to avoid parallel plans but that was not completely right either. Catversion is bumped. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20190118024618.GF1883@paquier.xyz
* Track unowned relations in doubly-linked listTomas Vondra2019-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | Relations dropped in a single transaction are tracked in a list of unowned relations. With large number of dropped relations this resulted in poor performance at the end of a transaction, when the relations are removed from the singly linked list one by one. Commit b4166911 attempted to address this issue (particularly when it happens during recovery) by removing the relations in a reverse order, resulting in O(1) lookups in the list of unowned relations. This did not work reliably, though, and it was possible to trigger the O(N^2) behavior in various ways. Instead of trying to remove the relations in a specific order with respect to the linked list, which seems rather fragile, switch to a regular doubly linked. That allows us to remove relations cheaply no matter where in the list they are. As b4166911 was a bugfix, backpatched to all supported versions, do the same thing here. Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/flat/80c27103-99e4-1d0c-642c-d9f3b94aaa0a%402ndquadrant.com Backpatch-through: 9.4
* Compute XID horizon for page level index vacuum on primary.Andres Freund2019-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously the xid horizon was only computed during WAL replay. That had two major problems: 1) It relied on knowing what the table pointed to looks like. That was easy enough before the introducing of tableam (we knew it had to be heap, although some trickery around logging the heap relfilenodes was required). But to properly handle table AMs we need per-database catalog access to look up the AM handler, which recovery doesn't allow. 2) Not knowing the xid horizon also makes it hard to support logical decoding on standbys. When on a catalog table, we need to be able to conflict with slots that have an xid horizon that's too old. But computing the horizon by visiting the heap only works once consistency is reached, but we always need to be able to detect conflicts. There's also a secondary problem, in that the current method performs redundant work on every standby. But that's counterbalanced by potentially computing the value when not necessary (either because there's no standby, or because there's no connected backends). Solve 1) and 2) by moving computation of the xid horizon to the primary and by involving tableam in the computation of the horizon. To address the potentially increased overhead, increase the efficiency of the xid horizon computation for heap by sorting the tids, and eliminating redundant buffer accesses. When prefetching is available, additionally perform prefetching of buffers. As this is more of a maintenance task, rather than something routinely done in every read only query, we add an arbitrary 10 to the effective concurrency - thereby using IO concurrency, when not globally enabled. That's possibly not the perfect formula, but seems good enough for now. Bumps WAL format, as latestRemovedXid is now part of the records, and the heap's relfilenode isn't anymore. Author: Andres Freund, Amit Khandekar, Robert Haas Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de https://postgr.es/m/20181214014235.dal5ogljs3bmlq44@alap3.anarazel.de https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Fix partitioned index creation bug with dropped columnsAlvaro Herrera2019-03-26
| | | | | | | | | | | | | | | | | | | ALTER INDEX .. ATTACH PARTITION fails if the partitioned table where the index is defined contains more dropped columns than its partition, with this message: ERROR: incorrect attribute map The cause was that one caller of CompareIndexInfo was passing the number of attributes of the partition rather than the parent, which confused the length check. Repair. This can cause pg_upgrade to fail when used on such a database. Leave some more objects around after regression tests, so that the case is detected by pg_upgrade test suite. Remove some spurious empty lines noticed while looking for other cases of the same problem. Discussion: https://postgr.es/m/20190326213924.GA2322@alvherre.pgsql
* Build "other rels" of appendrel baserels in a separate step.Tom Lane2019-03-26
| | | | | | | | | | | | | | | | | | | | | | | Up to now, otherrel RelOptInfos were built at the same time as baserel RelOptInfos, thanks to recursion in build_simple_rel(). However, nothing in query_planner's preprocessing cares at all about otherrels, only baserels, so we don't really need to build them until just before we enter make_one_rel. This has two benefits: * create_lateral_join_info did a lot of extra work to propagate lateral-reference information from parents to the correct children. But if we delay creation of the children till after that, it's trivial (and much harder to break, too). * Since we have all the restriction quals correctly assigned to parent appendrels by this point, it'll be possible to do plan-time pruning and never make child RelOptInfos at all for partitions that can be pruned away. That's not done here, but will be later on. Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen, Yoshikazu Imai, and David Rowley Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Add ORDER BY to more ICU regression test cases.Tom Lane2019-03-26
| | | | | Commit c77e12208 didn't fully fix the problem. Per buildfarm and local testing.
* Fix oversight in data-type change for autovacuum_vacuum_cost_delay.Tom Lane2019-03-26
| | | | | | | | | | | | | Commit caf626b2c missed that the relevant reloptions entry needs to be moved from the intRelOpts[] array to realRelOpts[]. Somewhat surprisingly, it seems to work anyway, perhaps because the desired default and limit values are all integers. We ought to have either a simpler data structure or better cross-checking here, but that's for another patch. Nikolay Shaplov Discussion: https://postgr.es/m/4861742.12LTaSB3sv@x200m
* psql: Schema-qualify typecast in one \d queryAlvaro Herrera2019-03-26
| | | | Bug introduced in my commit bc87f22ef6ef
* Get rid of duplicate child RTE for a partitioned table.Tom Lane2019-03-26
| | | | | | | | | | | | | | | | | | We've been creating duplicate RTEs for partitioned tables just because we do so for regular inheritance parent tables. But unlike regular-inheritance parents which are themselves regular tables and thus need to be scanned, partitioned tables don't need the extra RTE. This makes the conditions for building a child RTE the same as those for building an AppendRelInfo, allowing minor simplification in expand_single_inheritance_child. Since the planner's actual processing is driven off the AppendRelInfo list, nothing much changes beyond that, we just have one fewer useless RTE entry. Amit Langote, reviewed and hacked a bit by me Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Improve psql's \d display of foreign key constraintsAlvaro Herrera2019-03-26
| | | | | | | | | | | | | | | | | When used on a partition containing foreign keys coming from one of its ancestors, \d would (rather unhelpfully) print the details about the pg_constraint row in the partition. This becomes a bit frustrating when the user tries things like dropping the FK in the partition; instead, show the details for the foreign key on the table where it is defined. Also, when a table is referenced by a foreign key on a partitioned table, we would show multiple "Referenced by" lines, one for each partition, which gets unwieldy pretty fast. Modify that so that it shows only one line for the ancestor partitioned table where the FK is defined. Discussion: https://postgr.es/m/20181204143834.ym6euxxxi5aeqdpn@alvherre.pgsql Reviewed-by: Tom Lane, Amit Langote, Peter Eisentraut