aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/cache
Commit message (Collapse)AuthorAge
...
* Generated columnsPeter Eisentraut2019-03-30
| | | | | | | | | | | | | | This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
* tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.Andres Freund2019-03-28
| | | | | | | | | | | | | | | | This moves the responsibility for: - creating the storage necessary for a relation, including creating a new relfilenode for a relation with existing storage - non-transactional truncation of a relation - VACUUM FULL / CLUSTER's rewrite of a table below tableam. This is fairly straight forward, with a bit of complexity smattered in to move the computation of xid / multixid horizons below the AM, as they don't make sense for every table AM. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Collations with nondeterministic comparisonPeter Eisentraut2019-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
* Further reduce memory footprint of CLOBBER_CACHE_ALWAYS testing.Tom Lane2019-03-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some buildfarm members using CLOBBER_CACHE_ALWAYS have been having OOM problems of late. Commit 2455ab488 addressed this problem by recovering space transiently used within RelationBuildPartitionDesc, but it turns out that leaves quite a lot on the table, because other subroutines of RelationBuildDesc also leak memory like mad. Let's move the temp-context management into RelationBuildDesc so that leakage from the other subroutines is also recovered. I examined this issue by arranging for postgres.c to dump the size of MessageContext just before resetting it in each command cycle, and then running the update.sql regression test (which is one of the two that are seeing buildfarm OOMs) with and without CLOBBER_CACHE_ALWAYS. Before 2455ab488, the peak space usage with CCA was as much as 250MB. That patch got it down to ~80MB, but with this patch it's about 0.5MB, and indeed the space usage now seems nearly indistinguishable from a non-CCA build. RelationBuildDesc's traditional behavior of not worrying about leaking transient data is of many years' standing, so I'm pretty hesitant to change that without more evidence that it'd be useful in a normal build. (So far as I can see, non-CCA memory consumption is about the same with or without this change, whuch if anything suggests that it isn't useful.) Hence, configure the patch so that we recover space only when CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVELY is defined. However, that choice can be overridden at compile time, in case somebody would like to do some performance testing and try to develop evidence for changing that decision. It's possible that we ought to back-patch this change, but in the absence of back-branch OOM problems in the buildfarm, I'm not in a hurry to do that. Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
* Fix typos in commit 8586bf7ed8.Amit Kapila2019-03-11
| | | | | Author: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KNv1Mg2krf4E9ssWFnE=8A9mZ1VbVywXBZTFSzb+wP2g@mail.gmail.com
* Move hash_any prototype from access/hash.h to utils/hashutils.hAlvaro Herrera2019-03-11
| | | | | | | | | | | | | | | | | | | | | ... as well as its implementation from backend/access/hash/hashfunc.c to backend/utils/hash/hashfn.c. access/hash is the place for the hash index AM, not really appropriate for generic facilities, which is what hash_any is; having things the old way meant that anything using hash_any had to include the AM's include file, pointlessly polluting its namespace with unrelated, unnecessary cruft. Also move the HTEqual strategy number to access/stratnum.h from access/hash.h. To avoid breaking third-party extension code, add an #include "utils/hashutils.h" to access/hash.h. (An easily removed line by committers who enjoy their asbestos suits to protect them from angry extension authors.) Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql
* Tighten use of OpenTransientFile and CloseTransientFileMichael Paquier2019-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two sets of issues related to the use of transient files in the backend: 1) OpenTransientFile() has been used in some code paths with read-write flags while read-only is sufficient, so switch those calls to be read-only where necessary. These have been reported by Joe Conway. 2) When opening transient files, it is up to the caller to close the file descriptors opened. In error code paths, CloseTransientFile() gets called to clean up things before issuing an error. However in normal exit paths, a lot of callers of CloseTransientFile() never actually reported errors, which could leave a file descriptor open without knowing about it. This is an issue I complained about a couple of times, but never had the courage to write and submit a patch, so here we go. Note that one frontend code path is impacted by this commit so as an error is issued when fetching control file data, making backend and frontend to be treated consistently. Reported-by: Joe Conway, Michael Paquier Author: Michael Paquier Reviewed-by: Álvaro Herrera, Georgios Kokolatos, Joe Conway Discussion: https://postgr.es/m/20190301023338.GD1348@paquier.xyz Discussion: https://postgr.es/m/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com
* Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.Robert Haas2019-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if lookup up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
* tableam: introduce table AM infrastructure.Andres Freund2019-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces the concept of table access methods, i.e. CREATE ACCESS METHOD ... TYPE TABLE and CREATE TABLE ... USING (storage-engine). No table access functionality is delegated to table AMs as of this commit, that'll be done in following commits. Subsequent commits will incrementally abstract table access functionality to be routed through table access methods. That change is too large to be reviewed & committed at once, so it'll be done incrementally. Docs will be updated at the end, as adding them incrementally would likely make them less coherent, and definitely is a lot more work, without a lot of benefit. Table access methods are specified similar to index access methods, i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with callbacks. In contrast to index AMs that struct needs to live as long as a backend, typically that's achieved by just returning a pointer to a constant struct. Psql's \d+ now displays a table's access method. That can be disabled with HIDE_TABLEAM=true, which is mainly useful so regression tests can be run against different AMs. It's quite possible that this behaviour still needs to be fine tuned. For now it's not allowed to set a table AM for a partitioned table, as we've not resolved how partitions would inherit that. Disallowing allows us to introduce, if we decide that's the way forward, such a behaviour without a compatibility break. Catversion bumped, to add the heap table AM and references to it. Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
* Move code for managing PartitionDescs into a new file, partdesc.cRobert Haas2019-02-21
| | | | | | | | | | This is similar in spirit to the existing partbounds.c file in the same directory, except that there's a lot less code in the new file created by this commit. Pending work in this area proposes to add a bunch more code related to PartitionDescs, though, and this will give us a good place to put it. Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
* Use varargs macro for CACHEDEBUGPeter Eisentraut2019-02-19
| | | | Reviewed-by: Andres Freund <andres@anarazel.de>
* Build out the planner support function infrastructure.Tom Lane2019-02-09
| | | | | | | | | | | | | | | | | | | | | | | | Add support function requests for estimating the selectivity, cost, and number of result rows (if a SRF) of the target function. The lack of a way to estimate selectivity of a boolean-returning function in WHERE has been a recognized deficiency of the planner since Berkeley days. This commit finally fixes it. In addition, non-constant estimates of cost and number of output rows are now possible. We still fall back to looking at procost and prorows if the support function doesn't service the request, of course. To make concrete use of the possibility of estimating output rowcount for SRFs, this commit adds support functions for array_unnest(anyarray) and the integer variants of generate_series; the lack of plausible rowcount estimates for those, even when it's obvious to a human, has been a repeated subject of complaints. Obviously, much more could now be done in this line, but I'm mostly just trying to get the infrastructure in place. Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
* Fix a crash in logical replicationPeter Eisentraut2019-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bug was that determining which columns are part of the replica identity index using RelationGetIndexAttrBitmap() would run eval_const_expressions() on index expressions and predicates across all indexes of the table, which in turn might require a snapshot, but there wasn't one set, so it crashes. There were actually two separate bugs, one on the publisher and one on the subscriber. To trigger the bug, a table that is part of a publication or subscription needs to have an index with a predicate or expression that lends itself to constant expressions simplification. The fix is to avoid the constant expressions simplification in RelationGetIndexAttrBitmap(), so that it becomes safe to call in these contexts. The constant expressions simplification comes from the calls to RelationGetIndexExpressions()/RelationGetIndexPredicate() via BuildIndexInfo(). But RelationGetIndexAttrBitmap() calling BuildIndexInfo() is overkill. The latter just takes pg_index catalog information, packs it into the IndexInfo structure, which former then just unpacks again and throws away. We can just do this directly with less overhead and skip the troublesome calls to eval_const_expressions(). This also removes the awkward cross-dependency between relcache.c and index.c. Bug: #15114 Reported-by: Петър Славов <pet.slavov@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/152110589574.1223.17983600132321618383@wrigleys.postgresql.org/
* Refactor planner's header files.Tom Lane2019-01-29
| | | | | | | | | | | | | | | | | | | | | | | | Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
* Make some small planner API cleanups.Tom Lane2019-01-29
| | | | | | | | | | | | | | | | | | | | | | Move a few very simple node-creation and node-type-testing functions from the planner's clauses.c to nodes/makefuncs and nodes/nodeFuncs. There's nothing planner-specific about them, as evidenced by the number of other places that were using them. While at it, rename and_clause() etc to is_andclause() etc, to clarify that they are node-type-testing functions not node-creation functions. And use "static inline" implementations for the shortest ones. Also, modify flatten_join_alias_vars() and some subsidiary functions to take a Query not a PlannerInfo to define the join structure that Vars should be translated according to. They were only using the "parse" field of the PlannerInfo anyway, so this just requires removing one level of indirection. The advantage is that now parse_agg.c can use flatten_join_alias_vars() without the horrid kluge of creating an incomplete PlannerInfo, which will allow that file to be decoupled from relation.h in a subsequent patch. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
* Fix misc typos in comments.Heikki Linnakangas2019-01-23
| | | | | | Spotted mostly by Fabien Coelho. Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1901230947050.16643@lancre
* Rename RelationData.rd_amroutine to rd_indam.Andres Freund2019-01-21
| | | | | | | | | The upcoming table AM support makes rd_amroutine to generic, as its only about index AMs. The new name makes that clear, and is shorter to boot. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Rephrase references to "time qualification".Andres Freund2019-01-21
| | | | | | | | | Now that the relevant code has, for other reasons, moved out of tqual.[ch], it seems time to refer to visiblity rather than time qualification. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.Andres Freund2019-01-21
| | | | | | | | | | | | | | Given these routines are heap specific, and that there will be more generic visibility support in via table AM, it makes sense to move the prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will not be exposed in a generic fashion, because they are too storage specific). Similarly, the code in tqual.c is specific to heap, so moving it into access/heap/ makes sense. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Remove useless bms_copy step in RelationGetIndexAttrBitmap.Tom Lane2019-01-21
| | | | | | | | | | Seems to be from a bad case of copy-and-paste-itis in commit 665d1fad9. It wouldn't be quite so annoying if it didn't contradict the comment half a dozen lines above. David Rowley Discussion: https://postgr.es/m/CAKJS1f95Dyf8Qkdz4W+PbCmT-HTb54tkqUCC8isa2RVgSJ_pXQ@mail.gmail.com
* Flush relcache entries when their FKs are meddled withAlvaro Herrera2019-01-21
| | | | | | | | | | | | | | | Back in commit 100340e2dcd0, we made relcache entries keep lists of the foreign keys applying to the relation -- but we forgot to update CacheInvalidateHeapTuple to flush those entries when new FKs got created or existing ones updated/deleted. No bugs appear to have been reported that would be explained by this ommission, but I noticed the problem while working on an unrelated bugfix which clearly showed it. Fix by adding relcache flush on relevant foreign key changes. Backpatch to 9.6, like the aforementioned commit. Discussion: https://postgr.es/m/201901211927.7mmhschxlejh@alvherre.pgsql Reviewed-by: Tom Lane
* Remove superfluous tqual.h includes.Andres Freund2019-01-21
| | | | | | | | | | | | Most of these had been obsoleted by 568d4138c / the SnapshotNow removal. This is is preparation for moving most of tqual.[ch] into either snapmgr.h or heapam.h, which in turn is in preparation for pluggable table AMs. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Replace uses of heap_open et al with the corresponding table_* function.Andres Freund2019-01-21
| | | | | Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Replace heapam.h includes with {table, relation}.h where applicable.Andres Freund2019-01-21
| | | | | | | | | A lot of files only included heapam.h for relation_open, heap_open etc - replace the heapam.h include in those files with the narrower header. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Refactor duplicate code into DeconstructFkConstraintRowAlvaro Herrera2019-01-18
| | | | | | | | | | | My commit 3de241dba86f introduced some code (in tablecmds.c) to obtain data from a pg_constraint row for a foreign key, that already existed in ri_triggers.c. Split it out into its own routine in pg_constraint.c, where it naturally belongs. No functional code changes, only code movement. Backpatch to pg11, because a future bugfix is simpler after this.
* Finish reverting "recheck_on_update" patch.Tom Lane2019-01-15
| | | | | | | | | | | | This reverts commit c203d6cf8 and some follow-on fixes, completing the task begun in commit 5d28c9bd7. If that feature is ever resurrected, the code will look quite a bit different from this, so it seems best to start from a clean slate. The v11 branch is not touched; in that branch, the recheck_on_update storage option remains present, but nonfunctional and undocumented. Discussion: https://postgr.es/m/20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de
* Don't include heapam.h from others headers.Andres Freund2019-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | heapam.h previously was included in a number of widely used headers (e.g. execnodes.h, indirectly in executor.h, ...). That's problematic on its own, as heapam.h contains a lot of low-level details that don't need to be exposed that widely, but becomes more problematic with the upcoming introduction of pluggable table storage - it seems inappropriate for heapam.h to be included that widely afterwards. heapam.h was largely only included in other headers to get the HeapScanDesc typedef (which was defined in heapam.h, even though HeapScanDescData is defined in relscan.h). The better solution here seems to be to just use the underlying struct (forward declared where necessary). Similar for BulkInsertState. Another problem was that LockTupleMode was used in executor.h - parts of the file tried to cope without heapam.h, but due to the fact that it indirectly included it, several subsequent violations of that goal were not not noticed. We could just reuse the approach of declaring parameters as int, but it seems nicer to move LockTupleMode to lockoptions.h - that's not a perfect location, but also doesn't seem bad. As a number of files relied on implicitly included heapam.h, a significant number of files grew an explicit include. It's quite probably that a few external projects will need to do the same. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
* Don't create relfilenode for relations without storageAlvaro Herrera2019-01-04
| | | | | | | | | | | | Some relation kinds had relfilenode set to some non-zero value, but apparently the actual files did not really exist because creation was prevented elsewhere. Get rid of the phony pg_class.relfilenode values. Catversion bumped, but only because the sanity_test check will fail if run in a system initdb'd with the previous version. Reviewed-by: Kyotaro HORIGUCHI, Michael Paquier Discussion: https://postgr.es/m/20181206215552.fm2ypuxq6nhpwjuc@alvherre.pgsql
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Remove obsolete IndexIs* macrosPeter Eisentraut2018-12-27
| | | | | | | | | Remove IndexIsValid(), IndexIsReady(), IndexIsLive() in favor of accessing the index structure directly. These macros haven't been used consistently, and the original reason of maintaining source compatibility with PostgreSQL 9.2 is gone. Discussion: https://www.postgresql.org/message-id/flat/d419147c-09d4-6196-5d9d-0234b230880a%402ndquadrant.com
* Make collation-aware system catalog columns use "C" collation.Tom Lane2018-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now we allowed text columns in system catalogs to use collation "default", but that isn't really safe because it might mean something different in template0 than it means in a database cloned from template0. In particular, this could mean that cloned pg_statistic entries for such columns weren't entirely valid, possibly leading to bogus planner estimates, though (probably) not any outright failures. In the wake of commit 5e0928005, a better solution is available: if we label such columns with "C" collation, then their pg_statistic entries will also use that collation and hence will be valid independently of the database collation. This also provides a cleaner solution for indexes on such columns than the hack added by commit 0b28ea79c: the indexes will naturally inherit "C" collation and don't have to be forced to use text_pattern_ops. Also, with the planned improvement of type "name" to be collation-aware, this policy will apply cleanly to both text and name columns. Because of the pg_statistic angle, we should also apply this policy to the tables in information_schema. This patch does that by adjusting information_schema's textual domain types to specify "C" collation. That has the user-visible effect that order-sensitive comparisons to textual information_schema view columns will now use "C" collation by default. The SQL standard says that the collation of those view columns is implementation-defined, so I think this is legal per spec. At some point this might allow for translation of such comparisons into indexable conditions on the underlying "name" columns, although additional work will be needed before that can happen. Discussion: https://postgr.es/m/19346.1544895309@sss.pgh.pa.us
* Make pg_statistic and related code account more honestly for collations.Tom Lane2018-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we first put in collations support, we basically punted on teaching pg_statistic, ANALYZE, and the planner selectivity functions about that. They've just used DEFAULT_COLLATION_OID independently of the actual collation of the data. It's time to improve that, so: * Add columns to pg_statistic that record the specific collation associated with each statistics slot. * Teach ANALYZE to use the column's actual collation when comparing values for statistical purposes, and record this in the appropriate slot. (Note that type-specific typanalyze functions are now expected to fill stats->stacoll with the appropriate collation, too.) * Teach assorted selectivity functions to use the actual collation of the stats they are looking at, instead of just assuming it's DEFAULT_COLLATION_OID. This should give noticeably better results in selectivity estimates for columns with nondefault collations, at least for query clauses that use that same collation (which would be the default behavior in most cases). It's still true that comparisons with explicit COLLATE clauses different from the stored data's collation won't be well-estimated, but that's no worse than before. Also, this patch does make the first step towards doing better with that, which is that it's now theoretically possible to collect stats for a collation other than the column's own collation. Patch by me; thanks to Peter Eisentraut for review. Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
* Drop no-op CoerceToDomain nodes from expressions at planning time.Tom Lane2018-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a domain has no constraints, then CoerceToDomain doesn't really do anything and can be simplified to a RelabelType. This not only eliminates cycles at execution, but allows the planner to optimize better (for instance, match the coerced expression to an index on the underlying column). However, we do have to support invalidating the plan later if a constraint gets added to the domain. That's comparable to the case of a change to a SQL function that had been inlined into a plan, so all the necessary logic already exists for plans depending on functions. We need only duplicate or share that logic for domains. ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval messages for the domain's pg_type entry, since those operations don't update that row. (ALTER DOMAIN SET/DROP NOT NULL do update that row, so no code change is needed for them.) Testing this revealed what's really a pre-existing bug in plpgsql: it caches the SQL-expression-tree expansion of type coercions and had no provision for invalidating entries in that cache. Up to now that was only a problem if such an expression had inlined a SQL function that got changed, which is unlikely though not impossible. But failing to track changes of domain constraints breaks an existing regression test case and would likely cause practical problems too. We could fix that locally in plpgsql, but what seems like a better idea is to build some generic infrastructure in plancache.c to store standalone expressions and track invalidation events for them. (It's tempting to wonder whether plpgsql's "simple expression" stuff could use this code with lower overhead than its current use of the heavyweight plancache APIs. But I've left that idea for later.) Other stuff fixed in passing: * Allow estimate_expression_value() to drop CoerceToDomain unconditionally, effectively assuming that the coercion will succeed. This will improve planner selectivity estimates for cases involving estimatable expressions that are coerced to domains. We could have done this independently of everything else here, but there wasn't previously any need for eval_const_expressions_mutator to know about CoerceToDomain at all. * Use a dlist for plancache.c's list of cached plans, rather than a manually threaded singly-linked list. That eliminates a potential performance problem in DropCachedPlan. * Fix a couple of inconsistencies in typecmds.c about whether operations on domains drop RowExclusiveLock on pg_type. Our common practice is that DDL operations do drop catalog locks, so standardize on that choice. Discussion: https://postgr.es/m/19958.1544122124@sss.pgh.pa.us
* Remove WITH OIDS support, change oid catalog column visibility.Andres Freund2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
* Reduce unnecessary list construction in RelationBuildPartitionDesc.Robert Haas2018-11-19
| | | | | | | | | | | | | | | The 'partoids' list which was constructed by the previous version of this code was necessarily identical to 'inhoids'. There's no point to duplicating the list, so avoid that. Instead, construct the array representation directly from the original 'inhoids' list. Also, use an array rather than a list for 'boundspecs'. We know exactly how many items we need to store, so there's really no reason to use a list. Using an array instead reduces the number of memory allocations we perform. Patch by me, reviewed by Michael Paquier and Amit Langote, the latter of whom also helped with rebasing.
* PANIC on fsync() failure.Thomas Munro2018-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data. Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files. Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic. Back-patch to all supported releases. There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch. Author: Craig Ringer, with some adjustments by Thomas Munro Reported-by: Craig Ringer Reviewed-by: Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
* Redesign initialization of partition routing structuresAlvaro Herrera2018-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables. This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly. Redesign The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo. The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts. In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table. Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera Testing help from Jesper Pedersen and Kato Sho. Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
* Refactor code creating PartitionBoundInfoMichael Paquier2018-11-14
| | | | | | | | | | | | | The code building PartitionBoundInfo based on the constituent partition data read from catalogs has been located in partcache.c, with a specific set of routines dedicated to bound types, like sorting or bound data creation. All this logic is moved to partbounds.c and relocates all the bound-specific logistic into it, with partition_bounds_create() as principal entry point. Author: Amit Langote Reviewed-by: Michael Paquier, Álvaro Herrera Discussion: https://postgr.es/m/3f289da8-6d10-75fe-814a-635e8b191d43@lab.ntt.co.jp
* Disable recheck_on_update optimization to avoid crashes.Tom Lane2018-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code added by commit c203d6cf8 causes a crash in at least one case, where a potentially-optimizable expression index has a storage type different from the input data type. A cursory code review turned up numerous other problems that seem impractical to fix on short notice. Andres argued for revert of that patch some time ago, and if additional senior committers had been paying attention, that's likely what would have happened, but we were not :-( At this point we can't just revert, at least not in v11, because that would mean an ABI break for code touching relcache entries. And we should not remove the (also buggy) support for the recheck_on_update index reloption, since it might already be used in some databases in the field. So this patch just does the as-little-invasive-as-possible measure of disabling the feature as though recheck_on_update were forced off for all indexes. I also removed the related regression tests (which would otherwise fail) and the user-facing documentation of the reloption. We should undertake a more thorough code cleanup if the patch can't be fixed, but not under the extreme time pressure of being already overdue for 11.1 release. Per report from Ondřej Bouda and subsequent private discussion among pgsql-release. Discussion: https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql
* Fix spelling errors and typos in commentsMagnus Hagander2018-11-02
| | | | Author: Daniel Gustafsson <daniel@yesql.se>
* Remove get_attidentity()Peter Eisentraut2018-10-23
| | | | | | | | All existing uses can get this information more easily from the relation descriptor, so the detour through the syscache is not necessary. Reviewed-by: Michael Paquier <michael@paquier.xyz>
* Remove get_atttypmod()Peter Eisentraut2018-10-23
| | | | | | | This has been unused since 2004. get_atttypetypmodcoll() is often a better alternative. Reviewed-by: Michael Paquier <michael@paquier.xyz>
* Correct attach/detach logic for FKs in partitionsAlvaro Herrera2018-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was no code to handle foreign key constraints on partitioned tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH a partition that already had an equivalent constraint, that one was ignored and a new constraint was created. Adding this to the fact that foreign key cloning reuses the constraint name on the partition instead of generating a new name (as it probably should, to cater to SQL standard rules about constraint naming within schemas), the result was a pretty poor user experience -- the most visible failure was that just detaching a partition and re-attaching it failed with an error such as ERROR: duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index" DETAIL: Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists. because it would try to create an identically-named constraint in the partition. To make matters worse, if you tried to drop the constraint in the now-independent partition, that would fail because the constraint was still seen as dependent on the constraint in its former parent partitioned table: ERROR: cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09" This fix attacks the problem from two angles: first, when the partition is detached, the constraint is also marked as independent, so the drop now works. Second, when the partition is re-attached, we scan existing constraints searching for one matching the FK in the parent, and if one exists, we link that one to the parent constraint. So we don't end up with a duplicate -- and better yet, we don't need to scan the referenced table to verify that the constraint holds. To implement this I made a small change to previously planner-only struct ForeignKeyCacheInfo to contain the constraint OID; also relcache now maintains the list of FKs for partitioned tables too. Backpatch to 11. Reported-by: Michael Vitale (bug #15425) Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org
* Change rewriter/planner/executor/plancache to depend on RTE rellockmode.Tom Lane2018-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | Instead of recomputing the required lock levels in all these places, just use what commit fdba460a2 made the parser store in the RTE fields. This already simplifies the code measurably in these places, and follow-on changes will remove a bunch of no-longer-needed infrastructure. In a few cases, this change causes us to acquire a higher lock level than we did before. This is OK primarily because said higher lock level should've been acquired already at query parse time; thus, we're saving a useless extra trip through the shared lock manager to acquire a lesser lock alongside the original lock. The only known exception to this is that re-execution of a previously planned SELECT FOR UPDATE/SHARE query, for a table that uses ROW_MARK_REFERENCE or ROW_MARK_COPY methods, might have gotten only AccessShareLock before. Now it will get RowShareLock like the first execution did, which seems fine. While there's more to do, push it in this state anyway, to let the buildfarm help verify that nothing bad happened. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* Create an RTE field to record the query's lock mode for each relation.Tom Lane2018-09-30
| | | | | | | | | | | | | | | | | | | | | | | | Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* Fix assorted bugs in pg_get_partition_constraintdef().Tom Lane2018-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It failed if passed a nonexistent relation OID, or one that was a non-heap relation, because of blindly applying heap_open to a user-supplied OID. This is not OK behavior for a SQL-exposed function; we have a project policy that we should return NULL in such cases. Moreover, since pg_get_partition_constraintdef ought now to work on indexes, restricting it to heaps is flat wrong anyway. The underlying function generate_partition_qual() wasn't on board with indexes having partition quals either, nor for that matter with rels having relispartition set but yet null relpartbound. (One wonders whether the person who wrote the function comment blocks claiming that these functions allow a missing relpartbound had ever tested it.) Fix by testing relispartition before opening the rel, and by using relation_open not heap_open. (If any other relkinds ever grow the ability to have relispartition set, the code will work with them automatically.) Also, don't reject null relpartbound in generate_partition_qual. Back-patch to v11, and all but the null-relpartbound change to v10. (It's not really necessary to change generate_partition_qual at all in v10, but I thought s/heap_open/relation_open/ would be a good idea anyway just to keep the code in sync with later branches.) Per report from Justin Pryzby. Discussion: https://postgr.es/m/20180927200020.GJ776@telsasoft.com
* Add support for nearest-neighbor (KNN) searches to SP-GiSTAlexander Korotkov2018-09-19
| | | | | | | | | | | | | Currently, KNN searches were supported only by GiST. SP-GiST also capable to support them. This commit implements that support. SP-GiST scan stack is replaced with queue, which serves as stack if no ordering is specified. KNN support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops and poly_ops (catversion is bumped). Some common parts between GiST and SP-GiST KNNs are extracted into separate functions. Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov Review: Andrey Borodin, Alexander Korotkov
* Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.Tom Lane2018-09-07
| | | | | | | | | | | | It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels. Back-patch to all supported branches, like the preceding patch. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
* Simplify partitioned table creation vs. relcacheAlvaro Herrera2018-09-05
| | | | | | | | | | | | | | | | | | | In the original code, we were storing the pg_inherits row for a partitioned table too early: enough that we had a hack for relcache to avoid falling flat on its face while reading such a partial entry. If we finish the pg_class creation first and *then* store the pg_inherits entry, we don't need that hack. Also recognize that pg_class.relpartbound is not marked NOT NULL and therefore it's entirely possible to read null values, so having only Assert() protection isn't enough. Change those to if/elog tests instead. This qualifies as a robustness fix, so backpatch to pg11. In passing, remove one access that wasn't actually needed, and reword one message to be like all the others that check for the same thing. Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql
* Fully enforce uniqueness of constraint names.Tom Lane2018-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de