aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/cache/relcache.c
Commit message (Collapse)AuthorAge
...
* Update some obsolete comments.Tom Lane2017-03-26
| | | | | Fix a few stray references to expression eval functions that don't exist anymore or don't take the same input representation they used to.
* Implement multivariate n-distinct coefficientsAlvaro Herrera2017-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for explicitly declared statistic objects (CREATE STATISTICS), allowing collection of statistics on more complex combinations that individual table columns. Companion commands DROP STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are added too. All this DDL has been designed so that more statistic types can be added later on, such as multivariate most-common-values and multivariate histograms between columns of a single table, leaving room for permitting columns on multiple tables, too, as well as expressions. This commit only adds support for collection of n-distinct coefficient on user-specified sets of columns in a single table. This is useful to estimate number of distinct groups in GROUP BY and DISTINCT clauses; estimation errors there can cause over-allocation of memory in hashed aggregates, for instance, so it's a worthwhile problem to solve. A new special pseudo-type pg_ndistinct is used. (num-distinct estimation was deemed sufficiently useful by itself that this is worthwhile even if no further statistic types are added immediately; so much so that another version of essentially the same functionality was submitted by Kyotaro Horiguchi: https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp though this commit does not use that code.) Author: Tomas Vondra. Some code rework by Álvaro. Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes, Ideriha Takeshi Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
* hash: Add write-ahead logging support.Robert Haas2017-03-14
| | | | | | | | | | | | | | | | | | The warning about hash indexes not being write-ahead logged and their use being discouraged has been removed. "snapshot too old" is now supported for tables with hash indexes. Most importantly, barring bugs, hash indexes will now be crash-safe and usable on standbys. This commit doesn't yet add WAL consistency checking for hash indexes, as we now have for other index types; a separate patch has been submitted to cure that lack. Amit Kapila, reviewed and slightly modified by me. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen. Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
* Avoid returning stale attribute bitmaps in RelationGetIndexAttrBitmap().Tom Lane2017-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with the original coding here is that we might receive (and clear) a relcache invalidation signal for the target relation down inside one of the index_open calls we're doing. Since the target is open, we would not drop the relcache entry, just reset its rd_indexvalid and rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and would eventually cache and return potentially-obsolete attribute bitmaps. The case where this matters is where the inval signal was from a CREATE INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed column. (In all other cases, the lock we hold on the target rel should prevent any concurrent change in index state.) Even just returning the stale attribute bitmap is not such a problem, because it shouldn't matter during the transaction in which we receive the signal. What hurts is caching the stale data, because it can survive into later transactions, breaking CREATE INDEX CONCURRENTLY's expectation that later transactions will not create new broken HOT chains. The upshot is that there's a window for building corrupted indexes during CREATE INDEX CONCURRENTLY. This patch fixes the problem by rechecking that the set of index OIDs is still the same at the end of RelationGetIndexAttrBitmap() as it was at the start. If not, we loop back and try again. That's a little more than is strictly necessary to fix the bug --- in principle, we could return the stale data but not cache it --- but it seems like a bad idea on general principles for relcache to return data it knows is stale. There might be more hazards of the same ilk, or there might be a better way to fix this one, but this patch definitely improves matters and seems unlikely to make anything worse. So let's push it into today's releases even as we continue to study the problem. Pavan Deolasee and myself Discussion: https://postgr.es/m/CABOikdM2MUq9cyZJi1KyLmmkCereyGp5JQ4fuwKoyKEde_mzkQ@mail.gmail.com
* Update comment in relcache.c.Tom Lane2017-02-06
| | | | | | Commit 665d1fad9 introduced rd_pkindex, and made RelationGetIndexList responsible for updating it, but didn't bother to fix RelationGetIndexList's header comment to say so.
* Fix typos in comments.Heikki Linnakangas2017-02-06
| | | | | | | | | Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
* Tweak catalog indexing abstraction for upcoming WARMAlvaro Herrera2017-01-31
| | | | | | | | | | | | | | | | | | | | | Split the existing CatalogUpdateIndexes into two different routines, CatalogTupleInsert and CatalogTupleUpdate, which do both the heap insert/update plus the index update. This removes over 300 lines of boilerplate code all over src/backend/catalog/ and src/backend/commands. The resulting code is much more pleasing to the eye. Also, by encapsulating what happens in detail during an UPDATE, this facilitates the upcoming WARM patch, which is going to add a few more lines to the update case making the boilerplate even more boring. The original CatalogUpdateIndexes is removed; there was only one use left, and since it's just three lines, we can as well expand it in place there. We could keep it, but WARM is going to break all the UPDATE out-of-core callsites anyway, so there seems to be no benefit in doing so. Author: Pavan Deolasee Discussion: https://www.postgr.es/m/CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com
* Logical replicationPeter Eisentraut2017-01-20
| | | | | | | | | | | | | - Add PUBLICATION catalogs and DDL - Add SUBSCRIPTION catalog and DDL - Define logical replication protocol and output plugin - Add logical replication workers From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Implement table partitioning.Robert Haas2016-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
* Add macros to make AllocSetContextCreate() calls simpler and safer.Tom Lane2016-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
* Restore foreign-key-aware estimation of join relation sizes.Tom Lane2016-06-18
| | | | | | | | | | | | | | | | | | | | This patch provides a new implementation of the logic added by commit 137805f89 and later removed by 77ba61080. It differs from the original primarily in expending much less effort per joinrel in large queries, which it accomplishes by doing most of the matching work once per query not once per joinrel. Hopefully, it's also less buggy and better commented. The never-documented enable_fkey_estimates GUC remains gone. There remains work to be done to make the selectivity estimates account for nulls in FK referencing columns; but that was true of the original patch as well. We may be able to address this point later in beta. In the meantime, any error should be in the direction of overestimating rather than underestimating joinrel sizes, which seems like the direction we want to err in. Tomas Vondra and Tom Lane Discussion: <31041.1465069446@sss.pgh.pa.us>
* Refactor to reduce code duplication for function property checking.Tom Lane2016-06-10
| | | | | | | | | | | | | | | | | | | | | | | As noted by Andres Freund, we'd accumulated quite a few similar functions in clauses.c that examine all functions in an expression tree to see if they satisfy some boolean test. Reduce the duplication by inventing a function check_functions_in_node() that applies a simple callback function to each SQL function OID appearing in a given expression node. This also fixes some arguable oversights; for example, contain_mutable_functions() did not check aggregate or window functions for mutability. I doubt that that represents a live bug at the moment, because we don't really consider mutability for aggregates; but it might someday be one. I chose to put check_functions_in_node() in nodeFuncs.c because it seemed like other modules might wish to use it in future. That in turn forced moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean. In passing, teach contain_leaked_vars_walker() about a few more expression node types it can safely look through, and improve the rather messy and undercommented code in has_parallel_hazard_walker(). Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>
* Improve the situation for parallel query versus temp relations.Tom Lane2016-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transmit the leader's temp-namespace state to workers. This is important because without it, the workers do not really have the same search path as the leader. For example, there is no good reason (and no extant code either) to prevent a worker from executing a temp function that the leader created previously; but as things stood it would fail to find the temp function, and then either fail or execute the wrong function entirely. We still prohibit a worker from creating a temp namespace on its own. In effect, a worker can only see the session's temp namespace if the leader had created it before starting the worker, which seems like the right semantics. Also, transmit the leader's BackendId to workers, and arrange for workers to use that when determining the physical file path of a temp relation belonging to their session. While the original intent was to prevent such accesses entirely, there were a number of holes in that, notably in places like dbsize.c which assume they can safely access temp rels of other sessions anyway. We might as well get this right, as a small down payment on someday allowing workers to access the leader's temp tables. (With this change, directly using "MyBackendId" as a relation or buffer backend ID is deprecated; you should use BackendIdForTempRelations() instead. I left a couple of such uses alone though, as they're not going to be reachable in parallel workers until we do something about localbuf.c.) Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down into localbuf.c, which is where it actually matters, instead of having it in relation_open(). This amounts to recognizing that access to temp tables' catalog entries is perfectly safe in a worker, it's only the data in local buffers that is problematic. Having done all that, we can get rid of the test in has_parallel_hazard() that says that use of a temp table's rowtype is unsafe in parallel workers. That test was unduly expensive, and if we really did need such a prohibition, that was not even close to being a bulletproof guard for it. (For example, any user-defined function executed in a parallel worker might have attempted such access.)
* pgindent run for 9.6Robert Haas2016-06-09
|
* Revert "Use Foreign Key relationships to infer multi-column join selectivity".Tom Lane2016-06-07
| | | | | | | | | | | | | | This commit reverts 137805f89 as well as the associated commits 015e88942, 5306df283, and 68d704edb. We found multiple bugs in this feature, and there was concern about possible planner slowdown (though to be fair, exhibiting a very large slowdown proved difficult). The way forward requires a considerable rewrite, which may or may not be possible to accomplish in time for beta2. In my judgment reviewing the rewrite will be easier to accomplish starting from a clean slate, so let's temporarily revert what's there now. This also leaves us in a safe state if it turns out to be necessary to postpone the rewrite to the next development cycle. Discussion: <20160429102531.GA13701@huehner.biz>
* Fix hash index vs "snapshot too old" problemmsKevin Grittner2016-05-06
| | | | | | | | | | | | | | | | Hash indexes are not WAL-logged, and so do not maintain the LSN of index pages. Since the "snapshot too old" feature counts on detecting error conditions using the LSN of a table and all indexes on it, this makes it impossible to safely do early vacuuming on any table with a hash index, so add this to the tests for whether the xid used to vacuum a table can be adjusted based on old_snapshot_threshold. While at it, add a paragraph to the docs for old_snapshot_threshold which specifically mentions this and other aspects of the feature which may otherwise surprise users. Problem reported and patch reviewed by Amit Kapila
* Revert CREATE INDEX ... INCLUDING ...Teodor Sigaev2016-04-08
| | | | | | It's not ready yet, revert two commits 690c543550b0d2852060c18d270cdb534d339d9a - unstable test output 386e3d7609c49505e079c40c65919d99feb82505 - patch itself
* CREATE INDEX ... INCLUDING (column[, ...])Teodor Sigaev2016-04-08
| | | | | | | | | | Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
* Load FK defs into relcache for use by plannerSimon Riggs2016-04-07
| | | | | | | Fastpath ignores this if no triggers defined. Author: Tomas Vondra, with fastpath and comments added by me Reviewers: David Rowley, Simon Riggs
* Restructure index access method API to hide most of it at the C level.Tom Lane2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
* Make pg_shseclabel available in early backend startupAlvaro Herrera2016-01-05
| | | | | | | | | | | | While the in-core authentication mechanism doesn't need to access pg_shseclabel at all, it's reasonable to think that an authentication hook will want to look at the label for the role logging in, or for rows in other catalogs used during the authentication phase of startup. Catalog version bumped, because this changes the "is nailed" status for pg_shseclabel. Author: Adam Brightwell
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Be more noisy about "wrong number of nailed relations" initfile problems.Tom Lane2015-11-11
| | | | | | | | | | | | | | | | | | | | | In commit 5d1ff6bd559ea8df1b7302e245e690b01b9a4fa4 I added some logic to relcache.c to try to ensure that the regression tests would fail if we made a mistake about which relations belong in the relcache init files. I'm quite sure I tested that, but I must have done so only for the non-shared-catalog case, because a report from Adam Brightwell showed that the regression tests still pass just fine if we bollix the shared-catalog init file in the way this code was supposed to catch. The reason is that that file gets loaded before we do client authentication, so the WARNING is not sent to the client, only to the postmaster log, where it's far too easily missed. The least Rube Goldbergian answer to this is to put an Assert(false) after the elog(WARNING). That will certainly get developers' attention, while not breaking production builds' ability to recover from corner cases with similar symptoms. Since this is only of interest to developers, there seems no need for a back-patch, even though the previous commit went into all branches.
* RLS refactoringStephen Frost2015-09-15
| | | | | | | | | | | | | | | | This refactors rewrite/rowsecurity.c to simplify the handling of the default deny case (reducing the number of places where we check for and add the default deny policy from three to one) by splitting up the retrival of the policies from the application of them. This also allowed us to do away with the policy_id field. A policy_name field was added for WithCheckOption policies and is used in error reporting, when available. Patch by Dean Rasheed, with various mostly cosmetic changes by me. Back-patch to 9.5 where RLS was introduced to avoid unnecessary differences, since we're still in alpha, per discussion with Robert.
* Fix subtransaction cleanup after an outer-subtransaction portal fails.Tom Lane2015-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Formerly, we treated only portals created in the current subtransaction as having failed during subtransaction abort. However, if the error occurred while running a portal created in an outer subtransaction (ie, a cursor declared before the last savepoint), that has to be considered broken too. To allow reliable detection of which ones those are, add a bookkeeping field to struct Portal that tracks the innermost subtransaction in which each portal has actually been executed. (Without this, we'd end up failing portals containing functions that had called the subtransaction, thereby breaking plpgsql exception blocks completely.) In addition, when we fail an outer-subtransaction Portal, transfer its resources into the subtransaction's resource owner, so that they're released early in cleanup of the subxact. This fixes a problem reported by Jim Nasby in which a function executed in an outer-subtransaction cursor could cause an Assert failure or crash by referencing a relation created within the inner subtransaction. The proximate cause of the Assert failure is that AtEOSubXact_RelationCache assumed it could blow away a relcache entry without first checking that the entry had zero refcount. That was a bad idea on its own terms, so add such a check there, and to the similar coding in AtEOXact_RelationCache. This provides an independent safety measure in case there are still ways to provoke the situation despite the Portal-level changes. This has been broken since subtransactions were invented, so back-patch to all supported branches. Tom Lane and Michael Paquier
* Fix the logic for putting relations into the relcache init file.Tom Lane2015-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit f3b5565dd4e59576be4c772da364704863e6a835 was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.
* Use a safer method for determining whether relcache init file is stale.Tom Lane2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit 786340441706ac1957a031f11ad1c2e5b6e18314), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Fix typo in relcache's equalPolicy()Stephen Frost2015-04-17
| | | | | | | | | | | | | The USING policies were not being checked for differences as the same policy was being passed in to both sides of the equal(). This could result in backends not realizing that a policy had been changed, if none of the other attributes had been changed. Fix by passing to equal() the policy1 and policy2 using quals for comparison. No need to back-patch as this is not yet released. Noticed while testing changes to RLS proposed by Dean Rasheed.
* Apply table and domain CHECK constraints in name order.Tom Lane2015-03-23
| | | | | | | | | | | | | | | | | | | | | | | | Previously, CHECK constraints of the same scope were checked in whatever order they happened to be read from pg_constraint. (Usually, but not reliably, this would be creation order for domain constraints and reverse creation order for table constraints, because of differing implementation details.) Nondeterministic results of this sort are problematic at least for testing purposes, and in discussion it was agreed to be a violation of the principle of least astonishment. Therefore, borrow the principle already established for triggers, and apply such checks in name order (using strcmp() sort rules). This lets users control the check order if they have a mind to. Domain CHECK constraints still follow the rule of checking lower nested domains' constraints first; the name sort only applies to multiple constraints attached to the same domain. In passing, I failed to resist the temptation to wordsmith a bit in create_domain.sgml. Apply to HEAD only, since this could result in a behavioral change in existing applications, and the potential regression test failures have not actually been observed in our buildfarm.
* Clean up some mess in row-security patches.Tom Lane2015-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix unsafe coding around PG_TRY in RelationBuildRowSecurity: can't change a variable inside PG_TRY and then use it in PG_CATCH without marking it "volatile". In this case though it seems saner to avoid that by doing a single assignment before entering the TRY block. I started out just intending to fix that, but the more I looked at the row-security code the more distressed I got. This patch also fixes incorrect construction of the RowSecurityPolicy cache entries (there was not sufficient care taken to copy pass-by-ref data into the cache memory context) and a whole bunch of sloppiness around the definition and use of pg_policy.polcmd. You can't use nulls in that column because initdb will mark it NOT NULL --- and I see no particular reason why a null entry would be a good idea anyway, so changing initdb's behavior is not the right answer. The internal value of '\0' wouldn't be suitable in a "char" column either, so after a bit of thought I settled on using '*' to represent ALL. Chasing those changes down also revealed that somebody wasn't paying attention to what the underlying values of ACL_UPDATE_CHR etc really were, and there was a great deal of lackadaiscalness in the catalogs.sgml documentation for pg_policy and pg_policies too. This doesn't pretend to be a complete code review for the row-security stuff, it just fixes the things that were in my face while dealing with the bugs in RelationBuildRowSecurity.
* Correctly handle relcache invalidation corner case during logical decoding.Andres Freund2015-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using a historic snapshot for logical decoding it can validly happen that a relation that's in the relcache isn't visible to that historic snapshot. E.g. if a newly created relation is referenced in the query that uses the SQL interface for logical decoding and a sinval reset occurs. The earlier commit that fixed the error handling for that corner case already improves the situation as a ERROR is better than hitting an assertion... But it's obviously not good enough. So additionally allow that case without an error if a historic snapshot is set up - that won't allow an invalid entry to stay in the cache because it's a) already marked invalid and will thus be rebuilt during the next access b) the syscaches will be reset at the end of decoding. There might be prettier solutions to handle this case, but all that we could think of so far end up being much more complex than this quite simple fix. This fixes the assertion failures reported by the buildfarm (markhor, tick, leech) after the introduction of new regression tests in 89fd41b390a4. The failure there weren't actually directly caused by CLOBBER_CACHE_ALWAYS but the extraordinary long runtimes due to it lead to sinval resets triggering the behaviour. Discussion: 22459.1418656530@sss.pgh.pa.us Backpatch to 9.4 where logical decoding was introduced.
* Improve relcache invalidation handling of currently invisible relations.Andres Freund2015-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | The corner case where a relcache invalidation tried to rebuild the entry for a referenced relation but couldn't find it in the catalog wasn't correct. The code tried to RelationCacheDelete/RelationDestroyRelation the entry. That didn't work when assertions are enabled because the latter contains an assertion ensuring the refcount is zero. It's also more generally a bad idea, because by virtue of being referenced somebody might actually look at the entry, which is possible if the error is trapped and handled via a subtransaction abort. Instead just error out, without deleting the entry. As the entry is marked invalid, the worst that can happen is that the invalid (and at some point unused) entry lingers in the relcache. Discussion: 22459.1418656530@sss.pgh.pa.us There should be no way to hit this case < 9.4 where logical decoding introduced a bug that can hit this. But since the code for handling the corner case is there it should do something halfway sane, so backpatch all the the way back. The logical decoding bug will be handled in a separate commit.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Improve hash_create's API for selecting simple-binary-key hash functions.Tom Lane2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane
* Rename pg_rowsecurity -> pg_policy and other fixesStephen Frost2014-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Robert, we should really have named pg_rowsecurity pg_policy, as the objects stored in that catalog are policies. This patch fixes that and updates the column names to start with 'pol' to match the new catalog name. The security consideration for COPY with row level security, also pointed out by Robert, has also been addressed by remembering and re-checking the OID of the relation initially referenced during COPY processing, to make sure it hasn't changed under us by the time we finish planning out the query which has been built. Robert and Alvaro also commented on missing OCLASS and OBJECT entries for POLICY (formerly ROWSECURITY or POLICY, depending) in various places. This patch fixes that too, which also happens to add the ability to COMMENT on policies. In passing, attempt to improve the consistency of messages, comments, and documentation as well. This removes various incarnations of 'row-security', 'row-level security', 'Row-security', etc, in favor of 'policy', 'row level security' or 'row_security' as appropriate. Happy Thanksgiving!
* Fix relpersistence setting in reindex_indexAlvaro Herrera2014-11-17
| | | | | | | | | | | | | | Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit 85b506bbfc2937 was mistaken in setting the relpersistence value of the index directly in the relcache entry, within reindex_index. The reason for the failure is that an invalidation message that comes after mucking with the relcache entry directly, but before writing it to the catalogs, would cause the entry to become rebuilt in place from catalogs with the old contents, losing the update. Fix by passing the correct persistence value to RelationSetNewRelfilenode instead; this routine also writes the updated tuple to pg_class, avoiding the problem. Suggested by Tom Lane.
* Get rid of SET LOGGED indexes persistence kludgeAlvaro Herrera2014-11-15
| | | | | | | | | | | | This removes ATChangeIndexesPersistence() introduced by f41872d0c1239d36 which was too ugly to live for long. Instead, the correct persistence marking is passed all the way down to reindex_index, so that the transient relation built to contain the index relfilenode can get marked correctly right from the start. Author: Fabrízio de Royes Mello Review and editorialization by Michael Paquier and Álvaro Herrera
* Clean up includes from RLS patchStephen Frost2014-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | The initial patch for RLS mistakenly included headers associated with the executor and planner bits in rewrite/rowsecurity.h. Per policy and general good sense, executor headers should not be included in planner headers or vice versa. The include of execnodes.h was a mistaken holdover from previous versions, while the include of relation.h was used for Relation's definition, which should have been coming from utils/relcache.h. This patch cleans these issues up, adds comments to the RowSecurityPolicy struct and the RowSecurityConfigType enum, and changes Relation->rsdesc to Relation->rd_rsdesc to follow Relation field naming convention. Additionally, utils/rel.h was including rewrite/rowsecurity.h, which wasn't a great idea since that was pulling in things not really needed in utils/rel.h (which gets included in quite a few places). Instead, use 'struct RowSecurityDesc' for the rd_rsdesc field and add comments explaining why. Lastly, add an include into access/nbtree/nbtsort.c for utils/sortsupport.h, which was evidently missed due to the above mess. Pointed out by Tom in 16970.1415838651@sss.pgh.pa.us; note that the concerns regarding a similar situation in the custom-path commit still need to be addressed.
* Move the backup-block logic from XLogInsert to a new file, xloginsert.c.Heikki Linnakangas2014-11-06
| | | | | | | | | | | | xlog.c is huge, this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower level stuff for managing WAL buffers and such are in xlog.c. Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling in xlog.h into most places. Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
* Fix relcache for policies, and doc updatesStephen Frost2014-09-26
| | | | | | | | | | | | | | | | | | Andres pointed out that there was an extra ';' in equalPolicies, which made me realize that my prior testing with CLOBBER_CACHE_ALWAYS was insufficient (it didn't always catch the issue, just most of the time). Thanks to that, a different issue was discovered, specifically in equalRSDescs. This change corrects eqaulRSDescs to return 'true' once all policies have been confirmed logically identical. After stepping through both functions to ensure correct behavior, I ran this for about 12 hours of CLOBBER_CACHE_ALWAYS runs of the regression tests with no failures. In addition, correct a few typos in the documentation which were pointed out by Thom Brown (thanks!) and improve the policy documentation further by adding a flushed out usage example based on a unix passwd file. Lastly, clean up a few comments in the regression tests and pg_dump.h.
* Fix whitespacePeter Eisentraut2014-09-26
|
* Code review for row security.Stephen Frost2014-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Buildfarm member tick identified an issue where the policies in the relcache for a relation were were being replaced underneath a running query, leading to segfaults while processing the policies to be added to a query. Similar to how TupleDesc RuleLocks are handled, add in a equalRSDesc() function to check if the policies have actually changed and, if not, swap back the rsdesc field (using the original instead of the temporairly built one; the whole structure is swapped and then specific fields swapped back). This now passes a CLOBBER_CACHE_ALWAYS for me and should resolve the buildfarm error. In addition to addressing this, add a new chapter in Data Definition under Privileges which explains row security and provides examples of its usage, change \d to always list policies (even if row security is disabled- but note that it is disabled, or enabled with no policies), rework check_role_for_policy (it really didn't need the entire policy, but it did need to be using has_privs_of_role()), and change the field in pg_class to relrowsecurity from relhasrowsecurity, based on Heikki's suggestion. Also from Heikki, only issue SET ROW_SECURITY in pg_restore when talking to a 9.5+ server, list Bypass RLS in \du, and document --enable-row-security options for pg_dump and pg_restore. Lastly, fix a number of minor whitespace and typo issues from Heikki, Dimitri, add a missing #include, per Peter E, fix a few minor variable-assigned-but-not-used and resource leak issues from Coverity and add tab completion for role attribute bypassrls as well.
* Row-Level Security Policies (RLS)Stephen Frost2014-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building on the updatable security-barrier views work, add the ability to define policies on tables to limit the set of rows which are returned from a query and which are allowed to be added to a table. Expressions defined by the policy for filtering are added to the security barrier quals of the query, while expressions defined to check records being added to a table are added to the with-check options of the query. New top-level commands are CREATE/ALTER/DROP POLICY and are controlled by the table owner. Row Security is able to be enabled and disabled by the owner on a per-table basis using ALTER TABLE .. ENABLE/DISABLE ROW SECURITY. Per discussion, ROW SECURITY is disabled on tables by default and must be enabled for policies on the table to be used. If no policies exist on a table with ROW SECURITY enabled, a default-deny policy is used and no records will be visible. By default, row security is applied at all times except for the table owner and the superuser. A new GUC, row_security, is added which can be set to ON, OFF, or FORCE. When set to FORCE, row security will be applied even for the table owner and superusers. When set to OFF, row security will be disabled when allowed and an error will be thrown if the user does not have rights to bypass row security. Per discussion, pg_dump sets row_security = OFF by default to ensure that exports and backups will have all data in the table or will error if there are insufficient privileges to bypass row security. A new option has been added to pg_dump, --enable-row-security, to ask pg_dump to export with row security enabled. A new role capability, BYPASSRLS, which can only be set by the superuser, is added to allow other users to be able to bypass row security using row_security = OFF. Many thanks to the various individuals who have helped with the design, particularly Robert Haas for his feedback. Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean Rasheed, with additional changes and rework by me. Reviewers have included all of the above, Greg Smith, Jeff McCormick, and Robert Haas.
* rename macro isTempOrToastNamespace to isTempOrTempToastNamespaceBruce Momjian2014-08-25
| | | | Done for clarity
* Fix another ancient memory-leak bug in relcache.c.Tom Lane2014-08-24
| | | | | | | | | | | | | | | | | | | | | | | | CheckConstraintFetch() leaked a cstring in the caller's context for each CHECK constraint expression it copied into the relcache. Ordinarily that isn't problematic, but it can be during CLOBBER_CACHE testing because so many reloads can happen during a single query; so complicate the code slightly to allow freeing the cstring after use. Per testing on buildfarm member barnacle. This is exactly like the leak fixed in AttrDefaultFetch() by commit 078b2ed291c758e7125d72c3a235f128d40a232b. (Yes, this time I did look for other instances of the same coding pattern :-(.) Like that patch, no back-patch, since it seems unlikely that there's any problem except under very artificial test conditions. BTW, it strikes me that both of these places would require further work comparable to commit ab8c84db2f7af008151b848cf1d6a4672a39eecd, if we ever supported defaults or check constraints on system catalogs: they both assume they are copying into an empty relcache data structure, and that conceivably wouldn't be the case during recursive reloading of a system catalog. This does not seem worth worrying about for the moment, since there is no near-term prospect of supporting any such thing. So I'll just note the possibility for the archives' sake.
* Prevent memory leaks in RelationGetIndexList, RelationGetIndexAttrBitmap.Tom Lane2014-08-13
| | | | | | | | | | | | | | | | | When replacing rd_indexlist, rd_indexattr, etc, we neglected to pfree any old value of these fields. Under ordinary circumstances, the old value would always be NULL, so this seemed reasonable enough. However, in cases where we're rebuilding a system catalog's relcache entry and another cache flush occurs on that same catalog meanwhile, it's possible for the field to not be NULL when we return to the outer level, because we already refilled it while recovering from the inner flush. This leads to a fairly small session-lifespan leak in CacheMemoryContext. In real-world usage the leak would be too small to notice; but in testing with CLOBBER_CACHE_RECURSIVELY the leakage can add up to the point of causing OOM failures, as reported by Tomas Vondra. The issue has been there a long time, but it only seems worth fixing in HEAD, like the previous fix in this area (commit 078b2ed291c758e7).
* Ooops, I broke initdb with that last patch.Tom Lane2014-05-18
| | | | | | That's what I get for not fully retesting the final version of the patch. The replace_allowed cross-check needs an additional special case for bootstrapping.
* Fix two ancient memory-leak bugs in relcache.c.Tom Lane2014-05-18
| | | | | | | | | | | | | | | | | | | | | RelationCacheInsert() ignored the possibility that hash_search(HASH_ENTER) might find a hashtable entry already present for the same OID. However, that can in fact occur during recursive relcache load scenarios. When it did happen, we overwrote the pointer to the pre-existing Relation, causing a session-lifespan leakage of that entire structure. As far as is known, the pre-existing Relation would always have reference count zero by the time we arrive back at the outer insertion, so add code that deletes the pre-existing Relation if so. If by some chance its refcount is positive, elog a WARNING and allow the pre-existing Relation to be leaked as before. Also, AttrDefaultFetch() was sloppy about leaking the cstring form of the pg_attrdef.adbin value it's copying into the relcache structure. This is only a query-lifespan leakage, and normally not very significant, but it adds up during CLOBBER_CACHE testing. These bugs are of very ancient vintage, but I'll refrain from back-patching since there's no evidence that these leaks amount to anything in ordinary usage.