path: root/doc/src
...
* Transaction chaining (Peter Eisentraut, 2019-03-24)

  Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which start new transactions with the same transaction characteristics as the just finished one, per SQL standard.

  Support for transaction chaining in PL/pgSQL is also added. This functionality is especially useful when running COMMIT in a loop in PL/pgSQL.

  Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
  Discussion: https://www.postgresql.org/message-id/flat/28536681-324b-10dc-ade8-ab46f7645a5a@2ndquadrant.com
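  A minimal sketch of chained commits; the queried catalogs are just placeholders, and the point is that the transaction characteristics carry over:

      BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
      SELECT count(*) FROM pg_class;
      COMMIT AND CHAIN;  -- new transaction, again REPEATABLE READ and READ ONLY
      SELECT count(*) FROM pg_attribute;
      COMMIT;            -- a plain COMMIT ends the chain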
* Accept XML documents when xmloption = content, as required by SQL:2006+. (Tom Lane, 2019-03-23)

  Previously we were using the SQL:2003 definition, which doesn't allow this, but that creates a serious dump/restore gotcha: there is no setting of xmloption that will allow all valid XML data. Hence, switch to the 2006 definition.

  Since libxml doesn't accept <!DOCTYPE> directives in the mode we use for CONTENT parsing, the implementation is to detect <!DOCTYPE> in the input and switch to DOCUMENT parsing mode. This should not cost much, because <!DOCTYPE> should be close to the front of the input if it's there at all.

  It's possible that this causes the error messages for malformed input to be slightly different than they were before, if said input includes <!DOCTYPE>; but that does not seem like a big problem.

  In passing, buy back a few cycles in parsing of large XML documents by not doing strlen() of the whole input in parse_xml_decl().

  Back-patch because dump/restore failures are not nice. This change shouldn't break any cases that worked before, so it seems safe to back-patch.

  Chapman Flack (revised a bit by me)
  Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
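  A short illustration of the new behavior (the document content is a toy example):

      SET xmloption = content;
      -- A <!DOCTYPE> at the front now switches to DOCUMENT parsing instead of failing:
      SELECT xmlparse(content '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>ok</note>');
      SELECT '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>ok</note>'::xml;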
* Add option -N/--no-sync to pg_checksums (Michael Paquier, 2019-03-23)

  This option is consistent with what pg_dump, pg_rewind and pg_basebackup provide. It is useful for limiting the I/O effort when testing things, and is not meant for production environments.

  Author: Michael Paquier
  Reviewed-by: Michael Banck, Fabien Coelho, Sergei Kornilov
  Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
* Add options to enable and disable checksums in pg_checksums (Michael Paquier, 2019-03-23)

  An offline cluster can now work with more modes in pg_checksums:
  - --enable enables checksums in a cluster, updating all blocks with a correct checksum, and updating the control file at the end.
  - --disable disables checksums in a cluster, updating only the control file.
  - --check is an extra option able to verify checksums for a cluster, and the default used if no mode is specified.

  When running --enable or --disable, the data folder gets fsync'd for durability, and then it is followed by a control file update and flush to keep the operation consistent should the tool be interrupted, killed or the host unplugged. If no mode is specified in the options, then --check is used for compatibility with older versions of pg_checksums (named pg_verify_checksums in v11 where it was introduced).

  Author: Michael Banck, Michael Paquier
  Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
  Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
* Collations with nondeterministic comparison (Peter Eisentraut, 2019-03-22)

  This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms.

  This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider.

  The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison).

  This patch makes changes in three areas:
  - CREATE COLLATION DDL changes and system catalog changes to support this new flag.
  - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information.
  - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm().

  Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
  Reviewed-by: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
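  A sketch of a case-insensitive collation using the ICU provider (the collation, table, and column names are illustrative):

      CREATE COLLATION case_insensitive (
          provider = icu,
          locale = 'und-u-ks-level2',
          deterministic = false
      );
      CREATE TABLE users (name text COLLATE case_insensitive);
      INSERT INTO users VALUES ('Alice'), ('ALICE');
      SELECT count(*) FROM users WHERE name = 'alice';  -- counts both rows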
* Add DNS SRV support for LDAP server discovery. (Thomas Munro, 2019-03-21)

  LDAP servers can be advertised on a network with RFC 2782 DNS SRV records. The OpenLDAP command-line tools automatically try to find servers that way, if no server name is provided by the user. Teach PostgreSQL to do the same using OpenLDAP's support functions, when building with OpenLDAP.

  For now, we assume that HAVE_LDAP_INITIALIZE (an OpenLDAP extension available since OpenLDAP 2.0 and also present in Apple LDAP) implies that you also have ldap_domain2hostlist() (which arrived in the same OpenLDAP version and is also present in Apple LDAP).

  Author: Thomas Munro
  Reviewed-by: Daniel Gustafsson
  Discussion: https://postgr.es/m/CAEepm=2hAnSfhdsd6vXsM6VZVN0br-FbAZ-O+Swk18S5HkCP=A@mail.gmail.com
* Allow amcheck to re-find tuples using new search. (Peter Geoghegan, 2019-03-20)

  Teach contrib/amcheck's bt_index_parent_check() function to take advantage of the uniqueness property of heapkeyspace indexes in support of a new verification option: non-pivot tuples (non-highkey tuples on the leaf level) can optionally be re-found using a new search for each, that starts from the root page. If a tuple cannot be re-found, report that the index is corrupt.

  The new "rootdescend" verification option is exhaustive, and can therefore make a call to bt_index_parent_check() take a lot longer. Re-finding tuples during verification is mostly intended as an option for backend developers, since the corruption scenarios that it alone is uniquely capable of detecting seem fairly far-fetched.

  For example, "rootdescend" verification is much more likely to detect corruption of the least significant byte of a key from a pivot tuple in the root page of a B-Tree that already has at least three levels. Typically, only a few tuples on a cousin leaf page are at risk of "getting overlooked" by index scans in this scenario. The corrupt key in the root page is only slightly corrupt: corrupt enough to give wrong answers to some queries, and yet not corrupt enough to allow the problem to be detected without verifying agreement between the leaf page and the root page, skipping at least one internal page level. The existing bt_index_parent_check() checks never cross more than a single level.

  Author: Peter Geoghegan
  Reviewed-By: Heikki Linnakangas
  Discussion: https://postgr.es/m/CAH2-Wz=yTWnVu+HeHGKb2AGiADL9eprn-cKYAto4MkKOuiGtRQ@mail.gmail.com
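  Invocation would look roughly like this (the index name is hypothetical; rootdescend only applies to heapkeyspace, i.e. version 4, indexes):

      CREATE EXTENSION IF NOT EXISTS amcheck;
      -- Re-find every leaf tuple with a fresh descent from the root:
      SELECT bt_index_parent_check('some_btree_index'::regclass,
                                   heapallindexed => true,
                                   rootdescend => true);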
* Make heap TID a tiebreaker nbtree index column. (Peter Geoghegan, 2019-03-20)

  Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion".

  Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure.

  Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper.

  A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own.

  The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes.

  Author: Peter Geoghegan
  Reviewed-By: Heikki Linnakangas, Alexander Korotkov
  Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
* Implement OR REPLACE option for CREATE AGGREGATE. (Andrew Gierth, 2019-03-19)

  Aggregates have acquired a dozen or so optional attributes in recent years for things like parallel query and moving-aggregate mode; the lack of an OR REPLACE option to add or change these for an existing agg makes extension upgrades gratuitously hard. Rectify.
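  A hedged sketch of the kind of upgrade this enables: redefining an existing aggregate in place to mark it parallel safe. The aggregate name is hypothetical; the support functions are the built-ins used by avg(float8):

      CREATE OR REPLACE AGGREGATE my_avg(float8) (
          sfunc = float8_accum,
          stype = float8[],
          finalfunc = float8_avg,
          initcond = '{0,0,0}',
          parallel = safe  -- previously this required DROP AGGREGATE first
      );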
* Fix typos in sgml docs about RefetchForeignRow(). (Andres Freund, 2019-03-18)

  I screwed this up in ad0bda5d24e.

  Reported-By: Jie Zhang, Michael Paquier, Etsuro Fujita
  Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2DA203@G08CNEXMBPEKD02.g08.fujitsu.local
* Numeric error suppression in jsonpath (Alexander Korotkov, 2019-03-16)

  Add support for numeric error suppression in jsonpath, as required by the standard. This commit doesn't use PG_TRY()/PG_CATCH() to implement that. Instead, it provides internal versions of the numeric functions used, which support error suppression.

  Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
  Author: Alexander Korotkov, Nikita Glukhov
  Reviewed-by: Tomas Vondra
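  A sketch of the intended behavior (values are illustrative): a numeric error inside a filter is suppressed rather than raised, so the offending item simply fails the filter:

      -- Division by zero for the element 0 does not raise an error;
      -- that element is just filtered out:
      SELECT jsonb_path_query_array('[1, 0, 2]', '$[*] ? (1 / @ > 0)');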
* Partial implementation of SQL/JSON path language (Alexander Korotkov, 2019-03-16)

  The SQL:2016 standard, among other things, contains a set of SQL/JSON features for JSON processing inside a relational database. The core of SQL/JSON is the JSON path language, which allows accessing parts of JSON documents and making computations over them. This commit implements partial support for the JSON path language as a separate datatype called "jsonpath". The implementation is partial because it's lacking datetime support and suppression of numeric errors. Missing features will be added later by separate commits.

  Support of SQL/JSON features requires implementation of separate nodes, and it will be considered in subsequent patches.

  This commit includes the following set of plain functions, allowing execution of jsonpath over jsonb values:
  * jsonb_path_exists(jsonb, jsonpath[, jsonb, bool]),
  * jsonb_path_match(jsonb, jsonpath[, jsonb, bool]),
  * jsonb_path_query(jsonb, jsonpath[, jsonb, bool]),
  * jsonb_path_query_array(jsonb, jsonpath[, jsonb, bool]),
  * jsonb_path_query_first(jsonb, jsonpath[, jsonb, bool]).

  This commit also implements "jsonb @? jsonpath" and "jsonb @@ jsonpath", which are wrappers over jsonb_path_exists(jsonb, jsonpath) and jsonb_path_match(jsonb, jsonpath) respectively. These operators will have index support (implemented in subsequent patches).

  Catversion bumped, to add new functions and operators.

  Code was written by Nikita Glukhov and Teodor Sigaev, revised by me. Documentation was written by Oleg Bartunov and Liudmila Mantrova. The work was inspired by Oleg Bartunov.

  Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
  Author: Nikita Glukhov, Teodor Sigaev, Alexander Korotkov, Oleg Bartunov, Liudmila Mantrova
  Reviewed-by: Tomas Vondra, Andrew Dunstan, Pavel Stehule, Alexander Korotkov
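  A few usage examples over a toy document:

      SELECT jsonb_path_exists('{"a": [1, 2, 3]}', '$.a[*] ? (@ > 2)');       -- true
      SELECT jsonb_path_query_array('{"a": [1, 2, 3]}', '$.a[*] ? (@ > 1)');  -- [2, 3]
      -- Operator forms:
      SELECT '{"a": [1, 2, 3]}'::jsonb @? '$.a[*] ? (@ > 2)';                 -- true
      SELECT '{"a": [1, 2, 3]}'::jsonb @@ '$.a[*] > 2';                       -- true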
* Enable parallel query with SERIALIZABLE isolation. (Thomas Munro, 2019-03-15)

  Previously, the SERIALIZABLE isolation level prevented parallel query from being used. Allow the two features to be used together by sharing the leader's SERIALIZABLEXACT with parallel workers.

  An extra per-SERIALIZABLEXACT LWLock is introduced to make it safe to share, and new logic is introduced to coordinate the early release of the SERIALIZABLEXACT required for the SXACT_FLAG_RO_SAFE optimization, as follows:

  The first backend to observe the SXACT_FLAG_RO_SAFE flag (set by some other transaction) will 'partially release' the SERIALIZABLEXACT, meaning that the conflicts and locks it holds are released, but the SERIALIZABLEXACT itself will remain active because other backends might still have a pointer to it.

  Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears its own MySerializableXact variable and frees local resources so that it can skip SSI checks for the rest of the transaction. In the special case of the leader process, it transfers the SERIALIZABLEXACT to a new variable SavedSerializableXact, so that it can be completely released at the end of the transaction after all workers have exited.

  Remove the serializable_okay flag added to CreateParallelContext() by commit 9da0cc35, because it's now redundant.

  Author: Thomas Munro
  Reviewed-by: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner
  Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
* During pg_upgrade, conditionally skip transfer of FSMs. (Amit Kapila, 2019-03-15)

  If a heap on the old cluster has 4 pages or fewer, and the old cluster was PG v11 or earlier, don't copy or link the FSM. This will shrink space usage for installations with large numbers of small tables.

  This will allow pg_upgrade to take advantage of commit b0eaa4c51b, where we have avoided creation of the free space map for small heap relations.

  Author: John Naylor
  Reviewed-by: Amit Kapila
  Discussion: https://postgr.es/m/CACPNZCu4cOdm3uGnNEGXivy7Gz8UWyQjynDpdkPGabQ18_zK6g%40mail.gmail.com
* Sync commentary in transam.h and bki.sgml. (Tom Lane, 2019-03-14)

  Commit a6417078c missed updating some comments in transam.h about reservation of high OIDs for development purposes. Also tamp down an over-optimistic comment there about how easy it'd be to change FirstNormalObjectId.

  Earlier, commit 09568ec3d failed to update bki.sgml for the split between genbki.pl-assigned OIDs and those assigned during initdb. Also fix genbki.pl so that it will complain if it overruns that split. It's possible that doing so would have no very bad consequences, but that's no excuse for not detecting it.
* Use condition variables to wait for checkpoints. (Thomas Munro, 2019-03-14)

  Previously we used a polling/sleeping loop to wait for checkpoints to begin and end, which leads to up to a couple hundred milliseconds of needless thumb-twiddling. Use condition variables instead.

  Author: Thomas Munro
  Reviewed-by: Andres Freund
  Discussion: https://postgr.es/m/CA%2BhUKGLY7sDe%2Bbg1K%3DbnEzOofGoo4bJHYh9%2BcDCXJepb6DQmLw%40mail.gmail.com
* Allow ALTER TABLE .. SET NOT NULL to skip provably unnecessary scans. (Robert Haas, 2019-03-13)

  If existing CHECK or NOT NULL constraints preclude the presence of nulls, we need not look to see whether any are present.

  Sergei Kornilov, reviewed by Stephen Frost, Ildar Musin, David Rowley, and by me.

  Discussion: http://postgr.es/m/81911511895540@web58j.yandex.ru
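  A sketch of the pattern this enables (table and constraint names are hypothetical). The single full-table scan happens during VALIDATE CONSTRAINT, and SET NOT NULL can then skip its own scan because the validated CHECK constraint proves no nulls exist:

      ALTER TABLE big_table
          ADD CONSTRAINT big_table_id_not_null CHECK (id IS NOT NULL) NOT VALID;
      ALTER TABLE big_table VALIDATE CONSTRAINT big_table_id_not_null;  -- scans here
      ALTER TABLE big_table ALTER COLUMN id SET NOT NULL;               -- no scan needed
      ALTER TABLE big_table DROP CONSTRAINT big_table_id_not_null;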
* Remove extra comma (Magnus Hagander, 2019-03-13)

  Author: Christoph Berg <myon@debian.org>
* Rename pg_verify_checksums to pg_checksums (Michael Paquier, 2019-03-13)

  The current tool name is too restrictive and focuses only on verifying checksums. As more options to control checksums for an offline cluster are planned to be added, switch to a more generic name. Documentation as well as all past references to the tool are updated.

  Author: Michael Paquier
  Reviewed-by: Michael Banck, Fabien Coelho, Sergei Kornilov
  Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
* Add support for hyperbolic functions, as well as log10(). (Tom Lane, 2019-03-12)

  The SQL:2016 standard adds support for the hyperbolic functions sinh(), cosh(), and tanh(). POSIX has long required libm to provide those functions as well as their inverses asinh(), acosh(), atanh(). Hence, let's just expose the libm functions to the SQL level. As with the trig functions, we only implement versions for float8, not numeric.

  For the moment, we'll assume that all platforms actually do have these functions; if experience teaches otherwise, some autoconf effort may be needed.

  SQL:2016 also adds support for base-10 logarithm, but with the function name log10(), whereas the name we've long used is log(). Add aliases named log10() for the float8 and numeric versions.

  Lætitia Avrot
  Discussion: https://postgr.es/m/CAB_COdguG22LO=rnxDQ2DW1uzv8aQoUzyDQNJjrR4k00XSgm5w@mail.gmail.com
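  For instance:

      SELECT sinh(1), cosh(1), tanh(1);   -- float8 results
      SELECT asinh(sinh(1));              -- inverse pairs round-trip (approximately 1)
      SELECT log10(1000), log(1000);      -- both return 3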
* Remove remaining hard-wired OID references in the initial catalog data. (Tom Lane, 2019-03-12)

  In the v11-era commits that taught genbki.pl to resolve symbolic OID references in the initial catalog data, we didn't bother to make every last reference symbolic; some of the catalogs have so few initial rows that it didn't seem worthwhile.

  However, the new project policy that OIDs assigned by new patches should be automatically renumberable changes this calculus. A patch that wants to add a row in one of these catalogs would have a problem when the OID it assigns gets renumbered. Hence, do the mop-up work needed to make all OID references in initial data be symbolic, and establish an associated project policy that we'll never again write a hard-wired OID reference there.

  No catversion bump since the contents of postgres.bki aren't actually changed by this commit.

  Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com
* Create a script that can renumber manually-assigned OIDs. (Tom Lane, 2019-03-12)

  This commit adds a Perl script renumber_oids.pl, which can reassign a range of manually-assigned OIDs to someplace else by modifying OID fields of the catalog *.dat files and OID-assigning macros in the catalog *.h files.

  Up to now, we've encouraged new patches that need manually-assigned OIDs to use OIDs just above the range of existing OIDs. Predictably, this leads to patches stepping on each others' toes, as whichever one gets committed first creates an OID conflict that other patch author(s) have to resolve manually. With the availability of renumber_oids.pl, we can eliminate a lot of this hassle. The new project policy, therefore, is:

  * Encourage new patches to use high OIDs (the documentation suggests choosing a block of OIDs at random in 8000..9999).

  * After feature freeze in each development cycle, run renumber_oids.pl to move all such OIDs down to lower numbers, thus freeing the high OID range for the next development cycle.

  This plan should greatly reduce the risk of OID collisions between concurrently-developed patches. Also, if such a collision happens anyway, we have the option to resolve it without much effort by doing an off-schedule OID renumbering to get the first-committed patch out of the way. Or a patch author could use renumber_oids.pl to change their patch's assignments without much pain.

  This approach does put a premium on not hard-wiring any OID values in places where renumber_oids.pl and genbki.pl can't fix them. Project practice in that respect seems to be pretty good already, but a follow-on patch will sand down some rough edges.

  John Naylor and Tom Lane, per an idea of Peter Geoghegan's
  Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com
* Allow fractional input values for integer GUCs, and improve rounding logic. (Tom Lane, 2019-03-11)

  Historically guc.c has just refused examples like set work_mem = '30.1GB', but it seems more useful for it to take that and round off the value to some reasonable approximation of what the user said. Just rounding to the parameter's native unit would work, but it would lead to rather silly-looking settings, such as 31562138kB for this example. Instead let's round to the nearest multiple of the next smaller unit (if any), producing 30822MB.

  Also, do the units conversion math in floating point and round to integer (if needed) only at the end. This produces saner results for inputs that aren't exact multiples of the parameter's native unit, and removes another difference in the behavior for integer vs. float parameters.

  In passing, document the ability to use hex or octal input where it ought to be documented.

  Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
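  The example from the message, as a session sketch:

      SET work_mem = '30.1GB';
      SHOW work_mem;   -- 30822MB: rounded to a multiple of the next smaller unit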
* Tweak wording on VARIADIC array doc patch. (Andrew Dunstan, 2019-03-11)

  Per suggestion from Tom Lane.
* Document incompatibility of comparison expressions with VARIADIC array arguments (Andrew Dunstan, 2019-03-11)

  COALESCE, GREATEST and LEAST all look like functions taking variable numbers of arguments, but in fact they are not functions, and so VARIADIC array arguments don't work with them. Add a note to the docs explaining this fact. The consensus is not to try to make this work, but just to document the limitation.

  Discussion: https://postgr.es/m/CAFj8pRCaAtuXuRtvXf5GmPbAVriUQrNMo7-=TXUFN025S31R_w@mail.gmail.com
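  To illustrate the distinction (a minimal sketch):

      -- concat() is a true variadic function, so array expansion works:
      SELECT concat(VARIADIC ARRAY['a', 'b', 'c']);      -- 'abc'
      -- COALESCE only looks like one; this raises a syntax error:
      -- SELECT COALESCE(VARIADIC ARRAY[NULL, 'fallback']);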
* pgbench: increase the maximum number of variables/arguments (Andrew Dunstan, 2019-03-11)

  pgbench's arbitrary limit of 10 arguments for SQL statements or metacommands is far too low. Increase it to 256. This results in a very modest increase in memory usage, not enough to worry about. The maximum includes the SQL statement or metacommand. This is reflected in the comments and revised TAP tests.

  Simon Riggs and Dagfinn Ilmari Mannsåker with some light editing by me.
  Reviewed by: David Rowley and Fabien Coelho
  Discussion: https://postgr.es/m/CANP8+jJiMJOAf-dLoHuR-8GENiK+eHTY=Omw38Qx7j2g0NDTXA@mail.gmail.com
* Fix documentation on partitioning vs. foreign tables (Alvaro Herrera, 2019-03-10)

  1. The PARTITION OF clause of CREATE FOREIGN TABLE was not explained in the CREATE FOREIGN TABLE reference page. Add it. (Postgres 10 onwards)

  2. The limitation that tuple routing cannot target partitions that are foreign tables was not documented clearly enough. Improve wording. (Postgres 10 onwards)

  3. The UPDATE tuple re-routing concurrency behavior was explained in the DDL chapter, which doesn't seem the right place. Move it to the UPDATE reference page instead. (Postgres 11 onwards)

  Authors: Amit Langote, David Rowley
  Reviewed-by: Etsuro Fujita
  Reported-by: Derek Hans
  Discussion: https://postgr.es/m/CAGrP7a3Xc1Qy_B2WJcgAD8uQTS_NDcJn06O5mtS_Ne1nYhBsyw@mail.gmail.com
* Reduce the default value of autovacuum_vacuum_cost_delay to 2ms. (Tom Lane, 2019-03-10)

  This is a better way to implement the desired change of increasing autovacuum's default resource consumption.

  Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
* Revert "Increase the default vacuum_cost_limit from 200 to 2000"Tom Lane2019-03-10
| | | | | | | | | | | | | | This reverts commit bd09503e633b8077822bb4daf91625b71ac16253. Per discussion, it seems like what we should do instead is to reduce the default value of autovacuum_vacuum_cost_delay by the same factor. That's functionally equivalent as long as the platform can accurately service the smaller delay request, which should be true on anything released in the last 10 years or more. And smaller, more-closely-spaced delays are better in terms of providing a steady I/O load. Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
* Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs. (Tom Lane, 2019-03-10)

  This change makes it possible to specify sub-millisecond delays, which work well on most modern platforms, though that was not true when the cost-delay feature was designed.

  To support this without breaking existing configuration entries, improve guc.c to allow floating-point GUCs to have units. Also, allow "us" (microseconds) as an input/output unit for time-unit GUCs. (It's not allowed as a base unit, at least not yet.)

  Likewise change the autovacuum_vacuum_cost_delay reloption to be floating-point; this forces a catversion bump because the layout of StdRdOptions changes.

  This patch doesn't in itself change the default values or allowed ranges for these parameters, and it should not affect the behavior for any already-allowed setting for them.

  Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
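  A brief sketch of the new input forms:

      SET vacuum_cost_delay = '0.5ms';   -- fractional milliseconds now accepted
      SET vacuum_cost_delay = '500us';   -- "us" works as an input unit
      SHOW vacuum_cost_delay;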
* Support for INCLUDE attributes in GiST indexes (Alexander Korotkov, 2019-03-10)

  Similarly to B-tree, the GiST index access method gets support for INCLUDE attributes. These attributes aren't used for tree navigation and aren't present in non-leaf pages. But they are present in leaf pages and can be fetched during an index-only scan.

  The point of having INCLUDE attributes in GiST indexes is slightly different from the point of having them in B-tree. The main point of INCLUDE attributes in B-tree is to define a UNIQUE constraint over part of the attributes enabled for index-only scan. In GiST the main point of INCLUDE attributes is to use index-only scans for attributes whose data types don't have GiST opclasses.

  Discussion: https://postgr.es/m/73A1A452-AD5F-40D4-BD61-978622FF75C1%40yandex-team.ru
  Author: Andrey Borodin, with small changes by me
  Reviewed-by: Andreas Karlsson
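  A sketch (table and column names are hypothetical): text has no GiST opclass, but an INCLUDE column can still be returned by an index-only scan:

      CREATE TABLE places (pos point, description text);
      CREATE INDEX places_pos_idx ON places USING gist (pos) INCLUDE (description);
      -- An index-only scan can now return description without visiting the heap:
      -- SELECT pos, description FROM places WHERE pos <@ box '((0,0),(10,10))';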
* Simplify release-note links to back branches. (Tom Lane, 2019-03-09)

  Now that https://www.postgresql.org/docs/release/ is populated, replace the stopgap text we had under "Prior Releases" with a pointer to that archive.

  Discussion: https://postgr.es/m/e0f09c9a-bd2b-862a-d379-601dfabc8969@postgresql.org
* Add new clientcert hba option verify-full (Magnus Hagander, 2019-03-09)

  This allows a login to require both that the CN of the certificate matches (like authentication type cert) *and* that another authentication method (such as password or kerberos) succeeds as well.

  The old value of clientcert=1 maps to the new clientcert=verify-ca, clientcert=0 maps to the new clientcert=no-verify, and the new option verify-full will add the validation of the CN.

  Author: Julian Markwort, Marius Timmer
  Reviewed by: Magnus Hagander, Thomas Munro
* Track block level checksum failures in pg_stat_database (Magnus Hagander, 2019-03-09)

  This adds a column that counts how many checksum failures have occurred on files belonging to a specific database. Both checksum failures during normal backend processing and those created when a base backup detects a checksum failure are counted.

  Author: Magnus Hagander
  Reviewed by: Julien Rouhaud
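  Monitoring might then look like this, assuming the new column is named checksum_failures as in the released view:

      SELECT datname, checksum_failures
      FROM pg_stat_database
      WHERE checksum_failures > 0;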
* Fix function signatures of pageinspect in documentation (Michael Paquier, 2019-03-08)

  tuple_data_split() lacked the type of the first argument, and heap_page_item_attrs() had its first and second arguments reversed, with the bytea argument using an incorrect name.

  Author: Laurenz Albe
  Discussion: https://postgr.es/m/8f9ab7b16daf623e87eeef5203a4ffc0dece8dfd.camel@cybertec.at
  Backpatch-through: 9.6
* Fix minor deficiencies in XMLTABLE, xpath(), xmlexists() (Alvaro Herrera, 2019-03-07)

  Correctly process nodes of more types than previously. In some cases, nodes were being ignored (nothing was output); in other cases, trying to return them resulted in errors about unrecognized nodes. In yet other cases, necessary escaping (of XML special characters) was not being done. Fix all those (as far as the authors could find) and add regression test cases verifying the new behavior.

  I (Álvaro) was of two minds about backpatching these changes. They do seem bugfixes that would benefit most users of the affected functions; but on the other hand it would change established behavior in minor releases, so it seems prudent not to.

  Authors: Pavel Stehule, Markus Winand, Chapman Flack
  Discussion: https://postgr.es/m/CAFj8pRA6J25CtAZ2TuRvxK3gat7-bBUYh0rfE2yM7Hj9GD14Dg@mail.gmail.com
              https://postgr.es/m/8BDB0627-2105-4564-AA76-7849F028B96E@winand.at

  The elephant in the room as pointed out by Chapman Flack, not fixed in this commit, is that we still have XMLTABLE operating on XPath 1.0 instead of the standard-mandated XQuery (or even its subset XPath 2.0). Fixing that is a major undertaking, however.
* Allow ATTACH PARTITION with only ShareUpdateExclusiveLock. (Robert Haas, 2019-03-07)

  We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened.

  To make this safe, we have to cope with (at least) three separate problems.

  First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there.

  Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if looking up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it.

  For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock.

  Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions.

  This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start.

  Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review.

  Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
  Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
  Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
* Fix broken markup (Alvaro Herrera, 2019-03-07)
* Fix the BY {REF,VALUE} clause of XMLEXISTS/XMLTABLE (Alvaro Herrera, 2019-03-07)

  This clause is used to indicate the passing mode of an XML document, but we were doing it wrong: we accepted BY REF and ignored it, and rejected BY VALUE as a syntax error. The reality, however, is that documents are always passed BY VALUE, so rejecting that clause was silly. Change things so that we accept BY VALUE.

  BY REF continues to be accepted, and continues to be ignored.

  Author: Chapman Flack
  Reviewed-by: Pavel Stehule
  Discussion: https://postgr.es/m/5C297BB7.9070509@anastigmatix.net
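  Both spellings are now accepted and mean the same thing (the document here is a toy example):

      SELECT xmlexists('//town[text() = ''Toronto'']'
                       PASSING BY VALUE '<towns><town>Toronto</town></towns>');
      SELECT xmlexists('//town[text() = ''Toronto'']'
                       PASSING BY REF   '<towns><town>Toronto</town></towns>');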
* pg_dump: allow multiple rows per insert (Alvaro Herrera, 2019-03-07)

  This is useful to speed up loading data in a different database engine.

  Authors: Surafel Temesgen and David Rowley. Lightly edited by Álvaro.
  Reviewed-by: Fabien Coelho
  Discussion: https://postgr.es/m/CALAY4q9kumSdnRBzvRJvSRf2+BH20YmSvzqOkvwpEmodD-xv6g@mail.gmail.com
* tableam: introduce table AM infrastructure. (Andres Freund, 2019-03-06)

  This introduces the concept of table access methods, i.e. CREATE ACCESS METHOD ... TYPE TABLE and CREATE TABLE ... USING (storage-engine).

  No table access functionality is delegated to table AMs as of this commit, that'll be done in following commits. Subsequent commits will incrementally abstract table access functionality to be routed through table access methods. That change is too large to be reviewed & committed at once, so it'll be done incrementally.

  Docs will be updated at the end, as adding them incrementally would likely make them less coherent, and definitely is a lot more work, without a lot of benefit.

  Table access methods are specified similar to index access methods, i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with callbacks. In contrast to index AMs that struct needs to live as long as a backend, typically that's achieved by just returning a pointer to a constant struct.

  Psql's \d+ now displays a table's access method. That can be disabled with HIDE_TABLEAM=true, which is mainly useful so regression tests can be run against different AMs. It's quite possible that this behaviour still needs to be fine tuned.

  For now it's not allowed to set a table AM for a partitioned table, as we've not resolved how partitions would inherit that. Disallowing allows us to introduce, if we decide that's the way forward, such a behaviour without a compatibility break.

  Catversion bumped, to add the heap table AM and references to it.

  Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
              https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
              https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
              https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
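  A sketch of the DDL surface; my_am and my_am_handler are hypothetical, while heap is the built-in access method:

      -- A table can name its access method explicitly:
      CREATE TABLE t (a int) USING heap;
      -- An extension-provided AM would be registered roughly like this:
      -- CREATE ACCESS METHOD my_am TYPE TABLE HANDLER my_am_handler;
      -- CREATE TABLE t2 (a int) USING my_am;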
* Increase the default vacuum_cost_limit from 200 to 2000 (Andrew Dunstan, 2019-03-06)

  The original 200 default value was set back in f425b605f4e when the cost delay settings were first added. Hardware has improved quite a bit since then and we've also made improvements such as sorting buffers during checkpoints (9cd00c457e6) which should result in less random writes.

  This low default value was reportedly causing problems with badly configured servers and in the absence of a native method to remove excessive bloat from tables without incurring an AccessExclusiveLock, this often made cleaning up the damage caused by badly configured auto-vacuums difficult.

  It seems more likely that someone will notice that auto-vacuum is running too quickly than too slowly, so let's go all out and multiply the default value for the setting by 10. With the default vacuum_cost_page_dirty and autovacuum_vacuum_cost_delay (assuming a page size of 8192 bytes), this allows autovacuum a theoretical maximum dirty write rate of around 39MB/s instead of just 3.9MB/s.

  Author: David Rowley
  Discussion: https://postgr.es/m/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_rYvw@mail.gmail.com
* pg_partition_ancestors (Alvaro Herrera, 2019-03-04)

  Adds another introspection feature for partitioning, necessary for further psql patches.

  Reviewed-by: Michaël Paquier
  Discussion: https://postgr.es/m/20190226222757.GA31622@alvherre.pgsql
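  A hedged sketch of its use on a toy hierarchy; the function returns the given relation and all of its ancestors:

      CREATE TABLE meas (logdate date) PARTITION BY RANGE (logdate);
      CREATE TABLE meas_y2019 PARTITION OF meas
          FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
      SELECT * FROM pg_partition_ancestors('meas_y2019');  -- meas_y2019, meas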
* Store tuples for EvalPlanQual in slots, rather than as HeapTuples. (Andres Freund, 2019-03-01)

  For the upcoming pluggable table access methods it's quite inconvenient to store tuples as HeapTuples, as that'd require converting tuples from their native format into HeapTuples. Instead use slots to manage epq tuples.

  To fit into that scheme, change the foreign data wrapper callback RefetchForeignRow, to store the tuple in a slot. Insist on using the caller provided slot, so it conveniently can be stored in the corresponding EPQ slot. As there is no in core user of RefetchForeignRow, that change was done blindly, but we plan to test that soon.

  To avoid duplicating that work for row locks, move row locks to just directly use the EPQ slots - it previously temporarily stored tuples in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given we'd possibly end up with a significant number of additional slots.

  The behaviour of es_epqTupleSet[rti -1] is now checked by es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a slot containing an empty tuple.

  Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Add --exclude-database option to pg_dumpall (Andrew Dunstan, 2019-03-01)

  This option functions similarly to pg_dump's --exclude-table option, but for database names. The option can be given once, and the argument can be a pattern including wildcard characters.

  Author: Andrew Dunstan
  Reviewed-by: Fabien Coelho and Michael Paquier
  Discussion: https://postgr.es/m/43a54a47-4aa7-c70e-9ca6-648f436dd6e6@2ndQuadrant.com
* Improve docs for ALTER TABLE .. SET TABLESPACE (Alvaro Herrera, 2019-02-28)

  Discussion: https://postgr.es/m/20190220173815.GA7959@alvherre.pgsql
  Reviewed-by: Robert Haas
* Improve documentation of data_sync_retry (Michael Paquier, 2019-02-28)

  Changing this parameter requires a server restart, which was not mentioned in the documentation or in postgresql.conf.sample.

  Reported-by: Thomas Poty
  Discussion: https://postgr.es/m/15659-0cd812f13027a2d8@postgresql.org
* Set fallback_application_name for a walreceiver to cluster_name (Peter Eisentraut, 2019-02-27)

  By default, the fallback_application_name for a physical walreceiver is "walreceiver". This means that multiple standbys cannot be distinguished easily on a primary, for example in pg_stat_activity or synchronous_standby_names.

  If cluster_name is set, use that for fallback_application_name in the walreceiver. (If it's not set, it remains "walreceiver".) If someone set cluster_name to identify their instance, we might as well use that by default to identify the node remotely as well. It's still possible to specify another application_name in primary_conninfo explicitly.

  Reviewed-by: Euler Taveira <euler@timbira.com.br>
  Discussion: https://www.postgresql.org/message-id/flat/1257eaee-4874-e791-e83a-46720c72cac7@2ndquadrant.com
* Remove unnecessary use of PROCEDURAL (Peter Eisentraut, 2019-02-25)

  Remove some unnecessary, legacy-looking use of the PROCEDURAL keyword before LANGUAGE. We mostly don't use this anymore, so some of these look a bit old.

  There is still some use in pg_dump, which is harder to remove because it's baked into the archive format, so I'm not touching that.

  Discussion: https://www.postgresql.org/message-id/2330919b-62d9-29ac-8de3-58c024fdcb96@2ndquadrant.com
* doc: Add security information about pg_stat_activity (Peter Eisentraut, 2019-02-21)

  Add a basic note that some columns in pg_stat_activity and related views are not visible to all users.

  Discussion: https://www.postgresql.org/message-id/3018acd9-e5d8-1e85-5ed7-47276cd77569%402ndquadrant.com