aboutsummaryrefslogtreecommitdiff
path: root/src/include
Commit message (Collapse)AuthorAge
* Don't reset changes_since_analyze after a selective-columns ANALYZE.Tom Lane2016-06-06
| | | | | | | | | | | | If we ANALYZE only selected columns of a table, we should not postpone auto-analyze because of that; other columns may well still need stats updates. As committed, the counter is left alone if a column list is given, whether or not it includes all analyzable columns of the table. Per complaint from Tomasz Ostrowski. It's been like this a long time, so back-patch to all supported branches. Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>
* Stop the executor if no more tuples can be sent from worker to leader.Robert Haas2016-06-06
| | | | | | | | | | | | | | | | | | | | | If a Gather node has read as many tuples as it needs (for example, due to Limit) it may detach the queue connecting it to the worker before reading all of the worker's tuples. Rather than let the worker continue to generate and send all of the results, have it stop after sending the next tuple. More could be done here to stop the worker even quicker, but this is about as well as we can hope to do for 9.6. This is in response to a problem report from Andreas Seltenreich. Commit 44339b892a04e94bbb472235882dc6f7023bdc65 should be actually be sufficient to fix that example even without this change, but it seems better to do this, too, since we might otherwise waste quite a large amount of effort in one or more workers. Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com Amit Kapila
* Inline the easy cases in MakeExpandedObjectReadOnly().Tom Lane2016-06-03
| | | | | | | | | | | | | | This attempts to buy back some of whatever performance we lost from fixing bug #14174 by inlining the initial checks in MakeExpandedObjectReadOnly() into the callers. We can do that in a macro without creating multiple- evaluation hazards, so it's pretty much free notationally; and the amount of code added to callers should be minimal as well. (Testing a value can't take many more instructions than passing it to a subroutine.) Might as well inline DatumIsReadWriteExpandedObject() while we're at it. This is an ABI break for callers, so it doesn't seem safe to put into 9.5, but I see no reason not to do it in HEAD.
* Mark read/write expanded values as read-only in ExecProject().Tom Lane2016-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a plan node output expression returns an "expanded" datum, and that output column is referenced in more than one place in upper-level plan nodes, we need to ensure that what is returned is a read-only reference not a read/write reference. Otherwise one of the referencing sites could scribble on or even delete the expanded datum before we have evaluated the others. Commit 1dc5ebc9077ab742, which introduced this feature, supposed that it'd be sufficient to make SubqueryScan nodes force their output columns to read-only state. The folly of that was revealed by bug #14174 from Andrew Gierth, and really should have been immediately obvious considering that the planner will happily optimize SubqueryScan nodes out of the plan without any regard for this issue. The safest fix seems to be to make ExecProject() force its results into read-only state; that will cover every case where a plan node returns expression results. Actually we can delegate this to ExecTargetList() since we can recursively assume that plain Vars will not reference read-write datums. That should keep the extra overhead down to something minimal. We no longer need ExecMakeSlotContentsReadOnly(), which was introduced only in support of the idea that just a few plan node types would need to do this. In the future it would be nice to have the planner account for this problem and inject force-to-read-only expression evaluation nodes into only the places where there's a risk of multiple evaluation. That's not a suitable solution for 9.5 or even 9.6 at this point, though. Report: <20160603124628.9932.41279@wrigleys.postgresql.org>
* Mark PostmasterPid as PGDLLIMPORT.Robert Haas2016-06-03
| | | | | | This is so that extensions can use it. Michael Paquier
* Fix various common mispellings.Greg Stark2016-06-03
| | | | | | | | | | Mostly these are just comments but there are a few in documentation and a handful in code and tests. Hopefully this doesn't cause too much unnecessary pain for backpatching. I relented from some of the most common like "thru" for that reason. The rest don't seem numerous enough to cause problems. Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
* Cosmetic improvements to freeze map code.Robert Haas2016-06-03
| | | | | | | Per post-commit review comments from Andres Freund, improve variable names, comments, and in one place, slightly improve the code structure. Masahiko Sawada
* C comment improvement & typo fix.Kevin Grittner2016-06-02
| | | | Thomas Munro
* Avoid useless closely-spaced writes of statistics files.Tom Lane2016-05-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original intent in the stats collector was that we should not write out stats data oftener than every PGSTAT_STAT_INTERVAL msec. Backends will not make requests at all if they see the existing data is newer than that, and the stats collector is supposed to disregard requests having a cutoff_time older than its most recently written data, so that close-together requests don't result in multiple writes. But the latter part of that got broken in commit 187492b6c2e8cafc, so that if two backends concurrently decide the existing stats are too old, the collector would write the data twice. (In principle the collector's logic would still merge requests as long as the second one arrives before we've actually written data ... but since the message collection loop would write data immediately after processing a single inquiry message, that never happened in practice, and in any case the window in which it might work would be much shorter than PGSTAT_STAT_INTERVAL.) To fix, improve pgstat_recv_inquiry so that it checks whether the cutoff time is too old, and doesn't add a request to the queue if so. This means that we do not need DBWriteRequest.request_time, because the decision is taken before making a queue entry. And that means that we don't really need the DBWriteRequest data structure at all; an OID list of database OIDs will serve and allow removal of some rather verbose and crufty code. In passing, improve the comments in this area, which have been rather neglected. Also change backend_read_statsfile so that it's not silently relying on MyDatabaseId to have some particular value in the autovacuum launcher process. It accidentally worked as desired because MyDatabaseId is zero in that process; but that does not seem like a dependency we want, especially with no documentation about it. Although this patch is mine, it turns out I'd rediscovered a known bug, for which Tomas Vondra had already submitted a patch that's functionally equivalent to the non-cosmetic aspects of this patch. Thanks to Tomas for reviewing this version. Back-patch to 9.3 where the bug was introduced. Prior-Discussion: <1718942738eb65c8407fcd864883f4c8@fuzzy.cz> Patch: <4625.1464202586@sss.pgh.pa.us>
* Move memory barrier in UnlockBufHdr to before releasing the lock.Andres Freund2016-05-30
| | | | | | | | | | | This bug appears to have been introduced late in the development of 48354581a4 ("Allow Pin/UnpinBuffer to operate in a lockfree manner."). Found while debugging a bug which turned out to be independent of the commit mentioned above. Backpatch: -
* Fix PageAddItem BRIN bugAlvaro Herrera2016-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BRIN was relying on the ability to remove a tuple from an index page, then putting another tuple in the same line pointer. But PageAddItem refuses to add a tuple beyond the first free item past the last used item, and in particular, it rejects an attempt to add an item to an empty page anywhere other than the first line pointer. PageAddItem issues a WARNING and indicates to the caller that it failed, which in turn causes the BRIN calling code to issue a PANIC, so the whole sequence looks like this: WARNING: specified item offset is too large PANIC: failed to add BRIN tuple To fix, create a new function PageAddItemExtended which is like PageAddItem except that the two boolean arguments become a flags bitmap; the "overwrite" and "is_heap" boolean flags in PageAddItem become PAI_OVERWITE and PAI_IS_HEAP flags in the new function, and a new flag PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN. PageAddItem() retains its original signature, for compatibility with third-party modules (other callers in core code are not modified, either). Also, in the belt-and-suspenders spirit, I added a new sanity check in brinGetTupleForHeapBlock to raise an error if an TID found in the revmap is not marked as live by the page header. This causes it to react with "ERROR: corrupted BRIN index" to the bug at hand, rather than a hard crash. Backpatch to 9.5. Bug reported by Andreas Seltenreich as detected by his handy sqlsmith fuzzer. Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
* Be more predictable about reporting "lock timeout" vs "statement timeout".Tom Lane2016-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If both timeout indicators are set when we arrive at ProcessInterrupts, we've historically just reported "lock timeout". However, some buildfarm members have been observed to fail isolationtester's timeouts test by reporting "lock timeout" when the statement timeout was expected to fire first. The cause seems to be that the process is allowed to sleep longer than expected (probably due to heavy machine load) so that the lock timeout happens before we reach the point of reporting the error, and then this arbitrary tiebreak rule does the wrong thing. We can improve matters by comparing the scheduled timeout times to decide which error to report. I had originally proposed greatly reducing the 1-second window between the two timeouts in the test cases. On reflection that is a bad idea, at least for the case where the lock timeout is expected to fire first, because that would assume that it takes negligible time to get from statement start to the beginning of the lock wait. Thus, this patch doesn't completely remove the risk of test failures on slow machines. Empirically, however, the case this handles is the one we are seeing in the buildfarm. The explanation may be that the other case requires the scheduler to take the CPU away from a busy process, whereas the case fixed here only requires the scheduler to not give the CPU back right away to a process that has been woken from a multi-second sleep (and, perhaps, has been swapped out meanwhile). Back-patch to 9.3 where the isolationtester timeouts test was added. Discussion: <8693.1464314819@sss.pgh.pa.us>
* Mark wal_level as PGDLLIMPORT.Tom Lane2016-05-24
| | | | | Per buildfarm, this is needed to allow extensions to use XLogIsNeeded() in Windows builds.
* Add support for more extensive testing of raw_expression_tree_walker().Tom Lane2016-05-23
| | | | | | | | | | | | | | | | | | | | | If RAW_EXPRESSION_COVERAGE_TEST is defined, do a no-op tree walk over every basic DML statement submitted to parse analysis. If we'd had this in place earlier, bug #14153 would have been caught by buildfarm testing. The difficulty is that raw_expression_tree_walker() is only used in limited cases involving CTEs (particularly recursive ones), so it's very easy for an oversight in it to not be noticed during testing of a seemingly-unrelated feature. The type of error we can expect to catch with this is complete omission of a node type from raw_expression_tree_walker(), and perhaps also recursion into a field that doesn't contain a node tree, though that would be an unlikely mistake. It won't catch failure to add new fields that need to be recursed into, unfortunately. I'll go enable this on one or two of my own buildfarm animals once bug #14153 is dealt with. Discussion: <27861.1464040417@sss.pgh.pa.us>
* Fix latent crash in do_text_output_multiline().Tom Lane2016-05-23
| | | | | | | | | | | | | | | | | | | | | do_text_output_multiline() would fail (typically with a null pointer dereference crash) if its input string did not end with a newline. Such cases do not arise in our current sources; but it certainly could happen in future, or in extension code's usage of the function, so we should fix it. To fix, replace "eol += len" with "eol = text + len". While at it, make two cosmetic improvements: mark the input string const, and rename the argument from "text" to "txt" to dodge pgindent strangeness (since "text" is a typedef name). Even though this problem is only latent at present, it seems like a good idea to back-patch the fix, since it's a very simple/safe patch and it's not out of the realm of possibility that we might in future back-patch something that expects sane behavior from do_text_output_multiline(). Per report from Hao Lee. Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>
* Pin the built-in index access methods.Tom Lane2016-05-19
| | | | | | | | | | | | This was overlooked in commit 473b93287, which introduced DROP ACCESS METHOD. Although that command is restricted to superusers, we don't want even superusers dropping the built-in methods; "DROP ACCESS METHOD btree" in particular is unrecoverable from. Pin these objects in the same way that other initdb-created objects are pinned. I chose to bump catversion for this fix. That's not absolutely necessary perhaps, but it will ensure that no 9.6 production systems are missing the pin entries.
* Stamp 9.6beta1.REL9_6_BETA1Tom Lane2016-05-09
|
* Fix pg_upgrade to not fail when new-cluster TOAST rules differ from old.Tom Lane2016-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch essentially reverts commit 4c6780fd17aa43ed, in favor of a much simpler solution for the case where the new cluster would choose to create a TOAST table but the old cluster doesn't have one: just don't create a TOAST table. The existing code failed in at least two different ways if the situation arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the lock sanity check in create_toast_table failed; (2) pg_upgrade did not provide a pg_type OID for the new toast table, so that the crosscheck in TypeCreate failed. While both these problems were introduced by later patches, they show that the hack being used to cause TOAST table creation is overwhelmingly fragile (and untested). I also note that before the TypeCreate crosscheck was added, the code would have resulted in assigning an indeterminate pg_type OID to the toast table, possibly causing a later OID conflict in that catalog; so that it didn't really work even when committed. If we simply don't create a TOAST table, there will only be a problem if the code tries to store a tuple that's wider than a page, and field compression isn't sufficient to get it under a page. Given that the TOAST creation threshold is intended to be about a quarter of a page, it's very hard to believe that cross-version differences in the do-we-need-a-toast- table heuristic could result in an observable problem. So let's just follow the old version's conclusion about whether a TOAST table is needed. (If we ever do change needs_toast_table() so much that this conclusion doesn't apply, we can devise a solution at that time, and hopefully do it in a less klugy way than 4c6780fd17aa43ed did.) Back-patch to 9.3, like the previous patch. Discussion: <8110.1462291671@sss.pgh.pa.us>
* Fix hash index vs "snapshot too old" problemmsKevin Grittner2016-05-06
| | | | | | | | | | | | | | | | Hash indexes are not WAL-logged, and so do not maintain the LSN of index pages. Since the "snapshot too old" feature counts on detecting error conditions using the LSN of a table and all indexes on it, this makes it impossible to safely do early vacuuming on any table with a hash index, so add this to the tests for whether the xid used to vacuum a table can be adjusted based on old_snapshot_threshold. While at it, add a paragraph to the docs for old_snapshot_threshold which specifically mentions this and other aspects of the feature which may otherwise surprise users. Problem reported and patch reviewed by Amit Kapila
* Move and rename fmtReloptionsArray().Dean Rasheed2016-05-06
| | | | | | | | | | | | Move fmtReloptionsArray() from pg_dump.c to string_utils.c so that it is available to other frontend code. In particular psql's \ev and \sv commands need it to handle view reloptions. Also rename the function to appendReloptionsArray(), which is a more accurate description of what it does. Author: Dean Rasheed Reviewed-by: Peter Eisentraut Discussion: http://www.postgresql.org/message-id/CAEZATCWZjCgKRyM-agE0p8ax15j9uyQoF=qew7D2xB6cF76T8A@mail.gmail.com
* Rename tsvector delete() to ts_delete(), and filter() to ts_filter().Tom Lane2016-05-05
| | | | | | | | | The similarity of the original names to SQL keywords seems like a bad idea. Rename them before we're stuck with 'em forever. In passing, minor code and docs cleanup. Discussion: <4875.1462210058@sss.pgh.pa.us>
* Revert timeline following in replication slotsAlvaro Herrera2016-05-04
| | | | | | | | | This reverts commits f07d18b6e94d, 82c83b337202, 3a3b309041b0, and 24c5f1a103ce. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.
* Fix more things to be parallel-safe.Robert Haas2016-05-03
| | | | | | | | | | | Conversion functions were previously marked as parallel-unsafe, since that is the default, but in fact they are safe. Parallel-safe functions defined in pg_proc.h and redefined in system_views.sql were ending up as parallel-unsafe because the redeclarations were not marked PARALLEL SAFE. While editing system_views.sql, mark ts_debug() parallel safe also. Andreas Karlsson
* Fix thinko in commentAlvaro Herrera2016-05-02
| | | | Pointed out by Andres Freund
* Fix code comments regarding logical decodingAlvaro Herrera2016-05-02
| | | | | | | | | | | | | | Back in 3b02ea4f0780 I added some comments in various places to explain how logical decoding and other things worked. Not all of the changes were welcome, because they were misleading or wrong. This changes them a little bit to make them more accurate. Some other comments are also changed to be more accurate. Also, fix a bunch of typos. Author: Álvaro Herrera, Craig Ringer Andres Freund reviewed some parts of this.
* Fix parallel safety markings for pg_start_backup.Robert Haas2016-05-02
| | | | | | | | | | | Commit 7117685461af50f50c03f43e6a622284c8d54694 made pg_start_backup parallel-restricted rather than parallel-safe, because it now relies on backend-private state that won't be synchronized with the parallel worker. However, it didn't update pg_proc.h. Separately, Andreas Karlsson observed that system_views.sql neglected to reiterate the parallel-safety markings whe redefining various functions, including this one; so add a PARALLEL RESTRICTED declaration there to match the new value in pg_proc.h.
* Fix mishandling of equivalence-class tests in parameterized plans.Tom Lane2016-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like Nested Loop -> Scan X -> Nested Loop -> Scan Y -> Scan Z Filter: Z.Z = X.X The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs. This is *extremely* hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case. The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.
* Add a few entries to the tail of time mapping, to see old values.Kevin Grittner2016-04-29
| | | | | | | | | | | | | | | | | | Without a few entries beyond old_snapshot_threshold, the lookup would often fail, resulting in the more aggressive pruning or vacuum being skipped often enough to matter. This was very clearly shown by a python test script posted by Ants Aasma, and was likely a factor in an earlier but somewhat less clear-cut test case posted by Jeff Janes. This patch makes no change to the logic, per se -- it just makes the array of mapping entries big enough to make lookup misses based on timing much less likely. An occasional miss is still possible if a thread stalls for more than 10 minutes, but that does not create any problem with correctness of behavior. Besides, if things are so busy that a thread is stalling for more than 10 minutes, it is probably OK to skip the more aggressive cleanup at that particular point in time.
* Fix comment whitespace in VS2105 patchAndrew Dunstan2016-04-29
| | | | per gripe from Michael Paquier.
* Fix typoMagnus Hagander2016-04-29
| | | | Author: Thomas Munro
* Support building with Visual Studio 2015Andrew Dunstan2016-04-29
| | | | | | | | | | | Adjust the way we detect the locale. As a result the minumum Windows version supported by VS2015 and later is Windows Vista. Add some tweaks to remove new compiler warnings. Remove documentation references to the now obsolete msysGit. Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich. Backpatch to 9.5
* Adjust DatumGetBool macro, this time for sure.Tom Lane2016-04-28
| | | | | | | | | | | | | | | Commit 23a41573c attempted to fix the DatumGetBool macro to ignore bits in a Datum that are to the left of the actual bool value. But it did that by casting the Datum to bool; and on compilers that use C99 semantics for bool, that ends up being a whole-word test, not a 1-byte test. This seems to be the true explanation for contrib/seg failing in VS2015. To fix, use GET_1_BYTE() explicitly. I think in the previous patch, I'd had some idea of not having to commit to bool being exactly 1 byte wide, but regardless of what the compiler's bool is, boolean columns and Datums are certainly 1 byte wide. The previous fix was (eventually) back-patched into all active versions, so do likewise with this one.
* Prevent to use magic constantsTeodor Sigaev2016-04-28
| | | | | | | Use macroses for definition amstrategies/amsupport fields instead of hardcoded values. Author: Nikolay Shaplov with addition for contrib/bloom
* Prevent multiple cleanup process for pending list in GIN.Teodor Sigaev2016-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, ginInsertCleanup could exit early if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. But in this case vacuum could miss tuples to be deleted. Cleanup process now locks metapage with a help of heavyweight LockPage(ExclusiveLock), and it guarantees that there is no another cleanup process at the same time. Lock is taken differently depending on caller of cleanup process: any vacuums and gin_clean_pending_list() will be blocked until lock becomes available, ordinary insert uses conditional lock to prevent indefinite waiting on lock. Insert into pending list doesn't use this lock, so insertion isn't blocked. Also, patch adds stopping of cleanup process when at-start-cleanup-tail is reached in order to prevent infinite cleanup in case of massive insertion. But it will stop only for automatic maintenance tasks like autovacuum. Patch introduces choice of limit of memory to use: autovacuum_work_mem, maintenance_work_mem or work_mem depending on call path. Patch for previous releases should be reworked due to changes between 9.6 and previous ones in this area. Discover and diagnostics by Jeff Janes and Tomas Vondra Patch by me with some ideas of Jeff Janes
* Clean up parsing of synchronous_standby_names GUC variable.Tom Lane2016-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 989be0810dffd08b added a flex/bison lexer/parser to interpret synchronous_standby_names. It was done in a pretty crufty way, though, making assorted end-use sites responsible for calling the parser at the right times. That was not only vulnerable to errors of omission, but made it possible that lexer/parser errors occur at very undesirable times, and created memory leakages even if there was no error. Instead, perform the parsing once during check_synchronous_standby_names and let guc.c manage the resulting data. To do that, we have to flatten the parsed representation into a single hunk of malloc'd memory, but that is not very hard. While at it, work a little harder on making useful error reports for parsing problems; the previous code felt that "synchronous_standby_names parser returned 1" was an appropriate user-facing error message. (To be fair, it did also log a syntax error message, but separately from the GUC problem report, which is at best confusing.) It had some outright bugs in the face of invalid input, too. I (tgl) also concluded that we need to restrict unquoted names in synchronous_standby_names to be just SQL identifiers. The previous coding would accept darn near anything, which (1) makes the quoting convention both nearly-unnecessary and formally ambiguous, (2) makes it very hard to understand what is a syntax error and what is a creative interpretation of the input as a standby name, and (3) makes it impossible to further extend the syntax in future without a compatibility break. I presume that we're intending future extensions of the syntax, else this parsing infrastructure is massive overkill, so (3) is an important objection. Since we've taken a compatibility hit for non-identifier names with this change anyway, we might as well lock things down now and insist that users use double quotes for standby names that aren't identifiers. Kyotaro Horiguchi and Tom Lane
* Remove mergeHyperLogLog.Robert Haas2016-04-27
| | | | | | | | | | It's buggy. If somebody needs this later, they'll need to put back a non-buggy vesion of it. Discussion: CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com Discussion: CAM3SWZRUOLsYoTT83QgdUy9D8ehYWm_nvbrrfcOOzikiRfFY7g@mail.gmail.com Peter Geoghegan
* Fix EXPLAIN VERBOSE output for parallel aggregate.Robert Haas2016-04-27
| | | | | | | | | | | | The way that PartialAggregate and FinalizeAggregate plan nodes were displaying output columns before was bogus. Now, FinalizeAggregate produces the same outputs as an Aggregate would have produced, while PartialAggregate produces each of those outputs prefixed by the word PARTIAL. Discussion: 12585.1460737650@sss.pgh.pa.us Patch by me, reviewed by David Rowley.
* Emit invalidations to standby for transactions without xid.Andres Freund2016-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | So far, when a transaction with pending invalidations, but without an assigned xid, committed, we simply ignored those invalidation messages. That's problematic, because those are actually sent for a reason. Known symptoms of this include that existing sessions on a hot-standby replica sometimes fail to notice new concurrently built indexes and visibility map updates. The solution is to WAL log such invalidations in transactions without an xid. We considered to alternatively force-assign an xid, but that'd be problematic for vacuum, which might be run in systems with few xids. Important: This adds a new WAL record, but as the patch has to be back-patched, we can't bump the WAL page magic. This means that standbys have to be updated before primaries; otherwise "PANIC: standby_redo: unknown op code 32" errors can be encountered. XXX: Reported-By: Васильев Дмитрий, Masahiko Sawada Discussion: CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
* Impose a full barrier in generic-xlc.h atomics functions.Noah Misch2016-04-26
| | | | | | | | | | | | pg_atomic_compare_exchange_*_impl() were providing only the semantics of an acquire barrier. Buildfarm members hornet and mandrill revealed this deficit beginning with commit 008608b9d51061b1f598c197477b3dc7be9c4a64. While we have no report of symptoms in 9.5, we can't rule out the possibility of certain compilers, hardware, or extension code relying on these functions' specified barrier semantics. Back-patch to 9.5, where commit b64d92f1a5602c55ee8b27a7ac474f03b7aee340 introduced atomics. Reviewed by Andres Freund.
* Improve TranslateSocketError() to handle more Windows error codes.Tom Lane2016-04-21
| | | | | | The coverage was rather lean for cases that bind() or listen() might return. Add entries for everything that there's a direct equivalent for in the set of Unix errnos that elog.c has heard of.
* Remove dead code in win32.h.Tom Lane2016-04-21
| | | | | | | | | | | There's no longer a need for the MSVC-version-specific code stanza that forcibly redefines errno code symbols, because since commit 73838b52 we're unconditionally redefining them in the stanza before this one anyway. Now it's merely confusing and ugly, so get rid of it; and improve the comment that explains what's going on here. Although this is just cosmetic, back-patch anyway since I'm intending to back-patch some less-cosmetic changes in this same hunk of code.
* PGDLLIMPORT-ify old_snapshot_threshold.Tom Lane2016-04-21
| | | | | | | Revert commit 7cb1db1d9599f0a09d6920d2149d956ef6d88b0e, which represented a misunderstanding of the problem (if snapmgr.h weren't already included in bufmgr.h, things wouldn't compile anywhere). Instead install what I think is the real fix.
* Comment improvements for ForeignPath.Robert Haas2016-04-21
| | | | | | It's not necessarily just scanning a base relation any more. Amit Langote and Etsuro Fujita
* Inline initial comparisons in TestForOldSnapshot()Kevin Grittner2016-04-21
| | | | | | | | | | | | Even with old_snapshot_threshold = -1 (which disables the "snapshot too old" feature), performance regressions were seen at moderate to high concurrency. For example, a one-socket, four-core system running 200 connections at saturation could see up to a 2.3% regression, with larger regressions possible on NUMA machines. By inlining the early (smaller, faster) tests in the TestForOldSnapshot() function, the i7 case dropped to a 0.2% regression, which could easily just be noise, and is clearly an improvement. Further testing will show whether more is needed.
* Fix memory leak and other bugs in ginPlaceToPage() & subroutines.Tom Lane2016-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and its subroutines in gindatapage.c and ginentrypage.c into a royal mess: page-update critical sections were started in one place and finished in another place not even in the same file, and the very same subroutine might return having started a critical section or not. Subsequent patches band-aided over some of the problems with this design by making things even messier. One user-visible resulting problem is memory leaks caused by the need for the subroutines to allocate storage that would survive until ginPlaceToPage calls XLogInsert (as reported by Julien Rouhaud). This would not typically be noticeable during retail index updates. It could be visible in a GIN index build, in the form of memory consumption swelling to several times the commanded maintenance_work_mem. Another rather nasty problem is that in the internal-page-splitting code path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before entering the critical section that it's supposed to be cleared in; a failure in between would leave the index in a corrupt state. There were also assorted coding-rule violations with little immediate consequence but possible long-term hazards, such as beginning an XLogInsert sequence before entering a critical section, or calling elog(DEBUG) inside a critical section. To fix, redefine the API between ginPlaceToPage() and its subroutines by splitting the subroutines into two parts. The "beginPlaceToPage" subroutine does what can be done outside a critical section, including full computation of the result pages into temporary storage when we're going to split the target page. The "execPlaceToPage" subroutine is called within a critical section established by ginPlaceToPage(), and it handles the actual page update in the non-split code path. The critical section, as well as the XLOG insertion call sequence, are both now always started and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and work in a short-lived memory context to eliminate the leakage problem. (Since a short-lived memory context had been getting created in the most common code path in the subroutines, this shouldn't cause any noticeable performance penalty; we're just moving the overhead up one call level.) In passing, fix a bunch of comments that had gone unmaintained throughout all this klugery. Report: <571276DD.5050303@dalibo.com>
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Further reduce the number of semaphores used under --disable-spinlocks.Tom Lane2016-04-18
| | | | | | | | | | Per discussion, there doesn't seem to be much value in having NUM_SPINLOCK_SEMAPHORES set to 1024: under any scenario where you are running more than a few backends concurrently, you really had better have a real spinlock implementation if you want tolerable performance. And 1024 semaphores is a sizable fraction of the system-wide SysV semaphore limit on many platforms. Therefore, reduce this setting's default value to 128 to make it less likely to cause out-of-semaphores problems.
* Tweak EXPLAIN for parallel query to show workers launched.Robert Haas2016-04-15
| | | | | | | | | The previous display was sort of confusing, because it didn't distinguish between the number of workers that we planned to launch and the number that actually got launched. This has already confused several people, so display both numbers and label them clearly. Julien Rouhaud, reviewed by me.
* Fix portability problem induced by commit a6f6b7819.Tom Lane2016-04-15
| | | | | | | | pg_xlogdump includes bufmgr.h. With a compiler that emits code for static inline functions even when they're unreferenced, that leads to unresolved external references in the new static-inline version of BufferGetPage(). So hide it with #ifndef FRONTEND, as we've done for similar issues elsewhere. Per buildfarm member pademelon.
* Make init_spin_delay() C89 compliant #2.Andres Freund2016-04-14
| | | | | | | | | | | My previous attempt at doing so, in 80abbeba23, was not sufficient. While that fixed the problem for bufmgr.c and lwlock.c , s_lock.c still has non-constant expressions in the struct initializer, because the file/line/function information comes from the caller of s_lock(). Give up on using a macro, and use a static inline instead. Discussion: 4369.1460435533@sss.pgh.pa.us