postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	More jsonb cleanup.	Heikki Linnakangas	2014-05-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix JSONB_MAX_ELEMS and JSONB_MAX_PAIRS macros to use CB_MASK in the calculation. JENTRY_POSMASK happens to have the same value at the moment, but that's just coincidental. Refactor jsonb iterator functions, for readability. Get rid of the JENTRY_ISFIRST flag. Whenever we handle JEntrys, we have access to the whole array and have enough context information to know which entry is the first. This frees up one bit in the JEntry header for future use. While we're at it, shuffle the JEntry bits so that boolean true and false go together, for aesthetic reasons. Bump catalog version as this changes the on-disk format slightly.
*	Improve key representation for GIN jsonb_ops, and fix existence-search bug.	Tom Lane	2014-05-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the key representation so that values that would exceed 127 bytes are hashed into short strings, and so that the original JSON datatype of each value is recorded in the index. The hashing rule eliminates the major objection to having this opclass be the default for jsonb, namely that it could fail for plausible input data (due to GIN's restrictions on maximum key length). Preserving datatype information doesn't really buy us much right now, but it requires no extra space compared to the previous way, and it might be useful later. Also, change the consistency-checking functions to request recheck for exists (jsonb ? text) and related operators. The original analysis that this is an exactly checkable query was incorrect, since the index does not preserve information about whether a key appears at top level in the indexed JSON object. Add a test case demonstrating the problem. Make some other, mostly cosmetic improvements to the code in jsonb_gin.c as well. catversion bump due to on-disk data format change in jsonb_ops indexes.
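A minimal sketch of the hashing rule described above, not the code in jsonb_gin.c. It assumes that the 127-byte limit covers the whole key including one leading flag byte, that the flag byte carries the JSON datatype plus a hypothetical "hashed" marker bit, and that FNV-1a stands in for PostgreSQL's real hash function. The names `make_key`, `FLAG_HASHED` and `fnv1a` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEY_LEN  127    /* limit named in the commit message */
#define FLAG_HASHED  0x10   /* hypothetical "value was hashed" marker bit */

/* FNV-1a, used only as a stand-in hash; this is not PostgreSQL's hash_any(). */
static uint32_t fnv1a(const char *s, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= (unsigned char) s[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Build a key: one flag byte recording the datatype, followed by either the
 * value itself or, when the key would exceed MAX_KEY_LEN, 8 hex digits of
 * the value's hash. "out" must have room for MAX_KEY_LEN + 1 bytes.
 * Returns the key length in bytes.
 */
static size_t make_key(unsigned char type_flag, const char *val, size_t len,
                       char *out)
{
    assert(out != NULL);

    if (1 + len > MAX_KEY_LEN)
    {
        out[0] = (char) (type_flag | FLAG_HASHED);
        snprintf(out + 1, 9, "%08x", (unsigned int) fnv1a(val, len));
        return 9;               /* flag byte + 8 hex digits */
    }
    out[0] = (char) type_flag;
    memcpy(out + 1, val, len);
    return 1 + len;
}
```

Because a hashed key no longer identifies the original value exactly, an index built this way can only report candidate matches for long values, and the caller has to recheck them against the stored data.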
*	Minor cleanup of jsonb_util.c	Heikki Linnakangas	2014-05-09
\| \| \| \| \| \|	Move the functions around to group related functions together. Remove binequal argument from lengthCompareJsonbStringValue, moving that responsibility to lengthCompareJsonbPair. Fix typo in comment.
*	Avoid some pnstrdup()s when constructing jsonb	Heikki Linnakangas	2014-05-09
\| \| \| \| \|	This speeds up text to jsonb parsing and hstore to jsonb conversions somewhat.
*	Fix missing dependencies in ecpg's test Makefiles.	Tom Lane	2014-05-08
\| \| \| \| \| \| \| \| \| \| \| \|	Ensure that ecpg preprocessor output files are rebuilt when re-testing after a change in the ecpg preprocessor itself, or a change in any of several include files that get copied verbatim into the output files. The lack of these dependencies was what created problems for Kevin Grittner after the recent pgindent run. There's no way for --enable-depend to discover these dependencies automatically, so we've gotta put them into the Makefiles by hand. While at it, reduce the amount of duplication in the ecpg invocations.
*	Increase the default value of effective_cache_size to 4GB.	Tom Lane	2014-05-08
\| \| \| \| \| \| \| \| \| \|	Per discussion, the old value of 128MB is ridiculously small on modern machines; in fact, it's not even any larger than the default value of shared_buffers, which it certainly should be. Increase to 4GB, which is unlikely to be any worse than the old default for anyone, and should be noticeably better for most. Eventually we might have an autotuning scheme for this setting, but the recent attempt crashed and burned, so for now just do this.
*	Revert "Auto-tune effective_cache size to be 4x shared buffers"	Tom Lane	2014-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit ee1e5662d8d8330726eaef7d3110cb7add24d058, as well as a remarkably large number of followup commits, which were mostly concerned with the fact that the implementation didn't work terribly well. It still doesn't: we probably need some rather basic work in the GUC infrastructure if we want to fully support GUCs whose default varies depending on the value of another GUC. Meanwhile, it also emerged that there wasn't really consensus in favor of the definition the patch tried to implement (ie, effective_cache_size should default to 4 times shared_buffers). So whack it all back to where it was. In a followup commit, I'll do what was recently agreed to, which is to simply change the default to a higher value.
*	Un-break ecpg test suite under --disable-integer-datetimes.	Noah Misch	2014-05-08
\| \| \| \| \| \| \|	Commit 4318daecc959886d001a6e79c6ea853e8b1dfb4b broke it. The change in sub-second precision at extreme dates is normal. The inconsistent truncation vs. rounding is essentially a bug, albeit a longstanding one. Back-patch to 8.4, like the causative commit.
*	Fix comment.	Tom Lane	2014-05-08
\| \| \| \| \| \| \| \|	Previous commit was confused about the case we're handling: actually, what the patch is dealing with is platforms that have optreset, and have <getopt.h>, but the latter fails to declare the former. Because we use a linking probe to set HAVE_INT_OPTRESET, we need to be sure we have a declaration even if <getopt.h> doesn't think it exists.
*	Allow for platforms that have optreset but not <getopt.h>.	Tom Lane	2014-05-08
\| \| \| \| \| \| \| \| \|	Reportedly, some versions of mingw are like that, and it seems plausible in general that older platforms might be that way. However, we'd determined experimentally that just doing "extern int" conflicts with the way Cygwin declares these variables, so explicitly exclude Cygwin. Michael Paquier, tweaked by me to hopefully not break Cygwin
*	Protect against torn pages when deleting GIN list pages.	Heikki Linnakangas	2014-05-08
\| \| \| \| \| \| \| \| \|	To-be-deleted list pages contain no useful information, as they are being deleted, but we must still protect the writes from being torn by a crash after a partial write. To do that, re-initialize the pages on WAL replay. Jeff Janes caught this with a test program to test partial writes. Backpatch to all supported versions.
*	Include files copied from libpqport in .gitignore	Heikki Linnakangas	2014-05-08
\| \| \| \|	Michael Paquier
*	Avoid buffer bloat in libpq when server is consistently faster than client.	Tom Lane	2014-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the server sends a long stream of data, and the server + network are consistently fast enough to force the recv() loop in pqReadData() to iterate until libpq's input buffer is full, then upon processing the last incomplete message in each bufferload we'd usually double the buffer size, due to supposing that we didn't have enough room in the buffer to finish collecting that message. After filling the newly-enlarged buffer, the cycle repeats, eventually resulting in an out-of-memory situation (which would be reported misleadingly as "lost synchronization with server"). Of course, we should not enlarge the buffer unless we still need room after discarding already-processed messages. This bug dates back quite a long time: pqParseInput3 has had the behavior since perhaps 2003, getCopyDataMessage at least since commit 70066eb1a1ad in 2008. Probably the reason it's not been isolated before is that in common environments the recv() loop would always be faster than the server (if on the same machine) or faster than the network (if not); or at least it wouldn't be slower consistently enough to let the buffer ramp up to a problematic size. The reported cases involve Windows, which perhaps has different timing behavior than other platforms. Per bug #7914 from Shin-ichi Morita, though this is different from his proposed solution. Back-patch to all supported branches.
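The rule stated above (do not enlarge the buffer unless room is still needed after discarding already-processed messages) can be sketched as follows. This is an illustration under assumed names, not libpq's code: the `InBuf` struct, its fields and `inbuf_ensure_room` are hypothetical, and overflow of the doubled size is not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input buffer; the field names are illustrative, not libpq's. */
typedef struct
{
    char   *data;
    size_t  size;   /* allocated bytes */
    size_t  start;  /* offset of the first unprocessed byte */
    size_t  end;    /* offset one past the last received byte */
} InBuf;

/*
 * Make room for at least "needed" more bytes after "end".
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int inbuf_ensure_room(InBuf *b, size_t needed)
{
    size_t  newsize;
    char   *p;

    assert(b->start <= b->end && b->end <= b->size);

    /* Step 1: discard processed bytes by sliding the remainder to the front. */
    if (b->start > 0)
    {
        memmove(b->data, b->data + b->start, b->end - b->start);
        b->end -= b->start;
        b->start = 0;
    }

    /* Step 2: enlarge only if compaction did not free enough space. */
    if (b->size - b->end >= needed)
        return 0;

    newsize = b->size ? b->size : 1;
    while (newsize - b->end < needed)
        newsize *= 2;

    p = realloc(b->data, newsize);
    if (p == NULL)
        return -1;
    b->data = p;
    b->size = newsize;
    return 0;
}
```

With the two steps in the opposite order, a buffer that is full of already-consumed data would be doubled on every bufferload, which is the growth pattern the commit message describes.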
*	When a background worker exits with code 0, unregister it.	Robert Haas	2014-05-07
\| \| \| \| \| \| \|	The previous behavior was to restart immediately, which was generally viewed as less useful. Petr Jelinek, with some adjustments by me.
*	When a bgworker exits, always call ReleasePostmasterChildSlot.	Robert Haas	2014-05-07
\| \| \| \| \|	Commit e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7 was insufficiently well thought out. Repair.
*	Restart bgworkers immediately after a crash-and-restart cycle.	Robert Haas	2014-05-07
\| \| \| \| \| \| \|	Just as we would start bgworkers immediately after an initial startup of the server, we should restart them immediately when reinitializing. Petr Jelinek and Robert Haas
*	Clean up jsonb code.	Heikki Linnakangas	2014-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main target of this cleanup is the convertJsonb() function, but I also touched a lot of other things that I spotted in the process. The new convertToJsonb() function uses an output buffer that's resized on demand, so the code to estimate the size of JsonbValue is removed. The on-disk format was not changed, even though I refactored the structs used to handle it. The term "superheader" is replaced with "container". The jsonb_exists_any and jsonb_exists_all functions no longer sort the input array. That was a premature optimization, the idea being that if there are duplicates in the input array, you only need to check them once. Also, sorting the array saves some effort in the binary search used to find a key within an object. But there were drawbacks too: the sorting and deduplicating obviously isn't free, and in the typical case there are no duplicates to remove, and the gain in the binary search was minimal. Remove all that, which makes the code simpler too. This includes a bug-fix; the total length of the elements in a jsonb array or object mustn't exceed 2^28. That is now checked.
*	Detach shared memory from bgworkers without shmem access.	Robert Haas	2014-05-07
\| \| \| \| \| \| \| \|	Since the postmaster won't perform a crash-and-restart sequence for background workers which don't request shared memory access, we'd better make sure that they can't corrupt shared memory. Patch by me, review by Tom Lane.
*	Fix failure to set ActiveSnapshot while rewinding a cursor.	Tom Lane	2014-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ActiveSnapshot needs to be set when we call ExecutorRewind because some plan node types may execute user-defined functions during their ReScan calls (nodeLimit.c does so, at least). The wisdom of that is somewhat debatable, perhaps, but for now the simplest fix is to make sure the required context is valid. Failure to do this typically led to a null-pointer-dereference core dump, though it's possible that in more complex cases a function could be executed with the wrong snapshot leading to very subtle misbehavior. Per report from Leif Jensen. It's been broken for a long time, so back-patch to all active branches.
*	Never crash-and-restart for bgworkers without shared memory access.	Robert Haas	2014-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The motivation for a crash and restart cycle when a backend dies is that it might have corrupted shared memory on the way down; and we can't recover reliably except by reinitializing everything. But that doesn't apply to processes that don't touch shared memory. Currently, there's nothing to prevent a background worker that doesn't request shared memory access from touching shared memory anyway, but that's a separate bug. Previous to this commit, the coding in postmaster.c was inconsistent: an exit status other than 0 or 1 didn't provoke a crash-and-restart, but failure to release the postmaster child slot did. This change makes those cases consistent.
*	Fix some more confusion between uint32 and Datum.	Tom Lane	2014-05-06
\|
*	Fix interval test, which was broken for floating-point timestamps.	Jeff Davis	2014-05-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 4318daecc959886d001a6e79c6ea853e8b1dfb4b introduced a test that couldn't be made consistent between integer and floating-point timestamps. It was designed to test the longest possible interval output length, so removing four zeros from the number of hours, as this patch does, is not ideal. But the test still has some utility for its original purpose, and there aren't a lot of other good options. Noah Misch suggested a different approach where we test that the output either matches what we expect from integer timestamps or what we expect from floating-point timestamps. That seemed to obscure an otherwise simple test, however. Reviewed by Tom Lane and Noah Misch.
*	hash_any returns Datum, not uint32 (and definitely not "int").	Tom Lane	2014-05-06
\| \| \| \| \| \| \| \| \| \|	The coding in JsonbHashScalarValue might have accidentally failed to fail given current representational choices, but the key word there would be "accidental". Insert the appropriate datatype conversion macro. And use the right conversion macro for hash_numeric's result, too. In passing make the code a bit cleaner and less repetitive by factoring out the xor step from the switch.
*	Improve comment for tricky aspect of index-only scans.	Jeff Davis	2014-05-06
\| \| \| \| \| \| \| \| \|	Index-only scans avoid taking a lock on the VM buffer, which would cause a lot of contention. To be correct, that requires some intricate assumptions that weren't completely documented in the previous comment. Reviewed by Robert Haas.
*	With ecpg exclusion removed, re-run pgindent for 9.4	Bruce Momjian	2014-05-06
\| \| \| \|	Report by Tom Lane
*	Remove pgindent ecpg exclusion pattern	Bruce Momjian	2014-05-06
\| \| \| \|	Report by Tom Lane
*	pg_basebackup streaming: adjust version check msg	Simon Riggs	2014-05-06
\| \| \| \|	Allow for translatable string, rather than use "or"
*	Improve pgindent test instructions	Bruce Momjian	2014-05-06
\|
*	Fix logic bug in dsm_attach().	Robert Haas	2014-05-06
\| \| \| \| \| \| \|	The previous coding would potentially cause attaching to segment A to fail if segment B was at the same time in the process of going away. Andres Freund, with a comment tweak by me
*	Fix improperly passed file descriptors	Bruce Momjian	2014-05-06
\| \| \| \| \| \|	Fix for commit 14ea89366fe321609afc5838ff9fe2ded1cd707d. Report by Andres Freund.
*	pgindent run for 9.4	Bruce Momjian	2014-05-06
\| \| \| \| \|	This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.
*	Adjust pgindent to remove tabs after periods in C comments.	Bruce Momjian	2014-05-06
\|
*	Fix detection of short tar files, broken by commit 14ea89366fe321609afc5838ff9fe2ded1cd707d	Bruce Momjian	2014-05-06
\| \| \| \| \| \|	Report by Noah Misch
*	Correct comment in Hot Standby nbtree handling	Simon Riggs	2014-05-06
\| \| \| \|	Logic is correct, matching handling of LP_DEAD elsewhere.
*	Update typedef list in preparation for pgindent run	Bruce Momjian	2014-05-06
\|
*	pg_basebackup streaming: adjust version check msg	Simon Riggs	2014-05-06
\| \| \| \| \| \| \| \|	Commit d298b50a3b469c088bb40a4d36d38111b4cd574d by Heikki Linnakangas requested that the version check message be updated at next release, suggesting that the appropriate text would be “9.3 or later”. The logic used for the check indicates that the correct text for 9.4 is “9.3 or 9.4”, since the logic would cause this to fail for later releases.
*	Fix use of free in walsender error handling after a sysid mismatch.	Heikki Linnakangas	2014-05-06
\| \| \| \| \| \| \|	Found via valgrind. The bug has existed since the introduction of the walsender, so backpatch to 9.0. Andres Freund
*	Fix handling of array of char pointers in ecpglib.	Michael Meskes	2014-05-06
\| \| \| \| \| \| \| \| \| \|	When an array of char * was used as the target for a FETCH statement returning more than one row, it tried to store all the results in the first element. Instead it should dump the array of char pointers with the right offset, use the address instead of the value of the C variable while reading the array, and treat such a variable as char * instead of char for pointer arithmetic. Patch by Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
*	Properly detect read and write errors in pg_dump/dumpall, and pg_restore	Bruce Momjian	2014-05-05
\| \| \| \|	Previously some I/O errors were ignored.
*	Fix possible cache invalidation failure in ReceiveSharedInvalidMessages.	Tom Lane	2014-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit fad153ec45299bd4d4f29dec8d9e04e2f1c08148 modified sinval.c to reduce the number of calls into sinvaladt.c (which require taking a shared lock) by keeping a local buffer of collected-but-not-yet-processed messages. However, if processing of the last message in a batch resulted in a recursive call to ReceiveSharedInvalidMessages, we could overwrite that message with a new one while the outer invalidation function was still working on it. This would be likely to lead to invalidation of the wrong cache entry, allowing subsequent processing to use stale cache data. The fix is just to make a local copy of each message while we're processing it. Spotted by Andres Freund. Back-patch to 8.4 where the bug was introduced.
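The hazard and the fix described above can be sketched in a few lines. This is a simplified model, not sinval.c: the `Msg` type, the one-slot-deep scenario and all function names are hypothetical. The point it shows is that the handler receives a private copy, so a recursive refill of the shared batch buffer cannot change the message the outer call is still working on.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } Msg;         /* stand-in for an invalidation message */

#define BATCH_MAX 4
static Msg batch[BATCH_MAX];            /* collected-but-not-yet-processed messages */
static int nextmsg = 0;
static int nummsgs = 0;

typedef void (*MsgHandler)(const Msg *msg);

/* Refill the shared batch buffer, overwriting whatever was in it. */
static void load_batch(const Msg *src, int n)
{
    if (n > BATCH_MAX)
        n = BATCH_MAX;
    for (int i = 0; i < n; i++)
        batch[i] = src[i];
    nextmsg = 0;
    nummsgs = n;
}

/* Process pending messages, handing each handler call a local copy. */
static void process_batch(MsgHandler handler)
{
    assert(handler != NULL);
    while (nextmsg < nummsgs)
    {
        Msg local = batch[nextmsg++];   /* copy before calling out */

        handler(&local);
    }
}

/* Demo handler that re-enters process_batch once, as a recursive call would. */
static int id_before = 0;
static int id_after = 0;
static int handled = 0;
static int reentered = 0;

static void reentrant_handler(const Msg *msg)
{
    handled++;
    if (!reentered)
    {
        Msg fresh[1] = {{99}};

        reentered = 1;
        id_before = msg->id;
        load_batch(fresh, 1);           /* overwrites batch[0] */
        process_batch(reentrant_handler);
        id_after = msg->id;             /* unchanged: msg points at the copy */
    }
}
```

Had `process_batch` passed `&batch[nextmsg - 1]` instead of `&local`, `id_after` in this demo would read 99, the id of the message that replaced the one still being processed.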
*	Fix pg_type.typlen for newly-revived line type.	Tom Lane	2014-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 261c7d4b653bc3e44c31fd456d94f292caa50d8f removed the "m" field from struct LINE, but neglected to make pg_type.h's idea of the type's size match. This resulted in reading past the end of palloc'd LINE values when inserting them into tuples etc. In principle that could cause a SIGSEGV, though the odds of detectable problems seem low. Bump catversion since this makes an incompatible on-disk format change. Note that if the line type had been in use in the field, this would break pg_upgrade'ability of databases containing line values; but it seems unlikely that there are any (they'd have had to be compiled with -DENABLE_LINE_TYPE). Spotted by Andres Freund.
*	Fix case of pg_dump -Fc to an unseekable file (such as a pipe).	Tom Lane	2014-05-05
\| \| \| \| \| \| \| \| \| \| \|	This was accidentally broken in commits cfa1b4a711/5e8e794e3b. It saves a line or so to call ftello unconditionally in _CloseArchive, but we have to expect that it might fail if we're not in hasSeek mode. Per report from Bernd Helmle. In passing, improve _getFilePos to print an appropriate message if ftello fails unexpectedly, rather than just a vague complaint about "ftell mismatch".
*	Pass sensible value to memset() when randomizing reorderbuffer's tuple slab.	Heikki Linnakangas	2014-05-05
\| \| \| \| \| \|	This is entirely harmless, but still wrong. Noticed by coverity. Andres Freund
*	Don't leak memory after connection aborts in pg_recvlogical.	Heikki Linnakangas	2014-05-05
\| \| \| \|	Andres Freund, noticed by coverity.
*	Use Size instead of uint32 to store result of sizeof()	Heikki Linnakangas	2014-05-05
\| \| \| \| \| \| \|	Silences coverity and is more consistent with other functions in the same file. Andres Freund
*	Assert that pre/post-fix updated tuples are on the same page during replay.	Heikki Linnakangas	2014-05-05
\| \| \| \| \| \| \| \| \| \| \|	If they were not, 'oldtup.t_data' would be dereferenced while set to NULL in case of a full page image for block 0. Do so primarily to silence coverity; but also to make sure this prerequisite isn't changed without adapting the replay routine as that would appear to work in many cases. Andres Freund
*	Replace SYSTEMQUOTEs with Windows-specific wrapper functions.	Heikki Linnakangas	2014-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's easy to forget using SYSTEMQUOTEs when constructing command strings for system() or popen(). Even if we fix all the places missing it now, it is bound to be forgotten again in the future. Introduce wrapper functions that do the extra quoting for you, and get rid of SYSTEMQUOTEs in all the callers. We previously used SYSTEMQUOTEs in all the hard-coded command strings, and this doesn't change the behavior of those. But user-supplied commands, like archive_command, restore_command, COPY TO/FROM PROGRAM calls, as well as pgbench's \shell, will now gain an extra pair of quotes. That is desirable, but if you have existing scripts or config files that include an extra pair of quotes, those might need to be adjusted. Reviewed by Amit Kapila and Tom Lane
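A rough sketch of the wrapper idea, under assumed names: `quote_command` and `system_quoted` are hypothetical and are not the functions this commit adds. The sketch assumes the extra pair of quotes is wanted only on Windows, where the command interpreter can strip an outer pair of quotes from the command line, and that other platforms pass the command through unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'd copy of "cmd" wrapped in one extra pair of double
 * quotes, or NULL if memory could not be allocated. The caller frees it.
 */
char *quote_command(const char *cmd)
{
    size_t  len;
    char   *buf;

    assert(cmd != NULL);
    len = strlen(cmd);
    buf = malloc(len + 3);              /* two quotes plus terminating NUL */
    if (buf == NULL)
        return NULL;
    snprintf(buf, len + 3, "\"%s\"", cmd);
    return buf;
}

/*
 * Hypothetical wrapper around system(): callers pass the plain command and
 * the platform-specific quoting happens in one place.
 */
int system_quoted(const char *cmd)
{
#ifdef _WIN32
    char   *quoted = quote_command(cmd);
    int     rc;

    if (quoted == NULL)
        return -1;
    rc = system(quoted);
    free(quoted);
    return rc;
#else
    return system(cmd);
#endif
}
```

Centralizing the quoting this way is also why a user-supplied command that already carries its own outer quotes ends up double-quoted, which is the compatibility caveat noted in the commit message.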
*	Fix yet another corner case in dumping rules/views with USING clauses.	Tom Lane	2014-05-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ruleutils.c tries to cope with additions/deletions/renamings of columns in tables referenced by views, by means of adding machine-generated aliases to the printed form of a view when needed to preserve the original semantics. A recent blog post by Marko Tiikkaja pointed out a case I'd missed though: if one input of a join with USING is itself a join, there is nothing to stop the user from adding a column of the same name as the USING column to whichever side of the sub-join didn't provide the USING column. And then there'll be an error when the view is re-parsed, since now the sub-join exposes two columns matching the USING specification. We were catching a lot of related cases, but not this one, so add some logic to cope with it. Back-patch to 9.3, which is the first release that makes any serious attempt to cope with such cases (cf commit 2ffa740be and follow-ons).
*	Fix failure to detoast fields in composite elements of structured types.	Tom Lane	2014-05-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have an array of records stored on disk, the individual record fields cannot contain out-of-line TOAST pointers: the tuptoaster.c mechanisms are only prepared to deal with TOAST pointers appearing in top-level fields of a stored row. The same applies for ranges over composite types, nested composites, etc. However, the existing code only took care of expanding sub-field TOAST pointers for the case of nested composites, not for other structured types containing composites. For example, given a command such as UPDATE tab SET arraycol = ARRAY[ROW(x,42)::mycompositetype] ... where x is a direct reference to a field of an on-disk tuple, if that field is long enough to be toasted out-of-line then the TOAST pointer would be inserted as-is into the array column. If the source record for x is later deleted, the array field value would become a dangling pointer, leading to errors along the lines of "missing chunk number 0 for toast value ..." when the value is referenced. A reproducible test case for this was provided by Jan Pecek, but it seems likely that some of the "missing chunk number" reports we've heard in the past were caused by similar issues. Code-wise, the problem is that PG_DETOAST_DATUM() is not adequate to produce a self-contained Datum value if the Datum is of composite type. Seen in this light, the problem is not just confined to arrays and ranges, but could also affect some other places where detoasting is done in that way, for example form_index_tuple().
I tried teaching the array code to apply toast_flatten_tuple_attribute() along with PG_DETOAST_DATUM() when the array element type is composite, but this was messy and imposed extra cache lookup costs whether or not any TOAST pointers were present, indeed sometimes when the array element type isn't even composite (since sometimes it takes a typcache lookup to find that out). The idea of extending that approach to all the places that currently use PG_DETOAST_DATUM() wasn't attractive at all. This patch instead solves the problem by decreeing that composite Datum values must not contain any out-of-line TOAST pointers in the first place; that is, we expand out-of-line fields at the point of constructing a composite Datum, not at the point where we're about to insert it into a larger tuple. This rule is applied only to true composite Datums, not to tuples that are being passed around the system as tuples, so it's not as invasive as it might sound at first. With this approach, the amount of code that has to be touched for a full solution is greatly reduced, and added cache lookup costs are avoided except when there actually is a TOAST pointer that needs to be inlined. The main drawback of this approach is that we might sometimes dereference a TOAST pointer that will never actually be used by the query, imposing a rather large cost that wasn't there before. On the other side of the coin, if the field value is used multiple times then we'll come out ahead by avoiding repeat detoastings. Experimentation suggests that common SQL coding patterns are unaffected either way, though. Applications that are very negatively affected could be advised to modify their code to not fetch columns they won't be using. In future, we might consider reverting this solution in favor of detoasting only at the point where data is about to be stored to disk, using some method that can drill down into multiple levels of nested structured types. 
That will require defining new APIs for structured types, though, so it doesn't seem feasible as a back-patchable fix. Note that this patch changes HeapTupleGetDatum() from a macro to a function call; this means that any third-party code using that macro will not get protection against creating TOAST-pointer-containing Datums until it's recompiled. The same applies to any uses of PG_RETURN_HEAPTUPLEHEADER(). It seems likely that this is not a big problem in practice: most of the tuple-returning functions in core and contrib produce outputs that could not possibly be toasted anyway, and the same probably holds for third-party extensions. This bug has existed since TOAST was invented, so back-patch to all supported branches.
*	Improve error messages in reorderbuffer.c.	Tom Lane	2014-04-30
\| \| \| \| \| \| \| \|	Be more clear about failure cases in relfilenode->relation lookup, and fix some other places that were inconsistent or not per our message style guidelines. Andres Freund and Tom Lane