aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
...
* Widen amount-to-flush arguments of FileWriteback and callers.Tom Lane2016-04-13
| | | | | | It's silly to define these counts as narrower than they might someday need to be. Also, I believe that the BLCKSZ * nflush calculation in mdwriteback was capable of overflowing an int.
* Fix assorted portability issues with using msync() for data flushing.Tom Lane2016-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 428b1d6b29ca599c5700d4bc4f4ce4c5880369bf introduced the use of msync() for flushing dirty data from the kernel's file buffers. Several portability issues were overlooked, though: * Not all implementations of mmap() think that nbytes == 0 means "map the whole file". To fix, use lseek() to find out the true length. Fix callers of pg_flush_data to be aware that nbytes == 0 may result in trashing the file's seek position. * Not all implementations of mmap() will accept partial-page mmap requests. To fix, round down the length request to whatever sysconf() says the page size is. (I think this is OK from a portability standpoint, because sysconf() is required by SUS v2, and we aren't trying to compile this part on Windows anyway. Buildfarm should let us know if not.) * On 32-bit machines, the file size might exceed the available free address space, or even exceed what will fit in size_t. Check for the latter explicitly to avoid passing a false request size to mmap(). If mmap fails, silently fall through to the next implementation method, rather than bleating to the postmaster log and giving up. * mmap'ing directories fails on some platforms, and even if it works, msync'ing the directory is quite unlikely to help, as for that matter are the other flush implementations. In pre_sync_fname(), just skip flush attempts on directories. In passing, copy-edit the comments a bit. Stas Kelvich and myself
* Use PG_INT32_MIN instead of reiterating the constant.Robert Haas2016-04-13
| | | | | | Makes no difference, but it's cleaner this way. Michael Paquier
* Provide errno-translation wrappers around bind() and listen() on Windows.Tom Lane2016-04-12
| | | | | | | | | | | | I've seen one too many "could not bind IPv4 socket: No error" log entries from the Windows buildfarm members. Per previous discussion, this is likely caused by the fact that we're doing nothing to translate WSAGetLastError() to errno. Put in a wrapper layer to do that. If this works as expected, it should get back-patched, but let's see what happens in the buildfarm first. Discussion: <4065.1452450340@sss.pgh.pa.us>
* Fix costing for parallel aggregation.Robert Haas2016-04-12
| | | | | | | | The original patch kind of ignored the fact that we were doing something different from a costing point of view, but nobody noticed. This patch fixes that oversight. David Rowley
* Remove unused function GetOldestWALSendPointer from walsender code.Fujii Masao2016-04-13
| | | | | | | | | That unused function was introduced as a sample because synchronous replication or replication monitoring tools might need it in the future. Recently commit 989be08 added the function SyncRepGetOldestSyncRecPtr which provides almost the same functionality for multiple synchronous standbys feature. So it's time to remove that unused sample function. This commit does that.
* Redefine create_upper_paths_hook as being invoked once per upper relation.Tom Lane2016-04-12
| | | | | | Per discussion, this gives potential users of the hook more flexibility, because they can build custom Paths that implement only one stage of upper processing atop core-provided Paths for earlier stages.
* Improve coding of column-name parsing in psql's new crosstabview.c.Tom Lane2016-04-12
| | | | | | | | | | Coverity complained about this code, not without reason because it was rather messy. Adjust it to not scribble on the passed string; that adds one malloc/free cycle per column name, which is going to be insignificant in context. We can actually const-ify both the string argument and the PGresult. Daniel Verité, with some further cleanup by me
* Avoid extra locks in GetSnapshotData if old_snapshot_threshold < 0Kevin Grittner2016-04-12
| | | | | | | | | | | | On a big NUMA machine with 1000 connections in saturation load there was a performance regression due to spinlock contention, for acquiring values which were never used. Just fill with dummy values if we're not going to use them. This patch has not been benchmarked yet on a big NUMA machine, but it seems like a good idea on general principle, and it seemed to prevent an apparent 2.2% regression on a single-socket i7 box running 200 connections at saturation load.
* Improve API of GenericXLogRegister().Tom Lane2016-04-12
| | | | | | | | | | Rename this function to GenericXLogRegisterBuffer() to make it clearer what it does, and leave room for other sorts of "register" actions in future. Also, replace its "bool isNew" argument with an integer flags argument, so as to allow adding more flags in future without an API break. Alexander Korotkov, adjusted slightly by me
* In generic WAL application and replay, ensure page "hole" is always zero.Tom Lane2016-04-12
| | | | | | | | | | | The previous coding could allow the contents of the "hole" between pd_lower and pd_upper to diverge during replay from what it had been when the update was originally applied. This would pose a problem if checksums were in use, and in any case would complicate forensic comparisons between master and slave servers. So force the "hole" to contain zeroes, both at initial application of a generically-logged action, and at replay. Alexander Korotkov, adjusted slightly by me
* Remove unnecessary definition of _WIN64 in libpq/win32.mak.Tom Lane2016-04-12
| | | | | | | | | In commit b0e40d189325dc7a54d2546245e766f8c47a7c8d, I should have just removed the /D switch defining WIN64. The reason the code worked before is that all Windows64 compilers automatically predefine _WIN64. Perhaps at one time we had code that depended on WIN64 being defined, but it's long gone, and we should not encourage any reappearance. Per discussion with Christian Ullrich.
* Correct copyright for newly added genericdesc.cStephen Frost2016-04-12
| | | | | | It's 2016 these days (no, not entirely sure how we got here either). Pointed out by Amit Langote
* Fix whitespacePeter Eisentraut2016-04-11
|
* Fix _SPI_execute_plan() for CREATE TABLE IF NOT EXISTS foo AS ...Tom Lane2016-04-11
| | | | | | | | | When IF NOT EXISTS was added to CREATE TABLE AS, this logic didn't get the memo, possibly resulting in an Assert failure. It looks like there would have been no ill effects in a non-Assert build, though. Back-patch to 9.5 where the IF NOT EXISTS option was added. Stas Kelvich
* Fix two places that thought Windows64 is indicated by WIN64 macro.Tom Lane2016-04-11
| | | | | | | | | | | | | | Everyplace else thinks it's _WIN64, so make these places fall in line. The pg_regress.c usage is not going to result in any change in behavior, only suppressing (or not) a compiler warning about downcasting HANDLEs. So there seems no need for back-patching there. The libpq/win32.mak usage might represent an actual bug, if anyone were using this script to build for Windows64, which perhaps nobody is. Given the lack of field complaints, no back-patch here either. pg_regress.c problem found by Christian Ullrich, the other by me.
* Fix freshly-introduced PL/Python portability bug.Tom Lane2016-04-11
| | | | | | | | | | | | | | | It turns out that those PyErr_Clear() calls I removed from plpy_elog.c in 7e3bb080387f4143 et al were not quite as random as they appeared: they mask a Python 2.3.x bug. (Specifically, it turns out that PyType_Ready() can fail if the error indicator is set on entry, and PLy_traceback's fetch of frame.f_code may be the first operation in a session that requires the "frame" type to be readied. Ick.) Put back the clear call, but in a more centralized place closer to what it's protecting, and this time with a comment warning what it's really for. Per buildfarm member prairiedog. Although prairiedog was only failing on HEAD, it seems clearly possible for this to occur in older branches as well, so back-patch to 9.2 the same as the previous patch.
* Use static inline function for BufferGetPage()Kevin Grittner2016-04-11
| | | | | | | | | | | | I was initially concerned that the some of the hundreds of references to BufferGetPage() where the literal BGP_NO_SNAPSHOT_TEST were passed might not optimize as well as a macro, leading to some hard-to-find performance regressions in corner cases. Inspection of disassembled code has shown identical code at all inspected locations, and the size difference doesn't amount to even one byte per such call. So make it readable. Per gripes from Álvaro Herrera and Tom Lane
* Make oldSnapshotControl a pointer to a volatile structureKevin Grittner2016-04-11
| | | | | | | | | It was incorrectly declared as a volatile pointer to a non-volatile structure. Eliminate the OldSnapshotControl struct definition; it is really not needed. Pointed out by Tom Lane. While at it, add OldSnapshotControlData to pgindent's list of structures.
* Fix whitespacePeter Eisentraut2016-04-11
|
* Prefix RLS regression test roles with 'regress_'Stephen Frost2016-04-11
| | | | | | | | To avoid any possible overlap with existing roles on a system when doing a 'make installcheck', use role names which start with 'regress_'. Pointed out by Tom.
* Add directory created during build to gitignorePeter Eisentraut2016-04-11
|
* Fix missing "volatile" in PLy_output().Tom Lane2016-04-11
| | | | | | | | Commit 5c3c3cd0a3046339 plastered "volatile" on a bunch of variables in PLy_output(), but removed the one that actually mattered, ie the one on "oldcontext". This allows some versions of clang to generate code in which "oldcontext" has been trashed when control reaches the PG_CATCH block. Per buildfarm member tick.
* cpluspluscheck: Update include pathPeter Eisentraut2016-04-11
| | | | | Some things in src/include/fe_utils require libpq headers, so add libpq's include path to the command line used here.
* Use ereport(ERROR) instead of Assert() to emit syncrep_parser error.Fujii Masao2016-04-11
| | | | | | | | | | | The existing code would either Assert or generate an invalid SyncRepConfig variable, neither of which is desirable. A regular error should be thrown instead. This commit silences compiler warning in non assertion-enabled builds. Per report from Jeff Janes. Suggested fix by Tom Lane.
* Fix poorly thought-through code from commit 5c3c3cd0a3046339.Tom Lane2016-04-11
| | | | | | | | | | | | | It's not entirely clear to me whether PyString_AsString can return null (looks like the answer might vary between Python 2 and 3). But in any case, this code's attempt to cope with the possibility was quite broken, because pstrdup() neither allows a null argument nor ever returns a null. Moreover, the code below this point assumes that "message" is a palloc'd string, which would not be the case for a dgettext result. Fix both problems by doing the pstrdup step separately.
* pg_dump: add missing "destroyPQExpBuffer(query)" in dumpForeignServer().Tom Lane2016-04-11
| | | | | | | | Coverity complained about this resource leak (why now, I don't know, since it's been like that a long time). Our general policy in pg_dump is that PQExpBuffers are worth cleaning up, so do it here too. But don't bother with a back-patch, because it seems unlikely that very many databases contain enough FOREIGN SERVER objects to notice.
* Add comment about intentional fallthrough in switch.Tom Lane2016-04-10
| | | | | | | Coverity complained about an apparent missing "break" in a switch added by bb140506df605fab. The human-readable comments are pretty clear that this is intentional, but add a standard /* FALL THRU */ comment to make it clear to tools too.
* Clean up foreign-key caching code in planner.Tom Lane2016-04-10
| | | | | | | Coverity complained that the code added by 015e88942aa50f0d lacked an error check for SearchSysCache1 failures, which it should have. But the code was pretty duff in other ways too, including failure to think about whether it could really cope with arrays of different lengths.
* Fix access-to-already-freed-memory issue in plpython's error handling.Tom Lane2016-04-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PLy_elog() could attempt to access strings that Python had already freed, because the strings that PLy_get_spi_error_data() returns are simply pointers into storage associated with the error "val" PyObject. That's fine at the instant PLy_get_spi_error_data() returns them, but just after that PLy_traceback() intentionally releases the only refcount on that object, allowing it to be freed --- so that the strings we pass to ereport() are dangling pointers. In principle this could result in garbage output or a coredump. In practice, I think the risk is pretty low, because there are no Python operations between where we decrement that refcount and where we use the strings (and copy them into PG storage), and thus no reason for Python to recycle the storage. Still, it's clearly hazardous, and it leads to Valgrind complaints when running under a Valgrind that hasn't been lobotomized to ignore Python memory allocations. The code was a mess anyway: we fetched the error data out of Python (clearing Python's error indicator) with PyErr_Fetch, examined it, pushed it back into Python with PyErr_Restore (re-setting the error indicator), then immediately pulled it back out with another PyErr_Fetch. Just to confuse matters even more, there were some gratuitous-and-yet-hazardous PyErr_Clear calls in the "examine" step, and we didn't get around to doing PyErr_NormalizeException until after the second PyErr_Fetch, making it even less clear which object was being manipulated where and whether we still had a refcount on it. (If PyErr_NormalizeException did substitute a different "val" object, it's possible that the problem could manifest for real, because then we'd be doing assorted Python stuff with no refcount on the object we have string pointers into.) So, rearrange all that into some semblance of sanity, and don't decrement the refcount on the Python error objects until the end of PLy_elog(). In HEAD, I failed to resist the temptation to reformat some messy bits from 5c3c3cd0a3046339 along the way. Back-patch as far as 9.2, because the code is substantially the same that far back. I believe that 9.1 has the bug as well; but the code around it is rather different and I don't want to take a chance on breaking something for what seems a low-probability problem.
* Avoid the use of a separate spinlock to protect a LWLock's wait queue.Andres Freund2016-04-10
| | | | | | | | | | | | | | | | Previously we used a spinlock, in adition to the atomically manipulated ->state field, to protect the wait queue. But it's pretty simple to instead perform the locking using a flag in state. Due to 6150a1b0 BufferDescs, on platforms (like PPC) with > 1 byte spinlocks, increased their size above 64byte. As 64 bytes are the size we pad allocated BufferDescs to, this can increase false sharing; causing performance problems in turn. Together with the previous commit this reduces the size to <= 64 bytes on all common platforms. Author: Andres Freund Discussion: CAA4eK1+ZeB8PMwwktf+3bRS0Pt4Ux6Rs6Aom0uip8c6shJWmyg@mail.gmail.com 20160327121858.zrmrjegmji2ymnvr@alap3.anarazel.de
* Allow Pin/UnpinBuffer to operate in a lockfree manner.Andres Freund2016-04-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pinning/Unpinning a buffer is a very frequent operation; especially in read-mostly cache resident workloads. Benchmarking shows that in various scenarios the spinlock protecting a buffer header's state becomes a significant bottleneck. The problem can be reproduced with pgbench -S on larger machines, but can be considerably worse for queries which touch the same buffers over and over at a high frequency (e.g. nested loops over a small inner table). To allow atomic operations to be used, cram BufferDesc's flags, usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable; that allows to manipulate them together using 32bit compare-and-swap operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could be lifted by using a 64bit field, but it's not a realistic configuration atm). As not all operations can easily implemented in a lockfree manner, implement the previous buf_hdr_lock via a flag bit in the atomic variable. That way we can continue to lock the header in places where it's needed, but can get away without acquiring it in the more frequent hot-paths. There's some additional operations which can be done without the lock, but aren't in this patch; but the most important places are covered. As bufmgr.c now essentially re-implements spinlocks, abstract the delay logic from s_lock.c into something more generic. It now has already two users, and more are coming up; there's a follupw patch for lwlock.c at least. This patch is based on a proof-of-concept written by me, which Alexander Korotkov made into a fully working patch; the committed version is again revised by me. Benchmarking and testing has, amongst others, been provided by Dilip Kumar, Alexander Korotkov, Robert Haas. On a large x86 system improvements for readonly pgbench, with a high client count, of a factor of 8 have been observed. Author: Alexander Korotkov and Andres Freund Discussion: 2400449.GjM57CE0Yg@dinodell
* Fix possible NULL dereference in ExecAlterObjectDependsStmtAlvaro Herrera2016-04-10
| | | | | | | | | | I used the wrong variable here. Doesn't make a difference today because the only plausible caller passes a non-NULL variable, but someday it will be wrong, and even today's correctness is subtle: the caller that does pass a NULL is never invoked because of object type constraints. Surely not a condition to rely on. Noted by Coverity
* Further minor improvement in generic_xlog.c: always say REGBUF_STANDARD.Tom Lane2016-04-10
| | | | | | | | | | | Since we're requiring pages handled by generic_xlog.c to be standard format, specify REGBUF_STANDARD when doing a full-page image, so that xloginsert.c can compress out the "hole" between pd_lower and pd_upper. Given the current API in which this path will be taken only for a newly initialized page, the hole is likely to be particularly large in such cases, so that this oversight could easily be performance-significant. I don't notice any particular change in the runtime of contrib/bloom's regression test, though.
* Micro-optimize GenericXLogFinish().Tom Lane2016-04-09
| | | | | | | | | | | | | | | | | | | Make the inner comparison loops of computeDelta() as tight as possible by pulling considerations of valid and invalid ranges out of the inner loops, and extending a match or non-match detection as far as possible before deciding what to do next. To keep this tractable, give up the possibility of merging fragments across the pd_lower to pd_upper gap. The fraction of pages where that could happen (ie, there are 4 or fewer bytes in the gap, *and* data changes immediately adjacent to it on both sides) is too small to be worth spending cycles on. Also, avoid two BLCKSZ-length memcpy()s by computing the delta before moving data into the target buffer, instead of after. This doesn't save nearly as many cycles as being tenser about computeDelta(), but it still seems worth doing. On my machine, this patch cuts a full 40% off the runtime of contrib/bloom's regression test.
* Fix PL/Python ereport() test to work on Python 2.3.Tom Lane2016-04-09
| | | | | | Per buildfarm. Pavel Stehule
* Get rid of GenericXLogUnregister().Tom Lane2016-04-09
| | | | | | | | | | | This routine is unsafe as implemented, because it invalidates the page image pointers returned by previous GenericXLogRegister() calls. Rather than complicate the API or the implementation to avoid that, let's just get rid of it; the use-case for having it seems much too thin to justify a lot of work here. While at it, do some wordsmithing on the SGML docs for generic WAL.
* Code review/prettification for generic_xlog.c.Tom Lane2016-04-09
| | | | | | | | | Improve commentary, use more specific names for the delta fields, const-ify pointer arguments where possible, avoid assuming that initializing only the first element of a local array will guarantee that the remaining elements end up as we need them. (I think that code in generic_redo actually worked, but only because InvalidBuffer is zero; this is a particularly ugly way of depending on that ...)
* Run pgindent on generic_xlog.c.Tom Lane2016-04-09
| | | | | This code desperately needs some micro-optimization, and I'd like it to be formatted a bit more nicely while I work on it.
* Fix typo in C comment.Kevin Grittner2016-04-09
|
* Turn special page pointer validation to static inline functionKevin Grittner2016-04-09
| | | | | Inclusion of multiple macros inside another macro was pushing MSVC past its size liimit. Reported by buildfarm.
* Move \crosstabview regression tests to a separate fileAlvaro Herrera2016-04-08
| | | | | | | It cannot run in the same parallel group as misc, because it creates a table which is unpredictably visible in that test. Per buildfarm member crake.
* Support \crosstabview in psqlAlvaro Herrera2016-04-08
| | | | | | | | | | | | | | | | | | | \crosstabview is a completely different way to display results from a query: instead of a vertical display of rows, the data values are placed in a grid where the column and row headers come from the data itself, similar to a spreadsheet. The sort order of the horizontal header can be specified by using another column in the query, and the vertical header determines its ordering from the order in which they appear in the query. This only allows displaying a single value in each cell. If more than one value correspond to the same cell, an error is thrown. Merging of values can be done in the query itself, if necessary. This may be revisited in the future. Author: Daniel Verité Reviewed-by: Pavel Stehule, Dean Rasheed
* Add snapshot_too_old to NSVC @contrib_excludesKevin Grittner2016-04-08
| | | | | | | | | The buildfarm showed failure for Windows MSVC builds due to this omission. This might not be the only problem with the Makefile for this feature, but hopefully this will get it past the immediate problem. Fix suggested by Tom Lane
* Expose more out/readfuncs support functions.Andres Freund2016-04-08
| | | | | | | | | | | | | | | | Previously bcac23d exposed a subset of support functions, namely the ones Kaigai found useful. In 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de I mentioned that there's some functions missing to use the facility in an external project. To avoid having to add functions piecemeal, add all the functions which are used to define READ_* and WRITE_* macros; users of the extensible node functionality are likely to need these. Additionally expose outDatum(), which doesn't have it's own WRITE_ macro, as it needs information from the embedding struct. Discussion: 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de
* Create default rolesStephen Frost2016-04-08
| | | | | | | | | | | | | This creates an initial set of default roles which administrators may use to grant access to, historically, superuser-only functions. Using these roles instead of granting superuser access reduces the number of superuser roles required for a system. Documention for each of the default roles has been added to user-manag.sgml. Bump catversion to 201604082, as we had a commit that bumped it to 201604081 and another that set it back to 201604071... Reviews by José Luis Tallón and Robert Haas
* Reserve the "pg_" namespace for rolesStephen Frost2016-04-08
| | | | | | | | | This will prevent users from creating roles which begin with "pg_" and will check for those roles before allowing an upgrade using pg_upgrade. This will allow for default roles to be provided at initdb time. Reviews by José Luis Tallón and Robert Haas
* Fix improper usage of 'dump' bitmapStephen Frost2016-04-08
| | | | | | Now that 'dump' is a bitmap, we can't simply set it to 'true'. Noticed while debugging the prior issue.
* Add the "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | | | | | | This feature is controlled by a new old_snapshot_threshold GUC. A value of -1 disables the feature, and that is the default. The value of 0 is just intended for testing. Above that it is the number of minutes a snapshot can reach before pruning and vacuum are allowed to remove dead tuples which the snapshot would otherwise protect. The xmin associated with a transaction ID does still protect dead tuples. A connection which is using an "old" snapshot does not get an error unless it accesses a page modified recently enough that it might not be able to produce accurate results. This is similar to the Oracle feature, and we use the same SQLSTATE and error message for compatibility.
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.