aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
Commit message (Collapse)AuthorAge
* Avoid testing tuple visibility without buffer lock in RI_FKey_check().Tom Lane2016-10-23
| | | | | | | | | | | | | | | | Despite the argumentation I wrote in commit 7a2fe85b0, it's unsafe to do this, because in corner cases it's possible for HeapTupleSatisfiesSelf to try to set hint bits on the target tuple; and at least since 8.2 we have required the buffer content lock to be held while setting hint bits. The added regression test exercises one such corner case. Unpatched, it causes an assertion failure in assert-enabled builds, or otherwise would cause a hint bit change in a buffer we don't hold lock on, which given the right race condition could result in checksum failures or other data consistency problems. The odds of a problem in the field are probably pretty small, but nonetheless back-patch to all supported branches. Report: <19391.1477244876@sss.pgh.pa.us>
* Suppress "Factory" zone in pg_timezone_names view for tzdata >= 2016g.Tom Lane2016-10-19
| | | | | | IANA got rid of the really silly "abbreviation" and replaced it with one that's only moderately silly. But it's still pointless, so keep on not showing it.
* Fix cidin() to handle values above 2^31 platform-independently.Tom Lane2016-10-18
| | | | | | | | | | | | | | | | | | | | | | CommandId is declared as uint32, and values up to 4G are indeed legal. cidout() handles them properly by treating the value as unsigned int. But cidin() was just using atoi(), which has platform-dependent behavior for values outside the range of signed int, as reported by Bart Lengkeek in bug #14379. Use strtoul() instead, as xidin() does. In passing, make some purely cosmetic changes to make xidin/xidout look more like cidin/cidout; the former didn't have a monopoly on best practice IMO. Neither xidin nor cidin make any attempt to throw error for invalid input. I didn't change that here, and am not sure it's worth worrying about since neither is really a user-facing type. The point is just to ensure that indubitably-valid inputs work as expected. It's been like this for a long time, so back-patch to all supported branches. Report: <20161018152550.1413.6439@wrigleys.postgresql.org>
* Fix assorted integer-overflow hazards in varbit.c.Tom Lane2016-10-14
| | | | | | | | | | | | | | | | | bitshiftright() and bitshiftleft() would recursively call each other infinitely if the user passed INT_MIN for the shift amount, due to integer overflow in negating the shift amount. To fix, clamp to -VARBITMAXLEN. That doesn't change the results since any shift distance larger than the input bit string's length produces an all-zeroes result. Also fix some places that seemed inadequately paranoid about input typmods exceeding VARBITMAXLEN. While a typmod accepted by anybit_typmodin() will certainly be much less than that, at least some of these spots are reachable with user-chosen integer values. Andreas Seltenreich and Tom Lane Discussion: <87d1j2zqtz.fsf@credativ.de>
* Fix miserable coding in pg_stat_get_activity().Tom Lane2016-09-10
| | | | | | | | | | | | | | | | | | | | | | | | Commit dd1a3bccc replaced a test on whether a subroutine returned a null pointer with a test on whether &pointer->backendStatus was null. This accidentally failed to fail, at least on common compilers, because backendStatus is the first field in the struct; but it was surely trouble waiting to happen. Commit f91feba87 then messed things up further, changing the logic to local_beentry = pgstat_fetch_stat_local_beentry(curr_backend); if (!local_beentry) continue; beentry = &local_beentry->backendStatus; if (!beentry) { where the second "if" is now dead code, so that the intended behavior of printing a row with "<backend information not available>" cannot occur. I suspect this is all moot because pgstat_fetch_stat_local_beentry will never actually return null in this function's usage, but it's still very poor coding. Repair back to 9.4 where the original problem was introduced.
* Don't require dynamic timezone abbreviations to match underlying time zone.Tom Lane2016-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we threw an error if a dynamic timezone abbreviation did not match any abbreviation recorded in the referenced IANA time zone entry. That seemed like a good consistency check at the time, but it turns out that a number of the abbreviations in the IANA database are things that Olson and crew made up out of whole cloth. Their current policy is to remove such names in favor of using simple numeric offsets. Perhaps unsurprisingly, a lot of these made-up abbreviations have varied in meaning over time, which meant that our commit b2cbced9e and later changes made them into dynamic abbreviations. So with newer IANA database versions that don't mention these abbreviations at all, we fail, as reported in bug #14307 from Neil Anderson. It's worse than just a few unused-in-the-wild abbreviations not working, because the pg_timezone_abbrevs view stops working altogether (since its underlying function tries to compute the whole view result in one call). We considered deleting these abbreviations from our abbreviations list, but the problem with that is that we can't stay ahead of possible future IANA changes. Instead, let's leave the abbreviations list alone, and treat any "orphaned" dynamic abbreviation as just meaning the referenced time zone. It will behave a bit differently than it used to, in that you can't any longer override the zone's standard vs. daylight rule by using the "wrong" abbreviation of a pair, but that's better than failing entirely. (Also, this solution can be interpreted as adding a small new feature, which is that any abbreviation a user wants can be defined as referencing a time zone name.) Back-patch to all supported branches, since this problem affects all of them when using tzdata 2016f or newer. Report: <20160902031551.15674.67337@wrigleys.postgresql.org> Discussion: <6189.1472820913@sss.pgh.pa.us>
* Remove bogus dependencies on NUMERIC_MAX_PRECISION.Tom Lane2016-08-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NUMERIC_MAX_PRECISION is a purely arbitrary constraint on the precision and scale you can write in a numeric typmod. It might once have had something to do with the allowed range of a typmod-less numeric value, but at least since 9.1 we've allowed, and documented that we allowed, any value that would physically fit in the numeric storage format; which is something over 100000 decimal digits, not 1000. Hence, get rid of numeric_in()'s use of NUMERIC_MAX_PRECISION as a limit on the allowed range of the exponent in scientific-format input. That was especially silly in view of the fact that you can enter larger numbers as long as you don't use 'e' to do it. Just constrain the value enough to avoid localized overflow, and let make_result be the final arbiter of what is too large. Likewise adjust ecpg's equivalent of this code. Also get rid of numeric_recv()'s use of NUMERIC_MAX_PRECISION to limit the number of base-NBASE digits it would accept. That created a dump/restore hazard for binary COPY without doing anything useful; the wire-format limit on number of digits (65535) is about as tight as we would want. In HEAD, also get rid of pg_size_bytes()'s unnecessary intimacy with what the numeric range limit is. That code doesn't exist in the back branches. Per gripe from Aravind Kumar. Back-patch to all supported branches, since they all contain the documentation claim about allowed range of NUMERIC (cf commit cabf5d84b). Discussion: <2895.1471195721@sss.pgh.pa.us>
* Fix several one-byte buffer over-reads in to_numberPeter Eisentraut2016-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | Several places in NUM_numpart_from_char(), which is called from the SQL function to_number(text, text), could accidentally read one byte past the end of the input buffer (which comes from the input text datum and is not null-terminated). 1. One leading space character would be skipped, but there was no check that the input was at least one byte long. This does not happen in practice, but for defensiveness, add a check anyway. 2. Commit 4a3a1e2cf apparently accidentally doubled that code that skips one space character (so that two spaces might be skipped), but there was no overflow check before skipping the second byte. Fix by removing that duplicate code. 3. A logic error would allow a one-byte over-read when looking for a trailing sign (S) placeholder. In each case, the extra byte cannot be read out directly, but looking at it might cause a crash. The third item was discovered by Piotr Stefaniak, the first two were found and analyzed by Tom Lane and Peter Eisentraut.
* Fix misestimation of n_distinct for a nearly-unique column with many nulls.Tom Lane2016-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.) In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply. Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely. Patch by me, with documentation wording suggested by Dean Rasheed Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena> Discussion: <16143.1470350371@sss.pgh.pa.us>
* Fix assorted fallout from IS [NOT] NULL patch.Tom Lane2016-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commits 4452000f3 et al established semantics for NullTest.argisrow that are a bit different from its initial conception: rather than being merely a cache of whether we've determined the input to have composite type, the flag now has the further meaning that we should apply field-by-field testing as per the standard's definition of IS [NOT] NULL. If argisrow is false and yet the input has composite type, the construct instead has the semantics of IS [NOT] DISTINCT FROM NULL. Update the comments in primnodes.h to clarify this, and fix ruleutils.c and deparse.c to print such cases correctly. In the case of ruleutils.c, this merely results in cosmetic changes in EXPLAIN output, since the case can't currently arise in stored rules. However, it represents a live bug for deparse.c, which would formerly have sent a remote query that had semantics different from the local behavior. (From the user's standpoint, this means that testing a remote nested-composite column for null-ness could have had unexpected recursive behavior much like that fixed in 4452000f3.) In a related but somewhat independent fix, make plancat.c set argisrow to false in all NullTest expressions constructed to represent "attnotnull" constructs. Since attnotnull is actually enforced as a simple null-value check, this is a more accurate representation of the semantics; we were previously overpromising what it meant for composite columns, which might possibly lead to incorrect planner optimizations. (It seems that what the SQL spec expects a NOT NULL constraint to mean is an IS NOT NULL test, so arguably we are violating the spec and should fix attnotnull to do the other thing. If we ever do, this part should get reverted.) Back-patch, same as the previous commit. Discussion: <10682.1469566308@sss.pgh.pa.us>
* Fix crash in close_ps() for NaN input coordinates.Tom Lane2016-07-16
| | | | | | | | | | The Assert() here seems unreasonably optimistic. Andreas Seltenreich found that it could fail with NaNs in the input geometries, and it seems likely to me that it might fail in corner cases due to roundoff error, even for ordinary input values. As a band-aid, make the function return SQL NULL instead of crashing. Report: <87d1md1xji.fsf@credativ.de>
* Fix GiST index build for NaN values in geometric types.Tom Lane2016-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GiST index build could go into an infinite loop when presented with boxes (or points, circles or polygons) containing NaN component values. This happened essentially because the code assumed that x == x is true for any "double" value x; but it's not true for NaNs. The looping behavior was not the only problem though: we also attempted to sort the items using simple double comparisons. Since NaNs violate the trichotomy law, qsort could (in principle at least) get arbitrarily confused and mess up the sorting of ordinary values as well as NaNs. And we based splitting choices on box size calculations that could produce NaNs, again resulting in undesirable behavior. To fix, replace all comparisons of doubles in this logic with float8_cmp_internal, which is NaN-aware and is careful to sort NaNs consistently, higher than any non-NaN. Also rearrange the box size calculation to not produce NaNs; instead it should produce an infinity for a box with NaN on one side and not-NaN on the other. I don't by any means claim that this solves all problems with NaNs in geometric values, but it should at least make GiST index insertion work reliably with such data. It's likely that the index search side of things still needs some work, and probably regular geometric operations too. But with this patch we're laying down a convention for how such cases ought to behave. Per bug #14238 from Guang-Dih Lei. Back-patch to 9.2; the code used before commit 7f3bd86843e5aad8 is quite different and doesn't lock up on my simple test case, nor on the submitter's dataset. Report: <20160708151747.1426.60150@wrigleys.postgresql.org> Discussion: <28685.1468246504@sss.pgh.pa.us>
* Be more paranoid in ruleutils.c's get_variable().Tom Lane2016-07-01
| | | | | | | | | | | | | | | | | | | We were merely Assert'ing that the Var matched the RTE it's supposedly from. But if the user passes incorrect information to pg_get_expr(), the RTE might in fact not match; this led either to Assert failures or core dumps, as reported by Chris Hanks in bug #14220. To fix, just convert the Asserts to test-and-elog. Adjust an existing test-and-elog elsewhere in the same function to be consistent in wording. (If we really felt these were user-facing errors, we might promote them to ereport's; but I can't convince myself that they're worth translating.) Back-patch to 9.3; the problematic code doesn't exist before that, and a quick check says that 9.2 doesn't crash on such cases. Michael Paquier and Thomas Munro Report: <20160629224349.1407.32667@wrigleys.postgresql.org>
* Fix validation of overly-long IPv6 addresses.Tom Lane2016-06-16
| | | | | | | | | The inet/cidr types sometimes failed to reject IPv6 inputs with too many colon-separated fields, instead translating them to '::/0'. This is the result of a thinko in the original ISC code that seems to be as yet unreported elsewhere. Per bug #14198 from Stefan Kaltenbrunner. Report: <20160616182222.5798.959@wrigleys.postgresql.org>
* Fix assorted missing infrastructure for ON CONFLICT.Tom Lane2016-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | subquery_planner() failed to apply expression preprocessing to the arbiterElems and arbiterWhere fields of an OnConflictExpr. No doubt the theory was that this wasn't necessary because we don't actually try to execute those expressions; but that's wrong, because it results in failure to match to index expressions or index predicates that are changed at all by preprocessing. Per bug #14132 from Reynold Smith. Also add pullup_replace_vars processing for onConflictWhere. Perhaps it's impossible to have a subquery reference there, but I'm not exactly convinced; and even if true today it's a failure waiting to happen. Also add some comments to other places where one or another field of OnConflictExpr is intentionally ignored, with explanation as to why it's okay to do so. Also, catalog/dependency.c failed to record any dependency on the named constraint in ON CONFLICT ON CONSTRAINT, allowing such a constraint to be dropped while rules exist that depend on it, and allowing pg_dump to dump such a rule before the constraint it refers to. The normal execution path managed to error out reasonably for a dangling constraint reference, but ruleutils.c dumped core; so in addition to fixing the omission, add a protective check in ruleutils.c, since we can't retroactively add a dependency in existing databases. Back-patch to 9.5 where this code was introduced. Report: <20160510190350.2608.48667@wrigleys.postgresql.org>
* Fix possible read past end of string in to_timestamp().Tom Lane2016-05-06
| | | | | | | | | | | | | | | | | | | | | | | to_timestamp() handles the TH/th format codes by advancing over two input characters, whatever those are. It failed to notice whether there were two characters available to be skipped, making it possible to advance the pointer past the end of the input string and keep on parsing. A similar risk existed in the handling of "Y,YYY" format: it would advance over three characters after the "," whether or not three characters were available. In principle this might be exploitable to disclose contents of server memory. But the security team concluded that it would be very hard to use that way, because the parsing loop would stop upon hitting any zero byte, and TH/th format codes can't be consecutive --- they have to follow some other format code, which would have to match whatever data is there. So it seems impractical to examine memory very much beyond the end of the input string via this bug; and the input string will always be in local memory not in disk buffers, making it unlikely that anything very interesting is close to it in a predictable way. So this doesn't quite rise to the level of needing a CVE. Thanks to Wolf Roediger for reporting this bug.
* Rename strtoi() to strtoint().Tom Lane2016-04-23
| | | | | | | | | | | | NetBSD has seen fit to invent a libc function named strtoi(), which conflicts with the long-established static functions of the same name in datetime.c and ecpg's interval.c. While muttering darkly about intrusions on application namespace, we'll rename our functions to avoid the conflict. Back-patch to all supported branches, since this would affect attempts to build any of them on recent NetBSD. Thomas Munro
* Fix ruleutils.c's dumping of ScalarArrayOpExpr containing an EXPR_SUBLINK.Tom Lane2016-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | When we shoehorned "x op ANY (array)" into the SQL syntax, we created a fundamental ambiguity as to the proper treatment of a sub-SELECT on the righthand side: perhaps what's meant is to compare x against each row of the sub-SELECT's result, or perhaps the sub-SELECT is meant as a scalar sub-SELECT that delivers a single array value whose members should be compared against x. The grammar resolves it as the former case whenever the RHS is a select_with_parens, making the latter case hard to reach --- but you can get at it, with tricks such as attaching a no-op cast to the sub-SELECT. Parse analysis would throw away the no-op cast, leaving a parsetree with an EXPR_SUBLINK SubLink directly under a ScalarArrayOpExpr. ruleutils.c was not clued in on this fine point, and would naively emit "x op ANY ((SELECT ...))", which would be parsed as the first alternative, typically leading to errors like "operator does not exist: text = text[]" during dump/reload of a view or rule containing such a construct. To fix, emit a no-op cast when dumping such a parsetree. This might well be exactly what the user wrote to get the construct accepted in the first place; and even if she got there with some other dodge, it is a valid representation of the parsetree. Per report from Karl Czajkowski. He mentioned only a case involving RLS policies, but actually the problem is very old, so back-patch to all supported branches. Report: <20160421001832.GB7976@moraine.isi.edu>
* Disable abbreviated keys for string-sorting in non-C locales.Robert Haas2016-03-23
| | | | | | | | | | | | | | | | | | | Unfortunately, every version of glibc thus far tested has bugs whereby strcoll() ordering does not match strxfrm() ordering as required by the standard. This can result in, for example, corrupted indexes. Disabling abbreviated keys in these cases slows down non-C-collation string sorting considerably, but there seems to be no practical alternative. Users who are confident that their libc implementations are solid in this regard can re-enable the optimization by compiling with TRUST_STRXFRM. Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1 should REINDEX if there is a possibility that they may have been affected by this problem. Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch by me, reviewed by Peter Geoghegan and Tom Lane.
* Code review for error reports in jsonb_set().Tom Lane2016-03-23
| | | | | | User-facing (even tested by regression tests) error conditions were thrown with elog(), hence had wrong SQLSTATE and were untranslatable. And the error message texts weren't up to project style, either.
* Fix unsafe use of strtol() on a non-null-terminated Text datum.Tom Lane2016-03-23
| | | | | | | | | | jsonb_set() could produce wrong answers or incorrect error reports, or in the worst case even crash, when trying to convert a path-array element into an integer for use as an array subscript. Per report from Vitaly Burovoy. Back-patch to 9.5 where the faulty code was introduced (in commit c6947010ceb42143). Michael Paquier
* Fix assorted breakage in to_char()'s OF format option.Tom Lane2016-03-17
| | | | | | | | | | | | | | | | | | In HEAD, fix incorrect field width for hours part of OF when tm_gmtoff is negative. This was introduced by commit 2d87eedc1d4468d3 as a result of falsely applying a pattern that's correct when + signs are omitted, which is not the case for OF. In 9.4, fix missing abs() call that allowed a sign to be attached to the minutes part of OF. This was fixed in 9.5 by 9b43d73b3f9bef27, but for inscrutable reasons not back-patched. In all three versions, ensure that the sign of tm_gmtoff is correctly reported even when the GMT offset is less than 1 hour. Add regression tests, which evidently we desperately need here. Thomas Munro and Tom Lane, per report from David Fetter
* Fix json_to_record() bug with nested objects.Tom Lane2016-03-02
| | | | | | | | | | | | | | | A thinko concerning nesting depth caused json_to_record() to produce bogus output if a field of its input object contained a sub-object with a field name matching one of the requested output column names. Per bug #13996 from Johann Visagie. I added a regression test case based on his example, plus parallel tests for json_to_recordset, jsonb_to_record, jsonb_to_recordset. The latter three do not exhibit the same bug (which suggests that we may be missing some opportunities to share code...) but testing seems like a good idea in any case. Back-patch to 9.4 where these functions were introduced.
* Avoid multiple free_struct_lconv() calls on same data.Tom Lane2016-02-28
| | | | | | | | | | | A failure partway through PGLC_localeconv() led to a situation where the next call would call free_struct_lconv() a second time, leading to free() on already-freed strings, typically leading to a core dump. Add a flag to remember whether we need to do that. Per report from Thom Brown. His example case only provokes the failure as far back as 9.4, but nonetheless this code is obviously broken, so back-patch to all supported branches.
* Fix two-argument jsonb_object when called with empty arraysAndrew Dunstan2016-02-21
| | | | | | | | | | | | | | Some over-eager copy-and-pasting on my part resulted in a nonsense result being returned in this case. I have adopted the same pattern for handling this case as is used in the one argument form of the function, i.e. we just skip over the code that adds values to the object. Diagnosis and patch from Michael Paquier, although not quite his solution. Fixes bug #13936. Backpatch to 9.5 where jsonb_object was introduced.
* Fix deparsing of ON CONFLICT arbiter WHERE clauses.Tom Lane2016-02-07
| | | | | | | | | | | The parser doesn't allow qualification of column names appearing in these clauses, but ruleutils.c would sometimes qualify them, leading to dump/reload failures. Per bug #13891 from Onder Kalaci. (In passing, make stanzas in ruleutils.c that save/restore varprefix more consistent.) Peter Geoghegan
* Fix IsValidJsonNumber() to notice trailing non-alphanumeric garbage.Tom Lane2016-02-03
| | | | | | | | | | | Commit e09996ff8dee3f70 was one brick shy of a load: it didn't insist that the detected JSON number be the whole of the supplied string. This allowed inputs such as "2016-01-01" to be misdetected as valid JSON numbers. Per bug #13906 from Dmitry Ryabov. In passing, be more wary of zero-length input (I'm not sure this can happen given current callers, but better safe than sorry), and do some minor cosmetic cleanup.
* Remove new coupling between NAMEDATALEN and MAX_LEVENSHTEIN_STRLEN.Tom Lane2016-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e529cd4ffa605c6f introduced an Assert requiring NAMEDATALEN to be less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time. Since up to that instant we had always allowed NAMEDATALEN to be substantially more than that, this was ill-advised. It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be so tight; but this patch takes the narrower approach of just not applying the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser. Trusting the parser for this seems reasonable, first because the strings are limited to NAMEDATALEN which is unlikely to be hugely more than 256, and second because the maximum distance is tightly constrained by MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one place). That means the cost is not really O(mn) but more like O(max(m,n)). Relaxing the limit for user-supplied calls is left for future research; given the lack of complaints to date, it doesn't seem very high priority. In passing, fix confusion between lengths-in-bytes and lengths-in-chars in comments and error messages. Per gripe from Kevin Day; solution suggested by Robert Haas. Back-patch to 9.5 where the unwanted restriction was introduced.
* Remove a useless PG_GETARG_DATUM() call from jsonb_build_array.Tom Lane2016-01-09
| | | | | | | | | | | | | | | | This loop uselessly fetched the argument after the one it's currently looking at. No real harm is done since we couldn't possibly fetch off the end of memory, but it's confusing to the reader. Also remove a duplicate (and therefore confusing) PG_ARGISNULL check in jsonb_build_object. I happened to notice these things while trolling for missed null-arg checks earlier today. Back-patch to 9.5, not because there is any real bug, but just because 9.5 and HEAD are still in sync in this file and we might as well keep them so. In passing, re-pgindent.
* Fix regrole and regnamespace output functions to do quoting, too.Tom Lane2016-01-04
| | | | We discussed this but somehow failed to implement it...
* Fix regrole and regnamespace types to honor quoting like other reg* types.Tom Lane2016-01-04
| | | | | | | | | | | | | | | Aside from any consistency arguments, this is logically necessary because the I/O functions for these types also handle numeric OID values. Without a quoting rule it is impossible to distinguish numeric OIDs from role or namespace names that happen to contain only digits. Also change the to_regrole and to_regnamespace functions to dequote their arguments. While not logically essential, this seems like a good idea since the other to_reg* functions do it. Anyone who really wants raw lookup of an uninterpreted name can fall back on the time-honored solution of (SELECT oid FROM pg_namespace WHERE nspname = whatever). Report and patch by Jim Nasby, reviewed by Michael Paquier
* Guard against null arguments in binary_upgrade_create_empty_extension().Tom Lane2016-01-03
| | | | | | | | | | | | | The CHECK_IS_BINARY_UPGRADE macro is not sufficient security protection if we're going to dereference pass-by-reference arguments before it. But in any case we really need to explicitly check PG_ARGISNULL for all the arguments of a non-strict function, not only the ones we expect null values for. Oversight in commits 30982be4e5019684e1772dd9170aaa53f5a8e894 and f92fc4c95ddcc25978354a8248d3df22269201bc. Found by Andreas Seltenreich. (The other usages in pg_upgrade_support.c seem safe.)
* Teach flatten_reloptions() to quote option values safely.Tom Lane2016-01-01
| | | | | | | | | | | | | | | | | | | | | | | | | | flatten_reloptions() supposed that it didn't really need to do anything beyond inserting commas between reloption array elements. However, in principle the value of a reloption could be nearly anything, since the grammar allows a quoted string there. Any restrictions on it would come from validity checking appropriate to the particular option, if any. A reloption value that isn't a simple identifier or number could thus lead to dump/reload failures due to syntax errors in CREATE statements issued by pg_dump. We've gotten away with not worrying about this so far with the core-supported reloptions, but extensions might allow reloption values that cause trouble, as in bug #13840 from Kouhei Sutou. To fix, split the reloption array elements explicitly, and then convert any value that doesn't look like a safe identifier to a string literal. (The details of the quoting rule could be debated, but this way is safe and requires little code.) While we're at it, also quote reloption names if they're not safe identifiers; that may not be a likely problem in the field, but we might as well try to be bulletproof here. It's been like this for a long time, so back-patch to all supported branches. Kouhei Sutou, adjusted some by me
* Add some more defenses against silly estimates to gincostestimate().Tom Lane2016-01-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A report from Andy Colson showed that gincostestimate() was not being nearly paranoid enough about whether to believe the statistics it finds in the index metapage. The problem is that the metapage stats (other than the pending-pages count) are only updated by VACUUM, and in the worst case could still reflect the index's original empty state even when it has grown to many entries. We attempted to deal with that by scaling up the stats to match the current index size, but if nEntries is zero then scaling it up still gives zero. Moreover, the proportion of pages that are entry pages vs. data pages vs. pending pages is unlikely to be estimated very well by scaling if the index is now orders of magnitude larger than before. We can improve matters by expanding the use of the rule-of-thumb estimates I introduced in commit 7fb008c5ee59b040: if the index has grown by more than a cutoff amount (here set at 4X growth) since VACUUM, then use the rule-of-thumb numbers instead of scaling. This might not be exactly right but it seems much less likely to produce insane estimates. I also improved both the scaling estimate and the rule-of-thumb estimate to account for numPendingPages, since it's reasonable to expect that that is accurate in any case, and certainly pages that are in the pending list are not either entry or data pages. As a somewhat separate issue, adjust the estimation equations that are concerned with extra fetches for partial-match searches. These equations suppose that a fraction partialEntries / numEntries of the entry and data pages will be visited as a consequence of a partial-match search. Now, it's physically impossible for that fraction to exceed one, but our estimate of partialEntries is mostly bunk, and our estimate of numEntries isn't exactly gospel either, so we could arrive at a silly value. In the example presented by Andy we were coming out with a value of 100, leading to insane cost estimates. Clamp the fraction to one to avoid that. Like the previous patch, back-patch to all supported branches; this problem can be demonstrated in one form or another in all of them.
* Add missing CHECK_FOR_INTERRUPTS in lseg_inside_polyAlvaro Herrera2015-12-14
| | | | | | | | | Apparently, there are bugs in this code that cause it to loop endlessly. That bug still needs more research, but in the meantime it's clear that the loop is missing a check for interrupts so that it can be cancelled timely. Backpatch to 9.1 -- this has been missing since 49475aab8d0d.
* Improve some messagesPeter Eisentraut2015-12-10
|
* Make gincostestimate() cope with hypothetical GIN indexes.Tom Lane2015-12-01
| | | | | | | | | | | | | | | | | | | | | | | We tried to fetch statistics data from the index metapage, which does not work if the index isn't actually present. If the index is hypothetical, instead extrapolate some plausible internal statistics based on the index page count provided by the index-advisor plugin. There was already some code in gincostestimate() to invent internal stats in this way, but since it was only meant as a stopgap for pre-9.1 GIN indexes that hadn't been vacuumed since upgrading, it was pretty crude. If we want it to support index advisors, we should try a little harder. A small amount of testing says that it's better to estimate the entry pages as 90% of the index, not 100%. Also, estimating the number of entries (keys) as equal to the heap tuple count could be wildly wrong in either direction. Instead, let's estimate 100 entries per entry page. Perhaps someday somebody will want the index advisor to be able to provide these numbers more directly, but for the moment this should serve. Problem report and initial patch by Julien Rouhaud; modified by me to invent less-bogus internal statistics. Back-patch to all supported branches, since we've supported index advisors since 9.0.
* Fix handling of inherited check constraints in ALTER COLUMN TYPE (again).Tom Lane2015-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous way of reconstructing check constraints was to do a separate "ALTER TABLE ONLY tab ADD CONSTRAINT" for each table in an inheritance hierarchy. However, that way has no hope of reconstructing the check constraints' own inheritance properties correctly, as pointed out in bug #13779 from Jan Dirk Zijlstra. What we should do instead is to do a regular "ALTER TABLE", allowing recursion, at the topmost table that has a particular constraint, and then suppress the work queue entries for inherited instances of the constraint. Annoyingly, we'd tried to fix this behavior before, in commit 5ed6546cf, but we failed to notice that it wasn't reconstructing the pg_constraint field values correctly. As long as I'm touching pg_get_constraintdef_worker anyway, tweak it to always schema-qualify the target table name; this seems like useful backup to the protections installed by commit 5f173040. In HEAD/9.5, get rid of get_constraint_relation_oids, which is now unused. (I could alternatively have modified it to also return conislocal, but that seemed like a pretty single-purpose API, so let's not pretend it has some other use.) It's unused in the back branches as well, but I left it in place just in case some third-party code has decided to use it. In HEAD/9.5, also rename pg_get_constraintdef_string to pg_get_constraintdef_command, as the previous name did nothing to explain what that entry point did differently from others (and its comment was equally useless). Again, that change doesn't seem like material for back-patching. I did a bit of re-pgindenting in tablecmds.c in HEAD/9.5, as well. Otherwise, back-patch to all supported branches.
* Fix possible internal overflow in numeric division.Tom Lane2015-11-17
| | | | | | | | | | | div_var_fast() postpones propagating carries in the same way as mul_var(), so it has the same corner-case overflow risk we fixed in 246693e5ae8a36f0, namely that the size of the carries has to be accounted for when setting the threshold for executing a carry propagation step. We've not devised a test case illustrating the brokenness, but the required fix seems clear enough. Like the previous fix, back-patch to all active branches. Dean Rasheed
* Message improvementsPeter Eisentraut2015-11-16
|
* Speed up ruleutils' name de-duplication code, and fix overlength-name case.Tom Lane2015-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 11e131854f8231a21613f834c40fe9d046926387, ruleutils.c has attempted to ensure that each RTE in a query or plan tree has a unique alias name. However, the code that was added for this could be quite slow, even as bad as O(N^3) if N identical RTE names must be replaced, as noted by Jeff Janes. Improve matters by building a transient hash table within set_rtable_names. The hash table in itself reduces the cost of detecting a duplicate from O(N) to O(1), and we can save another factor of N by storing the number of de-duplicated names already created for each entry, so that we don't have to re-try names already created. This way is probably a bit slower overall for small range tables, but almost by definition, such cases should not be a performance problem. In principle the same problem applies to the column-name-de-duplication code; but in practice that seems to be less of a problem, first because N is limited since we don't support extremely wide tables, and second because duplicate column names within an RTE are fairly rare, so that in practice the cost is more like O(N^2) not O(N^3). It would be very much messier to fix the column-name code, so for now I've left that alone. An independent problem in the same area was that the de-duplication code paid no attention to the identifier length limit, and would happily produce identifiers that were longer than NAMEDATALEN and wouldn't be unique after truncation to NAMEDATALEN. This could result in dump/reload failures, or perhaps even views that silently behaved differently than before. We can fix that by shortening the base name as needed. Fix it for both the relation and column name cases. In passing, check for interrupts in set_rtable_names, just in case it's still slow enough to be an issue. Back-patch to 9.3 where this code was introduced.
* Fix ruleutils.c's dumping of whole-row Vars in ROW() and VALUES() contexts.Tom Lane2015-11-15
| | | | | | | | | | | | | | | Normally ruleutils prints a whole-row Var as "foo.*". We already knew that that doesn't work at top level of a SELECT list, because the parser would treat the "*" as a directive to expand the reference into separate columns, not a whole-row Var. However, Joshua Yanovski points out in bug #13776 that the same thing happens at top level of a ROW() construct; and some nosing around in the parser shows that the same is true in VALUES(). Hence, apply the same workaround already devised for the SELECT-list case, namely to add a forced cast to the appropriate rowtype in these cases. (The alternative of just printing "foo" was rejected because it is difficult to avoid ambiguity against plain columns named "foo".) Back-patch to all supported branches.
* Fix erroneous hash calculations in gin_extract_jsonb_path().Tom Lane2015-11-05
| | | | | | | | | | | | | | | | | | | | The jsonb_path_ops code calculated hash values inconsistently in some cases involving nested arrays and objects. This would result in queries possibly not finding entries that they should find, when using a jsonb_path_ops GIN index for the search. The problem cases involve JSONB values that contain both scalars and sub-objects at the same nesting level, for example an array containing both scalars and sub-arrays. To fix, reset the current stack->hash after processing each value or sub-object, not before; and don't try to be cute about the outermost level's initial hash. Correcting this means that existing jsonb_path_ops indexes may now be inconsistent with the new hash calculation code. The symptom is the same --- searches not finding entries they should find --- but the specific rows affected are likely to be different. Users will need to REINDEX jsonb_path_ops indexes to make sure that all searches work as expected. Per bug #13756 from Daniel Cheng. Back-patch to 9.4 where the faulty logic was introduced.
* Message style improvementsPeter Eisentraut2015-10-28
| | | | | Message style, plurals, quoting, spelling, consistency with similar messages
* Fix incorrect translation of minus-infinity datetimes for json/jsonb.Tom Lane2015-10-20
| | | | | | | | | | | | | Commit bda76c1c8cfb1d11751ba6be88f0242850481733 caused both plus and minus infinity to be rendered as "infinity", which is not only wrong but inconsistent with the pre-9.4 behavior of to_json(). Fix that by duplicating the coding in date_out/timestamp_out/timestamptz_out more closely. Per bug #13687 from Stepan Perlov. Back-patch to 9.4, like the previous commit. In passing, also re-pgindent json.c, since it had gotten a bit messed up by recent patches (and I was already annoyed by indentation-related problems in back-patching this fix ...)
* Fix NULL handling in datum_to_jsonb().Tom Lane2015-10-15
| | | | | | | | | | | The function failed to adhere to its specification that the "tcategory" argument should not be examined when the input value is NULL. This resulted in a crash in some cases. Per bug #13680 from Boyko Yordanov. In passing, re-pgindent some recent changes in jsonb.c, and fix a rather ungrammatical comment. Diagnosis and patch by Michael Paquier, cosmetic changes by me
* Use JsonbIteratorToken consistently in automatic variable declarations.Noah Misch2015-10-12
| | | | | | Many functions stored JsonbIteratorToken values in variables of other integer types. Also, standardize order relative to other declarations. Expect compilers to generate the same code before and after this change.
* Prevent stack overflow in query-type functions.Noah Misch2015-10-05
| | | | | | The tsquery, ltxtquery and query_int data types have a common ancestor. Having acquired check_stack_depth() calls independently, each was missing at least one call. Back-patch to 9.0 (all supported versions).
* Prevent stack overflow in container-type functions.Noah Misch2015-10-05
| | | | | | | | | | | | | | A range type can name another range type as its subtype, and a record type can bear a column of another record type. Consequently, functions like range_cmp() and record_recv() are recursive. Functions at risk include operator family members and referents of pg_type regproc columns. Treat as recursive any such function that looks up and calls the same-purpose function for a record column type or the range subtype. Back-patch to 9.0 (all supported versions). An array type's element type is never itself an array type, so array functions are unaffected. Recursion depth proportional to array dimensionality, found in array_dim_to_jsonb(), is fine thanks to MAXDIM.
* Prevent stack overflow in json-related functions.Noah Misch2015-10-05
| | | | | | | | | | | | | | Sufficiently-deep recursion heretofore elicited a SIGSEGV. If an application constructs PostgreSQL json or jsonb values from arbitrary user input, application users could have exploited this to terminate all active database connections. That applies to 9.3, where the json parser adopted recursive descent, and later versions. Only row_to_json() and array_to_json() were at risk in 9.2, both in a non-security capacity. Back-patch to 9.2, where the json type was introduced. Oskari Saarenmaa, reviewed by Michael Paquier. Security: CVE-2015-5289