path: root/src/backend/utils/adt
Commit message (Author, Date)
* Clean up new JSON API typedefs (Peter Eisentraut, 2013-07-20)
  The new JSON API uses a bit of an unusual typedef scheme, where for example OkeysState is a pointer to okeysState. And that's not applied consistently either. Change that to the more usual PostgreSQL style where struct typedefs are upper case, and use pointers explicitly.
* Move checking an explicit VARIADIC "any" argument into the parser. (Andrew Dunstan, 2013-07-18)
  This is more efficient and simpler. It does mean that an untyped NULL can no longer be used in such cases, which should be mentioned in the release notes, but doesn't seem a terrible loss. The workaround is to cast the NULL to some array type.
  Pavel Stehule, reviewed by Jeevan Chalke.
* Fix end-of-loop optimization in pglz_find_match() function. (Heikki Linnakangas, 2013-07-17)
  After the recent pglz optimization patch, the next/prev pointers in the hash table are never NULL; INVALID_ENTRY_PTR is used to represent invalid entries instead. The end-of-loop check in the pglz_find_match() function didn't get the memo. The result was the same from a correctness point of view, but because the NULL check would never fail, the tiny optimization turned into a pessimization.
  Reported by Stephen Frost, using the Coverity scanner.
* Implement the FILTER clause for aggregate function calls. (Noah Misch, 2013-07-16)
  This is SQL-standard with a few extensions, namely support for subqueries and outer references in clause expressions. catversion bump due to change in Aggref and WindowFunc.
  David Fetter, reviewed by Dean Rasheed.
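  For illustration, a minimal use of the new clause (the table and column names here are hypothetical):
      SELECT count(*) FILTER (WHERE amount > 100) AS big_orders,
             count(*) AS all_orders
      FROM orders;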
* Fix bool abuse (Peter Eisentraut, 2013-07-08)
  path_encode's "closed" argument used to take three values: TRUE, FALSE, or -1, while being of type bool. Replace that with a three-valued enum for more clarity.
* Expose the estimation of number of changed tuples since last analyze (Magnus Hagander, 2013-07-05)
  This value, now pg_stat_all_tables.n_mod_since_analyze, was already tracked and used by autovacuum, but not exposed to the user.
  Mark Kirkwood, review by Laurenz Albe
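  A quick way to inspect the new counter, for example:
      SELECT relname, n_mod_since_analyze
      FROM pg_stat_all_tables
      ORDER BY n_mod_since_analyze DESC;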
* Get rid of pg_class.reltoastidxid. (Fujii Masao, 2013-07-04)
  Treat the TOAST index just the same as a normal one, and get the OID of the TOAST index from pg_index rather than pg_class.reltoastidxid. This change allows us to handle multiple TOAST indexes, which is required infrastructure for the upcoming REINDEX CONCURRENTLY feature.
  Patch by Michael Paquier, reviewed by Andres Freund and me.
* Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. (Robert Haas, 2013-07-02)
  SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future.
  The major issue that has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken.
  Patch by me. Review by Michael Paquier and Andres Freund.
* Add timezone offset output option to to_char() (Bruce Momjian, 2013-07-01)
  Add the ability for to_char() to output the timezone's UTC offset (OF). We already have the ability to return the timezone abbreviation (TZ/tz).
  Per request from Andrew Dunstan.
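  A sketch of the two format codes (the output shown is illustrative and depends on the session's TimeZone setting):
      SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS OF');  -- e.g. 2013-07-01 10:15:00 -04
      SELECT to_char(now(), 'YYYY-MM-DD HH24:MI:SS TZ');  -- e.g. 2013-07-01 10:15:00 EDT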
* Optimize pglz compressor for small inputs. (Heikki Linnakangas, 2013-07-01)
  The pglz compressor has a significant startup cost, because it has to initialize to zeros the history-tracking hash table. On a 64-bit system, the hash table was 64kB in size. While clearing memory is pretty fast, for very short inputs the relative cost of that was quite large.
  This patch alleviates that in two ways. First, instead of storing pointers in the hash table, store 16-bit indexes into the hist_entries array. That slashes the size of the hash table to 1/2 or 1/4 of the original, depending on the pointer width. Secondly, adjust the size of the hash table based on input size. For very small inputs, you don't need a large hash table to avoid collisions.
  Review by Amit Kapila.
* Renovate display of non-ASCII messages on Windows. (Noah Misch, 2013-06-26)
  GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it.
  Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters.
  elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8.
  Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles.
  Alexander Law and Noah Misch
* Use WaitLatch, not pg_usleep, for delaying in pg_sleep(). (Tom Lane, 2013-06-15)
  This avoids platform-dependent behavior wherein pg_sleep() might fail to be interrupted by statement timeout, query cancel, SIGTERM, etc. Also, since there's no reason to wake up once a second any more, we can reduce the power consumption of a sleeping backend a tad.
  Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger issue than it used to be.
* Avoid reading past datum end when parsing JSON. (Noah Misch, 2013-06-12)
  Several loops in the JSON parser examined a byte in memory just before checking whether its address was in-bounds, so they could read one byte beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so no back-patch.
* Improve updatability checking for views and foreign tables. (Tom Lane, 2013-06-12)
  Extend the FDW API (which we already changed for 9.3) so that an FDW can report whether specific foreign tables are insertable/updatable/deletable. The default assumption continues to be that they're updatable if the relevant executor callback function is supplied by the FDW, but finer granularity is now possible. As a test case, add an "updatable" option to contrib/postgres_fdw.
  This patch also fixes the information_schema views, which previously did not think that foreign tables were ever updatable, and fixes view_is_auto_updatable() so that a view on a foreign table can be auto-updatable.
  initdb forced due to changes in information_schema views and the functions they rely on. This is a bit unfortunate to do post-beta1, but if we don't change this now then we'll have another API break for FDWs when we do change it.
  Dean Rasheed, somewhat editorialized on by Tom Lane
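  A sketch of the new postgres_fdw option (the server and table names are hypothetical):
      ALTER SERVER remote_srv OPTIONS (ADD updatable 'false');            -- per-server default
      ALTER FOREIGN TABLE remote_orders OPTIONS (ADD updatable 'true');   -- per-table override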
* Fix unescaping of JSON Unicode escapes, especially for non-UTF8. (Andrew Dunstan, 2013-06-12)
  Per discussion on -hackers. We treat Unicode escapes when unescaping them similarly to the way we treat them in PostgreSQL string literals. Escapes in the ASCII range are always accepted, no matter what the database encoding. Escapes for higher code points are only processed in UTF8 databases, and attempts to process them in other databases will result in an error. \u0000 is never unescaped, since it would result in an impermissible null byte.
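  For example, extracting a text value de-escapes the sequence; the behavior sketched here follows the rules above:
      SELECT '["\u0041"]'::json ->> 0;  -- 'A'; ASCII-range escapes work in any encoding
      SELECT '["\u00e9"]'::json ->> 0;  -- 'é' in a UTF8 database, an error elsewhere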
* Handle Unicode surrogate pairs correctly when processing JSON. (Andrew Dunstan, 2013-06-08)
  In 9.2, Unicode escape sequences are not analysed at all other than to make sure that they are in the form \uXXXX. But in 9.3 many of the new operators and functions try to turn JSON text values into text in the server encoding, and this includes de-escaping Unicode escape sequences. This processing had not taken into account the possibility that this might contain a surrogate pair to designate a character outside the BMP. That is now handled correctly.
  This also enforces correct use of surrogate pairs, something that is not done by the type's input routines. This fact is noted in the docs.
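  A sketch of a surrogate pair designating a character outside the BMP, assuming a UTF8 database:
      SELECT '["\ud83d\ude04"]'::json ->> 0;  -- U+1F604, returned as a single character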
* Additional spelling corrections (Stephen Frost, 2013-06-03)
  A few more minor spelling corrections, no functional changes.
  Thom Brown
* Minor spelling fixes (Stephen Frost, 2013-06-01)
  Fix a few spelling mistakes.
  Per bug report #8193 from Lajos Veres.
* Don't emit non-canonical empty arrays in array_remove(). (Noah Misch, 2013-05-31)
  Dean Rasheed
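  A minimal sketch of the corrected behavior:
      SELECT array_remove(ARRAY[1, 1], 1);  -- {}  (a canonical empty array)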
* pgindent run for release 9.3 (Bruce Momjian, 2013-05-29)
  This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Fix crash when trying to display a NOTIFY rule action. (Tom Lane, 2013-05-16)
  Fixes oversight in commit 2ffa740be9d96a3743ecb7e42391c53d0760c65a.
  Per report from Josh Kupershmidt. I think we've broken this case before, so let's add a regression test this time.
* Fix to_number() to correctly ignore thousands separator when it's '.'. (Tom Lane, 2013-05-11)
  The existing code in NUM_numpart_from_char has hard-wired logic to treat '.' as decimal point, even when we're using a locale-aware format string and the locale says that '.' is the thousands separator. This results in clearly wrong answers in FM mode (where we must be able to identify the decimal point location), as per bug report from Patryk Kordylewski.
  Since the initialization code in NUM_prepare_locale already sets up Np->decimal as either the locale decimal-point string or "." depending on which decimal-point format code was used, there's really no need to have any extra logic at all in NUM_numpart_from_char: we only need to test for a match to Np->decimal.
  (Note: AFAICS there's nothing in here that explicitly checks for thousands separators --- rather, any unmatched character is silently skipped over. That's pretty bogus IMO but it's not the issue being complained of.)
  This is a longstanding bug, but it's possible that some existing apps are depending on '.' being recognized as decimal point even when using a D format code. Hence, no back-patch. We should probably list this as a potential incompatibility in the 9.3 release notes.
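  A sketch of the FM-mode case, assuming a locale such as de_DE where '.' is the thousands separator and ',' the decimal point:
      SET lc_numeric = 'de_DE';
      SELECT to_number('1.234,56', 'FM9G999D99');  -- 1234.56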
* Guard against input_rows == 0 in estimate_num_groups(). (Tom Lane, 2013-05-10)
  This case doesn't normally happen, because the planner usually clamps all row estimates to at least one row; but I found that it can arise when dealing with relations excluded by constraints. Without a defense, estimate_num_groups() can return zero, which leads to divisions by zero inside the planner as well as assertion failures in the executor.
  An alternative fix would be to change set_dummy_rel_pathlist() to make the size estimate for a dummy relation 1 row instead of 0, but that seemed pretty ugly; and probably someday we'll want to drop the convention that the minimum rowcount estimate is 1 row.
  Back-patch to 8.4, as the problem can be demonstrated that far back.
* Move materialized views' is-populated status into their pg_class entries. (Tom Lane, 2013-05-06)
  Previously this state was represented by whether the view's disk file had zero or nonzero size, which is problematic for numerous reasons, since it's breaking a fundamental assumption about heap storage. This was done to allow unlogged matviews to revert to unpopulated status after a crash despite our lack of any ability to update catalog entries post-crash. However, this poses enough risk of future problems that it seems better to not support unlogged matviews until we can find another way. Accordingly, revert that choice as well as a number of existing kluges forced by it in favor of creating a pg_class.relispopulated flag column.
* Use correct length to convert json unicode escapes. (Andrew Dunstan, 2013-05-01)
  Bug reported on IRC - fix due to Andrew Gierth.
* Clean up references to SQL92 (Peter Eisentraut, 2013-04-20)
  In most cases, these were just references to the SQL standard in general. In a few cases, a contrast was made between SQL92 and later standards -- those have been kept unchanged.
* Correct handling of NULL arguments in json funcs. (Andrew Dunstan, 2013-04-15)
  Per gripe from Tom Lane.
* Create a distinction between a populated matview and a scannable one. (Kevin Grittner, 2013-04-09)
  The intent was that being populated would, long term, be just one of the conditions which could affect whether a matview was scannable; being populated should be necessary but not always sufficient to scan the relation. Since only CREATE and REFRESH currently determine the scannability, names and comments accidentally conflated these concepts, leading to confusion.
  Also add missing locking for the SQL function which allows a test for scannability, and fix a modularity violation.
  Per complaints from Tom Lane, although it's not clear that these will satisfy his concerns. Hopefully this will at least better frame the discussion.
* Support indexing of regular-expression searches in contrib/pg_trgm. (Tom Lane, 2013-04-09)
  This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated.
  Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later.
  The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen.
  Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
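  A minimal sketch of using the new support (the table and column names are hypothetical):
      CREATE EXTENSION pg_trgm;
      CREATE INDEX docs_body_trgm ON docs USING gin (body gin_trgm_ops);
      SELECT * FROM docs WHERE body ~ 'foo(bar|baz)';  -- can now use the GIN index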
* Fix off-by-one error in JSON extract path code. (Andrew Dunstan, 2013-04-04)
  Bug report by David Wheeler, diagnosis assistance from Tom Lane.
* Avoid updating our PgBackendStatus entry when track_activities is off. (Tom Lane, 2013-04-03)
  The point of turning off track_activities is to avoid this reporting overhead, but a thinko in commit 4f42b546fd87a80be30c53a0f2c897acb826ad52 caused pgstat_report_activity() to perform half of its updates anyway. Fix that, and also make sure that we clear all the now-disabled fields when transitioning to the non-reporting state.
* Add new JSON processing functions and parser API. (Andrew Dunstan, 2013-03-29)
  The JSON parser is converted into a recursive descent parser, and exposed for use by other modules such as extensions. The API provides hooks for all the significant parser events such as the beginning and end of objects and arrays, and providing functions to handle these hooks allows for fairly simple construction of a wide variety of JSON processing functions.
  A set of new basic processing functions and operators is also added, which use this API, including operations to extract array elements, object fields, get the length of arrays and the set of keys of a field, deconstruct an object into a set of key/value pairs, and create records from JSON objects and arrays of objects.
  Catalog version bumped.
  Andrew Dunstan, with some documentation assistance from Merlin Moncure.
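  A few of the new operators and functions, for illustration:
      SELECT '{"a": {"b": [1,2,3]}}'::json #> '{a,b}';        -- [1,2,3]
      SELECT json_array_length('[1,2,3]'::json);              -- 3
      SELECT * FROM json_each('{"x": 1, "y": "two"}'::json);  -- key/value pairs
      SELECT json_object_keys('{"x": 1, "y": 2}'::json);      -- x, y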
* Add sql_drop event for event triggers (Alvaro Herrera, 2013-03-28)
  This event takes place just before ddl_command_end, and is fired if and only if at least one object has been dropped by the command. (For instance, DROP TABLE IF EXISTS of a table that does not in fact exist will not lead to such a trigger firing.) Commands that drop multiple objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event to fire. Some firings might be surprising, such as ALTER TABLE DROP COLUMN.
  The trigger is fired after the drop has taken place, because that has been deemed the safest design, to avoid exposing possibly-inconsistent internal state (system catalogs as well as current transaction) to the user function code. This means that careful tracking of object identification is required during the object removal phase.
  Like other currently existing events, there is support for tag filtering.
  To support the new event, add a new pg_event_trigger_dropped_objects() set-returning function, which returns a set of rows comprising the objects affected by the command. This is to be used within the user function code, and is mostly modelled after the recently introduced pg_identify_object() function.
  Catalog version bumped due to the new function.
  Dimitri Fontaine and Álvaro Herrera
  Review by Robert Haas, Tom Lane
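  A sketch of consuming the new event from a user-defined trigger (the function and trigger names are hypothetical):
      CREATE FUNCTION log_drops() RETURNS event_trigger
      LANGUAGE plpgsql AS $$
      DECLARE
          obj record;
      BEGIN
          -- report each object removed by the current command
          FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
          LOOP
              RAISE NOTICE 'dropped %: %', obj.object_type, obj.object_identity;
          END LOOP;
      END;
      $$;

      CREATE EVENT TRIGGER report_drops ON sql_drop EXECUTE PROCEDURE log_drops();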
* Fix "element <@ range" cost estimation. (Heikki Linnakangas, 2013-03-21)
  The statistics-based cost estimation patch for range types broke that, by incorrectly assuming that the left operand of all range operators is a range. That led to a "type x is not a range type" error. Because it took so long for anyone to notice, add a regression test for that case.
  We still don't do proper statistics-based cost estimation for that, so you just get a default constant estimate. We should look into implementing that, but this patch at least fixes the regression.
  Spotted by Tom Lane, when testing a query from Josh Berkus.
* Allow extracting machine-readable object identity (Alvaro Herrera, 2013-03-20)
  Introduce pg_identify_object(oid,oid,int4), which is similar in spirit to pg_describe_object but instead produces a row of machine-readable information to uniquely identify the given object, without resorting to OIDs or other internal representation. This is intended to be used in the event trigger implementation, to report objects being operated on; but it has usefulness of its own.
  Catalog version bumped because of the new function.
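  For example, identifying a table (the table name is hypothetical; the first argument is pg_class's own OID, serving as the classid):
      SELECT * FROM pg_identify_object('pg_class'::regclass, 'my_table'::regclass, 0);
      -- yields a row of (type, schema, name, identity), e.g. table | public | my_table | public.my_table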
* Extend format() to handle field width and left/right alignment. (Tom Lane, 2013-03-14)
  This change adds some more standard sprintf() functionality to format().
  Pavel Stehule, reviewed by Dean Rasheed and Kyotaro Horiguchi
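  A sketch of the added width/alignment support:
      SELECT format('|%10s|', 'foo');     -- |       foo|
      SELECT format('|%-10s|', 'foo');    -- |foo       |
      SELECT format('|%*s|', 10, 'foo');  -- width taken from the next argument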
* Add cost estimation of range @> and <@ operators. (Heikki Linnakangas, 2013-03-14)
  The estimates are based on the existing lower bound histogram, and a new histogram of range lengths.
  Bump catversion, because the range length histogram now needs to be present in statistic slot kind 6, or you get an error on @> and <@ queries. (A re-ANALYZE would be enough to fix that, though.)
  Alexander Korotkov, with some refactoring by me.
* JSON generation improvements. (Andrew Dunstan, 2013-03-10)
  This adds the following:
      json_agg(anyrecord) -> json
      to_json(any) -> json
      hstore_to_json(hstore) -> json (also used as a cast)
      hstore_to_json_loose(hstore) -> json
  The last provides heuristic treatment of numbers and booleans.
  Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function.
  Andrew Dunstan, reviewed by Steve Singer.
  Catalog version bumped.
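  For illustration (the output formatting shown is approximate):
      SELECT to_json('a "quoted" string'::text);  -- "a \"quoted\" string"
      SELECT json_agg(t) FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, val);
      -- [{"id":1,"val":"a"}, {"id":2,"val":"b"}]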
* SP-GiST support of the range adjacent operator -|- (Heikki Linnakangas, 2013-03-08)
  Alexander Korotkov, reviewed by Jeff Davis.
* Fix to_char() to use ASCII-only case-folding rules where appropriate. (Tom Lane, 2013-03-05)
  formatting.c used locale-dependent case folding rules in some code paths where the result isn't supposed to be locale-dependent, for example to_char(timestamp, 'DAY'). Since the source data is always just ASCII in these cases, that usually didn't matter ... but it does matter in Turkish locales, which have unusual treatment of "i" and "I". To confuse matters even more, the misbehavior was only visible in UTF8 encoding, because in single-byte encodings we used pg_toupper/pg_tolower which don't have locale-specific behavior for ASCII characters.
  Fix by providing intentionally ASCII-only case-folding functions and using these where appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active branches, since it's been like this for a long time.
* Fix overflow check in tm2timestamp (this time for sure). (Tom Lane, 2013-03-04)
  I fixed this code back in commit 841b4a2d5, but didn't think carefully enough about the behavior near zero, which meant it improperly rejected 1999-12-31 24:00:00. Per report from Magnus Hagander.
* Fix map_sql_value_to_xml_value() to treat domains like their base types. (Tom Lane, 2013-03-03)
  This was already the case for domains over arrays, but not for domains over certain built-in types such as boolean. The special formatting rules for those types should apply to domains over them as well. Per discussion.
  While this is a bug fix, it's also a behavioral change that seems likely to trip up some applications. So no back-patch.
  Pavel Stehule
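  A minimal sketch of the changed behavior, using a hypothetical domain over boolean:
      CREATE DOMAIN flag AS boolean;
      SELECT xmlelement(name ok, true::flag);  -- now <ok>true</ok>, as for a plain boolean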
* Add materialized view relations. (Kevin Grittner, 2013-03-03)
  A materialized view has a rule just like a view, and a heap and other physical properties like a table. The rule is only used to populate the table; references in queries refer to the materialized data.
  This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized view in place of references to underlying tables, but that requires the other above-mentioned features to be working first.
  Much of the documentation work by Robert Haas.
  Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
  Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
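  A minimal example of the new statements (the object names are hypothetical):
      CREATE MATERIALIZED VIEW order_totals AS
          SELECT customer_id, sum(amount) AS total
          FROM orders
          GROUP BY customer_id;

      REFRESH MATERIALIZED VIEW order_totals;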
* Move relpath() to libpgcommon (Alvaro Herrera, 2013-02-21)
  This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.
* Fix CVE-2013-0255 properly. (Tom Lane, 2013-02-13)
  Revert commit ab0f7b6089fd215f6ce6081e2e222c38d643a526 (in HEAD only) in favor of the proper solution, which is to declare enum_recv() correctly in the system catalogs. It should be declared to take type "internal" not "cstring".
  Also improve the type_sanity regression test, which should have caught this typo, so that it actually would. Most of the relevant checks on the signature of type I/O functions should not have been restricted to basetypes/pseudotypes, as they should apply to any type's I/O functions.
* Simplify box_overlap computations. (Tom Lane, 2013-02-08)
  Given the assumption that a box's high coordinates are not less than its low coordinates, the tests in box_ov() are overly complicated and can be reduced to about half as much work. Since many other functions in geo_ops.c rely on that assumption, there doesn't seem to be a good reason not to use it here.
  Per discussion of Alexander Korotkov's GiST fix, which was already using the simplified logic (in a non-fuzzy form, but the equivalence holds just as well for fuzzy).
* Enable building with Microsoft Visual Studio 2012. (Andrew Dunstan, 2013-02-06)
  Backpatch to release 9.2.
  Brar Piening and Noah Misch, reviewed by Craig Ringer.
* Prevent execution of enum_recv() from SQL. (Tom Lane, 2013-02-04)
  This function was misdeclared to take cstring when it should take internal. This at least allows crashing the server, and in principle an attacker might be able to use the function to examine the contents of server memory.
  The correct fix is to adjust the system catalog contents (and fix the regression tests that should have caught this but failed to). However, asking users to correct the catalog contents in existing installations is a pain, so as a band-aid fix for the back branches, install a check in enum_recv() to make it throw error if called with a cstring argument. We will later revert this in HEAD in favor of correcting the catalogs.
  Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue.
  Security: CVE-2013-0255
* Perform line wrapping and indenting by default in ruleutils.c. (Tom Lane, 2013-02-03)
  This patch changes pg_get_viewdef() and allied functions so that PRETTY_INDENT processing is always enabled. Per discussion, only the PRETTY_PAREN processing (that is, stripping of "unnecessary" parentheses) poses any real forward-compatibility risk, so we may as well make dump output look as nice as we safely can.
  Also, set the default wrap length to zero (i.e., wrap after each SELECT or FROM list item), since there's no very principled argument for the former default of 80-column wrapping, and most people seem to agree this way looks better.
  Marko Tiikkaja, reviewed by Jeevan Chalke, further hacking by Tom Lane
* Reject nonzero day fields in AT TIME ZONE INTERVAL functions. (Tom Lane, 2013-01-31)
  It's not sensible for an interval that's used as a time zone value to be larger than a day. When we changed the interval type to contain a separate day field, check_timezone() was adjusted to reject nonzero day values, but timetz_izone(), timestamp_izone(), and timestamptz_izone() evidently were overlooked.
  While at it, make the error messages for these three cases consistent.
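  A sketch of what is still accepted versus what is now rejected:
      SELECT timestamptz '2013-01-31 12:00+00' AT TIME ZONE INTERVAL '-08:00';  -- ok: 2013-01-31 04:00:00
      SELECT timestamptz '2013-01-31 12:00+00' AT TIME ZONE INTERVAL '1 day';   -- now raises an error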