aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/varlena.c
Commit message (Collapse)AuthorAge
...
* Don't downcase entries within shared_preload_libraries et al.Tom Lane2017-06-20
| | | | | | | | | | | | | | | | | | load_libraries(), which processes the various xxx_preload_libraries GUCs, was parsing them using SplitIdentifierString() which isn't really appropriate for values that could be path names: it downcases unquoted text, and it doesn't allow embedded whitespace unless quoted. Use SplitDirectoriesString() instead. That also allows us to simplify load_libraries() a bit, since canonicalize_path() is now done for it. While this definitely seems like a bug fix, it has the potential to break configuration settings that accidentally worked before because of the downcasing behavior. Also, there's an easy workaround for the bug, namely to double-quote troublesome text. Hence, no back-patch. QL Zhuo, tweaked a bit by me Discussion: https://postgr.es/m/CAB-oJtxHVDc3H+Km3CjB9mY1VDzuyaVH_ZYSz7iXcRqCtb93Ew@mail.gmail.com
* Fix ICU collation use on WindowsPeter Eisentraut2017-06-16
| | | | | | | | Windows uses a separate code path for libc locales. The code previously ended up there also if an ICU collation should be used, leading to a crash. Reported-by: Ashutosh Sharma <ashu.coek88@gmail.com>
* Tighten checks for whitespace in functions that parse identifiers etc.Tom Lane2017-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces isspace() calls with scanner_isspace() in functions that are likely to be presented with non-ASCII input. isspace() has the small advantage that it will correctly recognize no-break space in single-byte encodings (such as LATIN1); but it cannot work successfully for any multibyte character, and depending on platform it might return false positive results for some fragments of multibyte characters. That's disastrous for functions that are trying to discard whitespace between valid strings, as noted in bug #14662 from Justin Muise. Even treating no-break space as whitespace is pretty questionable for the usages touched here, because the core scanner would think it is an identifier character. Affected functions are parse_ident(), parseNameAndArgTypes (underlying regprocedurein() and siblings), SplitIdentifierString (used for parsing GUCs and options that are qualified names or lists of names), and SplitDirectoriesString (used for parsing GUCs that are lists of directories). All the functions adjusted here are parsing SQL identifiers and similar constructs, so it's reasonable to insist that their definition of whitespace match the core scanner. So we can hope that this won't cause many backwards-compatibility problems. I've left alone isspace() calls in places that aren't really expecting any non-ASCII input characters, such as float8in(). Back-patch to all supported branches. Discussion: https://postgr.es/m/10129.1495302480@sss.pgh.pa.us
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Remove create_singleton_array(), hard-coding the case in its sole caller.Tom Lane2017-05-02
| | | | | | | | | | | | | | | | | | | | | | create_singleton_array() was not really as useful as we perhaps thought when we added it. It had never accreted more than one call site, and is only saving a dozen lines of code at that one, which is considerably less bulk than the function itself. Moreover, because of its insistence on using the caller's fn_extra cache space, it's arguably a coding hazard. text_to_array_internal() does not currently use fn_extra in any other way, but if it did it would be subtly broken, since the conflicting fn_extra uses could be needed within a single query, in the seldom-tested case that the field separator varies during the query. The same objection seems likely to apply to any other potential caller. The replacement code is a bit uglier, because it hardwires knowledge of the storage parameters of type TEXT, but it's not like we haven't got dozens or hundreds of other places that do the same. Uglier seems like a good tradeoff for smaller, faster, and safer. Per discussion with Neha Khatri. Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
* Fix locale pointer use in WIN32 code pathPeter Eisentraut2017-03-25
| | | | Author: David Rowley <david.rowley@2ndquadrant.com>
* ICU supportPeter Eisentraut2017-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a column collprovider to pg_collation that determines which library provides the collation data. The existing choices are default and libc, and this adds an icu choice, which uses the ICU4C library. The pg_locale_t type is changed to a union that contains the provider-specific locale handles. Users of locale information are changed to look into that struct for the appropriate handle to use. Also add a collversion column that records the version of the collation when it is created, and check at run time whether it is still the same. This detects potentially incompatible library upgrades that can corrupt indexes and other structures. This is currently only supported by ICU-provided collations. initdb initializes the default collation set as before from the `locale -a` output but also adds all available ICU locales with a "-x-icu" appended. Currently, ICU-provided collations can only be explicitly named collations. The global database locales are still always libc-provided. ICU support is enabled by configure --with-icu. Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se>
* Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch2017-03-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
* Fix typo in comment.Fujii Masao2017-02-22
| | | | neha khatri
* Move some things from builtins.h to new header filesPeter Eisentraut2017-01-20
| | | | This avoids that builtins.h has to include additional header files.
* Make messages mentioning type names more uniformAlvaro Herrera2017-01-18
| | | | | | | | | This avoids additional translatable strings for each distinct type, as well as making our quoting style around type names more consistent (namely, that we don't quote type names). This continues what started as f402b9950120. Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Refer to OS X as "macOS", except for the port name which is still "darwin".Tom Lane2016-09-25
| | | | | | | | | | | | | | | | | | We weren't terribly consistent about whether to call Apple's OS "OS X" or "Mac OS X", and the former is probably confusing to people who aren't Apple users. Now that Apple has rebranded it "macOS", follow their lead to establish a consistent naming pattern. Also, avoid the use of the ancient project name "Darwin", except as the port code name which does not seem desirable to change. (In short, this patch touches documentation and comments, but no actual code.) I didn't touch contrib/start-scripts/osx/, either. I suspect those are obsolete and due for a rewrite, anyway. I dithered about whether to apply this edit to old release notes, but those were responsible for quite a lot of the inconsistencies, so I ended up changing them too. Anyway, Apple's being ahistorical about this, so why shouldn't we be?
* Move code shared between libpq and backend from backend/libpq/ to common/.Heikki Linnakangas2016-09-02
| | | | | | | | | | | | | | | | | | | | | | When building libpq, ip.c and md5.c were symlinked or copied from src/backend/libpq into src/interfaces/libpq, but now that we have a directory specifically for routines that are shared between the server and client binaries, src/common/, move them there. Some routines in ip.c were only used in the backend. Keep those in src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file that's now in common. Fix the comment in src/common/Makefile to reflect how libpq actually links those files. There are two more files that libpq symlinks directly from src/backend: encnames.c and wchar.c. I don't feel compelled to move those right now, though. Patch by Michael Paquier, with some changes by me. Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
* Make format() error messages consistent againPeter Eisentraut2016-08-08
| | | | 07d25a964 changed only one occurrence. Change the other one as well.
* pgindent run for 9.6Robert Haas2016-06-09
|
* Disable abbreviated keys for string-sorting in non-C locales.Robert Haas2016-03-23
| | | | | | | | | | | | | | | | | | | Unfortunately, every version of glibc thus far tested has bugs whereby strcoll() ordering does not match strxfrm() ordering as required by the standard. This can result in, for example, corrupted indexes. Disabling abbreviated keys in these cases slows down non-C-collation string sorting considerably, but there seems to be no practical alternative. Users who are confident that their libc implementations are solid in this regard can re-enable the optimization by compiling with TRUST_STRXFRM. Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1 should REINDEX if there is a possibility that they may have been affected by this problem. Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch by me, reviewed by Peter Geoghegan and Tom Lane.
* Improve error reporting in format()Teodor Sigaev2016-02-11
| | | | | | Clarify invalid format conversion type error message and add hint. Author: Jim Nasby
* Re-pgindent varlena.c.Tom Lane2016-02-08
| | | | Just to make sure previous commit worked ...
* Rename typedef "string" to "VarString".Tom Lane2016-02-08
| | | | | | | | | | Since pgindent treats typedef names as global, the original coding of b47b4dbf683f13e6 would have had rather nasty effects on the formatting of other files in which "string" is used as a variable or field name. Use a less generic name for this typedef, and rename some other identifiers to match. Peter Geoghegan, per gripe from me
* Fix small goof in comment.Robert Haas2016-02-05
| | | | Peter Geoghegan
* Extend sortsupport for text to more opclasses.Robert Haas2016-02-03
| | | | | | | | | | | | | | | | Have varlena.c expose an interface that allows the char(n), bytea, and bpchar types to piggyback on a now-generalized SortSupport for text. This pushes a little more knowledge of the bpchar/char(n) type into varlena.c than might be preferred, but that seems like the approach that creates least friction. Also speed things up for index builds that use text_pattern_ops or varchar_pattern_ops. This patch does quite a bit of renaming, but it seems likely to be worth it, so as to avoid future confusion about the fact that this code is now more generally used than the old names might have suggested. Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson, with small tweaks by me.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Remove unnecessary escaping in C character literalsPeter Eisentraut2015-12-22
| | | | '\"' is more commonly written simply as '"'.
* Correct tiny inaccuracy in strxfrm cache comment.Robert Haas2015-11-03
| | | | Peter Geoghegan
* Be a bit more rigorous about how we cache strcoll and strxfrm results.Robert Haas2015-10-20
| | | | | | | | | | | Commit 0e57b4d8bd9674adaf5747421b3255b85e385534 contained some clever logic that attempted to make sure that we couldn't get confused about whether the last thing we cached was a strcoll() result or a strxfrm() result, but it wasn't quite clever enough, because we can perform further abbreviations after having already performed some comparisons. Introduce an explicit flag in the hopes of making this watertight. Peter Geoghegan, reviewed by me.
* Remove obsolete comment.Robert Haas2015-10-20
| | | | Peter Geoghegan
* Speed up text sorts where the same strings occur multiple times.Robert Haas2015-10-09
| | | | | | | | | | | | | | | | Cache strxfrm() blobs across calls made to the text SortSupport abbreviation routine. This can speed up sorting if the same string needs to be abbreviated many times in a row. Also, cache the result of the previous strcoll() comparison, so that if we're asked to compare the same strings agin, we do need to call strcoll() again. Perhaps surprisingly, these optimizations don't seem to hurt even when they don't help. memcmp() is really cheap compared to strcoll() or strxfrm(). Peter Geoghegan, reviewed by me.
* Make abbreviated key comparisons for text a bit cheaper.Robert Haas2015-10-09
| | | | | | | If we do some byte-swapping while abbreviating, we can do comparisons using integer arithmetic rather than memcmp. Peter Geoghegan, reviewed and slightly revised by me.
* In bttext_abbrev_convert, move pfree to the right place.Robert Haas2015-06-29
| | | | | | | Without this, we might access memory that's already been freed, or leak memory if in the C locale. Peter Geoghegan
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Collection of typo fixes.Heikki Linnakangas2015-05-20
| | | | | | | | | | | | | | | Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
* Make trace_sort control abbreviation debug output for the text opclass.Robert Haas2015-04-07
| | | | | | | | This is consistent with what the new numeric suppor for abbreviated keys now does, and seems much more convenient than having a separate compiler define to control this debug output. Peter Geoghegan
* Change the way we decide whether to give up on abbreviated text keys.Robert Haas2015-04-03
| | | | | | | | | Be more aggressive about aborting early on if it looks like it's not helping, but be less aggressive about aborting later on, since it's more expensive at that point, and also since we're currently aborting in some cases where abbreviation can still deliver a substantial win. Peter Geoghegan. Extensive testing by Tomas Vondra.
* Add missing calls to DatumGetUInt32.Robert Haas2015-04-02
| | | | | | | These were inadvertently ommitted from the commit that introduced abbreviated keys, commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23. Peter Geoghegan
* Re-enable abbreviated keys on Windows.Robert Haas2015-01-26
| | | | | | | | Commit 1be4eb1b2d436d1375899c74e4c74486890d8777 disabled this, but I think the real problem here was fixed by commit b181a91981203f6ec9403115a2917bd3f9473707 and commit d060e07fa919e0eb681e2fa2cfbe63d6c40eb2cf. So let's try re-enabling it now and see what happens.
* Replace a bunch more uses of strncpy() with safer coding.Tom Lane2015-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | strncpy() has a well-deserved reputation for being unsafe, so make an effort to get rid of nearly all occurrences in HEAD. A large fraction of the remaining uses were passing length less than or equal to the known strlen() of the source, in which case no null-padding can occur and the behavior is equivalent to memcpy(), though doubtless slower and certainly harder to reason about. So just use memcpy() in these cases. In other cases, use either StrNCpy() or strlcpy() as appropriate (depending on whether padding to the full length of the destination buffer seems useful). I left a few strncpy() calls alone in the src/timezone/ code, to keep it in sync with upstream (the IANA tzcode distribution). There are also a few such calls in ecpg that could possibly do with more analysis. AFAICT, none of these changes are more than cosmetic, except for the four occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength source leads to a non-null-terminated destination buffer and ensuing misbehavior. These don't seem like security issues, first because no stack clobber is possible and second because if your values of sslcert etc are coming from untrusted sources then you've got problems way worse than this. Still, it's undesirable to have unpredictable behavior for overlength inputs, so back-patch those four changes to all active branches.
* Fix typos, update README.Robert Haas2015-01-23
| | | | Peter Geoghegan
* Repair brain fade in commit b181a91981203f6ec9403115a2917bd3f9473707.Robert Haas2015-01-22
| | | | | | The split between which things need to happen in the C-locale case and which needed to happen in the locale-aware case was a few bricks short of a load. Try to fix that.
* More fixes for abbreviated keys infrastructure.Robert Haas2015-01-22
| | | | | | | | | | | | | | | | | | First, when LC_COLLATE = C, bttext_abbrev_convert should use memcpy() rather than strxfrm() to construct the abbreviated key, because the authoritative comparator uses memcpy(). If we do anything else here, we might get inconsistent answers, and the buildfarm says this risk is not theoretical. It should be faster this way, too. Second, while I'm looking at bttext_abbrev_convert, convert a needless use of goto into the loop it's trying to implement into an actual loop. Both of the above problems date to the original commit of abbreviated keys, commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23. Third, fix a bogus assignment to tss->locale before tss is set up. That's a new goof in commit b529b65d1bf8537ca7fa024760a9782d7c8b66e5.
* Heavily refactor btsortsupport_worker.Robert Haas2015-01-22
| | | | | | | | | | | | | | Prior to commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23, this function only had one job, which was to decide whether we could avoid trampolining through the fmgr layer when performing sort comparisons. As of that commit, it has a second job, which is to decide whether we can use abbreviated keys. Unfortunately, those two tasks are somewhat intertwined in the existing coding, which is likely why neither Peter Geoghegan nor I noticed prior to commit that this calls pg_newlocale_from_collation() in cases where it didn't previously. The buildfarm noticed, though. To fix, rewrite the logic so that the decision as to which comparator to use is more cleanly separated from the decision about abbreviation.
* Disable abbreviated keys on Windows.Robert Haas2015-01-20
| | | | | | | | | | Most of the Windows buildfarm members (bowerbird, hamerkop, currawong, jacana, brolga) are unhappy with yesterday's abbreviated keys patch, although there are some (narwhal, frogmouth) that seem OK with it. Since there's no obvious pattern to explain why some are working and others are failing, just disable this across-the-board on Windows for now. This is a bit unfortunate since the optimization will be a big win in some cases, but we can't leave the buildfarm broken.
* Use abbreviated keys for faster sorting of text datums.Robert Haas2015-01-19
| | | | | | | | | | | | | | | | | | | | | This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficent to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Move the guts of our Levenshtein implementation into core.Robert Haas2014-11-13
| | | | | | | | The hope is that we can use this to produce better diagnostics in some cases. Peter Geoghegan, reviewed by Michael Paquier, with some further changes by me.
* Add a fast pre-check for equality of equal-length strings.Robert Haas2014-09-19
| | | | | | | | | Testing reveals that that doing a memcmp() before the strcoll() costs practically nothing, at least on the systems we tested, and it speeds up sorts containing many equal strings significatly. Peter Geoghegan. Review by myself and Heikki Linnakangas. Comments rewritten by me.
* Fix typo in b34e37bfefbed1bf9396dde18f308d8b96fd176c.Robert Haas2014-08-26
| | | | Spotted by Peter Geoghegan.
* Add sortsupport routines for text.Robert Haas2014-08-14
| | | | | | | This provides a small but worthwhile speedup when sorting text, at least in cases to which the sortsupport machinery applies. Robert Haas and Peter Geoghegan
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Code review for commit d26888bc4d1e539a82f21382b0000fe5bbf889d9.Tom Lane2014-04-03
| | | | | Mostly, copy-edit the comments; but also fix it to not reject domains over arrays.