| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
load_libraries(), which processes the various xxx_preload_libraries GUCs,
was parsing them using SplitIdentifierString() which isn't really
appropriate for values that could be path names: it downcases unquoted
text, and it doesn't allow embedded whitespace unless quoted.
Use SplitDirectoriesString() instead. That also allows us to simplify
load_libraries() a bit, since canonicalize_path() is now done for it.
While this definitely seems like a bug fix, it has the potential to
break configuration settings that accidentally worked before because
of the downcasing behavior. Also, there's an easy workaround for the
bug, namely to double-quote troublesome text. Hence, no back-patch.
QL Zhuo, tweaked a bit by me
Discussion: https://postgr.es/m/CAB-oJtxHVDc3H+Km3CjB9mY1VDzuyaVH_ZYSz7iXcRqCtb93Ew@mail.gmail.com
|
|
|
|
|
|
|
|
| |
Windows uses a separate code path for libc locales. The code previously
ended up there also if an ICU collation should be used, leading to a
crash.
Reported-by: Ashutosh Sharma <ashu.coek88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces isspace() calls with scanner_isspace() in functions
that are likely to be presented with non-ASCII input. isspace() has
the small advantage that it will correctly recognize no-break space
in single-byte encodings (such as LATIN1); but it cannot work successfully
for any multibyte character, and depending on platform it might return
false positive results for some fragments of multibyte characters. That's
disastrous for functions that are trying to discard whitespace between
valid strings, as noted in bug #14662 from Justin Muise. Even treating
no-break space as whitespace is pretty questionable for the usages touched
here, because the core scanner would think it is an identifier character.
Affected functions are parse_ident(), parseNameAndArgTypes (underlying
regprocedurein() and siblings), SplitIdentifierString (used for parsing
GUCs and options that are qualified names or lists of names), and
SplitDirectoriesString (used for parsing GUCs that are lists of
directories).
All the functions adjusted here are parsing SQL identifiers and similar
constructs, so it's reasonable to insist that their definition of
whitespace match the core scanner. So we can hope that this won't cause
many backwards-compatibility problems. I've left alone isspace() calls
in places that aren't really expecting any non-ASCII input characters,
such as float8in().
Back-patch to all supported branches.
Discussion: https://postgr.es/m/10129.1495302480@sss.pgh.pa.us
|
|
|
|
| |
perltidy run not included.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
create_singleton_array() was not really as useful as we perhaps thought
when we added it. It had never accreted more than one call site, and is
only saving a dozen lines of code at that one, which is considerably less
bulk than the function itself. Moreover, because of its insistence on
using the caller's fn_extra cache space, it's arguably a coding hazard.
text_to_array_internal() does not currently use fn_extra in any other way,
but if it did it would be subtly broken, since the conflicting fn_extra
uses could be needed within a single query, in the seldom-tested case that
the field separator varies during the query. The same objection seems
likely to apply to any other potential caller.
The replacement code is a bit uglier, because it hardwires knowledge of
the storage parameters of type TEXT, but it's not like we haven't got
dozens or hundreds of other places that do the same. Uglier seems like
a good tradeoff for smaller, faster, and safer.
Per discussion with Neha Khatri.
Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
|
|
|
|
| |
Author: David Rowley <david.rowley@2ndquadrant.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a column collprovider to pg_collation that determines which library
provides the collation data. The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.
The pg_locale_t type is changed to a union that contains the
provider-specific locale handles. Users of locale information are
changed to look into that struct for the appropriate handle to use.
Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures. This is currently only supported by
ICU-provided collations.
initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.
Currently, ICU-provided collations can only be explicitly named
collations. The global database locales are still always libc-provided.
ICU support is enabled by configure --with-icu.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
|
|
|
|
| |
neha khatri
|
|
|
|
| |
This avoids that builtins.h has to include additional header files.
|
|
|
|
|
|
|
|
|
| |
This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names). This continues what started
as f402b9950120.
Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We weren't terribly consistent about whether to call Apple's OS "OS X"
or "Mac OS X", and the former is probably confusing to people who aren't
Apple users. Now that Apple has rebranded it "macOS", follow their lead
to establish a consistent naming pattern. Also, avoid the use of the
ancient project name "Darwin", except as the port code name which does not
seem desirable to change. (In short, this patch touches documentation and
comments, but no actual code.)
I didn't touch contrib/start-scripts/osx/, either. I suspect those are
obsolete and due for a rewrite, anyway.
I dithered about whether to apply this edit to old release notes, but
those were responsible for quite a lot of the inconsistencies, so I ended
up changing them too. Anyway, Apple's being ahistorical about this,
so why shouldn't we be?
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.
Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.
Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.
There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.
Patch by Michael Paquier, with some changes by me.
Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
|
|
|
|
| |
07d25a964 changed only one occurrence. Change the other one as well.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard. This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative. Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM.
Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.
Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
|
|
|
|
|
|
| |
Clarify invalid format conversion type error message and add hint.
Author: Jim Nasby
|
|
|
|
| |
Just to make sure previous commit worked ...
|
|
|
|
|
|
|
|
|
|
| |
Since pgindent treats typedef names as global, the original coding of
b47b4dbf683f13e6 would have had rather nasty effects on the formatting
of other files in which "string" is used as a variable or field name.
Use a less generic name for this typedef, and rename some other
identifiers to match.
Peter Geoghegan, per gripe from me
|
|
|
|
| |
Peter Geoghegan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Have varlena.c expose an interface that allows the char(n), bytea, and
bpchar types to piggyback on a now-generalized SortSupport for text.
This pushes a little more knowledge of the bpchar/char(n) type into
varlena.c than might be preferred, but that seems like the approach
that creates least friction. Also speed things up for index builds
that use text_pattern_ops or varchar_pattern_ops.
This patch does quite a bit of renaming, but it seems likely to be
worth it, so as to avoid future confusion about the fact that this code
is now more generally used than the old names might have suggested.
Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson,
with small tweaks by me.
|
|
|
|
| |
Backpatch certain files through 9.1
|
|
|
|
| |
'\"' is more commonly written simply as '"'.
|
|
|
|
| |
Peter Geoghegan
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0e57b4d8bd9674adaf5747421b3255b85e385534 contained some clever
logic that attempted to make sure that we couldn't get confused about
whether the last thing we cached was a strcoll() result or a strxfrm()
result, but it wasn't quite clever enough, because we can perform
further abbreviations after having already performed some comparisons.
Introduce an explicit flag in the hopes of making this watertight.
Peter Geoghegan, reviewed by me.
|
|
|
|
| |
Peter Geoghegan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cache strxfrm() blobs across calls made to the text SortSupport
abbreviation routine. This can speed up sorting if the same string
needs to be abbreviated many times in a row.
Also, cache the result of the previous strcoll() comparison, so that
if we're asked to compare the same strings agin, we do need to call
strcoll() again.
Perhaps surprisingly, these optimizations don't seem to hurt even when
they don't help. memcmp() is really cheap compared to strcoll() or
strxfrm().
Peter Geoghegan, reviewed by me.
|
|
|
|
|
|
|
| |
If we do some byte-swapping while abbreviating, we can do comparisons
using integer arithmetic rather than memcmp.
Peter Geoghegan, reviewed and slightly revised by me.
|
|
|
|
|
|
|
| |
Without this, we might access memory that's already been freed, or
leak memory if in the C locale.
Peter Geoghegan
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.
Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.
Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
|
|
|
|
|
|
|
|
| |
This is consistent with what the new numeric suppor for abbreviated keys
now does, and seems much more convenient than having a separate compiler
define to control this debug output.
Peter Geoghegan
|
|
|
|
|
|
|
|
|
| |
Be more aggressive about aborting early on if it looks like it's not
helping, but be less aggressive about aborting later on, since it's
more expensive at that point, and also since we're currently aborting
in some cases where abbreviation can still deliver a substantial win.
Peter Geoghegan. Extensive testing by Tomas Vondra.
|
|
|
|
|
|
|
| |
These were inadvertently ommitted from the commit that introduced
abbreviated keys, commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23.
Peter Geoghegan
|
|
|
|
|
|
|
|
| |
Commit 1be4eb1b2d436d1375899c74e4c74486890d8777 disabled this, but I
think the real problem here was fixed by commit
b181a91981203f6ec9403115a2917bd3f9473707 and commit
d060e07fa919e0eb681e2fa2cfbe63d6c40eb2cf. So let's try re-enabling
it now and see what happens.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.
A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about. So just use memcpy() in
these cases.
In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).
I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution). There are also a
few such calls in ecpg that could possibly do with more analysis.
AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior. These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
|
|
|
|
| |
Peter Geoghegan
|
|
|
|
|
|
| |
The split between which things need to happen in the C-locale case and
which needed to happen in the locale-aware case was a few bricks short
of a load. Try to fix that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First, when LC_COLLATE = C, bttext_abbrev_convert should use memcpy()
rather than strxfrm() to construct the abbreviated key, because the
authoritative comparator uses memcpy(). If we do anything else here,
we might get inconsistent answers, and the buildfarm says this risk
is not theoretical. It should be faster this way, too.
Second, while I'm looking at bttext_abbrev_convert, convert a needless
use of goto into the loop it's trying to implement into an actual
loop.
Both of the above problems date to the original commit of abbreviated
keys, commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23.
Third, fix a bogus assignment to tss->locale before tss is set up.
That's a new goof in commit b529b65d1bf8537ca7fa024760a9782d7c8b66e5.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23, this function
only had one job, which was to decide whether we could avoid trampolining
through the fmgr layer when performing sort comparisons. As of that
commit, it has a second job, which is to decide whether we can use
abbreviated keys. Unfortunately, those two tasks are somewhat intertwined
in the existing coding, which is likely why neither Peter Geoghegan nor
I noticed prior to commit that this calls pg_newlocale_from_collation() in
cases where it didn't previously. The buildfarm noticed, though.
To fix, rewrite the logic so that the decision as to which comparator to
use is more cleanly separated from the decision about abbreviation.
|
|
|
|
|
|
|
|
|
|
| |
Most of the Windows buildfarm members (bowerbird, hamerkop, currawong,
jacana, brolga) are unhappy with yesterday's abbreviated keys patch,
although there are some (narwhal, frogmouth) that seem OK with it.
Since there's no obvious pattern to explain why some are working and
others are failing, just disable this across-the-board on Windows for
now. This is a bit unfortunate since the optimization will be a big
win in some cases, but we can't leave the buildfarm broken.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob. If the abbreviated comparison is insufficent
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.
There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys. HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.
Peter Geoghegan, reviewed by me.
|
|
|
|
| |
Backpatch certain files through 9.0
|
|
|
|
|
|
|
|
| |
The hope is that we can use this to produce better diagnostics in
some cases.
Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
|
|
|
|
|
|
|
|
|
| |
Testing reveals that that doing a memcmp() before the strcoll() costs
practically nothing, at least on the systems we tested, and it speeds
up sorts containing many equal strings significatly.
Peter Geoghegan. Review by myself and Heikki Linnakangas. Comments
rewritten by me.
|
|
|
|
| |
Spotted by Peter Geoghegan.
|
|
|
|
|
|
|
| |
This provides a small but worthwhile speedup when sorting text, at least
in cases to which the sortsupport machinery applies.
Robert Haas and Peter Geoghegan
|
|
|
|
|
| |
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
|
|
|
|
|
| |
Mostly, copy-edit the comments; but also fix it to not reject domains over
arrays.
|