| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
strings. This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString. A number of
existing macros that provided variants on these themes were removed.
Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed. There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).
This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.
Brendan Jurd and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
| |
that are reported as "equal" by wcscoll() are checked to see if they really
are bitwise equal, and are sorted per strcmp() if not. We made this happen
a couple of years ago in the regular code path, but it unaccountably got
left out of the Windows/UTF8 case (probably brain fade on my part at the
time). As in the prior set of changes, affected users may need to reindex
indexes on textual columns.
Backpatch as far as 8.2, which is the oldest release we are still supporting
on Windows.
|
| |
|
|
|
|
| |
avoid this problem in the future.)
|
| |
|
|
|
|
|
|
| |
buildfarm member grebe, I see no reason to revert the 1-byte-header-friendly
changes I made in varlena.c. Instead, tweak the code a little bit to
get more advantage out of that.
|
|
|
|
|
|
|
| |
demonstrably necessary for text_substring() since regexp_split functions
may pass it such a value; and we might as well convert the whole file
at once. Per buildfarm results (though I wonder why most machines aren't
showing a failure).
|
|
|
|
|
|
|
|
|
| |
when handed an invalidly-encoded pattern. The previous coding could get
into an infinite loop if pg_mb2wchar_with_len() returned a zero-length
string after we'd tested for nonempty pattern; which is exactly what it
will do if the string consists only of an incomplete multibyte character.
This led to either an out-of-memory error or a backend crash depending
on platform. Per report from Wiktor Wodecki.
|
|
|
|
|
|
|
|
|
| |
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
|
|
|
|
|
|
|
|
|
|
|
| |
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
|
|
|
|
| |
back-stamped for this.
|
|
|
|
|
|
|
|
|
|
| |
text_to_array(): they all had O(N^2) behavior on long input strings in
multibyte encodings, because of repeated rescanning of the input text to
identify substrings whose positions/lengths were computed in characters
instead of bytes. Fix by tracking the current source position as a char
pointer as well as a character-count. Also avoid some unnecessary palloc
operations. text_to_array() also leaked memory intracall due to failure
to pfree temporary strings. Per gripe from Tatsuo Ishii.
|
|
|
|
|
|
|
|
|
|
|
| |
overlapping possible matches for the separator string, such as
string_to_array('123xx456xxx789', 'xx').
Also, revise the logic of replace(), split_part(), and string_to_array()
to avoid O(N^2) work from redundant searches and conversions to pg_wchar
format when there are N matches to the separator string.
Backpatched the full patch as far as 8.0. 7.4 also has the bug, but the
code has diverged a lot, so I just went for a quick-and-dirty fix of the
bug itself in that branch.
|
| |
|
| |
|
|
|
|
|
|
|
| |
Strip unused include files out unused include files, and add needed
includes to C files.
The next step is to remove unused include files in C files.
|
|
|
|
|
|
|
|
|
| |
libpq/md5.h, so that there's a clear separation between backend-only
definitions and shared frontend/backend definitions. (Turns out this
is reversing a bad decision from some years ago...) Fix up references
to crypt.h as needed. I looked into moving the code into src/port, but
the headers in src/include/libpq are sufficiently intertwined that it
seems more work than it's worth to do that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
characters in all cases. Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release). Embedded
zero (null) bytes will be rejected as well. The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated. Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.
Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.
Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
functions are not strict, they will be called (passing a NULL first parameter)
during any attempt to input a NULL value of their datatype. Currently, all
our input functions are strict and so this commit does not change any
behavior. However, this will make it possible to build domain input functions
that centralize checking of domain constraints, thereby closing numerous holes
in our domain support, as per previous discussion.
While at it, I took the opportunity to introduce convenience functions
InputFunctionCall, OutputFunctionCall, etc to use in code that calls I/O
functions. This eliminates a lot of grotty-looking casts, but the main
motivation is to make it easier to grep for these places if we ever need
to touch them again.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
are unnecessarily allocated on the heap rather than the stack. If the
StringInfo doesn't outlive the stack frame in which it is created,
there is no need to allocate it on the heap via makeStringInfo() --
stack allocation is faster. While it's not a big deal unless the
code is in a critical path, I don't see a reason not to save a few
cycles -- using stack allocation is not less readable.
I also cleaned up a bit of code along the way: moved variable
declarations into a more tightly-enclosing scope where possible,
fixed some pointless copying of strings in dblink, etc.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
equal: if strcoll claims two strings are equal, check it with strcmp, and
sort according to strcmp if not identical. This fixes inconsistent
behavior under glibc's hu_HU locale, and probably under some other locales
as well. Also, take advantage of the now-well-defined behavior to speed up
texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise
comparison and not bother with strcoll at all.
NOTE: affected databases may need to REINDEX indexes on text columns to be
sure they are self-consistent.
|
|
|
|
|
|
|
|
|
| |
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
|
|
|
|
| |
contents directly.
|
| |
|
|
|
|
|
|
|
| |
fix problems with replacement-string backslashes that aren't followed by
one of the expected characters, avoid giving the impression that
replace_text_regexp() is meant to be called directly as a SQL function,
etc.
|
|
|
|
|
|
| |
exported routines of ip.c, md5.c, and fe-auth.c to begin with 'pg_'.
Also get rid of the vestigial fe_setauthsvc/fe_getauthsvc routines
altogether.
|
| |
|
| |
|
| |
|
|
|
|
| |
code as we use on other platforms when encoding is not UTF8.
|
|
|
|
|
| |
is not adequately tested yet, but let's get it into beta1 so it can be
tested. Magnus Hagander and Tom Lane.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
An attached patch is a small additional improvement.
This patch use appendStringInfoText instead of appendStringInfoString.
There is an overhead of PG_TEXT_GET_STR when appendStringInfoString is
executed by text type. This can be reduced by appendStringInfoText.
Atsushi Ogawa
|
|
|
|
|
| |
Add spaces where needed.
Reference time interval variables as tinterval.
|
|
|
|
|
|
|
| |
optional arguments as text input functions, ie, typioparam OID and
atttypmod. Make all the datatypes that use typmod enforce it the same
way in typreceive as they do in typinput. This fixes a problem with
failure to enforce length restrictions during COPY FROM BINARY.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The specification of this function is as follows.
regexp_replace(source text, pattern text, replacement text, [flags
text])
returns text
Replace string that matches to regular expression in source text to
replacement text.
- pattern is regular expression pattern.
- replacement is replace string that can use '\1'-'\9', and '\&'.
'\1'-'\9': back reference to the n'th subexpression.
'\&' : entire matched string.
- flags can use the following values:
g: global (replace all)
i: ignore case
When the flags is not specified, case sensitive, replace the first
instance only.
Atsushi Ogawa
|
|
|
|
| |
Mark Kirkwood
|
|
|
|
|
|
| |
possible compression.
Mark Kirkwood
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The content of the patch is as follows:
(1)Create shortcut when subtext was not found.
(2)Stop using LEFT and RIGHT macro.
In LEFT and RIGHT macro, TEXTPOS is executed by the same content as
execution immediately before. The execution frequency of TEXTPOS can be
reduced by using text_substring instead of LEFT and RIGHT macro.
(3)Add appendStringInfoText, and use it instead of
appendStringInfoString.
There is an overhead of PG_TEXT_GET_STR when appendStringInfoString is
executed by text type. This can be reduced by appendStringInfoText.
(4)Reduce execution of TEXTDUP.
The effect of the patch that I measured is as follows:
- The Data for test was created by 'pgbench -i'.
- Test SQL:
select replace(aid, '9', 'A') from accounts;
- Test results: Linux(CPU: Pentium III, Compiler option: -O2)
original: 1.515s
patched: 1.250s
Atsushi Ogawa
|
|
|
|
|
| |
cstring, rather than text, so as to eliminate useless conversions
inside the parser. Per recent discussion.
|
|
|
|
| |
used. From Jaime Casanova.
|
|
|
|
|
|
|
|
| |
from Abhijit Menon-Sen, minor editorialization from Neil Conway. Also,
improve md5(text) to allocate a constant-sized buffer on the stack
rather than via palloc.
Catalog version bumped.
|
|
|
|
|
|
|
| |
only one argument. (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it was used.) Simplify call sites of
output/send functions to not bother passing more than one argument.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
implement the md5() SQL-level function). The old code did the
following:
1. de-toast the datum
2. convert it to a cstring via textout()
3. get the length of the cstring via strlen()
Since we are treating the datum context as a blob of binary data,
the latter two steps are unnecessary. Once the data has been
detoasted, we can just use it as-is, and derive its length from
the varlena metadata.
This patch improves some run-of-the-mill md5() computations by
just under 10% in my limited tests, and passes the regression tests.
I also noticed that md5_text() wasn't checking the return value
of md5_hash(); encountering OOM at precisely the right moment
could result in returning a random md5 hash. This patch corrects
that. A better fix would be to make md5_hash() only return on
success (and/or allocate via palloc()), but since it's used in
the frontend as well I don't see an easy way to do that.
|
|
|
|
|
|
|
|
| |
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
of a composite type to get that type's OID as their second parameter,
in place of typelem which is useless. The actual changes are mostly
centralized in getTypeInputInfo and siblings, but I had to fix a few
places that were fetching pg_type.typelem for themselves instead of
using the lsyscache.c routines. Also, I renamed all the related variables
from 'typelem' to 'typioparam' to discourage people from assuming that
they necessarily contain array element types.
|