aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/varlena.c
Commit message (Collapse)AuthorAge
...
* Simplify and standardize conversions between TEXT datums and ordinary CTom Lane2008-03-25
| | | | | | | | | | | | | | | | | | | | strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
* Fix varstr_cmp's special case for UTF8 encoding on Windows so that stringsTom Lane2008-03-13
| | | | | | | | | | | | that are reported as "equal" by wcscoll() are checked to see if they really are bitwise equal, and are sorted per strcmp() if not. We made this happen a couple of years ago in the regular code path, but it unaccountably got left out of the Windows/UTF8 case (probably brain fade on my part at the time). As in the prior set of changes, affected users may need to reindex indexes on textual columns. Backpatch as far as 8.2, which is the oldest release we are still supporting on Windows.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Although I'd misdiagnosed the reason for the recent failures onTom Lane2007-09-22
| | | | | | buildfarm member grebe, I see no reason to revert the 1-byte-header-friendly changes I made in varlena.c. Instead, tweak the code a little bit to get more advantage out of that.
* Fix varlena.c routines to allow 1-byte-header text values. This is nowTom Lane2007-09-22
| | | | | | | demonstrably necessary for text_substring() since regexp_split functions may pass it such a value; and we might as well convert the whole file at once. Per buildfarm results (though I wonder why most machines aren't showing a failure).
* Make replace(), split_part(), and string_to_array() behave somewhat sanelyTom Lane2007-07-19
| | | | | | | | | when handed an invalidly-encoded pattern. The previous coding could get into an infinite loop if pg_mb2wchar_with_len() returned a zero-length string after we'd tested for nonempty pattern; which is exactly what it will do if the string consists only of an incomplete multibyte character. This led to either an out-of-memory error or a backend crash depending on platform. Per report from Wiktor Wodecki.
* Support varlena fields with single-byte headers and unaligned storage.Tom Lane2007-04-06
| | | | | | | | | This commit breaks any code that assumes that the mere act of forming a tuple (without writing it to disk) does not "toast" any fields. While all available regression tests pass, I'm not totally sure that we've fixed every nook and cranny, especially in contrib. Greg Stark with some help from Tom Lane
* Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).Tom Lane2007-02-27
| | | | | | | | | | | Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* Fix performance issues in replace_text(), replace_text_regexp(), andTom Lane2006-11-08
| | | | | | | | | | text_to_array(): they all had O(N^2) behavior on long input strings in multibyte encodings, because of repeated rescanning of the input text to identify substrings whose positions/lengths were computed in characters instead of bytes. Fix by tracking the current source position as a char pointer as well as a character-count. Also avoid some unnecessary palloc operations. text_to_array() also leaked memory intracall due to failure to pfree temporary strings. Per gripe from Tatsuo Ishii.
* Fix string_to_array() to correctly handle the case where there areTom Lane2006-10-07
| | | | | | | | | | | overlapping possible matches for the separator string, such as string_to_array('123xx456xxx789', 'xx'). Also, revise the logic of replace(), split_part(), and string_to_array() to avoid O(N^2) work from redundant searches and conversions to pg_wchar format when there are N matches to the separator string. Backpatched the full patch as far as 8.0. 7.4 also has the bug, but the code has diverged a lot, so I just went for a quick-and-dirty fix of the bug itself in that branch.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Remove 576 references of include files that were not needed.Bruce Momjian2006-07-14
|
* Allow include files to compile own their own.Bruce Momjian2006-07-13
| | | | | | | Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.
* Split definitions for md5.c out of crypt.h and into their own headerTom Lane2006-06-20
| | | | | | | | | libpq/md5.h, so that there's a clear separation between backend-only definitions and shared frontend/backend definitions. (Turns out this is reversing a bad decision from some years ago...) Fix up references to crypt.h as needed. I looked into moving the code into src/port, but the headers in src/include/libpq are sufficiently intertwined that it seems more work than it's worth to do that.
* Change the backend to reject strings containing invalidly-encoded multibyteTom Lane2006-05-21
| | | | | | | | | | | | | | | | | | | | characters in all cases. Formerly we mostly just threw warnings for invalid input, and failed to detect it at all if no encoding conversion was required. The tighter check is needed to defend against SQL-injection attacks as per CVE-2006-2313 (further details will be published after release). Embedded zero (null) bytes will be rejected as well. The checks are applied during input to the backend (receipt from client or COPY IN), so it no longer seems necessary to check in textin() and related routines; any string arriving at those functions will already have been validated. Conversion failure reporting (for characters with no equivalent in the destination encoding) has been cleaned up and made consistent while at it. Also, fix a few longstanding errors in little-used encoding conversion routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic, mic_to_euc_tw were all broken to varying extents. Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki for identifying the security issues.
* Modify all callers of datatype input and receive functions so that if theseTom Lane2006-04-04
| | | | | | | | | | | | | | | functions are not strict, they will be called (passing a NULL first parameter) during any attempt to input a NULL value of their datatype. Currently, all our input functions are strict and so this commit does not change any behavior. However, this will make it possible to build domain input functions that centralize checking of domain constraints, thereby closing numerous holes in our domain support, as per previous discussion. While at it, I took the opportunity to introduce convenience functions InputFunctionCall, OutputFunctionCall, etc to use in code that calls I/O functions. This eliminates a lot of grotty-looking casts, but the main motivation is to make it easier to grep for these places if we ever need to touch them again.
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-05
|
* Attached is a patch that replaces a bunch of places where StringInfosNeil Conway2006-03-01
| | | | | | | | | | | | | are unnecessarily allocated on the heap rather than the stack. If the StringInfo doesn't outlive the stack frame in which it is created, there is no need to allocate it on the heap via makeStringInfo() -- stack allocation is faster. While it's not a big deal unless the code is in a critical path, I don't see a reason not to save a few cycles -- using stack allocation is not less readable. I also cleaned up a bit of code along the way: moved variable declarations into a more tightly-enclosing scope where possible, fixed some pointless copying of strings in dblink, etc.
* Fix typo in comment.Neil Conway2006-02-26
|
* Adjust string comparison so that only bitwise-equal strings are consideredTom Lane2005-12-22
| | | | | | | | | | | | equal: if strcoll claims two strings are equal, check it with strcmp, and sort according to strcmp if not identical. This fixes inconsistent behavior under glibc's hu_HU locale, and probably under some other locales as well. Also, take advantage of the now-well-defined behavior to speed up texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise comparison and not bother with strcoll at all. NOTE: affected databases may need to REINDEX indexes on text columns to be sure they are self-consistent.
* Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian2005-11-22
| | | | | | | | | comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
* Mop-up for nulls-in-arrays patch: fix some places that access arrayTom Lane2005-11-18
| | | | contents directly.
* Message correctionsPeter Eisentraut2005-10-29
|
* Code review for regexp_replace patch. Improve documentation and comments,Tom Lane2005-10-18
| | | | | | | fix problems with replacement-string backslashes that aren't followed by one of the expected characters, avoid giving the impression that replace_text_regexp() is meant to be called directly as a SQL function, etc.
* Clean up libpq's pollution of application namespace by renaming theTom Lane2005-10-17
| | | | | | exported routines of ip.c, md5.c, and fe-auth.c to begin with 'pg_'. Also get rid of the vestigial fe_setauthsvc/fe_getauthsvc routines altogether.
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Suppress signed-vs-unsigned-char warnings.Tom Lane2005-09-24
|
* Update two comments to refer to use the new list API names.Neil Conway2005-09-16
|
* The idea of using _strncoll() on Windows doesn't work. Revert to sameTom Lane2005-08-26
| | | | code as we use on other platforms when encoding is not UTF8.
* Add small hack to support use of Unicode-based locales on WIN32. ThisTom Lane2005-08-24
| | | | | is not adequately tested yet, but let's get it into beta1 so it can be tested. Magnus Hagander and Tom Lane.
* Code and docs review for pg_column_size() patch.Tom Lane2005-08-02
|
* Thank you for applying patch --- regexp_replace.Bruce Momjian2005-07-29
| | | | | | | | | | An attached patch is a small additional improvement. This patch use appendStringInfoText instead of appendStringInfoString. There is an overhead of PG_TEXT_GET_STR when appendStringInfoString is executed by text type. This can be reduced by appendStringInfoText. Atsushi Ogawa
* Remove unnecessary parentheses in assignments.Bruce Momjian2005-07-21
| | | | | Add spaces where needed. Reference time interval variables as tinterval.
* Change typreceive function API so that receive functions get the sameTom Lane2005-07-10
| | | | | | | optional arguments as text input functions, ie, typioparam OID and atttypmod. Make all the datatypes that use typmod enforce it the same way in typreceive as they do in typinput. This fixes a problem with failure to enforce length restrictions during COPY FROM BINARY.
* I made the patch that implements regexp_replace again.Bruce Momjian2005-07-10
| | | | | | | | | | | | | | | | | | | | | | | The specification of this function is as follows. regexp_replace(source text, pattern text, replacement text, [flags text]) returns text Replace string that matches to regular expression in source text to replacement text. - pattern is regular expression pattern. - replacement is replace string that can use '\1'-'\9', and '\&'. '\1'-'\9': back reference to the n'th subexpression. '\&' : entire matched string. - flags can use the following values: g: global (replace all) i: ignore case When the flags is not specified, case sensitive, replace the first instance only. Atsushi Ogawa
* pg_column_size() cleanup for messages and code cleanup.Bruce Momjian2005-07-07
| | | | Mark Kirkwood
* Add pg_column_size() to return storage size of a column, includingBruce Momjian2005-07-06
| | | | | | possible compression. Mark Kirkwood
* I made the patch that improved the performance of replace_text().Bruce Momjian2005-07-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The content of the patch is as follows: (1)Create shortcut when subtext was not found. (2)Stop using LEFT and RIGHT macro. In LEFT and RIGHT macro, TEXTPOS is executed by the same content as execution immediately before. The execution frequency of TEXTPOS can be reduced by using text_substring instead of LEFT and RIGHT macro. (3)Add appendStringInfoText, and use it instead of appendStringInfoString. There is an overhead of PG_TEXT_GET_STR when appendStringInfoString is executed by text type. This can be reduced by appendStringInfoText. (4)Reduce execution of TEXTDUP. The effect of the patch that I measured is as follows: - The Data for test was created by 'pgbench -i'. - Test SQL: select replace(aid, '9', 'A') from accounts; - Test results: Linux(CPU: Pentium III, Compiler option: -O2) original: 1.515s patched: 1.250s Atsushi Ogawa
* Change the UNKNOWN type to have an internal representation matchingTom Lane2005-05-30
| | | | | cstring, rather than text, so as to eliminate useless conversions inside the parser. Per recent discussion.
* Remove second argument from textToQualifiedNameList(), as it is no longerNeil Conway2005-05-27
| | | | used. From Jaime Casanova.
* Implement md5(bytea), update regression tests and documentation. PatchNeil Conway2005-05-20
| | | | | | | | from Abhijit Menon-Sen, minor editorialization from Neil Conway. Also, improve md5(text) to allocate a constant-sized buffer on the stack rather than via palloc. Catalog version bumped.
* Change CREATE TYPE to require datatype output and send functions to haveTom Lane2005-05-01
| | | | | | | only one argument. (Per recent discussion, the option to accept multiple arguments is pretty useless for user-defined types, and would be a likely source of security holes if it was used.) Simplify call sites of output/send functions to not bother passing more than one argument.
* This patch optimizes the md5_text() function (which is used toNeil Conway2005-02-23
| | | | | | | | | | | | | | | | | | | | | | | | implement the md5() SQL-level function). The old code did the following: 1. de-toast the datum 2. convert it to a cstring via textout() 3. get the length of the cstring via strlen() Since we are treating the datum context as a blob of binary data, the latter two steps are unnecessary. Once the data has been detoasted, we can just use it as-is, and derive its length from the varlena metadata. This patch improves some run-of-the-mill md5() computations by just under 10% in my limited tests, and passes the regression tests. I also noticed that md5_text() wasn't checking the return value of md5_hash(); encountering OOM at precisely the right moment could result in returning a random md5 hash. This patch corrects that. A better fix would be to make md5_hash() only return on success (and/or allocate via palloc()), but since it's used in the frontend as well I don't see an easy way to do that.
* Tag appropriate files for rc3PostgreSQL Daemon2004-12-31
| | | | | | | | Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
* Pgindent run for 8.0.Bruce Momjian2004-08-29
|
* Update copyright to 2004.Bruce Momjian2004-08-29
|
* Infrastructure for I/O of composite types: arrange for the I/O routinesTom Lane2004-06-06
| | | | | | | | | | of a composite type to get that type's OID as their second parameter, in place of typelem which is useless. The actual changes are mostly centralized in getTypeInputInfo and siblings, but I had to fix a few places that were fetching pg_type.typelem for themselves instead of using the lsyscache.c routines. Also, I renamed all the related variables from 'typelem' to 'typioparam' to discourage people from assuming that they necessarily contain array element types.