aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/encode.c
Commit message (Collapse)AuthorAge
* Speed up hex_encode with bytewise lookupJohn Naylor2025-01-17
| | | | | | | | | | | Previously, hex_encode looked up each nibble of the input separately. We now use a larger lookup table containing the two-byte encoding of every possible input byte, resulting in a 1/3 reduction in encoding time. Reviewed by Tom Lane, Michael Paquier, Nathan Bossart, David Rowley Discussion: https://postgr.es/m/CANWCAZZvXuJMgqMN4u068Yqa19CEjS31tQKZp_qFFFbgYfaXqQ%40mail.gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Fix small overestimation of base64 encoding output length.Tom Lane2023-06-08
| | | | | | | | | | | | | | | | pg_base64_enc_len() and its clones overestimated the output length by up to 2 bytes, as a result of sloppy thinking about where to divide. No callers require a precise estimate, so this has no consequences worse than palloc'ing a byte or two more than necessary. We might as well get it right though. This bug is very ancient, dating to commit 79d78bb26 which added encode.c. (The other instances were presumably copied from there.) Still, it doesn't quite seem worth back-patching. Oleg Tselebrovskiy Discussion: https://postgr.es/m/f94da55286a63022150bc266afdab754@postgrespro.ru
* New header varatt.h split off from postgres.hPeter Eisentraut2023-01-10
| | | | | | | | | This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Convert a few more datatype input functions to report errors softly.Tom Lane2022-12-14
| | | | | | | Convert the remaining string-category input functions (bpcharin, varcharin, byteain) to the new style. Discussion: https://postgr.es/m/3038346.1671060258@sss.pgh.pa.us
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Revert refactoring of hex code to src/common/Michael Paquier2021-08-19
| | | | | | | | | | | | | | | | | | | | | | | This is a combined revert of the following commits: - c3826f8, a refactoring piece that moved the hex decoding code to src/common/. This code was cleaned up by aef8948, as it originally included no overflow checks in the same way as the base64 routines in src/common/ used by SCRAM, making it unsafe for its purpose. - aef8948, a more advanced refactoring of the hex encoding/decoding code to src/common/ that added sanity checks on the result buffer for hex decoding and encoding. As reported by Hans Buschmann, those overflow checks are expensive, and it is possible to see a performance drop in the decoding/encoding of bytea or LOs the longer they are. Simple SQLs working on large bytea values show a clear difference in perf profile. - ccf4e27, a cleanup made possible by aef8948. The reverts of all those commits bring back the performance of hex decoding and encoding back to what it was in ~13. Fow now and post-beta3, this is the simplest option. Reported-by: Hans Buschmann Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net Backpatch-through: 14
* Rework refactoring of hex and encoding routinesMichael Paquier2021-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit addresses some issues with c3826f83 that moved the hex decoding routine to src/common/: - The decoding function lacked overflow checks, so when used for security-related features it was an open door to out-of-bound writes if not carefully used that could remain undetected. Like the base64 routines already in src/common/ used by SCRAM, this routine is reworked to check for overflows by having the size of the destination buffer passed as argument, with overflows checked before doing any writes. - The encoding routine was missing. This is moved to src/common/ and it gains the same overflow checks as the decoding part. On failure, the hex routines of src/common/ issue an error as per the discussion done to make them usable by frontend tools, but not by shared libraries. Note that this is why ECPG is left out of this commit, and it still includes a duplicated logic doing hex encoding and decoding. While on it, this commit uses better variable names for the source and destination buffers in the existing escape and base64 routines in encode.c and it makes them more robust to overflow detection. The previous core code issued a FATAL after doing out-of-bound writes if going through the SQL functions, which would be enough to detect problems when working on changes that impacted this area of the code. Instead, an error is issued before doing an out-of-bound write. The hex routines were being directly called for bytea conversions and backup manifests without such sanity checks. The current calls happen to not have any problems, but careless uses of such APIs could easily lead to CVE-class bugs. Author: Bruce Momjian, Michael Paquier Reviewed-by: Sehrope Sarkuni Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* move hex_decode() to /common so it can be called from frontendBruce Momjian2020-12-24
| | | | | | | This allows removal of a copy of hex_decode() from ecpg, and will be used by the soon-to-be added pg_alterckey command. Backpatch-through: master
* Avoid using %c printf format for potentially non-ASCII characters.Tom Lane2020-06-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since %c only passes a C "char" to printf, it's incapable of dealing with multibyte characters. Passing just the first byte of such a character leads to an output string that is visibly not correctly encoded, resulting in undesirable behavior such as encoding conversion failures while sending error messages to clients. We've lived with this issue for a long time because it was inconvenient to avoid in a portable fashion. However, now that we always use our own snprintf code, it's reasonable to use the %.*s format to print just one possibly-multibyte character in a string. (We previously avoided that obvious-looking answer in order to work around glibc's bug #6530, cf commits 54cd4f045 and ed437e2b2.) Hence, run around and fix a bunch of places that used %c to report a character found in a user-supplied string. For simplicity, I did not touch places that were emitting non-user-facing debug messages, or reporting catalog data that should always be ASCII. (It's also unclear how useful this approach could be in frontend code, where it's less certain that we know what encoding we're dealing with.) In passing, improve a couple of poorly-written error messages in pageinspect/heapfuncs.c. This is a longstanding issue, but I'm hesitant to back-patch because of the impact on translatable message strings. In any case this fix would not work reliably before v12. Tom Lane and Quan Zongliang Discussion: https://postgr.es/m/a120087c-4c88-d9d4-1ec5-808d7a7f133d@gmail.com
* Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering.Tom Lane2020-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. Lift that restriction by redeclaring the bit number arguments as int8 (which requires a catversion bump, hence is not back-patchable). The similarly-named functions for bit/varbit don't really have a problem because we restrict those types to at most VARBITMAXLEN bits; hence leave them alone. While here, extend the encode/decode functions in utils/adt/encode.c to allow dealing with values wider than 1GB. This is not a live bug or restriction in current usage, because no input could be more than 1GB, and since none of the encoders can expand a string more than 4X, the result size couldn't overflow uint32. But it might be desirable to support more in future, so make the input length values size_t and the potential-output-length values uint64. Also add some test cases to improve the miserable code coverage of these functions. Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Rename base64 routines to avoid conflict with Solaris built-in functions.Tom Lane2018-02-28
| | | | | | | | | | | | | | | | Solaris 11.4 has built-in functions named b64_encode and b64_decode. Rename ours to something else to avoid the conflict (fortunately, ours are static so the impact is limited). One could wish for less duplication of code in this area, but that would be a larger patch and not very suitable for back-patching. Since this is a portability fix, we want to put it into all supported branches. Report and initial patch by Rainer Orth, reviewed and adjusted a bit by Michael Paquier Discussion: https://postgr.es/m/ydd372wk28h.fsf@CeBiTec.Uni-Bielefeld.DE
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch2017-03-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
* Make messages mentioning type names more uniformAlvaro Herrera2017-01-18
| | | | | | | | | This avoids additional translatable strings for each distinct type, as well as making our quoting style around type names more consistent (namely, that we don't quote type names). This continues what started as f402b9950120. Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Message improvementsPeter Eisentraut2015-11-16
|
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Fix error hint style.Robert Haas2014-07-09
| | | | Mistake caught by Tom Lane.
* Improve error messages for bytea decoding failures.Robert Haas2014-07-09
| | | | Craig Ringer
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Remove duplicate variable initializations identified by clang static checker.Tom Lane2009-08-30
| | | | | | | One of these represents a nontrivial bug (a promptly-leaked palloc), so backpatch. Greg Stark
* Support hex-string input and output for type BYTEA.Tom Lane2009-08-04
| | | | | | | | | | | | | Both hex format and the traditional "escape" format are automatically handled on input. The output format is selected by the new GUC variable bytea_output. As committed, bytea_output defaults to HEX, which is an *incompatible change*. We will keep it this way for awhile for testing purposes, but should consider whether to switch to the more backwards-compatible default of ESCAPE before 8.5 is released. Peter Eisentraut
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Simplify and standardize conversions between TEXT datums and ordinary CTom Lane2008-03-25
| | | | | | | | | | | | | | | | | | | | strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
* Fix encode(...bytea..., 'escape') so that it converts all high-bit-set byteTom Lane2008-02-26
| | | | | | | | | | | | | | | | | values into \nnn octal escape sequences. When the database encoding is multibyte this is *necessary* to avoid generating invalidly encoded text. Even in a single-byte encoding, the old behavior seems very hazardous --- consider for example what happens if the text is transferred to another database with a different encoding. Decoding would then yield some other bytea value than what was encoded, which is surely undesirable. Per gripe from Hernan Gonzalez. Backpatch to 8.3, but not further. This is a bit of a judgment call, but I make it on these grounds: pre-8.3 we don't really have much encoding safety anyway because of the convert() function family, and we would also have much higher risk of breaking existing apps that may not be expecting this behavior. 8.3 is still new enough that we can probably get away with making this change in the function's behavior.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).Tom Lane2007-02-27
| | | | | | | | | | | Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-05
|
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Suppress signed-vs-unsigned-char warnings.Tom Lane2005-09-24
|
* Update copyrights that were missed.Bruce Momjian2005-01-01
|
* Pgindent run for 8.0.Bruce Momjian2004-08-29
|
* Update copyright to 2004.Bruce Momjian2004-08-29
|
* Solve the 'Turkish problem' with undesirable locale behavior for caseTom Lane2004-05-07
| | | | | | | | | | | | | conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.
* $Header: -> $PostgreSQL Changes ...PostgreSQL Daemon2003-11-29
|