aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/varlena.c
Commit message (Collapse)AuthorAge
* Use PRI?64 instead of "ll?" in format strings (continued).Peter Eisentraut2025-03-29
| | | | | | | Continuation of work started in commit 15a79c73, after initial trial. Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
* Add reverse(bytea).Nathan Bossart2025-03-13
| | | | | | | | | | This commit introduces a function for reversing the order of the bytes in binary strings. Bumps catversion. Author: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://postgr.es/m/CAJ7c6TMe0QVRuNssUArbMi0bJJK32%2BzNA3at5m3osrBQ25MHuw%40mail.gmail.com
* Allow casting between bytea and integer types.Dean Rasheed2025-03-07
| | | | | | | | | | | | | | | | | This allows smallint, integer, and bigint values to be cast to and from bytea. The bytea value is the two's complement representation of the integer, with the most significant byte first. For example: 1234::bytea -> \x000004d2 (-1234)::bytea -> \xfffffb2e Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com
* Support text position search functions with nondeterministic collationsPeter Eisentraut2025-02-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows using text position search functions with nondeterministic collations. These functions are - position, strpos - replace - split_part - string_to_array - string_to_table which all use common internal infrastructure. There was previously no internal implementation of this, so it was met with a not-supported error. This adds the internal implementation and removes the error. Unlike with deterministic collations, the search cannot use any byte-by-byte optimized techniques but has to go substring by substring. We also need to consider that the found match could have a different length than the needle and that there could be substrings of different length matching at a position. In most cases, we need to find the longest such substring (greedy semantics), but this can be configured by each caller. Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Add min and max aggregates for bytea type.Tom Lane2024-10-08
| | | | | | | | | Similar to a0f1fce80, although we chose to duplicate logic rather than invoke byteacmp, primarily to avoid repeat detoasting. Marat Buharov, Aleksander Alekseev Discussion: https://postgr.es/m/CAPCEVGXiASjodos4P8pgyV7ixfVn-ZgG9YyiRZRbVqbGmfuDyg@mail.gmail.com
* Simplify checks for deterministic collations.Jeff Davis2024-09-12
| | | | | | | | | | | | Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit e9931bfb75. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
* Remove redundant check for default collation.Jeff Davis2024-09-12
| | | | | | | | | The operative check is for a deterministic collation, so the check for DEFAULT_COLLATION is redundant. Furthermore, it will be wrong if we ever support a non-deterministic default collation. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
* Remove lc_collate_is_c().Jeff Davis2024-09-04
| | | | | | | Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
* Fix obsolete comments in varstr_cmp().Jeff Davis2024-08-21
|
* Fix pgindent damageRobert Haas2024-08-21
| | | | Oversight in commit a95ff1fe2eb4926b13e0940ad1f37d048704bdb0
* Slightly refactor varstr_sortsupport() to improve readability.Jeff Davis2024-08-20
| | | | | Author: Andreas Karlsson Discussion: https://postgr.es/m/69c2a864-846f-4309-bd5a-aaa1c34f9a11@proxel.se
* Remove TRACE_SORT macroPeter Eisentraut2024-08-14
| | | | | | | | | | | | The TRACE_SORT macro guarded the availability of the trace_sort GUC setting. But it has been enabled by default ever since it was introduced in PostgreSQL 8.1, and there have been no reports that someone wanted to disable it. So just remove the macro to simplify things. (For the avoidance of doubt: The trace_sort GUC is still there. This only removes the rarely-used macro guarding it.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/be5f7162-7c1d-44e3-9a78-74dcaa6529f2%40eisentraut.org
* Remove support for null pg_locale_t most places.Jeff Davis2024-08-05
| | | | | | | | | | | | | Previously, passing NULL for pg_locale_t meant "use the libc provider and the server environment". Now that the database collation is represented as a proper pg_locale_t (not dependent on setlocale()), remove special cases for NULL. Leave wchar2char() and char2wchar() unchanged for now, because the callers don't always have a libc-based pg_locale_t available. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson
* Revise GUC names quoting in messages againPeter Eisentraut2024-05-17
| | | | | | | | | | | | | | | After further review, we want to move in the direction of always quoting GUC names in error messages, rather than the previous (PG16) wildly mixed practice or the intermittent (mid-PG17) idea of doing this depending on how possibly confusing the GUC name is. This commit applies appropriate quotes to (almost?) all mentions of GUC names in error messages. It partially supersedes a243569bf65 and 8d9978a7176, which had moved things a bit in the opposite direction but which then were abandoned in a partial state. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
* Fix an assortment of typosDavid Rowley2024-05-04
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
* C comment: mention no doc for negative start of substring(text)Bruce Momjian2024-03-26
| | | | | | Also add URL to hackers discussion. Backpatch-through: master
* Add pg_column_toast_chunk_id().Nathan Bossart2024-03-14
| | | | | | | | | | | | | This function returns the chunk_id of an on-disk TOASTed value. If the value is un-TOASTed or not on-disk, it returns NULL. This is useful for identifying which values are actually TOASTed and for investigating "unexpected chunk number" errors. Bumps catversion. Author: Yugo Nagata Reviewed-by: Jian He Discussion: https://postgr.es/m/20230329105507.d764497456eeac1ca491b5bd%40sraoss.co.jp
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Additional unicode primitive functions.Jeff Davis2023-11-01
| | | | | | | | | | | | Introduce unicode_version(), icu_unicode_version(), and unicode_assigned(). The latter requires introducing a new lookup table for the Unicode General Category, which is generated along with the other Unicode lookup tables. Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com Reviewed-by: Peter Eisentraut
* Optimize various aggregate deserialization functions, take 2David Rowley2023-10-27
| | | | | | | | | | | | | | | | | | | | f0efa5aec added initReadOnlyStringInfo to allow a StringInfo to be initialized from an existing buffer and also relaxed the requirement that a StringInfo's buffer must be NUL terminated at data[len]. Now that we have that, there's no need for these aggregate deserial functions to use appendBinaryStringInfo() as that rather wastefully palloc'd a new buffer and memcpy'd in the bytea's buffer. Instead, we can just use the bytea's buffer and point the StringInfo directly to that using the new initializer function. In Amdahl's law, this speeds up the serial portion of parallel aggregates and makes sum(numeric), avg(numeric), var_pop(numeric), var_samp(numeric), variance(numeric), stddev_pop(numeric), stddev_samp(numeric), stddev(numeric), array_agg(anyarray), string_agg(text) and string_agg(bytea) scale better in parallel queries. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvr%3De-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR%3DcQ%40mail.gmail.com
* Revert "Optimize various aggregate deserialization functions"David Rowley2023-10-10
| | | | | | | | | | This reverts commit 608fd198def5390c3490bfe903730207dfd8eeb4. On 2nd thoughts, the StringInfo API requires that strings are NUL terminated and pointing directly to the data in a bytea Datum isn't NUL terminated. Discussion: https://postgr.es/m/CAApHDvorfO3iBZ=xpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ@mail.gmail.com
* Optimize various aggregate deserialization functionsDavid Rowley2023-10-09
| | | | | | | | | | | | | | | The serialized representation of an internal aggregate state is a bytea value. In each deserial function, in order to "receive" the bytea value we appended it onto a short-lived StringInfoData using appendBinaryStringInfo. This was a little wasteful as it meant having to palloc memory, copy a (possibly long) series of bytes then later pfree that memory. Instead of going to this extra trouble, we can just fake up a StringInfoData and point the data directly at the bytea's payload. This should help increase the performance of internal aggregate deserialization. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAApHDvr=e-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR=cQ@mail.gmail.com
* Add to_bin() and to_oct().Nathan Bossart2023-08-23
| | | | | | | | | | | This commit introduces functions for converting numbers to their equivalent binary and octal representations. Also, the base conversion code for these functions and to_hex() has been moved to a common helper function. Co-authored-by: Eric Radman Reviewed-by: Ian Barwick, Dag Lem, Vignesh C, Tom Lane, Peter Eisentraut, Kirk Wolak, Vik Fearing, John Naylor, Dean Rasheed Discussion: https://postgr.es/m/Y6IyTQQ/TsD5wnsH%40vm3.eradman.com
* Change "..." to cstring in old input/output function comments.Heikki Linnakangas2023-06-26
| | | | | | | It was not clear what the "..." meant. Author: Steve Chavez Discussion: https://www.postgresql.org/message-id/CAGRrpzZzeh7zC3yaVG9di%3DydJ%2BiHwdXnFPB3evGFKvC1zf6ajA@mail.gmail.com
* Pre-beta mechanical code beautification.Tom Lane2023-05-19
| | | | | | | | | | | | | | | Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Redesign interrupt/cancel API for regex engine.Thomas Munro2023-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, a PostgreSQL-specific callback checked by the regex engine had a way to trigger a special error code REG_CANCEL if it detected that the next call to CHECK_FOR_INTERRUPTS() would certainly throw via ereport(). A later proposed bugfix aims to move some complex logic out of signal handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above design impossible unless we split CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to ereport(). We may develop such a system in the future, but for the regex code it is no longer necessary. An earlier commit moved regex memory management over to our MemoryContext system. Given that the purpose of the two-phase interrupt checking was to free memory before throwing, something we don't need to worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS() directly into cancelation points, and just let it throw. Since the plan is to keep PostgreSQL-specific concerns separate from the main regex engine code (with a view to bein able to stay in sync with other projects), do this with a new macro INTERRUPT(), customizable in regcustom.h and defaulting to nothing. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
* Fix abbreviated keys bug introduced in d87d548cd03.Jeff Davis2023-03-25
| | | | | Discussion: http://postgr.es/m/CAMkU=1z17XJatF-rMCY3Cjqcxer-Kyn57x6h3OSCpJ0LpAp0ig@mail.gmail.com Reported-by: Jeff Janes
* Refactor to introduce pg_locale_deterministic().Jeff Davis2023-02-23
| | | | | | | | Avoids the need of callers to test for NULL, and also avoids the need to access the pg_locale_t structure directly. Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
* Refactor to add pg_strcoll(), pg_strxfrm(), and variants.Jeff Davis2023-02-23
| | | | | | | | | | | | | Offers a generally better separation of responsibilities for collation code. Also, a step towards multi-lib ICU, which should be based on a clean separation of the routines required for collation providers. Callers with NUL-terminated strings should call pg_strcoll() or pg_strxfrm(); callers with strings and their length should call the variants pg_strncoll() or pg_strnxfrm(). Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
* Remove unused code related to unknown typePeter Eisentraut2023-02-04
| | | | | | | | These are leftovers obsoleted by cfd9be939e9c516243c5b6a49ad1e1a9a38f1052. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/e7887965-9e70-fd01-c2d1-5bc02f9169aa%40enterprisedb.com
* Allow parallel aggregate on string_agg and array_aggDavid Rowley2023-01-23
| | | | | | | | | | | | | | This adds combine, serial and deserial functions for the array_agg() and string_agg() aggregate functions, thus allowing these aggregates to partake in partial aggregations. This allows both parallel aggregation to take place when these aggregates are present and also allows additional partition-wise aggregation plan shapes to include plans that require additional aggregation once the partially aggregated results from the partitions have been combined. Author: David Rowley Reviewed-by: Andres Freund, Tomas Vondra, Stephen Frost, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A@mail.gmail.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Convert a few more datatype input functions to report errors softly.Tom Lane2022-12-14
| | | | | | | Convert the remaining string-category input functions (bpcharin, varcharin, byteain) to the new style. Discussion: https://postgr.es/m/3038346.1671060258@sss.pgh.pa.us
* Rename SetSingleFuncCall() to InitMaterializedSRF()Michael Paquier2022-10-18
| | | | | | | | | | | | | | | | | | Per discussion, the existing routine name able to initialize a SRF function with materialize mode is unpopular, so rename it. Equally, the flags of this function are renamed, as of: - SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC - SRF_SINGLE_BLESS -> MAT_SRF_BLESS The previous function and flags introduced in 9e98583 are kept around for compatibility purposes, so as any extension code already compiled with v15 continues to work as-is. The declarations introduced here for compatibility will be removed from HEAD in a follow-up commit. The new names have been suggested by Andres Freund and Melanie Plageman. Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de Backpatch-through: 15
* Adjust assorted hint messages that list all valid options.Peter Eisentraut2022-09-16
| | | | | | | | | | Instead of listing all valid options, we now try to provide one that looks similar. Since this may be useful elsewhere, this change introduces a new set of functions that can be reused for similar purposes. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b1f9f399-3a1a-b554-283f-4ae7f34608e2@enterprisedb.com
* Split up guc.c for better build speed and ease of maintenance.Tom Lane2022-09-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
* Fix incorrect uses of Datum conversion macrosPeter Eisentraut2022-09-05
| | | | | | | | | | | Since these macros just cast whatever you give them to the designated output type, and many normal uses also cast the output type further, a number of incorrect uses go undiscovered. The fixes in this patch have been discovered by changing these macros to inline functions, which is the subject of a future patch. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
* Preserve memory context of VarStringSortSupport buffers.Tom Lane2022-08-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When enlarging the work buffers of a VarStringSortSupport object, varstrfastcmp_locale was careful to keep them in the ssup_cxt memory context; but varstr_abbrev_convert just used palloc(). The latter creates a hazard that the buffers could be freed out from under the VarStringSortSupport object, resulting in stomping on whatever gets allocated in that memory later. In practice, because we only use this code for ICU collations (cf. 3df9c374e), the problem is confined to use of ICU collations. I believe it may have been unreachable before the introduction of incremental sort, too, as traditional sorting usually just uses one context for the duration of the sort. We could fix this by making the broken stanzas in varstr_abbrev_convert match the non-broken ones in varstrfastcmp_locale. However, it seems like a better idea to dodge the issue altogether by replacing the pfree-and-allocate-anew coding with repalloc, which automatically preserves the chunk's memory context. This fix does add a few cycles because repalloc will copy the chunk's content, which the existing coding assumes is useless. However, we don't expect that these buffer enlargement operations are performance-critical. Besides that, it's far from obvious that copying the buffer contents isn't required, since these stanzas make no effort to mark the buffers invalid by resetting last_returned, cache_blob, etc. That seems to be safe upon examination, but it's fragile and could easily get broken in future, which wouldn't get revealed in testing with short-to-moderate-size strings. Per bug #17584 from James Inform. Whether or not the issue is reachable in the older branches, this code has been broken on its own terms from its introduction, so patch all the way back. Discussion: https://postgr.es/m/17584-95c79b4a7d771f44@postgresql.org
* Pre-beta mechanical code beautification.Tom Lane2022-05-12
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Specialize tuplesort routines for different kinds of abbreviated keysJohn Naylor2022-04-02
| | | | | | | | | | | | | | | | | | | Previously, the specialized tuplesort routine inlined handling for reverse-sort and NULLs-ordering but called the datum comparator via a pointer in the SortSupport struct parameter. Testing has showed that we can get a useful performance gain by specializing datum comparison for the different representations of abbreviated keys -- signed and unsigned 64-bit integers and signed 32-bit integers. Almost all abbreviatable data types will benefit -- the only exception for now is numeric, since the datum comparison is more complex. The performance gain depends on data type and input distribution, but often falls in the range of 10-20% faster. Thomas Munro Reviewed by Peter Geoghegan, review and performance testing by me Discussion: https://www.postgresql.org/message-id/CA%2BhUKGKKYttZZk-JMRQSVak%3DCXSJ5fiwtirFf%3Dn%3DPAbumvn1Ww%40mail.gmail.com
* Create routine able to set single-call SRFs for Materialize modeMichael Paquier2022-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set-returning functions that use the Materialize mode, creating a tuplestore to include all the tuples returned in a set rather than doing so in multiple calls, use roughly the same set of steps to prepare ReturnSetInfo for this job: - Check if ReturnSetInfo supports returning a tuplestore and if the materialize mode is enabled. - Create a tuplestore for all the tuples part of the returned set in the per-query memory context, stored in ReturnSetInfo->setResult. - Build a tuple descriptor mostly from get_call_result_type(), then stored in ReturnSetInfo->setDesc. Note that there are some cases where the SRF's tuple descriptor has to be the one specified by the function caller. This refactoring is done so as there are (well, should be) no behavior changes in any of the in-core functions refactored, and the centralized function that checks and sets up the function's ReturnSetInfo can be controlled with a set of bits32 options. Two of them prove to be necessary now: - SRF_SINGLE_USE_EXPECTED to use expectedDesc as tuple descriptor, as expected by the function's caller. - SRF_SINGLE_BLESS to validate the tuple descriptor for the SRF. The same initialization pattern is simplified in 28 places per my count as of src/backend/, shaving up to ~900 lines of code. These mostly come from the removal of the per-query initializations and the sanity checks now grouped in a single location. There are more locations that could be simplified in contrib/, that are left for a follow-up cleanup. fcc2817, 07daca5 and d61a361 have prepared the areas of the code related to this change, to ease this refactoring. Author: Melanie Plageman, Michael Paquier Reviewed-by: Álvaro Herrera, Justin Pryzby Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
* Fix inconsistencies in SRF checks of pg_config() and string_to_table()Michael Paquier2022-02-19
| | | | | | | | | | | | | | | | | | | The execution paths of those functions have been using a set of checks inconsistent with any other SRF function: - string_to_table() missed a check on expectedDesc, the tuple descriptor expected by the caller, that should never be NULL. Introduced in 66f1630. - pg_config() should check for a ReturnSetInfo, and expectedDesc cannot be NULL. Its error messages were also inconsistent. Introduced in a5c43b8. Extracted from a larger patch by the same author, in preparation for a larger patch set aimed at refactoring the way tuplestores are created and checked in SRF functions. Author: Melanie Plageman Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com
* Remove all traces of tuplestore_donestoring() in the C codeMichael Paquier2022-02-17
| | | | | | | | | | | | | | | | | | This routine is a no-op since dd04e95 from 2003, with a macro kept around for compatibility purposes. This has led to the same code patterns being copy-pasted around for no effect, sometimes in confusing ways like in pg_logical_slot_get_changes_guts() from logical.c where the code was actually incorrect. This issue has been discussed on two different threads recently, so rather than living with this legacy, remove any uses of this routine in the C code to simplify things. The compatibility macro is kept to avoid breaking any out-of-core modules that depend on it. Reported-by: Tatsuhito Kasahara, Justin Pryzby Author: Tatsuhito Kasahara Discussion: https://postgr.es/m/20211217200419.GQ17618@telsasoft.com Discussion: https://postgr.es/m/CAP0=ZVJeeYfAeRfmzqAF2Lumdiv4S4FewyBnZd4DPTrsSQKJKw@mail.gmail.com
* Improve worst-case performance of text_position_get_match_pos()John Naylor2022-02-04
| | | | | | | | | | | | | | | | | This function converts a byte position to a character position after a successful string match. Rather than calling pg_mblen() in a loop, use pg_mbstrlen_with_len() since the latter can inline its own call to pg_mblen(). When the string match is at the end of the haystack text, this change results in 10-20% performance improvement, depending on platform and typical character length in bytes. This also simplifies the code a little. Specializing for UTF-8 could result in further improvement, but the performance gain was not found to be reliable between platforms. The modest gain in this commit is stable between platforms and usable by all server encodings. Discussion: https://www.postgresql.org/message-id/CAFBsxsH1Yutrmu+6LLHKK8iXY+vG--Do6zN+2900spHXQNNQKQ@mail.gmail.com
* Call pg_newlocale_from_collation() also with default collationPeter Eisentraut2022-01-20
| | | | | | | | | | | | | | | | Previously, callers of pg_newlocale_from_collation() did not call it if the collation was DEFAULT_COLLATION_OID and instead proceeded with a pg_locale_t of 0. Instead, now we call it anyway and have it return 0 if the default collation was passed. It already did this, so we just have to adjust the callers. This simplifies all the call sites and also makes future enhancements easier. After discussion and testing, the previous comment in pg_locale.c about avoiding this for performance reasons may have been mistaken since it was testing a very different patch version way back when. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Revert refactoring of hex code to src/common/Michael Paquier2021-08-19
| | | | | | | | | | | | | | | | | | | | | | | This is a combined revert of the following commits: - c3826f8, a refactoring piece that moved the hex decoding code to src/common/. This code was cleaned up by aef8948, as it originally included no overflow checks in the same way as the base64 routines in src/common/ used by SCRAM, making it unsafe for its purpose. - aef8948, a more advanced refactoring of the hex encoding/decoding code to src/common/ that added sanity checks on the result buffer for hex decoding and encoding. As reported by Hans Buschmann, those overflow checks are expensive, and it is possible to see a performance drop in the decoding/encoding of bytea or LOs the longer they are. Simple SQLs working on large bytea values show a clear difference in perf profile. - ccf4e27, a cleanup made possible by aef8948. The reverts of all those commits bring back the performance of hex decoding and encoding back to what it was in ~13. Fow now and post-beta3, this is the simplest option. Reported-by: Hans Buschmann Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net Backpatch-through: 14
* Let regexp_replace() make use of REG_NOSUB when feasible.Tom Lane2021-08-09
| | | | | | | | | | | | | | | | | | If the replacement string doesn't contain \1...\9, then we don't need sub-match locations, so we can use the REG_NOSUB optimization here too. There's already a pre-scan of the replacement string to look for backslashes, so extend that to check for digits, and refactor to allow that to happen before we compile the regexp. While at it, try to speed up the pre-scan by using memchr() instead of a handwritten loop. It's likely that this is lost in the noise compared to the regexp processing proper, but maybe not. In any case, this coding is shorter. Also, add some test cases to improve the poor coverage of appendStringInfoRegexpSubstr(). Discussion: https://postgr.es/m/3534632.1628536485@sss.pgh.pa.us
* Add assorted new regexp_xxx SQL functions.Tom Lane2021-08-03
| | | | | | | | | | | | | | | | This patch adds new functions regexp_count(), regexp_instr(), regexp_like(), and regexp_substr(), and extends regexp_replace() with some new optional arguments. All these functions follow the definitions used in Oracle, although there are small differences in the regexp language due to using our own regexp engine -- most notably, that the default newline-matching behavior is different. Similar functions appear in DB2 and elsewhere, too. Aside from easing portability, these functions are easier to use for certain tasks than our existing regexp_match[es] functions. Gilles Darold, heavily revised by me Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net