aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/regexp.c
Commit message (Collapse)AuthorAge
...
* Fix similar_escape() to convert parentheses to non-capturing style.Tom Lane2010-01-02
| | | | | | | | | | | This is needed to avoid unwanted interference with SUBSTRING behavior, as per bug #5257 from Roman Kononov. Also, add some basic intelligence about character classes (bracket expressions) since we now have several behaviors that aren't appropriate inside a character class. As with the previous patch in this area, I'm reluctant to back-patch since it might affect applications that are relying on the prior behavior.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Remove regex_flavor GUC, so that regular expressions are always "advanced"Tom Lane2009-10-21
| | | | | | | | | style by default. Per discussion, there seems to be hardly anything that really relies on being able to change the regex flavor, so the ability to select it via embedded options ought to be enough for any stragglers. Also, if we didn't remove the GUC, we'd really be morally obligated to mark the regex functions non-immutable, which'd possibly create performance issues.
* Improve similar_escape() in two different ways:Tom Lane2009-10-10
| | | | | | | | | | | | | | | | | | | * Stop escaping ? and {. As of SQL:2008, SIMILAR TO is defined to have POSIX-compatible interpretation of ? as well as {m,n} and related constructs, so we should allow these things through to our regex engine. * Escape ^ and $. It appears that our regex engine will treat ^^ at the beginning of the string the same as ^, and similarly for $$ at the end of the string, which meant that SIMILAR TO was effectively ignoring ^ at the start of the pattern and $ at the end. Since these are not supposed to be metacharacters, this is a bug. The second part of this is arguably a back-patchable bug fix, but I'm hesitant to do that because it might break applications that are expecting something like "col SIMILAR TO '^foo$'" to work like a POSIX pattern. Seems safer to only change it at a major version boundary. Per discussion of an example from Doug Gorley.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Convert three more guc settings to enum type:Magnus Hagander2008-04-02
| | | | default_transaction_isolation, session_replication_role and regex_flavor.
* Fix regexp substring matching (substring(string from pattern)) for the cornerTom Lane2008-03-19
| | | | | | | | | | | | case where there is a match to the pattern overall but the user has specified a parenthesized subexpression and that subexpression hasn't got a match. An example is substring('foo' from 'foo(bar)?'). This should return NULL, since (bar) isn't matched, but it was mistakenly returning the whole-pattern match instead (ie, 'foo'). Per bug #4044 from Rui Martins. This has been broken since the beginning; patch in all supported versions. The old behavior was sufficiently inconsistent that it's impossible to believe anyone is depending on it.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Doh --- what's really happening on buildfarm member grebe is that itsTom Lane2007-09-22
| | | | malloc returns NULL for malloc(0). Defend against that case.
* Fix regex, LIKE, and some other second-rank text-manipulation functionsTom Lane2007-09-21
| | | | | | to not cause needless copying of text datums that have 1-byte headers. Greg Stark, in response to performance gripe from Guillaume Smet and ITAGAKI Takahiro.
* Avoid memory leakage across successive calls of regexp_matches() orTom Lane2007-08-11
| | | | | | | regexp_split_to_table() within a single query. This is only a partial solution, as it turns out that with enough matches per string these functions can also tickle a repalloc() misbehavior. But fixing that is a topic for a separate patch.
* Code review for regexp_matches/regexp_split patch. Refactor to avoid assumingTom Lane2007-08-11
| | | | | | | | | that cached compiled patterns will still be there when the function is next called. Clean up looping logic, thereby fixing bug identified by Pavel Stehule. Share setup code between the two functions, add some comments, and avoid risky mixing of int and size_t variables. Clean up the documentation a tad, and accept all the flag characters mentioned in table 9-19 rather than just a subset.
* Code cleanup for the new regexp UDFs: we can hardcode the OID and someNeil Conway2007-03-28
| | | | | properties of the "text" type, and then simplify the code accordingly. Patch from Jeremy Drake.
* Add three new regexp functions: regexp_matches, regexp_split_to_array,Neil Conway2007-03-20
| | | | | | | | | | | | and regexp_split_to_table. These functions provide access to the capture groups resulting from a POSIX regular expression match, and provide the ability to split a string on a POSIX regular expression, respectively. Patch from Jeremy Drake; code review by Neil Conway, additional comments and suggestions from Tom and Peter E. This patch bumps the catversion, adds some regression tests, and updates the docs.
* Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).Tom Lane2007-02-27
| | | | | | | | | | | Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* Fix regex_fixed_prefix() to cope reasonably well with regex patterns of theTom Lane2007-01-03
| | | | | | | | | | form '^(foo)$'. Before, these could never be optimized into indexscans. The recent changes to make psql and pg_dump generate such patterns (for \d commands and -t and related switches, respectively) therefore represented a big performance hit for people with large pg_class catalogs, as seen in recent gripe from Erik Jones. While at it, be more paranoid about case-sensitivity checking in multibyte encodings, and fix some other corner cases in which a regex might be interpreted too liberally.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Remove 576 references of include files that were not needed.Bruce Momjian2006-07-14
|
* Alphabetically order reference to include files, "N" - "S".Bruce Momjian2006-07-11
|
* Fix similar_escape() so that SIMILAR TO works properly for patterns involvingTom Lane2006-04-13
| | | | | | | | | | | | | | alternatives ("|" symbol). The original coding allowed the added ^ and $ constraints to be absorbed into the first and last alternatives, producing a pattern that would match more than it should. Per report from Eric Noriega. I also changed the pattern to add an ARE director ("***:"), ensuring that SIMILAR TO patterns do not change behavior if regex_flavor is changed. This is necessary to make the non-capturing parentheses work, and seems like a good idea on general principles. Back-patched as far as 7.4. 7.3 also has the bug, but a fix seems impractical because that version's regex engine doesn't have non-capturing parens.
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-05
|
* Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian2005-11-22
| | | | | | | | | comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
* Code review for regexp_replace patch. Improve documentation and comments,Tom Lane2005-10-18
| | | | | | | fix problems with replacement-string backslashes that aren't followed by one of the expected characters, avoid giving the impression that replace_text_regexp() is meant to be called directly as a SQL function, etc.
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Suppress signed-vs-unsigned-char warnings.Tom Lane2005-09-24
|
* I made the patch that implements regexp_replace again.Bruce Momjian2005-07-10
| | | | | | | | | | | | | | | | | | | | | | | The specification of this function is as follows. regexp_replace(source text, pattern text, replacement text, [flags text]) returns text Replace string that matches to regular expression in source text to replacement text. - pattern is regular expression pattern. - replacement is replace string that can use '\1'-'\9', and '\&'. '\1'-'\9': back reference to the n'th subexpression. '\&' : entire matched string. - flags can use the following values: g: global (replace all) i: ignore case When the flags is not specified, case sensitive, replace the first instance only. Atsushi Ogawa
* Tag appropriate files for rc3PostgreSQL Daemon2004-12-31
| | | | | | | | Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
* Our interface code for Spencer's regexp package was checking for regexpTom Lane2004-11-24
| | | | | | | error conditions during regexp compile, but not during regexp execution; any sort of "can't happen" errors would be treated as no-match instead of being reported as they should be. Noticed while trying to duplicate a reported Tcl bug.
* Update copyright to 2004.Bruce Momjian2004-08-29
|
* Solve the 'Turkish problem' with undesirable locale behavior for caseTom Lane2004-05-07
| | | | | | | | | | | | | conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.
* pwdTom Lane2004-02-03
|
* Repair problem identified by Olivier Prenant: ALTER DATABASE SET search_pathTom Lane2004-01-19
| | | | | | | | | | | should not be too eager to reject paths involving unknown schemas, since it can't really tell whether the schemas exist in the target database. (Also, when reading pg_dumpall output, it could be that the schemas don't exist yet, but eventually will.) ALTER USER SET has a similar issue. So, reduce the normal ERROR to a NOTICE when checking search_path values for these commands. Supporting this requires changing the API for GUC assign_hook functions, which causes the patch to touch a lot of places, but the changes are conceptually trivial.
* $Header: -> $PostgreSQL Changes ...PostgreSQL Daemon2003-11-29
|
* Another pgindent run with updated typedefs.Bruce Momjian2003-08-08
|
* Update copyrights to 2003.Bruce Momjian2003-08-04
|
* pgindent run.Bruce Momjian2003-08-04
|
* Error message editing in utils/adt. Again thanks to Joe Conway for doingTom Lane2003-07-27
| | | | the bulk of the heavy lifting ...
* Create a GUC variable REGEX_FLAVOR to control the type of regularTom Lane2003-02-06
| | | | | | | | expression accepted by the regex operators, per discussion yesterday. Along the way, reduce deadlock_timeout from PGC_POSTMASTER to PGC_SIGHUP category. It is probably best to insist that all backends share the same setting, but that doesn't mean it has to be frozen at startup.
* Replace regular expression package with Henry Spencer's latest versionTom Lane2003-02-05
| | | | | | | (extracted from Tcl 8.4.1 release, as Henry still hasn't got round to making it a separate library). This solves a performance problem for multibyte, as well as upgrading our regexp support to match recent Tcl and nearly match recent Perl.
* Bring SIMILAR TO and SUBSTRING into some semblance of conformance withTom Lane2002-09-22
| | | | | | | the SQL99 standard. (I'm not sure that the character-class features are quite right, but that can be fixed later.) Document SQL99 and POSIX regexps as being different features; provide variants of SUBSTRING for each.
* pgindent run.Bruce Momjian2002-09-04
|
* Update copyright to 2002.Bruce Momjian2002-06-20
|
* Search the existing regular expression cache as a ring buffer.Thomas G. Lockhart2002-06-15
| | | | | | | | | Will optimize the case for repeated calls for the same expression, which seems to be the most common case. Formerly, always searched from the first entry. May want to look at the least-recently-used algorithm to make sure it is identifying the right slots to reclaim. Seems silly to do math when it seems that we could simply use an incrementing counter...
* Implement SQL99 OVERLAY(). Allows substitution of a substring in a string.Thomas G. Lockhart2002-06-11
| | | | | | | | | | | Implement SQL99 SIMILAR TO as a synonym for our existing operator "~". Implement SQL99 regular expression SUBSTRING(string FROM pat FOR escape). Extend the definition to make the FOR clause optional. Define textregexsubstr() to actually implement this feature. Update the regression test to include these new string features. All tests pass. Rename the regular expression support routines from "pg95_xxx" to "pg_xxx". Define CREATE CHARACTER SET in the parser per SQL99. No implementation yet.
* New pgindent run with fixes suggested by Tom. Patch manually reviewed,Bruce Momjian2001-11-05
| | | | initdb/regression tests pass.
* pgindent run on all C files. Java run to follow. initdb/regressionBruce Momjian2001-10-25
| | | | tests pass.