author: Tom Lane <tgl@sss.pgh.pa.us> 2021-08-24 16:37:26 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2021-08-24 16:37:26 -0400
commit: 65dc30ced64cd17f3800ff1b73ab1d358e92efd8
tree: b35b6ee8945ac8d43b890ca0ca0d7f26ffe0f19d /src/backend/regex
parent: 1046a69b3087a6417e85cae9b6bc76caa22f913b
Fix regexp misbehavior with capturing parens inside "{0}".

Regexps like "(.){0}...\1" drew an "invalid backreference number". That's not unreasonable on its face, since the capture group will never be matched if it's iterated zero times. However, other engines such as Perl's don't complain about this, nor do we throw an error for related cases such as "(.)|\1", even though that backref can never succeed either. Also, if the zero-iterations case happens at runtime rather than compile time --- say, "(x)*...\1" when there's no "x" to be found --- that's not an error, we just deem the backref to not match.

Making this even less defensible, no error was thrown for nested cases such as "((.)){0}...\2"; and to add insult to injury, those cases could result in assertion failures instead. (It seems that nothing especially bad happened in non-assert builds, though.)

Let's just fix it so that no error is thrown and instead the backref is deemed to never match, so that compile-time detection of no iterations behaves the same as run-time detection.

Per report from Mark Dilger. This appears to be an aboriginal error in Spencer's library, so back-patch to all supported versions.

Pre-v14, it turns out to also be necessary to back-patch one aspect of commits cb76fbd7e/00116dee5, namely to create capture-node subREs with the begin/end states of their subexpressions, not the current lp/rp of the outer parseqatom invocation. Otherwise delsub complains that we're trying to disconnect a state from itself. This is a bit scary but code examination shows that it's safe: in the pre-v14 code, if we want to wrap iteration around the subexpression, the first thing we do is overwrite the atom's begin/end fields with new states. So the bogus values didn't survive long enough to be used for anything, except if no iteration is required, in which case it doesn't matter.

Discussion: https://postgr.es/m/A099E4A8-4377-4C64-A98C-3DEDDC075502@enterprisedb.com
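The rule "a backref to a group that never matched simply fails to match" can be illustrated with a small standalone sketch. This is not code from the regex library: struct toy_capture and toy_backref_matches are hypothetical names invented for illustration, and the real engine tracks captures quite differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capture slot: start == NULL means the group never matched. */
struct toy_capture
{
    const char *start;  /* first captured character, or NULL if unset */
    size_t      len;    /* number of captured characters */
};

/*
 * Returns true if the text at pos repeats the captured substring.
 * An unset group is not an error; the backref just fails to match.
 */
static bool
toy_backref_matches(const struct toy_capture *cap, const char *pos)
{
    if (cap->start == NULL)
        return false;
    return strncmp(pos, cap->start, cap->len) == 0;
}
```

Under this model, a group that was iterated zero times looks the same whether that was known at compile time or discovered at run time: its slot is unset, so any backref to it fails quietly.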
Diffstat (limited to 'src/backend/regex')
-rw-r--r-- src/backend/regex/regcomp.c | 22
1 files changed, 17 insertions, 5 deletions
diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index ae3a7b6a38c..d9840171a33 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -1089,11 +1089,23 @@ parseqatom(struct vars *v,
     /* annoying special case: {0} or {0,0} cancels everything */
     if (m == 0 && n == 0)
     {
-        if (atom != NULL)
-            freesubre(v, atom);
-        if (atomtype == '(')
-            v->subs[subno] = NULL;
-        delsub(v->nfa, lp, rp);
+        /*
+         * If we had capturing subexpression(s) within the atom, we don't want
+         * to destroy them, because it's legal (if useless) to back-ref them
+         * later. Hence, just unlink the atom from lp/rp and then ignore it.
+         */
+        if (atom != NULL && (atom->flags & CAP))
+        {
+            delsub(v->nfa, lp, atom->begin);
+            delsub(v->nfa, atom->end, rp);
+        }
+        else
+        {
+            /* Otherwise, we can clean up any subre infrastructure we made */
+            if (atom != NULL)
+                freesubre(v, atom);
+            delsub(v->nfa, lp, rp);
+        }
         EMPTYARC(lp, rp);
         return top;
     }
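
The decision made by the patched branch can be modeled in isolation. The sketch below is a toy, not the library's code: struct toy_subre, TOY_CAP and toy_zero_iterations are hypothetical stand-ins, the flag value is an assumption, and the NFA surgery done by delsub is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP 0x01    /* stand-in for the real CAP flag bit; value is an assumption */

struct toy_subre
{
    int  flags;     /* TOY_CAP set if the atom contains capturing parens */
    bool freed;     /* set when the toy cleanup discards the node */
};

/*
 * Mirrors the shape of the patched {0} branch: returns true if the atom
 * is kept (only unlinked from lp/rp), false if it is cleaned up.
 */
static bool
toy_zero_iterations(struct toy_subre *atom)
{
    if (atom != NULL && (atom->flags & TOY_CAP))
        return true;            /* keep: a later back-ref may still name it */

    if (atom != NULL)
        atom->freed = true;     /* stands in for freesubre(v, atom) */
    return false;
}
```

With a capturing atom the function returns true and leaves the node alone; with a non-capturing atom, or no atom at all, it takes the cleanup path, which matches the old behavior minus the reset of v->subs[subno].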