author | Tom Lane <tgl@sss.pgh.pa.us> | 2021-02-20 18:11:56 -0500 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2021-02-20 18:11:56 -0500 |
commit | 08c0d6ad65f7c161add82ae906efb90dbd7f653d | |
tree | cc6376d6fd084c6584b18d818c0816e3b7e190c9 /src/backend/regex/regexport.c | |
parent | 17661188336c8cbb1783808912096932c57893a3 | |
Invent "rainbow" arcs within the regex engine.
Some regular expression constructs, most notably the "." match-anything
metacharacter, produce a sheaf of parallel NFA arcs covering all
possible colors (that is, character equivalence classes). We can make
a noticeable improvement in the space and time needed to process large
regexes by replacing such cases with a single arc bearing the special
color code "RAINBOW". This requires only minor additional complication
in places such as pull() and push().
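
As a rough illustration of the replacement described above, the following sketch collapses a full sheaf of per-color arcs into one arc. It is not the engine's code: `sketch_arc`, `SKETCH_RAINBOW`, `SKETCH_MAX_COLORS`, and `collapse_to_rainbow()` are hypothetical names invented here, and the flat arc array is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; the real NFA structures differ. */
#define SKETCH_RAINBOW    (-2)	/* assumed special code, not a real color index */
#define SKETCH_MAX_COLORS 256	/* arbitrary cap for this sketch */

typedef struct
{
	int			co;				/* color on the arc, or SKETCH_RAINBOW */
	int			to;				/* target state number */
} sketch_arc;

/*
 * If the arcs leading to state "to" carry every color 0..ncolors-1,
 * replace them with a single rainbow arc.  Returns the new arc count;
 * the array is left unchanged when the sheaf is incomplete.
 */
size_t
collapse_to_rainbow(sketch_arc *arcs, size_t n, int to, int ncolors)
{
	bool		seen[SKETCH_MAX_COLORS] = {false};
	int			distinct = 0;
	size_t		i;
	size_t		kept = 0;

	assert(ncolors > 0 && ncolors <= SKETCH_MAX_COLORS);

	/* Count the distinct ordinary colors on arcs to the target state. */
	for (i = 0; i < n; i++)
	{
		int			co = arcs[i].co;

		if (arcs[i].to == to && co >= 0 && co < ncolors && !seen[co])
		{
			seen[co] = true;
			distinct++;
		}
	}
	if (distinct < ncolors)
		return n;				/* not a full sheaf, nothing to do */

	/* Drop the per-color arcs to "to", keeping all other arcs in order. */
	for (i = 0; i < n; i++)
	{
		if (arcs[i].to == to && arcs[i].co >= 0 && arcs[i].co < ncolors)
			continue;
		arcs[kept++] = arcs[i];
	}

	/* At least one arc was dropped, so this slot is within the array. */
	arcs[kept].co = SKETCH_RAINBOW;
	arcs[kept].to = to;
	return kept + 1;
}
```

With three colors, arcs (0,5), (1,5), (2,5), (1,7) shrink to (1,7) plus one rainbow arc to state 5, which is where the space and time savings on large regexes would come from.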
Callers of pg_reg_getoutarcs() must now be prepared for the possibility
of seeing a RAINBOW arc. For the one known user, contrib/pg_trgm,
that's a net benefit since it cuts the number of arcs to be dealt with,
and the handling isn't any different than for other colors that contain
too many characters to be dealt with individually.
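
To make the caller-side consequence concrete, here is a minimal sketch of a consumer that treats a RAINBOW arc like any other color too large to enumerate. It does not reproduce contrib/pg_trgm: `sketch_numchars()` is a stand-in modeled only on the `co <= 0 || co > cm->max` guard shown in the diff, and `SKETCH_RAINBOW`, `SKETCH_MAX_CHARS`, `sketch_arc`, and `count_expandable()` are hypothetical names with assumed values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
#define SKETCH_RAINBOW   (-2)	/* assumed special code, below any real color */
#define SKETCH_MAX_CHARS 4		/* assumed cap on colors worth expanding */

typedef struct
{
	int			co;				/* color on the arc, or SKETCH_RAINBOW */
	int			to;				/* target state number */
} sketch_arc;

/*
 * Stand-in for a member-count lookup: returns -1 for color 0 (WHITE),
 * for negative codes such as SKETCH_RAINBOW, and for out-of-range colors.
 * nchars[] is indexed by color number.
 */
int
sketch_numchars(const int *nchars, int max_color, int co)
{
	if (co <= 0 || co > max_color)
		return -1;
	return nchars[co];
}

/*
 * Count the arcs whose colors are small enough to expand into individual
 * characters.  A rainbow arc yields -1 and so takes the same path as a
 * color with too many members: it is simply not expanded.
 */
size_t
count_expandable(const sketch_arc *arcs, size_t n,
				 const int *nchars, int max_color)
{
	size_t		i;
	size_t		count = 0;

	assert(arcs != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		int			k = sketch_numchars(nchars, max_color, arcs[i].co);

		if (k >= 0 && k <= SKETCH_MAX_CHARS)
			count++;
	}
	return count;
}
```

The point of the sketch is that no rainbow-specific branch is needed: a caller that already copes with a -1 member count handles the new arc type for free.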
This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
Diffstat (limited to 'src/backend/regex/regexport.c')
-rw-r--r-- | src/backend/regex/regexport.c | 7 |
1 files changed, 4 insertions, 3 deletions
diff --git a/src/backend/regex/regexport.c b/src/backend/regex/regexport.c
index d4f940b8c34..a493dbe88c1 100644
--- a/src/backend/regex/regexport.c
+++ b/src/backend/regex/regexport.c
@@ -222,7 +222,8 @@ pg_reg_colorisend(const regex_t *regex, int co)
  * Get number of member chrs of color number "co".
  *
  * Note: we return -1 if the color number is invalid, or if it is a special
- * color (WHITE or a pseudocolor), or if the number of members is uncertain.
+ * color (WHITE, RAINBOW, or a pseudocolor), or if the number of members is
+ * uncertain.
  * Callers should not try to extract the members if -1 is returned.
  */
 int
@@ -233,7 +234,7 @@ pg_reg_getnumcharacters(const regex_t *regex, int co)
 	assert(regex != NULL && regex->re_magic == REMAGIC);
 	cm = &((struct guts *) regex->re_guts)->cmap;
 
-	if (co <= 0 || co > cm->max) /* we reject 0 which is WHITE */
+	if (co <= 0 || co > cm->max) /* <= 0 rejects WHITE and RAINBOW */
 		return -1;
 	if (cm->cd[co].flags & PSEUDO) /* also pseudocolors (BOS etc) */
 		return -1;
@@ -257,7 +258,7 @@ pg_reg_getnumcharacters(const regex_t *regex, int co)
  * whose length chars_len must be at least as long as indicated by
  * pg_reg_getnumcharacters(), else not all chars will be returned.
  *
- * Fetching the members of WHITE or a pseudocolor is not supported.
+ * Fetching the members of WHITE, RAINBOW, or a pseudocolor is not supported.
  *
  * Caution: this is a relatively expensive operation.
  */