aboutsummaryrefslogtreecommitdiff
path: root/src/include/regex/regguts.h
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2015-10-16 15:36:16 -0400
committerTom Lane <tgl@sss.pgh.pa.us>2015-10-16 15:55:59 -0400
commit538b3b8b359fa77cb7e1507113efb788b4e159c9 (patch)
tree3b0fd75b342361e0fbdfcfd8f69682297c446432 /src/include/regex/regguts.h
parent6a7153661d66a00a03ff117c24fa49480b0699c8 (diff)
downloadpostgresql-538b3b8b359fa77cb7e1507113efb788b4e159c9.tar.gz
postgresql-538b3b8b359fa77cb7e1507113efb788b4e159c9.zip
Improve memory-usage accounting in regular-expression compiler.
This code previously counted the number of NFA states it created, and complained if a limit was exceeded, so as to prevent bizarre regex patterns from consuming unreasonable time or memory. That's fine as far as it went, but the code paid no attention to how many arcs linked those states. Since regexes can be contrived that have O(N) states but will need O(N^2) arcs after fixempties() processing, it was still possible to blow out memory, and take a long time doing it too. To fix, modify the bookkeeping to count space used by both states and arcs. I did not bother with including the "color map" in the accounting; it can only grow to a few megabytes, which is not a lot in comparison to what we're allowing for states+arcs (about 150MB on 64-bit machines or half that on 32-bit machines). Looking at some of the larger real-world regexes captured in the Tcl regression test suite suggests that the most that is likely to be needed for regexes found in the wild is under 10MB, so I believe that the current limit has enough headroom to make it okay to keep it as a hard-wired limit. In connection with this, redefine REG_ETOOBIG as meaning "regular expression is too complex"; the previous wording of "nfa has too many states" was already somewhat inapropos because of the error code's use for stack depth overrun, and it was not very user-friendly either. Back-patch to all supported branches.
Diffstat (limited to 'src/include/regex/regguts.h')
-rw-r--r--src/include/regex/regguts.h15
1 files changed, 9 insertions, 6 deletions
diff --git a/src/include/regex/regguts.h b/src/include/regex/regguts.h
index 357891aa254..19fe991c74f 100644
--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -334,9 +334,6 @@ struct nfa
struct colormap *cm; /* the color map */
color bos[2]; /* colors, if any, assigned to BOS and BOL */
color eos[2]; /* colors, if any, assigned to EOS and EOL */
- size_t size; /* Current NFA size; differs from nstates as
- * it also counts the number of states in
- * children of this NFA. */
struct vars *v; /* simplifies compile error reporting */
struct nfa *parent; /* parent NFA, if any */
};
@@ -384,10 +381,16 @@ struct cnfa
#define NULLCNFA(cnfa) ((cnfa).nstates == 0)
/*
- * Used to limit the maximum NFA size to something sane. [Tcl Bug 1810264]
+ * This symbol limits the transient heap space used by the regex compiler,
+ * and thereby also the maximum complexity of NFAs that we'll deal with.
+ * Currently we only count NFA states and arcs against this; the other
+ * transient data is generally not large enough to notice compared to those.
+ * Note that we do not charge anything for the final output data structures
+ * (the compacted NFA and the colormap).
*/
-#ifndef REG_MAX_STATES
-#define REG_MAX_STATES 100000
+#ifndef REG_MAX_COMPILE_SPACE
+#define REG_MAX_COMPILE_SPACE \
+ (100000 * sizeof(struct state) + 100000 * sizeof(struct arcbatch))
#endif
/*