postgresql - postgresql mirror


author	Tom Lane <tgl@sss.pgh.pa.us>	2009-12-01 21:00:24 +0000
committer	Tom Lane <tgl@sss.pgh.pa.us>	2009-12-01 21:00:24 +0000
commit	0d32342501f2a562bc57156dc92d59a0624be4a6 (patch)
tree	9039a0f5bdc634c1a7dfa99371160e51e1759168 /contrib/btree_gist/btree_utils_var.c
parent	ef51395e24c7452a9a50e3576b52fb64602f8cad (diff)

Teach the regular expression functions to do case-insensitive matching and locale-dependent character classification properly when the database encoding is UTF8.

The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, i.e., wchar matches pg_wchar.

This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations, which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.
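As a rough illustration of the approach described above, the following C sketch shows how a case-folding helper might route a code point to the <wctype.h> functions when the encoding is UTF8 and fall back to the single-byte <ctype.h> functions otherwise. The names sketch_wchar and sketch_wc_tolower and the encoding_is_utf8 flag are hypothetical stand-ins chosen for this example, not identifiers taken from the patch, and the result for non-ASCII input depends on the process locale.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for pg_wchar; assumed to hold a Unicode
 * code point when the encoding is UTF8. */
typedef unsigned int sketch_wchar;

/* Hypothetical helper: lower-case one character.
 * encoding_is_utf8 is an assumed flag standing in for a check of the
 * database encoding. */
static sketch_wchar
sketch_wc_tolower(sketch_wchar c, int encoding_is_utf8)
{
    if (encoding_is_utf8)
    {
        /* Assumption from the commit message: wchar_t values are
         * Unicode code points. Only pass values that fit in wchar_t. */
        if (sizeof(wchar_t) >= 4 || c <= 0xFFFF)
            return (sketch_wchar) towlower((wint_t) c);
    }

    /* Single-byte range: the <ctype.h> function is safe to use. */
    if (c <= UCHAR_MAX)
        return (sketch_wchar) tolower((unsigned char) c);

    /* Multibyte character in a non-UTF8 encoding: leave unchanged,
     * mirroring the "partial solution" caveat. */
    return c;
}
```

A caller would typically invoke setlocale(LC_CTYPE, "") first so that towlower() knows about non-ASCII letters. Without that, only ASCII characters are folded.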

Diffstat (limited to 'contrib/btree_gist/btree_utils_var.c')

0 files changed, 0 insertions, 0 deletions

