author Andrew Dunstan <andrew@dunslane.net> 2013-06-08 10:00:09 -0400
committer Andrew Dunstan <andrew@dunslane.net> 2013-06-08 10:00:09 -0400
commit d535136b5d60b19f7ffa777b97ed301739c15a9d
tree f799a0847c3ab87d2ddc0942ca18da89da9d9b66 /src/backend/parser
parent 94e3311b97448324d67ba9a527854271373329d9
Don't downcase non-ascii identifier chars in multi-byte encodings.
Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases.
Diffstat (limited to 'src/backend/parser')
-rw-r--r-- src/backend/parser/scansup.c | 8
1 files changed, 5 insertions, 3 deletions
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 8f6febc694e..f20f3b62a82 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -132,8 +132,10 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
 	char	   *result;
 	int			i;
+	bool		enc_is_single_byte;
 	result = palloc(len + 1);
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
@@ -141,8 +143,8 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, and use an ASCII-only downcasing for 7-bit
-	 * characters.
+	 * the high bit set, as long as they aren't part of a multi-byte character,
+	 * and use an ASCII-only downcasing for 7-bit characters.
 	 */
 	for (i = 0; i < len; i++)
 	{
@@ -150,7 +152,7 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (IS_HIGHBIT_SET(ch) && isupper(ch))
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
 			ch = tolower(ch);
 		result[i] = (char) ch;
 	}
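
The patched loop can be tried outside the backend with a minimal standalone sketch like the one below. It is not PostgreSQL code: the function name downcase_ident_sketch and the macro SKETCH_HIGHBIT_SET are hypothetical, malloc() stands in for palloc(), and the enc_is_single_byte flag is passed in by the caller instead of being derived from pg_database_encoding_max_length(). Truncation and the warn parameter are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for IS_HIGHBIT_SET: true for bytes >= 0x80. */
#define SKETCH_HIGHBIT_SET(ch) (((unsigned char) (ch)) & 0x80)

/*
 * Hypothetical helper mirroring the patched loop.
 *
 * ASCII 'A'..'Z' are always folded to lower case. Bytes with the high bit
 * set are passed to tolower() only when the caller says the encoding is
 * single-byte; otherwise they are copied unchanged, so the bytes of a
 * multi-byte character are never altered.
 *
 * Returns a malloc'd, NUL-terminated copy (caller frees), or NULL if the
 * allocation fails.
 */
char *
downcase_ident_sketch(const char *ident, size_t len, bool enc_is_single_byte)
{
	char	   *result = malloc(len + 1);
	size_t		i;

	if (result == NULL)
		return NULL;

	for (i = 0; i < len; i++)
	{
		unsigned char ch = (unsigned char) ident[i];

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';
		else if (enc_is_single_byte && SKETCH_HIGHBIT_SET(ch) && isupper(ch))
			ch = (unsigned char) tolower(ch);
		result[i] = (char) ch;
	}
	result[len] = '\0';

	return result;
}
```

With enc_is_single_byte set to false, a UTF-8 identifier such as "\xC3\x84" "B" keeps its two leading bytes intact and only the trailing ASCII letter is folded. With the flag set to true, what happens to high-bit bytes depends on the current C locale, as in the original code.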