Support PG_UNICODE_FAST locale in the builtin collation provider.

The PG_UNICODE_FAST locale uses code point sort order (fast, memcmp-based) combined with Unicode character semantics. The character semantics are based on Unicode full case mapping.

Full case mapping can map a single code point to multiple code points, such as "ß" uppercasing to "SS". Additionally, it handles context-sensitive mappings like the "final sigma", and it uses titlecase mappings such as "ǅ" when titlecasing (rather than plain uppercase mappings).

Importantly, the uppercasing of "ß" as "SS" is specifically mentioned by the SQL standard. In Postgres, UCS_BASIC uses plain ASCII semantics for case mapping and pattern matching, so if we changed it to use the PG_UNICODE_FAST locale, it would offer better compliance with the standard. For now, though, do not change the behavior of UCS_BASIC.

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite
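The difference between simple and full case mapping described in this message can be illustrated with a small C sketch. Everything in it (toy_map, toy_strupper, TOY_MAX_EXPANSION and the two-entry table) is hypothetical and hand-written for illustration. It covers only ASCII letters plus the "ß" and "ǆ"/"ǅ" examples, it does not model final sigma, and it is not PostgreSQL's unicode_case.c API or its generated tables. Following the message, simple mode falls back to the plain uppercase mapping when titlecasing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Longest expansion in this toy table (full mappings are 0-terminated). */
#define TOY_MAX_EXPANSION 3

/* Hypothetical hand-written case data, NOT PostgreSQL's generated tables. */
typedef struct
{
	uint32_t	cp;								/* source code point */
	uint32_t	simple_upper;					/* 1:1 uppercase mapping */
	uint32_t	full_upper[TOY_MAX_EXPANSION];	/* full uppercase mapping */
	uint32_t	full_title[TOY_MAX_EXPANSION];	/* full titlecase mapping */
} toy_case_entry;

static const toy_case_entry toy_table[] = {
	/* U+00DF "ß": no 1:1 uppercase; full upper "SS", full title "Ss" */
	{0x00DF, 0x00DF, {0x0053, 0x0053, 0}, {0x0053, 0x0073, 0}},
	/* U+01C6 "ǆ": uppercase U+01C4 "Ǆ", titlecase U+01C5 "ǅ" */
	{0x01C6, 0x01C4, {0x01C4, 0, 0}, {0x01C5, 0, 0}},
};

/*
 * Map one code point to uppercase (title == false) or titlecase
 * (title == true).  Simple mode always yields one code point and uses the
 * uppercase mapping even when titlecasing; full mode may expand and uses
 * the titlecase mapping.  Returns the number of code points written.
 */
static size_t
toy_map(uint32_t cp, bool full, bool title, uint32_t *out)
{
	/* ASCII lowercase letters map to uppercase in every mode */
	if (cp >= 'a' && cp <= 'z')
	{
		out[0] = cp - 'a' + 'A';
		return 1;
	}

	for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
	{
		const toy_case_entry *e = &toy_table[i];
		const uint32_t *seq;
		size_t		n = 0;

		if (e->cp != cp)
			continue;
		if (!full)
		{
			out[0] = e->simple_upper;
			return 1;
		}
		seq = title ? e->full_title : e->full_upper;
		while (n < TOY_MAX_EXPANSION && seq[n] != 0)
		{
			out[n] = seq[n];
			n++;
		}
		assert(n > 0);			/* every table entry has a mapping */
		return n;
	}

	out[0] = cp;				/* code points not in the table are unchanged */
	return 1;
}

/*
 * Uppercase an array of code points.  dst needs room for
 * srclen * TOY_MAX_EXPANSION code points.  Returns the output length.
 */
static size_t
toy_strupper(const uint32_t *src, size_t srclen, bool full, uint32_t *dst)
{
	size_t		len = 0;

	for (size_t i = 0; i < srclen; i++)
		len += toy_map(src[i], full, false, dst + len);
	return len;
}
```

Uppercasing the code points of "straße" yields 6 code points in simple mode (the "ß" is left unchanged) and 7 in full mode ("STRASSE"), which is why a full-mapping caller must allow for the output growing.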
author: Jeff Davis <jdavis@postgresql.org> 2025-01-17 15:56:30 -0800
committer: Jeff Davis <jdavis@postgresql.org> 2025-01-17 15:56:30 -0800
commit: d3d0983169130a9b81e3fe48d5c2ca4931480956
tree: 75e680ff03b4af3fd21a36be49515367133e6d02 /src/backend/utils/adt/pg_locale_builtin.c
parent: 286a365b9c25479f8ad82043ed136748733adfa6
1 file changed, 9 insertions, 3 deletions
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index fef5b6e6d38..436e32c0ca0 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -78,7 +78,8 @@ size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	return unicode_strlower(dest, destsize, src, srclen, false);
+	return unicode_strlower(dest, destsize, src, srclen,
+							locale->info.builtin.casemap_full);
 }
 
 size_t
@@ -93,7 +94,8 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		.prev_alnum = false,
 	};
 
-	return unicode_strtitle(dest, destsize, src, srclen, false,
+	return unicode_strtitle(dest, destsize, src, srclen,
+							locale->info.builtin.casemap_full,
 							initcap_wbnext, &wbstate);
 }
 
@@ -101,7 +103,8 @@ size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	return unicode_strupper(dest, destsize, src, srclen, false);
+	return unicode_strupper(dest, destsize, src, srclen,
+							locale->info.builtin.casemap_full);
 }
 
 pg_locale_t
@@ -142,6 +145,7 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
 	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
@@ -164,6 +168,8 @@ get_collation_actual_version_builtin(const char *collcollate)
 		return "1";
 	else if (strcmp(collcollate, "C.UTF-8") == 0)
 		return "1";
+	else if (strcmp(collcollate, "PG_UNICODE_FAST") == 0)
+		return "1";
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
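The patch derives casemap_full once, in create_pg_locale_builtin(), from an exact string comparison against "PG_UNICODE_FAST", and then passes it to unicode_strlower(), unicode_strtitle() and unicode_strupper() in place of the previous hard-coded false. The C sketch below models that flow. toy_builtin_locale, toy_init_builtin and toy_strupper_loc are hypothetical stand-ins, not PostgreSQL functions, and the uppercaser only knows ASCII letters and the UTF-8 encoding of "ß" (bytes C3 9F).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the builtin part of pg_locale_struct. */
typedef struct
{
	const char *locale;
	bool		casemap_full;
} toy_builtin_locale;

/* As in the patch: full case mapping only for exactly "PG_UNICODE_FAST". */
static void
toy_init_builtin(toy_builtin_locale *loc, const char *locstr)
{
	loc->locale = locstr;
	loc->casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
}

/*
 * Hypothetical uppercaser standing in for unicode_strupper(): handles
 * ASCII a-z and, in full mode only, UTF-8 "ß" (C3 9F) -> "SS".  Writes at
 * most destsize - 1 bytes plus a terminating NUL and returns the full
 * output length, so a result >= destsize means the output was truncated.
 */
static size_t
toy_strupper_loc(char *dest, size_t destsize, const char *src,
				 const toy_builtin_locale *loc)
{
	size_t		out = 0;

	assert(loc != NULL && src != NULL);
	for (size_t i = 0; src[i] != '\0'; i++)
	{
		unsigned char c = (unsigned char) src[i];
		char		buf[2];
		size_t		n = 1;

		if (c >= 'a' && c <= 'z')
			buf[0] = (char) (c - 'a' + 'A');
		else if (loc->casemap_full && c == 0xC3 &&
				 (unsigned char) src[i + 1] == 0x9F)
		{
			/* one code point becomes two under full case mapping */
			buf[0] = 'S';
			buf[1] = 'S';
			n = 2;
			i++;				/* skip the second byte of "ß" */
		}
		else
			buf[0] = (char) c;	/* any other byte, incl. "ß" in simple mode */

		for (size_t k = 0; k < n; k++, out++)
		{
			if (out + 1 < destsize)
				dest[out] = buf[k];
		}
	}
	if (destsize > 0)
		dest[out < destsize ? out : destsize - 1] = '\0';
	return out;
}
```

With "C.UTF-8" the input "straße" comes back as "STRAßE", while "PG_UNICODE_FAST" yields "STRASSE". Both results happen to be 7 bytes because "ß" takes two bytes in UTF-8, but full case mapping can change the output length in general, so the sketch reports the length it needed, in the manner of strlcpy().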