src/common/unicode/README


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

This directory contains tools to generate the tables in
src/include/common/unicode_norm.h, used for Unicode normalization. The
generated .h file is included in the source tree, so these are normally not
needed to build PostgreSQL, only if you need to re-generate the .h file
from the Unicode data files for some reason, e.g. to update to a new version
of Unicode.

Generating unicode_norm_table.h
-------------------------------

1. Download the Unicode data file, UnicodeData.txt, from the Unicode
consortium and place it to the current directory. Run the perl script
"norm_test_generate.pl", to process it, and to generate the
"unicode_norm_table.h" file. The Makefile contains a rule to download the
data files if they don't exist.

    make unicode_norm_table.h

2. Inspect the resulting header file. Once you're happy with it, copy it to
the right location.

    cp unicode_norm_table.h ../../../src/include/common/


Tests
-----

The Unicode consortium publishes a comprehensive test suite for the
normalization algorithm, in a file called NormalizationTest.txt. This
directory also contains a perl script and some C code, to run our
normalization code with all the test strings in NormalizationTest.txt.
To download NormalizationTest.txt and run the tests:

    make normalization-check