aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/mb/Unicode
Commit message (Collapse)AuthorAge
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Make all Perl warnings fatalPeter Eisentraut2023-12-29
| | | | | | | | | | | | | | | | | | | | | | | | There are a lot of Perl scripts in the tree, mostly code generation and TAP tests. Occasionally, these scripts produce warnings. These are probably always mistakes on the developer side (true positives). Typical examples are warnings from genbki.pl or related when you make a mess in the catalog files during development, or warnings from tests when they massage a config file that looks different on different hosts, or mistakes during merges (e.g., duplicate subroutine definitions), or just mistakes that weren't noticed because there is a lot of output in a verbose build. This changes all warnings into fatal errors, by replacing use warnings; by use warnings FATAL => 'all'; in all Perl files. Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
* Remove distprepPeter Eisentraut2023-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A PostgreSQL release tarball contains a number of prebuilt files, in particular files produced by bison, flex, perl, and well as html and man documentation. We have done this consistent with established practice at the time to not require these tools for building from a tarball. Some of these tools were hard to get, or get the right version of, from time to time, and shipping the prebuilt output was a convenience to users. Now this has at least two problems: One, we have to make the build system(s) work in two modes: Building from a git checkout and building from a tarball. This is pretty complicated, but it works so far for autoconf/make. It does not currently work for meson; you can currently only build with meson from a git checkout. Making meson builds work from a tarball seems very difficult or impossible. One particular problem is that since meson requires a separate build directory, we cannot make the build update files like gram.h in the source tree. So if you were to build from a tarball and update gram.y, you will have a gram.h in the source tree and one in the build tree, but the way things work is that the compiler will always use the one in the source tree. So you cannot, for example, make any gram.y changes when building from a tarball. This seems impossible to fix in a non-horrible way. Second, there is increased interest nowadays in precisely tracking the origin of software. We can reasonably track contributions into the git tree, and users can reasonably track the path from a tarball to packages and downloads and installs. But what happens between the git tree and the tarball is obscure and in some cases non-reproducible. The solution for both of these issues is to get rid of the step that adds prebuilt files to the tarball. The tarball now only contains what is in the git tree (*). Getting the additional build dependencies is no longer a problem nowadays, and the complications to keep these dual build modes working are significant. And of course we want to get the meson build system working universally. This commit removes the make distprep target altogether. The make dist target continues to do its job, it just doesn't call distprep anymore. (*) - The tarball also contains the INSTALL file that is built at make dist time, but not by distprep. This is unchanged for now. The make maintainer-clean target, whose job it is to remove the prebuilt files in addition to what make distclean does, is now just an alias to make distprep. (In practice, it is probably obsolete given that git clean is available.) The following programs are now hard build requirements in configure (they were already required by meson.build): - bison - flex - perl Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
* Doc: improve cross-reference in Makefile comment.Tom Lane2023-09-25
| | | | | | Per gripe from Japin Li. Discussion: https://postgr.es/m/MEYP282MB16692171F13B5DF40DB768EEB6FCA@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
* Pre-beta mechanical code beautification.Tom Lane2023-05-19
| | | | | | | | | | | | | | | Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Make Unicode makefile parallel-safePeter Eisentraut2021-10-04
| | | | | | | | | Fix the rules so that each rule is parallel safe, using the same trickery that we use elsewhere in the tree for rules that produce more than one output file. Refactor the whole makefile so that there is less repetition. Discussion: https://www.postgresql.org/message-id/18e34084-aab1-1b4c-edd1-c4f9fb04f714%40enterprisedb.com
* Update Unicode map text filesPeter Eisentraut2021-10-04
| | | | | | | A couple of newer ones are available. There are no functional differences, but let's get them in anyway, so that there is no surprise diff next time someone wants to do some actual work in this area.
* Fix typo in commentPeter Eisentraut2021-07-22
| | | | | Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/20210716.170209.175434392011070182.horikyota.ntt%40gmail.com
* Remove some whitespace in generated C outputPeter Eisentraut2021-07-19
| | | | | | | | It doesn't match the normal coding style. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/22016aa9-ca59-15c7-01df-f292cb558c4d@enterprisedb.com
* Make UCS_to_most.pl process encodings in sorted orderPeter Eisentraut2021-07-19
| | | | | | | | This just makes the progress output easier to follow. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/22016aa9-ca59-15c7-01df-f292cb558c4d@enterprisedb.com
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Fix some typosMichael Paquier2020-11-14
| | | | | Author: Daniel Gustafsson Discussion: https://postgr.es/m/C36ADFDF-D09A-4EE5-B186-CB46C3653F4C@yesql.se
* Fix conversion table generator scripts.Thomas Munro2020-07-22
| | | | | | | | | | | | | convutils.pm used implicit conversion of undefined value to integer zero. Some of conversion scripts are susceptible to regexp greediness. Fix, avoiding whitespace changes in the output. Also update ICU URLs that moved. No need to back-patch, because the output of these scripts is also in the source tree so we shouldn't need to rerun them on back-branches. Author: Kyotaro Horiguchi <horikyoga.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJ7SEGLbj%3D%3DTQCcyKRA9aqj8%2B6L%3DexSq1y25TA%3DWxLziQ%40mail.gmail.com
* Use perl warnings pragma consistentlyAndrew Dunstan2020-04-13
| | | | | | | | | | We've had a mixture of the warnings pragma, the -w switch on the shebang line, and no warnings at all. This patch removes the -w swicth and add the warnings pragma to all perl sources missing it. It raises the severity of the TestingAndDebugging::RequireUseWarnings perlcritic policy to level 5, so that we catch any future violations. Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com
* Add support for automatically updating Unicode derived filesPeter Eisentraut2020-01-09
| | | | | | | | | | | | | | | | | | | | | We currently have several sets of files generated from data provided by Unicode. These all have ad hoc rules and instructions for updating when new Unicode versions appear, and it's not done consistently. This patch centralizes and automates the process and makes it part of the release checklist. The Unicode and CLDR versions are specified in Makefile.global.in. There is a new make target "update-unicode" that downloads all the relevant files and runs the generation script. There is also a new script for generating the table of combining characters for ucs_wcwidth(). That table is now in a separate include file rather than hardcoded into the middle of other code. This is based on the script that was used for generating d8594d123c155aeecd47fc2450f62f5100b2fbf0, but the script itself wasn't committed at that time. Reviewed-by: John Naylor <john.naylor@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Update unicode.org URLsPeter Eisentraut2019-10-13
| | | | | Use https, consistent host name, remove references to ftp. Also update the URLs for CLDR, which has moved from Trac to GitHub.
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Avoid use of unportable hex constant in convutils.pmAndrew Dunstan2018-05-27
| | | | Discussion: https://postgr.es/m/5a6d6de8-cff8-1ffb-946c-ccf381800ea1@2ndQuadrant.com
* Don't fall off the end of perl functionsAndrew Dunstan2018-05-27
| | | | | | | | | | | | | | | This complies with the perlcritic policy Subroutines::RequireFinalReturn, which is a severity 4 policy. Since we only currently check at severity level 5, the policy is raised to that level until we move to level 4 or lower, so that any new infringements will be caught. A small cosmetic piece of tidying of the pgperlcritic script is included. Mike Blackwell Discussion: https://postgr.es/m/CAESHdJpfFm_9wQnQ3koY3c91FoRQsO-fh02za9R3OEMndOn84A@mail.gmail.com
* Restrict vertical tightness to parentheses in Perl codeAndrew Dunstan2018-05-09
| | | | | | | | | | | | | | | | | The vertical tightness settings collapse vertical whitespace between opening and closing brackets (parentheses, square brakets and braces). This can make data structures in particular harder to read, and is not very consistent with our style in non-Perl code. This patch restricts that setting to parentheses only, and reformats all the perl code accordingly. Not applying this to parentheses has some unfortunate effects, so the consensus is to keep the setting for parentheses and not for the others. The diff for this patch does highlight some places where structures should have trailing commas. They can be added manually, as there is no automatic tool to do so. Discussion: https://postgr.es/m/a2f2b87c-56be-c070-bfc0-36288b4b41c1@2ndQuadrant.com
* perltidy: Add option --nooutdent-long-commentsPeter Eisentraut2018-04-27
|
* perltidy: Add option --nooutdent-long-quotesPeter Eisentraut2018-04-27
|
* Update headers of generated filesPeter Eisentraut2018-02-24
| | | | | The scripts were changed in c98c35cd084a25c6cf9b08c76de8b89facd75fe7, but the output files were not updated to reflect the script changes.
* Add current directory to Perl include pathPeter Eisentraut2018-02-24
| | | | | | Recent Perl versions don't have the current directory in the module include path anymore, so we need to add it here explicitly to make these scripts continue to work.
* Use croak instead of die in Perl code when appropriatePeter Eisentraut2018-02-24
|
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Avoid putting build-location-dependent strings into generated files.Tom Lane2017-12-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various Perl scripts we use to generate files were in the habit of printing things like "generated by $0" into their output files. That looks like a fine idea at first glance, but it results in non-reproducible output, because in VPATH builds $0 won't be just the name of the script file, but a full path for it. We'd prefer that you get identical results whether using VPATH or not, so this is a bad thing. Some of these places also printed their input file name(s), causing an additional hazard of the same type. Hence, establish a policy that thou shalt not print $0, nor input file pathnames, into output files (they're still allowed in error messages, though). Instead just write the script name verbatim. While we are at it, we can make these annotations more useful by giving the script's full relative path name within the PG source tree, eg instead of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl. Not all of the changes made here actually affect any files shipped in finished tarballs today, but it seems best to apply the policy everyplace so that nobody copies unsafe code into places where it could matter. Christoph Berg and Tom Lane Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
* UCS_to_most.pl: Process encodings in sorted orderPeter Eisentraut2017-10-19
| | | | | Otherwise the order depends on the Perl hash implementation, making it cumbersome to scan the output when debugging.
* Remove unnecessary parentheses in return statementsPeter Eisentraut2017-09-05
| | | | | | | | The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
* Post-PG 10 beta1 pgperltidy runBruce Momjian2017-05-17
|
* Remove duplicate assignment.Heikki Linnakangas2017-04-07
| | | | | | Harmless, but clearly wrong. Kyotaro Horiguchi
* Include array size in forward declaration.Heikki Linnakangas2017-03-13
| | | | | Some compilers require it. At least Visual Studio, according to the buildfarm, and gcc with the -pedantic flag.
* Use radix tree for character encoding conversions.Heikki Linnakangas2017-03-13
| | | | | | | | | | | | | | | | | | | Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller. The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files. Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson. Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
* Remove obsolete references to JIS0201.TXT JIS0208.TXT.Heikki Linnakangas2017-03-13
| | | | We don't use those files anymore, since commit 1de9cc0dcc.
* Add KOI8-U map files to Makefile.Heikki Linnakangas2017-02-02
| | | | | | | These were left out by mistake back when support for KOI8-U encoding was added. Extracted from Kyotaro Horiguchi's larger patch.
* Small fixes to the Perl scripts to create unicode conversion tables.Heikki Linnakangas2017-02-01
| | | | | | | | | Add missing semicolons in UCS_to_* perl scripts. For consistency, use "$hashref->{key}" style everywhere. Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/20170130.153738.139030994.horiguchi.kyotaro@lab.ntt.co.jp
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Make all unicode perl scripts to use strict, rearrange logic for clarity.Heikki Linnakangas2016-11-30
| | | | | | | The loops were a bit difficult to understand, due to breaking out of them early. Also fix things that perlcritic complained about. Daniel Gustafsson
* Rewrite the perl scripts to produce our Unicode conversion tables.Heikki Linnakangas2016-11-30
| | | | | | | | | | | | | | | | | Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no longer available. Get UHC from windows-949-2000.xml, it's more up-to-date. Plus tons more small changes. With these changes, the perl scripts faithfully produce the *.map files we have in the repository, from the external source files. In the passing, fix the Makefile to also download CP932.TXT and CP950.TXT. Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi
* Remove leading zeros, for consistency with other map files.Heikki Linnakangas2016-11-30
| | | | | | | | | | | The common style is to pad to 4 digits. Running the current perl scripts to generate these map files would override this change, but the next commit will rewrite the perl scripts to produce this style. I'm doing this as a separate commit, to make it more clear what non-cosmetic changes the next commit makes to the map files. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi
* Remove code points < 0x80 from character conversion tables.Heikki Linnakangas2016-11-30
| | | | | | | | | | | | | | | | | PostgreSQL treats characters with < 0x80 leading byte as plain ASCII, and they are not even passed to the conversion routines. There is no point in having them in the conversion tables. Everything in the tables were direct ASCII-ASCII mappings, except for two: * SHIFT_JIS_2004 code point 0x5C (backslash in ASCII) was mapped to Unicode YEN SIGN character. * Unicode 0x5C (backslash again) was mapped to "REVERSE SOLIDUS" in SHIFT_JIS_2004 These mappings never had any effect, so there's no functional change from removing them. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi
* Fix broken statement in UCS_to_most.pl.Robert Haas2016-11-15
| | | | | | | This has been wrong for a very long time, and it's puzzling to me how it ever worked for anyone. Kyotaro Horiguchi
* Add make rules to download raw Unicode mapping filesPeter Eisentraut2016-11-01
| | | | | | This serves as implicit documentation and is handy if someone wants to tweak things. The rules are not part of a normal build, like this entire directory.
* Remove bogus mapping from UTF-8 to SJIS conversion table.Heikki Linnakangas2016-10-07
| | | | | | | | | | | | | | 0xc19c is not a valid UTF-8 byte sequence. It doesn't do any harm, AFAICS, but it's surely not intentional. No backpatching though, just to be sure. In the passing, also add a file header comment to the file, like the UCS_to_SJIS.pl script would produce. (The file was originally created with UCS_to_SJIS.pl, but has been modified by hand since then. That's questionable, but I'll leave fixing that for later.) Kyotaro Horiguchi Discussion: <20160907.155050.233844095.horiguchi.kyotaro@lab.ntt.co.jp>
* Finish pgindent run for 9.6: Perl files.Noah Misch2016-06-12
|