coreutils

better i18n for join, uniq, etc.


From: Paul Eggert
Subject: better i18n for join, uniq, etc.
Date: Mon, 30 Oct 2023 01:48:44 -0700
User-agent: Mozilla Thunderbird

I installed the attached patches to GNU Coreutils so that join and uniq support multi-byte characters better out of the box. This uses Gnulib's new mcel module, which makes for simpler multi-byte processing than what's in Fedora's i18n patches for Coreutils. (I also hope it's faster, though I haven't tested this.)

The idea is to continue this process of using mcel for the other programs where vanilla Coreutils doesn't conform to POSIX in multi-byte locales.

The key patch is 0009. Patch 0010 brings in the Fedora tests for join and uniq in multi-byte locales; these tests pass for me.
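
To illustrate why multi-byte separators need character-aware handling, here is a minimal Python sketch. It is not code from the patches or from Gnulib's mcel module; both function names are hypothetical, and the byte-wise splitter is a deliberately naive strawman rather than a description of what any Coreutils program does. It assumes UTF-8 input.

```python
# Illustrative sketch only; not taken from the Coreutils patches or Gnulib.
# Both helpers below are hypothetical names invented for this example.

def split_bytewise(line, sep):
    """Naive splitter: treats only the first byte of sep as the delimiter."""
    return line.split(sep[:1])

def split_charwise(line, sep, encoding="utf-8"):
    """Character-aware splitter: decodes first, then splits on the whole separator."""
    return line.decode(encoding).split(sep.decode(encoding))

# U+3042, U+3001 (ideographic comma, used as the separator), U+3044.
# In UTF-8 all three characters begin with the byte 0xE3.
line = "\u3042\u3001\u3044".encode("utf-8")
sep = "\u3001".encode("utf-8")

bytewise = split_bytewise(line, sep)
charwise = split_charwise(line, sep)

print(len(bytewise), "pieces when splitting on the first separator byte")
print(len(charwise), "fields when splitting on the whole character")
```

Because the separator's lead byte also begins the two data characters, the byte-wise split cuts every character apart, while the character-aware split yields the two intended fields.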

Some work is still needed for ignoring case in join and uniq. As I understand it, the Fedora patches don't support 'uniq --ignore-case' in multi-byte locales. They do support 'join --ignore-case', though they ignore case in the simple-minded way that GNU diff does (except that diff lowercases first whereas Fedora join uppercases first; although neither approach is perfect, isn't lowercasing better?).
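
A small Python sketch of how the two folding directions can disagree, using Greek sigma. This only approximates per-character mapping in the style of C's towlower and towupper by applying Python's str.lower and str.upper one character at a time; it is not the diff or Fedora join code, and the helper names are hypothetical.

```python
# Illustrative sketch only; fold_lower and fold_upper are hypothetical helpers
# that map one character at a time, roughly as towlower/towupper would.

def fold_lower(s):
    return "".join(c.lower() for c in s)

def fold_upper(s):
    return "".join(c.upper() for c in s)

# The same Greek word spelled with final sigma (U+03C2) and with
# ordinary small sigma (U+03C3) as its last letter.
final_sigma = "\u03bf\u03b4\u03bf\u03c2"
plain_sigma = "\u03bf\u03b4\u03bf\u03c3"

# Both small sigmas share the capital U+03A3, so uppercasing merges them.
equal_when_uppercased = fold_upper(final_sigma) == fold_upper(plain_sigma)

# Lowercasing leaves each small sigma unchanged, so the strings stay distinct.
equal_when_lowercased = fold_lower(final_sigma) == fold_lower(plain_sigma)

print("equal after uppercasing:", equal_when_uppercased)
print("equal after lowercasing:", equal_when_lowercased)
```

Under these assumptions the two spellings compare equal when uppercased first and unequal when lowercased first, so the choice of direction changes which keys match. Which result is preferable is the open question raised above.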

Comments welcome. If the idea isn't a good one we can back out the patches. But I hope this can move forward.

Attachments (each described as Text Data):

0001-maint-prefer-c_isxdigit-when-that-is-the-intent.patch
0002-digest-omit-unnecessary-b2sum-includes.patch
0003-maint-move-field_sep-into-separate-module.patch
0004-maint-include-ctype.h-selectively.patch
0005-maint-port-to-oddball-tolower.patch
0006-dircolors-assume-C-locale-spaces.patch
0007-stdbuf-port-to-oddball-toupper.patch
0008-test-allow-non-blank-white-space-in-numbers.patch
0009-join-uniq-support-multi-byte-separators.patch
0010-maint-copy-join-uniq-tests-from-Fedora.patch
0011-maint-pacify-make-syntax-check.patch

