better i18n for join, uniq, etc.

From: Paul Eggert
Subject: better i18n for join, uniq, etc.
Date: Mon, 30 Oct 2023 01:48:44 -0700
I installed the attached patches to GNU Coreutils so that join and uniq support multi-byte characters better out of the box. This uses Gnulib's new mcel module, which makes for simpler multi-byte processing than what's in Fedora's i18n patches for Coreutils. (I also hope it's faster, though I haven't tested this.)

The idea is to continue this process of using mcel for the other programs where vanilla Coreutils doesn't conform to POSIX in multi-byte locales.
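
To make the byte-versus-character distinction concrete, here is a minimal Python sketch of what "skip the first N characters" (as in 'uniq --skip-chars=N') means under the two interpretations. It is only an illustration: the helper names skip_bytes and skip_chars are hypothetical, UTF-8 is assumed as the locale encoding, and nothing here reflects how Coreutils or Gnulib's mcel module implement the operation.

```python
def skip_bytes(line: bytes, n: int) -> bytes:
    """Byte-oriented skipping: drop n bytes, possibly splitting a character."""
    return line[n:]


def skip_chars(line: bytes, n: int, encoding: str = "utf-8") -> bytes:
    """Character-oriented skipping: decode first, then drop n characters."""
    text = line.decode(encoding)
    return text[n:].encode(encoding)


# Two lines that differ only in their first (two-byte) character.
a = "éa".encode("utf-8")  # b'\xc3\xa9a'
b = "ña".encode("utf-8")  # b'\xc3\xb1a'

# Skipping one byte leaves the second byte of each character behind,
# so the remainders still differ.
bytes_equal = skip_bytes(a, 1) == skip_bytes(b, 1)

# Skipping one character leaves b'a' in both cases, so the lines match.
chars_equal = skip_chars(a, 1) == skip_chars(b, 1)
```

With these inputs the byte-oriented comparison treats the two lines as different, while the character-oriented comparison treats them as duplicates, which is the behavior a user in a UTF-8 locale would expect.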

The key patch is 0009. Patch 0010 brings in the Fedora tests for join and uniq in multi-byte locales; these tests pass for me.

Some work is still needed for ignoring case in join and uniq. As I understand it, the Fedora patches don't support 'uniq --ignore-case' in multi-byte locales. They do support 'join --ignore-case', though they ignore case in the simple-minded way that GNU diff does (except diff lowercases first whereas Fedora join uppercases first; although neither approach is perfect, isn't lowercasing better?).
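
As an illustration of why neither lowercasing nor uppercasing is perfect, the following Python sketch compares single characters both ways. It relies on Python's built-in Unicode case mappings, which are an assumption here: a C library's towlower and towupper in a given locale may behave differently, and the helper names equal_lower and equal_upper are hypothetical, not taken from diff, join, or the Fedora patches.

```python
def equal_lower(a: str, b: str) -> bool:
    """Case-insensitive comparison that lowercases both sides first."""
    return a.lower() == b.lower()


def equal_upper(a: str, b: str) -> bool:
    """Case-insensitive comparison that uppercases both sides first."""
    return a.upper() == b.upper()


KELVIN = "\u212a"  # KELVIN SIGN; lowercases to 'k', has no uppercase mapping
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S; uppercases to 'S'

# Lowercasing matches the Kelvin sign with 'k'; uppercasing does not,
# because 'k' becomes 'K' (U+004B) while the Kelvin sign stays U+212A.
kelvin_lower = equal_lower(KELVIN, "k")
kelvin_upper = equal_upper(KELVIN, "k")

# Uppercasing matches the long s with 's'; lowercasing does not,
# because the long s is already lowercase and stays as it is.
long_s_lower = equal_lower(LONG_S, "s")
long_s_upper = equal_upper(LONG_S, "s")
```

Each approach therefore groups together a pair of characters that the other keeps apart, so the choice between them changes which lines compare equal.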

Comments welcome. If the idea isn't a good one, we can back out the patches. But I hope this can move forward.

