

From: Sebastian Kisela
Subject: Re: Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand, fmt, fold, and pr
Date: Wed, 17 Jan 2018 08:45:36 +0100

Hi Eric, hi Assaf!

I have looked at Eric's work on multibyte support for coreutils.
The work done seems solid.

However, I tried running the multibyte tests from the patch that Assaf
mentioned on top of Eric's repository, and every test that uses the C locale
failed at its first attempt to print a multibyte character.
I believe the cause is the approach to "error handling" as Eric described it,
and I am concerned that this behavior is risky: it might break a number of
scripts that count on the current error handling.

> * Handling of invalid encodings. I generally stop with an error; you wrap
> the foreign byte and pass it through to the output as an opaque object.
It might be better to at least offer an option to choose the behavior when
an invalid sequence is encountered.
See [1] and [2] for a quite relevant discussion of error handling.
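
To make the two behaviors concrete, here is a minimal sketch in C of what
such an option could look like. It is not taken from Eric's repository or
from the patch Assaf mentioned. All names (invalid_policy, scan_result,
utf8_seq_len, copy_text) are made up for illustration, and to stay
independent of the locale it uses a simplified hand-written UTF-8 check
instead of mbrtowc:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum invalid_policy { INVALID_FAIL, INVALID_PASS_THROUGH };

struct scan_result
{
  size_t chars;   /* well-formed UTF-8 characters seen so far */
  size_t opaque;  /* invalid bytes copied through unchanged */
};

/* Return the length (1..4) of the UTF-8 sequence starting at S, or 0 if
   the next bytes do not form one.  Simplified assumption: only the lead
   byte and the continuation bytes are checked, so some overlong and
   surrogate encodings are not rejected.  */
static size_t
utf8_seq_len (const unsigned char *s, size_t avail)
{
  size_t len, i;

  if (avail == 0)
    return 0;
  if (s[0] < 0x80)
    return 1;
  else if (s[0] >= 0xC2 && s[0] <= 0xDF)
    len = 2;
  else if (s[0] >= 0xE0 && s[0] <= 0xEF)
    len = 3;
  else if (s[0] >= 0xF0 && s[0] <= 0xF4)
    len = 4;
  else
    return 0;                   /* stray continuation or invalid lead byte */

  if (avail < len)
    return 0;                   /* truncated sequence */
  for (i = 1; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      return 0;
  return len;
}

/* Copy IN[0..N) to OUT, which must have room for N bytes.
   With INVALID_FAIL, return -1 at the first invalid byte; RES then
   describes the input consumed before that byte.
   With INVALID_PASS_THROUGH, copy each invalid byte unchanged, count it
   in RES->opaque, and return 0.  */
int
copy_text (const unsigned char *in, size_t n, unsigned char *out,
           enum invalid_policy policy, struct scan_result *res)
{
  size_t i = 0, o = 0;

  assert (in && out && res);
  res->chars = res->opaque = 0;

  while (i < n)
    {
      size_t len = utf8_seq_len (in + i, n - i);
      if (len == 0)
        {
          if (policy == INVALID_FAIL)
            return -1;          /* "stop with an error" */
          out[o++] = in[i++];   /* "pass it through as an opaque object" */
          res->opaque++;
          continue;
        }
      res->chars++;
      while (len--)
        out[o++] = in[i++];
    }
  return 0;
}
```

With the input bytes 'a', 0xC3, 0xA9, 0xFF, 'b', the INVALID_FAIL policy
returns -1 when it reaches 0xFF, while INVALID_PASS_THROUGH copies all five
bytes and reports three characters plus one opaque byte. A command-line
switch selecting between the two would let existing scripts keep whichever
behavior they rely on.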

> * Surrogate pairs. I trust wchar_t to be a sufficient character type; you
> have a special case for UTF-16 systems.

Here I agree with the approach from Eric.

Have a good one,


