From: Sebastian Kisela
Subject: Re: Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand, fmt, fold, and pr
Date: Wed, 17 Jan 2018 08:45:36 +0100
Hi Eric, hi Assaf!

I have reviewed Eric's work on multibyte support for coreutils, and it seems solid. However, when I ran the multibyte tests from the patch Assaf mentioned on top of Eric's repository, every test that uses the C locale failed at the first attempt to print a multibyte character. I believe the cause is the approach to error handling that Eric described (quoted below). I am not sure that this behavior is safe, since it might break scripts that rely on the current error handling.

> * Handling of invalid encodings. I generally stop with an error; you wrap
> the foreign byte and pass it through to the output as an opaque object.
>

It might be better to have at least an option to choose the behavior when an invalid sequence is encountered. See [1] and [2] for a quite relevant discussion of error handling.

> * Surrogate pairs. I trust wchar_t to be a sufficient character type; you
> have a special case for UTF-16 systems.
>

Here I agree with Eric's approach.

Have a good one,
Sebastian

[1] http://lists.gnu.org/archive/html/coreutils/2016-10/msg00001.html
[2] http://austingroupbugs.net/bug_view_page.php?bug_id=1007
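To make the two invalid-encoding policies concrete, here is a minimal C sketch built on the standard mbrtowc() interface. The enum, its values, and count_units() are hypothetical names for illustration only; this is not code from either patch. It simply marks the point where an implementation must choose between stopping with an error and passing a foreign byte through as an opaque unit. As an assumption worth checking per platform: in the C locale, some C libraries report bytes above 0x7F as invalid, so a strict policy can fail on input that a pass-through policy would accept.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical policy switch: stop with an error, or pass the
   offending byte through as a single opaque unit.  */
enum invalid_policy { INVALID_ERROR, INVALID_PASSTHROUGH };

/* Count the units in BUF[0..LEN) using the current LC_CTYPE locale.
   A valid multibyte character counts as one unit.  Returns the count,
   or -1 if POLICY is INVALID_ERROR and an invalid or incomplete
   sequence is found.  */
long
count_units (const char *buf, size_t len, enum invalid_policy policy)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  long n = 0;
  size_t i = 0;

  while (i < len)
    {
      wchar_t wc;
      size_t r = mbrtowc (&wc, buf + i, len - i, &st);

      if (r == (size_t) -1 || r == (size_t) -2)
        {
          /* (size_t) -1: invalid sequence; (size_t) -2: truncated
             sequence at the end of the buffer.  */
          if (policy == INVALID_ERROR)
            return -1;
          /* Pass-through: treat one foreign byte as an opaque unit
             and reset the conversion state to resynchronize.  */
          memset (&st, 0, sizeof st);
          r = 1;
        }
      else if (r == 0)
        r = 1;                  /* embedded NUL byte */

      i += r;
      n++;
    }
  return n;
}
```

For example, in a UTF-8 locale the bytes "a\xff" "b" yield -1 under INVALID_ERROR but 3 under INVALID_PASSTHROUGH, which is the kind of user-selectable choice suggested above.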