bug#7960: [PATCH] fmt: fix formatting multibyte text (bug #7372)
From: Eric Blake
Subject: bug#7960: [PATCH] fmt: fix formatting multibyte text (bug #7372)
Date: Wed, 02 Feb 2011 14:33:44 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7
[readding the list]
On 02/02/2011 02:11 PM, Kostya Stopani wrote:
> On Wed, Feb 02, 2011 at 10:15:53AM -0700, Eric Blake wrote:
>
>> Thanks for the patch. However, it's not trivial, so it would need
>> copyright assignment.
>
> Oh boy... Anyway I don't mind signing papers, if you (or whoever)
> don't mind bothering with it.
OK, I'll send you those details off-list.
>
>> Furthermore, there are already known issues where upstream coreutils
>> is lacking multibyte character support, but a solution has to be
>> both maintainable and no-impact to the single-byte locale case.
>
> I believe this patch doesn't break single-byte behavior because no
> conversion takes place. mbsnrtowcs() is used only to count
> characters. I've tested various cases (8-bit encoding was KOI8-R):
>
> |--------+---------------+--------------------------|
> | Locale | Text encoding | Result |
> |--------+---------------+--------------------------|
> | UTF-8 | UTF-8 | old fmt: text too narrow |
> | | | new fmt: ok |
> |--------+---------------+--------------------------|
> | UTF-8 | 8-bit | same |
> |--------+---------------+--------------------------|
> | 8-bit | UTF-8 | same |
> |--------+---------------+--------------------------|
> | 8-bit | 8-bit | same |
> |--------+---------------+--------------------------|
>
> From my point of view the alternative is to convert everything to
> wchar_t, which imposes the need to keep track of conversion errors and
> gracefully fall back to single-byte.
Keeping the text in multibyte form rather than converting to wchar_t is the
way to go, especially given the ongoing discussion of how to handle the fact
that on cygwin wchar_t is UTF-16, so a single character can still span
multiple units. That is an extension to POSIX, with all sorts of
ramifications for programs that expect POSIX semantics.
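
For readers following the thread, here is a minimal sketch of the
"count only, never convert" idea described in the quoted message. It is
not the code from the patch: the helper name count_chars and the choice
to fall back to the byte count on invalid input are assumptions made
for illustration. The only library call relied on is POSIX
mbsnrtowcs(), called with a NULL destination so that no wide characters
are stored.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the patch): return the number of
   characters in the first LEN bytes of S, interpreted in the current
   locale's encoding.  The text itself is left in multibyte form.  */
static size_t
count_chars (const char *s, size_t len)
{
  mbstate_t state;
  const char *src = s;
  size_t n;

  /* Start from the initial conversion state.  */
  memset (&state, 0, sizeof state);

  /* With a NULL destination, mbsnrtowcs only counts the wide
     characters that would result; the fourth argument is ignored.  */
  n = mbsnrtowcs (NULL, &src, len, 0, &state);

  /* Assumption: on an invalid sequence, treat every byte as one
     character, which matches single-byte behavior.  */
  return n == (size_t) -1 ? len : n;
}
```

A caller would run setlocale (LC_ALL, "") once at startup and then use
count_chars wherever a word's length is currently taken as its byte
count. In a single-byte locale the result equals the byte count, so
that case is unaffected. Note that this counts characters, not display
columns, so wide or combining characters would still need separate
handling.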
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org