bug#17196: UTF-8 printf string formating problem
From: Pádraig Brady
Subject: bug#17196: UTF-8 printf string formating problem
Date: Mon, 07 Apr 2014 14:08:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 04/06/2014 07:24 PM, Bob Proulx wrote:
> Pádraig Brady wrote:
>> Yes printf follows the C standard which only considers bytes.
>> ...
>> I don't think we'd be able to change the current operation of printf
>> due to backwards compat reasons? Though we might be able to somehow leverage
>> the existing multibyte character aware alignment/truncation code in:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
>
> Dan Douglas pointed out in the corresponding discussion in bug-bash
> that ksh uses the L modifier.
>
> http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
>
> Dan Douglas wrote:
> > ksh93 already has this feature using the "L" modifier:
> >
> > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
> > ★★★
>
> At least there is prior art for it.
So we can count bytes, chars, or cells (display columns).
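To make the first two units concrete, here is a minimal POSIX sh sketch (not from the thread) that counts the same string in bytes and in characters. byte_count and char_count are hypothetical helper names, and the sketch assumes a UTF-8 locale named C.UTF-8 is installed; that name varies between systems.

```shell
#!/bin/sh
# Sketch: count the same UTF-8 string in bytes and in characters.
# Assumption: a UTF-8 locale named C.UTF-8 is installed (the name
# varies between systems, e.g. en_US.UTF-8).

# 'a', U+0301 COMBINING ACUTE ACCENT, then three U+2605 BLACK STAR,
# spelled as octal UTF-8 bytes so the script itself stays ASCII.
s=$(printf 'a\314\201\342\230\205\342\230\205\342\230\205')

# Hypothetical helpers; tr strips the padding some wc versions print.
byte_count() { printf '%s' "$1" | wc -c | tr -d ' '; }                 # wc -c always counts bytes
char_count() { printf '%s' "$1" | LC_ALL=C.UTF-8 wc -m | tr -d ' '; }  # wc -m decodes per LC_CTYPE

echo "bytes: $(byte_count "$s")"   # bytes: 12  (1 + 2 + 3*3)
echo "chars: $(char_count "$s")"   # chars: 5
```

Counting cells additionally needs each character's display width. GNU wc has a -L option that reports the maximum display width of a line, but that is a GNU extension, and I'm not aware of a portable shell equivalent.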
Thinking about it a bit more, I think a shell-level printf
should deal in text of the current encoding and count cells.
In the edge case where you want to deal in bytes, you can do:
LC_ALL=C printf ...
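Byte counting is also why the current behaviour is unsafe for text: a precision can stop in the middle of a multibyte character. A minimal sketch, assuming only a POSIX printf and od; the variable name star is illustrative.

```shell
#!/bin/sh
# Sketch: under LC_ALL=C, printf's precision counts bytes, so it can
# cut a multibyte UTF-8 character in half.
star=$(printf '\342\230\205')   # U+2605 BLACK STAR, 3 bytes in UTF-8

# Keep at most 2 bytes of "★★" and dump them in hex:
# only e2 98 survive, i.e. two thirds of the first star.
LC_ALL=C printf '%.2s' "$star$star" | od -An -tx1

# A precision of 3 bytes happens to keep exactly one whole star.
LC_ALL=C printf '%.3s\n' "$star$star"
```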
I see that ksh behaves as I would expect and counts cells,
though it requires the explicit L modifier (%Ls):
$ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
á★★
$ ksh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
A★
$ ksh -c "printf '%.3Ls\n' $'AA\u2605\u2605\u2605'"
A
zsh seems to just count characters:
$ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
á★
$ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
á★
$ zsh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
A★★
I see that dash reports an invalid directive for any of %ls, %Ls and %S.
It's a pity there is no consensus here.
Personally I would go for:
printf '%3s' 'blah' # count cells
printf '%3Ls' 'blah' # count chars
LC_ALL=C printf '%3Ls' 'blah' # count bytes
LC_ALL=C printf '%3s' 'blah' # count bytes
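The "count chars" width semantics proposed for %3Ls can be emulated today by measuring the string and padding it yourself. This is a minimal sketch, not an existing utility: pad_chars is a hypothetical helper, it counts characters rather than cells (so a combining accent still counts as one), and it assumes a locale named C.UTF-8 is installed.

```shell
#!/bin/sh
# Sketch: right-align a string to a width measured in characters,
# emulating the proposed %Ls semantics.  pad_chars is hypothetical.
# Assumption: a UTF-8 locale named C.UTF-8 is installed.

# pad_chars WIDTH STRING
pad_chars() {
  n=$(printf '%s' "$2" | LC_ALL=C.UTF-8 wc -m | tr -d ' ')  # characters
  pad=$(( $1 > n ? $1 - n : 0 ))                            # never negative
  while [ "$pad" -gt 0 ]; do                                # emit pad spaces
    printf ' '
    pad=$((pad - 1))
  done
  printf '%s\n' "$2"
}

s=$(printf '\342\230\205\342\230\205')   # "★★": 2 characters, 6 bytes

LC_ALL=C printf '%5s|\n' "$s"   # byte width: 6 bytes already exceed 5, no padding
pad_chars 5 "$s"                # character width: 3 spaces, then ★★
```

Note that n and pad are plain global variables, since POSIX sh has no local.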
Pádraig.