bug#17196: UTF-8 printf string formating problem


From: Pádraig Brady
Subject: bug#17196: UTF-8 printf string formating problem
Date: Mon, 07 Apr 2014 14:08:07 +0100

On 04/06/2014 07:24 PM, Bob Proulx wrote:
> Pádraig Brady wrote:
>> Yes printf follows the C standard which only considers bytes.
>> ...
>> I don't think we'd be able to change the current operation of printf
>> due to backwards compat reasons? Though we might be able to somehow leverage
>> the existing multibyte character aware alignment/truncation code in:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
> 
> Dan Douglas pointed out in the corresponding discussion in bug-bash
> that ksh uses the L modifier.
> 
>   http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
> 
>   Dan Douglas wrote:
>   > ksh93 already has this feature using the "L" modifier:
>   > 
>   > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>   > ★★★
> 
> At least there is prior art for it.

So there are three things we could count: bytes, characters, or
display cells (roughly, graphemes).
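To make the three units concrete, here is a minimal sketch in Python. It is
not coreutils or ksh code. The function names are made up for illustration,
and the cell count is only an approximation built on the Unicode East Asian
Width property; a real implementation would more likely call wcwidth().

```python
import unicodedata

def count_bytes(s, encoding="utf-8"):
    # Number of bytes in the encoded form (what C printf counts).
    return len(s.encode(encoding))

def count_chars(s):
    # Number of code points, combining marks included.
    return len(s)

def count_cells(s):
    # Approximate terminal columns: combining marks take no cell,
    # wide and fullwidth characters take two, everything else one.
    # Assumption: other zero-width characters are not special-cased.
    cells = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        cells += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cells

# "a" + COMBINING ACUTE ACCENT + two wide CJK characters
sample = "a\u0301\u4e2d\u4e2d"
print(count_bytes(sample), count_chars(sample), count_cells(sample))
# prints: 9 4 5
```

The same string measures 9 bytes, 4 characters, or 5 cells, which is why the
choice of unit matters for both truncation and alignment.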

Thinking a bit more about it, I think shell-level printf
should deal in text of the current encoding and count cells.
In the edge case where you want to deal in bytes, you can do:
  LC_ALL=C printf ...

I see that ksh behaves as I would expect and counts cells,
though it requires the explicit L modifier to do so:
  $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★★
  $ ksh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
  A★
  $ ksh -c "printf '%.3Ls\n' $'AA\u2605\u2605\u2605'"
  A

zsh seems to just count characters:
  $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
  A★★

I see that dash gives "invalid directive" for any of %ls, %Ls and %S.

Pity there is no consensus here.
Personally I would go for:
  printf '%3s' 'blah'  # count cells
  printf '%3Ls' 'blah' # count chars
  LANG=C printf '%3Ls' 'blah' # count bytes
  LANG=C printf '%3s' 'blah'  # count bytes
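
As a rough illustration of how the counting unit changes field padding,
here is a small Python sketch. It is a model of the idea only, not a
proposed implementation: measure() and pad_left() are hypothetical helpers,
the "unit" argument stands in for the locale/modifier combinations above,
and the cell width is again approximated from the East Asian Width property.

```python
import unicodedata

def measure(s, unit):
    # Length of s in the requested unit: "bytes", "chars" or "cells".
    if unit == "bytes":
        return len(s.encode("utf-8"))
    if unit == "chars":
        return len(s)
    if unit == "cells":
        return sum(
            0 if unicodedata.combining(ch)
            else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
            else 1
            for ch in s
        )
    raise ValueError(f"unknown unit: {unit}")

def pad_left(s, width, unit):
    # Right-justify s in a field of `width` units, like printf '%Ns'.
    return " " * max(0, width - measure(s, unit)) + s

word = "e\u0301"  # "e" + COMBINING ACUTE ACCENT: 3 bytes, 2 chars, 1 cell
for unit in ("bytes", "chars", "cells"):
    print(unit, "|" + pad_left(word, 3, unit) + "|")
```

Only the "cells" variant adds the two spaces needed for the field to occupy
three columns on screen; counting bytes adds none and counting characters
adds one.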

Pádraig.





