bug-gawk

Re: [bug-gawk] Possible printf %c width multi-byte bug


From: Aharon Robbins
Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
Date: Fri, 10 May 2013 11:30:32 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi. I'm not sure why, but I received three copies of your note.

> From: Nethox <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] Possible printf %c width multi-byte bug
>
> I am not sure if the following is a bug or intended behaviour. But I
> find gawk's printf %c and %s inconsistent when width is specified and
> multi-byte encoding (UTF-8) is used.
>
>
> Test program:
>         BEGIN { printf "%2c\n", "??" }
> ....

The short answer is "this whole business is a mess".

I did not find the POSIX standard to be super clear on this point. OTOH,
it would probably not hurt to spend some time digging and language
lawyering with the standard to try to figure things out a little more.

Things are complicated because all input and output use multibyte
encodings whereas wide characters are simply large numerical values.
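
To make the ambiguity concrete, here is a rough sketch in Python (not gawk's code, and not a statement of what gawk or POSIX actually do). It shows one character as a numeric code point versus its UTF-8 bytes, and how a width of 2 gives different output depending on whether width counts characters or bytes. The helper names pad_by_chars and pad_by_bytes are made up for illustration, and the sample character U+00E9 is an assumption, since the character in the original test program did not survive the archive.

```python
def pad_by_chars(s: str, width: int) -> bytes:
    """Right-justify counting characters (code points), then encode as UTF-8."""
    return s.rjust(width).encode("utf-8")


def pad_by_bytes(s: str, width: int) -> bytes:
    """Encode as UTF-8 first, then right-justify counting bytes."""
    return s.encode("utf-8").rjust(width)


ch = "\u00e9"  # assumed sample: one character, two bytes in UTF-8

print(ord(ch))              # 233: the "wide" view is just a number
print(ch.encode("utf-8"))   # b'\xc3\xa9': the multibyte view is two bytes
print(pad_by_chars(ch, 2))  # b' \xc3\xa9': one pad space, width counted in characters
print(pad_by_bytes(ch, 2))  # b'\xc3\xa9': no padding, the two bytes already fill the width
```

Both results are defensible readings of "width 2", which is why the two conversions can end up disagreeing.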

In any case, the upcoming 4.1 release is in code freeze. After it's
released I will try to spend some time reading the standard and also
stepping through the various cases with a debugger.

I suspect that no matter what I do, it will be wrong for some corner case.

Thanks,

Arnold


