Re: [bug-gawk] Weird printf '%c' behaviour under non-C locale
From: Aharon Robbins
Subject: Re: [bug-gawk] Weird printf '%c' behaviour under non-C locale
Date: Fri, 27 Apr 2012 11:23:53 +0300
User-agent: Heirloom mailx 12.4 7/29/08
Hi.
> Date: Mon, 23 Apr 2012 14:09:27 +0200
> To: address@hidden
> From: Jeroen Schot <address@hidden>
> Subject: [bug-gawk] Weird printf '%c' behaviour under non-C locale
>
> Hello,
>
> A Debian user reported a bug [1] about gawk's printf '%c' behaviour,
> which has changed in 4.0. It is no longer possible to write single
> bytes when using a multibyte locale.
>
> From what I understand the new behaviour is in line with POSIX, but I
> believe there should still be a way to write single bytes (other than
> changing the locale of the entire program).
>
> I am interested in your opinion on this.
>
> [1]: http://bugs.debian.org/669714
>
> Kind regards,
>
> Jeroen Schot
Thanks for the report. I reviewed the test case you pointed to. I would
not call it a bug, per se, but it is a corner case. ("Damned if you do
and damned if you don't.")
As we discussed offline, I think the -b option is the correct answer
here. I have updated the doc to reflect that -b also affects output.
I tested that if the poster changes his script to

    #! /usr/bin/gawk -bf
    ....

the correct result is produced, even in a UTF-8 locale.
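One way to compare the two behaviours is to look at the raw bytes that printf "%c" writes. The following is only a sketch: emit_hex is a hypothetical helper name, not something from the bug report, and it assumes od and tr are available.

```shell
#! /bin/sh
# emit_hex CODE AWK-COMMAND...
# Hypothetical helper (not from the original thread): run the given awk
# command, have it print CODE with printf "%c", and show the bytes in hex.
emit_hex() {
    code=$1
    shift
    # "code + 0" forces a numeric value so %c treats it as a character code.
    "$@" -v code="$code" 'BEGIN { printf "%c", code + 0 }' |
        od -An -tx1 | tr -d ' \t\n'
    echo
}

# Example calls; what they print depends on the awk version and the locale:
#   emit_hex 233 gawk        # multibyte-aware in a UTF-8 locale
#   emit_hex 233 gawk -b     # characters treated as bytes
```

Running both example calls in a UTF-8 locale should show whether -b makes a difference on a given system.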
Another option, portable across awks and other Unix systems, would be
to rework the script as a shell script:

    #! /bin/sh
    LC_ALL=C
    export LC_ALL # override all inherited locale settings
    awk '....' "$@"

Hmmm. Might need a little more shell scripting to get the -v options before
the program, but you get the idea.
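The extra scripting could look something like the sketch below. It is an illustration only, not a tested replacement for the poster's script: awk_c is a hypothetical function name, and it assumes every option is written as a separate "-v name=value" pair (the joined form "-vname=value" is not handled).

```shell
#! /bin/sh
# awk_c PROGRAM [-v name=value]... [operands...]
# Hypothetical wrapper: run awk in the C locale, inserting PROGRAM after
# any leading "-v name=value" pairs and before the remaining operands.
awk_c() {
    prog=$1
    shift
    placed=no       # has the program text been inserted yet?
    want_value=no   # is the next argument the value of a preceding -v?
    for arg do      # iterates over a snapshot of the original arguments
        shift
        if [ "$placed" = no ] && [ "$want_value" = no ] && [ "$arg" != "-v" ]; then
            # First argument that is not part of a -v pair: program goes here.
            set -- "$@" "$prog"
            placed=yes
        fi
        if [ "$placed" = no ]; then
            if [ "$want_value" = yes ]; then
                want_value=no
            else
                want_value=yes   # $arg is "-v"; its value follows
            fi
        fi
        set -- "$@" "$arg"       # re-append the argument in its original order
    done
    if [ "$placed" = no ]; then
        set -- "$@" "$prog"      # only -v pairs (or nothing) were given
    fi
    LC_ALL=C awk "$@"            # locale override applies to this awk only
}
```

In a wrapper script the last line would then be a call such as awk_c '....' "$@", with the real program text in place of the dots.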
Thanks!
Arnold