From: Hermann Peifer
Subject: Re: [bug-gawk] Using ENVIRON["LANG"] = "C" instead of --characters-as-bytes
Date: Thu, 12 Mar 2015 20:21:52 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

On 2015-03-12 20:11, Eli Zaretskii wrote:
>> Date: Thu, 12 Mar 2015 19:14:47 +0100
>> From: Hermann Peifer <address@hidden>
>>
>> I thought that in gawk/master, I could use ENVIRON["LANG"] = "C" instead
>> of --characters-as-bytes, in analogy to changing the time zone via
>> ENVIRON["TZ"] = "UTC". I do however always end up with the lint warning:
>> Invalid multibyte data detected. There may be a mismatch between your
>> data and your locale.
>>
>> Is this a feature?
>>
>> Hermann
>>
>> # Some code snippet which doesn't work as expected
>> BEGIN {
>>         # Try to simulate --characters-as-bytes
>>         ENVIRON["LC_ALL"] = "C"
>>         ENVIRON["LANG"] = "C"
>> }
>
> AFAIR, you need to call 'setlocale' after setting these in the
> environment, for it to switch to another locale.  Unless, that is,
> Gawk does that for you when you set these members in ENVIRON[] (which
> it doesn't, AFAICS).
>
> And btw, the assumption that 'setlocale' looks at these environment
> variables is non-portable outside of the Posix world.
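Eli's point can be illustrated outside gawk with a small C sketch. It assumes a POSIX system (setenv(), and setlocale() consulting LC_ALL, LC_* and LANG, which as noted above is not portable beyond POSIX), and it assumes that assigning to ENVIRON in gawk/master amounts to a setenv() call. The names `ctype_name`, `locale_probe` and `probe_lc_all` are hypothetical. The process first adopts the locale from its environment, then changes LC_ALL: the LC_CTYPE locale reported by setlocale(LC_CTYPE, NULL) stays the same until setlocale(LC_ALL, "") is called again.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() and strdup() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a copy of the current LC_CTYPE locale name.
   setlocale(cat, NULL) only queries; its result may be overwritten by
   later setlocale() calls, hence the strdup(). */
static char *ctype_name(void) {
    const char *name = setlocale(LC_CTYPE, NULL);
    return strdup(name ? name : "");
}

/* Hypothetical record of the three observations (caller frees fields). */
struct locale_probe {
    char *at_start;        /* locale taken from the environment at startup */
    char *after_setenv;    /* after changing LC_ALL in the environment only */
    char *after_setlocale; /* after asking the C library to re-read it */
};

static struct locale_probe probe_lc_all(const char *value) {
    struct locale_probe p;

    /* A locale-aware program adopts the user's locale once, at startup. */
    setlocale(LC_ALL, "");
    p.at_start = ctype_name();

    /* Assumed analog of ENVIRON["LC_ALL"] = value in gawk/master:
       only the process environment changes. */
    setenv("LC_ALL", value, 1);
    p.after_setenv = ctype_name();     /* still the same as at_start */

    /* setlocale(LC_ALL, "") consults LC_ALL, LC_* and LANG again. */
    setlocale(LC_ALL, "");
    p.after_setlocale = ctype_name();  /* now `value`, if that locale exists */

    return p;
}
```

For example, probe_lc_all("C") run with LANG=en_US.UTF-8 (and LC_ALL, LC_CTYPE unset, with that locale installed) should report en_US.UTF-8 in the first two fields and C only in the third, which matches the behavior Hermann saw.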


Thanks. I am using gawk/master, where NEWS says:

1. If not in POSIX mode, changes to ENVIRON are reflected into
   gawk's environment, affecting any programs run by system()
   or for piped redirections. This can also affect built-in routines,
   such as mktime(), which is typically influenced by the TZ
   environment variable.
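The TZ case works where the locale case does not because, on POSIX systems, the time-conversion functions re-read TZ themselves: mktime() is specified to set time zone information as though it called tzset(). The following C sketch shows that contrast; the helper name `epoch_midnight_in` is hypothetical, and "UTC0" and "EST5" are POSIX-style TZ strings chosen for illustration (the message itself uses "UTC").

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() and tzset() */
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: seconds since the Epoch for local midnight,
   1 January 1970, in the time zone named by `tz`. */
static time_t epoch_midnight_in(const char *tz) {
    struct tm t = {0};
    t.tm_year = 70;   /* years since 1900 */
    t.tm_mon = 0;     /* January */
    t.tm_mday = 1;
    t.tm_isdst = 0;   /* neither zone used here has daylight saving time */

    setenv("TZ", tz, 1);  /* change only the environment ... */
    tzset();              /* ... and re-read it; POSIX mktime() does this implicitly */
    return mktime(&t);
}
```

With this sketch, epoch_midnight_in("UTC0") gives 0 and epoch_midnight_in("EST5") gives 18000, because local midnight in a zone five hours west of UTC is 05:00 UTC. The explicit tzset() call is redundant on POSIX systems but makes the re-read visible; no comparable implicit re-read exists for LANG or LC_ALL.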

# This works as expected (my locale is en_US.UTF-8)
$ LC_ALL=C awk -f myscript.awk ...

Hermann