bug#20822: environment mangled by locale

From: Zefram
Subject: bug#20822: environment mangled by locale
Date: Sun, 26 Jun 2016 11:33:49 +0100

Mark H Weaver wrote:
>                                           by convention they are
>supposed to be encoded in the locale encoding.

This convention is bunk.  The encoding aspect of the locale system is
fundamentally broken: the model is that every string in the universe
(every file content, filename, command line argument, etc.) is encoded
in the same way, and the locale environment variable tells you which
universe you're in.  But in the real universe, files, filenames, and so
on turn up encoded how their authors liked to encode them, and that's
not always the same.  In the real universe we have to cope with data
that is not encoded in our preferred way.

>                                             If that convention is
>violated, I don't see what a program could do about it.

If the convention is violated, then there is some difficulty in presenting
correctly-encoded (or even consistently-encoded) output to the user, but
it is not insuperable.  Perhaps the program knows by some non-locale means
how a string is encoded, and can explicitly convert.  Perhaps it doesn't
know the real encoding, but can trust that the user will understand the
octet string if it is passed through with neither decoding of input nor
encoding for output.  Or perhaps the program doesn't need to put the
string into textual output at all, but only to use it in some API or file
format that's expecting an encodingless octet string.

So there are many things a program can reasonably do about it, and which
one to do depends on the application.
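
As a rough illustration of the first two options, here is a minimal
Python sketch.  The function names (convert_known, pass_through) are
made up for this example, and it assumes the string reaches the program
as a bytes object rather than as already-decoded text.

```python
import sys


def convert_known(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Option 1: the real encoding is known by some non-locale means,
    so convert explicitly to whatever the output side expects."""
    return raw.decode(source_encoding).encode(target_encoding)


def pass_through(raw: bytes, out=None) -> None:
    """Option 2: the encoding is unknown, so write the octets unchanged,
    with neither decoding of input nor encoding for output."""
    if out is None:
        out = sys.stdout.buffer  # binary stream; bypasses the locale codec
    out.write(raw + b"\n")
```

For instance, convert_known(b"caf\xe9", "latin-1", "utf-8") re-encodes
a Latin-1 string for a UTF-8 terminal, whereas pass_through(b"caf\xe9")
leaves the interpretation to the user.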

>Can someone show me a realistic example of how this would be used in

Looking specifically at environment variables: an environment
variable could give the name of a file that is to be consulted under
specified circumstances, and the right file may happen to have a name
that is inconsistent with the encoding used by the user's terminal.
(The filename is not required for output; it only needs to be passed as
an uninterpreted octet string to the open(2) syscall.)  An environment
variable could specify a Unicode-using name of a language module to be
loaded, while the user doesn't otherwise use Unicode, or doesn't use
an encoding encompassing enough of it.  (Name not required on output,
again; will be either transformed into a filename or looked up in a file
format that specifies its own encoding.)  The program could be env(1), not
interpreting the environment but needing to output the octets correctly.
The program could be saving an uninterpreted environment, for a cron
job to later run some other program with equivalent settings.
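
A hedged Python sketch of three of these cases follows.  It assumes a
Unix-like system where Python exposes os.environb (that is, where
os.supports_bytes_environ is true); the function names and the
NUL-separated save format are invented for this illustration and are
not anything a particular program uses.

```python
import os


def open_from_env(var: bytes):
    """Open the file named by an environment variable, handing the name
    to the OS as an uninterpreted octet string.  Returns None if unset."""
    name = os.environb.get(var)
    if name is None:
        return None
    return open(name, "rb")  # a bytes path is passed through undecoded


def dump_environment(out) -> None:
    """env(1)-like: write NAME=value lines to a binary stream, octets intact."""
    for key, value in os.environb.items():
        out.write(key + b"=" + value + b"\n")


def save_environment(path) -> None:
    """Save the environment uninterpreted, as NUL-terminated entries
    (NUL cannot occur inside an environment string, unlike newline)."""
    with open(path, "wb") as f:
        for key, value in os.environb.items():
            f.write(key + b"=" + value + b"\0")


def load_environment(path) -> dict:
    """Read back what save_environment wrote, as a bytes-to-bytes dict."""
    with open(path, "rb") as f:
        data = f.read()
    return dict(entry.split(b"=", 1) for entry in data.split(b"\0") if entry)
```

A later job could pass the dict from load_environment as the env
argument of subprocess.run, which accepts bytes keys and values on
Unix, without any step in between having decoded the strings.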

