locale-dependent token separator handling doesn't work in multi-byte locales
Wed, 8 Oct 2014 15:52:24 +0100
When bash parses code it honours the "blank" character class in
the current locale as token separator.
For instance, if "x" is a blank character in the current locale,
would output bar. "yash" is the only other shell that I know
that does the same.
With bash, that only works in single-byte locales though,
probably because bash calls isblank() on individual bytes
rather than on characters.
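To make the byte-versus-character distinction concrete, here is a
minimal sketch (byte_count is a hypothetical helper, and this is not
taken from the bash source). It shows that one character in a
multi-byte encoding is several bytes, so a per-byte isblank() would
classify each byte separately and never see the character as a whole.

```shell
# Hypothetical helper: number of bytes in a string, whatever the locale.
byte_count() {
  # wc -c counts bytes; tr removes the padding some wc implementations add.
  printf '%s' "$1" | wc -c | tr -d ' '
}

# U+00E9 encoded in UTF-8 (octal escapes, so no locale is involved).
two_byte_char=$(printf '\303\251')

byte_count "$two_byte_char"   # 2: a byte-wise test sees two units, not one
```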
I would also question the usefulness of such a feature.
That's what aggravated CVE-2014-0475 (a glibc vulnerability). By
creating a locale where every character except "s" "h" and a few
others were blanks, one could do LC_ALL=../../my/evil/locale ssh
address@hidden and interpreting the /etc/bash.bashrc as
shipped with some GNU/Linux distributions was enough to get a
shell on the git server (provided you were able to upload the
locale to the server).
That also means that the script syntax also depends on the
From a review of the available locales on my GNU system, I
couldn't find a single locale where "blank" is anything but
space and tab.
The only locales where more blank characters are defined are the
So removing that feature would not break anything.
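A review like that could be approximated from the shell along the
following lines. This is only a sketch under assumptions: blank_bytes
is a hypothetical helper, it relies on the shell's own [[:blank:]]
pattern matching, and it only probes single-byte values 1 to 255, so
it says nothing about multi-byte blanks. It is not necessarily how the
review above was done.

```shell
# Hypothetical helper: list the byte values (decimal) that match
# [[:blank:]] in the locale given as first argument.
blank_bytes() {
  (
    # Subshell so the locale change does not leak to the caller.
    LC_ALL=$1
    export LC_ALL
    out=
    i=1
    while [ "$i" -le 255 ]; do
      # Build the byte from its octal value; %b understands \0ooo.
      c=$(printf '%b' "\\0$(printf '%03o' "$i")")
      case $c in
        [[:blank:]]) out="$out${out:+ }$i" ;;
      esac
      i=$((i + 1))
    done
    printf '%s\n' "$out"
  )
}

blank_bytes C   # 9 32 (tab and space)

# To survey every installed locale (can be slow):
# for loc in $(locale -a); do
#   printf '%s: %s\n' "$loc" "$(blank_bytes "$loc")"
# done
```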
There's a similar issue in what is allowed in variable names:
$ address@hidden bash -c $'declare St\xe9phane=1'
$ LC_ALL=fr_FR.UTF-8 bash -c $'declare St\u00e9phane=1'
bash: ligne 0 : declare: « Stéphane=1 » : identifiant non valable
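The two invocations above hand bash different byte sequences for the
same letter: \xe9 is a single byte, while \u00e9 expanded in a UTF-8
locale is two bytes. A small sketch to make that visible (hex_bytes is
a hypothetical helper, and octal escapes are used so the result does
not depend on the locale of the shell running it):

```shell
# Hypothetical helper: print the bytes of a string as hex, space-separated.
hex_bytes() {
  # od dumps the bytes; xargs collapses od's padding and line breaks.
  printf '%s' "$1" | od -An -tx1 | xargs
}

# U+00E9 as a single-byte (ISO-8859-style) character.
hex_bytes "$(printf '\351')"       # e9

# U+00E9 as encoded in UTF-8.
hex_bytes "$(printf '\303\251')"   # c3 a9
```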
Here, removing the feature might break scripts written for
single-byte non-ASCII locales, but given that most of the world
is switching to UTF-8, that seems unlikely: otherwise we'd have
seen reports of the problem before.
yash, zsh and ksh93 do support
LC_ALL=fr_FR.UTF-8 zsh -c $'St\u00e9phane=1'
in multi-byte locales. But again, I'd say it's not necessarily useful.
It's nice to be able to use my first name as a variable name,
but making the parsing of a script depend on its *user*'s (as
opposed to *author*'s) locale is not ideal.
Maybe there's a better way to address that.
In any case, I think the feature should either be fixed (make it
also work in multi-byte locales), or the limitation (that it
only works in single-byte locales) documented, or the feature
(make the parsing locale-dependent) removed.
Stephane Chazelas