Re: locale-dependent token separator handling doesn't work in multi-byte locales
Wed, 8 Oct 2014 17:36:03 +0100
2014-10-08 09:17:18 -0600, Eric Blake:
> I would argue that locale-dependent parsing is probably a bug waiting to
> happen, and would be in favor of removing the feature and forcing the
> use of the C locale for the duration of parsing a script. Yes, that
> means you can't write a variable name with non-ASCII characters, but as
> you've demonstrated, running such a script in a different locale than
> where it was written raises too many issues about what should happen.
Note that that's not the only problem. ksh arithmetic honours
the locale's decimal point (which, by the way, conflicts with
the "," operator when it is ","), and of course there's a
problem with character ranges and classes.
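To illustrate the character range side of this, here is a minimal
sketch (the function name is_ascii_lower is made up for the
example). In some non-C locales a range like [a-z] may match more
than the 26 ASCII lowercase letters, so the sketch pins the C
locale inside a subshell before doing the match:

```shell
# Hypothetical helper: succeed only if $1 is non-empty and made up
# solely of the ASCII letters a to z, whatever the caller's locale.
# The function body is a subshell, so the LC_ALL assignment does not
# leak into the rest of the script.
is_ascii_lower() (
    LC_ALL=C
    case $1 in
        ''|*[!a-z]*) false ;;   # empty, or contains something outside a-z
        *)           true  ;;
    esac
)

if is_ascii_lower "hello"; then
    echo "hello: ASCII lowercase only"
fi
```

This only addresses pattern matching done by the shell itself; it
says nothing about how the script text is tokenised in the first
place.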
The problem is that the shell and the utilities are used both as
tools by the user and as building blocks in the language used to
write scripts, which means there's a conflict there.
> This may also be the sort of question worth asking the Austin Group
> about, to see if POSIX should be tightened on this front.
I agree, better before Chet starts to work on it. Chet says
that it's a POSIX requirement to have code parsed according to
locale and that's also my understanding after reading:
Given that yash is the only conformant shell in that regard
(though it has issues, IIRC), there's not much point POSIX requiring