bug-gnulib
Re: quotearg improvements [was: filenames in error messages]


From: Eric Blake
Subject: Re: quotearg improvements [was: filenames in error messages]
Date: Wed, 13 Feb 2008 20:57:51 -0700

According to Bruno Haible on 2/13/2008 8:13 PM:
| Sorry, but you lost me here. Where did the C trigraphs come into play?

Because the quotearg module _already_ did trigraph quoting (try
ls --quoting-style=c for an example).  The question is whether the new
c_maybe style (or if we come up with a better name for it), designed for
use in unambiguous error message output, should continue using that
trigraph code or ditch it.  I think the consensus is to ditch it by
default, although it might still be worth leaving the option in the code
to provide it (quotearg, as a module, is useful for more than just error
messages).
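
For readers who have not run into it, the trigraph handling in question
can be sketched roughly as below.  This is a minimal illustration and not
the quotearg source: split_trigraphs is a hypothetical helper, and it
assumes the technique of closing and reopening the string literal between
the two question marks, which is how I understand the c style to behave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib.  Copy SRC into DST as the
   body of a C string literal, inserting "" between the two question
   marks of any trigraph so that a compiler with trigraph support would
   not reinterpret it.  Return the length written, or (size_t) -1 if
   DST (of DSTSIZE bytes) is too small.  */
static size_t
split_trigraphs (char *dst, size_t dstsize, const char *src)
{
  size_t n = 0;

  assert (dst != NULL && src != NULL);
  for (size_t i = 0; src[i] != '\0'; i++)
    {
      /* A trigraph is two question marks followed by one of nine
         characters; the short-circuit order avoids reading past NUL.  */
      int is_trigraph = (src[i] == '?' && src[i + 1] == '?'
                         && src[i + 2] != '\0'
                         && strchr ("!'()-/<=>", src[i + 2]) != NULL);
      size_t need = is_trigraph ? 3 : 1;

      if (n + need >= dstsize)
        return (size_t) -1;
      dst[n++] = src[i];
      if (is_trigraph)
        {
          /* Close and reopen the literal after the first '?'.  */
          dst[n++] = '"';
          dst[n++] = '"';
        }
    }
  if (n >= dstsize)
    return (size_t) -1;
  dst[n] = '\0';
  return n;
}
```

With this sketch, a name made of "what", two question marks and an
exclamation mark comes out as what?""?! which is safe to paste into C
source but noisy in an error message.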

|> For C strings, the code already outputs \a, \b, \f, \n, \r, \t, \v, \\,
|> \"; and for all other non-printable characters, a 3-digit \nnn octal
|
| So you want to escape, in an UTF-8 locale, all non-ASCII characters or
| bytes?
| So that a Japanese user, for an error in file をつけた時でも, gets to read
| \343\202\222\343\201\244\343\201\221\343\201\237\346\231\202\343\201\247\343\202\202
| ?

No.  The existing quotearg code was already locale-dependent, and tries
its hardest to recognize valid multibyte sequences as printable.  It only
prints an octal escape for invalid multibyte sequences and/or nonprintable
characters, according to the current locale's notion of printable.
However, in the C locale, the notion of what is printable is fuzzy and
varies from machine to machine.  I am often annoyed that on Cygwin, where
there is no locale besides C, isprint(0xc0) is false, even though the byte
renders in the terminal as a single-byte printable character (accented A,
as if by iso-8859-1).  To date, I've simply maintained a Cygwin-specific
patch to quotearg that treats all characters above 0x80 as printable, even
when the C locale claims otherwise.
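
The locale-dependent behavior described above can be sketched along these
lines.  This is only an illustration under stated assumptions, not the
quotearg implementation: escape_unprintable is a hypothetical name, it
relies on the standard mbrtowc and iswprint functions, and it expects the
caller to have already selected a locale (for example with
setlocale (LC_ALL, "")).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of gnulib.  Copy SRC to DST, keeping
   every valid multibyte character that the current locale considers
   printable, and replacing each remaining byte with a 3-digit octal
   escape.  Return the length written, or (size_t) -1 if DST (of
   DSTSIZE bytes) is too small.  */
static size_t
escape_unprintable (char *dst, size_t dstsize, const char *src)
{
  mbstate_t state;
  size_t len, n = 0, i = 0;

  assert (dst != NULL && src != NULL);
  len = strlen (src);
  memset (&state, 0, sizeof state);
  while (i < len)
    {
      wchar_t wc;
      size_t m = mbrtowc (&wc, src + i, len - i, &state);
      int printable = (m != (size_t) -1 && m != (size_t) -2 && m != 0
                       && iswprint ((wint_t) wc));

      if (printable)
        {
          /* Copy the whole multibyte character unchanged.  */
          if (n + m >= dstsize)
            return (size_t) -1;
          memcpy (dst + n, src + i, m);
          n += m;
          i += m;
        }
      else
        {
          /* Invalid, incomplete, or nonprintable: escape one byte,
             then reset the conversion state and resynchronize.  */
          if (n + 4 >= dstsize)
            return (size_t) -1;
          snprintf (dst + n, 5, "\\%03o",
                    (unsigned int) (unsigned char) src[i]);
          n += 4;
          i += 1;
          memset (&state, 0, sizeof state);
        }
    }
  if (n >= dstsize)
    return (size_t) -1;
  dst[n] = '\0';
  return n;
}
```

In a UTF-8 locale a sketch like this would pass the Japanese file name
through untouched.  What it does with a byte such as 0xc0 in the C locale
depends on the platform, which is exactly the fuzziness I am complaining
about.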

|
| This is far, far away from the original goal, and also neglects the
| principle of minimal surprise. I mean, if the goal is to solve
| ambiguities, then please add enough escapes to solve ambiguities, but
| not more than that!

OK - then I think we're settled here - since we are using "" on the
outside of ambiguous strings, we do not need to worry about quoting most
remaining shell special characters.  Space, ?, (), [], {}, |, etc. can all
be output as-is - with no change to the quotearg module.
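
To make the "quote only when ambiguous" idea concrete, here is a rough
sketch.  Everything in it is an assumption for illustration: the names
needs_quotes and quote_maybe are hypothetical, and the rule for what
counts as ambiguous (an empty name, leading or trailing space, a double
quote, a backslash, or a control character) is my guess at a reasonable
policy, not what quotearg does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: return nonzero if S would be ambiguous when
   printed bare inside an error message.  */
static int
needs_quotes (const char *s)
{
  size_t len = strlen (s);

  if (len == 0 || s[0] == ' ' || s[len - 1] == ' ')
    return 1;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) s[i];
      if (c == '"' || c == '\\' || iscntrl (c))
        return 1;
    }
  return 0;
}

/* Hypothetical helper, not part of gnulib.  Copy SRC to DST verbatim
   when it is unambiguous; otherwise wrap it in "" and escape only the
   double quote, the backslash, and control characters.  Return 0 on
   success, -1 if DST (of DSTSIZE bytes) is too small.  */
static int
quote_maybe (char *dst, size_t dstsize, const char *src)
{
  size_t n = 0;

  assert (dst != NULL && src != NULL);
  if (!needs_quotes (src))
    {
      size_t len = strlen (src);
      if (len >= dstsize)
        return -1;
      memcpy (dst, src, len + 1);
      return 0;
    }
  if (dstsize < 3)
    return -1;
  dst[n++] = '"';
  for (const unsigned char *p = (const unsigned char *) src; *p; p++)
    {
      /* Conservative bound: a 4-byte escape, closing quote, and NUL.  */
      if (n + 6 > dstsize)
        return -1;
      if (*p == '"' || *p == '\\')
        {
          dst[n++] = '\\';
          dst[n++] = (char) *p;
        }
      else if (iscntrl (*p))
        {
          snprintf (dst + n, 5, "\\%03o", (unsigned int) *p);
          n += 4;
        }
      else
        dst[n++] = (char) *p;   /* space, ?, (), [], {}, | pass through */
    }
  dst[n++] = '"';
  dst[n] = '\0';
  return 0;
}
```

The point of the sketch is the last branch: once the outer "" are in
place, shell metacharacters need no further treatment.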

--
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden



