
Re: [bug-gnulib] quote characters in stds

From: Karl Berry
Subject: Re: [bug-gnulib] quote characters in stds
Date: Tue, 7 Jun 2005 14:31:32 -0400

    This is misleading.

I know, but I'm not sure what to say.  Just delete the sentence about
Latin1, maybe?  I guess it's not really necessary.

    To represent them, you need Unicode, i.e. the UTF-8 encoding.

Yes, but rms has explicitly rejected (in previous email with me) the
idea of recommending the use of UTF-8 in any context whatsoever.  Sigh.

    This is not true for several years now. 

Well, whether or not it is true, rms will not accept it, so there's no
sense arguing it here.

My personal experience is that it is true that Unicode is still
considerably less widely usable than Latin1.  Sure, Unicode is available
in many contexts and systems.  But the names in your message, just for
example, came through as garbage to me.  No doubt I personally could
eventually configure everything involved to display it properly, but the
point is that it doesn't "just work".  And I suspect I am using far
newer versions of everything than an "average" user.

    PS: The right spelling of the encodings is "Latin1" (no dash, no space)

I'm glad to know that; it's easier to type than @tie{} :).  I had mostly
seen it with a space.  Do you happen to know where the definitive
spelling is given?  I've poked around the ISO site without success.

Another draft below.  I'm not quite sure why ` would ever be
"unacceptable", and I'm a bit skeptical that it will pass muster with
rms, but I'm trying to avoid an argument with standards-mavens.  And gcc
4 already does '...'.  Any improved wording and/or backup facts welcome :).


@node Quote characters
@section Quote characters
@cindex quote characters

In the C locale, GNU programs should stick to plain ASCII for
quotation characters in messages to users: preferably 0x60 (`) for
left quotes and 0x27 (') for right quotes.  If using ` is unacceptable
in your application, other possibilities are using ' for both opening
and closing, or " (0x22) for both opening and closing.  It is ok, but
not required, to use locale-specific quotes in other locales.
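
To make the three ASCII options above concrete, here is a minimal
sketch in C.  The names @code{ascii_quote} and @code{enum quote_style}
are hypothetical, made up for this illustration; they are not part of
Gnulib or any other GNU library.

```c
#include <stdio.h>

/* Hypothetical names, for illustration only.  */
enum quote_style
{
  QUOTE_GRAVE,       /* `...'  0x60 to open, 0x27 to close (preferred) */
  QUOTE_APOSTROPHE,  /* '...'  0x27 for both */
  QUOTE_DOUBLE       /* "..."  0x22 for both */
};

/* Write NAME into BUF, surrounded by the ASCII quote characters for
   STYLE.  Returns what snprintf returns: the length the full result
   needs, excluding the terminating NUL.  */
int
ascii_quote (char *buf, size_t size, enum quote_style style,
             const char *name)
{
  char left = (style == QUOTE_GRAVE ? '`'
               : style == QUOTE_APOSTROPHE ? '\'' : '"');
  char right = (style == QUOTE_DOUBLE ? '"' : '\'');
  return snprintf (buf, size, "%c%s%c", left, name, right);
}
```

For instance, @code{ascii_quote (buf, sizeof buf, QUOTE_GRAVE, "foo.c")}
leaves the seven bytes of @code{`foo.c'} in @code{buf}.  This sketch
does nothing about a @code{name} that itself contains a quote
character.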

The @uref{http://www.gnu.org/software/gnulib/, Gnulib} @code{quote}
and @code{quotearg} modules provide a reasonably straightforward way
to support locale-specific quote characters, as well as taking care of
other issues, such as quoting a filename that itself contains a quote
character.  See the Gnulib documentation for usage details.
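
As a rough idea of the "filename that itself contains a quote
character" problem, the sketch below backslash-escapes embedded
apostrophes and backslashes.  It is only an assumption about one
possible approach: @code{quote_escaped} is a hypothetical name, and
this is not the Gnulib code, which handles locales and many more
cases.

```c
#include <stddef.h>

/* Hypothetical sketch, not the Gnulib implementation.  Copy SRC into
   DST between apostrophes, putting a backslash before each embedded
   apostrophe or backslash.  Output is truncated to fit SIZE bytes and
   is NUL-terminated whenever SIZE > 0.  Returns the number of bytes
   the full result needs, excluding the terminating NUL.  */
size_t
quote_escaped (char *dst, size_t size, const char *src)
{
  size_t n = 0;

/* Store C if there is still room for it plus a NUL; always count it.  */
#define PUT(c) do { if (n + 1 < size) dst[n] = (c); n++; } while (0)

  PUT ('\'');
  for (; *src; src++)
    {
      if (*src == '\'' || *src == '\\')
        PUT ('\\');
      PUT (*src);
    }
  PUT ('\'');

#undef PUT

  if (size > 0)
    dst[n < size ? n : size - 1] = '\0';
  return n;
}
```

With this sketch, the four-character name @code{it's} comes out as the
seven bytes @code{'it\'s'}, so a program parsing the output can tell
where the quoted name ends.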

In any case, the documentation for your program should clearly specify
how it does quoting, if different from the preferred method of ` and
'.  This is especially important if the output of your program is ever
likely to be parsed by another program.

ASCII should also be preferred in source code comments, text
documents, and other contexts, unless there is good reason to do
something else because of the domain at hand.

If you need to use non-ASCII characters, for example to represent
names of contributors, you should normally stick with one encoding, as
one cannot in general mix encodings reliably.  

Quotation characters are a difficult area in the computing world at this
time: there are no true left or right quote characters in ASCII, or even
Latin1 (the ` character we use is standardized as a grave accent).
Latin1 does have paired standalone accents, but it seems wrong in
principle to abuse them as quotes.  And even Latin1 is not universally

Unicode contains the unambiguous quote characters required, and its
common encoding UTF-8 is upward compatible with ASCII@.  But Unicode
and UTF-8 are deployed even less widely than Latin1; it would be
premature to require Unicode support for running essentially every GNU

Perhaps the prevailing situation will change in a few years, and then
we will revisit this.
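
To illustrate the remark above about UTF-8 and ASCII, here is a small
sketch of a UTF-8 encoder in C.  The name @code{utf8_encode} is
hypothetical, chosen for this example.  It shows that a code point
below 0x80 is stored as the same single byte as in ASCII, while the
Unicode quotation marks need several bytes.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.  Encode the Unicode
   code point CP as UTF-8 into OUT.  Returns the number of bytes
   written (1 to 4), or 0 if CP is a surrogate or is above U+10FFFF.  */
size_t
utf8_encode (unsigned long cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      /* ASCII range: the byte is the code point itself.  */
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;  /* Surrogates are not valid in UTF-8.  */
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp < 0x110000)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Encoding 0x60 (the grave accent) yields the single byte 0x60, whereas
U+2018 (left single quotation mark) yields the three bytes E2 80 98
and U+2019 (right single quotation mark) yields E2 80 99.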
