RE: ASCII-only startup message?

From: Drew Adams
Subject: RE: ASCII-only startup message?
Date: Sun, 27 Dec 2015 14:47:04 -0800 (PST)

> > Or consider character HYPHEN-MINUS (U+002D), character HYPHEN
> > (U+2010), and character MINUS SIGN (U+2212).
> >
> > You might say that the first of these is analogous to the ASCII
> > apostrophe (U+0027) - it is essentially for compatibility.
> Yes, that is true, but not for compatibility between "apostrophe" and
> "right single quotation mark" as that imagined argument continues in
> your post, but for compatibility between "left single quotation mark"
> and "right single quotation mark" as well as less common characters
> like "prime".

Huh?  The Unicode _name_ of character U+0027 is... "APOSTROPHE".
And the Unicode "old name" of it is "APOSTROPHE-QUOTE".

Claiming that Unicode intends this character only for compatibility
between "left single quotation mark", "right single quotation mark",
and less common characters like "prime", and NOT for compatibility
between "apostrophe" and "right single quotation mark" is, well,
imaginative.  Where do you get that notion?
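
For anyone who wants to check the names for themselves, here is a
minimal sketch using Python's standard unicodedata module.  The
helper name char_name is my own invention for illustration.  Note
that this module reports only the current character names, not the
Unicode 1.0 "old name" mentioned above.

```python
import unicodedata

def char_name(code_point):
    """Return the current Unicode character name for a code point."""
    return unicodedata.name(chr(code_point))

# The code points discussed in this thread.
for cp in (0x0027, 0x2019, 0x02BC, 0x002D, 0x2010, 0x2212):
    print(f"U+{cp:04X}  {char_name(cp)}")
```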


And then there is this, which echoes the point I made that an
apostrophe _is not_ a closing quotation mark.


(cited here, BTW: 

  Using U+2019 is inconsistent with the rest of the standard

  Earlier in section 6.2, the standard explains the difference
  between punctuation marks and modifier letters:

  Punctuation marks generally break words; modifier letters
  generally are considered part of a word.  Consider any English
  word with an apostrophe, e.g. “don’t”.

  The word “don’t” is a single word. It is not the word “don”
  juxtaposed against the word “t”. The apostrophe is part of the
  word, which, in Unicode-speak, means it’s a modifier letter,
  not a punctuation mark, regardless of what colloquial English
  calls it.

  According to the Unicode character database, U+2019 is a
  punctuation mark (General Category = Pf), while U+02BC is a
  modifier letter (General Category = Lm).  Since English
  apostrophes are part of the words they’re in, they are
  modifier letters, and hence should be represented by U+02BC,
  not U+2019.
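
The practical effect of that category difference can be sketched
in Python.  This is only an illustration under stated assumptions:
the helper names category and words are hypothetical, and the
tokenizer is a plain \w+ regexp rather than a full implementation
of Unicode word segmentation.  Python's \w matches letters
(including category Lm) but not punctuation, so it roughly mirrors
the "punctuation marks break words" rule quoted above.

```python
import re
import unicodedata

def category(code_point):
    """Return the Unicode General Category of a code point."""
    return unicodedata.category(chr(code_point))

def words(text):
    """Split text into runs of word characters (a naive tokenizer)."""
    return re.findall(r"\w+", text)

print(category(0x2019))    # right single quotation mark
print(category(0x02BC))    # modifier letter apostrophe
print(category(0x0027))    # ASCII apostrophe

print(words("don\u2019t"))  # splits at the punctuation mark
print(words("don\u02BCt"))  # stays a single word
print(words("don't"))       # ASCII apostrophe also splits
```

Note that the ASCII apostrophe is itself punctuation (category Po),
so a naive tokenizer splits on it as well.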

And this, which makes a somewhat different argument:
It refers to the previous argument thus:

  Were there no modifier letters at all, Unicode had have to
  introduce an apostrophe character, because an apostrophe is
  not at all the same as a quotation mark and does not work the
  same way neither.  By handling text, not theories, Ted Clancy
  at Mozilla clearly shows us that ambiguating the apostrophe
  with a close-quote brings up counterproductive complications
  that impact severely the productivity of the users.

Reply: https://www.mail-archive.com/address@hidden/msg35851.html

And this URL provides a history of the move from U+02BC to U+2019:

It points out that this move was so odd that it required the
invention of the word "ambiguation" to cover the confusion.
The same article suggests that the Unicode Consortium itself
"is not at ease with the new preference".

  A search in the Mail Archives shows why the apostrophe and the
  single close quote were ambiguated—a process that needs even a
  new word to put on it, as ordinarily everybody works for
  disambiguation. It was for simplification's sake, in word
  processing software.

Simplification for word-processing software!  Aka MS Word and
its notorious misuse of _left_ single quotation mark for things
like "‘Tis the season" (it should be "’Tis"):

  The phenomenon called “the Apostrophe Catastrophe” consists in
  a huge number of instances where text processing software (word
  processor, desktop publishing) inserts an open quote instead of
  a leading apostrophe.

Interestingly, a similar discussion surrounds the use of hyphen:

  But luckily, the miscategorisation of U+2010 hasn't led to any
  pressing practical problems, unlike the misuse of U+2019 for the

This discussion, BTW, is from _2015_, 16 years after the Unicode
decision to switch from using U+02BC to using U+2019 as apostrophe.
Still problematic, it would seem.  Certainly not cut-and-dried.


To be clear, I am NOT arguing that _Emacs_ should use U+02BC
instead of U+2019 as apostrophe.  I argue that Emacs should
(continue to) use U+0027 (ASCII apostrophe) as apostrophe (in its
own doc, *scratch* comments, and so on).  Not because it is a
more genuine apostrophe but because it is much easier for users
(and programs) to work with.
