emacs-devel

RE: How to opt out of curly-quote spamming altogether?

From: Drew Adams
Subject: RE: How to opt out of curly-quote spamming altogether?
Date: Mon, 24 Aug 2015 14:44:19 -0700 (PDT)

> > Typically, presentation is a separate layer or process, and the
> > same structure/content can be, by choice, presented in different
> > ways (for different media, among other things).  Inline code is
> > typically presented using a fixed-width font, such as Courier, as
> > opposed to ordinary text, which is typically presented using a
> > proportional font.  Glossary terms might be presented using bold
> > or colored text, perhaps linked to a glossary entry.  And so on.
> >
> > Anyone used to LaTeX or TeX is used to this separation, for
> > example.
>
> That's an interesting statement since plain TeX does not in any
> manner provide semantic commands (it switches to a typewriter font when
> using verbatim but the reason for that is quite banally that normal text
> fonts are not able to print all ASCII characters as they use, say, text
> quotes instead of ` and ' characters and some other, more glaring
> substitutions).
>
> Plain TeX does not even have an \em command for emphasizing things.
> You need to decide yourself whether to use italics or boldface or
> underlining or whatever.

Sorry, I should have stuck with LaTeX as the example.  It is LaTeX
that I am (used to be, many moons ago) familiar with (as a user only).

> LaTeX tries to be a bit more semantic, but the sort of differentiation
> that Texinfo requires would require loading quite a number of non-
> core packages.

I didn't mean to suggest that we should use LaTeX (or TeX) - I hadn't
considered that.  I meant it only as an example, which I thought some
people here might be familiar with, of separation of presentation from
structure.

A better example is XML-based doc.  The structure is what it is, and
you can render/present it in any number of ways.  Likewise, HTML.

> > I'm surprised if Texinfo/makeinfo does not provide for it - if an
> > inline code snippet or key binding necessarily ends up with a
> > presentation that is identical to ordinary text quoting (curly
> > quotes, whether single or double).
>
> Texinfo is primarily semantic markup.  Various backends decide how
> to typeset things.

That's what I thought.  And makeinfo is what creates the Info
presentation we use.  But I have only a vague idea of these things.

> In its text mode, plain TeX as well as texinfo.tex convert ` and '
> characters into proper English symmetric quote marks (the respective
> default _text_ fonts do not have a straight quote mark or a backquote
> in their corresponding character slots).

OK, but that is not really the point here.  The point is that the
pre-presentation way we represent inline code (or URLs or emphasis
or whatever structural/semantic elements) would/should ideally not be
` and ' (or curly quotes, a fortiori).  Why?  For the reason that Paul
and others (I think) and I have given: you can't tell when ` and ' are
used as markup (for structure) vs representing just themselves as chars.
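
To illustrate the ambiguity, here is a minimal Python sketch (not
anything Emacs or makeinfo actually does).  The function name
find_quoted_spans and the regexp are assumptions made up for this
example: a naive scanner that treats every `...' pair as markup.

```python
import re

# Hypothetical helper: treat every `...' pair as inline-code markup.
QUOTED = re.compile(r"`([^`']*)'")

def find_quoted_spans(text):
    """Return the text between each ` and the next ' that follows it."""
    return QUOTED.findall(text)

# Intended markup: the span is found correctly.
print(find_quoted_spans("Type `C-x C-f' to visit a file."))

# Literal characters: the scanner cannot know these are not markup,
# so it reports a bogus "code" span.
print(find_quoted_spans("The ` char and the ' char are both ASCII."))
```

The second call reports " char and the " as if it were code, which is
the failure described above.  Swapping the delimiters for curly quotes
would not help, since ordinary quoted text would then match too.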

In most doc contexts, some kind of unambiguous element or markup is
used to identify its argument text as of a particular kind (e.g. code).

But in Emacs, it is useful to have the representation of the structure
be something that users can directly search etc.  And it is useful if
it can be readable enough to serve more or less for presentation.  IOW,
any presentation-layer transformation should be kept small/minor, to be
able to take advantage of Emacs's ability to manipulate the plain-text
input (structure/content layer).  IOW, if possible, we don't really
want to be looking at, searching, etc., XML elements or similar.

And in that context, I personally think that `...' is a reasonable
(nearly brilliant) compromise.  We could throw some presentation on
top of it - I'm not against that.  But what we should not do, IMO, is
either (a) just replace it by curly-quote chars or (b) render it, as
presentation, using curly-quote chars.

(a) is because of the PITA of typing etc. such chars, but this is a
minor problem compared with (b), I think.  (b) is important, to me,
because curly quotes are used for ordinary text quoting.  Any char
should be allowed to represent itself.  The problems with ` and '
doing that apply also to curly quotes.  And curly quotes also have
a day job of quoting ordinary text, unlike ` and '.

I would rather have us use, say, highlighting, or a different font,
to set off such pieces of text (e.g. inline code), than to show
them as if they were ordinary quoted text.

But then, if we rendered inline code, URLs etc. by highlighting
the text or using a different font, how to search for or within or
outside such zones of text?

That could be implemented.  I have code that lets you search
within or outside of zones of text that have particular text
properties, for example, or are delimited using buffer positions
(e.g. markers).  And the same thing could be done for text zones
with a particular font (I have not done that, so far).
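
As a rough illustration of the idea (this is a Python sketch, not the
Emacs Lisp code mentioned above), zones can be modeled as half-open
(start, end) position pairs, and a search can keep only the hits that
fall inside a zone or entirely outside all zones.  The names find_all
and search_in_zones are hypothetical.

```python
def find_all(text, needle):
    """Return the start position of every occurrence of needle in text."""
    hits = []
    start = text.find(needle)
    while start != -1:
        hits.append(start)
        start = text.find(needle, start + 1)
    return hits

def search_in_zones(text, needle, zones, inside=True):
    """Filter matches by zone membership.

    zones is a list of half-open (start, end) pairs, standing in for
    stretches of text that carry some property (e.g. "inline code").
    With inside=True, keep matches fully contained in one zone;
    with inside=False, keep matches that overlap no zone at all.
    """
    result = []
    for pos in find_all(text, needle):
        end = pos + len(needle)
        contained = any(z0 <= pos and end <= z1 for z0, z1 in zones)
        overlaps = any(pos < z1 and z0 < end for z0, z1 in zones)
        if inside and contained:
            result.append(pos)
        elif not inside and not overlaps:
            result.append(pos)
    return result

text = "Use car to take the car of a list."
code_zones = [(4, 7)]  # assume only the first "car" is marked as code

print(search_in_zones(text, "car", code_zones))                # inside zones
print(search_in_zones(text, "car", code_zones, inside=False))  # outside zones
```

Here the first call finds only the "car" marked as code and the second
finds only the plain-text one, with no delimiter chars in the text.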

I'm not saying that we need to do something like this.  I'm saying
that this is preferable to a rendering/presentation that just
muddies the water by using ordinary text quoting (curly quotes).
We can do better - but not if we give up at the start and adopt
curly quotes as our markup and presentation.

Personally, I think that sticking with `...' is a reasonable
approach: simple, rarely ambiguous, supple wrt inputting the
chars, searching, etc.  But fixed-width font for such things,
and proportional font for ordinary text, would also be a
reasonable presentation.

> The proper representation in Unicode is the use of the English
> ‘quote marks’: those are the proper characters for the glyphs TeX
> and Texinfo use for text fonts in the slots for ` and '.
> Consequently, it is quite correct that those are the output for
> the preformatted Info pages.

See above.  It's not about translating ` and '.  Imagine that those
are not used in the input to start with, i.e., that we used other
markup to distinguish inline code, URLs etc.

That curly quotes are "proper representations in Unicode" of ` and '
is irrelevant.  We should not be asking how to represent ` and ',
but how to demark things like inline code fragments in a structural
layer and how to present them in a presentation layer.

Besides which, the "proper representation" of ` and ' in Unicode
is ` and '.  They are first-class Unicode citizens.
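
For anyone who wants to check, a short Python sketch using the standard
unicodedata module shows that the backquote and the apostrophe have
their own code points and names, distinct from the curly quotes.  The
helper name describe is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The backquote, the apostrophe, and the two curly single quotes.
for ch in ["`", "'", "\u2018", "\u2019"]:
    print(describe(ch))
```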