From: Eli Zaretskii
Subject: bug#20499: [PROPOSED PATCH] C-x 8 shorthands for curved quotes, Euro, etc.
Date: Wed, 06 May 2015 19:27:44 +0300

> Date: Wed, 06 May 2015 09:09:26 -0400
> From: Richard Stallman <address@hidden>
> CC: address@hidden, address@hidden
>   > >    > > Would admin/unidata/UnicodeData.txt do?
>   > > 
>   > > It doesn't do the job, because it doesn't contain the characters
>   > > themselves.
>   > You mean, the glyphs? 
> Yes, exactly.
>                          (It does show the codepoint, so you can easily
>   > display the character via "C-x 8 RET".)
> You mean, one character at a time?
> I want to be able to scan quickly through the buffer looking at
> lots of characters to find the one I want.  If I have to type
> a command for _each character_, just to see it, that is useless
> for the purpose.

Maybe I don't understand the use case you have in mind.  I thought the
use case was that you already know the character's name, at least
approximately, and want to look up its code, to type it faster.

> C-x 8 RET is even worse than that, because it requires
> _copying_ the name of the character.  To actually see the character
> point is on requires
> M-f C-f C-SPC C-s ; C-b M-w C-a C-x 8 RET C-y SPC

"C-x 8 RET" accepts the codepoint in hex, so if you are already
looking at the line that defines the character, all you need is to
type a 4-, sometimes 5-hex-digit number.
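
To illustrate the mapping from the hex field to the character, here is a
rough Python sketch (not Emacs code).  The sample record and the function
name char_from_unidata_line are assumptions made for the example; the
only thing relied upon is that UnicodeData.txt records are
semicolon-separated, with the hex codepoint first and the name second.

```python
# Sketch only: mimics the effect of typing a hex codepoint at a prompt.
# sample_line is an illustrative record in the UnicodeData.txt layout.
sample_line = "20AC;EURO SIGN;Sc;0;ET;;;;;N;;;;;"


def char_from_unidata_line(line):
    """Return (character, name) for one semicolon-separated record."""
    fields = line.split(";")
    codepoint = int(fields[0], 16)  # first field: codepoint in hex
    return chr(codepoint), fields[1]  # second field: character name


char, name = char_from_unidata_line(sample_line)
print(f"U+{ord(char):04X} {char} {name}")
```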

And if you want to type the name, "C-x 8 RET" provides completion, so
no need for such a complicated dance for copying the name.

> I could make that a keyboard macro and repeat it many times
> to get all these codes into the buffer.  It would take a long time.
> Furthermore, it would show only one character per line,
> so few characters would appear on the screen at any time.
> To look at them all would require lots of scrolling.

I don't really see how looking for a character with your eyes could be
a convenient feature, except in rare corner cases with a small
number of simple-looking characters.  Even for Latin characters, there
are many similar shapes, like Ả and Ă or Ő and Ố, and they are spread
all over the Unicode range.  How would you go about finding your
character, if all you have is some vague idea of its shape (which,
btw, could look quite different with different fonts)?  Sounds like a
very inefficient way to me.

I think we must assume the user has some idea about the character:
either its approximate name, or at least the block or script to which
it belongs.  Then we could display some reasonably manageable subset
of characters.  We could further help by asking about the base
character (the above examples have either A or O as their base
character), because if the user knows that, with some scripts the
number of potential candidates will go down drastically.  But even
when the base character is known, the number of candidates is not
negligible: e.g., there are 46 characters in the Unicode database that
are somehow related to A.
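
A minimal sketch of the "base character" idea, in Python rather than
Emacs Lisp, assuming that "related to" is narrowed to "has a canonical
decomposition that starts with the base character".  The function name
chars_with_base is made up for the example.  The count it prints depends
on the Unicode data shipped with Python and on that narrow definition,
so it need not match the figure above.

```python
import unicodedata


def chars_with_base(base):
    """Return characters whose full canonical decomposition (NFD)
    begins with BASE, excluding BASE itself."""
    found = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # skip surrogate code points
            continue
        ch = chr(cp)
        if ch != base and unicodedata.normalize("NFD", ch)[0] == base:
            found.append(ch)
    return found


related_to_a = chars_with_base("A")
print(len(related_to_a), "candidates, starting with",
      " ".join(f"U+{ord(c):04X}" for c in related_to_a[:5]))
```

Compatibility mappings (for example, the mathematical alphanumeric
letters) are deliberately left out here; including them would enlarge
the candidate list further.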

> The buffer should be divided into stanzas, each one labeled with the
> name of its script or portion thereof.

Not sure what you mean by "script" here.  Emacs currently knows about
almost 100 scripts defined by Unicode, so even displaying a couple of
lines for each one will make a large buffer.  Isn't it better to allow
the user to specify one, with completion?

>   > As for showing the glyphs, visiting a file with large number of
>   > characters runs a high risk of being an annoyance due to the
>   > corresponding fonts being unavailable on the system.
> We could set up a way to test whether a code point can be
> displayed, and skip scripts that can't be displayed.

Alas, we don't know which cannot be displayed until we've tried and

>     So if we provide such a command, IMO we should prompt for a block of
>     codepoints, and display only that block.
> It is inconvenient to expect users to know the codepoint values.

Unicode blocks have names, so providing completion for them would do
the job, I think.  The entire Unicode codespace is divided into about
200 blocks, so if the user knows or can guess the one she needs, that
will probably limit the search for the character to some reasonable

Moreover, some scripts share the same blocks, and vice versa.  So
being able to specify just scripts or just blocks is not enough; we
need both.

I think we need all these methods, possibly more, because you may not
necessarily know or guess easily where to look.  For example, there
are certain characters that appear as mathematical symbols in addition
to their "normal" places, so unless the user already knows in which
block to look, they will find the "base character" method very useful,
and without it could very well miss their character.

> Suppose I want to see Greek letters -- I have no idea what codepoints
> those are, and I should not need to know them in order to specify
> "Greek letters".

You'd only need to know "Greek", and all the Greek blocks will be
displayed.  If you happen to know more, like "Greek Extended", it will
further limit the number of characters to view.  And, of course, there
are complications: you might think it's a Greek character, but it
could really be a math symbol or a Cyrillic character instead.
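
Roughly, the lookup by block name could behave like the Python sketch
below.  SAMPLE_BLOCKS is a hand-picked excerpt of the block list, and
complete_block and assigned_chars are hypothetical helper names; a real
implementation would use the full block table and the completion
machinery Emacs already has.

```python
import unicodedata

# Hand-picked excerpt of the Unicode block list, for illustration only.
SAMPLE_BLOCKS = {
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Greek Extended": (0x1F00, 0x1FFF),
    "Mathematical Operators": (0x2200, 0x22FF),
}


def complete_block(prefix):
    """Return the sample block names starting with PREFIX, ignoring case."""
    prefix = prefix.casefold()
    return [name for name in SAMPLE_BLOCKS
            if name.casefold().startswith(prefix)]


def assigned_chars(block_name):
    """Return the characters in BLOCK_NAME that have a Unicode name."""
    start, end = SAMPLE_BLOCKS[block_name]
    return [chr(cp) for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), "")]


for name in complete_block("greek"):
    print(name, len(assigned_chars(name)))
```

Typing a longer prefix such as "Greek E" narrows the result to a single
block.  The complication mentioned above still applies: a triangle that
looks like a capital Delta may be U+2206 in "Mathematical Operators"
rather than U+0394 in "Greek and Coptic".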

> The header line for each script could have a [hide] or [show] button
> to select visibility of that script.  Initially they could all be
> hidden, and the user would expose those that she is interested in.

A 100-button buffer is not very convenient, especially when you have
only an approximate idea about the script you are after (e.g., is that
funny shape part of "Miscellaneous Technical" block or "Geometric
