emacs-pretest-bug

Re: case-table functions clobbering extra slots


From: Simon Josefsson
Subject: Re: case-table functions clobbering extra slots
Date: Mon, 31 Jan 2005 14:32:15 +0100
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

Kenichi Handa <address@hidden> writes:

> In article <address@hidden>, Richard Stallman <address@hidden> writes:
>>             Normalization and proper locale handling are a couple, and
>>     case folding depends on them anyway.
>
>> Not being a Unicode expert, I am not sure what normalization refers to
>> in this context.  It's not obvious what would make locale handling
>> proper or improper, and that might perhaps be a matter on which
>> opinions may differ.
>
> I don't know how normalization and locale handling relate,
> but, as far as I know, normalization and case handling at
> least do relate in this case (from SpecialCasing.txt
> in Unicode):
>
> 0149; 0149; 02BC 004E; 02BC 004E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
>
> As 0149 has no uppercase precomposed character, Unicode
> represents it by the sequence of 02BC (MODIFIER LETTER
> APOSTROPHE) and 004E ('N') (incredibly bad design).
>
> Then, how should we downcase this sequence?  To 0149 or to
> 02BC+'n'?  Unicode defines several normalization forms; some
> of them allow the sequence 02BC+'n', and some don't.
> Should or should not Emacs stick to one normalization form?
> I still haven't had time to learn all these related things.

One idea is for Emacs to have coding systems for normalized text.
E.g., utf-8-nfc or utf-8-nfkc that would make sure content is NFC or
NFKC normalized.  However, I do not think it is a good idea to
normalize all down/up-cased output strings, unless Emacs somehow has
detected that the file should only contain normalized data.
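To make the interaction concrete, here is a rough Python sketch (not
Emacs code) of what happens to U+0149 under casing and under each
normalization form.  It assumes a Python 3 whose str.upper() applies
the full SpecialCasing.txt mappings and whose unicodedata module
carries the usual compatibility decomposition for U+0149; the helper
name codepoints is mine, for illustration only.

```python
import unicodedata

N_APOSTROPHE = "\u0149"  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE


def codepoints(s):
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


# Upcasing expands the single character into a two-character sequence.
upper = N_APOSTROPHE.upper()

# Downcasing that sequence does not recompose it into U+0149.
round_trip = upper.lower()

# What each normalization form does to the original character.
forms = {
    form: codepoints(unicodedata.normalize(form, N_APOSTROPHE))
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

print("upper:     ", codepoints(upper))
print("round trip:", codepoints(round_trip))
for form, result in forms.items():
    print(f"{form:5}", result)
```

In this sketch the round trip ends at 02BC 006E rather than 0149, and
only the compatibility forms (NFKC, NFKD) rewrite 0149 into that
sequence, so whether a downcased result "matches" the original depends
on which form the surrounding text is assumed to be in.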

The Unicode specification has some recommendations on implementing the
up/down-casing user interface in text editors, which I found
interesting; see below.

Generally, the Unicode specification leaves a lot to be desired when it
comes to up/down-case discussions, but I'm not aware of a better
reference.

,----
| Reversibility
| 
| It is important to note that no casing operations are reversible. For
| example:
| 
| toUpperCase(toLowerCase("John Brown")) -> "JOHN BROWN"
| toLowerCase(toUpperCase("John Brown")) -> "john brown"
| 
| There are even single words like vederLa in Italian or the name
| McGowan in English, which are neither upper-, lower-, nor
| titlecase. This format is sometimes called inner-caps, and it is often
| used in programming and in Web names. Once the string McGowan has been
| uppercased, lowercased, or titlecased, the original cannot be
| recovered by applying another uppercase, lowercase, or titlecase
| operation. There are also single characters that do not have
| reversible mappings, such as the Greek sigmas.
| 
| For word processors that use a single command-key sequence to toggle
| the selection through different casings, it is recommended to save the
| original string, and return to it via the sequence of keys. The user
| interface would produce the following results in response to a series
| of command-keys. Notice that the original string is restored every
| fourth time.
| 
| 1. The quick brown
| 
| 2. THE QUICK BROWN
| 
| 3. the quick brown
| 
| 4. The Quick Brown
| 
| 5. The quick brown (repeating from here on)
| 
| Uppercase, titlecase, and lowercase can be represented in a word
| processor by using a character style. Removing the character style
| restores the text to its original state. However, if this approach is
| taken, any spell-checking software needs to be aware of the case style
| so that it can check the spelling against the actual appearance.
`----
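
The toggling recommendation quoted above could be sketched roughly as
follows.  This is Python rather than Emacs Lisp, the class name
CaseToggler and its press method are made up for illustration, and
str.title() is only a crude stand-in for real titlecasing.  The point
is that every step is computed from the saved original, never from the
previously displayed text.

```python
class CaseToggler:
    """Cycle a selection through casings while keeping the original text."""

    def __init__(self, original):
        self.original = original  # saved so that it can always be restored
        self._step = 0            # 0 means the original is being displayed

    def press(self):
        """Advance one step: upper, lower, title, then back to the original."""
        self._step = (self._step + 1) % 4
        transforms = (
            lambda s: s,  # step 0: the original, untouched
            str.upper,    # step 1
            str.lower,    # step 2
            str.title,    # step 3 (approximate titlecasing)
        )
        return transforms[self._step](self.original)


toggler = CaseToggler("The quick brown")
for _ in range(4):
    print(toggler.press())

# Without the saved copy the original is lost after one round trip.
print("McGowan".upper().lower())
```

Applying the transforms to the saved string is what makes the fourth
key press restore inner-caps text such as McGowan, which no chain of
plain casing operations can recover.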



