Re: case-table functions clobbering extra slots

From: Miles Bader
Subject: Re: case-table functions clobbering extra slots
Date: Wed, 2 Feb 2005 09:33:15 +0900

I suppose one question is: should support for such weird cases be
expected to work only in specific language environments (e.g.,
Turkish), or generally?

For the purpose of supporting non-reversible mappings, it seems like
the Unicode suggestion would work: a case mapping could have a flag
meaning "non-reversible", and if the up/down-casing code sees such a
flag, it would save a text property on the result saying what the
original character was.  So the "dotted-uppercase-I to i" mapping
could have a "non-reversible" flag, and the downcasing code would
notice this when changing it to a normal "i", and put an `uppercase'
text property on the result character.  Then if the user subsequently
did an upcase, the upcasing code could notice the `uppercase' property
and properly change the normal "i" back to a dotted-uppercase-I.  The
same thing would work in the reverse direction for the German eszett
(upcasing it would change it to "SS" and give the result a `lowercase'
property containing the eszett, and presumably some indication that
the two S characters should be
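To make the round trip concrete, here is a minimal Python model of the
flag-plus-property idea.  It is not Emacs Lisp and not an existing
Emacs API: `PChar', `DOWN', `UP', `downcase' and `upcase' are all
hypothetical, a character's text properties are modeled as a plain
dict, and the toy tables cover only the dotted-I case (characters not
in them are left unchanged; the one-to-many eszett case is not
modeled).

```python
from dataclasses import dataclass, field

@dataclass
class PChar:
    """Hypothetical stand-in for a buffer character plus its text properties."""
    ch: str
    props: dict = field(default_factory=dict)

# Hypothetical case tables: char -> (mapped char, non_reversible flag).
# U+0130 is LATIN CAPITAL LETTER I WITH DOT ABOVE ("dotted-uppercase-I").
DOWN = {"\u0130": ("i", True), "I": ("i", False)}
UP = {"i": ("I", False)}

def _convert(text, table, restore_prop, record_prop):
    out = []
    for pc in text:
        props = dict(pc.props)
        original = props.pop(restore_prop, None)
        if original is not None:
            # An earlier non-reversible mapping recorded the original
            # character, so restore it instead of consulting the table.
            out.append(PChar(original, props))
            continue
        mapped, non_reversible = table.get(pc.ch, (pc.ch, False))
        if non_reversible:
            # Remember what the character was so the opposite
            # operation can undo this mapping later.
            props[record_prop] = pc.ch
        out.append(PChar(mapped, props))
    return out

def downcase(text):
    # Restores from `lowercase', records into `uppercase'.
    return _convert(text, DOWN, "lowercase", "uppercase")

def upcase(text):
    # Restores from `uppercase', records into `lowercase'.
    return _convert(text, UP, "uppercase", "lowercase")
```

Downcasing [PChar("\u0130")] yields an "i" carrying an `uppercase'
property, and upcasing that result gives back the dotted I, whereas a
plain "i" with no property still upcases to an ordinary "I".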

The case of up/downcasing from scratch, where there's no text property
attached, is obviously language specific for characters which have a
one-to-many mapping.  It seems like this could be accomplished using a
language-environment-specific hook that gets called on _words_ (from
this thread, I get the idea that position within a word is
significant) which are noted to be potentially problematic.  For
efficiency, you probably don't want to call the hook on every word, so
in the up/down-case character tables, there could be a "suspicious"
flag (since it's usually only a few characters and they're language
specific, maybe this should be an alist or something similarly
sparse?).  The code would just do up/downcasing as normal, except that
if a character had the "suspicious" flag set, it would call the hook
on the whole word containing it instead, and skip ahead to the next
word.  In the Turkish case, there'd be a "suspicious" flag for the
normal ascii "i" character.
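Roughly, the "suspicious" scan could look like the following Python
sketch.  It works on plain strings with no text properties, treats a
word as a maximal run of characters for which str.isalnum() is true
(an assumption; Emacs would use its own notion of words), and
`TURKISH_SUSPICIOUS', `turkish_upcase_word' and `upcase_region' are
hypothetical names.  Ordinary characters go through str.upper(); on
hitting a suspicious character the code throws away what it already
emitted for the current word, calls the hook once on the whole word,
and skips ahead.

```python
# Hypothetical sparse "suspicious" set for a Turkish language
# environment: only plain ASCII "i" is flagged.
TURKISH_SUSPICIOUS = {"i"}

def turkish_upcase_word(word):
    """Hypothetical word hook: Turkish "i" upcases to U+0130 (dotted I)."""
    return word.replace("i", "\u0130").upper()

def upcase_region(text, suspicious, word_hook):
    out = []  # one entry per source character processed so far
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in suspicious and ch.isalnum():
            # Find the word containing the suspicious character.
            start = pos
            while start > 0 and text[start - 1].isalnum():
                start -= 1
            end = pos
            while end < len(text) and text[end].isalnum():
                end += 1
            # Drop the already-upcased prefix of this word, hand the
            # whole word to the hook, and continue after the word.
            del out[len(out) - (pos - start):]
            out.append(word_hook(text[start:end]))
            pos = end
        else:
            out.append(ch.upper())  # ordinary, language-neutral upcasing
            pos += 1
    return "".join(out)
```

With these definitions, upcasing "kim ve istanbul" calls the hook only
for "kim" and "istanbul" and produces "K\u0130M VE \u0130STANBUL",
while "ve" is upcased the ordinary way.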

As for the interaction of these two mechanisms, I suppose a character
should _not_ be considered "suspicious" if it has an appropriate
`uppercase' or `lowercase' property, which would mean the hook would
only get called on new words.

The word-hook is probably unnecessary even for most funny mappings,
e.g., in Turkish I guess "i" always gets translated to
dotted-uppercase-I, so I suppose the "suspicious alist" could offer
language-specific character mappings as well: if an alist entry's
value were a string, it would simply be the language-specific mapping,
e.g., (?i . "dotted-uppercase-I") for Turkish; if it were `t', it
would instead mean "suspicious" and result in the word-hook being
called (funny Greek characters or whatever).
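Putting the last two paragraphs together, the per-character decision
during upcasing might go in this order: a recorded `uppercase'
property wins (so the character is never suspicious), then a string
entry in the alist is used directly, then an entry of `t' (modeled
here as Python's True) defers to the word-hook, and anything else gets
the ordinary mapping.  This is only a Python sketch of that ordering;
`TURKISH_UPCASE' and `upcase_action' are hypothetical, and the caller
is assumed to do the word scanning when the result is "hook".

```python
from __future__ import annotations

from typing import Optional, Union

# Hypothetical sparse table for one language environment (not an Emacs API):
#   str  -> language-specific mapping applied directly, no hook needed
#   True -> "suspicious": the whole word goes to the word-hook
TURKISH_UPCASE: dict[str, Union[str, bool]] = {"i": "\u0130"}  # i -> dotted I

def upcase_action(ch: str, recorded: Optional[str],
                  table: dict[str, Union[str, bool]]) -> tuple[str, str]:
    """Decide how to upcase CH.

    RECORDED models an `uppercase' text property left by an earlier
    non-reversible downcase (None if there is no such property).
    Returns (action, result), action being "restore", "map", "hook"
    or "default".
    """
    if recorded is not None:
        # A recorded property wins, so this character is never suspicious.
        return ("restore", recorded)
    entry = table.get(ch)
    if isinstance(entry, str):
        return ("map", entry)   # direct language-specific mapping
    if entry is True:
        return ("hook", ch)     # caller hands the whole word to the hook
    return ("default", ch.upper())
```

For example, upcase_action("i", None, TURKISH_UPCASE) gives
("map", "\u0130") without ever calling a hook, while the same "i" with
a recorded property of "I" gives ("restore", "I").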

[I guess it's not possible to do a perfect job, but it seems possible
to at least do a respectable one.]

Do not taunt Happy Fun Ball.
