emacs-devel
Re: Character folding in the pretest


From: Werner LEMBERG
Subject: Re: Character folding in the pretest
Date: Fri, 05 Feb 2016 08:15:53 +0100 (CET)

>> Basically the same as Eli has described: Base character plus
>> diacritics, probably plus some basic shapes with `diacritics' that
>> Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.
> 
> Composability is somewhat arbitrary.  The character composition has
> very little to do with "visual similarities".  Just have a look at
> character compositions in Devanagari for example.

Character compositions in Devanagari form ligatures.  This is a
completely different concept.  It is possible that a given character
sequence yields different renderings, depending on the availability of
a ligature in a font.  The same issue is present in Arabic, BTW.  What
we are discussing here is inherently bound to alphabetic scripts, in
particular Latin, Greek, and Cyrillic.  Abugida and Abjad scripts need
a separate solution, as do CJKV scripts.
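
As a rough illustration of the "base character plus diacritics" idea
quoted at the top, here is a minimal Python sketch (not the Emacs
implementation).  It strips combining marks after canonical
decomposition and then consults a small table for letters such as ø, ł,
and đ, which Unicode does not decompose.  The names `EXTRA_FOLDS' and
`fold' are made up for this example, and the table is deliberately
incomplete.

```python
import unicodedata

# Hypothetical, incomplete table for letters that have no canonical
# decomposition in Unicode and therefore survive NFD unchanged.
EXTRA_FOLDS = {"ø": "o", "ł": "l", "đ": "d"}


def fold(text):
    """Fold `text` to base characters (illustrative sketch only)."""
    # Split precomposed characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Handle the shapes that decomposition alone cannot reach.
    return "".join(EXTRA_FOLDS.get(c, c) for c in stripped)


print(fold("Müller"))
print(fold("łódź"))
```

Decomposition alone would leave `ł' untouched, which is why a separate
table is needed for such letters.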

> Likewise in German, ß is a variation of SS and Ü is a variation of
> UE.  As far as I know, I could write "Müller" as "Mueller".

In German, `Mueller' is an emergency representation if `ü' is not
available; it is highly discouraged otherwise.  But yes, it would be
beneficial if there were an option to make a search for `Mueller'
also match `Müller' (and vice versa).
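
One possible way to get such a two-way match is to compare keys in
which umlauts and ß are expanded to their digraph spellings.  The
following Python sketch assumes exactly that approach; `german_key' and
`matches' are hypothetical names, and the expansion can produce false
positives for words where `ue', `oe', or `ae' is not an umlaut
substitute.

```python
import unicodedata

# Emergency spellings used when the umlauts or ß are not available.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def german_key(text):
    """Return a comparison key with umlauts and ß expanded (sketch)."""
    # Compose first so that 'u' + combining diaeresis becomes 'ü'.
    composed = unicodedata.normalize("NFC", text).lower()
    return "".join(GERMAN_EXPANSIONS.get(c, c) for c in composed)


def matches(query, candidate):
    """True if both strings are equal under the expanded key."""
    return german_key(query) == german_key(candidate)


print(matches("Mueller", "Müller"))
print(matches("Müller", "Mueller"))
print(matches("Muller", "Müller"))
```

Note that plain diacritic stripping would instead equate `Müller' with
`Muller', which is a different (and for German less useful) folding.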

> However, this is not true for Swedish. I'll say it again (and I
> apologise for repeating myself, this kind of repetition makes me
> sound like the troll that you accused me of being) but in Swedish
> the difference between Å and A is just as great as the difference
> in English between the letters E and O.  [...]

Funnily enough, in your neighbouring country Denmark `A' and `Å' are
much closer, cf. `Århus' vs. `Aarhus'.

>> What you describe naturally leads to another user option: Don't
>> handle characters as `equal' (with a proper definition of `equal')
>> that aren't `equal' in the user's locale.
> 
> This is exactly my point.  [...]

:)
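
A locale-dependent option of this kind could look roughly like the
following Python sketch.  It assumes a per-locale set of letters that
must never be folded; the table `DISTINCT_LETTERS', the locale codes,
and the function name are all invented for illustration, and the
Swedish entry (å, ä, ö) is an assumption of this example rather than
something settled in this thread.

```python
import unicodedata

# Hypothetical table: letters that count as distinct in a given locale
# and therefore must not be folded onto their base character.
DISTINCT_LETTERS = {"sv": {"å", "ä", "ö", "Å", "Ä", "Ö"}}


def fold_for_locale(text, locale):
    """Strip diacritics except on letters distinct in `locale` (sketch)."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # Work on composed characters so membership tests see 'Å', not 'A' + ring.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


print(fold_for_locale("Århus", "sv"))
print(fold_for_locale("Århus", "en"))
```

With such a table a search for `A' would not match `Å' for a Swedish
user, while users of other locales would still get the folded match.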


    Werner
