Re: Character folding in the pretest
From: Werner LEMBERG
Subject: Re: Character folding in the pretest
Date: Fri, 05 Feb 2016 07:01:03 +0100 (CET)
>> This naturally leads to a possible user option: Having `optical'
>> matches or not, where `optical' means `base character plus
>> diacritic and/or slight modifications', e.g., o → ø → ö etc., etc.
>
> How do you even define "optical similarities"?
Basically the same as Eli has described: Base character plus
diacritics, probably plus some basic shapes with `diacritics' that
Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.
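As a rough illustration of this definition, here is a minimal Python sketch (not what Emacs does internally). It assumes that "base character plus diacritics" can be approximated by canonical decomposition (NFD) followed by dropping combining marks. The names `EXTRA_FOLDS` and `optical_fold` are hypothetical, and the extra table covers only the three letters mentioned above.

```python
import unicodedata

# Hypothetical table for letters that Unicode does not decompose into
# base character + combining mark; only the examples from the text.
EXTRA_FOLDS = {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}


def optical_fold(text):
    """Reduce each character to its base shape under the assumptions above."""
    # Split precomposed characters such as 'ö' into 'o' + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (nonzero canonical combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Handle the shapes that have no canonical decomposition.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)
```

Under this sketch `l` and `I` stay distinct, because neither is derived from the other by a diacritic or an entry in the table.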
> Should l and I compare the same under this definition? They
> certainly look similar.
No, since the similarity is a font issue only. For this reason I
*never* use Arial-like fonts.
> What about p and q? They look like mirror images of each other.
> What about z and s? They even sound similar.
Nonsense. I've clearly mentioned `base character plus diacritic'.
Why do you intentionally skip that? Doing so reminds me of
Schopenhauer's first stratagem in `The Art of Being Right'...
> To a Swedish speaker there are zero similarities between a, ä and å.
I'm a native German speaker, and there is *zero* similarity in the
sound between `a' and `ä', say. But it is quite common in English
texts to omit the diaeresis dots, so a search mode that finds both
`Hänsel und Gretel' and `Hansel and Gretel' at the same time would
be very valuable.
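Such a search mode could look roughly like the following Python sketch. It is only an illustration under the assumption that folding means "NFD, then drop combining marks"; the function names `strip_marks` and `find_folded` are hypothetical. It keeps a map from folded positions back to the original text so that matches are reported at their original offsets.

```python
import unicodedata


def strip_marks(text):
    """Decompose canonically and drop combining marks."""
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(c)
    )


def find_folded(needle, haystack):
    """Return start offsets in haystack where needle matches, ignoring diacritics."""
    key = strip_marks(needle)
    if not key:
        return []
    folded = []   # folded characters of the haystack
    offsets = []  # offsets[i] is the haystack index that produced folded[i]
    for i, ch in enumerate(haystack):
        for base in strip_marks(ch):
            folded.append(base)
            offsets.append(i)
    folded_text = "".join(folded)
    hits = []
    start = folded_text.find(key)
    while start != -1:
        hits.append(offsets[start])
        start = folded_text.find(key, start + 1)
    return hits
```

With this, searching for either `Hansel` or `Hänsel` finds both spellings in a text that contains them.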
> My personal preference is that the expected behaviour of searches is
> more related to the locale of the user, rather than that of the
> document being searched. In other words, as a non-Spanish speaker,
> I'd expect to be able to find ñ when searching for n, even if the
> document I'm searching in is in Spanish. There is definitely an
> infinite number of counter-examples to this (enough to keep this
> thread going for another 100 messages, I'm sure), but at least there
> is reason to consider making the default based on the locale of the
> user.
What you describe naturally leads to another user option: Don't treat
characters as `equal' (given a proper definition of `equal') if they
aren't `equal' in the user's locale.
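A locale-dependent option of this kind might be sketched in Python as follows. The table `DISTINCT_LETTERS` and the function `fold_for_locale` are hypothetical names, and the table is a hand-written assumption with just two entries (Swedish å, ä, ö and Spanish ñ, which those alphabets treat as separate letters), not data taken from any locale database.

```python
import unicodedata

# Hypothetical per-locale exceptions: letters the locale's alphabet treats
# as distinct from their base letter, so they are not folded.
DISTINCT_LETTERS = {
    "sv": set("åäöÅÄÖ"),
    "es": set("ñÑ"),
}


def fold_for_locale(text, locale):
    """Strip diacritics except on letters that are distinct in the given locale."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # Compose first so 'a' + U+030A is compared as the single letter 'å'.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            out.extend(
                c for c in unicodedata.normalize("NFD", ch)
                if not unicodedata.combining(c)
            )
    return "".join(out)
```

For a user whose locale has no entry in the table, `ñ` folds to `n`; for the `es` entry it is left alone, so a search for `n` would no longer match it.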
Werner