Re: char equivalence classes in search - why not symmetric?

From: Eli Zaretskii
Subject: Re: char equivalence classes in search - why not symmetric?
Date: Tue, 01 Sep 2015 19:16:32 +0300

> Date: Tue, 1 Sep 2015 08:46:26 -0700 (PDT)
> From: Drew Adams <address@hidden>
> When character folding is turned on, shouldn't you be able to
> search for á and find (match) a, à, ã, ª, â, å, and ä?

No.  You should find only á.

> I think so.  Currently you cannot - you can only do the reverse:
> search for a and find any of the above.  a is treated specially.
> Why?

It's the same principle as with case-folding: if you type "FOO", you
will not find the lowercase variant.
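
A minimal sketch of that case-folding asymmetry, in Python rather than Emacs Lisp. The function name `case_fold_match` is hypothetical and this is not how Emacs implements it; it only assumes the rule that a pattern with no uppercase letters ignores case, while any uppercase letter makes the search exact:

```python
def case_fold_match(pattern: str, text: str) -> bool:
    """Asymmetric case folding (illustrative only).

    A pattern without uppercase letters matches regardless of case;
    a pattern containing any uppercase letter must match exactly.
    """
    if pattern == pattern.lower():
        # No uppercase in the pattern: fold the text before comparing.
        return pattern in text.lower()
    # Uppercase present: the search is case-sensitive.
    return pattern in text


print(case_fold_match("foo", "a FOO b"))  # lowercase pattern, folded search
print(case_fold_match("FOO", "a foo b"))  # uppercase pattern, exact search
```

Under this assumption, typing "foo" finds "FOO", but typing "FOO" does not find "foo".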

> I suppose that the logic behind the current implementation is
> to mirror what we do with case-fold searching.  But is that the
> right thing in this case?

It's what the Unicode Standard recommends, and IMO it makes a lot of
sense.  See http://unicode.org/reports/tr10/#Searching.

> To me, folding a group of chars together for search purposes
> should be symmetric - go both ways.

You will see that the above Unicode report explicitly recommends
making it _asymmetric_.

> Why not?  Why, when char folding, treat plain a specially for
> searching?  Why not treat á, a, à, ã, ª, â, å, and ä the same?
> Isn't that the point here?  We are telling Isearch that they
> are equivalent.  Why pick one of them as the canonical
> search-pattern to use for finding any of them?  Why privilege
> a over á, a, à, ã, ª, â, å, and ä?

Because we are not "telling Isearch that they are equivalent".  We are
asking for matches that disregard the diacriticals (and, in the case of
ª, also higher-order collation-order variation).
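
To make the asymmetry concrete, here is a rough Python sketch. It is not the Emacs implementation; the helper names (`base_char`, `char_matches`, `find_all`) are hypothetical, and it assumes that stripping combining marks after NFKD decomposition is an adequate stand-in for "disregarding diacriticals", and that the text uses precomposed characters:

```python
import unicodedata


def base_char(ch: str) -> str:
    """Return ch with combining marks removed (via NFKD decomposition)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch


def char_matches(pattern_ch: str, text_ch: str) -> bool:
    """Asymmetric comparison of one pattern char against one text char."""
    if base_char(pattern_ch) == pattern_ch:
        # Plain character in the pattern: ignore diacriticals in the text.
        return base_char(text_ch) == pattern_ch
    # Decorated character in the pattern: require an exact match.
    return pattern_ch == text_ch


def find_all(pattern: str, text: str) -> list[int]:
    """Return every index in text where pattern matches asymmetrically."""
    n = len(pattern)
    return [
        i
        for i in range(len(text) - n + 1)
        if all(char_matches(p, t) for p, t in zip(pattern, text[i:i + n]))
    ]


sample = "a\u00e1\u00e0\u00aa"           # a, á, à, ª
print(find_all("a", sample))             # plain a: matches every position
print(find_all("\u00e1", sample))        # á: matches only the á itself
```

With this rule, a plain a in the search string finds every variant, while á finds only á, so an exact search for á needs no extra machinery.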

> Now most of the time I, like most people, will be typing a
> instead of á into a search string.  But that's not really the
> point.  I think users should be able to use any members of an
> equivalence class of chars indifferently.

That'd make searching for exactly á unnecessarily complicated and/or
cumbersome, for no good reason.  The symmetry you suggest has no
practical advantages (because you can find all of these characters by
just specifying a), but does have significant practical disadvantages.

> This feature, welcome as it is, seems only half-baked, so far.

No need for derogatory language, thank you.  We certainly have a lot
to learn about this feature, but half-baked it isn't.
