RE: char equivalence classes in search - why not symmetric?

From: Drew Adams
Subject: RE: char equivalence classes in search - why not symmetric?
Date: Tue, 1 Sep 2015 10:50:22 -0700 (PDT)

> > When character folding is turned on, shouldn't you be able to
> > search for á and find (match) a, à, ã, ª, â, å, and ä?
> No.  You should find only á.

No reason?

> > I think so.  Currently you cannot - you can only do the
> > reverse: search for a and find any of the above.  a is treated 
> > specially.  Why?
> It's the same principle as with case-folding: if you type "FOO",
> you will not find the lowercase variant.

You're just echoing what it does, not supporting the behavior with
reasons.  And I already mentioned what you say here.

> > I suppose that the logic behind the current implementation is
> > to mirror what we do with case-fold searching.  But is that the
> > right thing in this case?
> It's what the Unicode Standard recommends, and IMO it makes a
> lot of sense.  See http://unicode.org/reports/tr10/#Searching.

I don't see that, when reading that section.  I do see that it
explicitly calls out that behavior as an _option_:

  8.2 Asymmetric Search
  Users often find asymmetric searching to be a useful option.

That users can find this optionally useful, I have no doubt.
And I wouldn't be against making it a user option in Emacs.

But I do not see anything in the section you cited that says
that this asymmetric behavior is required, or recommended.

In any case, Emacs is not beholden to any particular standard,
as RMS so often reminds us.  The question is what is useful for
Emacs users.

If you think "it makes a lot of sense" then you should have
no difficulty giving some of that sense.  So far, none; just
appeals to authority.

> > To me, folding a group of chars together for search purposes
> > should be symmetric - go both ways.
> You will see that the above Unicode report explicitly recommends
> to make it _asymmetric_.

No, I do not see that.  I see that the report points out that
such an optional behavior can be useful for some users.

And it specifically points out the case "When doing an
asymmetric search", making clear that there is also the case
when NOT doing an asymmetric search.

Obviously, for the simpler case of a symmetric search there
is no need for a section describing it - it is straightforward,
whereas the asymmetric search case takes some explaining.
Which is precisely what makes it more complex for users.

Nowhere in that report do I see that asymmetric search is the
only, or even the recommended, search behavior.  It is
explicitly pointed out as an optional behavior.

But I read the section quickly, and you are the expert.
Please point to where I am mistaken.

> > Why not?  Why, when char folding, treat plain a specially for
> > searching?  Why not treat á, a, à, ã, ª, â, å, and ä the same?
> > Isn't that the point here?  We are telling Isearch that they
> > are equivalent.  Why pick one of them as the canonical
> > search-pattern to use for finding any of them?  Why privilege
> > a over á, à, ã, ª, â, å, and ä?
> Because we are not "telling Isearch that they are equivalent".

I think we should be.  At least that should be one possibility.

> We are asking for matches that disregard the diacriticals
> (and in case of ª also higher-order collation-order variation).

No.  You are asking for that only when you use a search pattern
that does not use the diacriticals.  When you search with á in
the pattern you are NOT asking for matches that disregard the
diacriticals.  And why not?  So far, no reasons given.

I would favor being able not just to toggle between folded
and unfolded search but to cycle among folded-symmetric,
folded-asymmetric, and unfolded.  Why not?
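
To make the three behaviors concrete, here is a minimal sketch in
Python (not Emacs Lisp, and not how Isearch implements anything).
The names fold, char_matches, hits and MODES are made up for this
illustration, and treating "same letter after dropping combining
marks from the NFKD decomposition" as the equivalence class is an
assumption of the sketch; it happens to put ª in the same class as a.

```python
import unicodedata

# Hypothetical mode names, mirroring the three behaviors discussed above.
MODES = ("unfolded", "folded-asymmetric", "folded-symmetric")

def fold(ch):
    """Return ch with combining marks removed (via NFKD decomposition)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def char_matches(pattern_ch, text_ch, mode):
    """Decide whether one pattern character matches one text character."""
    pattern_ch = unicodedata.normalize("NFC", pattern_ch)
    text_ch = unicodedata.normalize("NFC", text_ch)
    if mode == "unfolded":
        return pattern_ch == text_ch
    if mode == "folded-symmetric":
        # Any member of a class matches any other member.
        return fold(pattern_ch) == fold(text_ch)
    if mode == "folded-asymmetric":
        # Only the plain base character matches the whole class;
        # any other pattern character matches only itself.
        if fold(pattern_ch) == pattern_ch:
            return fold(text_ch) == pattern_ch
        return pattern_ch == text_ch
    raise ValueError("unknown mode: " + mode)

def hits(pattern_ch, text, mode):
    """List the characters of text matched by a one-character pattern."""
    return [c for c in text if char_matches(pattern_ch, c, mode)]

# The characters from this thread: a á à ã ª â å ä
TEXT = "a\u00e1\u00e0\u00e3\u00aa\u00e2\u00e5\u00e4"

if __name__ == "__main__":
    for mode in MODES:
        print(mode, "a ->", hits("a", TEXT, mode))
        print(mode, "\u00e1 ->", hits("\u00e1", TEXT, mode))
```

With this sketch, searching for á finds only á in the unfolded and
folded-asymmetric modes, and finds all eight characters in the
folded-symmetric mode.  Searching for a finds all eight in both
folded modes.  Case folding is deliberately left out.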

> > Now most of the time I, like most people, will be typing a
> > instead of á into a search string.  But that's not really the
> > point.  I think users should be able to use any members of an
> > equivalence class of chars indifferently.
> That'd make searching for exactly á unnecessarily complicated and/or
> cumbersome, for no good reason.  The symmetry you suggest has no
> practical advantages (because you can find all of these characters by
> just specifying a), but does have significant practical disadvantages.

Assertions with no supporting reasons/examples.

> > This feature, welcome as it is, seems only half-baked, so far.
> No need for derogatory language, thank you.

Where I work, "half-baked" is used often, and it means not
entirely finished, whether that refers to dev, QA, doc, whatever.
It is not used in a derogatory way.  And I made very clear that
I welcome this feature.

If you feel that "half-baked" in the context of software
development is derogatory then I apologize for using the term.
Let me say it this way: This feature, welcome as it is, seems
not entirely finished.  Whether now or later, I would like to
see it go further.

> We certainly have a lot to learn about this feature,

And to document.  And hopefully to further develop in the future.

> but half-baked it isn't.

Certainly the doc is half-baked, if baked at all.  And in
terms of the longer term goal of facilitating users modifying
the classes of chars that are treated equivalently, and of
defining their own sets of such classes, we are not there yet.

Saying this does not take away from the progress made so far.
This is a very welcome feature.
