Re: regex and case-fold-search problem
From: Richard Stallman
Subject: Re: regex and case-fold-search problem
Date: Fri, 30 Aug 2002 15:19:14 -0400
So, I agree with Stephen that his method is good enough.
It is wrong even for ASCII--we definitely must do something better, at
least for ASCII. The only question is, how much more than ASCII?
I think we all know that is the right behaviour, and at
least for ASCII, the latest code works that way. Perhaps we
should make Emacs work correctly also for Latin-1 chars,
because in emacs-unicode also, they have the same code
order.
What about for Latin-2 characters? Will those regexp ranges
change their meaning in emacs-unicode?
If so, perhaps we only need to make an effort to support ranges really
right for codes 0-256.
> A faster way, in the usual cases, would be to look for the case where
> several consecutive characters that have just one case-sibling each,
> and the siblings are consecutive too. Each subrange of this kind can
> be turned into two subranges, the original and the case-converted.
> Also identify subranges of characters that have no case-siblings; each
> subrange of this kind just remains as it is. Finally, any unusual
> characters that are encountered can be replaced with a list of all the
> case-siblings.
> This too requires use of the whole case table.
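The splitting described in the quoted paragraph can be sketched as follows. This is only an illustration, not the Emacs code: it is written in Python rather than C, and it uses Python's own `str.lower`/`str.upper` as a stand-in for the Emacs case table. The names `case_siblings`, `run_key`, `expand_range` and `merge` are hypothetical.

```python
from itertools import groupby


def case_siblings(ch):
    """Stand-in for a case table lookup (assumption: Python's case mappings).

    Returns the other-case forms of ch, ignoring multi-character mappings.
    """
    forms = {ch.lower(), ch.upper()}
    return sorted(f for f in forms if len(f) == 1 and f != ch)


def run_key(code):
    """Classify a code so that consecutive codes with equal keys form one run."""
    sibs = case_siblings(chr(code))
    if not sibs:
        return ("none", 0)
    if len(sibs) == 1:
        # Equal offsets on consecutive codes mean the siblings are consecutive too.
        return ("shift", ord(sibs[0]) - code)
    return ("unusual", code)  # several siblings: always a run of its own


def merge(ranges):
    """Sort (start, end) pairs and merge the ones that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


def expand_range(lo, hi):
    """Return (start, end) code ranges that match lo..hi when case is ignored."""
    ranges = []
    for (kind, value), group in groupby(range(lo, hi + 1), key=run_key):
        codes = list(group)
        first, last = codes[0], codes[-1]
        ranges.append((first, last))  # the original subrange always stays
        if kind == "shift":
            # One sibling each, siblings consecutive: add the converted subrange.
            ranges.append((first + value, last + value))
        elif kind == "unusual":
            # Replace the character with a list of all its case-siblings.
            ranges.extend((ord(s), ord(s)) for s in case_siblings(chr(first)))
    return merge(ranges)
```

For example, `expand_range(ord("X"), ord("c"))` splits `[X-c]` into the runs `X-Z`, the punctuation between `Z` and `a`, and `a-c`, and yields the ranges for `A-C`, `X-c` and `x-z`. Note that the loop still visits every code in the range, which is the cost discussed below for large ranges.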
Implementing that for any range of characters consumes our
man-power and makes the running code slower.
It is not a very hard program to write, I think. I'd guess around 30
lines. However, you're right about the slowness for large ranges. If
we only do this for codes 0-256 (or, currently, for ASCII and
Latin-1), then it won't be too slow.
Consider the situation that one writes this regexp
"[\000-\xffff]"
to search only Unicode BMP chars in emacs-unicode.
Do you think that is a reasonable kind of range that we
should try to support? If so, there goes my idea that
we only need to support ranges in 0-256 very well.
On the other hand, if we handle \000-\xffff by doing case conversion
carefully only for ASCII and Latin-1, and treat the rest of the range
in a less smart way, we would get the same results in this case.
Is that a good solution?
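A minimal sketch of that compromise, again in Python as an illustration only. Two things here are assumptions, not something stated in this message: that "a less smart way" means leaving the part of the range above Latin-1 unchanged, and that Python's `str.lower`/`str.upper` stand in for the case table. The names `CAREFUL_LIMIT`, `fold_low_part` and `hybrid_expand` are hypothetical.

```python
CAREFUL_LIMIT = 0x100  # assumption: codes below this get exact case handling


def fold_low_part(lo, hi):
    """Exact case expansion, one character at a time (at most 256 codes)."""
    codes = set()
    for code in range(lo, hi + 1):
        ch = chr(code)
        codes.add(code)
        for form in (ch.lower(), ch.upper()):
            if len(form) == 1:  # ignore multi-character case mappings
                codes.add(ord(form))
    return codes


def hybrid_expand(lo, hi):
    """Return merged (start, end) ranges: exact below CAREFUL_LIMIT, as-is above."""
    ranges = []
    if lo < CAREFUL_LIMIT:
        low_codes = fold_low_part(lo, min(hi, CAREFUL_LIMIT - 1))
        ranges.extend((c, c) for c in low_codes)
    if hi >= CAREFUL_LIMIT:
        # Assumed "less smart" treatment: keep the rest of the range unchanged.
        ranges.append((max(lo, CAREFUL_LIMIT), hi))
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged
```

With this sketch, `hybrid_expand(0, 0xFFFF)` returns the single range `(0, 0xFFFF)`: the range already contains the case-siblings of all its low characters, so the careful and the simple treatment agree, which is the point made above. The work done is bounded by 256 characters regardless of how large the range is.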