Re: regex and case-fold-search problem

From: Richard Stallman
Subject: Re: regex and case-fold-search problem
Date: Mon, 26 Aug 2002 15:51:41 -0600 (MDT)

    In my opinion, specifying ranges by chars is nonsense
    because there should be no semantics in the order of
    character codes.

The fact is, people know the character codes and take advantage of
their knowledge.  I don't think this is unreasonable.  But that
question is academic, since the feature is used and we need to make it

    Does that happen because under case-fold-search non-nil the
    characters on the range specification are downcased?

It looks that way.

    Maybe we can simply use the smallest contiguous
    range of chars that includes all the chars we should match,

That isn't right.  The range should be equal to the disjunction of all
characters in it; A-_ should be equivalent to []A.....Z[\^_].  With
case folding, that should match A-Z, a-z, and [\]^_.  In other words,
the correct behavior is that all character codes that are equivalent
(when you ignore case) to any character in the originally specified
range should match.

Given the whole case table, you can compute this by looping over the
original (non-case-folded) range and finding, for each character, all
the characters that are equivalent to it.  Then those could be
assembled into the smallest possible number of ranges.
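
As a rough illustration of the loop described above, here is a minimal
Python sketch.  It is not the Emacs implementation: the names
build_case_table and fold_range are hypothetical, and the "case table"
is approximated from Python's str.lower() over ASCII only.

```python
from collections import defaultdict

def build_case_table(limit=0x80):
    """Hypothetical case table: map each code point below `limit` to the
    set of code points that are equal to it when case is ignored."""
    classes = defaultdict(set)
    for cp in range(limit):
        classes[chr(cp).lower()].add(cp)
    return {cp: members for members in classes.values() for cp in members}

def fold_range(lo, hi, table):
    """Loop over the original (non-case-folded) range, collect every
    character equivalent to some member, then assemble the result into
    the smallest possible number of contiguous ranges."""
    matched = set()
    for cp in range(ord(lo), ord(hi) + 1):
        matched |= table.get(cp, {cp})
    ranges = []
    for cp in sorted(matched):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp          # extend the current range
        else:
            ranges.append([cp, cp])     # start a new range
    return [(chr(a), chr(b)) for a, b in ranges]

print(fold_range("A", "_", build_case_table()))
```

For the range A-_ this yields the two ranges A-_ and a-z, which is the
set A-Z, a-z, and [\]^_ described earlier.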

A faster way, in the usual cases, would be to look for runs of
consecutive characters that have just one case-sibling each, where
the siblings are consecutive too.  Each subrange of this kind can
be turned into two subranges, the original and the case-converted.
Also identify subranges of characters that have no case-siblings; each
subrange of this kind just remains as it is.  Finally, any unusual
characters that are encountered can be replaced with a list of all the

This too requires use of the whole case table.
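
The subrange approach could be sketched along the following lines,
again in Python and again under assumptions: sibling_offset and
split_range are hypothetical names, and the table passed in is assumed
to map each code point to the full set of code points equivalent to it
ignoring case (including itself).

```python
def sibling_offset(cp, table):
    """Return 0 if cp has no case-sibling, the distance to its sole
    sibling if it has exactly one, or None if it is 'unusual'."""
    siblings = table.get(cp, {cp}) - {cp}
    if not siblings:
        return 0
    if len(siblings) == 1:
        return next(iter(siblings)) - cp
    return None

def split_range(lo, hi, table):
    """Split [lo, hi] into subranges of code points (inclusive pairs).
    A run whose members share the same sibling offset has consecutive
    siblings, so it becomes two subranges; a run with no siblings stays
    as it is; an unusual character is replaced by all its equivalents."""
    out = []
    cp, end = ord(lo), ord(hi)
    while cp <= end:
        off = sibling_offset(cp, table)
        if off is None:
            out.extend((s, s) for s in sorted(table[cp]))
            cp += 1
            continue
        start = cp
        while cp + 1 <= end and sibling_offset(cp + 1, table) == off:
            cp += 1
        out.append((start, cp))                  # the original subrange
        if off != 0:
            out.append((start + off, cp + off))  # the case-converted one
        cp += 1
    return out

# Tiny hand-built table for illustration: A-Z paired with a-z.
ascii_table = {}
for upper in range(ord("A"), ord("Z") + 1):
    ascii_table[upper] = ascii_table[upper + 32] = {upper, upper + 32}

print(split_range("A", "_", ascii_table))
```

This sketch does not merge overlapping output; if the original range
already contains both cases of a letter, a final merging pass would
still be needed.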
