regex and case-fold-search problem
From: Kenichi Handa
Subject: regex and case-fold-search problem
Date: Fri, 23 Aug 2002 15:25:42 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
While working on emacs-unicode, I noticed a very difficult
problem which also exists in the current emacs.
    (let ((case-fold-search nil))
      (string-match "[Þ-ß]" "Þ")) => 0
    (let ((case-fold-search nil))
      (string-match "[Þß]" "Þ")) => 0
    (let ((case-fold-search t))
      (string-match "[Þ-ß]" "Þ")) => nil !!!
    (let ((case-fold-search t))
      (string-match "[Þß]" "Þ")) => 0
When you see the output of M-x list-charset-chars RET
latin-iso8859-1 RET, you'll soon find what's going on.
The relevant character codes are as follows:
Þ (#x8DE)
ß (#x8DF)
(downcase ?Þ) == ?þ (#x8FE)
(downcase ?ß) == ?ß (#x8DF)
This problem is not specific to non-ASCII chars; it's just
rarer to run into such a situation with ASCII chars.
    (let ((case-fold-search nil))
      (string-match "[A-_]" "A")) => 0
    (let ((case-fold-search t))
      (string-match "[A-_]" "A")) => nil
    (let ((case-fold-search t))
      (string-match "[A_]" "A")) => 0
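The message does not spell out the mechanism, but the character
codes listed above suggest one reading: under case folding the two
endpoints of a range are downcased independently, and the upper
endpoint can then end up below the lower one, leaving an empty
range. The following is only a sketch of that reading. The helper
name is hypothetical and is not part of Emacs, and the check
assumes that downcasing the endpoints is what the matcher
effectively does.

```elisp
;; Hypothetical helper for illustration; not an Emacs function.
;; Assumption: with case-fold-search non-nil, the regexp range
;; [FROM-TO] behaves as if both endpoints had been downcased.
(defun my-range-inverted-by-downcase-p (from to)
  "Return non-nil if FROM..TO is a valid range that inverts after `downcase'.
FROM and TO are characters, as in the regexp range [FROM-TO]."
  (and (<= from to)                        ; the range as written is non-empty
       (> (downcase from) (downcase to)))) ; but its downcased endpoints cross

(my-range-inverted-by-downcase-p ?A ?_) ; => t    (?a is 97, ?_ is 95)
(my-range-inverted-by-downcase-p ?A ?Z) ; => nil  (?a is 97, ?z is 122)
(my-range-inverted-by-downcase-p ?Þ ?ß) ; the Latin-1 range from above
```

A check of this kind works only on the endpoints, so it says
nothing about which characters inside the range would still match.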
In my opinion, specifying ranges by chars is nonsense,
because there should be no semantics in the order of
character codes. But, anyway, we have to decide what to
do.
(1) Regard the above case as a bug, and fix it completely.
As the current Emacs doesn't support a range spanning
different charsets, I think the fix is difficult, but
not excessively so. In emacs-unicode, however, we
can't have such a restriction, and thus the fix is very
difficult.
(2) Regard the above case as an (unpleasant) feature, and
document it.
(3) Signal an error for such a regex (and of course document
it).
---
Ken'ichi HANDA
address@hidden