From: Paul Eggert
Subject: Re: Unibyte characters, strings and buffers
Date: Fri, 28 Mar 2014 12:21:04 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
>> Code that blithely passes bytes in the range 128-255 to char-equal
>> is *already* buggy.

> There's nothing wrong with those bytes, certainly not when they
> stand for Latin-1 characters.
Sure, and if they stand for Latin-1 characters the proposed change will do the right thing.
> How is it a win, when it actually _adds_ bugs?  E.g., under your
> proposal, (char-equal 192 224) will yield non-nil when
> case-fold-search is non-nil.
That's not a bug, since À and à are the same character, ignoring case.

As I understand it, the scenario you're worried about is that someone is visiting a unibyte buffer, is doing a case-folded search involving non-ASCII bytes, and doesn't want those bytes to match their Latin-1 case-folded counterparts. That scenario is not common enough to worry about. Changing the behavior for this rare case is a cost, I suppose, but it's outweighed by the benefit of simplifying char-equal and making its semantics a bit saner.
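To illustrate: in a multibyte buffer char-equal already folds this way, since 192 and 224 are À and à in Latin-1 (and in Unicode); the proposal merely makes the result independent of the buffer's multibyteness. A minimal sketch:

  (with-temp-buffer
    (set-buffer-multibyte t)          ; multibyte: 192 and 224 are À and à
    (let ((case-fold-search t))
      (char-equal 192 224)))          ; => t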
>> Plus, the change is simpler and easier to explain than what we have
>> now, and that is a long-term win.

> I don't see how it is simpler or easier to explain.  It replaces one
> lopsided interpretation of 128-255 values with another.
It's simpler because it decouples the rules for char-equal from the question of whether the current buffer is multibyte. Separation of concerns is a win.
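In other words, the proposed rule boils down to something like this (a hypothetical Lisp model only, not the actual C implementation; the name my-char-equal is made up for illustration):

  (defun my-char-equal (c1 c2)
    "Return non-nil if C1 matches C2, folding case when
`case-fold-search' is non-nil, without consulting the current
buffer's multibyteness."
    (or (eq c1 c2)
        (and case-fold-search
             (eq (downcase c1) (downcase c2)))))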
> I suggested a solution: ignore case-fold-search in unibyte buffers.
Sorry, I didn't see that suggestion. It would be better than what we have now for char-equal, but it would have undesirable side effects elsewhere. When I type find-file-literally to visit a buffer in raw-text form, it's more convenient if I can type C-s h t m l (or whatever) and find "HTML". I'd rather not lose that capability.
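Something like this sketch, that is, with search-forward standing in for an interactive C-s, in a unibyte buffer such as find-file-literally produces:

  (with-temp-buffer
    (set-buffer-multibyte nil)         ; unibyte, as after find-file-literally
    (insert "<HTML>")
    (goto-char (point-min))
    (let ((case-fold-search t))
      (search-forward "html" nil t)))  ; => 6; the ASCII case fold still matches

Under your suggestion, case-fold-search would be ignored here and the search would fail.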