help-gnu-emacs

Re: search-forward in emacs23 lisp


From: rasmith
Subject: Re: search-forward in emacs23 lisp
Date: Sun, 28 Mar 2010 19:44:12 -0500 (CDT)

From: Peter Dyballa <Peter_Dyballa@Web.DE>
Subject: Re: search-forward in emacs23 lisp
Date: Sun, 28 Mar 2010 23:45:26 +0200

> 
> On 27.03.2010 at 21:31, rasmith wrote:
> 
>> The behavior of the search-forward function in emacs-lisp has changed
>> in emacs23 in a way that breaks some scripts I use, in particular
>> cgreek-tlg.el from Naoto Takahashi's cgreek package.
> 
> 
> Maybe the problem is simply that the buffer is in UTF-8. Then it
> really makes no sense to search for that byte, because it does not
> exist, like a quark (although baryons and mesons are built from
> them); there exists only the two-byte sequence \xc3\xbf (standing
> for ÿ, LATIN SMALL LETTER Y WITH DIAERESIS). Clearly, you can't
> search for what does not exist, unless you're Lancelot.
> 
> Which coding is used in the buffer? Can you switch to a (raw)
> byte-based encoding and test in this state?
> 

No, the buffer's not in UTF-8.  The file was read in with
insert-file-contents-literally, and (set-buffer raw) and
(set-buffer-multibyte nil) were executed just before that.  When I
run the function containing the problem code, sometimes it just fails
with a not-found error for "\377" and stops, and sometimes it stops
with an error message indicating that it's not looking at what it
expects (the actual message is "Unexpected author description
introducer" followed by a pair of bytes in hex).  I can then switch
into that buffer, and in the latter case I find point sitting just
after the byte pair \231\277 (this is where
(search-forward (char-to-string ?\xff)) stopped).  This is well past
an earlier occurrence of \377 in the buffer.  (I won't explain the
rather complicated format of the files in question, but in them \377
is used as a string terminator, and please don't ask me to change
that: the whole purpose of the code is to process files in this
format.)  While visiting that buffer, it's obvious that it's in raw
mode: all high bytes display in octal, and what-cursor-position
identifies everything you look at as an 8-bit byte, never a UTF-8
multibyte character.
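
For anyone trying to reproduce this, the setup described above (a unibyte buffer filled by insert-file-contents-literally) and a way to report the bytes just before point might be sketched as follows. This is not code from cgreek-tlg.el: the helper names my-load-raw-file and my-bytes-before-point are hypothetical, and the buffer name "raw" merely echoes the (set-buffer raw) call.

```elisp
;; Hypothetical helpers (not from cgreek-tlg.el) for reproducing the
;; raw-buffer setup and inspecting the bytes around point.
(defun my-load-raw-file (file)
  "Return a new unibyte buffer holding the undecoded bytes of FILE."
  (let ((buf (generate-new-buffer "raw")))
    (with-current-buffer buf
      ;; Turn off multibyte before inserting, as described above.
      (set-buffer-multibyte nil)
      (insert-file-contents-literally file))
    buf))

(defun my-bytes-before-point (n)
  "Return up to N bytes before point as a string of octal escapes.
In a unibyte buffer each element of the substring is a byte 0..255."
  (mapconcat (lambda (c) (format "\\%o" c))
             (buffer-substring-no-properties
              (max (point-min) (- (point) n))
              (point))
             ""))
```

In the buffer where the search stopped, evaluating (list enable-multibyte-characters (my-bytes-before-point 2)) should show whether the buffer really is unibyte (nil in the first element) and which two bytes point is sitting after.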

Within that buffer, an isearch for \377 finds a byte with value 255
without any problem.  The problem is entirely in the search-forward
function.  I tried inserting (search-forward (unibyte-string ?\377))
in the buffer and evaluating it from there; when I do that, it skips
right over \377 and stops instead at \231\277 (which, as I pointed
out, is not the UTF-8 encoding of \377).  I get the same result with
every argument to search-forward I've come up with, such as:
(unibyte-string ?\377) 
(string-to-unibyte (unibyte-string ?\377))
"ÿ"
"\377"
"\xff" (this is even worse: it's translated to two bytes \x00ff)

I've verified that (unibyte-string ?\377) returns exactly what it
should: a string containing just the 8-bit byte \377.  However, when
search-forward gets that argument while running in a raw buffer with
multibyte turned off, it apparently first turns it into the two-byte
string \231\277 and then matches on that.  If there's a way to keep it
from doing that, I'd like to know.
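
A check along the lines of the verification just described can be run on every candidate pattern from the list above without calling search-forward at all. This is a minimal sketch; my-describe-string is a hypothetical helper that reports whether a string is multibyte and which character codes it holds. For (unibyte-string ?\377) it should return (nil (255)), while a literal "ÿ" read from a UTF-8 source is multibyte and should report t, even though its character code is also 255.

```elisp
;; Hypothetical diagnostic: report whether each candidate pattern is
;; multibyte and which character codes it contains.
(defun my-describe-string (s)
  "Return a list (MULTIBYTE-P CODES) describing string S.
CODES is the list of character codes in S, in order."
  (list (multibyte-string-p s) (append s nil)))

;; Print the description of each pattern tried with search-forward.
(dolist (s (list (unibyte-string ?\377)
                 (string-to-unibyte (unibyte-string ?\377))
                 "ÿ"
                 "\377"
                 "\xff"))
  (message "%S => %S" s (my-describe-string s)))
```

Comparing the multibyte flag across these patterns is one way to narrow down which of them search-forward has to encode before matching in a unibyte buffer.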

As I said in a reply to myself, I found a workaround:

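      ;; Step over bytes until the \377 string terminator; the eobp
      ;; test avoids passing nil from char-after to /= at end of buffer.
      (while (and (not (eobp)) (/= (char-after) ?\377))
        (forward-char 1))
      (forward-char 1)   ; move past the \377 (signals end-of-buffer if none)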
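
If the workaround is needed in several places, it could be wrapped in a function that follows the return convention of search-forward with NOERROR set to t: the new point on success, nil (with point unchanged) on failure. This is a sketch under that assumption; the name my-search-forward-byte is hypothetical, and it is meant for unibyte buffers, where char-after returns raw byte values.

```elisp
;; Hypothetical helper generalizing the byte-by-byte workaround; the
;; name `my-search-forward-byte' is illustrative only.
(defun my-search-forward-byte (byte &optional bound)
  "Move point just past the next occurrence of BYTE after point.
The match must end no later than BOUND (default: end of buffer).
Return the new point, or nil (leaving point unchanged) if BYTE is
not found.  Meant for unibyte buffers, where `char-after' returns
raw byte values 0..255."
  (let ((limit (or bound (point-max)))
        (pos (point))
        (found nil))
    ;; Examine one buffer position at a time, as the workaround does.
    (while (and (not found) (< pos limit))
      (if (eq (char-after pos) byte)
          (setq found (1+ pos))
        (setq pos (1+ pos))))
    ;; goto-char returns its argument, so success yields the new point.
    (when found
      (goto-char found))))
```

With this, (my-search-forward-byte ?\377) could stand in for (search-forward (unibyte-string ?\377)) in code that reads such terminators, at the cost of a Lisp-level loop that is slower than the built-in search on large buffers.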

But it would be nice to know exactly what it is that search-forward is
doing here.  My knowledge of emacs-lisp is pretty rudimentary, so if
I'm missing something obvious, please let me know.

Thanks,

Robin Smith
