Re: Making re-search-forward search for \377

From: Eli Zaretskii
Subject: Re: Making re-search-forward search for \377
Date: Mon, 03 Nov 2008 21:42:14 +0200

> From: Tyler Spivey <address@hidden>
> Date: Sun, 02 Nov 2008 20:54:52 -0800
> What I'm trying to do is split text up for use in a mud
> client, based on the following re:
> "\\(\377[\371\357]\\)\\|\\(\n\\)"
> the encoding of the process is raw-text-unix.
> manually running M-: (re-search-forward "\\(\377[\371\357]\\)") fails,
> but
> running M-: (re-search-forward "\377\371") works fine. However, I want
> it to match
> the longer re stated above, but running re-search on that just matches
> the newlines.
> This is mostly text, with telnet control characters thrown in

If it's text, Emacs is unlikely to treat what was \377 etc. in the
file as just an 8-bit byte whose integer value is \377.  Depending on
your locale, Emacs will interpret such bytes as encoded characters and
convert them to its internal representation, which is exposed to you
as a large integer.  (This conversion is called ``decoding''.)
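
A minimal sketch of what decoding does to a single byte, assuming
Emacs 23 or later (where raw bytes kept in multibyte text are stored
as special "eight-bit" characters).  The variable names are
hypothetical, and windows-1251 is only an example coding system, not
necessarily the one your locale selects:

```elisp
;; One byte with value #o377, held in a unibyte string.
(defvar my-byte-377 (unibyte-string #o377))

;; Decoded as windows-1251 text, the byte becomes a Cyrillic letter,
;; so its character code is no longer #o377.
(defvar my-decoded-char
  (string-to-char (decode-coding-string my-byte-377 'windows-1251)))

;; Kept as a raw byte inside a multibyte string, it becomes an
;; "eight-bit" character with a large internal code.
(defvar my-raw-byte-char
  (string-to-char (string-to-multibyte my-byte-377)))
```

Evaluating `my-decoded-char' and `my-raw-byte-char' shows that
neither is the integer 255, which is why a pattern written in terms
of the byte value can miss the character actually in the buffer.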

To see what Emacs thinks about those characters, go to one of them and
type "C-u C-x =".
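
The same check can be scripted.  This is only a sketch: the command
name `my-describe-char-after-point' is hypothetical, and it reports
far less than "C-u C-x =" does, just the integer code of the
character after point:

```elisp
(defun my-describe-char-after-point ()
  "Show the internal code of the character after point.
Hypothetical helper; a reduced, scriptable form of \"C-u C-x =\"."
  (interactive)
  (let ((ch (char-after)))
    (if ch
        ;; Print the code in decimal, octal and hex for easy comparison
        ;; with escapes such as \377 in a regexp.
        (message "char: %S  code: %d (#o%o, #x%x)" (string ch) ch ch ch)
      (message "No character after point (end of buffer)"))))
```

Run it with "M-x my-describe-char-after-point" while point is on one
of the suspect characters.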

If I'm right, searching for literal \377\371 is unlikely to succeed,
since there's no such character in the buffer after decoding.
Instead, you should search for the codepoints in the internal
representation, as shown to you by "C-u C-x =".  To insert such
characters, the easiest way is to use an ``input method''.  You set an
input method by typing "C-u C-\" and then the name of the input method
you want.  Typing "C-u C-\ TAB" will show the list of available input
methods, and "C-h C-\ METHOD" will describe the named input method.
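
If "C-u C-x =" shows that the bytes were kept as raw bytes (as
opposed to being decoded into letters), the regexp can also be built
from Lisp without typing the characters.  A sketch, assuming Emacs 23
or later and a multibyte buffer; `my-raw-byte-string' and
`my-telnet-split-regexp' are hypothetical names:

```elisp
(defun my-raw-byte-string (&rest bytes)
  "Return a multibyte string holding BYTES as raw-byte characters.
Hypothetical helper built on `unibyte-string' and `string-to-multibyte'."
  (string-to-multibyte (apply #'unibyte-string bytes)))

;; Same shape as the regexp from the question, but every non-ASCII
;; element is a multibyte raw-byte character, so the whole pattern
;; is a multibyte string.
(defvar my-telnet-split-regexp
  (concat "\\("
          (my-raw-byte-string #o377)
          "[" (my-raw-byte-string #o371 #o357) "]"
          "\\)"
          "\\|\\(\n\\)"))
```

Then try M-: (re-search-forward my-telnet-split-regexp nil t) in the
process buffer.  If the bytes were instead decoded into ordinary
characters, use the codes reported by "C-u C-x =" in place of the
raw-byte characters.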

> In reading section of the manual, we get this:
>    You can represent a unibyte non-ASCII character with its character
> code, which must be in the range from 128 (0200 octal) to 255 (0377
> octal).  If you write all such character codes in octal and the string
> contains no other characters forcing it to be multibyte, this produces
> a unibyte string.  However, using any hex escape in a string (even for
> an ASCII character) forces the string to be multibyte.
> I've left enable-multibyte-characters alone, but even searching for
> "[\377]\371" fails, while "\377\371" succeeds.

I don't recommend using unibyte facilities; they are tricky and
