Re: Need some help with Rmail/mbox

From: Kenichi Handa
Subject: Re: Need some help with Rmail/mbox
Date: Mon, 22 Sep 2008 13:31:56 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

In pre-unicode-merge Emacs (more exactly, before
2008-03-12), the automatic unibyte -> multibyte conversion
sometimes caused headaches for Emacs Lisp developers
because the behaviour differed in each language environment.
But, with the current Emacs, that conversion is more
developer-friendly; i.e. all bytes with MSB set are
converted to the corresponding eight-bit characters of
the multibyte representation (* see the attached note).

So, now we have these four ways to get a multibyte buffer
decoded from a unibyte buffer, and they should all work
equally safely.

(1) Do decode-coding-region while specifying a multibyte
buffer as TARGET.

(2) Insert the contents of the unibyte buffer into a multibyte
buffer, and then perform decode-coding-region in that
multibyte buffer.

(3) Get a unibyte string from a unibyte buffer, and then
decode it while specifying a multibyte buffer as TARGET.

(4) Decode a unibyte buffer into a multibyte string, and
then insert it into a multibyte buffer.

(Please note that using decode-coding-region directly in a
unibyte buffer is not reliable, because if a coding system
has a post-read-conversion function, that function (usually)
works only in a multibyte buffer.)

The efficiency is (1) > (2) > (3) > (4).
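
The four approaches can be sketched roughly as below.  This is
only an illustration, not code from Rmail: the my-decode-N names
are hypothetical, SRC is assumed to be a unibyte buffer holding
raw bytes, and DST an existing multibyte buffer.  Note that the
optional argument called TARGET above is named DESTINATION in
the docstring of decode-coding-region.

```elisp
;; Hypothetical helpers, one per approach.  SRC: unibyte buffer of
;; raw bytes.  CODING: a coding system.  DST: multibyte buffer.

(defun my-decode-1 (src coding dst)
  "Decode the bytes of SRC with CODING directly into DST."
  (with-current-buffer src
    (decode-coding-region (point-min) (point-max) coding dst)))

(defun my-decode-2 (src coding dst)
  "Copy the bytes of SRC into DST, then decode them in place."
  (with-current-buffer dst
    (let ((start (point)))
      ;; Bytes >= #x80 arrive in DST as eight-bit characters.
      (insert-buffer-substring src)
      (decode-coding-region start (point) coding))))

(defun my-decode-3 (src coding dst)
  "Take a unibyte string from SRC and decode it into DST."
  (let ((bytes (with-current-buffer src (buffer-string))))
    ;; The fourth argument makes the decoded text go into DST.
    (decode-coding-string bytes coding nil dst)))

(defun my-decode-4 (src coding dst)
  "Decode SRC into a multibyte string, then insert it into DST."
  (let ((text (with-current-buffer src
                ;; DESTINATION t returns the decoded text as a string.
                (decode-coding-region (point-min) (point-max) coding t))))
    (with-current-buffer dst
      (insert text))))
```

Each helper is called as (my-decode-1 SRC 'utf-8 DST); they differ
only in how many intermediate copies of the text are made, which
is what the efficiency ordering above reflects.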

And, for the case of Rmail/mbox, before decoding, we may
have to perform base64 or qp decoding, and those functions
can't specify a different buffer/string as the target.  And I
don't know if they work for a multibyte buffer/string.

So, at the moment, I think the following strategy is good.

Copy the contents of the RMAIL buffer to a temporary unibyte
buffer, perform base64/qp decoding in that buffer, then do
decode-coding-region while specifying the view buffer as
TARGET.
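
A minimal sketch of this strategy might look like the following.
The function name my-rmail-decode-body and its arguments are
hypothetical, and it assumes the caller has already determined
the transfer encoding and charset from the message headers; it
is not the actual Rmail code.

```elisp
(require 'qp)  ; provides `quoted-printable-decode-region'

(defun my-rmail-decode-body (src beg end encoding coding view)
  "Decode the region BEG..END of buffer SRC into buffer VIEW.
ENCODING is the symbol `base64', `quoted-printable', or nil.
CODING is the coding system for the charset of the text.
VIEW is assumed to be a multibyte buffer."
  (with-temp-buffer
    ;; Work on raw bytes in a temporary unibyte buffer.
    (set-buffer-multibyte nil)
    (insert-buffer-substring src beg end)
    ;; Undo the transfer encoding in place, if there is one.
    (cond ((eq encoding 'base64)
           (base64-decode-region (point-min) (point-max)))
          ((eq encoding 'quoted-printable)
           (quoted-printable-decode-region (point-min) (point-max))))
    ;; Decode the bytes, sending the result to VIEW at its point.
    (decode-coding-region (point-min) (point-max) coding view)))
```

The base64/qp step runs only in the unibyte buffer, so whether
those functions work on multibyte text does not matter here.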

Kenichi Handa

* Note: Those eight-bit characters have values
#x3FFF80..#x3FFFFF, and, for instance, char-after and aref
return one of those values.  To get the original byte value,
one needs (encode-char EIGHT-BIT-CHAR 'eight-bit) or
(multibyte-char-to-unibyte EIGHT-BIT-CHAR).  Perhaps, we
have to provide some APIs for directly getting a byte value
of EIGHT-BIT-CHAR, but we have not yet decided what to do.
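
To make the note concrete, here is a small sketch.  The helper
name my-byte-round-trip is hypothetical.  It also calls get-byte,
which is not mentioned above; it is assumed to be available, as
it is in later Emacs releases.

```elisp
(defun my-byte-round-trip (byte)
  "Insert raw BYTE (#x80..#xFF) into a multibyte buffer and read it back.
Return a list (CHAR VIA-ENCODE-CHAR VIA-MULTIBYTE-CHAR-TO-UNIBYTE
VIA-GET-BYTE)."
  (with-temp-buffer                     ; multibyte by default
    ;; The byte is stored as an eight-bit character.
    (insert (unibyte-char-to-multibyte byte))
    (let ((ch (char-after (point-min))))
      (list ch
            (encode-char ch 'eight-bit)
            (multibyte-char-to-unibyte ch)
            (get-byte (point-min))))))

;; (my-byte-round-trip #xE9)
;; => (4194281 233 233 233), i.e. (#x3FFFE9 #xE9 #xE9 #xE9)
```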
