Re: help needed with coding systems (unrmail problems)

From: Mark Lillibridge
Subject: Re: help needed with coding systems (unrmail problems)
Date: Fri, 14 Jan 2011 10:21:13 -0800

Stefan wrote:
>  I (Mark) wrote:
>  >     Ok, I have a Rmail Babyl file whose contents are correctly encoded
>  > via raw-text-unix (V22) -- for those curious, I believe this can be
>  raw-text-unix is an alias for `binary'.  I.e. it takes bytes in and
>  returns the same bytes unchanged.

    I think you are thinking of raw-text; I believe raw-text-unix does
end of line conversion (only).

>  Decoding using it should never result
>  in any non-ascii chars: only ascii chars and "eight-bit chars"
>  (i.e. bytes between 128-255).

    This is only true if you either read to a multibyte buffer or read
to a unibyte buffer and then never convert it to multibyte (Rmail does
the latter).
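
    For anyone who wants to poke at this, here is a minimal sketch
(assumptions: a reasonably recent Emacs, evaluated in *scratch*; the
variable names are only illustrative and nothing here is taken from
Rmail itself).  It decodes the same two bytes with both coding systems
and reports whether each result is a multibyte string:

      ;; "\201\374" is a unibyte string holding the bytes 81 FC (hex).
      (let* ((bytes "\201\374")
             ;; Decode the bytes the way unrmail currently does.
             (raw  (decode-coding-string bytes 'raw-text-unix))
             ;; Decode the same bytes as emacs-mule for comparison.
             (mule (decode-coding-string bytes 'emacs-mule)))
        (message "raw-text-unix: %S multibyte=%s"
                 raw (multibyte-string-p raw))
        (message "emacs-mule:    %S multibyte=%s"
                 mule (multibyte-string-p mule)))

I have deliberately not written down the expected output, since it may
differ between Emacs versions.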

>  > I have verified that this character is represented on disk as 81 FC
>  > (hex).  If I visit that file literally (also), I see \201\374, which is
>  > octal for 81 FC as expected.
>  >     When I fire up unrmail on this file, it first reads it in as
>  > "raw-text-unix":
>  I.e. it read it literally.

I think these concepts are also not equivalent for subtle reasons.

>  > It then decodes the main part of the file containing the messages:
>  >       (unless (and coding-system
>  >                    (coding-system-p coding-system))
>  >         (setq coding-system
>  >               ;; Emacs 21.1 and later writes RMAIL files in emacs-mule, but
>  >               ;; earlier versions did that with the current buffer's encoding.
>  >               ;; So we want to favor detection of emacs-mule (whose normal
>  >               ;; priority is quite low), but still allow detection of other
>  >               ;; encodings if emacs-mule won't fit.  The call to
>  >               ;; detect-coding-with-priority below achieves that.
>  >               (car (detect-coding-with-priority
>  >                     from to
>  >                     '((coding-category-emacs-mule . emacs-mule))))))
>  >       (message "decoding file with %s" coding-system)
>  >       (unless (memq coding-system
>  >                     '(undecided undecided-unix))
>  >         (set-buffer-modified-p t)       ; avoid locking when decoding
>  >         (let ((buffer-undo-list t))
>  >           (decode-coding-region from to coding-system))
>  >         (setq coding-system last-coding-system-used))
>  >       (message "actual coding system used: %s" coding-system)
>  > I have verified via the inserted message calls above that it is decoding
>  > using raw-text-unix here.
>  Sounds like you have a problem here: it should be using emacs-mule
>  (since \201\374 is the emacs-mule encoding of ü).

    See my "earlier" message entitled "Rmail and the raw-text coding
system" for why Rmail is using raw-text-unix instead of emacs-mule; I
just resent it so maybe it will get through to the mailing list this
time.

- Mark

