emacs-devel

Need some help with Rmail/mbox


From: Stephen J. Turnbull
Subject: Need some help with Rmail/mbox
Date: Fri, 19 Sep 2008 12:28:41 +0900

Paul Michael Reilly writes:

 > As near as I can tell the task is to decode the message body in two
 > steps:

But why not just use the existing code to do this?  AIUI, the Babyl
format was designed for one-buffer operation on a pseudo-RFC-822
message, so most functions used to wash and display probably assume
that the message is in the current buffer, which is narrowed so that
the presentation header plus the body form an RFC 2822 message.

All you should need to do for a first cut is to copy the message to a
new buffer, which doesn't need to be narrowed, but might need to have
some Babyl sentinels added.

If I'm missing something, feel free to ignore me, but I don't really
understand what you think is different about presenting a
free-standing RFC 2822 message as opposed to presenting one that is
part of a Babyl-format buffer.  I don't think they should be that
different.  The main difference is that Babyl caches the set of
presentation headers in the Babyl-format file, whereas mbox does not.
So you'll need to hide (or remove) the non-presentation headers
one-by-one rather than by just narrowing the buffer.
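
A minimal sketch of the "remove one-by-one" idea, assuming the message
has already been copied into its own buffer.  The names
`sketch-displayed-headers' and `sketch-prune-headers' are hypothetical,
not existing Rmail functions, and the list of kept headers is only an
example.

```elisp
(defvar sketch-displayed-headers '("from" "to" "cc" "subject" "date")
  "Hypothetical list of lower-case header names to keep visible.")

(defun sketch-prune-headers ()
  "Delete every header field not listed in `sketch-displayed-headers'.
Assumes the current buffer holds a single message whose header starts
at `point-min' and ends at the first empty line."
  (save-excursion
    (goto-char (point-min))
    (while (not (or (eobp) (looking-at "^$")))
      (let ((start (point))
            ;; NAME is nil for lines that are not "Name:" fields,
            ;; such as the mbox "From " separator line.
            (name (and (looking-at "\\([^ \t\n:]+\\):")
                       (downcase (match-string 1)))))
        ;; Skip this field, including folded continuation lines.
        (forward-line 1)
        (while (looking-at "[ \t]")
          (forward-line 1))
        ;; Anything we were not asked to keep is removed.
        (unless (member name sketch-displayed-headers)
          (delete-region start (point)))))))
```

Since this works on a copy, deleting is safe; hiding with an
`invisible' text property instead would be an equally plausible choice.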

 > first to decode according to the character encoding (e.g. quoted-
 > printable or base64) and then to decode that result to some coding
 > system.

That's basically it.  You should do the processing on buffers, not
strings, though, and

 >        (decode-coding-string body (detect-coding-string body t))

you want to parse the coding from the *header*, not guess on the body.
If you want you can add guessing and/or user-specified MIME charsets
as a user option, but (a) almost all genuine mail today will contain
an appropriate Content-Type charset parameter, and (b) lack of such
(unless all text is US-ASCII) is an extremely strong indicator of
spam.  Few users will need to be able to read messages that have bogus
charset parameters: this feature is not immediately necessary.
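
To illustrate reading the charset from the header instead of guessing,
here is a rough sketch.  `sketch-message-coding-system' is a
hypothetical name, and mapping the charset to a coding system by simply
interning its lower-cased name is an assumption that only works when
the two names coincide (as for utf-8 or iso-8859-1); Gnus has a fuller
mapping in `mm-charset-to-coding-system'.

```elisp
(require 'mail-utils)                   ; for `mail-fetch-field'

(defun sketch-message-coding-system ()
  "Return the coding system named by the Content-Type charset parameter.
Fall back to `us-ascii' when the parameter is missing or unknown.
Assumes the current buffer holds a single message."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      ;; `mail-fetch-field' searches the accessible region, so narrow
      ;; to the header, which ends at the first empty line.
      (narrow-to-region (point-min)
                        (if (re-search-forward "^$" nil t)
                            (point)
                          (point-max)))
      (let* ((content-type (mail-fetch-field "content-type"))
             (case-fold-search t)
             (charset (and content-type
                           (string-match "charset=\"?\\([^\"; \t\n]+\\)"
                                         content-type)
                           (match-string 1 content-type)))
             ;; Simplistic charset -> coding-system mapping (assumption).
             (coding (intern (downcase (or charset "us-ascii")))))
        (if (coding-system-p coding) coding 'us-ascii)))))
```

The result would then be handed to `decode-coding-region' over the
body, in place of the `detect-coding-string' call quoted above.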

The general algorithm should be something like

Identify message in mbox buffer
Copy message to presentation buffer
Identify header and body, add Babyl sentinels if desired
Parse headers (specifically content type)
Dispatch on content type and subtype:
    Case type is text and subtype is plain
        Identify charset parameter:
            (or charset-from-content-type "us-ascii")
        Map charset to Emacs coding-system
        Undo the content transfer encoding (quoted-printable, base64), if any
        (decode-coding-region (body-begin) (body-end) coding-system)
        Wash header for presentation, e.g.:
            Hide non-displayed header
            Decode RFC 2047-encoded headers
        Wash body for presentation, e.g.:
            Highlight and activate url-like substrings
            Highlight quoted material
Display buffer in window
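
The outline above could be sketched roughly as follows for the
text/plain case.  Everything named `sketch-...' is hypothetical, the
buffer name is arbitrary, Babyl sentinels and the washing steps are
left out, and the charset is again mapped to a coding system by name
only, which is a simplification.

```elisp
(require 'mail-utils)                   ; for `mail-fetch-field'
(require 'qp)                           ; for `quoted-printable-decode-region'

(defun sketch-present-message (mbox-buffer start end)
  "Copy START..END of MBOX-BUFFER to a presentation buffer and decode it.
Only text/plain bodies are decoded; other types are left untouched.
Return the presentation buffer."
  (let ((view (get-buffer-create "*sketch-message*")))
    (with-current-buffer view
      (erase-buffer)
      (insert-buffer-substring mbox-buffer start end)
      (goto-char (point-min))
      (let ((case-fold-search t)
            ;; The body starts right after the first empty line.
            (body (copy-marker (if (re-search-forward "^\n" nil t)
                                   (point)
                                 (point-max))))
            type encoding charset coding)
        ;; Parse the headers we dispatch on.
        (save-restriction
          (narrow-to-region (point-min) body)
          (setq type (or (mail-fetch-field "content-type") "text/plain")
                encoding (downcase
                          (or (mail-fetch-field "content-transfer-encoding")
                              "7bit"))))
        (when (string-match "\\`text/plain\\(?:[ \t;]\\|\\'\\)" type)
          ;; Step 1: undo the content transfer encoding.
          (cond ((string= encoding "quoted-printable")
                 (quoted-printable-decode-region body (point-max)))
                ((string= encoding "base64")
                 (base64-decode-region body (point-max))))
          ;; Step 2: decode using the charset from the header.
          (setq charset (if (string-match "charset=\"?\\([^\"; \t\n]+\\)" type)
                            (match-string 1 type)
                          "us-ascii")
                coding (intern (downcase charset)))
          (when (coding-system-p coding)
            (decode-coding-region body (point-max) coding))))
      (goto-char (point-min)))
    view))
```

A caller would pass the message bounds found in the mbox buffer and
then show the result, for instance with
(display-buffer (sketch-present-message mbox-buffer start end)).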




