Re: [VM] searching in mime encoded email

From: John Hein
Subject: Re: [VM] searching in mime encoded email
Date: Fri, 20 Jan 2012 19:28:07 -0700

Julian Bradfield wrote at 20:23 +0000 on Jan 20, 2012:
 > But if you ever get any non-English mail, you may well already have
 > "binary data" in your text.

Indeed.  Some, not much, often with mismarked encoding.

 > I don't see how your worry is any difference from saying "if you do
 > cat /bin/ls
 > in an xterm, weird stuff will happen". Of course it will; so what?
 > If it's actually utf-8, you'll probably see it as intended, as every
 > modern distribution is set up to use utf-8 by default.

Most of the non-English email with "binary" payload I get is base64
or quoted-printable encoded in the body, or Q-encoded in headers.  Does
that mean there aren't mailers out there sending raw binary or
converting from an encoding to binary before delivering it to the
user's inbox?  No, just none that I've seen (or noticed at least) yet.
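
For readers unfamiliar with the three encodings named above, here is
a minimal Python sketch (Python standard library only, not vm's elisp).
The sample string and the header value are made up for illustration.

```python
import base64
import quopri
from email.header import decode_header, make_header

# Hypothetical sample text containing non-ASCII characters.
text = "Grüße aus Köln"
raw = text.encode("utf-8")

# Body encodings: both turn 8-bit bytes into 7-bit-safe ASCII.
b64 = base64.b64encode(raw)
qp = quopri.encodestring(raw)

# Header encoding: an RFC 2047 "encoded word" using Q-encoding.
hdr = "=?utf-8?q?Gr=C3=BC=C3=9Fe_aus_K=C3=B6ln?="

# Each encoding decodes back to the same text.
from_b64 = base64.b64decode(b64).decode("utf-8")
from_qp = quopri.decodestring(qp).decode("utf-8")
from_hdr = str(make_header(decode_header(hdr)))

print(b64, qp, from_hdr, sep="\n")
```

The point is only that the bytes stored in the folder are plain ASCII
in all three cases, so a search on the raw buffer never sees the
original characters.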

Do I see 8-bit data despite the sender lying with
'Content-Transfer-Encoding: 7bit'?  Most definitely - typically things
from misbehaving mailers like 0xa0 (non-breaking space) and 0x92
(a right single quote in windows-1252, but a C1 control character in
iso-8859-1, which is what the message I'm looking at claims to be) and
the like.  Except for spam / virus payloads, these broken elements are
fairly innocuous, however.
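
The mismarked-charset case is easy to reproduce.  A small Python
sketch, using a made-up payload that contains the two bytes mentioned
above:

```python
import unicodedata

# Hypothetical body bytes from a message labelled "7bit" / iso-8859-1.
payload = b"It\x92s\xa0here"

# The "7bit" label is false: some bytes have the high bit set.
is_7bit = all(b < 0x80 for b in payload)

# Decoded as windows-1252, 0x92 is RIGHT SINGLE QUOTATION MARK.
as_cp1252 = payload.decode("cp1252")
# Decoded as iso-8859-1 (what the message claims), 0x92 is a C1 control.
as_latin1 = payload.decode("iso-8859-1")

print(is_7bit)
print(unicodedata.name(as_cp1252[2]))
print(unicodedata.category(as_latin1[2]))  # "Cc" means control character
```

0xa0 decodes to a non-breaking space under either charset, which is
why it tends to go unnoticed.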

Will it always be the case that base64 is used instead of raw binary?
No - some day raw binary may flow freely over email channels.  And I
agree vm should be prepared for it.  The biggest issue might be
migrating away from the default mbox format for local folders
(separate topic).

Re: so what?  If the binary data is marked with a proper mime type and
encoding, and I always use a tool (e.g., vm) that knows what to do with
that chunk of mime, then I agree it really is a don't-care.  Those
conditions don't always hold true (e.g., application/pdf mismarked as
text/plain, or someone using cat(1) on a message in an xterm).  But
even then you could still say "so what? that's operator error or a bug
that should be fixed" and sleep well with that answer.

Will vm currently handle any case of raw binary in the payload of
a message?  As I said earlier, I hope so, but I wouldn't be surprised
if it didn't (mainly due to aforementioned storage format).

But going back to the questions at hand:

  (a) Whether or not to add a feature in vm to support re-encoding
      base64 sections (or any arbitrary mime section) to some other
      encoding

  (b) Whether to fix M-S and/or V C text to grok base64 or other
      transfer encodings (not to mention complications due to
      character sets)...
    (1) on the fly in-memory
    (2) by way of doing (a)

I think (b)(1) is best if it can be made "not slow" in vm.
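
To make (b)(1) concrete, here is a rough Python sketch of searching
decoded text parts in memory without touching the stored message.
This is not how vm does or would do it (vm is elisp); the function
name search_message and the demo message are hypothetical.

```python
import re
from email import message_from_bytes, policy
from email.message import EmailMessage

def search_message(raw_bytes, pattern):
    """Return content types of text parts whose decoded body matches."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    regex = re.compile(pattern)
    hits = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown or mismarked charset: fall back to a lossless guess.
            text = payload.decode("latin-1")
        if regex.search(text):
            hits.append(part.get_content_type())
    return hits

# Demo: a base64-encoded utf-8 body (made-up content).
demo = EmailMessage()
demo["Subject"] = "test"
demo.set_content("Grüße aus Köln\n", charset="utf-8", cte="base64")
raw = demo.as_bytes()

needle_in_raw = "Köln".encode("utf-8") in raw   # raw search misses it
hits = search_message(raw, "Köln")              # decoded search finds it
print(needle_in_raw, hits)
```

The stored bytes are never modified; the cost is a decode per text
part per search, which is where the "not slow" concern comes in.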

If someone adds (a), that'd probably be okay and useful in certain
circumstances, but the user would have to be explicitly aware of any
consequences when he decides to invoke that feature (e.g.,
invalidating signed messages, possible mail storage issues, etc.).
I don't think vm should automatically do any permanent re-encoding
of messages or message parts for its own needs (e.g., (b)(2)).
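
As an illustration of the signed-message concern, the Python sketch
below shows that permanently re-encoding a part changes the stored
bytes even though the content is the same.  A plain SHA-256 over the
part stands in for a signature here; that is an assumption for the
sake of the example, and real signature schemes have their own
canonicalization rules.

```python
import base64
import hashlib

# Hypothetical part body as stored in the folder (base64, 7-bit safe).
text = "Grüße aus Köln\n"
original_part = base64.encodebytes(text.encode("utf-8"))

# Permanent re-encoding to raw 8-bit, as feature (a) might do.
reencoded_part = base64.decodebytes(original_part)

# Stand-in for a signature: a digest over the stored bytes.
digest_before = hashlib.sha256(original_part).hexdigest()
digest_after = hashlib.sha256(reencoded_part).hexdigest()

same_content = reencoded_part.decode("utf-8") == text
same_digest = digest_before == digest_after
print(same_content, same_digest)
```

Anything that verifies the stored bytes would see the re-encoded part
as altered, even though a human reader sees identical text.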
