bug-mailutils

Re: [bug-mailutils] New storage?


From: Kurt Hackenberg
Subject: Re: [bug-mailutils] New storage?
Date: Sun, 24 Feb 2019 17:37:06 -0500
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

On 2/24/19 3:43 PM, Sergey Poznyakoff wrote:

> 1) The main difference of the proposed format is that messages are
> delimited with a single dot. I don't see why it is important. For what
> it's worth, from the algorithmic point of view, whether to look for
> "\n.\n" or for "\nFrom " makes very little difference, if at all.

Mbox is not a single format; it's many related formats, all partially incompatible and more or less indistinguishable. Software that attempts to read mbox has to know or guess which variant it's reading, or attempt to handle all of them simultaneously. Mailutils apparently tries to handle them all. From a look at the source, it reads mbox with "the notorious VALID() macro", which looks for 20 variations of the mbox From_ line.

There shouldn't be 20 ways to find message boundaries. There should only be one way.

When software reading mbox is wrong about which variant it's reading, or tries to handle them all but misses one, it sometimes guesses wrong about where message boundaries are. The usual result of that is damaging one message and losing another.
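A minimal Python sketch of that failure, assuming a reader that treats every line beginning with "From " as a message boundary, and a writer that left a body line unescaped (as the variants that rely on Content-Length do). The function name and the sample message are made up for illustration; this is not Mailutils code.

```python
def split_on_from_lines(text):
    """Naive mbox reader: every line starting with "From " begins a new message."""
    messages, current = [], None
    # The text is assumed to end with "\n", so drop the empty last element.
    for line in text.split("\n")[:-1]:
        if line.startswith("From "):
            if current is not None:
                messages.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append(current)
    return messages


# One stored message whose body contains an unescaped line starting with "From ".
mbox_text = (
    "From alice@example.org Sun Feb 24 17:37:06 2019\n"
    "Subject: one message\n"
    "\n"
    "Hello,\n"
    "From now on, use the new list.\n"
    "Bye.\n"
    "\n"
)

messages = split_on_from_lines(mbox_text)
print(len(messages))  # 2, although only one message was stored
```

The first message loses the end of its body, and the reader reports a second "message" that has no header at all.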

Also, most mbox variants damage messages with ">From " escaping.
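To illustrate the damage, here is a small Python sketch of the escaping used by the variant usually called mboxo, where only lines beginning with "From " get a ">" prepended. The function names are hypothetical, and the sample body is invented.

```python
import re


def mboxo_escape(body):
    """Prepend '>' to each body line that begins with "From "."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


def mboxo_unescape(body):
    """Strip one '>' from each body line that begins with ">From "."""
    return re.sub(r"^>From ", "From ", body, flags=re.MULTILINE)


# The second line began with ">From " before it was ever stored.
original = "From here on it gets worse.\n>From a quoted reply.\n"

stored = mboxo_escape(original)
restored = mboxo_unescape(stored)

print(stored == original)    # False: the stored copy differs from what was sent
print(restored == original)  # False: unescaping cannot tell the two lines apart
```

Once stored, the two lines look identical, so no reader can recover the original text, whether or not it unescapes.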

> 2) The new format drops the "From " line, but does not propose any way
> to keep the envelope information. It looks like this information is
> simply lost (just as it is with Maildir or MH).

That information is redundant these days. The envelope sender is in the Return-Path: header, and the receive time is in the Received: header. Those headers probably didn't exist when mbox was invented, but both have now existed for decades.
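As a rough sketch of recovering both values from the header alone, using Python's standard email package. The sample message and its addresses are invented, and the code assumes the topmost Received: field is the one added on final delivery and that its timestamp follows the last semicolon.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Invented sample message; header values are for illustration only.
raw = (
    "Return-Path: <sender@example.org>\n"
    "Received: from mx.example.org by mail.example.net with ESMTP id abc123;\n"
    "\tSun, 24 Feb 2019 17:37:10 -0500\n"
    "From: Sender <sender@example.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw)

# Envelope sender: the address inside the angle brackets of Return-Path:.
envelope_sender = parseaddr(msg["Return-Path"])[1]

# Receive time: the date after the last ';' in the topmost Received: field.
topmost_received = msg.get_all("Received")[0]
receive_time = parsedate_to_datetime(topmost_received.rsplit(";", 1)[1].strip())

print(envelope_sender)           # sender@example.org
print(receive_time.isoformat())  # 2019-02-24T17:37:10-05:00
```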

> 3) It does not address any of the real problems of the mbox format:
> locking, simultaneous access, message deletion, to name a few.

Those are real problems, in some applications, but they are not the only problems. Of course a single, sequential file doesn't do random access or concurrent access. An application that needs those should probably store mail some other way, like maildir, which does both.

My proposal doesn't add functionality to mbox; it does correctly, and easily, what mbox does with substantial difficulty and sometimes incorrectly.

For an example of the difficulty, I recently wrote some software that reads and writes several of these mail storage formats. Implementing many common variants of mbox took about 30 times as much code as implementing my proposed format. That factor of 30 in programming labor has real-world consequences.

That factor of 30 is not because of duplicate code. The many mbox variants are implemented as common functions that take arguments for the parameters of the variants. There is little or no duplication.
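To give a feel for how little a single-delimiter format needs, here is a minimal Python sketch of a writer and reader. It is not the software mentioned above, and it rests on assumptions this message does not spell out: each message ends with a line containing only ".", and any message line that starts with "." gets one extra "." prepended, as in SMTP. The function names are hypothetical.

```python
def encode(messages):
    """Serialize messages (each a list of lines without newlines) to one string."""
    out = []
    for lines in messages:
        for line in lines:
            # Assumed dot-stuffing: protect lines that start with '.'.
            out.append("." + line if line.startswith(".") else line)
        out.append(".")  # the single, unambiguous end-of-message marker
    return "".join(line + "\n" for line in out)


def decode(text):
    """Parse the string produced by encode() back into lists of lines."""
    messages, current = [], []
    # The text is assumed to end with "\n", so drop the empty last element.
    for line in text.split("\n")[:-1]:
        if line == ".":
            messages.append(current)
            current = []
        elif line.startswith("."):
            current.append(line[1:])  # undo the dot-stuffing
        else:
            current.append(line)
    return messages


sample = [
    ["Subject: a", "", "From here on, nothing is escaped.", ".", "..x"],
    ["Subject: b", "", "bye"],
]
round_trip = decode(encode(sample))
print(round_trip == sample)  # True
```

Under those assumptions there is exactly one boundary test, lines beginning with "From " pass through untouched, and the round trip is lossless.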


