Re: Sieve, problem backing up mail
From: Sergey Poznyakoff
Subject: Re: Sieve, problem backing up mail
Date: Sat, 07 Jun 2025 20:37:51 +0200
User-agent: MH (GNU Mailutils 3.17.90)
Hi Steve,
Thanks for the additional info.
It turns out the issue is due to the different ways of encoding message
UIDs implemented by mailutils and mbsync. Mbsync uses message UIDs
to track which messages have already been downloaded and which are
new. The problem is described in the mbsync manpage, section "Maildir
Stores":
> As mbsync needs UIDs, but no standardized UID storage scheme exists for
> Maildir, mbsync supports two schemes, each with its pros and cons.
> The native scheme is stolen from the latest Maildir patches to c-client
> and is therefore compatible with pine. The UID validity is stored in
> a file named .uidvalidity; the UIDs are encoded in the file names of the
> messages.
(sorry for the lengthy quote). Now, the way mbsync encodes UIDs is by
storing a ",U=<UID>" attribute in the file name. This is very similar
to the technique used by mailutils, except that mailutils uses a
lowercase "u": ",u=<UID>". As a consequence, message UIDs as perceived
by mbsync and mailutils differ. The story is further complicated by the
fact that mailutils implements the so-called "modern delivery
identifiers", while mbsync uses "old-fashioned" ones (both terms are
from the original description of the maildir format by D. J. Bernstein [1]).
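
To make the difference concrete, here is a minimal Python sketch that extracts the UID attribute from a maildir file name and reports which spelling it uses. Only the ",U=<UID>" and ",u=<UID>" attributes come from the description above; the function name, the flavor labels and the sample file names are hypothetical, and this is not how mailutils or mbsync actually implement the detection.

```python
import re

# Assumption: the UID attribute appears as ",U=<digits>" (mbsync) or
# ",u=<digits>" (mailutils) somewhere in the message file name.
_UID_ATTR = re.compile(r",([Uu])=(\d+)")


def uid_from_filename(name):
    """Return (flavor, uid) for a maildir file name, or None if no UID is found.

    The flavor labels "mbsync" and "mailutils" are illustrative only.
    """
    match = _UID_ATTR.search(name)
    if match is None:
        return None
    flavor = "mbsync" if match.group(1) == "U" else "mailutils"
    return flavor, int(match.group(2))
```

Calling uid_from_filename() on a name carrying ",U=42" and on one carrying ",u=42" yields the same number under two different flavors, which is why a tool that looks for only one spelling does not see the UIDs assigned by the other.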
To cope with this, I have pushed a change that enables mailutils to
detect mbsync-style mailboxes and to modify them in a way understandable
to mbsync. Now mailutils automatically determines the "compatibility
flavor" of the mailbox it is opening. Once determined, it continues
operating on the mailbox in a way compatible with the software that
created it. The change is available at [2]. Note that the main
GNU git server is now under constant load from AI bots, so its response
times leave much to be desired. You'd be better off using its mirror
instead [3].
That fixes the main problem reported in your original letter. There
remains, however, a problem with the general approach. Let me
illustrate it.
Running your recipe with the modified mailutils (modulo file path and
account changes) I get:
# First sync:
$ mbsync --config mbsyncrc myaccount-inbox
Processed 1 box(es) in 1 channel(s),
pulled 100 new message(s) and 0 flag update(s),
expunged 0 message(s) from near side,
pushed 0 new message(s) and 0 flag update(s),
expunged 0 message(s) from far side.
# Sieving mailbox:
$ sieve -v --mbox-url=./mail/INBOX ./filter.siv
sieve: ./filter.siv:7.3-37: FILEINTO on msg uid 1: delivering into
maildir:///.../account/backup-allmail
sieve: ./filter.siv:8.3-6: KEEP on msg uid 1
...
sieve: ./filter.siv:7.3-37: FILEINTO on msg uid 100: delivering into
maildir://.../account/backup-allmail
sieve: ./filter.siv:8.3-6: KEEP on msg uid 100
So far so good. Now, after sending a new message to my remote account:
# Second sync:
$ mbsync --config mbsyncrc myaccount-inbox
Processed 1 box(es) in 1 channel(s),
pulled 1 new message(s) and 1 flag update(s),
expunged 1 message(s) from near side,
pushed 0 new message(s) and 0 flag update(s),
expunged 0 message(s) from far side.
mbsync downloads only the new message, which is what we aimed for.
Then, calling sieve a second time, I get:
$ sieve -v --mbox-url=./mail/INBOX ./filter.siv
sieve: ./filter.siv:7.3-37: FILEINTO on msg uid 1: delivering into
maildir:///.../account/backup-allmail
sieve: ./filter.siv:8.3-6: KEEP on msg uid 1
...
sieve: ./filter.siv:7.3-37: FILEINTO on msg uid 101: delivering into
maildir://.../account/backup-allmail
sieve: ./filter.siv:8.3-6: KEEP on msg uid 101
And here's the rub: since all processed messages are kept in the
original mailbox, sieve happily processes them again and appends all of
them to backup-allmail a second time. In contrast to mbsync, sieve is not
a mail synchronization tool, and it does not use UIDs to track which
mails have been processed and which haven't. The solution would be
either to change the backup approach, or to implement an extension to sieve
that would make it UID-aware. The latter seems promising. I'll
investigate it when I have enough free time. Feel free to ping me if
it takes too long.
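
As a rough illustration of what UID-aware processing could look like, the Python sketch below remembers the highest UID handled so far and selects only messages above it on the next run. It is a generic sketch under stated assumptions, not a design for the sieve extension: the function name, the JSON state file and its "last_uid" key are all hypothetical, and a real implementation would also have to account for UID validity changes.

```python
import json
from pathlib import Path


def select_unprocessed(uids, state_file):
    """Return the UIDs not yet processed and record the new high-water mark.

    Assumption: UIDs only grow, so everything at or below the stored
    "last_uid" value (hypothetical state format) was handled earlier.
    """
    path = Path(state_file)
    last = json.loads(path.read_text())["last_uid"] if path.exists() else 0
    new = sorted(u for u in uids if u > last)
    if new:
        # Persist the highest UID seen so that the next run skips it.
        path.write_text(json.dumps({"last_uid": new[-1]}))
    return new
```

Applied to the runs above, the first call would select UIDs 1 through 100 and the second only UID 101, so the backup mailbox would receive each message once.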
Regards,
Sergey
[1] http://cr.yp.to/proto/maildir.html
[2] https://cgit.git.savannah.gnu.org/cgit/mailutils.git/commit/?id=be30644d0d1e1a2098413e6991e134447bab5703
[3] https://git.gnu.org.ua/mailutils.git/commit/?id=be30644d0d1e1a2098413e6991e134447bab5703