Re: [nmh-workers] INCing of email archives
From: Ralph Corderoy
Subject: Re: [nmh-workers] INCing of email archives
Date: Thu, 25 Jul 2019 09:19:17 +0100
Hi Bakul,
> Once in a while I download email archives of some mailing list
> and unpack them using "inc -file <archive-file>". But more
> than once I have seen that inc gets confused and doesn't
> unpack the whole thing. The cause seems to be a line starting
> with From in some message body.
Then it isn't any of the four mbox formats described at
https://en.wikipedia.org/wiki/Mbox#Family ?
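In mboxrd, one of those four formats, the writer escapes the body so that no line can be mistaken for a separator: every body line matching ^>*From gains one ">" on writing, and every line matching ^>+From loses one ">" on reading. The sketch below assumes an mboxrd archive; mboxrd_quote and mboxrd_unquote are hypothetical names that operate on a message body read from stdin, not on the From_ separator lines.

```shell
#!/bin/sh
# Hypothetical helpers sketching mboxrd "From " quoting of a message body.
# Quote: add one ">" to any line starting with zero or more ">" then "From ".
mboxrd_quote()   { sed 's/^\(>*From \)/>\1/'; }
# Unquote: remove one ">" from any line starting with one or more ">" then "From ".
mboxrd_unquote() { sed 's/^>\(>*From \)/\1/'; }

printf '%s\n' 'From the outset...' '>From here' 'Fromage' | mboxrd_quote
# >From the outset...
# >>From here
# Fromage
```

Because quoting and unquoting are exact inverses, a body survives the round trip unchanged, which is why an archive that really is mboxrd should not confuse a reader that follows the same rules.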
> Ideally inc should check that a "From ..." line is immediately followed
> by header lines. And if this is not the case, assume it is in the
> message body.
I agree that would be one heuristic to help, but it would also have
problems:
From the outset, was clear we failed 42
times: the first on attempting to read faulty input...
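To see the problem concretely, here is a rough sketch of the suggested heuristic, assuming "looks like a header" means "starts with a field name followed by a colon". count_messages is a hypothetical helper, not part of inc; it counts a "From " line as a separator only when the next line looks like a header.

```shell
#!/bin/sh
# Hypothetical helper: a line starting with "From " counts as a message
# separator only if the very next line looks like a header field ("Name: ...").
count_messages() {
    awk '
        # pending is 1 when the previous line started with "From ".
        pending && /^[A-Za-z][A-Za-z0-9-]*:/ { n++ }
        { pending = ($0 ~ /^From /) }
        END { print n + 0 }
    ' "${1?usage: count_messages file}"
}

# Demo: two genuine messages, the first containing the body text above.
demo=$(mktemp) || exit 1
printf '%s\n' \
    'From alice@example.org Thu Jul 25 09:00:00 2019' \
    'Subject: one' \
    '' \
    'From the outset, was clear we failed 42' \
    'times: the first on attempting to read faulty input...' \
    '' \
    'From bob@example.org Thu Jul 25 10:00:00 2019' \
    'Subject: two' \
    '' \
    'body' > "$demo"
count_messages "$demo"   # prints 3, not 2
rm -f "$demo"
```

The body line "times: the first ..." looks like a header field, so the body's "From the outset" line is still taken as a separator.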
> fix() {
> grep -n '^From .*[^0-9]$' $1 | sed 's/:.*/s|^|>|/' > ,$1
> if [ -s ,$1 ]; then echo wq >> ,$1; cat ,$1 | ed $1; fi
> rm ,$1
> }
>
> This prepends a > to any line beginning with "From " and not
> ending with a digit.
sed -i '/^From .*[^0-9]$/s/^/> /' "${1?}"
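`sed -i` is not in POSIX (GNU sed accepts a bare -i; BSD and macOS sed want `-i ''`). A portable sketch of the same substitution writes to a temporary file and copies the result back, which keeps the original file's permissions. The function name fix is borrowed from the quoted script. The digit test leaves alone lines ending in a digit, since a genuine From_ separator line ends with the year; the same test also spares a body line such as "From the outset, was clear we failed 42".

```shell
#!/bin/sh
# Portable sketch of the sed one-liner above: prefix "> " to lines that
# start with "From " and do not end with a digit, editing the file in place.
fix() {
    fix_f=${1?usage: fix file}
    fix_tmp=$(mktemp "$fix_f.XXXXXX") || return 1
    if sed '/^From .*[^0-9]$/s/^/> /' "$fix_f" > "$fix_tmp"; then
        # Copy back through a redirect so the original file keeps its mode.
        cat "$fix_tmp" > "$fix_f" && rm -f "$fix_tmp"
    else
        rm -f "$fix_tmp"
        return 1
    fi
}

demo=$(mktemp) || exit 1
printf '%s\n' \
    'From alice@example.org Thu Jul 25 09:00:00 2019' \
    'From here on, all is well' \
    'From the outset, was clear we failed 42' > "$demo"
fix "$demo" && cat "$demo"
# From alice@example.org Thu Jul 25 09:00:00 2019
# > From here on, all is well
# From the outset, was clear we failed 42
rm -f "$demo"
```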
--
Cheers, Ralph.