nmh-workers

Re: [Nmh-workers] why does mhfixmsg dislike long text lines?


From: Steven Winikoff
Subject: Re: [Nmh-workers] why does mhfixmsg dislike long text lines?
Date: Mon, 22 Jan 2018 20:09:58 -0500

>Are you saying you received via SMTP a RFC5322 message where there
>were 42027 characters between CR-LF pairs?

I think I might have said that :-/, but whether I did or not you're right
that it isn't what I meant.


>That suggests to me that you in fact received a message that had lines no
>greater than 78 characters between CR-LF pairs, and _after you decoded it_
>it might have had a very long line.

Exactly.

But that's also the situation with the message I received today which
sparked my original question.  That one had only one part, described
with:

   Content-Transfer-Encoding: base64
   Content-Disposition: inline
   Content-Type: text/html;
           charset="UTF-8"
   MIME-Version: 1.0

Before decoding, the body width was 76 characters (some of the headers were
wider, but even those were all under 200 characters wide) -- but when I tried
to decode it, this happened:

    mhfixmsg: /tmp/msg, will not decode text/html;  charset="UTF-8" because it is binary (line length > 998)

...but (line length > 998) refers to the decoded text, which really is more
than 998 characters wide.  This is what I was originally asking about (or
trying to :-/, and I apologize for not being clear on that point).


>THAT is completely legal according to the RFCs.  For the most part, it
>doesn't matter what it decodes to; what nmh cares about is that the
>message it is reading is valid according to RFC 5322.  THAT is where the
>998 byte line length limit comes into play.  You could send the entirety
>of "War and Peace" in a text/plain part all as one line, and as long as it
>was encoded properly that would be fine.
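To illustrate the encoded-versus-decoded distinction, here is a minimal Python sketch (not nmh code; the HTML body is made up). Python's base64.encodebytes() wraps its output at 76 characters per line, which is what MIME (RFC 2045) specifies for base64, so the transmitted body stays far below RFC 5322's 998-octet limit even though the decoded HTML is a single enormous line.

```python
import base64

RFC5322_MAX_LINE = 998  # octets per line, excluding CRLF (RFC 5322, section 2.1.1)

def longest_line(data: bytes) -> int:
    """Length of the longest line in data, not counting CR/LF terminators."""
    return max(len(line.rstrip(b"\r")) for line in data.split(b"\n"))

# Hypothetical HTML body: one very long line, as many HTML mailers produce.
html = b"<p>" + b"x" * 42000 + b"</p>"

# encodebytes() inserts a newline after every 76 output characters.
encoded = base64.encodebytes(html)
decoded = base64.decodebytes(encoded)

print(longest_line(encoded))  # 76: a perfectly legal RFC 5322 body
print(longest_line(decoded))  # 42007: only the *decoded* text is "long"
```

Only the second number is what a decoder such as mhfixmsg would see.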

This suggests to me that removing the 998-character limit in mhfixmsg
(only, and nowhere else) is a reasonable thing to do.

The comment in mhfixmsg which I quoted at the beginning of this thread
seems to be saying that sometimes message components described as text/*
are really binary files, and that the 998-character limit is used in
mhfixmsg (only) as a heuristic to identify this situation.
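A minimal sketch of that kind of heuristic, assuming it does nothing more than compare the longest decoded line against 998 octets. The function name looks_binary and the max_line parameter are hypothetical; the actual check is C code in mhfixmsg and may take other things into account.

```python
from typing import Optional

RFC5322_MAX_LINE = 998  # octets per line, excluding CRLF (RFC 5322, section 2.1.1)

def looks_binary(decoded: bytes, max_line: Optional[int] = RFC5322_MAX_LINE) -> bool:
    """Hypothetical stand-in for the mhfixmsg check: report a decoded text/*
    part as "binary" if any line is longer than max_line octets.

    max_line=None turns the line-length test off, which is roughly what
    dropping the 998-character limit in mhfixmsg alone would amount to."""
    if max_line is None:
        return False
    for line in decoded.split(b"\n"):
        # Don't count a CR that belongs to a CRLF line ending.
        if len(line.rstrip(b"\r")) > max_line:
            return True
    return False

html = b"<p>" + b"x" * 2000 + b"</p>"        # one 2007-octet line
print(looks_binary(html))                   # True: mhfixmsg-style refusal
print(looks_binary(html, max_line=None))    # False: check disabled
print(looks_binary(b"short\r\nlines\r\n"))  # False
```

This only models the decision to decode; it says nothing about whether the rest of nmh (for example the parser in sbr/m_getfld.c) copes with the long lines afterwards.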


>>But you're quite right that this code isn't easy to understand.  If I were
>>to modify uip/mhfixmsg.c without touching sbr/m_getfld.c, am I risking
>>anything other than generating messages that nmh won't be able to read?
>
>Good question!  Your use cases seem to be ... well, I don't understand
>them.

That's because I keep being unclear, which in turn is because I don't
know enough to be clearer -- though I'm learning a lot just from this
discussion. :-)

My use case is simply that people keep sending me messages which decode
to HTML with horribly long lines, and I'd prefer to save the decoded text
rather than the encoded version[*].

(Digression:  I'd also prefer to reformat the long lines at the same time.
I'm seriously considering piping the decoded HTML through something like
tidy [ http://www.html-tidy.org/ ] before saving it. :-/)

As it happens, I have 

   mhbuild:  -maxunencoded 900

in my .mh_profile, and have had it for a while.

This is a coincidence, in that I was unaware of the 998-character limit until
today, but happily I'm under it anyway. :-)

...so if I were to quote text with lines wider than that, the right thing
would happen -- although in practice if I were to quote text with lines that
long, I'd almost certainly run them through fmt first.


>And might I suggest that if you're going to keep asking us questions
>about nmh, you should join the mailing list? :-)

I'd be happy to, as long as it wouldn't be considered as a commitment to
work on the code -- not that I'm opposed to that in principle, but I think
I've already demonstrated I'm not competent to step in and do anything
useful. :-(

The only reason I've been writing to nmh-workers is that I'm unaware
of anywhere else to turn.  Is there a corresponding nmh-users list or
something similar?

     - Steven


[*]
    That's because one of the biggest reasons for using nmh, at least for
    me, is that it's so useful to be able to manipulate saved email with
    standard command-line tools.

    For example, I particularly depend on being able to find specific saved
    messages using grep or mairix[**] -- and if the message body is saved in
    base64 encoding, both of those programs fail completely.


[**]
    http://www.rpcurnow.force9.co.uk/mairix/

-- 
___________________________________________________________________________
Steven Winikoff                |"Garfield is, for my money at least, the
Concordia University           | shining exemplar of that productive
Montreal, QC, Canada           | laziness that gave us flush plumbing,
address@hidden   | clothes washers, dish washers, electric
                               | lights, and automated guitar string
                               | factories."                 - Mike Andrews


