nmh-workers

(Not-so) hypothetical question: What to do about NULs?


From: Ken Hornstein
Subject: (Not-so) hypothetical question: What to do about NULs?
Date: Sat, 18 Feb 2023 19:19:14 -0500

I've been idly thinking about this for a while, and while the question
might be simple I think it gets at some larger meta-issues that we have
never really agreed on how to resolve properly.

My question is, simply: What should happen when nmh encounters a NUL
character (U+0000) in email?

The rules
---------

In theory, a NUL is never permitted in an email message.  RFC 5322 (the
latest incarnation of RFC 822) says in §4:

   Finally, certain characters that were formerly allowed in messages
   appear in this section.  The NUL character (ASCII value 0) was once
   allowed, but is no longer for compatibility reasons.

However, in §4.1 a NUL character is added to the BNF for obs-utext and
obs-body, so in THEORY you are supposed to handle that if you handle
obsolete messages.  §4 also says:

      Note: This section identifies syntactic forms that any
      implementation MUST reasonably interpret.  However, there are
      certainly Internet messages that do not conform to even the
      additional syntax given in this section.  The fact that a
      particular form does not appear in any section of this document is
      not justification for computer programs to crash or for malformed
      data to be irretrievably lost by any implementation.  It is up to
      the implementation to deal with messages robustly.

RFC 5322 punts some of the message syntax back to the MIME RFCs.
The "binary" content transfer encoding does allow any octet including
NUL characters.  But RFC 2045 says in §6.2:

   Mail transport for unencoded 8bit data is defined in RFC 1652.  As of
   the initial publication of this document, there are no standardized
   Internet mail transports for which it is legitimate to include
   unencoded binary data in mail bodies.  Thus there are no
   circumstances in which the "binary" Content-Transfer-Encoding is
   actually valid in Internet mail.  However, in the event that binary
   mail transport becomes a reality in Internet mail, or when MIME is
   used in conjunction with any other binary-capable mail transport
   mechanism, binary bodies must be labelled as such using this
   mechanism.

RFC 9051 (IMAP4rev2) says in §4.3.1:

   IMAP4rev2 is compatible with [I18N-HDRS]. As a result, the identified
   charset for header-field values with 8-bit content is UTF-8
   [UTF-8]. IMAP4rev2 implementations MUST accept and MAY transmit
   [UTF-8] text in quoted-strings as long as the string does not contain
   NUL, CR, or LF. This differs from IMAP4rev1 implementations.

   Although a BINARY content transfer encoding is defined, unencoded
   binary strings are not permitted, unless returned in a <literal8>
   in response to a BINARY.PEEK[<section-binary>]<<partial>> or
   BINARY[<section-binary>]<<partial>> FETCH data item. A "binary string"
   is any string with NUL characters. A string with an excessive amount
   of CTL characters MAY also be considered to be binary. Unless returned
   in response to BINARY.PEEK[...]/BINARY[...] FETCH, client and server
   implementations MUST encode binary data into a textual form, such as
   base64, before transmitting the data.

So it's ... a bit wishy-washy, but I think the case for NUL not being
valid is mostly okay.  IMAP, at least, says you can't send a NUL unless
you are getting a BINARY response with the special literal8 response
format (and BINARY is not defined in RFC 3501).

Messages in the real world
--------------------------

While other rules seem to be violated with impunity (see: 16MB single
lines) I am not aware of bare NULs commonly being sent in email messages
today.  Also, I am not aware of "binary" being used as a C-T-E at all.
Now, I could be COMPLETELY wrong about this!  It would be interesting to
hear about use of the binary CTE or other occurrences of NUL characters
in the wild.

My impression is that if you are getting binary data, it is universally
encoded with base64; that is something everyone seems to be doing.  And
a NUL character doesn't seem to be valid in non-ASCII character sets
as anything other than a NUL.

How other mail programs deal with NULs
--------------------------------------

I was curious, so I took a look.  I tried to look at "modern" mail programs,
and by that I mean, "Seems to be kept up to date".  Which sadly excludes
Heirloom mailx as it seems to have had its last release in 2005.  I am open
to hearing about what other mail programs do.

- fetchmail

Fetchmail unceremoniously just smashes any NUL characters it sees, so if
you are retrieving messages using fetchmail you never see any NUL
characters.  From transact.c:

                /*
                 * Smash out any NULs, they could wreak havoc later on.
                 * Some network stacks seem to generate these at random,
                 * especially (according to reports) at the beginning of the
                 * first read.  NULs are illegal in RFC822 format.
                 */

You might get a special header warning you that a message had an
embedded NUL, though.

- alpine

Internally alpine (which uses a lot of c-client) uses SIZEDTEXT to represent
"text", which is defined as:

SIZEDTEXT {
  unsigned char *data;          /* text */
  unsigned long size;           /* size of text in octets */
};

In theory that could handle a NUL just fine.  However, the POP code looks
like this (pop3.c):

    if (pop3_send_num (stream,"RETR",elt->msgno) &&
        (LOCAL->txt = netmsg_slurp (LOCAL->netstream,&elt->rfc822_size,
                                    &LOCAL->hdrsize)))

netmsg_slurp looks like this (netmsg.c):

  while ((s = net_getline (stream)) != NULL) {
    if (*s == '.') {            /* possible end of text? */
      if (s[1]) t = s + 1;      /* pointer to true start of line */
      else {
        fs_give ((void **) &s); /* free the line */
        break;                  /* end of data */
      }
    }
    else t = s;                 /* want the entire line */
    if (f) {                    /* copy it to the file */
      i = strlen (t);           /* size of line */
      if ((fwrite (t,(size_t) 1,(size_t) i,f) == i) &&
          (fwrite ("\015\012",(size_t) 1,(size_t) 2,f) == 2)) {

net_getline() returns a char *, so that suggests to me if there is a NUL
encountered in the message everything AFTER it is going to be missed.
What happens in the IMAP code is a little harder to follow.  There is
a fair amount of the use of char *, but also plenty of places where
SIZEDTEXT is used.  So ... might be okay?  But again, you're never
supposed to get a NUL in IMAP unless you specify binary (who knows if
that's the reality, though).  Internally it seems like char * is the
returned value from pine_mail_fetch_text() which suggests to me that
if there was a NUL in an on-disk message things would get truncated.
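
To make the truncation concern concrete, here is a minimal sketch (not
code from alpine or c-client) of how much of a sized buffer survives once
it is handed to anything that treats it as a C string.  The helper name
cstring_visible_length() is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a buffer of `size` octets, return how many
 * of them a strlen()-style consumer would see, i.e. the offset of the
 * first NUL, or `size` if the buffer contains no NUL at all.
 */
size_t cstring_visible_length(const unsigned char *data, size_t size)
{
    assert(data != NULL || size == 0);

    if (size == 0)
        return 0;

    /* memchr() honours the explicit length, unlike strlen(). */
    const unsigned char *nul = memchr(data, '\0', size);

    return nul ? (size_t)(nul - data) : size;
}
```

For a five-octet line "ab\0cd" this returns 2, so a loop like the one in
netmsg_slurp above would presumably write "ab" and drop "cd".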

- mutt

It looks like if mutt sees a NUL in an IMAP literal response, it will gladly
write it to the cache file (a literal is processed character-by-character).
For POP, mutt calls a function called pop_fetch_data() and that assumes
that the returned data will never contain a NUL.  Internally mutt does
have an idea if the content contains a NUL (the CONTENT structure contains
a 'nulbin' member which contains the number of NUL bytes), but it's not
clear to me what happens when a NUL is encountered.
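
As an illustration only (this is not mutt's code, and I have not checked
how mutt actually fills in 'nulbin'), counting NUL bytes in a buffer
whose length is tracked separately could look something like the sketch
below.  The name count_nuls() is an assumption made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the NUL octets in a buffer of `size`
 * octets.  The length is passed explicitly, so the scan does not stop
 * at the first NUL the way a C-string function would.
 */
size_t count_nuls(const unsigned char *data, size_t size)
{
    size_t n = 0;

    assert(data != NULL || size == 0);

    for (size_t i = 0; i < size; i++) {
        if (data[i] == '\0')
            n++;
    }

    return n;
}
```

A nonzero count could then be used to decide whether to warn, strip, or
treat the content as binary.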


My takeaway is that common programs assume you're never going to get a
NUL when doing POP; with IMAP, it's a "maybe".


What should nmh do?
-------------------

Given the above, what I see is that programs seem to range from "silently
snip out the NUL character" to "probably truncate the line the NUL is
on" to "mostly handle it correctly".  I will note that I didn't do a
COMPLETE code analysis on everything so I might have gotten some of
this behavior wrong; corrections welcome!  Given that these programs
have a lot wider coverage than nmh, it suggests to me that explicitly
handling NULs isn't really necessary.  So what should we do when a NUL
is encountered?  I see some possible options:

- Abort with an error, optionally running "rm -rf /" as root.  Let's call
  this the "Ralph Cordery" option :-).  Okay, I am kidding; I only mention
  that because Ralph has typically been what I would call a
  "strict constructionist" and usually advocates aborting when
  encountering something unexpected or outside of the specification.

  I am personally not a fan of this option in this case, as no other
  program seems to do this.  When people encounter it they will say, "Hey,
  why does nmh crap out on this message and nobody else does?", email
  nmh-workers about it, and then a long message thread gets started
  (see: 16MB single line).  Also, if this were to happen in particular
  circumstances (e.g., rcvstore) you could end up with lost mail.

- Simply edit this offending byte out of our lives; let's call this the
  "Josef Stalin" option.  This is CLOSE to what many programs do, and
  would be the simplest option as a lot of nmh code assumes you can
  put message data in a C string.  I think this would look like
  eating a NUL when receiving message data over the network or off of
  disk.  Right now we kind of truncate the data at whatever C-string
  the data is currently in, which is kinda wrong but probably what other
  utilities do in practice.  Probably something close to this is what I
  would prefer.

- Completely handle embedded NULs properly.  This is arguably the most
  correct option but would involve a lot of code changes.
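
For the second option, a rough sketch of "eating" NULs as data comes in
off the network or off of disk might look like the following.  This is
not existing nmh code; strip_nuls() and its signature are assumptions
made up for illustration, and it presumes the read path knows the true
byte count of what it just read.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: remove every NUL octet from buf[0..len) in
 * place, shifting the remaining octets down.  Returns the new length.
 * If `removed` is non-NULL it receives the number of NULs dropped, so
 * the caller can log a warning or add a header.
 */
size_t strip_nuls(unsigned char *buf, size_t len, size_t *removed)
{
    size_t out = 0;

    assert(buf != NULL || len == 0);

    for (size_t in = 0; in < len; in++) {
        if (buf[in] != '\0')
            buf[out++] = buf[in];
    }

    if (removed != NULL)
        *removed = len - out;

    return out;
}
```

The idea would be to call something like this right after each read and
before the data is NUL-terminated, so the rest of the code can keep
assuming message data fits in a C string.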

I am curious as to what others think is the best option, or if they have
an option I haven't listed here.

--Ken


