nmh-workers

Re: (Not-so) hypothetical question: What to do about NULs?


From: Steffen Nurpmeso
Subject: Re: (Not-so) hypothetical question: What to do about NULs?
Date: Sun, 19 Feb 2023 01:48:10 +0100
User-agent: s-nail v14.9.24-411-g8db62d75cb

Ken Hornstein wrote in
 <20230219001921.597AD1E0839@pb-smtp20.pobox.com>:
 ...
 |- mutt
 ...
 |[.]Internally mutt does
 |have an idea if the content contains a NUL (the CONTENT structure contains
 |a 'nulbin' member which contains the number of NUL bytes), but it's not
 |clear to me what happens when a NUL is encountered.

Seems to me this is classification of attachment data, which will
end up as octet-stream in that case.

For S-nail we more or less do what Heirloom mailx has done.
For classification purposes we switch to octet-stream.
For display purposes we happily display it after passing it
through some kind of makeprint.

      isuni = ((n_psonce & n_PSO_UNICODE) != 0);
      ...
         if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
               wc != '\t'){
            if ((wc & ~S(wchar_t,037)) == 0)
               wc = isuni ? 0x2400 | wc : '?';
            else if(wc == 0177)
               wc = isuni ? 0x2421 : '?';
            else
               wc = isuni ? 0x2426 : '?';
         }else if(isuni){ /* TODO ctext */
            /* Need to filter out L-TO-R and R-TO-L marks TODO ctext */
            if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
               continue;
            /* And some zero-width messes */
            if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
               continue;
            /* Oh about the ISO C wide character interfaces, baby! */
            if(wc == 0xFEFF)
               continue;
         }
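
To make the replacement rule above concrete, here is a minimal
standalone sketch of it.  The function name replace_nonprint() is
hypothetical and not part of S-nail, and the S() cast macro is
written out as a plain cast.  It assumes the caller has already
decided that wc is not printable.

```c
#include <wchar.h>

/* Hypothetical helper, not S-nail code: pick a substitute for a wide
 * character that was already found to be non-printable.  isuni says
 * whether the terminal can display Unicode. */
static wchar_t replace_nonprint(wchar_t wc, int isuni){
   if((wc & ~(wchar_t)037) == 0)          /* C0 controls 0x00..0x1F */
      return isuni ? (wchar_t)(0x2400 | wc) : L'?';
   if(wc == 0177)                         /* DEL */
      return isuni ? (wchar_t)0x2421 : L'?';
   return isuni ? (wchar_t)0x2426 : L'?'; /* everything else */
}
```

So a NUL becomes U+2400 (SYMBOL FOR NULL) on a Unicode terminal,
and a plain '?' otherwise.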

Or, without mb* and wc* sausage,

   {
      int c;
      while(inp < maxp){
         c = *inp++ & 0377;
         if(!su_cs_is_print(c) &&
               c != '\n' && c != '\r' && c != '\b' && c != '\t')
            c = '?';
         *outp++ = c;
      }
      out->l = in->l;
   }
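
For illustration only, the byte-oriented variant can be written
as a self-contained sketch.  has_nul(), is_ascii_print() and
makeprint_bytes() are hypothetical names, and is_ascii_print() is
an assumed stand-in for su_cs_is_print(), reduced to the plain
ASCII range.  has_nul() shows the kind of check that would switch
the classification to octet-stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: true if the buffer contains at least one NUL byte,
 * i.e. the data would be classified as octet-stream. */
static int has_nul(const char *buf, size_t len){
   return memchr(buf, '\0', len) != NULL;
}

/* Assumed stand-in for su_cs_is_print(): ASCII printable only,
 * regardless of the locale. */
static int is_ascii_print(int c){
   return c >= 0x20 && c <= 0x7E;
}

/* Copy len bytes from in to out, replacing everything that is neither
 * printable ASCII nor one of \n \r \b \t with '?'.  Returns len. */
static size_t makeprint_bytes(const char *in, size_t len, char *out){
   size_t i;

   assert(in != NULL && out != NULL);
   for(i = 0; i < len; ++i){
      int c = in[i] & 0377;
      if(!is_ascii_print(c) &&
            c != '\n' && c != '\r' && c != '\b' && c != '\t')
         c = '?';
      out[i] = (char)c;
   }
   return len;
}
```

The output has the same length as the input, so a NUL in the
middle of a line simply shows up as a '?' in the display.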

This is even a regression against Heirloom mailx that Jörg
Schilling was very dissatisfied about, as the above only handles
ASCII printable regardless of the locale.  (My plan was to write
a CText library for Unicode handling, and it was quite progressed
with only about two months until decomposition and normalization
were implemented (Christmas 2014), when something very bad
happened.  Maybe I will do it someday.  Or simply do what OpenBSD
does and use perl's fantastic Unicode support to generate some
tables.)

The implementation is total crap.  (longjmp codebase, data leaks,
blocking I/O, all that (it was).)  All of these (mailbox read,
content-transfer decoding, character set conversion, .. display
preparation) should be "filters" with input and output plugged
together, with internal buffers as necessary.  That is the v15
MIME and I/O layer rewrite that has not happened for nine years.
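
A very rough sketch of what "filters plugged together" could look
like follows.  None of this is S-nail or planned v15 code: the
filter_fn interface, the two example stages and run_chain() are
all hypothetical, and the sketch works on whole small buffers
instead of streaming, just to show the chaining idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter interface: consume inlen bytes from in, write at
 * most outcap bytes to out, return the number of bytes written. */
typedef size_t (*filter_fn)(const char *in, size_t inlen,
      char *out, size_t outcap);

/* Example stage: drop CR so that CRLF becomes LF. */
static size_t filter_strip_cr(const char *in, size_t inlen,
      char *out, size_t outcap){
   size_t i, o = 0;

   for(i = 0; i < inlen && o < outcap; ++i)
      if(in[i] != '\r')
         out[o++] = in[i];
   return o;
}

/* Example stage: replace NUL bytes with '?' for display. */
static size_t filter_nul_to_qm(const char *in, size_t inlen,
      char *out, size_t outcap){
   size_t i, o = 0;

   for(i = 0; i < inlen && o < outcap; ++i)
      out[o++] = (in[i] == '\0') ? '?' : in[i];
   return o;
}

enum { CHAIN_BUF = 256 };

/* Run all stages in order, ping-ponging between two internal buffers.
 * Input longer than CHAIN_BUF is truncated in this sketch. */
static size_t run_chain(const filter_fn *stages, size_t nstages,
      const char *in, size_t inlen, char *out, size_t outcap){
   char a[CHAIN_BUF], b[CHAIN_BUF];
   char *src = a, *dst = b, *tmp;
   size_t len, i;

   assert(stages != NULL || nstages == 0);
   len = inlen < CHAIN_BUF ? inlen : CHAIN_BUF;
   memcpy(src, in, len);
   for(i = 0; i < nstages; ++i){
      len = stages[i](src, len, dst, CHAIN_BUF);
      tmp = src; src = dst; dst = tmp;
   }
   if(len > outcap)
      len = outcap;
   memcpy(out, src, len);
   return len;
}
```

Each stage only knows about its own input and output, so decoding,
character set conversion and display preparation could be added
or reordered without touching the others.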

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


