base64-decode-region inserts carriage-returns

From: Eric Hanchrow
Subject: base64-decode-region inserts carriage-returns
Date: 08 Jun 2002 13:42:42 -0700

Using Bash, create a binary file containing eight bytes in two lines:

        bash$ echo -n $'\001\002\003\n\001\002\003\n' > /tmp/bin

Double-check that the file contains what we think it does:

        bash$ od -c /tmp/bin

        you'll see 0000000 001 002 003  \n 001 002 003  \n
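
For anyone reproducing this without Bash's $'...' quoting, here is a minimal sketch of the same step using printf. It is not part of the original recipe: the scratch directory and the variable names are assumptions, chosen so that nothing in /tmp gets overwritten.

```shell
# Assumption: a scratch directory stands in for /tmp.
dir=$(mktemp -d)

# printf interprets the octal escapes itself, so Bash-specific quoting is not needed.
printf '\001\002\003\n\001\002\003\n' > "$dir/bin"

# Count the bytes; some wc implementations pad the number with spaces, so strip them.
size=$(wc -c < "$dir/bin" | tr -d ' ')

# Hex dump without the offset column, collapsed onto one space-separated line.
dump=$(od -An -tx1 "$dir/bin" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')

echo "$size bytes: $dump"
```

The hex dump is used instead of od -c only because it is easier to compare as a string; od -c shows the same eight bytes.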

Start Emacs with -q --no-site-file.

Visit that file in Emacs:

        M-x find-file-literally RET /tmp/bin RET

Base64-encode it:

        C-x h M-x base64-encode-region RET

Put a carriage-return-linefeed pair at the end of the single line:

        M-> C-q C-m RET

Save the encoded version:

        C-x C-w bin.b64 RET

Revisit the file, thus setting the buffer to use the MS-DOS line
ending convention:

        C-x C-v RET

Base64-decode the file:

        C-x h M-x base64-decode-region RET

Save the decoded version to a different file for comparison with the
original:

        C-x C-w bin.again RET

Now examine the newly-saved version with od back at the shell:

        bash$ od -c /tmp/bin.again

        you'll now see 0000000 001 002 003  \r  \n 001 002 003  \r  \n

Thus the binary file has had some carriage-returns inserted into it,
which is a Bad Thing, since those carriage-returns were not present in
the encoded data.
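
One way to confirm that the inserted carriage-returns are the only difference is to strip them and compare against the original. The sketch below does not run Emacs; it simulates the two files from the od output shown above, in a scratch directory, so the file contents and variable names here are assumptions for illustration.

```shell
# Assumption: simulate both files in a scratch directory instead of using Emacs.
dir=$(mktemp -d)
printf '\001\002\003\n\001\002\003\n'     > "$dir/bin"        # the original eight bytes
printf '\001\002\003\r\n\001\002\003\r\n' > "$dir/bin.again"  # the ten bytes reported above

# Direct comparison: cmp -s is silent and reports only through its exit status.
if cmp -s "$dir/bin" "$dir/bin.again"; then same=yes; else same=no; fi

# Comparison after deleting every carriage-return from the decoded copy.
if tr -d '\r' < "$dir/bin.again" | cmp -s - "$dir/bin"; then only_cr=yes; else only_cr=no; fi

echo "identical: $same, identical after stripping CR: $only_cr"
```

This check works here only because the original data contains no carriage-returns of its own. For arbitrary binary data, deleting every \r would also destroy legitimate bytes, which is why the inserted carriage-returns cannot simply be filtered out afterwards.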

RFC 2045 says both

        All line breaks or other characters not found in Table 1 must
        be ignored by decoding software.

and

        Any characters outside of the base64 alphabet are to be
        ignored in base64-encoded data.
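
As a rough illustration of what those two sentences require, the sketch below decodes the encoded form of the test data with a trailing carriage-return-linefeed pair attached, after discarding everything outside the base64 alphabet. The string AQIDCgECAwo= was worked out by hand for the eight test bytes and is not quoted from Emacs output. The sketch also assumes a base64 utility that accepts -d, as the GNU coreutils one does.

```shell
# Encoded form of the eight test bytes, followed by the CR LF pair added in the recipe.
# tr -cd deletes every character that is NOT in the base64 alphabet (or the '=' padding),
# which is the "ignore" behaviour the RFC asks of a decoder.
decoded=$(printf 'AQIDCgECAwo=\r\n' \
    | tr -cd 'A-Za-z0-9+/=' \
    | base64 -d \
    | od -An -tx1 \
    | tr -s ' \n' ' ' \
    | sed 's/^ //; s/ $//')

echo "$decoded"
```

A decoder that follows the RFC should therefore give back the original eight bytes, with no 0d among them.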

If this is indeed a bug (as opposed to my misunderstanding how
base64-decode-region is supposed to work) then a possible fix would be
to have base64-decode-region, after it's done its work, do
(set-buffer-file-coding-system 'raw-text-unix) or something similar.

