Re: [PATCH] md5sum, sha*sum: only escape file names containing newlines

From: Pádraig Brady
Subject: Re: [PATCH] md5sum, sha*sum: only escape file names containing newlines
Date: Fri, 01 Nov 2013 18:53:04 +0000
On 11/01/2013 06:20 PM, Eric Blake wrote:
> On 11/01/2013 11:03 AM, Pádraig Brady wrote:
>>> Escape the output (marking with a leading '\' and backslash-escaping
>>> both '\' and '\n') only when the file name contains a newline.
>>> Before, we would do that for a file name containing either newline or 
>>> backslash.
>>> This probably deserves a NEWS entry, since it is user-visible.
>> I debated that as I thought it could have no impact on anything,
>> but it could actually if one was comparing old and new outputs?
>> newsum=$(md5sum my file set | md5sum)
>> [ "$newsum" = "$(cat ./oldsum)" ] || error
> Not just that, but the new format is not necessarily parseable by older
> md*sum.  Your patch didn't show (but probably should be enhanced) what
> happens for a file named 'a\nb'; pre-patch, it gave '\sum  a\\nb',
> post-patch it gives 'sum  a\nb' - but if the older utility assumes that
> the missing leading \ was a mistake and unescapes the file name, it
> results in looking for a file with the three-byte name "a<newline>b", which
> is also part of the user-visible change.

Right but that's a big if.
So you're referring to non-GNU utils parsing these checksum files,
and not honoring the leading \ escape marker.
That's quite unlikely I would think.
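
For reference, a minimal sketch of the marker in question, assuming a GNU
md5sum with the existing (unpatched) escaping. The scratch directory and the
file name are made up for illustration; the name is the four bytes a, \, n, b
and contains no newline.

```shell
# Hypothetical scratch directory and file name, for illustration only.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
name='a\nb'                      # a, backslash, n, b (no newline involved)
printf 'x\n' > "$name"

# Existing GNU behavior: the line starts with '\' and the name is
# printed with its backslash doubled, i.e. as a\\nb.
line=$(md5sum "$name")
printf '%s\n' "$line"
first=$(printf '%s' "$line" | cut -c1)

# A checker that honors the marker unescapes the name and finds the file.
if printf '%s\n' "$line" | md5sum --status -c -; then
  check=ok
else
  check=failed
fi
printf '%s\n' "$check"
```

A parser that ignores the leading \ would instead go looking for a file
literally named a\\nb, which is the unlikely case discussed above.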

> Breaking output so that older versions can't parse newer output has been
> one of the reasons that I have only threatened to patch \r handling,
> rather than actually doing it, because it's tricky to think about
> old/new interactions and what might break.  Depending on how
> conservative we are trying to be, we may need to add a command line
> option that will let the user forcefully revert to the older-style
> output for intentional interaction with older checksum tools regardless
> of filename.  For 99% of the cases, the output is identical, since files
> with \n or \\ in the name are already rare.  Thinking aloud, it may be
> appropriate to have such a mode option be tri-state (old, new, or warn;
> with default being warn), where the warning mode gives the new output
> but ALSO flags to the user that their output may not be parseable by
> older summing utilities.

Well any change here isn't worth a flag I think.
Even for \r one can always `tr -d '\r'` the DOS files before processing.
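
A rough sketch of that workaround, assuming GNU md5sum and tr; the file names
are invented, and the awk step is only there to fake a checksum file with DOS
line endings.

```shell
# Hypothetical scratch directory and file names, for illustration only.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'hello\n' > data.txt
md5sum data.txt > sums.txt

# Simulate a checksum file saved with CRLF line endings.
awk '{ printf "%s\r\n", $0 }' sums.txt > sums.dos

# Strip the carriage returns on the fly and verify from stdin.
result=$(tr -d '\r' < sums.dos | LC_ALL=C md5sum -c -)
printf '%s\n' "$result"
```

Note this also strips any \r that is part of a listed file name, so it only
suits the common case where the carriage returns are just line endings.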

The only reason I was avoiding the redundant '\' escaping
was to avoid having to do the unescaping like in cleanup_sum()
here for example
But I suppose even that's not general.

OK I think it's not worth changing the output format now,
given the possibility of non-GNU tools parsing incorrectly,
and the edge case where the output is directly compared
to older output.

I'll just do a maint commit to optimize/document a bit.


