Re: md5sum prints binary junk when given bad input

bug-coreutils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: md5sum prints binary junk when given bad input

From:	James Strother
Subject:	Re: md5sum prints binary junk when given bad input
Date:	Sun, 21 Dec 2008 02:48:58 -0800

Thanks for the feedback.  I would be happy to make whatever changes are
necessary, but I think to put first things first we should resolve whether
such
a change would actually be beneficial.

When designing interfaces, I usually try to decide which input to accept or
reject based on whether it shows "clear intent."  Since it is somewhat
difficult
to discern the intent of an ill-formed line, I would probably design a
program
like md5sum to error on the first unrecognisable line.  This would prevent
users
from accidentally running the program on files that did not contain
checksums,
and it would also prevent md5sum from skipping over lines that were intended
to
be checksums but were ill-formed in such a way that they were simply
skipped.
Of course, any lines with unusual formatting but clear intent should be
accepted,
such as lines in any of the various dialects or with varying amounts of
whitespace
between tokens.

All that said, I don't want to break anyone's scripts.  If feeding emails
into md5sum
is a common use case, then the current behaviour should certainly be
maintained.
Unfortunately, I'm just not in any position to determine the frequency with
which this
is currently being used.  It's not something I would have ever thought to
try, and the
silent skipping of ill-formed lines does not seem to be prominently
documented (it
describes the behaviour with respect to valid lines, but does not
specifically state that
ill-formed lines are silently skipped).  So my guess is that very few if any
people rely
on this behaviour, but I would defer to someone who interacted more with
other
md5sum users.

At any rate, if it is deemed useful I could prepare a proper patch that made
md5sum
a bit more strict.


Thanks for the proposed patch; but before we can even consider applying
> it, it would be better to reformulate it against the latest sources, per
> the suggestions in HACKING:
>
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=HACKING;h=6eb0480;hb=4000c35
>
> Since you are proposing to affect user-visible behavior, you would also
> need to patch NEWS and the documentation, plus a testsuite addition would
> be welcome.  At which point your contribution becomes non-trivial, so you
> should consider assigning copyright to the FSF.


No problem.



> >
> >
> > diff -ru coreutils-6.12/src/md5sum.c coreutils-6.12-modified/src/md5sum.c
> > --- coreutils-6.12/src/md5sum.c 2008-05-25 23:40:33.000000000 -0700
> > +++ coreutils-6.12-modified/src/md5sum.c        2008-12-20
> > 18:25:19.000000000 -0800
> > @@ -361,6 +361,21 @@
> >    return *s == '\0';
> >  }
> >
> > +
> > +/* Return true if S is a NUL-terminated string of printable characters.
> > +   Otherwise, return false.  */
> > +static bool
> > +printable_chars (unsigned char const *s)
> > +{
> > +  while (isprint (*s) || isspace (*s))
>
> This is susceptible to locale issues.  Do you really want the error
> reporting to differ according to locale, or should this use the gnulib
> module c-ctype?


I had the same concern.   I only used these functions to be consistent with
the rest of the file, which uses isxdigit() for recognizing hexadecimals.
I assumed there was some reason for using locale-dependent functions,
but if there is not then that might be switched over as well.



> > +    {
> > +      ++s;
> > +    }
>
> For a one-liner statement, we typically omit the {}.
>

Whoops.



> +
> > +  return *s == '\0';
> > +}
> > +
> > +
> >  /* An interface to the function, DIGEST_STREAM.
> >     Operate on FILENAME (it may be "-").
> >
> > @@ -469,6 +484,13 @@
> >        if (line_length <= 0)
> >         break;
> >
> > +      /* Check if the line contains non-printable characters */
> > +      if (! printable_chars (line))
> > +       error (EXIT_FAILURE, 0,
> > +              _("%s: %" PRIuMAX
> > +                ": checksum file contains non-text characters"),
> > +              checkfile_name, line_number);
> > +
> >        /* Ignore comment lines, which begin with a '#' character.  */
> >        if (line[0] == '#')
> >         continue;
> >
> >
> > Of course, if there is a reasonable use case in which non-printable
> > characters
> > do appear, then this would be a bad idea and the current behavior would
> be
> > good enough.
>
> Is identifying non-printable characters the best heuristic for rejecting
> an input file as an invalid md5sum listing?  Or is there a better
> heuristic worth applying?


Probably not.  With more thought, I would favor going entirely one way
or the other.  Either producing an error on the first ill-formed line or
keeping
the current behaviour of silently skipping all ill-formed lines.


Cheers,
   Jim

[Prev in Thread]

Current Thread

[Next in Thread]

md5sum prints binary junk when given bad input, James Strother, 2008/12/19
- Re: md5sum prints binary junk when given bad input, Eric Blake, 2008/12/19
  - Message not available
    - Re: md5sum prints binary junk when given bad input, Eric Blake, 2008/12/20
    - Re: md5sum prints binary junk when given bad input, James Strother <=
    - Re: md5sum prints binary junk when given bad input, Jim Meyering, 2008/12/22

Prev by Date: Re: [PATCH] stty doc enhancements
Next by Date: Re: timespec declaration issue on Tru64
Previous by thread: Re: md5sum prints binary junk when given bad input
Next by thread: Re: md5sum prints binary junk when given bad input
Index(es):
- Date
- Thread