[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: md5sum prints binary junk when given bad input

From: Eric Blake
Subject: Re: md5sum prints binary junk when given bad input
Date: Sat, 20 Dec 2008 21:47:37 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20081105 Thunderbird/ Mnenhy/

Hash: SHA1

[please keep replies on the list]

According to James Strother on 12/20/2008 7:41 PM:
> Hello Eric,
> I tried 6.12 and it no longer spills binary characters onto the screen,
> but instead it scans through the entire file looking for valid MD5 lines
> before producing the error than no valid MD5 lines were found.
> This is certainly better, but I expect that someone unfamiliar with md5sum
> who made the same mistake that I did would find this behavior quite
> perplexing.  It would seem that md5sum is checking their file, but then
> it wouldn't produce any output.  (I only put this forth because I work in
> a lab of linux-newbies and I field similar issues every day).
> I couldn't think of a situation in which non-printable text could appear
> in a valid checksum file.  If it is the case that non-printable text is
> always invalid, then it's easy enough to check for it and produce a
> specific error.  Here is a patch that does it:

Thanks for the proposed patch; but before we can even consider applying
it, it would be better to reformulate it against the latest sources, per
the suggestions in HACKING:

Since you are proposing to affect user-visible behavior, you would also
need to patch NEWS and the documentation, plus a testsuite addition would
be welcome.  At which point your contribution becomes non-trivial, so you
should consider assigning copyright to the FSF.

> diff -ru coreutils-6.12/src/md5sum.c coreutils-6.12-modified/src/md5sum.c
> --- coreutils-6.12/src/md5sum.c 2008-05-25 23:40:33.000000000 -0700
> +++ coreutils-6.12-modified/src/md5sum.c        2008-12-20
> 18:25:19.000000000 -0800
> @@ -361,6 +361,21 @@
>    return *s == '\0';
>  }
> +
> +/* Return true if S is a NUL-terminated string of printable characters.
> +   Otherwise, return false.  */
> +static bool
> +printable_chars (unsigned char const *s)
> +{
> +  while (isprint (*s) || isspace (*s))

This is susceptible to locale issues.  Do you really want the error
reporting to differ according to locale, or should this use the gnulib
module c-ctype?

> +    {
> +      ++s;
> +    }

For a one-liner statement, we typically omit the {}.

> +
> +  return *s == '\0';
> +}
> +
> +
>  /* An interface to the function, DIGEST_STREAM.
>     Operate on FILENAME (it may be "-").
> @@ -469,6 +484,13 @@
>        if (line_length <= 0)
>         break;
> +      /* Check if the line contains non-printable characters */
> +      if (! printable_chars (line))
> +       error (EXIT_FAILURE, 0,
> +              _("%s: %" PRIuMAX
> +                ": checksum file contains non-text characters"),
> +              checkfile_name, line_number);
> +
>        /* Ignore comment lines, which begin with a '#' character.  */
>        if (line[0] == '#')
>         continue;
> Of course, if there is a reasonable use case in which non-printable
> characters
> do appear, then this would be a bad idea and the current behavior would be
> good enough.

Is identifying non-printable characters the best heuristic for rejecting
an input file as an invalid md5sum listing?  Or is there a better
heuristic worth applying?

In one respect, the current behavior of silently ignoring invalid lines
makes md5sum more like patch: by ignoring irrelevant lines until it
identifies the data important to the program, it is much easier to run the
program on an entire email than it is to extract just the relevant portion
to feed to the tool.

> Cheers,
>    Jim

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


reply via email to

[Prev in Thread] Current Thread [Next in Thread]