bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: base64 tool?


From: Simon Josefsson
Subject: Re: base64 tool?
Date: Mon, 27 Jun 2005 23:55:38 +0200
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/22.0.50 (gnu/linux)

Paul Eggert <address@hidden> writes:

>> address@hidden address@hidden
>> address@hidden address@hidden
>> address@hidden -w
>> address@hidden --wrap
>> address@hidden wrap data
>> address@hidden column to wrap data after
>> +During encoding, wrap lines after @var{COLS} characters.  This must be
>> +a positive number.
>> +
>> +If this option is not given at all, no line wrapping will be
>> +performed.  If @var{COLS} is omitted, the default is 76.
>
> The POSIX utility syntax guidelines do not allow options with optional
> arguments, and I tend to agree, at least for single-letter options.
> How about if we simply require COLS?

Yup, Eric Blake's suggestion was much better.  Make the default to
always wrap after 76, require COLS, have 0 mean no wrapping.

>> address@hidden -i
>> address@hidden --ignore-garbage
>> address@hidden -i
>> address@hidden --ignore-garbage
>> address@hidden Ignore garbage in base64 stream
>> +During decoding, ignore unrecognized characters (including newline),
>> +to permit distorted data to be decoded.
>
> Is this option needed if the encoder used the -w option?  It's
> not clear from the documentation.

-w is for encoding, -i for decoding.  So they are orthogonal.  I
thought about making -w affect decoding to, so decoding would ignore
(only) newlines after COLS.  Could be fixed later on.

>> +                 size_t * outlen)
>
> * outlen -> *outlen

Fixed, ran indent too.

>> +  const char b64[64] =
>> +    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
>
> Surely this should be static, not auto.

Yup.

> Also, it's a bit tricky that there's a local var "b64" and a global
> one also named "b64".  I'd rename one of them.

The local one is now 'b64str'.

>> +      *out++ = b64[((to_uchar (in[0]) << 4)
>> +                + (--inlen ? to_uchar (in[1]) >> 4 : 0)) & 0x3f];
>
> That "& 0x3f" is hard to follow.  Here's a better and more-consistent
> indenting style:
>
>       *out++ = b64[((to_uchar (in[0]) << 4)
>                     + (--inlen ? to_uchar (in[1]) >> 4 : 0))
>                    & 0x3f];
> +      *out++ =
> +     (inlen
> +      ? b64[((to_uchar (in[1]) << 2)
> +             + (--inlen ? to_uchar (in[2]) >> 6 : 0)) & 0x3f] : '=');
>
> Similarly, the "& 0x3f" and ": '='" are indented confusingly.  How
> about this instead?
>
>       *out++ =
>         (inlen
>          ? b64[((to_uchar (in[1]) << 2)
>                 + (--inlen ? to_uchar (in[2]) >> 6 : 0))
>                & 0x3f]
>          : '=');

Yup.

>> +      if (!--outlen)
>> +    break;
>
> This is in a loop that will test outlen anyway.  Why is
> the if-test needed here?  Can't you simply decrement outlen?

Which of the if statements are you referring to? Outlen is decremented
several times in the loop, and can become zero at any one of the
conditioned decrements, so has to be checked after every decrement to
avoid that the next decrement wrap the counter.

>> +      in += 3;
>
> This assignment might have undefined behavior if IN points at or near
> the end of a buffer.  The C standard says you can't point more than
> one byte past the end of a buffer.

Does "undefined behavior" include crashing?  Or merely that the
variable may contain an undefined value?  If the latter, that
shouldn't be a problem.

In any case, I think it could be guarded as:

if (inlen)
  in += 3;

since inlen would be 0 at that point if we are near the end of the
buffer.

>
>> +The data is encoded as described for the base64 alphabet in RFC 3548.\n\
>
> data is -> data are
> ("Data" is a plural in English.)

Thanks!

>> +      n = fread (inbuf + sum, 1, BLOCKSIZE - sum, in);
>> +
>> +      if (n > BLOCKSIZE - sum)
>> +        error (EXIT_FAILURE, 0, _("read too much"));
>> ...
>> +      if (n > B64BLOCKSIZE - sum)
>> +        error (EXIT_FAILURE, 0, _("read too much"));
>
> That's not possible.  If you want to test for this, you should simply
> abort () if you read too much.  But I wouldn't bother testing for this
> (the rest of coreutils doesn't).

Ok, tests removed.

>> +  /* When wrapping, terminate last line. */
>> +  if (wrap_column && fputs ("\n", out) < 0)
>> +    error (EXIT_FAILURE, errno, _("write error"));
>
> Might this output a newline that is adjacent to a previous newline?
> Is the double-newline intended?

I'm going to rewrite the wrapping logic, so I'll postpone that one.

>> +  if (ferror (in))
>> +    error (EXIT_FAILURE, errno, _("read error"));
>
> It's not correct to use errno here, since it may have been set by
> a previous system call to something other than the errno that
> corresponds to the read error.  (There are two instances
> of this problem.)

I'll check this problem too after the rewrite.

>> +      if (ignore_garbage)
>> +        {
>> +          size_t i;
>> +          for (i = 0; n > 0 && i < n;)
>> +            if (isbase64 (inbuf[sum + i]) || inbuf[sum + i] == '=')
>> +              i++;
>> +            else
>> +              memmove (inbuf + sum + i, inbuf + sum + i + 1, --n - i);
>> +        }
>
> That's an O(N**2) algorithm.  Why not use an O(N) algorithm instead?
> You can do that by copying each valid byte of the buffer as you go.
> You can do this in-place, so you don't need to allocate memory.

Yup.  Maybe I just wanted to penalize people who ignore garbage...

>> +      if (ferror (in))
>> +        error (EXIT_FAILURE, errno, _("read error"));
>
> This is another case where errno might be garbage.
>
>> +  while ((opt = getopt_long (argc, argv, "dqiw", long_options, NULL)) != -1)
>
> Change "dqiw" to "dqiw:" (as mentioned above; w's operand should
> probably be required).

Yup.

>> +      char *out = NULL;
>> +      size_t outlen = 0;
>
> Don't initialize these variables, since the initial values aren't used.

Fixed.

>> +      if (decode)
>> +    {
>> ...
>> +      free (out);
>> +    }
>> +      else
>> +    {
>> ...
>> +      free (out);
>> +    }
>
> The "free (out);" can be hoisted out of the if-then-else.

Done.

>> +  if (fclose (stdin) == EOF)
>> +    error (EXIT_FAILURE, errno, _("standard input"));
>
> This code shouldn't be executed unless you actually read stdin.

I moved into the previous if, but I guess the logic will change if the
tool is modified to read files too.

Thanks!  I'll send the patches separately.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]