grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Make grub_strtoul() "end" pointer have the right const param


From: Peter Jones
Subject: Re: [PATCH] Make grub_strtoul() "end" pointer have the right const params.
Date: Fri, 21 Feb 2020 16:35:59 -0500

On Tue, Feb 18, 2020 at 10:39:07PM -0500, Nicholas Vinson wrote:
> On 2/18/20 19:32, Peter Jones wrote:
> > On Tue, Feb 04, 2020 at 08:04:30PM -0500, Nicholas Vinson wrote:
> > > On 2/4/20 16:02, Peter Jones wrote:
> > > > grub_strtoul() and grub_strtoull() don't make the /pointer/ to "end" be
> > > > const like normal implementations do, and as a result, at many places in
> > > 
> > > grub_strtoul() and grub_strtoull() appear to be patterned after the C
> > > standard functions strtoul() and strtoull().
> > > 
> > > According to C18, the signatures for those functions are:
> > > 
> > >      unsigned long int strtoul(const char * restrict nptr,
> > >                                char ** restrict endptr, int base);
> > > 
> > >      unsigned long long int strtoull(const char * restrict nptr,
> > >                                      char ** restrict endptr, int base);
> > 
> > It's true that the standard does not mandate any const usage on the
> > endptr parameter, and it does have restrict.  I'm happy to add restrict,
> > though I'm going to ignore it for the rest of this email, because the
> > standard literally says it changes no semantics.  Having "const" on both the
> > object pointed to by **endptr and on the outer pointer is still
> > compatible with the standard, though - nothing that's allowed by the
> > standard isn't allowed with those qualifiers.  Semantically, the
> > behavior is described in a way that is compatible with making those
> > const as well: the value written to *endptr must point inside the object
> 
> I'm afraid that's not true.  That interpretation ignores the fact that the
> caller is permitted to use both nptr (provided the first argument is not
> const char *) and endptr to modify the underlying object. For example, the
> standard allows:
> 
>     char *num = "12345_";
>     char *end;
>     unsigned long long l;
> 
>     l = strtoull(num, &end, 10); // returns 12,345. end points to '_'
> 
>     **endptr = '6';
(assuming you mean *end not **endptr)
> 
>     l = strtoull(num, &end, 10); // returns 123,456. end points to '\0'
> 
> With the proposed modifications all such use cases would require explicit
> casts to suppress the warning gcc would generate.

That's a fair point, this usage will get you a warning - but this is
another case where having to cast it to const for the call is a giant
flag saying "I'm doing something really weird" to anyone reading the
code.  This is also something that currently occurs in our moderately
large source tree exactly zero times, though there are five cases (three
in grub-core/lib/syslinux_parse.c , two in grub-core/net/http.c ) that
look vaguely similar to that, without actually changing the character
objects.

> > pointed to by nptr, and neither function is allowed to modify that
> > object. >
> > Adding the const requirement does give us benefits though:
> > - It means when someone is modifying grub_strtoul() or grub_strtoull(),
> >    as inevitably happens every once in a while, making an error which
> >    would violate those requirements will cause an error building that
> >    change, rather than making it in to the tree.  That is, "const char **"
> >    protects us from making an error that writes modifies the object nptr
> >    points at, and "char ** const" protects us from accidentally changing
> >    the local pointer instead of the caller's pointer
> 
> 'const char **' also requires the caller to cast away the const or use
> strange pointer arithmetic to modify the underlying object.

That's true, but it's also the case that since grub_strtoul() is not
strtoul(), neither having it or not having it is "right" or "wrong", but
just slightly different.  Looking at the code we've got, that gives us five 
cases
where we have to do that cast, versus 26 cases where we currently have
to cast pointers to const char * objects by using (char **).  Between
that being a pretty big ratio and the actual advantages we get, I'd much
prefer to have this problem over what we've already got.

> 'char ** const' is probably OK semantically.

Yes, this part constrains behavior that is implicitly required of
upstream strtoul() implementations anyway.

> > - Likewise, if someone isn't thinking through what they're doing and
> >    takes the const out in order to "fix" some perceived problem, that
> >    should alert whomever is reviewing the code.  In both of these
> 
> agreed.
> 
> >    regards, having the const qualifiers helps us trust that our
> >    implementation is correct.
> > - It lets us pass in pointers that we *want* to be const from the
> >    caller's point of view.  It means that if a function calling strtoul()
> >    has a "const char *" argument, it can safely pass that as nptr as well
> >    as passing its address as endptr; without making the object const (i.e
> 
> If you add the 'restrict' keyword, then endptr cannot point to any part of
> the object nptr points to.

Assuming you meant that *endptr can't (endptr shouldn't even without
restrict, it's a pointer to a pointer), this is incorrect for several
reasons.  Firstly, because the type qualifier places restrictions on
*accesses* of objects referred to by the pointer, not on the *values* of
that pointer or any other.  Secondly, because of the type qualifier
syntax for * (see original commit message), with this declaration (and
the matching definition):

  unsigned long strtoul(const char * restrict nptr,
                        char ** restrict endptr, int base);
 
the second argument declares the symbol endptr, which represents a
restrict-qualified pointer to a pointer to a character object.  Accesses
to the character object, i.e. *nptr=... or eventually **endptr=...) are
not restricted via the *second* argument's declaration, only the
accesses to *endptr are.  This qualifier says that nothing strtoul()
does will change the value of *endptr except assigning to *endptr,
either directly or indirectly.  That is, strtoul() can still do the
slightly obnoxious:

  char ** restrict * endptrptr = &endptr;
  **endptrptr = nptr;

but it can't do the rather insane:

  /* assume the caller did:
   *   char *ptrs[2] = { "12345_", NULL };
   *   unsigned long l = strtoul(ptrs[0], &ptrs[1], 10);
   */
  char **nptrptr = &nptr;
  *endptr = NULL;
  nptrptr[1] = nptr;

This code is horrifying, but without restrict on the second argument
(given that there are no other callers of strtoul()) the behavior is
defined and even correct.  In the presence of restrict it will result in
undefined behavior - but it's strtoul()'s behavior that's undefined, not
that of the caller.

There are also several behaviors banned by the restrict parameter on
nptr.

I can go into significantly more detail here, but the point is that
restrict doesn't actually matter to the "const" parts at all, and it
also doesn't really change much practically in terms of strtoul().

> The standard C functions have never allowed this
> usage which is why 'restrict' was added to the declarations in C99.

Again, no, that's not correct, it just wasn't ever a good idea, and in
an API it can lead to undesired or undefined behavior; with restrict
both of those cases become undefined.
 
> It's syntactically valid to do this with the GRUB variants (just like
> it is with C89), but it's not logically valid.
> 
> >    at least "const char ** endptr"), that'll get you a warning about
> >    discarding the type qualifier.  That applies equally to cases that
> >    want to wrap these functions with another function that has similar
> >    semantics; having const qualifiers allows that function to declare
> >    those things const or not.  Not having the qualifiers here means the
> >    calling function also can't, which is where we wind up having a lot of
> >    the ugly code.
> 
> The wrapping functions can have all const parameters and still use the
> strto* functions without casting.  Example below:
> 
>     long foo(const char *nptr, const char ** const endptr) {
>         char *tmp
>         long l;
> 
>         l = strtoul(nptr, &tmp, 10);
>         *endptr = tmp;
>         return l;
>     }
> 
> Are there functions in GRUB that attempt to do this?

Well, of course both grub_strtol() and grub_strtoul() just call into
grub_strtoull().

> > - The same applies to cases where the object is (or should be) genuinely
> >    immutable - for instance, data that's in .rodata, in data provided
> >    by an option ROM, or firmware configuration table.  It is better to be
> >    able to keep the type qualifier throughout the whole stack.
> > - It allows the optimizer to make assumptions about how we're using it
> >    which may allow it to generate more efficient output without having to
> >    analyze the caller.
> > 
> > > Therefore, I recommend the GRUB maintainers keep the existing signatures 
> > > for
> > > grub_strtoul() and grub_strtoull() as they more closely follow the
> > > signatures of the standard functions.
> > 
> > Obviously, I don't agree.
> > 
> > > I also skimmed through the rest of the changes and it seems to me most of
> > > the misuse of these calls could be solved simply by doing some variation 
> > > of:
> > > 
> > >      const char *ptr = ...;
> > >      char *end;
> > > 
> > >      strtoul(ptr, &end, 10);
> > >      ptr = end;
> > 
> > While this is valid in many of the existing cases, it's not as good in
> > several ways.  In addition to the last three points I made above, this
> > also creates an unnecessary alias into the character object, which the
> > compiler then has to track.  In this case, since it is both a qualified
> > type of the same type of object and a character object, it doesn't
> > violate the aliasing rules ("-fstrict-aliasing -Wstrict-aliasing" isn't
> > allowed to complain), but it can cause the compiler to both take longer
> > and generate more, slower code than if that were:
> > 
> >    const char *ptr = ...;
> >    unsigned long val;
> > 
> >    val = strtoul(ptr, &ptr, 10);
> > 
> 
> As I stated above this is not logically valid.

Again I don't agree; the compiler does the expected thing with it, and
we use it usefully already in many places.  It appears to comply with
the standard in every way *except* that it discards the const qualifier,
though the called function does in fact treat the character array and
the local endptr as if they are const.

> > That's true for both the caller and in the implementation of strtoul()
> > itself - though I don't think with current GCC it *will* cause the
> > strtoul() output to be much different, because it's pretty simple, but
> > for a more complex function like sscanf() calling it, the chances are a
> > bit rougher.  Hypothetically, it can also reduce the chances that it
> > winds up getting inlined if we were to use -flto.
> 
> It's also not logically valid to use sscanf() that way either.  The only
> standard C function I can think of where it is logically valid to have
> source and destination buffers overlap is memmove().

None of these examples are of buffers overlapping.  They have one
argument that points to a buffer as an input parameter and one that
points to a pointer, which is an output parameter and is not allowed to
be inside the buffer the other one points to.  They don't write to the
buffer.

-- 
  Peter




reply via email to

[Prev in Thread] Current Thread [Next in Thread]