lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lmi] Avoiding unsigned [Was: More Clang fixes]


From: Greg Chicares
Subject: [lmi] Avoiding unsigned [Was: More Clang fixes]
Date: Mon, 11 Jun 2018 23:43:12 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 2016-03-27 21:24, Vadim Zeitlin wrote:
[...]
> <digression-on-the-use-of-unsigned-you-should-feel-free-to-skip>
> 
>  I agree with this but I also have to say that I'm not really sure that
> using "unsigned" here was a design flaw. Lakos gives, of course, a lot of
> very good advices, but the one about not using "unsigned" has never seemed
> convincing enough to me. Trusting the summary of his arguments at
> http://groups.google.com/group/comp.lang.c++.moderated/msg/849321a0199501e5
> (taken from a comment in multidimgrid_tools.hpp), they're rather simple to
> refute:
> 
>> 1. Using unsigned does not ensure at compile time that negative numbers
>> will not be passed at runtime ...
[...]
>> 1. ... Sure you get warnings for "strlen(myStr) < myInt", but not for int
>> x = -1; f(x);
>
[...snip five other arguments in which you see little merit (and most ofthem 
seem pretty weak to me, too...]

That clc++m post doesn't make as good a case as can be made. I find
the arguments quoted in 'ssize_lmi.hpp' more persuasive:

/// 42:55 Stroustrup: "Whenever you mix signed and unsigned numbers you
/// get trouble. The rules are just very surprising, and they turn up in
/// code in the strangest places. They correlate very strongly with bugs.

That's it in a nutshell. For example, here's how I'm changing cell deletion
in the census manager (commit 137c611117):

-    unsigned int const newsel = std::min
-        (static_cast<std::size_t>(erasures.front())
-        ,cell_parms().size() - 1
+    int const newsel = std::min
+        (erasures.front()
+        ,lmi::ssize(cell_parms()) - 1

I can't really say whether the removed code was correct. We have 'erasures',
which contains row numbers in list_window_->GetSelections(); can its first
element be negative? (Maybe you know the answer, but my understanding of wx
is far less deep.) If it is -1, then std::min() converts it to 4294967295
= 2^32-1 in order to do the comparison...

/// Carruth: "How frequently have you written arithmetic and wanted ...
/// 2^32 modular behavior? That's really unusual. Why would you select a
/// type which gives you that?"

Then we take a standard container's size(), which I'm sure is nonnegative;
but it might very well be zero, and when we subtract one from it, we know
we get something like 2^32-1 or 2^64-1.

Then we take the lesser of two quantities, one or both of which might have
gone through a wormhole that turns small negatives into gigantic positives.
How can we feel sure that does the right thing?

Isn't it much easier to judge whether the replacement code is correct?
+    int const newsel = std::min
+        (erasures.front()
+        ,lmi::ssize(cell_parms()) - 1
We have erasures.front(), which is presumably either a small positive value
(because loading two billion cells into the census manager is implausible)
or maybe a small negative (wx sometimes uses small negatives like
  #define wxNOT_FOUND       (-1)
and even its most negative symbolic constants like
  wxID_AUTO_LOWEST = -1000000
are much closer to zero than 2^32 is). And we have vector.size()-1, which
is no lower than -1. Finally, when we call std::min(), we get the lower of
these two values--the one that's really lower, not the one whose residue
(mod 2^N) is lower.

In that panel discussion, Stroustrup continues:

/// Now, when people use unsigned numbers, they usually have a reason,
/// and the reason will be something like, well, 'it can't be negative',
/// or 'I need an extra bit'. If you need an extra bit, I am very
/// reluctant to believe you that you really need it, and I don't think
/// that's a good reason.

It seemed like a good reason in the early 1980s. I had arrays like
  double array[50][100];
that were larger than 32768 bytes but smaller than 65536, for which that
extra bit was handy. But I also needed arrays like
  double array[100][100];
so what I really needed was a 32- or 64-bit address space: one extra bit
wasn't worth much, and modulo 2^N arithmetic was too high a price to pay,
because...

/// ... When you think you can't have negative numbers,
/// you will have somebody who initialized your unsigned with minus two,
/// and think they get minus two--and things like that. It is just highly
/// error prone.

I watched the whole "Ask Us Anything" video from "Going Native 2013";
this is the only question that came up three separate times, and nobody
on the panel disagreed:

/// ... I think one of the sad things about the standard library
/// is that the indices are unsigned, whereas array indices are signed,
/// and you're sort of doomed to have confusion and problems with that.
/// There are far too many integer types. There are far too lenient rules
/// for mixing them together, and it's a major bug source--which is why
/// I'm saying: stay as simple as you can, use integers till you really,
/// really need something else."

Of course, that doesn't mean that all experts unanimously agree; that's
why the cited clc++m thread, and others like it, are so long.

>  So, on balance, I only see one tiny argument not to use unsigned. OTOH
> there are two big arguments in favour of using it, of very different
> varieties:
> 
> 1. Theoretic one: it provides more semantic information which is always
>    a good thing. It also reduces cognitive dissonance which happens when
>    something that obviously can't be negative is represented by a signed
>    type. Again, for me this is pretty similar to representing an integer
>    by a double: sure, it can be done, but this seems to be obviously wrong.

Okay, then, how about using something like one of the following?
  using count_t = int;
  using index_t = int;
  using nonneg_int = int;
where, instead of 'int', we could use this (as in 'ssize_lmi.hpp'):
  using ssize_t = std::make_signed_t<std::size_t>;
or Herb Sutter's recommendation:
  https://github.com/isocpp/CppCoreGuidelines/pull/1115
  namespace gsl { using index = ptrdiff_t; }
That would provide the same semantic suggestion as 'unsigned',
while avoiding mixed-mode arithmetic.

There seem to be two schools of thought:

school 1 emphasizes range, so:
  'int' means "signed integer"
  'unsigned" connotes a nonnegative "counting number"

school 2 emphasizes arithmetic, so:
  'int' means ℤ
  'unsigned' means ℤ/(2^N)ℤ

although the C and C++ builtin types don't quite behave according to
either school. This proposal attempts to introduce a "counting number"
typedef that avoids the weird conversion rules for mixed-mode arithmetic.

> 2. Practical one: unsigned type (size_t) is used everywhere in the standard
>    library and if the rest of the code uses signed types, it requires
>    converting, implicitly (dangerous; results in warnings) or explicitly
>    (annoying; can still be dangerous) back and forth between them.

We can attack that problem at the source. As a first step, when we want
to write something like this (see commit 048b2258):
     LMI_ASSERT(v.size() == length); // error
and it fails to compile because it compares int to std::size_t, then
instead of adding a cast to make it compile:
-    LMI_ASSERT(v.size() == static_cast<unsigned int>(length));
we can get the size as a compatible type:
+    LMI_ASSERT(lmi::ssize(v) == length);
(which throws if v.size() is too big to represent as ssize_t).

And instead of changing the type of an index variable:
     for(int j = 0; j < coi.size(); ++j) // error
-    for(unsigned int j = 0; j < coi.size(); ++j)
we can just choose ssize() instead of size():
+    for(int j = 0; j < lmi::ssize(coi); ++j)

That way, we can systematically remove most of the unsigned integers
we declare, and keep others from leaking in from the standard library.
By minimizing our use of unsigned types (except for bit manipulations),
we choke off the potential for mixed-arithmetic surprises.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]