[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Problem of the week: testing a testing tool

From: Greg Chicares
Subject: Re: [lmi] Problem of the week: testing a testing tool
Date: Thu, 18 Jan 2007 16:07:38 +0000
User-agent: Thunderbird (Windows/20060516)

On 2006-12-24 12:19 UTC, Greg Chicares wrote:
> make unit_tests build_type=mpatrol unit_test_targets=input_test
> [add '.exe' at the end for msw]
> Running input_test:
>   Speed tests...
>   Overhead: [3.931e+000] 1 iteration took 3930 milliseconds
>   Vector  : [9.064e+000] 1 iteration took 9063 milliseconds
>   Read    : [1.#IOe+000] 0 iterations took 15031 milliseconds
>   Write   : [7.772e+000] 1 iteration took 7771 milliseconds
>   'cns' io: [1.#IOe+000] 0 iterations took 54502 milliseconds
>   'ill' io: [1.#IOe+000] 0 iterations took 14409 milliseconds
> .... 27 tests succeeded
> no errors detected
>   mpatrol: total warnings:    0
>   mpatrol: total errors:      0
> 0. What's obviously wrong here on the face of it?

(A) "1.#IOe+000" stems from an infinite value

(B) "3.931e+000" sec != 3930 msec; others similarly mismatch

(C) "0 iterations" actually signifies an initial calibration,
  which took so long that no further iterations were performed

> 1. Which revisions introduced defects observable above?

The last-digit mismatch arose when one number was printed as with
while the other was cast to an integer and printed as with
and ought to have been noticed when the results were first
presented together; the specific revisions were identified
earlier in this thread.

The other defects seem to have been present since the initial
revisions were checked in. Much of this code is ancient.

> 2. How could those defects have been detected automatically?

The last-digit mismatch (B) wouldn't have been detected by any
reasonable unit test, but the others could have been.

> 3. How could those defects have been prevented?

(A) don't divide by a value that could possibly be zero

(B) several interrelated points:
 - don't use a cast to effect truncation: prefer floor(), e.g.
 - don't truncate when rounding would be preferable
 - library print facilities already round floating-point values
   implicitly; don't waste effort rounding them explicitly first

(C) "0 iterations" is deliberate (it lets the author of a test
  know that it took longer than hoped) but precious and obscure

> 4. How should those defects be removed?

### Stop the bleeding with a spike solution

When I first noticed these problems, I immediately tried several
simple changes, which reproduced the symptoms noted above within
the timer class's own unit test, then prevented them. As these
pages explain, the purpose is to gain knowledge quickly, and it's
okay if these changes later get thrown away. Here they are:

[make issues (A) and (C) manifest]
     std::cout << "  " << aliquot_timer(wait_half_a_second, 0.1) << '\n';
+    // TODO ?? Why is this needed to reproduce the problem?
+    std::cout << "  " << aliquot_timer(wait_half_a_second, 0.01) << '\n';

[address issue (B)]
-    oss << static_cast<int>(1000.0 * elapsed_usec());
+    oss << std::fixed << std::setprecision(0) << 1000.0 * elapsed_usec();

[address issue (A)]
-        << "[" << timer.elapsed_usec() / z << "] "
+        // TODO ?? Convoluted--refactor. Treat as a special case?
+        << "[" << timer.elapsed_usec() / (0 != z ? z : 1) << "] "
[address issue (C)]
+        // TODO ?? Convoluted--refactor. Say it more clearly.
+        << (0 != z ? "" : "th")
+        << " iteration" << (z <= 1 ? "" : "s") << " took "
-        << " iteration" << ((1 == z) ? "" : "s") << " took "

### But the operation isn't over as soon as the bleeding stops

The rest is just a matter of craftsmanship. We've increased
correctness, but there might be similar, unnoticed defects
elsewhere; this is the best time to identify them. And we're
not done...
| until the system is in its simplest, tidiest, neatest form.
The paragraph in which those words occur is worth reading as one
expert's brief summary of agile development. Here's another way
of putting it:
| first, implement the thing in a simple straightforward way;
| second, refactor the code to produce the simplest system
The simplest system--not the simplest patch.

### Fix one problem at a time, and root out its congeners

Issue (B) is comparatively easy: remove the offensive cast, and
let the stream inserter take care of rounding. First, the "spike"
solution for (B) is all right, so let's just adopt it:

Second, remove this gratuitous pessimization, because it cannot
be part of "the simplest system":

and search for similar code that needs to be rewritten--for
example, eliminate this latent defect by removing duplication:
taking care to test it, e.g.:
  /c/lmi/src/build/lmi/msw_generic/gcc/ship[0]$elapsed_time.exe sleep 10
  Elapsed time: 10012 milliseconds
and mark this one for action later:

The defect we've noticed is but one manifestation of a defective
practice. We want to fix not only this particular symptom, but
also any latent problems caused by the same practice. Inspection
of the whole codebase reveals this similar defect:
That's a potential production problem that we've found by taking
care in fixing an "inconsequential" problem observed in unit-test
code that isn't even part of the production system. It's a cheap
way to find a "real" problem.

The truncation-by-typecasting problem appears in 'timer.hpp':

    double const w = std::min(std::log10(d), static_cast<double>(ULONG_MAX));
    unsigned long int const x = static_cast<unsigned long int>(w);
    double const volatile y = std::pow(10.0, static_cast<double>(x));
    return static_cast<unsigned long int>(y);

and well might contribute to problem (A); we'll take note of that
and address it later.

### Make the problem reproducible in the failing unit's own tests

To address issues (A) and (C), the first step is to make them
more readily reproducible. Running 'input_test' with 'mpatrol'
suffices for their reproduction, but that's balky. The purpose of
unit testing is to test a unit in isolation, so the timer's unit
test needs to be made more powerful:

Notice that we already had a test that looks sufficient:

    // Test an operation that has to take longer than the hinted
    // time limit, in order to make sure it executes the operation
    // exactly once.
    std::cout << "  " << TimeAnAliquot(wait_half_a_second, 0.1) << '\n';

yet does not exhibit these symptoms: apparently it executes the
operation once for initial calibration, then once more later on.

Let's commit the spike solutions, marking each as a defect:
They're commented out for now, to prevent them from suppressing
the symptoms before we fix the underlying problems.

### If it's too hard to test, refactor it to make testing easier

Suspecting that the problem lurks in the four-line block quoted
above, we did an initial refactoring (which an inline comment had
suggested anyway):
Then I isolated that block in its own unit-tested function:
executing the work casually to give others an opportunity to
improve it. Let's do it carefully now, reflecting everyone's
suggestions, by moving comments along with the reshuffled code
that they pertain to:
and, most important of all, making the test actually meaningful:
Clearly the original test demonstrated only that the code worked
with numbers in the vicinity of ten, whereas the point of adding
a test was to confirm our suspicion that it would fail for less-
felicitous inputs.

Furthermore, because we seem to be getting an infinity with code
that uses C standard-library functions, it's sensible to try
another compiler that has a different C runtime library:
Indeed, when we run the unit test with that compiler now, its C
rtl issues numerous diagnostics. It also illuminates issue (A) by
saying "INF" instead of the bizarre "1.#IOe+000": that's one
advantage of using more than one toolchain.

### When the code smells, grab some bleach, and follow your nose

Now let us examine the suspect code:

    double const w = std::min(std::log10(d), static_cast<double>(ULONG_MAX));

This is clearly a mistake: ULONG_MAX is almost certainly far
greater than DBL_MAX_EXP, so that the extra code does nothing.
Presumably the intention was to limit 'd', not its logarithm.

    unsigned long int const x = static_cast<unsigned long int>(w);
    double const volatile y = std::pow(10.0, static_cast<double>(x));

This cast to an integral type is wrong. For the last two unit
tests, 'd' is probably about 0.2 and 0.02 respectively, making
its logarithm about -0.7 and -1.7 . According to C++98 4.9/1,
the cast yields 0 in the first test (and 10^0 is one, which
explains the unexpected outcome of that test, noted above). In
the second test (added as a "spike"), it has undefined behavior,
which in practice may result in calculating 10^ULONG_MAX, which
could return an infinity. What was intended is more clearly said
in less code:

    std::pow(10.0, std::floor(std::log10(d)))

and, following Lakos, we would much prefer to avoid 'unsigned' in
the interface anyway; let's take care of those problems:
and note that we've cleared up a symptom:

### Quality through discipline--not busywork

An expert might look at our original four suspicious lines of
code, instantly perceive the intention, and immediately replace
them with a correct pow(floor(log)) implementation (with due
regard to UB, EDOM, and ERANGE). That's perfectly okay. As long
as it's impeccable, it's not necessary to make it a separate
function for testing, or to write specific tests for it at all:
those are just means to an end. What's not okay is writing
defective code without making an effort to validate it--what we
lack in brilliance, we must make up for through discipline.

### Epilogue

The original spike "solution" couldn't be adopted as such,
because without our other work it boiled down to:

    // Calculate z for the sole purpose of using it as a divisor.
    // Oops, that calculation sometimes returns zero. Instead of
    // rooting out the problem, take another path when this known
    // defect arises. Hope there are no other defects. Hope there
    // is no undefined behavior that could show up later, say,
    // when we upgrade our compiler. Hope nobody copies this code
    // elsewhere. Don't write a comment explaining this bandaid.
    // Don't try to understand the real problem. Don't look for
    // similar problems elsewhere.
    timer.elapsed_usec() / (0 != z ? z : 1)

But now it's much less offensive. We take an alternative path if
certain documented behavior occurs. We've rooted out the problem;
removed undefined behavior; removed other defects in the same
vicinity; removed or at least marked similar defects elsewhere;
and tested and documented our work.

In production code, problems like this can cost a lot of money.
In fact, an invalid type conversion like the one we just fixed
did cost a great deal of money in this situation:

| ...after a decade of development costing $7 billion....
| The destroyed rocket and its cargo were valued at $500 million.
| ...It turned out that the cause of the failure was a software
| error...conversion from 64-bit floating point to 16-bit signed
| integer value. The floating point number which was converted
| had a value greater than what could be represented by a 16-bit
| signed integer.

Thank goodness this was just a propaedeutic exercise. Its purpose
being fulfilled, I'm going to ratify that spike solution and move
on to other business that's more urgent--leaving issues we've
identified but not yet fixed for later resolution.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]