Re: [lmi] Problem of the week: testing a testing tool


From: Greg Chicares
Subject: Re: [lmi] Problem of the week: testing a testing tool
Date: Fri, 05 Jan 2007 16:33:01 +0000
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

On 2007-1-5 13:02 UTC, Boutin, Wendy wrote:
> Greg Chicares wrote:
>> make unit_tests build_type=mpatrol unit_test_targets=input_test
[...]
>> 0. What's obviously wrong here on the face of it?
> 
> These three lines:
> 
>   Read    : [1.#IOe+000] 0 iterations took 15031 milliseconds
>   'cns' io: [1.#IOe+000] 0 iterations took 54502 milliseconds
>   'ill' io: [1.#IOe+000] 0 iterations took 14409 milliseconds
> 
> are wrong in several ways:
> - [1.#IOe+000]

In a separate email I explain that this demonstrates a defect in
the C rtl. Still, we're trying to print an infinity, and that's
not good.
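
To make that concrete, here is a minimal sketch, not lmi's code and
only my guess at how the infinity arises (elapsed time divided by a
zero iteration count), of how the ms C runtime that mingw links
against formats an infinity:

  #include <cstdio>

  int main()
  {
      double const iterations = 0.0;
      double const seconds    = 15.031;
      // Dividing by a zero iteration count yields infinity, which a
      // conforming printf() would render as "inf".
      double const per_iteration = seconds / iterations;
      // The ms runtime prints something like "[1.#IOe+000]" instead.
      std::printf("[%.3e]\n", per_iteration);
      return 0;
  }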

> - zero iterations should take zero milliseconds

Not if you have two different definitions of the number zero,
which is actually the case here; still, one of them's wrong.
Does the inline documentation shed any light on this?
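
Purely as an illustration, not the code in 'timer.hpp' (look there
for the real answer): a loop that declines to count an iteration
that overran its limit, while the clock still charges for it, can
report zero iterations and fifteen-odd seconds with a straight face:

  #include <cstdio>
  #include <ctime>

  // Stand-in for the timed operation; imagine mpatrol has made one
  // call take about two seconds.
  void slow_operation()
  {
      std::clock_t const start = std::clock();
      while(static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC < 2.0)
          {}
  }

  int main()
  {
      double const limit = 1.0; // seconds allotted to the measurement
      std::clock_t const start = std::clock();
      int iterations = 0;
      for(;;)
          {
          slow_operation();
          double const elapsed =
              static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
          if(limit < elapsed) break; // overran: don't count this pass
          ++iterations;
          }
      std::printf
          ("%d iterations took %.0f milliseconds\n"
          ,iterations
          ,1000.0 * static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC
          );
      return 0;
  }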

> This line:
>   Write   : [7.772e+000] 1 iteration took 7771 milliseconds
> seems to demonstrate a potential rounding discrepancy between
>   7.772e+000
> and
>   7771 milliseconds
> I suspect something's gone awry with the timer.

When two formatted numbers should match but don't, I suspect
either the values or the formatting. IOW, it could be that
we're reading the timer twice, at times one millisecond
apart, or it could be something even simpler than that.
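
The "even simpler" possibility takes only a few lines to show
(illustrative values, not lmi's code): a single timer reading,
rounded one way in scientific notation and truncated the other way
into whole milliseconds, disagrees by one:

  #include <cstdio>

  int main()
  {
      double const seconds = 7.7716; // one hypothetical timer reading
      // Truncating to whole milliseconds gives 7771, while the
      // scientific-notation format below rounds up to 7.772e+000
      // (7.772e+00 on runtimes that use two exponent digits).
      int const milliseconds = static_cast<int>(1000.0 * seconds);
      std::printf
          ("[%.3e] 1 iteration took %d milliseconds\n"
          ,seconds
          ,milliseconds
          );
      return 0;
  }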

> Given what I can see on the face of it, these results don't correspond
> with the results I observe above:
>> .... 27 tests succeeded
> 
> I don't actually count twenty seven tests.

How many do you count? Very naively:

  C:/lmi/src/lmi[0]$grep --count TEST input_test.cpp
  27

but that's misleading. Some occurrences are in comments, but
some are in functions that are called more than once. Don't
those two things offset each other exactly, though? If they
don't, then we have a significant problem in the unit-test
framework.

>> no errors detected
> 
> This *is* an error: [1.#IOe+000]

Okay--then read it as
  number of defects present: unknowable
  number of defects automatically detected by the unit test: 0

>>   mpatrol: total warnings:    0
>>   mpatrol: total errors:      0
> 
> And because of my last two comments, I'm not convinced these
> last two lines are actually correct either, but that's pure
> speculation because of the faulty data surrounding it. And,
> as Rick mentioned, I'm having a little trouble with mpatrol,
> so I'm only guessing you added that output locally.

No, I presented a verbatim et literatim screen copy; 'mpatrol',
however, can find only memory problems, just as the compiler
reports only the types of problems it can diagnose.

>> 1. Which revisions introduced defects observable above?
> 
> This one looks like a candidate:
>   
> http://cvs.savannah.nongnu.org/viewcvs/lmi/timer.hpp?root=lmi&sortby=date&r2=1.11&r1=1.10

You're comparing two revisions that are both later than my
original email, though. Maybe I should have asked "Which
lines of code engender these problems?" first. Then the point
is to look for weak discipline that needs to be shored up, so
that we can grep for (and fix) similar mistakes made in the
past and avoid making similar mistakes in the future.

> but I can reproduce the problem in earlier revisions if I
> make this change:
> 
>  -  std::string TimeAnAliquot(F f, double max_seconds = 1.0)
>  +  std::string TimeAnAliquot(F f, double max_seconds = 0.001)

Reproducing a problem is good. Can you reproduce it repeatably
by changing the test, rather than changing what's tested?
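
For instance, assuming TimeAnAliquot() is reachable from the unit
test, one could feed it a callable slower than an explicitly short
limit, leaving the default argument alone. A sketch, not a patch:

  #include "timer.hpp"

  #include <cstdio>
  #include <ctime>

  // Deliberately slow callable: burns roughly a tenth of a second.
  void glacial_operation()
  {
      std::clock_t const start = std::clock();
      while(static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC < 0.1)
          {}
  }

  int main()
  {
      // A limit shorter than a single call, so that zero complete
      // iterations fit within it, without touching the default argument.
      std::printf("%s\n", TimeAnAliquot(glacial_operation, 0.001).c_str());
      return 0;
  }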

>> 2. How could those defects have been detected automatically?
> 
> Setting up a unit test for each condition that causes the problem
> and running the tests regularly. I think 'make cvs_ready' could've
> assisted with this one because it actually uses mpatrol, which
> illuminated symptoms of the problems, although I don't believe
> mpatrol actually contributes to the cause; I think it just helped
> track the problem down. That tool makes programs run so much slower
> that it may have caused the test to hit a maximum limit; this area
> starts getting a little shady for me.

Your analysis is correct. Can you reproduce the problem without
using 'mpatrol' at all? When I saw this, I reflexively thought:

  http://extremeprogramming.org/rules/bugs.html
| When a bug is found tests are created to guard against it
| coming back.

so I went to add a test--and found one already in place. It
didn't reproduce these problems, though. Testing code is likely
to have more defects than tested code.
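
A guard along those lines might look roughly like this; hypothetical,
and not the test that's already in place:

  #include "timer.hpp"

  #include <cassert>
  #include <ctime>
  #include <string>

  // Slower than the limit passed below, so the zero-iteration path
  // gets exercised.
  void glacial_operation()
  {
      std::clock_t const start = std::clock();
      while(static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC < 0.1)
          {}
  }

  int main()
  {
      std::string const report = TimeAnAliquot(glacial_operation, 0.001);
      // Whatever else the report says, it must not leak an infinity
      // into its formatted output.
      assert(std::string::npos == report.find("#IO"));
      assert(std::string::npos == report.find("inf"));
      return 0;
  }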

>> 3. How could those defects have been prevented?
> 
> If unit tests existed that covered these particular problems,
> then regression testing the unit tests

Because we endeavor to follow
  http://extremeprogramming.org/rules/unittests.html
unit tests are a subset of regression tests, BTW.

(We also have system tests that are regression tests.)

> and analyzing the results
> before committing the changes should help prevention. Today, I'm
> not certain (because I couldn't test it) if 'cvs_ready' would've
> helped, because I set aside my attempt to fix that problem in order
> to get through this one.
> 
> The rounding discrepancies I saw in the timer output, though,
> may have been catchable if the unit tests were regression tested.

Actually, I don't think so. We aren't using any "real-time" OS,
so the behavior of any timer is only approximate: run this test
repeatedly, and results will vary. We can't really insist that
wait_half_a_second() delays for 500000 microseconds exactly,
or even approximately (a greedy background process could be
grabbing all available machine cycles).
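
So the most a test can responsibly assert is a one-sided, or very
loosely two-sided, bound. A sketch (wait_half_a_second() here is
just a stand-in busy-wait, not the routine in the existing test):

  #include <cassert>
  #include <ctime>

  // Stand-in busy-wait, not the routine in the existing test.
  void wait_half_a_second()
  {
      std::clock_t const start = std::clock();
      while(static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC < 0.5)
          {}
  }

  int main()
  {
      std::clock_t const start = std::clock();
      wait_half_a_second();
      double const elapsed =
          static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
      // At least what was requested; the upper bound has to be generous,
      // and even this can fail on a heavily loaded machine.
      assert(0.5 <= elapsed);
      assert(elapsed < 5.0);
      return 0;
  }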

That particular problem could have been prevented by eschewing
one particular bad practice (I won't rob you of the joy of
discovering it). It could have been caught in code review. It
could have been observed earlier, and it could have been made
more easily observable with a better unit test. But it's not
practicable to write a unit test for every number we format.

> Running 'nychthemeral_test' more regularly, and analyzing its
> results within the duration it represents, could also have helped
> prevent this.
> 
>> 4. How should those defects be removed?
> 
> I'm not exactly sure that I truly identified the problems yet, so
> I haven't entered the solution domain. But more generally speaking,
> unit tests should be developed that cover the "problem" conditions
> and none should fail.

That's a good way to start; the best way I know, in fact.



