
Re: Testing approach (was Re: [Gnu-arch-users] more on the merge-fest)

From: Misha Dorman
Subject: Re: Testing approach (was Re: [Gnu-arch-users] more on the merge-fest)
Date: Fri, 28 Nov 2003 16:51:16 +0000

Samuel wrote:
> Consequently, if writing a program that
> evaluates [y=mx+b], there is NO reason to test it.  We know it
> already works.

Do you _really_ mean we don't need to test it at all?
Or do you mean that we verify by inspection instead of dynamic testing?
Or do you mean that we only need two test cases (since confirming two points on the line is enough to prove the whole line correct, if we know it really is a line)?
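To make the two-test-case reading concrete, here is a small sketch (my illustration, not Samuel's; the names are invented for the example). If the implementation really is linear, two sample points recover m and b uniquely, which then forces the value at every other point:

```python
def line_through(p1, p2):
    """Recover slope m and intercept b from two sample points (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)   # requires x1 != x2
    b = y1 - m * x1
    return m, b

def f(x):
    """Implementation under test: intended to be y = 3x + 2."""
    return 3 * x + 2

m, b = line_through((0, f(0)), (1, f(1)))
assert (m, b) == (3, 2)          # two points pin down the whole line
assert f(10) == m * 10 + b       # every other point is then forced
```

Of course this argument only holds under the assumption the next paragraph attacks: that the code is in fact linear.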

Of course there is always the possibility that it was implemented non-linearly (e.g. coded as y = m z + b, where z = x*x) or even non-continuously (e.g. via a table lookup, with the table being incorrectly initialised or the lookup code having a bug, or [more likely] as a result of range overflow in * or +).
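The range-overflow case can be illustrated with a short sketch (mine, not from the thread; Python integers don't overflow, so the 16-bit wraparound is simulated explicitly). The function is perfectly linear in the mathematical sense, yet fails once m*x leaves the representable range — exactly the kind of fault that a two-point "proof" never exercises:

```python
def y_16bit(x, m=300, b=5):
    """y = mx + b as a hypothetical 16-bit signed machine would compute it."""
    def wrap(v):
        # simulate 16-bit two's-complement truncation
        v &= 0xFFFF
        return v - 0x10000 if v >= 0x8000 else v
    return wrap(wrap(m * x) + b)

assert y_16bit(10) == 300 * 10 + 5     # in range: correct
assert y_16bit(200) != 300 * 200 + 5   # 60000 overflows 16-bit signed: wrong
```

A test suite built only from small "normal" inputs passes every case; it takes a border/outlier case near the range limit to expose the fault.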

Obviously we should spend our testing/verification budget as wisely as we can. What that means will depend on the size of the budget, which will depend on the quality requirements of the system. For some "government work" (e.g. defence, aerospace/ATC, railway signalling) the verification budget is huge and we will use both inspection and multiple test cases (border, outlier, normal, error) to ensure the software is as bug-free as possible. It doesn't always work, even so (see Ariane 5, for example).

Human inspection is notoriously poor at picking up flawed assumptions (we just *know* that y = mx + b is linear, so we don't consider the non-linear/non-continuous cases). In most cases this is a good tradeoff, but sometimes it isn't safe enough. That's one reason for using coverage analysis metrics: they help prompt the tester to cover cases which make a difference to the code, but were missed because the tester didn't realise they might matter[1].

Samuel wrote:
> When one writes unit tests, you must recognize that there are reasonable
> limitations to what you can do. ... you write tests only for what can break.

Yes -- or rather, we test what WE THINK can break. Ideally, you should write tests only for those areas where the cost of writing the test is less than the amortised cost of leaving the bug in. Obviously, we have to estimate/guess/use heuristics to determine the costs and thus decide which cases these are[2].

Samuel wrote:
> Given a full ADT, ... if it deletes some arbitrary middle element correctly,
> then we can *safely* assume that it works for *all* middle elements. If you
> can't, then you might want to consider re-implementing the code so that you can.

At a high level, that is true, but when you get to the detail it's more complex. Consider an ADT which has a requirement to be able to add any item in amortised O(logN) time. We can provide this (e.g. using a heap, a balanced binary tree, skip lists, or an auto-resizing hash table), but it will mean that just handling insertion/deletion of the head, the tail and a single arbitrary middle element is not sufficient to fully test the insert/delete methods. Moreover, even if the tester is able to spot the performance requirement and work out not only that a performance test is needed, but also that additional functional tests are needed, the details of WHICH additional tests are needed will depend on the implementation chosen, not the specification. Again, coverage metrics can help identify the need for additional test cases.
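A concrete sketch of the problem (my example, not from the thread): a plain unbalanced BST passes the obvious head/tail/middle functional tests, yet silently violates an amortised O(logN) requirement, because sorted input degrades it to a linked list. No spec-derived functional test would flag this; you need tests chosen with the implementation (rebalancing, or its absence) in mind:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Plain BST insert: note, no rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)

def depth(root):
    return 0 if root is None else 1 + max(depth(root.left), depth(root.right))

# "head, tail, arbitrary middle" style functional test: all passes
root = None
for k in [50, 10, 90, 40]:
    root = insert(root, k)
assert inorder(root) == [10, 40, 50, 90]

# sorted input: functionally still correct, but structurally degenerate
root = None
for k in range(100):
    root = insert(root, k)
assert depth(root) == 100   # a 100-deep chain, not O(log N) depth
```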

XP-style test-driven design doesn't explicitly handle this sort of case. We have probably already coded the simple (linked list, perhaps) implementation by the time we add the performance requirement. We add the "performance test" cases to the collection test, then code up the balanced-binary-tree algorithm, or whatever. But at that point we don't necessarily have sufficient test cases to cover the complexity of the insert routine -- we need to augment TDD with some post-coding tests.[3]
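One possible shape for such post-coding tests (my sketch; the SortedBag class is a stand-in name for whatever balanced-tree implementation was just written): a seeded model-based test that runs random insert/delete sequences against a trivially correct reference model, exercising far more "arbitrary middle element" cases than any hand-written list:

```python
import random

class SortedBag:
    """Stand-in for the implementation under test; substitute the real tree."""
    def __init__(self):
        self._items = []
    def insert(self, x):
        self._items.append(x)
        self._items.sort()
    def delete(self, x):
        self._items.remove(x)
    def as_list(self):
        return list(self._items)

rng = random.Random(7)      # fixed seed, so any failure is reproducible
bag, model = SortedBag(), []
for _ in range(500):
    if model and rng.random() < 0.4:
        x = rng.choice(model)          # delete a random (often middle) element
        bag.delete(x)
        model.remove(x)
    else:
        x = rng.randint(0, 99)
        bag.insert(x)
        model.append(x)
    # the implementation must agree with the model after EVERY operation
    assert bag.as_list() == sorted(model)
```

This doesn't replace coverage-guided test selection, but it cheaply hits many of the rebalance paths that the original TDD cases missed.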

Testing well is a hard job, and getting the cost-benefit balance right is even harder. TDD is not a panacea. Nor are coverage metrics. Nor is unit testing in any other form.

Samuel wrote:
> On Wednesday 26 November 2003 03:21 am, Andrew Suffield wrote:
> > So, now you know that somewhere in that 500 line chunk lies a bug. Are
> > you seriously suggesting that you can often spot it by inspection?
> Not only am I suggesting it, I've done it ... hundreds of times.

With only enough info to narrow it down to 500 lines? I'm surprised.

Not because it isn't possible -- I've done it myself (probably tens or hundreds of times, though that is over the last ~15 years) and usually on code written by other people -- but rather because I got the impression that your tests would be granular enough to narrow it down more than that. Typically, I would expect testing/DbC granularity sufficient to narrow bugs down to 20-50 lines or so at most (approximately, a method).
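By way of illustration of that DbC granularity (my sketch; the function and its contracts are invented for the example): with pre/postcondition assertions on each method, a contract violation names the specific ~10-line method at fault, rather than leaving a 500-line region to inspect.

```python
def remove_middle(items, index):
    """Remove items[index] from a sorted list, preserving sortedness."""
    # preconditions: a violation here blames the CALLER of this method
    assert 0 < index < len(items) - 1, "precondition: strictly middle index"
    assert items == sorted(items), "precondition: input must be sorted"

    result = items[:index] + items[index + 1:]

    # postconditions: a violation here blames THIS method's body
    assert len(result) == len(items) - 1, "postcondition: one element removed"
    assert result == sorted(result), "postcondition: output still sorted"
    return result

assert remove_middle([1, 2, 3, 4], 2) == [1, 2, 4]
```

The pre/post split is the useful part: it tells you which side of the method boundary the bug is on, which is exactly the narrowing-down that pure black-box testing of a 500-line chunk can't give you.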

Just because you _can_ find bugs by inspection in 500-line stretches of code -- and perhaps even enjoy the challenge -- doesn't mean it is a good idea :-)

Misha Dorman

[1] The identification, definition, selection and automation of sufficient coverage metrics to spot the faults alluded to in my example is left as an exercise for the reader. In the absence of such metrics, "lint" and similar tools can also help find such assumptions, as can the use of DbC-style assertions (which at least mean that if/when they are shown to be erroneous, it becomes obvious quickly).

[2] "Cost" of course, is a complex function, both when applied to testing and when applied to residual bugs. In OSS, in particular, it might be defined to include the "cost to self-image and personal reputation" of a bug.

[3] "Pure" TDD would probably say that we should gradually build up to the balance-binary-tree algorithm (first make it a tree, then add the "add monotonically increasing items" case which only requires "add on right of rightmost" support, then add other cases, including the re-balance cases). But I'm not sure that's a particularly likely scenario -- most programmers will get balance-binary-tree from an algorithms book, rather than deriving it from first principles. Can any TDD experts comment?

