Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Sunday, July 05, 2009

Test passed/failed

More wrt test monitoring.

I concluded the last post with (slightly extended):
TEST RUN: test1
TEST START: test2
TEST CHECK OKAY: test2 check1
TEST STARTED: test2.1
TEST PASSED: test2.1
TEST FINISHED: test2.1
TEST END: test2
This implies two top-level tests, test1 and test2. Test1 is a "monad", reported by a TEST RUN message outside of any START/END bracket. Test2 is bracketed by START/END, and contains subtest 2.1.

When I started testing seriously, I thought that all tests could be classified as passed or failed. That is always a worthwhile goal. If it could be accomplished automatically, the run above might be expressed in pseudo-XML as:

<test name="test1" result="passed"/>
<test name="test2">
<test-check result="ok" test_name="test2" check_name="check1"/>
...
</test name="test2" result="passed">

My pseudo-XML allows attributes on the close. Without this, one might just expect a TEST PASSED message immediately before the close.
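
To make the intended translation concrete, here is a rough Python sketch - purely illustrative, not a tool I actually use - that turns the line-oriented log above into that nested pseudo-XML. It assumes START/STARTED and END/ENDED/FINISHED are interchangeable synonyms, and it reports a monad TEST RUN as result="ran" rather than "passed", since a bare RUN by itself gives no pass/fail indication.

import re

def log_to_pseudo_xml(lines):
    # One pass over the log; nesting depth tracks START/END bracketing.
    indent = 0
    def emit(s):
        print("  " * indent + s)
    for line in lines:
        m = re.match(r"TEST ([A-Z]+(?: [A-Z]+)?)(?::\s*(.*))?$", line.strip())
        if not m:
            continue
        verb, rest = m.group(1), m.group(2) or ""
        if verb == "RUN":                            # monad: whole test in one line
            emit(f'<test name="{rest}" result="ran"/>')
        elif verb in ("START", "STARTED"):           # open a bracketed test
            emit(f'<test name="{rest}">')
            indent += 1
        elif verb in ("END", "ENDED", "FINISHED"):   # close; attributes allowed on the close
            indent = max(indent - 1, 0)
            emit(f'</test name="{rest}">')
        elif verb.startswith("CHECK"):               # e.g. TEST CHECK OKAY: test2 check1
            emit(f'<test-check result="{verb.split()[1].lower()}" label="{rest}"/>')
        else:                                        # PASSED / FAILED / TBD etc.
            emit(f'<test-note kind="{verb.lower()}" label="{rest}"/>')

log_to_pseudo_xml("""\
TEST RUN: test1
TEST START: test2
TEST CHECK OKAY: test2 check1
TEST STARTED: test2.1
TEST PASSED: test2.1
TEST FINISHED: test2.1
TEST END: test2
""".splitlines())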

However, over the years I have learned that things are not always so clear cut. While it is always best to write completely automated tests that clearly pass or fail ...

Sometimes you write tests and just run them, but do not automatically determine pass or fail.

Sometimes manual inspection of the output is required.

Sometimes you just want to say that you have run the test, but you have not yet automated the checking of the results... and sometimes, given real-world schedule pressure, you never get around to automating the checking. In such cases, IMHO it is better to say


TEST STARTED: foo
TEST TBD: have not yet automated results checking
TEST ENDED: foo


than it would be to just omit the test.
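
The pattern is easy to wrap in a tiny harness. The Python below is only a hypothetical sketch - the helper name run_unchecked_test and the exact message wording are my own invention - but it shows the idea: record that the test ran, emit an explicit TBD instead of staying silent about the unchecked results, and still report a definite failure if the test crashes.

def run_unchecked_test(name, body):
    # Runs a test whose results are not (yet) checked automatically.
    print(f"TEST STARTED: {name}")
    try:
        body()  # run the scenario; its output goes to the log for later inspection
        print(f"TEST TBD: {name}: results checking not yet automated")
    except Exception as e:
        # Failing to run at all is still a definite, reportable failure.
        print(f"TEST FAILED: {name}: {e}")
    finally:
        print(f"TEST ENDED: {name}")

# e.g.:
run_unchecked_test("foo", lambda: print("... output to be inspected later ..."))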

Oftentimes, the fact that a test has compiled and run tells you something. Or, rather: if the test fails to compile or run, it tells you that you definitely have a problem.

Sometimes you can automate part of a test, but need manual inspection for other parts. In this case, I think reporting a bare "TEST PASSED" is dangerously misleading:


TEST STARTED: foo
TEST PASSED
TEST ENDED: foo


or, better


TEST STARTED: foo
TEST PASSED
TEST TBD: foo: need manual inspection of rest of test output
TEST ENDED: foo


I think that "TEST PASSED" tends to imply that the entire test has passed. If you say "TEST PASSED" without a test name label, it tends to imply that the enclosing test has passed.

Better to say


TEST STARTED: foo
TEST PASSED: sub-test bar of test foo
TEST TBD: foo: need manual inspection of rest of test output
TEST ENDED: foo


I have recently started using other phrases, such as "TEST CHECK"


TEST STARTED: foo
TEST CHECK OKAY: foo check1
TEST PASSED: sub-test bar of test foo
TEST TBD: foo: need manual inspection of rest of test output
TEST ENDED: foo
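
Pulling those conventions together, here is an illustrative Python sketch - the TestReporter class and its method names are my own invention, not any real framework - of a reporter that always stamps the enclosing test's name on its messages, so that a TEST PASSED for a sub-test or a TEST CHECK OKAY cannot be mistaken for the whole test passing:

class TestReporter:
    """Emits labeled TEST messages for one enclosing test."""
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print(f"TEST STARTED: {self.name}")
        return self

    def check(self, check_name, ok=True):
        print(f"TEST CHECK {'OKAY' if ok else 'FAILED'}: {self.name} {check_name}")

    def subtest_passed(self, subtest):
        print(f"TEST PASSED: sub-test {subtest} of test {self.name}")

    def tbd(self, why):
        print(f"TEST TBD: {self.name}: {why}")

    def __exit__(self, *exc):
        # Do not swallow exceptions: a crash should remain visible as a failure.
        print(f"TEST ENDED: {self.name}")

with TestReporter("foo") as t:    # reproduces the example above
    t.check("check1")
    t.subtest_passed("bar")
    t.tbd("need manual inspection of rest of test output")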


Q: What is the difference between a "TEST PASSED: subtest" and a TEST CHECK OKAY (or TEST CHECK PASSED)? Not much: mainly, the name tends to imply something about importance. Saying that a test or subtest passed seems to imply that something freestanding has passed. A check within a test seems naturally less important.

This is along the lines of assertions. Some xUnit test frameworks count all assertions that pass. While this can be useful - particularly if some edit accidentally removes thousands of assertions - I myself have found that the number of assertions gives a false measure of test effort.

It may be that I am conflating "test" with "test scenario". A "test scenario" or "test case" may be subject to thousands of assertions, particularly if the asserts are in the infrastructure. But I really want to count test cases and scenarios.

Here's one reason why I try to distinguish #tests passed from #checks performed:
  • my test monitor performs consistency checks such as tests_passed = test_cases, tests_started = tests_ended, etc.
What I really want is things like the following (see the sketch after this list):
  • Number of tests that had positive indication of complete success - tests passed. (Or, at least, success as complete as any test can indicate.)
  • Number of tests that had positive indication of a failure or error.
  • Similarly, warnings.
  • Number of tests that had no positive indication - a monad "TEST RUN" message was seen, or perhaps a TEST START/END pair, but no positive indication either way.
  • Number of tests where failure can be inferred - e.g. a TEST START without a corresponding TEST END.
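
Below is a rough Python sketch of that kind of summary - written for this post, not taken from my actual monitor. It keys each message on the first token after the colon and treats that token as the test name, which is only an approximation of the labeling conventions above, and the message vocabulary (FAILED, ERROR, WARNING) is assumed rather than standardized.

import re
from collections import defaultdict

def summarize(lines):
    # Collect, per test name, the set of indications seen for it.
    seen = defaultdict(set)
    for line in lines:
        m = re.match(r"TEST ([A-Z]+(?: [A-Z]+)?)(?::\s*(\S+))?", line.strip())
        if not m:
            continue
        verb, name = m.group(1), m.group(2) or "<unnamed>"
        seen[name].add(verb)

    counts = defaultdict(int)
    for name, verbs in seen.items():
        started = bool(verbs & {"START", "STARTED"})
        ended = bool(verbs & {"END", "ENDED", "FINISHED"})
        if verbs & {"FAILED", "ERROR", "CHECK FAILED"}:
            counts["positive indication of failure or error"] += 1
        elif started and not ended:
            counts["failure inferred (START without END)"] += 1
        elif "PASSED" in verbs:
            counts["positive indication of success"] += 1
        else:
            # A monad RUN, or a START/END pair, with no positive indication either way.
            counts["ran, no positive indication"] += 1
        if verbs & {"WARNING", "CHECK WARNING"}:
            counts["with warnings"] += 1
    return dict(counts)

# e.g. print(summarize(open("test.log"))) - the file name is just a placeholder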
