The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Sunday, July 05, 2009

Gilding the Lily

By the way, I recognize that in the preceding posts I risk the sort of gilding of the lily that so many software teams do when they define many, many different comment keywords.

    E.g. http://www.cs.clemson.edu/~mark/330/colwell/p6-coding-standards.html.
    This one is relatively small - only 4 primary keywords, with 3 secondary.
    Other example standards are much larger.

    - TBD - This "flag" is used to indicate items that require later definition. It stands for To Be Defined (or Determined). The ensuing comment should provide more particulars.
    - NYI - This "flag" is used to indicate items that have been defined and are now awaiting implementation. It stands for Not Yet Implemented. The ensuing comment should provide more particulars.
    - MACHDEP - This "flag" is used to indicate the existence of an explicit machine dependency in the code. Again, the ensuing comment should provide more particulars.
    - BUG - This "flag" is used to indicate the existence of a bug of some form. It should be followed immediately in the comment (on the same line) with one of the keywords incomplete, untested, or wrong, to indicate its type along with a more descriptive comment if appropriate. If none of the keywords is applicable, some other descriptive word should be used along with a more extensive comment.

The most commonly used such keyword is probably
  • TBD or TODO
and other common keywords include
  • NYI: not yet implemented
  • BUG:
  • NOTE:
But frequency of use rapidly goes down. E.g., although I approve of the concept of MACHDEP, I hardly ever used it during the P6 project, and I had totally forgotten about it.

Comments such as these work best when automated.

E.g., although I hate doxygen, doxygen comments get automatically extracted, and are hence more likely to be seen and corrected.

E.g. ditto Perldoc. Again, I find perldoc makes code harder to read.
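Rolling your own extraction is not hard, either. A minimal sketch in Python, using the keyword list from above (the sample source lines are invented, and a real scanner would restrict itself to comments rather than matching anywhere in a line):

```python
import re

# Keyword list from the discussion above.
KEYWORDS = ("TBD", "TODO", "NYI", "BUG", "NOTE", "MACHDEP")
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b:?\s*(.*)")

def scan(lines, filename="<stdin>"):
    """Yield (filename, line number, keyword, trailing comment) tuples."""
    for num, line in enumerate(lines, 1):
        m = PATTERN.search(line)
        if m:
            yield (filename, num, m.group(1), m.group(2).strip())

# Invented example input.
source = [
    "x = 0  # TBD: pick a sensible default",
    "y = x + 1",
    "# NYI: machine-dependent fast path (MACHDEP)",
]
for hit in scan(source, "example.py"):
    print("%s:%d: %s %s" % hit)
```

Because the output is produced automatically, it can be reviewed (and nagged about) automatically - which is the point: flags that get extracted get seen.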

Preparing this blog, I found
"The Fine Art of Commenting",
Bernhard Spuida, 2002.

Spuida mentions several such systems, including Microsoft's C# .NET system,
where XML tags are embedded in comments.
As indicated by Spuida's title, "Senior Word Wrangler",
the motivation seems to be mainly about preparing documentation.

It was fun to re-find the Indian Hill Coding Standards.

Every few years I seem to go through a spate of reading coding standards.
I think every programmer should be familiar with several different coding standards,
so that he or she can pick the parts that work well
for projects that need those parts.

Final words:
  • Brian Marick's lynx: hypertext embedded within the code itself.
  • My own hope of wiki files in the source trees, and/or linked to from the code itself.
  • Programming languages using extensible notations, such as XML, allowing arbitrary structured annotations.

Test passed/failed

More wrt test monitoring.

I concluded the last post with (slightly extended):
TEST RUN: test1
TEST START: test2
TEST CHECK OKAY: test2 check1
TEST PASSED: test2.1
TEST END: test2
Implying 2 top level tests, test1 and test2. Test1 is a "monad", reported by TEST RUN outside of START/END. Test2 is bracketed by START/END, and contains subtest 2.1.

When I started testing seriously, I thought that all tests could be classified passed/failed. That is always a worthwhile goal. If it could be accomplished automatically, it might suggest what we might express in pseudo-XML as:

<test name="test1" result="passed"/>
<test name="test2">
<test-check result="ok" test_name="test2" check_name="check1"/>
</test name="test2" result="passed">

My pseudo-XML allows attributes on the close. Without this, one might just expect a TEST PASSED message immediately before the close.
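A standard XML parser will reject attributes on a closing tag, so pseudo-XML needs a hand-rolled reader. A minimal regex sketch - my simplification, not a full parser; it handles only one tag per line and double-quoted attributes:

```python
import re

# Pseudo-XML: attributes are tolerated on closing tags as well as opening ones.
TAG = re.compile(r"<(/?)(\w[\w-]*)((?:\s+\w+=\"[^\"]*\")*)\s*(/?)>")
ATTR = re.compile(r"(\w+)=\"([^\"]*)\"")

def parse_line(line):
    """Return (kind, tag, attrs), where kind is 'open', 'close', or 'empty'."""
    m = TAG.match(line.strip())
    if not m:
        return None
    closing, name, attr_text, self_closing = m.groups()
    attrs = dict(ATTR.findall(attr_text))
    kind = "close" if closing else ("empty" if self_closing else "open")
    return (kind, name, attrs)

print(parse_line('<test name="test1" result="passed"/>'))
print(parse_line('</test name="test2" result="passed">'))
```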

However, over the years I have learned that things are not always so clear cut. While it is always best to write completely automated tests that clearly pass or fail ...

Sometimes you write tests and just run them, but do not automatically determine pass or fail.

Sometimes manual inspection of the output is required.

Sometimes you just want to say that you have run the test, but you have not yet automated the checking of the results... and sometimes, given real-world schedule pressure, you never get around to automating the checking. In such cases, IMHO it is better to say

TEST TBD: have not yet automated results checking

than it would be to just omit the test.

Oftentimes, the fact that a test has compiled and run tells you something. Or, rather: if the test fails to compile or run it tells you that you definitely have a problem.

Sometimes you can automate part of a test, but need manual inspection for other parts. In this case, I think reporting "TEST PASSED" is dangerously misleading; at a minimum, say

TEST TBD: foo: need manual inspection of rest of test output

I think that "TEST PASSED" tends to imply that the entire test has passed. If you say "TEST PASSED" without a test name label, it tends to imply that the enclosing test has passed.

Better to say

TEST PASSED: sub-test bar of test foo
TEST TBD: foo: need manual inspection of rest of test output

I have recently started using other phrases, such as "TEST CHECK":

TEST CHECK OKAY: foo check1
TEST PASSED: sub-test bar of test foo
TEST TBD: foo: need manual inspection of rest of test output

Q: what is the difference between a TEST PASSED: subtest and a TEST CHECK OKAY (or TEST CHECK PASSED)? Not much: mainly, the name tends to imply something about importance. Saying that a test or subtest passed seems to imply that something freestanding has passed. A check within a test seems naturally less important.

This is along the lines of assertions. Some XUnit frameworks count all assertions passed. While this can be useful - particularly if some edit accidentally removes thousands of assertions - I myself have found that the number of assertions gives a false measure of test effort.

It may be that I am conflating "test" with "test scenario". A "test scenario" or "test case" may be subject to thousands of assertions. Particularly if the asserts are in the infrastructure. But I really want to count test cases and scenarios.

Here's one reason why I try to distinguish #tests passed from #checks performed:
  • my test monitor performs consistency checks such as tests_passed = test_cases, tests_started = tests_ended, etc.
What I really want is things like
  • Number of tests that had positive indication of complete success - tests passed. (Or, at least, success as complete as any test can indicate.)
  • Number of tests that had positive indication of a failure or error.
  • Similarly, warnings.
  • Number of tests that had no positive indication - a monad "TEST RUN" message was seen, or perhaps a TEST START/END pair, but no positive indication.
  • Number of tests where failure can be inferred - e.g. TEST START without a corresponding test end.
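A tallying pass along those lines can be sketched as follows. The message spellings are the ones used in this post, except TEST WARNING, which is my invented spelling (the post just says "warnings"); treating a bracketed test with no PASSED/FAILED as "no positive indication" is my reading of the list above:

```python
def summarize(lines):
    """Tally test outcomes from TEST ... messages, per the list above."""
    counts = dict(passed=0, failed=0, warnings=0, checks=0,
                  no_indication=0, inferred_failure=0)
    started, ended, positive = [], set(), set()
    for line in lines:
        tag, _, name = line.partition(":")
        tag, name = tag.strip(), name.strip()
        if tag in ("TEST START", "TEST STARTED"):
            started.append(name)
        elif tag in ("TEST END", "TEST ENDED"):
            ended.add(name)
        elif tag == "TEST PASSED":
            counts["passed"] += 1
            positive.add(name)
        elif tag == "TEST FAILED":
            counts["failed"] += 1
            positive.add(name)
        elif tag == "TEST WARNING":
            counts["warnings"] += 1
        elif tag.startswith("TEST CHECK"):
            counts["checks"] += 1          # checks tallied apart from tests
        elif tag == "TEST RUN":
            counts["no_indication"] += 1   # monad: ran, but no pass/fail
    for name in started:
        if name not in ended:
            counts["inferred_failure"] += 1  # START without a matching END
        elif name not in positive:
            counts["no_indication"] += 1     # bracketed, but no pass/fail
    return counts

log = ["TEST RUN: test1",
       "TEST START: test2",
       "TEST CHECK OKAY: test2 check1",
       "TEST PASSED: test2",
       "TEST END: test2",
       "TEST START: test3"]
print(summarize(log))
```

Note that this sketch keys tests by name rather than by nesting; it would miscount a suite that reuses test names.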

Test monitoring keywords

Playing with my text-test-summarizer-monitor, a little Perl/Tcl script that looks at the output of tests, filtering things like tests started/ended, passed/failed, looking for various common indications of problems. Throws up a Tcl widget. The closest thing I have to a green bar.

Here's an annoyance: my test may look like

TEST END: test1
TEST CHECK OKAY: test2 check1
TEST END: test2

Oftentimes I like to announce "TEST STARTED" and "TEST ENDED". (I just had to extend my scripts to handle both START and STARTED.) This is useful in case the test crashes in the middle, and you never get to the test end.

However, occasionally the test infrastructure does not allow this. Occasionally I just say, once, at the end "I ran this test, and it passed". That's what I mean above by the "TEST END: test1" without the corresponding TEST START.

In XML, this would be simple:

<test name="test1"/>
<test name="test2">
<test-check result="ok" test_name="test2" check_name="check1"/>
</test name="test2">

  1. I added an attribute to the closing tag, </test name="test2">. Although not part of standard XML, occasionally this has helped me. I call this, therefore, pseudo-XML.
  2. Note that test is available in both <test ... /> and <test> ... </test> form.
Note that this is pretty verbose. Although human readable, I would not call it human friendly.
Because the pseudo-XML is not so human friendly, I often prefer to print messages such as

TEST END: test1
TEST CHECK OKAY: test2 check1
TEST END: test2

But here I run into terminology: I don't have a natural way of having a message that is not confused with START/END.

First: I am reasonably happy with brackets {BEGIN,START}/{END,FINISH},
and {STARTED}/{ENDED,FINISHED}. English grammar, how inconsistent.
I want to tolerate all of these, since I have used all from time to time,
and regularly find a test suite falsely reported as broken if I assume that I have consistently used START and not STARTED.
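Tolerating all of those spellings costs one alternation per bracket in a regex. A sketch:

```python
import re

# Accept BEGIN/START/STARTED as openers and END/ENDED/FINISH/FINISHED as closers.
START_RE = re.compile(r"^TEST\s+(?:BEGIN|START(?:ED)?)\s*:\s*(\S+)")
END_RE = re.compile(r"^TEST\s+(?:END(?:ED)?|FINISH(?:ED)?)\s*:\s*(\S+)")

for line in ("TEST STARTED: test1", "TEST BEGIN: test2",
             "TEST FINISHED: test2", "TEST END: test1"):
    m = START_RE.match(line)
    if m:
        print("start", m.group(1))
    else:
        m = END_RE.match(line)
        if m:
            print("end", m.group(1))
```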

(I'm going to arbitrarily reject TEST COMPLETED. Too verbose. Especially the dual TEST INITIATED. At least, I'll reject it until I encounter it a lot.)

But a TEST END without a TEST START is too confusing. I need an English phrase that doesn't need a corresponding start. Let's try a few:

  • TEST PASSED: test name.
      with the corresponding
    • TEST FAILED: test name

    • However, there might be some confusion because I definitely want to use TEST PASSED/FAILED within TEST START/END. See below.

  • TEST RESULT: test name
      Similarly, there might be some confusion because I might want to use TEST RESULT within TEST START/END.

  • TEST RUN: test name
      Nice because of the possible dual TEST NOT RUN

I think that I am arriving at the conclusion that any of the above, outside a TEST START/END, makes sense, and should be considered equivalent to <test .../>

I am not currently checking for the proper nesting of tests, but I could be.
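Nesting could be checked with a stack. A minimal sketch, using the message spellings from this post:

```python
def check_nesting(lines):
    """Verify that TEST START/END pairs nest properly; return problems found."""
    stack, problems = [], []
    for line in lines:
        tag, _, name = line.partition(":")
        tag, name = tag.strip(), name.strip()
        if tag in ("TEST START", "TEST STARTED"):
            stack.append(name)
        elif tag in ("TEST END", "TEST ENDED"):
            if not stack:
                problems.append("END without START: " + name)
            elif stack[-1] != name:
                problems.append("mismatched END: expected %s, got %s"
                                % (stack[-1], name))
            else:
                stack.pop()
    # Anything still open at end of output never ended.
    problems.extend("START without END: " + name for name in stack)
    return problems

print(check_nesting(["TEST START: a", "TEST START: a.1",
                     "TEST END: a.1", "TEST END: a"]))   # → []
```

Note that as written this flags the bare "TEST END: test1" idiom as "END without START"; a real monitor would have to decide to treat a top-level END with no opener as a monad instead.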

I think that it would be good to have a count of top level tests, either TEST START/END brackets or TEST outside such brackets, but ignoring stuff within the brackets.

Giving me

TEST RUN: test1
TEST CHECK OKAY: test2 check1
TEST END: test2
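A top-level count over exactly that output can be sketched as follows (a bare TEST END outside any bracket counts as a test, per the discussion above, as does any of the monad forms outside a bracket):

```python
def count_top_level(lines):
    """Count top-level tests: START/END brackets at depth 0, plus monad
    messages (TEST RUN etc.) and bare TEST ENDs outside any bracket."""
    depth = top_level = 0
    for line in lines:
        tag = line.partition(":")[0].strip()
        if tag in ("TEST START", "TEST STARTED"):
            if depth == 0:
                top_level += 1
            depth += 1
        elif tag in ("TEST END", "TEST ENDED"):
            if depth > 0:
                depth -= 1
            else:
                top_level += 1   # bare END: "I ran this test" (see above)
        elif tag in ("TEST RUN", "TEST PASSED", "TEST FAILED", "TEST RESULT"):
            if depth == 0:
                top_level += 1   # monad outside any bracket
    return top_level

log = ["TEST RUN: test1",
       "TEST CHECK OKAY: test2 check1",
       "TEST END: test2"]
print(count_top_level(log))   # → 2
```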