Testing UI

Whenever I hear someone saying that his team is practicing automatic testing (TDD or whatever other buzzword) I find my self asking: "But how do you test the UI?". I get all sort of answers which only prove the inherent difficulties.

So today, I tried UISpec4J. It is too early for me to comment on the overall quality of this library, but the first impression is quite good. However, their documentation fails to provide the reader with a quick, fully-working example (which is the single most important piece of information that the user of a library needs). The following fragment is my attempt to fill this gap: take it, compile it, run it.

The subject program:

import java.awt.event.*;
import javax.swing.*;

public class Win extends JFrame
private static final long
serialVersionUID = -9206602636016215477L;

public Win()
JPanel panel = new JPanel();
final JButton b = new JButton("A Button");

JButton t = new JButton("Toggle");
t.addActionListener(new ActionListener()
boolean show = false;
public void actionPerformed(ActionEvent e)
show = !show;



...and the test program:
import org.junit.Test;
import org.uispec4j.*;
import org.uispec4j.assertion.UISpecAssert;

public class Win_Test

public void test1() throws Exception
Panel panel = new Panel(new Win());

Button b = panel.getButton("A Button");
Button t = panel.getButton("Toggle");




  • The constructor of the tested widget (class Win) should not display the widget. This will make UISpec4J angry because it runs the tested window in some private, hidden, realm.
  • The test program will need both the JUnit and the UISpec4J jars to compile
  • The current version supports Java 5 but not Java 6.

You can't always get what you want

So you're using TDD and you are test-infected and you have extremely low defect rate, the world is smiling at you and everything is super. Can you reach a point where your test suite does not need any more test cases?

Sadly, the mathematical answer is No. There's no such thing as a "big-enough" test suite.

Here is the explanation: Suppose your test suite contains n test cases. We can say that this suite is large enough if it guarantees that the n+1 test case is already covered by the existing n cases.

I claim that if you give me a class that passes these n tests I can give you a class that still passes these n test but fails for the n+1 test. In other words: passing the first n tests guarantees nothing with respect to any subsequent tests.

If you're really interested in the details, then here's how I can make the class fail in this last test: I will obtain the stacktace - new Exception().getStackTrace() - and look for the test method of that n+1 test. Of course this is a highly superficial way for a program to fail a test but it is still a bug just like any other "natural" bug (an int overflow with large inputs, a collection stored in a static field that eventually explodes, ...).

A few corollaries:
  1. Every test suite always has one more thing that it can check for.
  2. A bug that was not caught is not an indication to the test suite's lack of strength.
As you can see my point here is that blindly growing your test suite is almost a futile effort. (mathematically speaking, no matter how many tests you can write the total amount will still be just a tiny fracture of infinity). Instead of going for the masses, you should strive to make your suite "smarter". Here are three suggestions:
  • Don't treat the tested class as a black box. If you know nothing about the implementation then you have to test each method separately. On the other hand, if you allow your self to know that method f() merely calls g() with some default parameters, then you can dramatically reduce the amount of testing for f(). In theory, the Black-box approach for testing sounds like an excellent idea (you test the interface not the implementation) . The sad reality is that this approach is impractical - you number of tests you have to write is astronomical.
  • Prefer thin interfaces. If the public interface of your class has just one method, it can be tested much more effectively than a class with (say) two public methods.
  • Invest in high-level test. Unit testing is great. But higher level tests (aka: functional-, acceptance- tests) examine larger parts of your code in real-world scenarios. Thus, if you feel your suite is not strong enough, I would recommend adding more high-level tests. The return on investment (that is: the increase in the strength of your suite) will be much bigger than with adding low-level test.

Principles of Good Design

It seems that we are on a never ending quest for seeking an ultimate answer to an immortal question: "What is a good program?". In a previous post I discussed the difficulties inherent in this question.

People are constantly trying to make progress with respect to this question. Here are several (very well put) suggestions for producing high-quality code.

Excerpt #1, “Continuous Design” / Jim Shore:

Continuous design makes change easy by focusing on these design goals:

  • DRY (Don't Repeat Yourself): There's little duplication.
  • Explicit: Code clearly states its purpose, usually without needing comments.
  • Simple: Specific approaches are preferred over generic ones. Design patterns support features, not extensibility.
  • Cohesive: Related code and concepts are grouped together.
  • Decoupled: Unrelated code and concepts can be changed independently.
  • Isolated: Third-party code is isolated behind a single interface.
  • Present-day: The design doesn't try to predict future features.
  • No hooks: Interfaces, factory methods, events, and other extensibility "hooks" are left out unless they meet a current need.

Excerpt #2, “Orthogonality and the DRY Principle” / Andy Hunt and Dave Thomas:

All programming is maintenance programming, because you are rarely writing original code. If you look at the actual time you spend programming, you write a bit here and then you go back and make a change. Or you go back and fix a bug. Or you rip it out altogether and replace it with something else. But you are very quickly maintaining code even if it's a brand new project with a fresh source file. You spend most of your time in maintenance mode.

So you may as well just bite the bullet and say, "I'm maintaining from day one." The disciplines that apply to maintenance should apply globally.


Most people take DRY (Don't Repeat Yourself) to mean you shouldn't duplicate code. That's not its intention. The idea behind DRY is far grander than that.

DRY says that every piece of system knowledge should have one authoritative, unambiguous representation. Every piece of knowledge in the development of something should have a single representation. A system's knowledge is far broader than just its code. It refers to database schemas, test plans, the build system, even documentation.

The 12-steps plan (for extending the Javac compiler)

The list may seem long but its the absolute minimum (I am an XP believer so I didn't put anything unless I was sure that you will need from day one). Also, the principles implied by these steps are relevant for extending other compilers, and even for other kinds of applications.

So here is the 12-steps plan:
  1. Get the Javac source, commit these sources to to your source repository and then tag the snapshot. Thus, revision 1.1 of a file will always be the original.

  2. Reformat all source files according to your preferred coding style.

  3. Commit to the source repository and tag this snapshot. Thus, revision 1.2 of a file will be the reformatted form of the original. If something stops working, you will be able to easily diff to see what is it that you broke. Note that diffing with revision 1.1 will not be useful - the output will be cluttered with many differences due to the difference in coding styles.

  4. Compiler entry point. Create a new main class for the compiler, e.g.: MyCompiler. Your new functionality will be activated by having MyCompiler create subclasses of the original compiler modules (see the "Subclassing a module" section in this post).

  5. Execution entry point. Create a class (e.g.: MyLauncher) that will run your compiled program (a replacement for the "java" program).

    Discussion: In its simplest form MyLauncher.main() accepts a command line with a user supplied class path, the name of a main class and a vector of command line arguments. It will first create a new class-loader configured with these two locations (order is important):
    • The user supplied class-path
    • The class-path with which MyLauncher.main() was invoked: System.getProperty("java.class.path").

    The parent of this new class-loader should be the system class loader (that is: ClassLoader.getSystemClassLoader()).
    MyLauncher.main() will load the user specified main class from the newly created loader and will start the program by invoking the main() method of that class.

  6. Add a test/src/ and test/bin/ directories to the your project.

  7. Write a small Java program under test/src. For example: test/src/HelloWorld.java.

  8. Sanity test. Write a JUnit test case that uses MyCompiler to compiler test/src/HelloWorld.java, and
    then uses MyLauncher to run test/bin/HelloWorld.class. The test case should capture the output of the program and compare it with the expected output.

  9. Repeat step 9 but with a large, heavily tested, open-sourced, program. After compiling the tests case will run the test suite of the compiled program.

  10. Add a bootstrap/ directory to your project root.

  11. Bootstrapping test. write a JUnit test that uses MyCompiler to compile MyCompiler itself and places the generated class files at the bootstrap/ directory. The test will then create a new class loader, which points at the bootstrap directory, and will load the MyLauncher class from this class loader. Finally, it will run the test cases from steps 9 & 10 via that recently loaded MyLauncher class.

  12. Deploy Javac's full test environment, as described here.

That's it. You're now ready to extend the compiler.


In my previous post, TC made a comment about humane interface. As I was writing my reply to him I realized that in order to express my opinion about design of interfaces I must first establish a couple of fundamental issues. Invariants is one of them.

Invariants, what are they?

Simply put, the invariants are the assumptions that you have about your variables. Here are a few simple examples:

  1. When I write a public static void main(String[] args) method I expect the args variables to contain the command line arguments passed to the program.
  2. when I write a recursive method that computes factorials:
    static int fact(int n) { return n == 0 ? 1 : n*fact(n-1); }
    I am assuming that n is >= 0.
  3. In the following method I am expecting the array argument, arr, to have at least one element. If this assumption does not hold I will get a division-by-zero error:
    static double average(double[] arr) { 
    double sum = 0.0;
    for(double d : arr)
    sum += d;
    return sum / arr.length;

  4. In a naive iteration over a linked list (as realized below by method contains()) we usually assume that the list is cycle-free:
    class Node {
    private Node next;
    private int value;
    void setNext(Node n) { next = n; }
    void setValue(int num) { value = num; }

    boolean contains(int num) {
    for(Node curr = this; curr != null; curr = curr.next)
    if(curr.value == num)
    return true;
    return false;
As I said, these are just simple examples. Once you understand the notion of invariants you start seeing them everywhere.

So how do you call situation where the set of invariants assumed by the programmer does not match the the actual invariants delivered by the code? A Bug!

It seems that most bugs are caused by a program that fails to keep its invariants. It is therefore imperative for a programmer to know whether a certain invariant is actually delivered by the code.

Is there an invariants-aware style of programming?

In object-oriented languages there are a few mechanisms that can help a programmer in establishing and understanding invariants. I'll try to enumerate the four most valuable ones. I am deliberately ignoring the issue of types (which establishes invariants in statically typed languages) as it seems to be a well understood concept.

  1. Design by Contract (DBC) - Intuitively, DBC is similar to Java's assert keyword. In its essence DBC is a much broader concept, attributed to the Eiffel programming language. You can read about it here and here.

    Anyway, by using assertions a programmer can explicitly express his expectations from his variables in certain points in the program.
    Here is the average() method, rewritten with assertions:
    static double average(double[] arr) { 
    assert arr != null;
    assert arr.length >= 1;
    double sum = 0.0;
    for(double d : arr)
    sum += d;
    return sum / arr.length;

    There are two main reasons why this version of average() is better than the previous one: First, you get a runtime error (which is better than a silent wrong result) if you try to break the invariant. Second, this version makes the invariant explicit: a quick look at the first two lines of average() will immediately tell you what are the legal values for the arr parameter.

    The DBC mechanism is highly flexible. It can capture every conceivable invariant. On the other hand, DBC is checked only at run-time. Therefore an erroneous call such as average(new double[0]) will still be legal from the compiler's point of view.

  2. Encapsulation - By restricting the visibility of a field one can make sure that the field is changed only if its new value complies with the invariants of the class.

    In the Node class (presented above) we made the next field private. This means that all changes must go through the setNext() method. Therefore, we can ensure the cycle-free invariant of the list by placing additional code in setNext(), as follows:
    class Node {
    void setNext(Node n) {
    Node old = next;
    next = n;
    Set<Node> set = new HashSet<Node>();
    for(Node curr = this; curr != null; curr = curr.next) {
    if(set.contains(curr)) {
    next = old; // Rollback the change
    throw new IllegalArgumentException();

    Note that without encapsulation (or if next had public visibility) the invariant-checking code inside setNext() would be of no practical value: client code could work around the check by accessing the next field directly.

  3. Final fields - In Java, a field can be declared to be final which means that every constructor must initialize it (exactly once) and that methods cannot change its value. These requirements are checked by the compiler.

    As a direct result, a final field delivers the equality invariant: if f is a final field of an object x, and x.f == y at some point in the program, then we know for sue that the equality x.f == y will hold throughout the full life-time of x.

    This invariant may seem trivial but it is highly significant. Client code can sample the value of a final field once-and-for-all. There's no need to refresh the sampled value because it will never change. This dramatically simplifies the development of caches, statefull algorithms and multi-threaded code in general.

    Moreover, unlike the first two items on our list, final fields are checked at compile-time rather than run-time: It is impossible to compile a program where there is a final field that does not uphold the equality invariant. So, while final fields are not as versatile a mechanism as DBC, they deliver a much stronger promise.

  4. Sequencing - Sometimes our invariant requires that a certain operation will take place only if another operation had happened before. Can we statically enforce such a condition? Obviously, we can use DBC but this will be a dynamic (run-time) enforcement.

    Luckily, we can rely on constructors to statically enforce the happened before invariant.

    In Java, every object is initialized exactly once. It is impossible to invoke a method on an object unless it was successfully initialized by one of the constructors of the class. The idea underlying sequencing is therefore simple: By placing an operation at the constructor(s) we ensure that it will happen before the operations implemented by the methods.

    To make things more concrete, consider this (error prone) class:

    public class Resource {
    private final File file;
    private Reader reader;

    public Resource(File f) { file = f; }

    public void open() {
    if(reader != null)
    reader = new FileReader(file);

    // Pitfall: next() can be called even if open() was not
    public int next() { return reader.read(); }

    // Same pitfall
    public void skip(long n) { reader.skip(n); }
    What we have here is a sequencing problem. If client code calls next() or skip() without a prior call to open() we will get a null pointer error. Here is how we can use the constructor to eradicate this problem:

    public class SafeResource {
    private final File file;
    private Reader reader;

    public SafeResource(File f) { file = f; reader = new FileReader(file); }
    public int next() { return reader.read(); }
    public void skip(long n) { reader.skip(n); }
    With SafeResource it is impossible to misuse the methods next() or skip(). This is a much safer design and it relies on Java's fundamental property that an object must be initialized exactly once.

    Another alternative for making class Resource safer is this:

    public class ReadableResource {
    private Resource res;

    public ReadableResource(Resource r) { res = r; r.open(); }
    public int next() { return res.read(); }
    public void skip(long n) { res.skip(n); }
    It should be noted that the use of constructors to ensure proper ordering will not help you if your code is vulnerable to null pointer problems. For example, if you turn this code:
    Resource r = ...;
    into this:
    Resource r = ...;
    ReadableResource rr = new ReadableResource(r);
    Then you successfully solved the sequencing bug, but if r can receive a null value then you'll run into a null pointer problem in either version.

So to sum up, we discussed the use of encapsulation and DBC to achieve run-time checked invariants. We then showed how constructors and final fields can be seen as devices that establish invariants that are checked at compile-time.

This should affect the way you read and write your code: A final field is much more than an assign-once filed. It gives you a valuable piece of information about the design of the class; A constructor is not just a block of code where you setup the new instance. It is the place where you should put all operations that must take place before a method can be invoked.

The Mathematics of Engineering

Suppose you need to write a new class. After thinking a bit you realize there are two ways to implement the class. The first one is slow but thorough (henceforth- ST): it will take you ten days to complete, but it can easily evolve to support a new set of features. Let's assume that a typical extension of this implementation will require five days of work.

The second way is quick and dirty (QD): you will need only one day to complete the task, but the resulting code will not be very scalable (that is, it will be difficult to extend it to support additional features).

So which approach should you choose? To answer this question we estimate the total development time needed for supporting the extended set of features.

  • The ST approach requires 15 days (10 for the initial version, 5 for the extension).
  • The QD approach requires at most 16 days (1 for the initial version, then throwing away the code and re-programming for 15 straight days)
At first sight we note that the penalty of choosing the QD approach was just one day of work (the amount of time required for the QD implementation)

But, this is just one side of the story. Here is why you should prefer the QD approach despite this penalty:
  1. It is possible that (eventually) you will not need the extended features. Thus, you will waste nine days of work for something that will never be used.
  2. It likely that you can reuse some of the QD code instead of throwing it away. For the very least you can reuse your test cases (remember: we assumed a similar interface). This can dramatically reduce the penalty.
  3. Getting something to a working state is very important. Other members of your team, who need this class, may have to wait until you finish writing it. Thus, by releasing your class sooner (i.e., choosing the QD approach), you increase the overall throughput of your team. This gain is well worth the QD penalty.
Of course, if you are about to take such a decision you will have to make these time estimates yourself based on the task at hand.

Whatever your final decision is, remember this simple moral: The penalty that the QD approach carries is much smaller than what it initially seems.

Hacking a compiler

Q: How can I discover where a certain compiler operation is implemented?
A: When you hack (in the positive sense of the word) a compiler you have one important factor working for you: Unlike most programs, a compiler has well defined semantics. This means that you can predict its behavior when you give it a certain input. In the domain of compilers an input is source code to compile.

So, the trick for placing a breakpoint, is to use the compiler's error messages. Suppose you want to find the location where the type compatibility checking takes place. You start by writing a program that creates a type error. Then compile the program and write down the error message that you get: "Incompatible types\n Found:....\nRequired:...\n". Then use your favorite grep-like utility to find the line where this string appears and that's basically it.

The reality is a bit more complicated than that. In Sun's Javac compiler the error messages are listed in a properties file: com.sun.tools.javac.resources.compiler.properties The messages are keyed by strings that look like this: "compiler.err.prob.found.req.1". So you need to find the key in this file, then search this key inside the compiler's code base. When you do that you need to remove the prefix "compiler.err" (it is automatically added by the error reporting module).

So to conclude our little example, you actually need to search for "prob.found.req". This string has four occurrences, all inside com.sun.tools.javac.comp.Check.java. You can now place a breakpoint in each of these places. One of them is what you need

Theory of Relativity

Is it possible to determine whether a program is a good or bad? Can we measure the quality of a program?

I am afraid that for the time being the answer to these questions is "No".
Sure, we have all sorts of practices (e.g.: test as you write), principles (avoid duplicated code), and suggestions (e.g.: try to make your classes loosely coupled), but these are merely operational directives which are believed to produce a "good" program. The nature of a "good" program is still largely undefined.

Actually what we (software engineers) look for is some set of ''target functions'' which can be quantitatively measure the quality of the program. With such functions the task of making a better program boils down to maximizing the values reported by these functions for that program.

In the world of algorithms we already have two dominant target functions: speed and memory consumption. These functions can be easily measured if we have a concrete implementation of the algorithm. Moreover, in many cases we can even express these functions as a simple mathematical expression, known as "time complexity" or "space complexity".

On the other hand, in the world of software engineering things are much more fluid. To see that let's try to think of such target functions that can describe a program's quality: development time till first release; development time till second release; number of bugs; time needed to introduce a new developer to the code; time for solving a bug; time for replacing a certain part with another; etc.

These functions cannot be easily measured. We don't have a device that can measure the number of bugs in the program, the same way a simple stop-watch can measure the execution time of an algorithm.

In addition there is also the problem of inverse correlation. For example, it is reasonable to expect that a design that minimzes the development time might lead to an increased number of bugs. This means that even if we had devices for measuring these functions, program's quality would still be a relative concept: A program that is "good" in one sense is likely to be "bad" in another.

My point is that there is no such thing as an absolutely good program. When we programmers argue about design we tend to use absolute terms, such as: "this design is better than the other one". We allow ourselves do that because we have an implicit assumption about the target function that is appropriate for the project we are working on.

Personally, I adopted the following practice, in order to expose this subliminal assumption- When I evaluate a design decision I try answer these two questions: (1) Which target functions am I improving? (2) Which target functions will get worse as a result of this decision?

These questions are not as easy as they look. Providing intelligent answers to these questions requires good design skills.

The not so important grammar

Having went back to the Dragon Book the last couple of days (the last time I had to so, Bill Clinton was campaigning for his first term), a few radical ideas came to mind:

Is it really necessary to separate the lexical scanner from the parser?

It seems that space and time considerations have led to an architecture where the scanner translates the whole input into a stream of a tokens, which is then fed into the parser's input port. It seems that in a modern system a recursive descent parser can easily do the regexp matching in a lazy manner (i.e.: when it tries to read the next token).

This will make it possible to :
  1. Easily define "local" keywords: if a certain token is a tagged as a keyword in some rule, in other rules it may be tagged as an identifier. For example, the "throws" keyword in Java is used in a very specific place, in Java's grammar. If we turn it into a local keyword we will be able to have a varialbe named "throws" with no conflicats arousing.
  2. The parser generator can check if two competing regexps are have a non-empty intersection and issue an error/warning. In current global lexers this is an inherent issue, e.g.: every Java keyword is also an identifier. The resulting ambiguities are resolved thru precedence rules (usually: the first definition wins). If we use "local" keywords, the set of competing regexps is much smaller => chances of conflicts are smaller => we can afford to have an error whenever such a conflict occurs.

Why is the grammar of a language so important ?

Again I think that we stick with old an practice which needs to be reconsidered (to make it clear: I am not saying that a language designer should not publish the grammar of his lanuage. My point is that it delivers very little information).

So again, in the old days, the languages were not as rich as today (in particular: a much more primitive type system) so there were very few errors that could be caught by the compiler's semantic analyzer. Most of the errors reported by the compiler were syntax/parsing errors.

On the other hand, in a modern statically typed language (object-oritented or functional) there are tons of errors that are due to the semantic analysis, such as: extending a final class, specifying a class in the implements clause, a non initialized final field, a non conforming throws clause, instantiating an abstract class, .....

As a result, most of the inoframtion we developers needs does not come from the grammar rules of the language. This information comes from the language specification document which focuses on semantics (and run time behavior). And, when we look for the correct syntax of a language construct we simply go to Google.