Invariants

2 comments
In my previous post, TC made a comment about humane interface. As I was writing my reply to him I realized that in order to express my opinion about design of interfaces I must first establish a couple of fundamental issues. Invariants is one of them.

Invariants, what are they?

Simply put, the invariants are the assumptions that you have about your variables. Here are a few simple examples:

  1. When I write a public static void main(String[] args) method I expect the args variables to contain the command line arguments passed to the program.
  2. when I write a recursive method that computes factorials:
    static int fact(int n) { return n == 0 ? 1 : n*fact(n-1); }
    I am assuming that n is >= 0.
  3. In the following method I am expecting the array argument, arr, to have at least one element. If this assumption does not hold I will get a division-by-zero error:
    static double average(double[] arr) { 
    double sum = 0.0;
    for(double d : arr)
    sum += d;
    return sum / arr.length;
    }

  4. In a naive iteration over a linked list (as realized below by method contains()) we usually assume that the list is cycle-free:
    class Node {
    private Node next;
    private int value;
    void setNext(Node n) { next = n; }
    void setValue(int num) { value = num; }

    boolean contains(int num) {
    for(Node curr = this; curr != null; curr = curr.next)
    if(curr.value == num)
    return true;
    return false;
    }
    }
As I said, these are just simple examples. Once you understand the notion of invariants you start seeing them everywhere.

So how do you call situation where the set of invariants assumed by the programmer does not match the the actual invariants delivered by the code? A Bug!

It seems that most bugs are caused by a program that fails to keep its invariants. It is therefore imperative for a programmer to know whether a certain invariant is actually delivered by the code.

Is there an invariants-aware style of programming?

In object-oriented languages there are a few mechanisms that can help a programmer in establishing and understanding invariants. I'll try to enumerate the four most valuable ones. I am deliberately ignoring the issue of types (which establishes invariants in statically typed languages) as it seems to be a well understood concept.

  1. Design by Contract (DBC) - Intuitively, DBC is similar to Java's assert keyword. In its essence DBC is a much broader concept, attributed to the Eiffel programming language. You can read about it here and here.

    Anyway, by using assertions a programmer can explicitly express his expectations from his variables in certain points in the program.
    Here is the average() method, rewritten with assertions:
    static double average(double[] arr) { 
    assert arr != null;
    assert arr.length >= 1;
    double sum = 0.0;
    for(double d : arr)
    sum += d;
    return sum / arr.length;
    }

    There are two main reasons why this version of average() is better than the previous one: First, you get a runtime error (which is better than a silent wrong result) if you try to break the invariant. Second, this version makes the invariant explicit: a quick look at the first two lines of average() will immediately tell you what are the legal values for the arr parameter.

    The DBC mechanism is highly flexible. It can capture every conceivable invariant. On the other hand, DBC is checked only at run-time. Therefore an erroneous call such as average(new double[0]) will still be legal from the compiler's point of view.

  2. Encapsulation - By restricting the visibility of a field one can make sure that the field is changed only if its new value complies with the invariants of the class.

    In the Node class (presented above) we made the next field private. This means that all changes must go through the setNext() method. Therefore, we can ensure the cycle-free invariant of the list by placing additional code in setNext(), as follows:
    class Node {
    ...
    void setNext(Node n) {
    Node old = next;
    next = n;
    Set<Node> set = new HashSet<Node>();
    for(Node curr = this; curr != null; curr = curr.next) {
    if(set.contains(curr)) {
    next = old; // Rollback the change
    throw new IllegalArgumentException();
    }
    set.add(curr);
    }
    }
    ...
    }

    Note that without encapsulation (or if next had public visibility) the invariant-checking code inside setNext() would be of no practical value: client code could work around the check by accessing the next field directly.

  3. Final fields - In Java, a field can be declared to be final which means that every constructor must initialize it (exactly once) and that methods cannot change its value. These requirements are checked by the compiler.

    As a direct result, a final field delivers the equality invariant: if f is a final field of an object x, and x.f == y at some point in the program, then we know for sue that the equality x.f == y will hold throughout the full life-time of x.

    This invariant may seem trivial but it is highly significant. Client code can sample the value of a final field once-and-for-all. There's no need to refresh the sampled value because it will never change. This dramatically simplifies the development of caches, statefull algorithms and multi-threaded code in general.

    Moreover, unlike the first two items on our list, final fields are checked at compile-time rather than run-time: It is impossible to compile a program where there is a final field that does not uphold the equality invariant. So, while final fields are not as versatile a mechanism as DBC, they deliver a much stronger promise.

  4. Sequencing - Sometimes our invariant requires that a certain operation will take place only if another operation had happened before. Can we statically enforce such a condition? Obviously, we can use DBC but this will be a dynamic (run-time) enforcement.

    Luckily, we can rely on constructors to statically enforce the happened before invariant.

    In Java, every object is initialized exactly once. It is impossible to invoke a method on an object unless it was successfully initialized by one of the constructors of the class. The idea underlying sequencing is therefore simple: By placing an operation at the constructor(s) we ensure that it will happen before the operations implemented by the methods.

    To make things more concrete, consider this (error prone) class:

    public class Resource {
    private final File file;
    private Reader reader;

    public Resource(File f) { file = f; }

    public void open() {
    if(reader != null)
    reader = new FileReader(file);
    }

    // Pitfall: next() can be called even if open() was not
    public int next() { return reader.read(); }

    // Same pitfall
    public void skip(long n) { reader.skip(n); }
    }
    What we have here is a sequencing problem. If client code calls next() or skip() without a prior call to open() we will get a null pointer error. Here is how we can use the constructor to eradicate this problem:

    public class SafeResource {
    private final File file;
    private Reader reader;

    public SafeResource(File f) { file = f; reader = new FileReader(file); }
    public int next() { return reader.read(); }
    public void skip(long n) { reader.skip(n); }
    }
    With SafeResource it is impossible to misuse the methods next() or skip(). This is a much safer design and it relies on Java's fundamental property that an object must be initialized exactly once.

    Another alternative for making class Resource safer is this:

    public class ReadableResource {
    private Resource res;

    public ReadableResource(Resource r) { res = r; r.open(); }
    public int next() { return res.read(); }
    public void skip(long n) { res.skip(n); }
    }
    It should be noted that the use of constructors to ensure proper ordering will not help you if your code is vulnerable to null pointer problems. For example, if you turn this code:
    Resource r = ...;
    r.next();
    into this:
    Resource r = ...;
    ReadableResource rr = new ReadableResource(r);
    rr.next();
    Then you successfully solved the sequencing bug, but if r can receive a null value then you'll run into a null pointer problem in either version.


So to sum up, we discussed the use of encapsulation and DBC to achieve run-time checked invariants. We then showed how constructors and final fields can be seen as devices that establish invariants that are checked at compile-time.

This should affect the way you read and write your code: A final field is much more than an assign-once filed. It gives you a valuable piece of information about the design of the class; A constructor is not just a block of code where you setup the new instance. It is the place where you should put all operations that must take place before a method can be invoked.