I thought that a small project does not need testing...

(...Or: You can't over-estimate the importance of a testing infrastructure)

Over the last few weeks I had to write three small (1-2 days of work) applications: a command line utility, a Swing application that analyzes the complexity of Java methods, and a web page (a single .html file) that lets its user extract some information from a REST server (JavaScript, jQuery, Ajax).

I don't think that LOC is a good code metric, but just to give a feeling for the sizes of these projects, their respective LOC values (including blanks, comments, unit tests, HTML) are: 515, 1383, 664.

Reflecting on these three applications I see a common theme: While I did unit test (and even TDD-ed) certain parts of the apps, I didn't put much emphasis on making the code testable. After all, these are very small apps which should not take very long to develop, so investing in testability seemed like a waste of time. I believed that the benefits of automated testing would not outweigh the initial cost of putting the required scaffolding in place.

I was wrong. Even in a web page of 664 lines (including plain HTML), I quickly got to a position where the behavior of the app was quite intricate. In order to make sure that new functionality did not break existing functionality I found myself repeatedly rerunning a lengthy series of manual tests. At the next round, there was even more "existing functionality" to test...

The total testing effort is actually similar to the sum of a simple arithmetic series: Sn = 1 + 2 + 3 + ... + n = n(n+1)/2. In such a series the value of Sn grows ~ n^2. This means that the time needed for adding a new piece of functionality will rise as the app grows. Eventually it will reach a point where the time needed to implement a feature is determined not by the complexity of the feature but by the complexity of the app. All features, little or big, will take a lot of time to complete because the dominant cost is the testing, not the implementation.
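To put rough numbers on this, here is a minimal sketch. It assumes (my simplification, not a measurement) that each new feature forces one manual check of every feature built so far, itself included. The class and method names are made up for the illustration.

```java
// Hypothetical illustration: counts manual checks under the assumption that
// the k-th increment of functionality requires re-checking k features.
class ManualTestingCost {

    /** Total manual checks after adding n features one at a time. */
    static long totalChecks(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += k; // the k-th round re-tests all k features
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20, 40}) {
            System.out.println(n + " features -> " + totalChecks(n) + " checks");
        }
    }
}
```

Under this assumption, 10 features cost 55 checks, 20 cost 210 and 40 cost 820: doubling the number of features roughly quadruples the manual effort.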

If I decide not to (manually) test every new increment of functionality I run the risk of not detecting bugs the moment they are introduced. This means it takes significantly longer to fix these bugs when they are eventually detected.

Of course, at the beginning everything looked fine. However, after just a few hours the signs of technical debt became evident: the code grew messy. I was afraid to refactor. I felt that I was coding in "extremely cautious" mode. I was no longer in control of my code. I could not move it in the direction that I wanted.

The amazing thing is the short distance that you have to walk in order for this effect to kick in. It usually took less than half a day for me to realize that manual testing slows me down.

Will I do things differently in the future? Yes. I will start with a testable skeleton of the app before adding any substantial behavior to it. The "start-from-a-skeleton" practice is already quite popular. The emphasis here is twofold:
  • It should be a testable skeleton. This will let you build testability into the system from the very start.
  • The extra cost of a testable skeleton pays off even in extra-small projects.
A thorough treatment of this topic is given in chapter 4 of GOOS (Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce), which talks about "Kick-Starting the Test-Driven Cycle". In particular, they argue that the goal of every first iteration should be to "Test a Walking Skeleton". Go read it.

I don't want to hear the "we-need-to-separate-tests-from-code" excuse

Many programmers tend to place their unit tests in a dedicated source folder. I do it differently. In my projects, the unit-test of class SomeClass is called SomeClass_Tests and it is located in the very same folder as SomeClass.java.

This has all sorts of benefits: I can instantly see if a class has a test. I can instantly jump from the test to the testee and vice versa. I don't have to create two parallel hierarchies of folders. It is very unlikely that I will rename the production class but not its test. Renaming a package affects both tests and production classes. And so on. There is one word that summarizes these benefits: locality. Things that change together ought to be placed as close together as possible.

There are IDE plugins out there that provide similar capabilities over a project structure that has separate source folders. However, these plugins will not help you when you access your code outside your IDE (for instance, I often explore my code repository via a browser).

But even more importantly, as I got more and more (test) infected I realized that I don't want to have this test-production separation. The tests are not something external that needs to be put away. Tests are a central piece of knowledge about my code. I want them to be nearby. Think about it: Would you separate your Javadoc text from the class/method it is describing? Would you find it productive to write code in a language which dictates that fields are defined in one file and methods are defined in another file? (Actually, this is pretty much what happens in C++ ...).

Of course not.

The main difficulty is the packaging/deployment phase. I have often heard the argument that "we need to have two separate folders because otherwise our deliverable (.jar/.war/...) will include both production code and testing code, and this is bad".

Is it really bad? First, in many situations (web-apps anyone?) the size of the binary does not matter much. Second, in situations where the deliverable includes the source code, the tests can be quite handy as usage samples.

If you're in a situation where these arguments do not hold, and you absolutely cannot place test code in your deliverable, then the remainder of this post is for you.

I have often said that it is not hard to write a program that will delete the .class files of test classes. It requires some knowledge of bytecode inspection techniques, which apparently is not very common. So, as a service to the Java community and for the common good, I cleared a few hours and wrote it. The result is class-wiper (sources), a command line utility that recursively scans the given directories and deletes all .class files therein that are related to JUnit.

Specifically, it will delete a class if either: (a) it mentions the @Test annotation; or (b) it uses, directly or indirectly, a class that mentions @Test. Your production code will never meet either of these conditions. If you don't want test classes to reach your client you just need to invoke it from your build script, as follows:

 java -jar class-wiper <binary-dir1>  <binary-dir2> ...
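The deletion rule, (a) plus (b), is essentially a transitive closure over a "uses" graph. The following is only a sketch of that rule, not the actual class-wiper code: it assumes the dependency information has already been extracted from the bytecode into plain maps and sets, and every name in it (TestClassClosure, classesToDelete, the sample class names) is made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the rule only; the real tool inspects .class files.
class TestClassClosure {

    /**
     * @param uses         maps each class name to the classes it uses directly
     * @param mentionsTest classes that mention the @Test annotation (rule a)
     * @return every class that should be deleted (rules a and b)
     */
    static Set<String> classesToDelete(Map<String, Set<String>> uses,
                                       Set<String> mentionsTest) {
        Set<String> doomed = new TreeSet<>(mentionsTest);
        boolean changed = true;
        // Repeat until a full pass adds nothing new (a fixed point).
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : uses.entrySet()) {
                if (doomed.contains(e.getKey())) {
                    continue;
                }
                for (String dep : e.getValue()) {
                    if (doomed.contains(dep)) {
                        doomed.add(e.getKey()); // rule b: uses a doomed class
                        changed = true;
                        break;
                    }
                }
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> uses = new HashMap<>();
        uses.put("Money", Set.of());
        uses.put("Account", Set.of("Money"));
        uses.put("Account_Tests", Set.of("Account"));
        uses.put("AllTests", Set.of("Account_Tests"));
        uses.put("SuiteRunner", Set.of("AllTests"));

        // Only Account_Tests mentions @Test; the other two follow from rule b.
        System.out.println(classesToDelete(uses, Set.of("Account_Tests")));
        // prints [Account_Tests, AllTests, SuiteRunner]
    }
}
```

Note that Account and Money survive: being used by a test does not condemn a class, only using one does.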

This software is provided absolutely for free. Use it any way you like. Change it. Tweak it. Hack it. Sell it. Whatever. I only ask one simple thing: please stop saying that you need to place your tests in a separate directory because you don't have a way to prevent your tests from reaching the deliverable.

Repeat after me: Immutable objects will not slow you down

As noted by Stephan Schmidt, the advent of functional programming promotes the use of immutable objects even in non-functional languages. Immutable objects have many positive traits: they are safer, they throw fewer exceptions, they are not prone to problems of covariance, they prevent races when used in multi-threaded settings, etc.

The main (only?) downside of immutable objects is the fact that they make it difficult to update the data in your program. This should not be a surprise. After all, if they are immutable then it only makes sense that they will not encourage mutations.

Here is the general description of the problem: if x is an immutable object and x.f == y, and you want x.f to point at z, then you can't just set x.f to z. You need to create a new object, x', that is identical to x in all aspects except that x'.f == z. You then need to find all references to x and reroute them to x'.
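In Java this usually takes the form of a "with" method that returns a modified copy. Here is a minimal sketch of the x, y, z scenario above; the Node class and its withF method are hypothetical names chosen for the illustration.

```java
// Hypothetical immutable class with a single reference field f.
final class Node {
    private final String name;
    private final Node f;

    Node(String name, Node f) {
        this.name = name;
        this.f = f;
    }

    String name() { return name; }

    Node f() { return f; }

    /** Returns a copy identical to this object except that f points at newF. */
    Node withF(Node newF) {
        return new Node(name, newF);
    }
}

class ImmutableUpdateDemo {
    public static void main(String[] args) {
        Node y = new Node("y", null);
        Node z = new Node("z", null);
        Node x = new Node("x", y);

        Node current = x;           // some reference to x held elsewhere

        Node xPrime = x.withF(z);   // x' is x with f == z
        current = xPrime;           // the rerouting has to be done by hand

        System.out.println(x.f().name());       // prints y (x is untouched)
        System.out.println(current.f().name()); // prints z
    }
}
```

The copy itself is one line; the hard part is the last step, making sure that every holder of the old x gets rerouted to x'.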

In certain cases (such as: many objects holding references to x) this rerouting may be quite hard and error prone. Also, if the objects pointing at x are immutable themselves, then further updates need to be carried out across your data structure. These difficulties are the reason why most C/C++/Java/C# folks tend to prefer designs with mutable objects, despite the benefits of immutability.

Side note: functional languages usually offer powerful pattern matching mechanisms that simplify the process of updating (immutable) data structures.

Anyway, even in cases where correctly rerouting references is not that big a problem, developers are sometimes reluctant to use immutability due to performance concerns. The argument goes as follows: If I make this class immutable, I will have to allocate a new object every time I update one of the fields. This extra allocation will slow down my program.

This argument is, by and large, incorrect.

Reason #1. Most of the code you're writing will not affect the performance of your program. There is no point in optimizing your code prematurely.

Reason #2. Mutable objects lead to defensive getters/setters. That is, if a method returns a field pointing at a mutable object it usually needs to create a copy of it to prevent the caller from breaking the invariants of the class. This means that an object will be duplicated even if the caller just wants to read its content. Given that reads are more frequent than writes, immutable objects can actually yield faster programs simply because they do not require object copying with every get/set operation.
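The difference can be sketched as follows. All four classes are hypothetical and exist only for this comparison: the mutable variant allocates on every read, while the immutable variant allocates only on writes.

```java
// Hypothetical mutable value class: anyone holding it can change it.
final class MutableSize {
    int width;
    int height;

    MutableSize(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

// Hypothetical immutable counterpart: state is fixed at construction.
final class ImmutableSize {
    private final int width;
    private final int height;

    ImmutableSize(int width, int height) {
        this.width = width;
        this.height = height;
    }

    int width() { return width; }

    int height() { return height; }

    ImmutableSize withWidth(int newWidth) {
        return new ImmutableSize(newWidth, height);
    }
}

final class MutableWidget {
    private final MutableSize size = new MutableSize(10, 20);

    // Defensive copy: a new object on every read, even if the caller only looks.
    MutableSize getSize() {
        return new MutableSize(size.width, size.height);
    }
}

final class ImmutableWidget {
    private ImmutableSize size = new ImmutableSize(10, 20);

    // Safe to hand out the very same object: nobody can modify it.
    ImmutableSize getSize() {
        return size;
    }

    // Allocation happens only here, on the (rarer) write path.
    void setWidth(int newWidth) {
        size = size.withWidth(newWidth);
    }
}
```

Calling getSize() twice on a MutableWidget yields two distinct objects, whereas an ImmutableWidget keeps returning the same instance until setWidth is called.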

I experienced this effect a few years ago when I had to optimize CPU-intensive code that dealt with queries over in-memory tables. The profiler indicated that most of the time my program was busy duplicating rows of these tables which, usually, were not mutated at all. Switching to an immutable design resulted in a significant performance boost.

Still not convinced? Maybe this will help you. When Josh Bloch discusses API design decisions that create performance problems, he gives as an example Java's Dimension class which is (wait for it) mutable. Specifically, each call to Component.getSize() must allocate a new Dimension object, which leads to numerous needless allocations. Further details are given in Josh's excellent lecture, How to Design a Good API & Why it Matters (the mutability of class Dimension is discussed ~ 32 min. into the talk).