Is Your Program Greener or Longer?

Q: Is A cucumber greener or longer?
A: Greener. It is also green sideways.

Most programs can be viewed from several different perspectives. If we take a web-application build around the MVC architecture, we can view the program as an assembly of three subsystems: the model, the view, and the controller. At the same time, we can also see the program as an assembly of pieces of code each one dedicated to a web page.

This dichotomy is not uncommon. In each program one can see several orthogonal decompositions. Although the the decomposition into packages/classes is very dominant, other conceptual decompositions are just as viable.

For the rest of this post I'll concentrate on two decompositions: the architectural decomposition sees the code as a collection of subsystem, such as: data access, user interface, security, persistence. The functional decomposition sees the program as a collection of features (user-visible functionality) such as: login, register, delete, submit, search, ...

We can make the reasonable assumption that in most programs the number of features is far greater than the number of subsystems. Also, to keep things simple I will assume that each feature "touches" each subsystem.

(Although it very difficult to provide an algorithm that identifies all features or all subsystems in a program most programmers can usually recognize, by means of intuition, both the architectural and the functional decompositions of their program).

The big question is this: What's the easiest way to develop a program. Is it feature by feature, or subsystem by subsystem?

This question may seem to be making as much sense as asking if a cucumber is greener or longer. If we have 40 features, 4 subsystems, and every feature contributes 100 lines of code to each subsystem then it does not matter whether our work is split subsystem-wise or feature-wise. Either way total effort will be 40 times 4 times 100 = 16,000.

I think that the effort model employed by the last paragraph is flawed. I believe that the effort required for completing a programming task is more than the effort for completing two smaller tasks that amount to the bigger task. In computer science lingo, this means that:

the effort function is super linear

If f is a super-linear function then f(a+b) > f(a) + f(b). An obvious example is sorting. Sorting an array of 1,000 elements is more work than sorting 10 arrays of 100 elements. I believe that the development effort function is super linear. In fact, it seems to be closer to a quadratic function than to a linear function.

This means that It will take less effort to complete a program if your programming tasks follow features (many small tasks) than subsystems (few large tasks). The project will seem harder if developed according to architectural decomposition.

The reasoning suggested here is not a mathematical proof. It is merely a phenomena I witnessed. Nonetheless, the effort is super-linear thesis provides an explanation for issues such as: The success of TDD; the small-tasks practice of agile methodologies; the effectiveness of REPL tools; and more. So, although we cannot prove this thesis it may very well be one of the axioms of programming.

The Democratic Approach to Programming

Greg Stein published this post (see also here) which links to a 1996 question by one, Larry Page. In the question Mr. Page complains that Java's HTTP client API does not allow him to set the User-Agent field of requests. Although this may be a hoax (conspiracy fans may even say that it was planted by someone from Google who does not like Java), it certainly raises a few thoughts.

It may be seen as a yet another argument in favor of dynamic typing/monkey patching. I'll put this issue aside. I think that the thin interface perspective is just as interesting.

With thin interfaces an object provides only a small set of services. This leads to layering of abstractions: A new abstraction provides a new set of services on top of existing ones. We end up with a hierarchy of abstractions, each one tailored for a different type of usage.

With fat interfaces the API tends to evolve into a single uber-abstraction. As this debate shows this one-size-fits-all approach may be either good or bad, depending on your point view. Larry Page witnessed the negative side. According to his question, the underlying abstraction failed to provide the service he needed. It is just like you are given this big machine to help you do something, but it does not do exactly what you need. You can either disassemble the machine and build it all over again which is a lot of work (after all, it is a big machine) or you can build a new machine from scratch. Either way, it's a lot of work. If you already rely on this machine for other purposes then even replacing it will be expensive. As Jake Blues puts it, you find that "you're really up Shit Creek".

It seems that Shit-Creeks like this are quite rare with thin interfaces. Instead of one abstraction with many services you have many abstraction each providing a small set of services. Pluralism of abstractions promotes modularity because you can assemble them according to your needs. Tweaking is easier since you're not as bounded by contracts because each contract is relatively small.

As a direct result, thin interfaces put you, the client, in charge. This is a more democratic approach to programming. Let's spread the power around.

Source Code Quality Estimation is Hard

As you might have guessed I don't believe in metrics that estimate software quality by measuring source code. In the most general case they are bounded by this paradox. Even if not taken too generally, there are still considerable difficulties related to such metrics.

In order to avoid confusion and misinterpretation let me say that (a) the items on this list reflect current knowledge. Future research may find solutions to the problems therein; (b) I am not claiming here that every metric is bad nor that all reasons hold for all metrics.

1. Source code quality metrics will never be able to detect infinite loops.

In other words, you can get a good score for a program that does stupid things at run time.

2. Lack of normalization

Most metrics are not normalized: they do not have a scale by which scores can be compared. With a non-normalized metric the distance between 15 and 20 may completely different than the distance between 25 and 30. In other words, if 100 is the perfect score, then it may be that getting from 99 to 100 is a million times more difficult than from 98 to 99.

3. Indirectness: metrics do not measure the "real thing"

The real thing is software quality, which is a vague notion . For the sake of the discussion let's define quality as "number of bugs". Source code metrics do not measure number of bugs directly. They measure artifacts which are only remotely related to number of bugs.

Think about a 100-meter dash. Instead of determining the winner by directly measuring the running times of the contestants, you measure the number of practices they had in the last four weeks.

4. Blind spots

Source code metrics are mostly blind to DSLs, to the universal design pattern and a bunch of other useful techniques.

5. Detecting the negative

Many metrics work by by collecting negative points. E.g., the Cyclomatic complexity metric gives you negative points for branches in your code. The problem is that Spotting "bad" things is not equivalent for spotting "good" things.

The first implication is that such a metric can only help you when you are doing something wrong. It does not tell you that your are doing something right. In a way, it leads to mediocrity.

Moreover, let's see what happens when you use a metric that detects the negative and also has blind spots (which, sadly, most of the metrics are so) . Such a metric will encourage you to make the blind spots as bigger as possible. As the blind spot grows, the amount of negative things that can be detected lessens. The metric simply beats its own purpose.

6. Focus on things that can be measured

By definition, metrics ignore things that cannot be measured. Focusing only on the measurable makes sense as much estimating the quality of a novel by calculating the number of sentences per chapter.

Longer is Sometimes Better

The post that I swore never to write

(I never wanted to write this post. In fact, I wanted not to write it. I think it focuses on stylistic aspects which are overrated. I do think that a clear style is important, but I don't think we should be religious about it. In other words, If the method's style gets an "A" I don't see any point in trying to raise its score to "A+". Writing another unit test will give a much higher ROI).

A comment by Martin in response to one of my latest posts said that sometimes longer is better. Since people tend to prefer compactness, I think that it may be worthwhile to mention some points in favor of length. Please bare in mind that I am not saying that longer code is better. Not at all. Shorter should be the norm, but it should not be followed religiously.

Width vs. height

Programmers often attempt to reduce their method's height (number of lines) by consolidating two (or more) lines into one. If this attempt increases the width of the method (number of columns) then they actually make it more difficult to understand the code. This is because we don't read content left to right, line by line. Our eyes movement follows the F-shaped pattern:

  • Users first read in a horizontal movement, usually across the upper part of the content area

  • Next, users move down the page a bit and then read across in a second horizontal movement that typically covers a shorter area than the previous movement

  • Finally, users scan the content's left side in a vertical movement

Here's how it looks like:

(Copyright © 2006 by Jakob Nielsen. ISSN 1548-5552)

This means that we have greater chances of reading things on the left hand side of the screen. We can better understand greater parts if most of the code is located closer to the left.

One may argue that the cited research studied users looking at web-sites rather than programmers looking at source code. Thus, maybe the F-pattern is not relevant WRT programming. This is a valid argument, but it needs to be verified by research. To the best of my knowledge (and I'd be glad to stand corrected), there is no support for this argument in the literature.


You get a bug report from a user. The stacktrace shows a null pointer exception on this line:


You don't know which method call returned null. Was it m1(), m2(), or m3()?
If each call were on a line of its own you'd have much more information. Sometimes this can make a huge difference.

A single-line if

Writing the "else" part in the same line as the if() makes the code shorter but more difficult to debug:

for(int i = 0; i < getTop(); ++i)
if(get(i).equals(b)) return;

Suppose you're interested in the return statement, but since you have a single line if() You cannot place a breakpoint on the return statement. Placing a breakpoint on the if() is not practical if getTop() returns a large value (even 50 is large value in this context).

Another example:

void f(int n, int m) {
int x = g(n);
h(m, x);
if(!isValid(x)) return;

// do something useful

You are trying to find out why f() is not doing what it was supposed to do. So you step through f() and suddenly, surprise!, you find yourself in another method.

After some thinking you realize that you are back in f()'s caller. That's because the last thing you remember is that you were standing on "if(!isValid(x) return;".

Now, here are your thoughts: "Aha! That's exactly what's wrong. I should not exit from f() under these circumstances. Is the problem in g() that computes x's value or in isValid() that checks it? Gee, I don't know. If only I knew what x's value is. Oh boy! I have to start the debugging session all over again"

This problem will be a no-problem if you split the if() across two lines. During debugging you will have a visual indication that the if() succeeded.

Note that this discussion is pertaining not only to if()'s but also to the ternary operator, "?:".

Temporary variables explain the code

There is this refactoring step of Replace temp with query. Each application of this refactoring saves one line - the line where the variable was defined.

Martin Fowler is a very smart guy. He knows that reducing line count is not the only game in town. That's why immediately next to "replace temp with query" he placed Introduce explaining variable. He knows that a variable that is used only once is still useful: it explains the code, and it worth to have it, despite the extra line that it occupies.

BTW, this duality is quite common Fowler's Refactoring book. See, for example, Hide delegate vs. Remove middle man. It seems that Fowler is well aware that every decision has its trade offs.

Capturing return values

So many I found my self debugging a method and not knowing its return value. This happens when the code is written by someone who wants to save a line. Instead of writing

int f() {
// do something
int result = g();
return result;

He writes

int f() {
// do something
return g();

The problem with the latter form is that it hides f()'s return value. If this is the common style, then when you step out of f() you find that f()'s caller is returning f()'s return value directly to *its* caller, which in turn.....

This is so annoying. You have to step out of many methods. When you get to inspect this value, you realize that it is wrong but now you lost the original stack frame. Think what happens if your code was called from library whose source code is out of your reach.

(Although this is a special case of the previous point, it is related to events so annoying that it deserves a section of its own)


It should be noted that other than the first point (the F-shaped pattern) the points made here do not capture some cosmic truth about programming. They depend on the language, on the tools and probably on some cultural factors. You'll have to reconsider them when you switch to a new language/tool/team.

Finally, by all means, don't try to make your code longer. Try to make it shorter. But as you make it shorter, think about these points. They may help you avoid code that is too short.

Formalizing the Paradox

During the debate about static analysis I realized that my precise claim was not well understood. Let me make things clearer by formally proving that static type checking is at odds with quality estimation based on static detection of tangled packages.


d(p) - Static dependency of a program p, measured as the number of tangled packages.

q(p) - Quality metric of a program p (there are almost no restriction about it. Can be number of defects, cost per feature, etc. Can be calculated analytically or measured empirically).

"Equivalent programs" - Two programs that produce the same output for the same input.


[A1] Given two equivalent programs, x and y, then if x's code provides more type information that y's the following holds: q(x) > q(y)

[A2] q(p) and d(p) are inversely correlated


[1] Given a statically typed program, p1, compute q(p1), d(p1).

[2] Obtain a program p2 that upholds the following conditions: p2 is equivalent to p1; p2 has no tangled packages; p2 has less type information that p1. The transformation rules described here ensure the existence of a program p2. However, any p2 that satisfies these conditions is suitable.

[3] p2 has less type information than p1 (see [2]). Based on [A1] we have that q(p1) > q(p2)

[4] p2 has no tangled packages (see [2]) so necessarily d(p1) >= d(p2). Based on [A2] we have that q(p1) <= q(p2)

The conclusions of [3] and [4] form a contradiction: q(p1) > q(p2), q(p1) <= q(p2). Therefore we have that one of our assumptions, either [A1] or [A2] is wrong.

I am tempted to put a Q.E.D here, but that is not the point. I do not think that either static type checking (that is: [A1]) or that static analysis ([A2]) is completely useless. However, this little proof shows that applying them indiscriminately is logically flawed. Universal statements such as "a tangle was formed hence quality eroded" are ill formed without additional reasoning and explanations.

Update 1: Changed the phrasing of [2] to explicitly express the requirements from p2.

Update 2: Removed an unnecessary step.

Software Formulae are Illusive

This post started as part 4 of my "mythbusting" mini-series (part 3, part 2, part 1). After a few iterations of writing I realized it deserves to have its own title.

The mini-series focused on static detection of tangled packages. This post examines a somewhat broader issue. I'll argue that in general we cannot fully understand our programs. Therefore, we find it difficult to formulate recipes for writing high quality programs.

The heart of programming

Software construction is the task of turning some vague description of "something" into real, precise, unambiguous code. In other words, turning the informal into formal is central to our profession.

Informal = any sort of description: from a wild thought running through a bright mind, on to user stories, to use cases, to requirement documents, to whatever.
Formal = An unambiguous sequence of computations

Once you made the transition from informal into formal you have a program. Church's thesis tells us that it does not matter what language you use. In fact, there's no such thing as a formal specification which is not a program: If it is formal (that is: unambiguous) then there is a way to write an interpreter that will run it.

Note that almost-formal is actually informal. If there's even a single step left undefined, then the behavior of the whole program is undefined. You cannot predict what the computer will do when it tries to execute this undefined step. In principle, this is similar to the well known rule "if your premise is false, you can prove any conclusion": If I assume that zero equals one I can then prove that my cat is a windshield wiper.

Side note: sometimes one can analyze the formal parts and prove that the informal parts are meaningless. If you manage, for example, to show that the computer will never reach these undefined steps, then you may still reason about the well defined ones. Sadly, there is no general way to do it. Each time you'll have to devise a proof that suits the problem at hand.


What meaningful things can we say about a software project?

If one tries to reason about the informal side of the project (the vague description), there is almost nothing to hang on to. The informal description is just too fuzzy. It can be interpreted any way you want. It can mean anything and nothing at the same time. In a way, it is too promiscuous. So, you can say a lot of things, but not meaningful ones.

On the other hand, if you examine the formal side (i.e: the program) there is very little you can say. This is not just a statement. This is a real, rock solid, proof (Turing, Rice). You don't know if a program halts. you don't know if it is equivalent to some other program. You don't know if it produces a certain output.

Let me put it this way: thanks to Turing, we have a mathematically-sound proof which shows that we can have very few mathematically-sound proofs about programs. Almost like a catch-22.

How come reasoning about programs is so hard? Even without getting into the details of Turing's proof I can say that the proof relies on the fact that a program can be thought of as a machine that delivers some useful service. This service may be another program. A quick example: the JVM is a program that runs a Java program. So, in some way, we have a machine that may contain *a similar machine*. I don't think that such machines exist outside the world of programming. This amazing property is due to the fact that every programming language is expressive enough to implement every other language. This property lies at the heart of Turing's proof.

What are the devices that make programming languages so powerful? Surprisingly it does not take much to become a fully fledged programming language. You need: (a) an if()-like construct so that you can make decisions; (b) capacity to compute new values from old ones; (c) Loops and/or recursive functions; (d) dynamically allocated memory and/or dynamically created functions (a-la closures).

Summarizing the issue of reasoning we see that there very are few intelligent things one can say about a software project. The informal description is too promiscuous. The formal leaves very little reasoning space. Either way, you loose. There's also no middle ground: an almost formal is still informal. Bugger!

I am a programmer. How is this discussion related to my day job?

Here's how. Every now and then someone is trying to sell you some formula with the claim that it will make your programs better. In its simplest form the formula is expressed as an imperative: "avoid classes with more than (say) five fields" (which doesn't make much sense but I've seen it once), but it can also take other shapes and forms. Here is a more delicate paraphrasing of the same formula: "minimize your program's score on the FPC (Fields-Per-Class) metric". Sometime the formula is embodied within a software engineering tool that scans the code and detects "problems" of many sorts.

Whenever I hear about a formula I try to confront it with the things that make it so hard to reason about programs: (a) a program can contain other programs; (b) the same thing can be expressed in several different ways. Specifically, I take the idiom (or pattern or construct) whose usage is discouraged by the formula. and I try to find ways to emulate this idiom.

To make things concrete, let's go back to our "avoid classes with more than five fields" formula. How can we emulate fields without defining fields? Easy. Define a single field which holds a property list. This will minimize the number of fields, where at the same time will allow the object to hold as many fields as you want.

Having found this weakness we can devise a precondition to guard against it: "The formula is viable only with classes that do not rely on property lists". This is the simplest precondition I could think of. Alas, property lists are quite useful, so this formula may be much less universal than it seems. Clearly, someone else may come up with a less restrictive precondition. I found that it is not that easy.

Just for plain fun, let's quickly go over one more formula. If you need to restrict the number of classes you can use a single class with several closures. Thus, you can configure the behavior of individual instances just as if they came from different classes.

Final thoughts

One may argue that these counter examples are benign since they will only occur if a "clever" programmer attempts to game the system. I disagree. Once a team accepts a formula, the formula will affect the code that the team produces. The programmers will try to satisfy the formula, but not because they want to game the system. On the contrary: they want to produce the best software in the world. The formula provides them with the notion of "good", so they will try to satisfy it out of good intentions (see what Joel says about arbitrary incentives). As always, it all boils down to the lack of an absolute-, measurable-, objective- notion of software quality. Sadly, every single formula that I saw was very far from the real thing.

I think we humans ought to be humble. our ability to reason about software projects is limited. As a direct result, our ability to devise software development formulae is also limited.

Finally, there is some cute irony here. The things that make it impossible for a compiler to determine whether our program halts (or: produces the desired output, etc.) are also the things that make it difficult for humans to devise general software formulae. This similarity is not coincidental: reasoning about programs is provably difficult. You'll see its implications in all sorts of places.

Update 1: Added the "Final thoughts" section

Mythbusting - Part 4

This post has moved to here. My apology for the inconvenience.

Blogging Issues

This is a short, self-reflective, post.

Item 1: I started using Feedburner last week It seems that the pinging mechanism that should make sure Google Reader is synchronized with my Feedburner-manged RSS feed does not work very well, to say the least. I therefore turned back on this blogspot's native feed. I now manually ping Feedburner (OK, I will have a greasemonkey do it for me) whenever new content is published.

Item 2: How come November was such a productive months (in terms of number of posts)?
It's kind of a long story but I can condense it into five simple steps:
1. I installed the Google Reader Watcher Firefox add-on. This made me more aware to things taking place at the blogging scene.
2. I delved into Steve Yegge's blog. His posts are always thought provoking.
3. I dicovered that reading Yegge's blog motivated me to sit down and publish all these half-completed posts that were waiting (as "sitting drafts") for many months.
4. As a direct result I started writing. After the fourth post things became easier.
5. More content resulted in more visitor comments. This, in turn, motivated me to write even more.

That's basically it.

Mythbusting - Part 3 (The Inherent Paradox)

This is the third part in this mini series whose goal is to show why static dependency analysis is largely useless.

Part 1 primarily used reasoning and essay-like arguments.
Part 2 provided hard data --- that is: actual dependency graphs --- showing that tangled packages (let alone tangled classes) are common in very successful libraries and tools.

This part will probably be much more entertaining: It shows the logical flaw underlying static dependency analysis.

So, here is a Java program with two classes. The code itself is silly. I just wanted to have a program with cyclic dependency that is doing something (no matter how silly).

package p1;
import p2.B;

public class A {
private B b;

public void setB(B b) { this.b = b; }
public void f(String s) {

package p2;
import p1.A;

public class B {
private A a;

public void setA(A a) { this.a = a; }
public void g(String s) {
if(s.length() != 0)

As expected, the dependency graph of packages p1, p2 is a cycle:

Now, let's rewrite this program by applying the following rules:
  1. All variables, parameters, fields, return types will have the type "Object"
  2. Every method invocation such as x.m(y,z) will be rewritten as:, "m", y, z). The Util class itself is shown in the fragment below.

(Reservation: I've intentionally left out rules for the handling of overloading, static members, and object creation as these are not needed for the example shown here)

Here is how the program looks like (to avoid confusion I now use packages pp1, pp2 rather than p1, p2).

package pp1;

import pp2.B;
import static;

public class A {
private Object b;

public void setB(Object b) { this.b = b; }
public void f(Object s) {
call(b, "g", s);

package pp2;
import static;

public class B {
private Object a;

public void setA(Object a) { this.a = a; }
public void g(Object s) {
if(!call(s, "length").equals(0))
call(a, "f", call(s, "substring", 1));

Of course, we also need to write the static method (in package pp0). Here it is:

package pp0;

import java.lang.reflect.Method;

public class Util {
public static Object call(Object r, String s,
Object... args) {
for(Method m : r.getClass().getMethods()) {

if(m.getParameterTypes().length != args.length)

try {
return m.invoke(r, args);
} catch(Exception e) {
throw new RuntimeException(e);
return null;

We can now take a look at this dependency graph of pp0, pp1, pp2:

(Irony mode on) Wow! No tangles! This technique is awesome. No matter how big and tangled your program is, you can apply these two very simple transformation rules and you will get this nice-, loosely coupled-, code, where each class depends only on Amazing! (Irony mode off)

Do you see the paradox? Tools that detect cyclic-dependencies are using type information from your source code to help you find highly coupled modules. However, these tools encourage you to eradicate the information on which they rely. In a sense, they want you to make them blind. This makes no sense.

Clearly, the transformation rules that we applied did not really eliminate coupling. They just swept it under the rug. The coupling is still there, waiting for run-time to raise its head. If one claims that cyclic dependency analysis is not valid *after* the transformation, due to inability to determine the run-time dependencies, one should also acknowledge that cyclic dependency analysis was not valid *before* the transformation due to the same reason: inability to determine run-time dependencies.

Is there a morale here? I guess there is. Think about dynamically typed languages:

  • If you believe in static analysis of dependencies you should start developing in a dynamically typed language and you will immediately boost your static analysis score through the roof: no cycles, no coupling of packages/classes/methods.

  • (otherwise) you don't believe in static analysis. You know that the static picture is only a distorted reflection of the dynamic behavior. Hence, you know that you cannot trust a tool that tells you that your program is OK. If this is what you think, then you are already writing in Smalltalk, Self, Javascript or Ruby.

Update 1: Added "(Turning irony on)"

Update 2: Added the reservation

Mythbusting - Part 2 (in Pictures)

Preface: If a certain property is common, in a statistically significant manner, in successful projects, then it is quite natural to assume that this property is a positive trait.
The data presented here shows that tangled packages/classes are quite common. This suggests that the belief that tangling is a *negative* property, is in odds with reality (Of course, this is not a scientific study).

In the Mythbusting post I said that tangles of packages are benign. I based this claim on the fact that many successful libraries contain such tangles. In particular, I said that 20 out of 23 libraries that are bundled with the Spring framework, do contain tangles. Here are the actual graphs of five libraries/tools which you might be acquainted with.

These images were produced by Structure101.

Update 1: Added the preface (as per vinod's suggestion)

Mythbusting - Part 1

(Disclaimer: as always, the issue of reasoning about software quality is largely a matter of personal judgment)

Yes, I really like Adam and Jamie. This post is about myths that you won't see on their TV show.

Actually, this post is motivated by Ian Sutton's post titled: "Software erosion in pictures - Findbugs". That piece shows the package dependency graphs of several releases of Findbugs. The graphs were obtained by Structure101, a static analyzer that reads class files. The conclusion there is that Findbugs is eroding since the number of cycles (tangles) in the graph, as well as the size of these cycles, grows over time.

Ian's premise is that tangled packages are indicative of poor code quality.

I think this premise is based on myths that needs to be carefully examined. So this is what I'll do here.

Myth #1: Tangled packages cannot be tested separately

The counter argument is this: When the packages were not tangled were they tested separately? It is likely that the answer to this question is "No". The lack of compile-time dependency does not mean lack of run-time dependency. Maybe the two packages were already implicitly related (let's say through data passed as a String), and we just made this relation explicit (through a dedicated class).

Myth #2: Tangled packages cannot be deployed separately

This is not true in Java (and in many other modern languages) since we have dynamic loading. And again, due to hidden run-time dependencies, it may very well be the case that you will not be able to separately deploy packages even if they were *not* statically tangled.

Myth #3: Non tangled packages can be modified without affecting each other

Assume you pass an Iterator, implemented by class from package A to some code from package B. The packages are not statically related (they depend on Iterator but not on each other). Still, they must agree on certain things: Can this iterator's next() method return null? What is the semantics of remove()? If one side breaks the contract, the other side will bleed.

Myth #4: If I eliminated a package level tangle, I solved some dependency issues

Not necessarily. You might have just consolidated the participating packages. So the dependency is still there, this time at the class level.

Myth #5: A similar analysis is also valuable at the class level

This is usually not true. The *full* dependency graph of many well written classes is so cluttered that it is practically useless. Most visualization tools workaround this by (a) showing (by default) dependencies only from the same package; (b) ignoring library classes. When you choose the non-default view you see how intricate the network of classes is. This happens with almost every program you examine regardless of what you think of its quality. Additionally, class level analysis is also ignorant of run-time dependencies.

There a few more myths, but I think we can stop now. However, there is one more thing that I wanted to talk about: After reading Ian's post I did some field research: I took all the jar files that come along with Spring (including spring.jar itself) and fed them into Structure101, just like Ian. All in all: 23 jar files. Do you know how many of these *didn't* have tangled packages?

The answer is: 3.

Let me repeat that: 20 libraries had their packages tangled.

So, if you want your library to be used by Spring you should make the packages tangled. That's what the numbers are telling us. As long as you keep writing test cases, you have no reason to be afraid of tangles: the developers of JUnit, Dom4J, Log4J, Ant, Antlr (partial list) are happily living with tangles.

Shorter is Better?

It is widely believed that compactness is a desirable quality of code. Sadly, I often see people claiming that compactness is an absolute metric, in the sense that shorter code is always superior to longer code.

Leaving aside the issue of how length should be measured (character count? line count? token count? executable size?), here are two different Javascript functions that fulfill the same task: translate an integer in the range 0..13 to roman numerals.

function toRoman1(n) {
return n == 0 ? "" :
n % 5 == 4 ? "I" + toRoman1(n + 1) : n >= 5 ?
84 + Math.floor(n / 5)*2)) +
toRoman1(n % 5) : "I" + toRoman1(n - 1)

function toRoman2(n) {

if(n < 4 && n > 0)
return "I" + toRoman2(n - 1);

if(n == 4 || n == 9)
return "I" + toRoman2(n + 1);

if(n >= 10)
return "X" + toRoman2(n - 10);

if(n >= 5)
return "V" + toRoman2(n - 5);

return "";

The former function, toRoman1(), is much shorter than the latter. Moreover, its cyclomatic complexity is lower (since it has fewer branches) suggesting that it is simpler to understand. Still, for most programmers, toRoman1() is more difficult to read, to write, and to debug.

My advice is to go easy with this "shorter is always better" claim. Just like many other things in programming, the right degree of compactness is also a matter of striking a balance. Or, as Allen Holub puts it: "Good and bad are not absolutes".

Update 1: Fixed the no-return-value issue spotted by Yardena.

Update 2: Added the cyclomatic complexity point.

Is it a bug? Is it a feature?

In his latest post Jeff Atwood argues that there should be no distinction between a bug and a feature request. I couldn't agree more. Both bugs and feature requests are actually requests to change the behavior of the program.

I guess this distinction is based on the assumption that the implications of fixing a bug tend to be "local" (i.e.: affecting smaller parts of the program) where-as those of implementing a feature tend to be "global". Thus, implementing a feature is more likely to introduce new bugs, so it makes sense to postpone it until most current bugs are solved.

I disagree with this thesis.

First, I saw many bug fixes that resulted in new bugs being introduced, thereby contradicting the "locality" assumption.

Second, in a test-infected program (that is: a program with a strong test suite), the chances of unwanted side effects being introduced due to a feature implementation are practically zero: you write the code, you run the tests, and if the bar is green then everything is likely to be fine.

One final point: It seems that this distinction, along with several other (mis)conceptions came from the Waterfall approach: With development and debugging being two separate phases it is only natural to classify the items on the project's to-do list as being either a feature or a bug.

On Syntax and Religion

Ola Bini published a piece showing how the introduction of a special variable, the "it" variable, into the Ioke language makes regular-expression matching less verbose. In short, the it-variable captures the result of the expression examined by the enclosing "if" or "unless" decision. Thus, you can write (pseudo-code):
if(/[a-z]*/.matches("ab")) {
println(it.matchLength()); }

instead of
var x = /[a-z]*/.matches("ab");
if(x != null) { println(x.matchLength()); }

Personally, I tend to agree with Ola regarding the virtue of this mechanism.

The comments made by readers show how subjective - not to say: biased - the whole issue of "syntactic elegance" is. As of Tuesday evening (GMT), here is a short summary of the various opinions.

Marcelo prefers a Smalltalk-like style:
/[a-z]*/.ifMatches("ab", res,
{ println(res.matchLength()); },
{ println("no match"); })

Stefano wants the variable declaration to be explicit:
if(x = /[a-z]*/.matches("ab")) {
println(x.matchLength()); }

Paul advocates the introduction of "=>" as a pattern matching operator:
/[a-z]*/ => x

Antares Trader argues for a closure with a parameter:
if(/[a-z]*/.matches("ab")) { |x|
println(x.matchLength()); }

Johannes says that Option types neatly express the idiom in question.

I think that local optimizations of the syntax (in contrast to global aspects such as: typing discipline, programming paradigm, object model, standard library) are largely a waste of time. Will programmers not use the language because they cannot change the name of the "it" variable? Even fifty (local) improvements to the syntax, will not make a difference. The exact notation chosen for this construct will have nearly no impact on Ioke's future success.

Therefore, my rule of thumb is this: It is worth while to debate over a local syntactic issue only if you have knock-out argument. If you are only going to win by points, don't bother.


Lately I've been using the acronym C.R.U.P as a shorthand for "Code that is RedUndant in the Program". In this post I will explore some of the negative implications of crup. In order to reduce verbosity I will use the word "crup" instead of the acronym C.R.U.P. (Too bad the word crup has no meaning. Really wish it did).

Program development can be thought of as a process where you repeatedly add new pieces of functionality. The implementation of each such piece may depend on existing functionality but may also require the introduction of new functionality, which in turn spawns additional functionality pieces. This model can be naturally represented as a tree where child nodes represent pieces of functionality that were introduced as part of the implementation of the parent. Note that a node can be method, a class, a collaboration of classes, a module, whatever.

Redundant functionality (that is: redundant nodes, crup) is a node that provides more services that its parent expects. In other words, you can replace the subtree rooted at this node with a smaller tree and the program as a whole will still deliver the behavior it was expected to deliver.

programmers often introduce crup into the code-base, thinking that a certain piece of functionality will be needed in the future, and that it is less effort to introduce it sooner than later. This resonates with the famous YAGNI debate: The pro-YAGNI people say that crup is really bad. The pro-crup people say that by predicting future behavior of a program you can ensure your program's structure is well optimized for the long run, and that no post-facto refactoring will be able to match that.

So, let's throw some numbers into the picture: each node has 5 children. 1 in 5 children is a crup node. The tree has 4 levels (that is: a root node, five children, 25 grandchildren, 125 grand-grand children). Total number of nodes is therefore: 1+5+25+125=156.

How much crup do we have in the program? Let's see. The root is obviously not redundant. However, one of its children (level-2) is redundant along with its whole subtree. That's 1+5+25 = 31.

The other four nodes at level-2 are not crup, but each of them is a proud parent of a level-3 crup node: 4*(1+5) = 24. We are left with sixteen non-crup level-3 nodes. These nodes have eighty children, of which one fifth is crup: 16.

So in this program the total crup factor is: (31 + 24 + 16) / 156 = 71 / 156 = 45%. In other words, almost half of the effort went into redundant functionality. That's a lot of extra work. The programmer must posses good future-prediction skills to make this extra work pay off.

The crup factor rises exponentially with the height of tree. If the tree had 5 levels, our crup factor would rise to 56%.

This little post is by no means a scientific study. There are a lot of other factors to consider if one wants to make a scientific claim regarding crup, which is kind of ironic since empirical study of real life software projects/software development is a thorny issue. Here is one factor that mitigates the negative effects of crup: if a prediction regarding a crup node, which is close to the root, turns out to be correct, then the vast majority of the cruppy nodes in the program will stop being crup.

On the other hand, other factors compound the crup effect. For instance, Maintenance and developments costs are probably super-linear with respect to program size (e.g.: program size doubles the but costs quadruple). Thus, Even a small amount of residual crup has a significant negative impact on the program's complexity.

As I said, I am not trying to make a scientific claim here. I just wanted to make the intuition behind YAGNI a bit more concrete. The aforementioned example, with its 45% crup factor, suggests that you could be much more productive if you just cut the crap.

Structural Subtyping == Reduced Compile-Time Coupling

Although I've been wrapping my mind around structural subtyping in Java for quite some time now, there is one argument in favor of structural typing that didn't get the attention it deserves: it is much easier to develop code against an API that relies on structural types due to reduced compile-time dependencies.

As always I'll illustrate my point with an example. We'll start with the standard (non-structural) approach.

(Side note: In the very first lecture at the Programming-Languages class I took, my teacher, Prof. Ron Pinter, made the point that a programming language is an excellent device for formally expressing ideas: "Instead of battling a natural language hopelessly trying to get the precise meaning, just write the thing in a PL and your intention will be clear". I think this is a very good advice)

Suppose someone has finally written the huge algorithm that realizes the overwhelmingly complicated logic of the bridge-keeper from Monty Python and the Holly Grail. The algorithm is encapsulated within the Bridgekeeper class whose protocol relies on the Knight interface.

public interface Knight {
public String answerMe(String question);

class Birdgekeeper {
public boolean canCross(Knight k) { ... }

My friend Graham developed an AbstractKnight class which provides some services common to all knights:

public abstract class AbstractKnight
implements Knight {
private final String name;

protected AbstractKnight(String name_) {
name = name_;

protected abstract String nonNameQuestion(String q);

public String answerMe(String question) {
if("What is your name?".equals(question))
return name;
String answer = nonNameQuestion(question);
if(answer == null)
answer = "I don't know that";
return answer;

If I want to develop a concrete subclass of AbstractKnight (Lancelot or Robin come to mind), Graham needs to send me two jars (or two source file packages):
  1. Graham.jar - Providing the definition of the AbstractKnight class
  2. Bridgekeeper.jar - Providing the definition of the Knight interface

Yes. Without Bridgekeeper.jar I will not be able to compile my Lancelot class, despite the fact that Lancelot is only indirectly coupled with Knight. When you compile a class the compiler requires all its ancestors to be present.

Graham is therefore forced to distribute Bridgekeeper.jar to allow other programmers to subclass his AbstractKnight class. This often leads to proliferation of download-able packages (with/without dependencies) which is a constant headache for open source developers. Of course, if you choose to download the full package (the one that does include the dependencies) you are then risking having several different version of the same library on your machine, which - again - leads to headaches.

With structural subtyping things are much simpler:

// library: Bridgekeeper.jar
public struct Knight {
public String answerMe(String question);

... // everything else remains the same

// library: Graham.jar
public abstract class AbstractKnight {
// "implements Knight" removed

... // everything else remains the same


With this approach, AbstractKnight has no compile-time dependency on Knight. Thus, you can compile your Lancelot class even if Bridgekeeper.jar is not present on your classpath:

public class Lancelot extends AbstractKnight {

public Lancelot() {
super("Sir Lancelot of Camelot");

protected String nonNameQuestion(String q) {
if("What is your quest?".equals(q))
return "To seek the Holy Grail";
if("What is your favorite color?".equals(q))
return "Blue";
return null;

Update 1: OK. I couldn't resist the temptation. Here it is, in its full glory, scene 35:

Bulletprooof - II

Following my previous post, here is another example for bulletproofing your API (again, I am not saying that you should, just wanted to provide a few concrete examples).

Simple (non bulletproof) version:

public class Resource {
public enum Mode { READ, WRITE }
public Resource(String fileName) { ... }
public void open(Mode m, boolean autoFlush) { ... }
public char read() { ... }
public void write(char c) { ... }

Legal usage:

Resource res = ... ;, false);
char c =;

Illegal usage:

Resource res = ... ;, false);
res.write('a'); // Runtime error:
// Resource was opened for READ

The bulletproofing technique remains the same: introduce new classes to make it possible for the compiler to distinguish between the different variants.

public class Readable {
public char read() { ... }

public class Writable {
public void write(char c) { ... }

public class Resource {
public Resource(String fileName) { ... }
public Readable openRead(boolean autoFlush) { ... }
public Writable openWrite(boolean autoFlush) { ... }


This post waited a long-long time to be written. The motivation to finally do it came from a discussion I had with Andreas Brink at OOPSLA'08. We both felt that TDD made us more productive. In the pre-TDD era we were always trying to make our API bullet-proof, i.e.: that it will be really difficult to write code that will fail at run time. With TDD this is not such a big concern. You write simpler code (which is much faster than writing bulletproof code) and you rely on the tests suite to keep from you doing mistakes.

I don't want to get into the can-tests-really-prevent-bugs debate. My intention here is just to show what I mean by a bulletproof design. Here is a simple (non-bulletproof) Flight class:

public class Flight {
private final int numSeats;
private final Set<Traveler> travelers
= new HashSet<Traveler>();

public Flight(int numSeats_) {
numSeats = numSeats_;

public void book(Traveler t) {
if(travelers.size() >= numSeats)
throw new RuntimeException("Fully booked");

public void unbook(Traveler t) {
throw new IllegalArgumentException("t=" + t);

public class Traveler {

The predicament lies in the Flight.unbook() method. If we pass the wrong argument, this method will fail:

Filght f1 = ...;
Filght f2 = ...;
Traveler t = ...;;
f2.unbook(t); // Illegal argument error!

And now, let's see the bulletproof approach. The trick is to use static type checking - sometimes in conjunction with encapsulation - to make it impossible to issue an incorrect unbook() request:

public class Flight {

private final int numSeats;
private final Set<Traveler> travelers
= new HashSet<Traveler>();

public Flight(int numSeats_) {
numSeats = numSeats_;

public Traveler book(Traveler t) {
if(travelers.size() >= numSeats)
throw new RuntimeException("Fully booked");
return new Reservation(t);

public class Reservation {
public final Traveler traveler;
public Reservation(Traveler t) {
traveler = t;

public void unbook() {

With this design unbook requests (now issued through a Reservation object) cannot fail:

Filght f1 = ...;
Filght f2 = ...;
Traveler t = ...;
Flight.Reservation r =;
r.unbook(); // Always safe!

Three Kinds of Bugs

I tend to classify bugs into one of these three groups:
  • Laziness
  • Lack of Creativity
  • Betrayal

The running example that I'll use here is a method that finds the largest value in a list of integers.


public static Integer maxA(List<integer> list) {
int result = Integer.MAX_VALUE;
for(int i = 0; i < result)
if(result < list.get(i))
result = list.get(i);
return result;

Do you see the problem? result's initial value is completely wrong. Such bugs often happen when the programmer copies the code of a similar method but neglects to make all the neccssary adjustments.

Why is this group called "laziness"? That's because this method will not pass even the simplest test: assertEquals(new Integer(30), maxA(Arrays.asList(10, 20, 30))); In other words, if you weren't lazy you could have caught the bug with the very first test you threw at it. Actually, you don't even need a test case: just run it once on any piece of input and you'll find the bug. BTW, this "run it once" practice shows why read-eval-print shells are useful. Too bad they are common only in the arena of dynamically typed languages.

Lack of Creativity

public static Integer maxB(List<integer> list) {
int result = 0;
for(int i = 0; i < result)
if(result < list.get(i))
result = list.get(i);
return result;

maxB() is clearly better than maxA(). It will not fail at the very first attempt. In fact, as long as your input contains only positive integers this method will seem fine. You need a certain degree of creativity to devise an input that will manifest the bug, such as: assertEquals(new Integer(-4), maxB(Arrays.asList(-4)));


public static Integer maxC(List<integer> list) {
int result = list.get(0);
for(int i = 1; i < result)
if(result < list.get(i))
result = list.get(i);
return result;

This version works quite well. It will pass both of the aforementioned tests. The bug here is much more subtle. With Betrayal bugs the root of the problem is a disagreement between two methods.

Such a disagreement may occur if the method that builds the list (which is then fed into maxC()) will also generate empty lists. This situation contradicts maxC()'s assumption that the list is non empty. In a sense, maxC() is betrayed by its own program.

It is very tempting to say something like: "Well, maxC() has a clear precondition. This precondition is not met, so the bug is in the caller". However tempting, this view is overly simplistic. Suppose the program method that the code that calls maxC(), looks like this:

public class Driver {
public int n;

public List<Integer> listBuilder() {
List<Integer> newList = new ArrayList<Integer>
for(int i = 0; i < n; ++i)
return newList;

public Integer m() {
return maxC(listBuilder());

The contract of listBuilder() says that it should return a list with the integers 0..n-1 (inclusive). In particular, if n is zero the returned list should be empty. The contract of m() says that it should apply maxC() on the list created by listBuilder(). The contract of maxC() says that it should return the highest number from a non empty list.

So, every method is doing exactly what it is supposed to be doing, and yet the net effect is a runtime error. Despite the fact that maxC() is the one being betrayed we may decide that the correct way to fix the program is to change maxC() rather than listBuilder(). This is typically the case when we realize that our program needs to handle a wider range of inputs.

This type of bug cannot be detected by a testing maxC() in isolation. We need a higher level test, to expose the bug.

One final comment: Earlier I defined m()'s contract to be: "inbvoke maxC() on whetever listBuilder() returns" . I guess that some people may claim that this is not a valid contract for a method since it specifies implementation details rather than taking a black-box approach for specifying m()'s behavior. I am sorry to disappoint you guys, but such an approach will only work with simple programs. If your program is doing complicated computations you will not be able to express the contract (purely) in terms of input- vs. output- data. The input data will undergo such complicated transformations that you will have to use some form of abstraction, which is exactly what I did. Given that abstraction is the process of ignoring some details, there is an inherent lose of precision when one tries to formulate the contracts of elaborate computations.

Reuse: A goal or a by-product?

I found this gem in the comp.object mailing list:
"Re-use is a happy side-effect of good designs. The primary goal is managing dependencies between modules, by making them more pluggable"

I like this statement because it captures an important insight that is too often overlooked due to us following common conceptions without questioning them. Reuse is not a goal. Reuse is a trait of source code that usually goes hand in hand with good programs. This does not mean that a high degree of reuse is a must for achieving high quality code.

If your program's complexity is well controlled then you're in a good position regardless of the amount of duplicated code therein. Conversely, having no duplications does not mean that you successfully tamed complexity. One reason for that is that highly-reused modules are also highly-coupled: even a tiny a modification in a (highly reused) module is likely to break a good fraction of its many clients. This is sheer mathematics: the more clients you have, the greater the chances that you break something.

Degradation of locality is also a risk: when you reuse (for instance by means of inheritance), the various pieces making up your class are actually defined outside the scope the class. This will make it difficult to understand what the class is doing because it's actual definition is scattered across several locations.

These ideas are not new. They were already expressed by Riachard Gabriel in his book "Patterns of Software".

Gabriel coined the term "compression" to describe reuse within a program. In the very first chapter, "Reuse vs. Compression", he says:
"Compression is a little dangerous because it requires the programmer to understand a fair amount about the context from which compressed code will take its meaning.

...The primary feature for easy maintenance is locality: Locality is that characteristic of source code that enables a programmer to understand that source by looking at only a small portion of it. Compressed code doesn’t have this property, unless you are using a very fancy programming environment."

Java Scripting

Last week I devoted the time for going through Steve Yegge's talk on Rhino (Rhino = " open-source implementation of JavaScript written entirely in Java"). In particular, I liked the idea that thanks to the scripting support of Java6, Javascript is "...finally becoming the scripting language for Java".

Naturally, I wanted to see how easy it is to make Java scriptable. So I sat down and played with the Java scripting engine and my answer is this: very easy!

package rhinodemo;

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Factorial {
private final int n;
private int result = 1;

public Factorial(int n) {
this.n = n;

public int getResult() {
return result;

public Factorial compute() {
if(n >= 1) {
Factorial that = new Factorial(n-1);
result = n * that.getResult();

return this;

public static void main(String[] args)
throws Exception {
ScriptEngineManager mgr
= new ScriptEngineManager();
ScriptEngine engine
= mgr.getEngineByName("JavaScript");

"importClass(Packages.rhinodemo.Factorial);" +
"importClass(Packages.junit.framework.Assert);" +
"" +
"Assert.assertEquals(24, " +
" new Factorial(4).compute().getResult());" +

What we have here is a Java class, Factorial, that computes n!. The protocol of this class is deliberately cumbersome: I wanted to have a real class with a real constructor and two meaningful methods, and not just one static method. The main() method executes Javascript code that tests the Factorial class using JUnit's Assert class (you need to have junit.jar on your classpath).

Even this short program demonstrates a few key points:
  • Accessing Java libraries from Javascript
  • Passing Javascript objects to Java code
  • Creating instances of Java classes from within Javascript

An Arithemetic Expressions Solver in 64 Lines

A few months ago Martin Fowler published a post titled Parser Fear. As always (well, almost always) I found myself agreeing with his main point: people are often intimidated by parsers.

Fowler's post advocates the use parser generators (in particular, ANTLR). I'd like to go a little further and make the following points:
  • Writing a recursive decent parser is much easier than you think.
  • You don't need a parser generator to overcome your parser fear.
  • Get over the trauma induced by the compiler class you took.

Actually I only need to show a good argument for point #1. The other two points naturally follow.
I will make my case by using the well established technique of proof by example: A 64 lines Java class that evaluates arithmetic expressions supporting +, -, *, /, negation, and parenthesis.

public class CompactSolver {
private String input;

public CompactSolver(String input_) {
input = input_.trim();

private char peek(int offset) {
return offset >= input.length() ? '\0' :

private boolean consumeIf(char c) {
if (peek(0) != c)
return false;
return true;

private String consume(int numChars) {
if (numChars == 0)
throw new RuntimeException("syntax error");
String result = input.substring(0, numChars);
input = input.substring(numChars).trim();
return result;

public double eval() {
double lhs = mul();
if (consumeIf('+'))
return lhs + eval();
else if (consumeIf('-'))
return lhs - eval();
return lhs;

private double mul() {
double lhs = unary();
if (consumeIf('*'))
return lhs * mul();
else if (consumeIf('/'))
return lhs / mul();
return lhs;

private double unary() {
if (consumeIf('-'))
return -unary();

if (consumeIf('(')) {
double result = eval();
if (!consumeIf(')'))
throw new RuntimeException("Missing ')'");
return result;

return literal();

private double literal() {
for (int i = 0; true; ++i)
if (!Character.isDigit(peek(i)))
return Integer.parseInt(consume(i));

Update 1: Changed the precedence of the (unary) negation operator (Many thanks to Orlando).

Examining Command-Query-Separation

I have already written at least once on the command-query-separation (CQS) principle. Today, I would like to show a class which benefits from NOT following this principle.

So, here is a snippet from a class that parses and evaluates arithmetic expressions.

public class Solver1 {
private String input;

public Solver1(String input_) {
input = input_.trim();

private boolean consumeIf(String s) {
if (!input.startsWith(s))
return false;
input = input.substring(1).trim();
return true;

public double eval() {
double lhs = multiplication();
if (consumeIf("+"))
return lhs + eval();
else if (consumeIf("-"))
return lhs - eval();
return lhs;

private double multiplication() {
double lhs = unary();
if (consumeIf("*"))
return lhs * multiplication();
else if (consumeIf("/"))
return lhs / multiplication();
return lhs;

Looking at consumeIf() we note that it violates the CQS principle. This method is a query (it computes and returns a result - either true of false) which is also a command (it changes the value of a field). Let's see the looks of the code after decomposing consumeIf() into two methods: a query and a command:

public class Solver2 {
private String input;

public Solver2(String input_) {
input = input_.trim();

private boolean nextIs(String s) {
return input.startsWith(s);

private void move(int numChars) {
input = input.substring(numChars).trim();

public double eval() {
double lhs = multiplication();
if (nextIs("+")) {
return lhs + eval();
else if (nextIs("-")) {
return lhs - eval();
return lhs;

private double multiplication() {
double lhs = unary();
if (nextIs("*")) {
return lhs * multiplication();
else if (nextIs("/")) {
return lhs / multiplication();
return lhs;

There is a 25% increase in code size, 42 vs. 33 lines of code, but I will put this aside because I think that #LOC is not a very good estimate of code quality. (Actually, I don't believe in any code quality metric)

Nonetheless there is a another issue which is much more significant: in the second program we have this recurring idiom of calling nextIs() and then calling move(), as in: if (nextIs("/")) { move(1); ... }. These two calls are mutually dependent: the parameter passed to move() must be the length of the string passed to nextIs(). This idiom is error-prone as the programmer may accidentally change one of the parameters while leaving the other intact.

This shows that arbitrarily applying the CQS principle may yield bad results. A good programmer should be aware of that and should make judgment calls rather than blindly following principles. In fact, this happens a lot with many other programming principles. They are usually much closer to guidelines (and therefore require judgment) than to absolute rules. I think that Lars summarized it pretty well: "I generally dislike '-isms' because they set up rules that people try to apply without understanding the preconditions that need to be met."

Evaluating a Language

I decided to try Ruby. Don't get me wrong. I had already written a few things in Ruby. But today I was a bit more serious about it.

My motivation was to get some more insight regarding the never-ending static vs. dynamic typing debate. So, this morning I wrote a Java program (the standard arithmetic expressions parser/evaluator) and then I tried to write it in Ruby. Unlike previous occasions I actually took the time and searched the web for a real Ruby IDE. After a few disappointments (#1, #2 and #3 - it was on my Windows machine, maybe they work better w/ Linux) I settled on NetBeans.

My Ruby skills felt somewhat rusty so I decided to warm-up by TDDing an easy kata. After the third or fourth test case I noticed a very funny thing: Ruby was much easier than I remembered. Suddenly I didn't feel so rusty. Thinking about it I realized that this was not due to the language but rather due to the IDE. In all past attempts I was too lazy to install an IDE so my Ruby experience was always through a text editor and a command prompt. Once I had an IDE at my disposal (and I mean a serious IDE: completion, go to declaration, templates, the works) the learning curve has flattened.

For example, I wasn't falling back to Java's curly braces as much as I used to in the past, because pairs of def/end keywords were being injected by the IDE as I was typing; Fixing the (highly non-friendly) syntax error: "unexpected $end, expecting kEND" was much less painful since I had this nice red thingy on the editor's side bar; Determining whether an expression such as assert_equal(0, x.get_result) can also be written as assert_equal 0, x.get_result was a breeze.

The morale here is two fold. First, there is the trivial issue of WYSIWYG tools making it possible even for a complete beginner to get things done. Second, you can't really evaluate a language without an IDE. When you evaluate a language you are unconsciously comparing it against your programming experience with your current, favorite, language. For sure, you have all the tooling support you need for your current language, so unless you get some tooling support for the new language, the comparison is unfair. Once you set up an IDE for the new language your programming experience will (dramatically) improve.

I guess there is a deep psychological issue here. We are humans. We always try to do as little work as possible. When we try a new language we tend to download the least amount of tools. We rationalize it by saying something like: "well I am a seasoned programmer. I don't need an IDE/Debugger/whatever to get the feeling of the language". This is a risky -- and usually a plain wrong -- assumption.

Whiteoak 2.1

This week, Whiteoak 2.1 has been officially made public. This release includes a command line compiler, an Eclipse plugin, a new mailing list and a top-notch, AJAX-powered web-site.

Whiteoak in a nutshell

The Whiteoak programming language is an extension to Java that introduces a new degree of flexibility into the language.

Motivating the development of Whiteoak is the understanding that
due to extensive use of libraries, components and external data sources, the developer has limited control on how the objects in her program are created. The developer then spends a great deal of her time just gluing together systems of objects using all sorts of fancy patterns.

Whiteoak addresses this predicament by the introduction of structural types, thereby allowing abstraction over objects. This mechanism nicely complements the well known inheritance mechanism which allows abstraction over classes. Specifically, Whiteoak's key features are:
  • Structural subtyping (AKA: duck typing, structural conformance)
  • Introduction of default, override-able, method implementations to existing objects
  • Virtual fields and constructors
  • Anonymous types
  • Type expressions, with mixin-like or traits-like semantics

Kent Beck Redefines Simplicity

I just finished watching Kent Beck talking on Ease at Work. In part 6 (out of 7) he answers a question from the audience and makes this occasional remark which gives a fresh perspective on the issue of simplicity in software:
"Programming so that you can feel like a hero is a whole lot different than programming so that you have a program"

The Axioms of Programming

  1. There's no fundamental difference between programming and designing.
  2. Super-linearity: Programming is super-linear by nature.
  3. Instability: The external behavior of a component will need to change over time.
  4. Discontinuity: Small changes yield non-proportional results. (Gives rise to the Butterfly Effect: A human cannot predict the full effect of a change in the program. Automatic tools are pretty bad at it as well).
  5. Entropy: The total entropy of the code will naturally increase. One needs to invest external energy (time) in order to decrease entropy (Writing ugly code takes less time than writing clean code)
  6. The Uncertainty Principle: The more detailed the specification the more error-prone it is.
  7. Unpredictability: An accurate estimation of development time requires time that is proportional to the estimate itself.

Quoting Allen Holub

Found this piece by Allen Holub that makes the same point as the one I made in this post: design is all about trade offs.
Design, by nature, is a series of trade-offs. Every choice has a good and bad side, and you make your choice in the context of overall criteria defined by necessity. Good and bad are not absolutes, however. A good decision in one context might be bad in another. If you don't understand both sides of an issue, you cannot make an intelligent choice; in fact, if you don't understand all the ramifications of your actions, you're not designing at all. You're stumbling in the dark

Who Cares?

From time to time you see people who passionately argue that code such as this:

// Version 1
public static String[] findClasses(String jarFile)
throws Exception
List<String> list = new ArrayList<String>();
ZipFile f = new ZipFile(jarFile);
Enumeration<? extends ZipEntry> entries
= f.entries();
ZipEntry ze = entries.nextElement();
String name = ze.getName();

for(Iterator<String> i = list.iterator();
i.hasNext(); )
String name =;
int pos = name.lastIndexOf('.');
String suffix = "";
if(pos >= 0)
suffix = name.substring(pos + 1);


String[] arr = new String[list.size()];
for(int i = 0; i < arr.length; ++i)
String name = list.get(i);
name = name.replace('/', '.');
arr[i] = name;

return arr;

... is evil because it can be rewritten in a much more compact way, like this:

// Version 2
public static String[] findClasses(String jarFile)
throws Exception {
List<String> list = new ArrayList<String>();
Enumeration<? extends ZipEntry> entries
= new ZipFile(jarFile).entries();
while(entries.hasMoreElements()) {
String name = entries.nextElement().getName();
list.add(name.replace('/', '.');

return list.toArray(new String[0]);

In this post I don't want to debate whether the latter version is indeed better than the former. My claim is quite different, and can be summarized in two words: Who Cares ?!

Let me put it this way: even if we assume, for the sake of the argument, that version 2 is better than 1, there's still no need to refactor version 1. This is because version 1 is designed/implemented in such a way that its quality does not affect the quality of the program as a whole.

Here a few key observations about version 1:

  • It relies (solely) on classes from Java's standard library. It does not depend on application code.

  • It does not change the program's internal state. Specifically, the method does not change its input object (the jarFile variable); it is a static method so it has no instance fields; and, the method does not touch any static fields.

  • The method's code does not impose any restrictions of its own on the input object. The only restrictions (such as: jarFile must specify the path to an existing, readable, legal jar file) come from Java's standard library classes to which the input is fed.

These observation point out that this method is actually quite close to a library method. A library method never depends on application code; Library code has no access to the program's internal data structure (except for what it receives as inputs); Library code will not realize your program's business logic simply because it is not aware of your program.

If version 1 resembles a library method, why don't we turn into an actual library? After all this would make it reusable in different contexts. So, we move the method into a dedicated class which we then compile into a jar. We add the jar to our program's class path.

We now notice a very strange thing: we no longer care how version 1 is written. This is because we never care about the inner workings of library methods or classes - we are interested only in their external behavior. The key issue is what the library does, not how it does it. This is a direct result of the fact that a library hides implementation details and exposes only interfaces.

At this point a light-bulb goes off over your head. You realize that you don't need to bother yourself with creating artificial libraries. You just need to keep in mind the familiar principle of thinking in terms of API design. This means, among other things, that you force yourself to program against the API of your own modules.

(I am not saying that the API of this findClasses() method is perfect. The point is that both version 1 and version 2 offer the exact same API so they are virtually identical, API-wise).

One may err and think that we found a magic cure to all ill-designed programs: "Hey, I just need to treat a crappy piece of my code as a module with an API and my program's quality will rise".

Clearly, this is a misunderstanding of the process: not every piece of code can be treated as a module. In this example, the key issue was that our method presented library-like characteristics from the very beginning. The right conclusion is therefore this one:

if you have a well-isolated piece of code -- no dependencies; minimal, predictable side effects; no domain knowledge; a crisp interface -- then you can write it anyway you want.

Actually, there is an even deeper theme that runs throughout this post. It goes like this:
if you have a well-isolated piece of code, then you have a well-written piece of code. Stop fiddling with it.

A Solid Definition for Quality in Software

Is there a criteria for distinguishing well-written (or well designed) programs from ill-written programs?

Here is what may very well be the best definition for quality in software:

A well-written program is a program where the cost of
implementing a feature
is constant throughout the program's lifetime.

The rationale is simple - if it gradually takes more and more time to add a feature to a program, you'll ultimately reach a point where it is just too expensive to change the code. This is plain economy: past that point it won't be worthwhile to add even the tiniest feature and the project will practically stall.

This was the intuition. To make things more precise, let me explain the basic terminology:
  • Cost: time needed for programming. In this post I will speak about the total cost of the program and about the cost per feature.
  • Feature: A change in the external behavior of the program whose completion can be determined objectively. For the purposes of this discussion there is no need to distinguish between adding something new, changing something old or fixing something that is broken. We will collectively refer to these activities as "implementing a feature". This definition coincides with the definition of a story in the Extreme-Programming world.

The last term that needs to be precisely defined is "constant". Providing a good definition for this term is hard because development costs tend to fluctuate so we need to find a definition that can tolerate fluctuations. In short, the challenge boils down to this: If it takes 10 hours to implement feature #1 and 12 hours to implement feature #2 does this mean the cost is not constant?

The best way to abstract away these natural fluctuations in costs, is to use the big-O notation that is typically used for expressing the performance of algorithms. Using this notation we obtain the following precise definition of software quality:

A program is considered to be well-written if its total cost function behaves like o(n) where n is the number of features in the program

Note that a total cost of O(n) reflects an O(1) cost per feature (amortized), so we can rewrite this definitions in terms of a per-feature cost.

Why do I like this approach for quality? Four reasons:

First, it can be objectively measured.

Second, it is normalized: this approach establishes a standard scale by which program can be measured and compared. For example: a total cost of O(n*n*n) is certainly worse than O(n*n); or: a total cost O(n*log n) is usually a close enough approximation of O(n). In short, we can quantify how far are we from the "good" end of the scale.

Third, this measure of quality captures the bottom line: value delivered to clients. Many software metrics capture artifacts of the development process (hours worked, lines of code, classes written, method per class, etc.). Such metrics only testify to the quality of the process. They do not testify to the quality of the outcome. In order for these metrics to be useful one should show that the measured artifact is correlated with properties of the outcome. This rarely happens. The definition presented here largely ignores the process. It cuts to the chase and measures the outcome.

Finally, this approach acknowledges the fact that a program is a complex system. It does so by measuring the program as a whole. It is difficult (or even impossible) to estimate the quality of a program from the quality of its individual sub-parts. Here are a few examples to illustrate the pitfalls of measuring software quality in pieces:
  • Example #1: Suppose I have a piece of code that - when examined in isolation - turns out to be really crappy. Somewhere in the program there is an excellent suite of tests that cover this piece from every possible angle. This suite makes the piece much more manageable than a similar well-written piece that has no tests.

  • Example #2: Sometime bad code can be located in unimportant modules. Imagine a program that realizes sophisticated analysis on data coming from CSV (comma separated values) files. The input module of this program reads the input file and populates an in-memory matrix. Even if this input module is badly written, its effect on the overall quality of the program is minimal. At any point I can replace the implementation of the input module because it has a crisp interface (the said matrix) that isolates it from the rest of the code.

  • Example #3: The problem that my program needs to solve can be nicely expressed as a Domain Specific Language (DSL). Thus, I define a DSL (an external DSL to be precise) , write an interpreter for this DSL and then implement the solution to the problem by writing a program in that DSL. If the DSL is a good DSL (that is: solutions to domain problems can be easily expressed in it) the quality of the whole program is high regardless of the quality of the interpreter.

The definition of quality presented here is quite close to the notion of RTF as coined by Ron Jeffries. However, the definition here is a bit more relaxed - it does not require tested features. This is because constant delivery of features requires constant refactoring (you can't plan everything in advance). And, constant refactoring requires testing - otherwise you will not be able to detect incorrect refactoring steps an you'll end up introducing bugs into the program which sooner or later will slow you down.