Is Your Program Greener or Longer?

Q: Is a cucumber greener or longer?
A: Greener. It is also green sideways.

Most programs can be viewed from several different perspectives. If we take a web application built around the MVC architecture, we can view the program as an assembly of three subsystems: the model, the view, and the controller. At the same time, we can also see the program as an assembly of pieces of code, each one dedicated to a web page.

This dichotomy is not uncommon. In each program one can see several orthogonal decompositions. Although the decomposition into packages/classes is very dominant, other conceptual decompositions are just as viable.

For the rest of this post I'll concentrate on two decompositions. The architectural decomposition sees the code as a collection of subsystems, such as: data access, user interface, security, persistence. The functional decomposition sees the program as a collection of features (user-visible functionality) such as: login, register, delete, submit, search, ...

We can make the reasonable assumption that in most programs the number of features is far greater than the number of subsystems. Also, to keep things simple I will assume that each feature "touches" each subsystem.

(Although it is very difficult to provide an algorithm that identifies all features or all subsystems in a program, most programmers can usually recognize, by intuition, both the architectural and the functional decompositions of their program.)

The big question is this: what's the easiest way to develop a program? Is it feature by feature, or subsystem by subsystem?

This question may seem to make as much sense as asking whether a cucumber is greener or longer. If we have 40 features, 4 subsystems, and every feature contributes 100 lines of code to each subsystem, then it does not matter whether our work is split subsystem-wise or feature-wise. Either way the total effort will be 40 times 4 times 100 = 16,000.

I think that the effort model employed in the last paragraph is flawed. I believe that the effort required to complete a programming task is greater than the combined effort of completing two smaller tasks that add up to the bigger one. In computer-science lingo, this means that:

the effort function is super linear

If f is a super-linear function then f(a+b) > f(a) + f(b). An obvious example is sorting: sorting an array of 1,000 elements is more work than sorting 10 arrays of 100 elements. I believe that the development effort function is super-linear. In fact, it seems to be closer to a quadratic function than to a linear one.
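The sorting example can be checked with a quick calculation. Here is a minimal sketch (comparison counts approximated by n·log₂n, which is only an estimate for a real sort):

```java
// A back-of-the-envelope check of super-linearity for an O(n log n) sort:
// one 1000-element sort costs more comparisons than ten 100-element sorts.
public class Superlinear {
    static double cost(int n) {              // ~comparisons of an n log n sort
        return n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.printf("one 1000-sort: %.0f, ten 100-sorts: %.0f%n",
                cost(1000), 10 * cost(100)); // ~9966 vs. ~6644
    }
}
```

The gap only widens if effort is closer to quadratic: one task of size 4000 costs 16,000,000 "units", while 40 tasks of size 400 cost 40 × 160,000 = 6,400,000.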

This means that it will take less effort to complete a program if your programming tasks follow features (many small tasks) rather than subsystems (few large tasks). The project will seem harder if developed according to the architectural decomposition.

The reasoning suggested here is not a mathematical proof. It is merely a phenomenon I have witnessed. Nonetheless, the effort-is-super-linear thesis provides an explanation for issues such as: the success of TDD; the small-tasks practice of agile methodologies; the effectiveness of REPL tools; and more. So, although we cannot prove this thesis, it may very well be one of the axioms of programming.

The Democratic Approach to Programming

Greg Stein published this post (see also here) which links to a 1996 question by one Larry Page. In the question Mr. Page complains that Java's HTTP client API does not allow him to set the User-Agent field of requests. Although this may be a hoax (conspiracy fans may even say it was planted by someone at Google who does not like Java), it certainly raises a few thoughts.

It may be seen as yet another argument in favor of dynamic typing/monkey patching. I'll put this issue aside; I think the thin-interface perspective is just as interesting.

With thin interfaces an object provides only a small set of services. This leads to layering of abstractions: A new abstraction provides a new set of services on top of existing ones. We end up with a hierarchy of abstractions, each one tailored for a different type of usage.

With fat interfaces the API tends to evolve into a single uber-abstraction. As this debate shows, this one-size-fits-all approach may be either good or bad, depending on your point of view. Larry Page witnessed the negative side: according to his question, the underlying abstraction failed to provide the service he needed. It is as if you were given a big machine to help you do something, but it does not do exactly what you need. You can either disassemble the machine and rebuild it, or build a new machine from scratch. Either way, it's a lot of work (after all, it is a big machine). If you already rely on this machine for other purposes then even replacing it will be expensive. As Jake Blues puts it, you find that "you're really up Shit Creek".

It seems that Shit Creeks like this are quite rare with thin interfaces. Instead of one abstraction with many services you have many abstractions, each providing a small set of services. This pluralism of abstractions promotes modularity because you can assemble them according to your needs. Tweaking is easier since you're not as bound by contracts: each contract is relatively small.

As a direct result, thin interfaces put you, the client, in charge. This is a more democratic approach to programming. Let's spread the power around.
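The layering idea can be sketched in a few lines of Java. All the names here (RequestSender, UserAgentSetter, the transport stub) are invented for illustration and do not come from any real HTTP library; the point is only that a thin contract lets the client stack a new service, such as setting User-Agent, on top of an existing abstraction instead of rebuilding the machine:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A thin contract: one service, easy to wrap.
interface RequestSender {
    String send(Map<String, String> headers, String url);
}

// Stands in for the real transport layer.
class TransportStub implements RequestSender {
    public String send(Map<String, String> headers, String url) {
        return "GET " + url + " " + headers;
    }
}

// A thin layer that adds exactly one service on top of an existing one.
class UserAgentSetter implements RequestSender {
    private final RequestSender next;
    private final String agent;

    UserAgentSetter(RequestSender next, String agent) {
        this.next = next;
        this.agent = agent;
    }

    public String send(Map<String, String> headers, String url) {
        Map<String, String> h = new LinkedHashMap<>(headers);
        h.put("User-Agent", agent);       // the service Larry Page was missing
        return next.send(h, url);
    }

    public static void main(String[] args) {
        RequestSender s = new UserAgentSetter(new TransportStub(), "MyBot/1.0");
        System.out.println(s.send(new LinkedHashMap<>(), "http://example.com"));
    }
}
```

The client decides which layers to stack; no single uber-abstraction has to anticipate every need.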

Source Code Quality Estimation is Hard

As you might have guessed I don't believe in metrics that estimate software quality by measuring source code. In the most general case they are bounded by this paradox. Even if not taken too generally, there are still considerable difficulties related to such metrics.

In order to avoid confusion and misinterpretation let me say that (a) the items on this list reflect current knowledge. Future research may find solutions to the problems therein; (b) I am not claiming here that every metric is bad nor that all reasons hold for all metrics.

1. Source code quality metrics will never be able to detect infinite loops.

In other words, you can get a good score for a program that does stupid things at run time.
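Here is a hedged illustration (the method and its bug are invented): a tiny, flat method that most source metrics would score well, yet it runs forever for any negative input:

```java
// This method is short and flat -- low cyclomatic complexity, tiny length --
// yet for any negative n it never terminates: the loop counter moves away
// from zero. No metric computed from the source text alone can flag this.
public class MetricsBlindSpot {
    static int sum(int n) {                  // intended: n + (n-1) + ... + 1
        int total = 0;
        for (int i = n; i != 0; i--)         // loops forever when n < 0
            total += i;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(3));          // a safe input; prints 6
    }
}
```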

2. Lack of normalization

Most metrics are not normalized: they do not have a scale by which scores can be compared. With a non-normalized metric the distance between 15 and 20 may be completely different from the distance between 25 and 30. In other words, if 100 is the perfect score, it may be that getting from 99 to 100 is a million times more difficult than getting from 98 to 99.

3. Indirectness: metrics do not measure the "real thing"

The real thing is software quality, which is a vague notion. For the sake of the discussion let's define quality as "number of bugs". Source code metrics do not measure the number of bugs directly. They measure artifacts which are only remotely related to the number of bugs.

Think about a 100-meter dash. Instead of determining the winner by directly measuring the running times of the contestants, you measure the number of practices they had in the last four weeks.

4. Blind spots

Source code metrics are mostly blind to DSLs, to the universal design pattern, and to a bunch of other useful techniques.

5. Detecting the negative

Many metrics work by collecting negative points. E.g., the cyclomatic complexity metric gives you negative points for branches in your code. The problem is that spotting "bad" things is not equivalent to spotting "good" things.

The first implication is that such a metric can only help you when you are doing something wrong. It does not tell you that you are doing something right. In a way, it leads to mediocrity.

Moreover, let's see what happens when you use a metric that detects the negative and also has blind spots (which, sadly, describes most metrics). Such a metric will encourage you to make the blind spots as big as possible. As the blind spots grow, the amount of negative things that can be detected lessens. The metric simply defeats its own purpose.
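A sketch of how this plays out with cyclomatic complexity (the dispatch example is mine): the table-driven version makes exactly the same decisions as the if-chain, but the branches have moved into the metric's blind spot, so the score "improves" while the code does not:

```java
import java.util.Map;
import java.util.function.IntUnaryOperator;

// The same dispatch written twice. A branch-counting metric penalizes the
// if-chain but sees almost no branches in the table version -- the blind
// spot got bigger, not the code better.
public class HiddenBranches {
    static int withIfs(String op, int x) {
        if (op.equals("inc")) return x + 1;
        if (op.equals("dec")) return x - 1;
        if (op.equals("neg")) return -x;
        return x;
    }

    static final Map<String, IntUnaryOperator> TABLE = Map.of(
            "inc", x -> x + 1,
            "dec", x -> x - 1,
            "neg", x -> -x);

    static int withTable(String op, int x) {
        return TABLE.getOrDefault(op, y -> y).applyAsInt(x);
    }

    public static void main(String[] args) {
        System.out.println(withIfs("inc", 5) == withTable("inc", 5));
    }
}
```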

6. Focus on things that can be measured

By definition, metrics ignore things that cannot be measured. Focusing only on the measurable makes as much sense as estimating the quality of a novel by counting the number of sentences per chapter.

Longer is Sometimes Better

The post that I swore never to write

(I never wanted to write this post. In fact, I wanted not to write it. I think it focuses on stylistic aspects, which are overrated. I do think that a clear style is important, but I don't think we should be religious about it. In other words, if a method's style gets an "A", I don't see any point in trying to raise its score to "A+". Writing another unit test will give a much higher ROI.)

A comment by Martin in response to one of my latest posts said that sometimes longer is better. Since people tend to prefer compactness, I think it may be worthwhile to mention some points in favor of length. Please bear in mind that I am not saying that longer code is better. Not at all. Shorter should be the norm, but it should not be followed religiously.

Width vs. height

Programmers often attempt to reduce a method's height (its number of lines) by consolidating two (or more) lines into one. If this attempt increases the width of the method (its number of columns), it actually makes the code more difficult to understand. This is because we don't read content left to right, line by line. Our eye movement follows the F-shaped pattern:

  • Users first read in a horizontal movement, usually across the upper part of the content area

  • Next, users move down the page a bit and then read across in a second horizontal movement that typically covers a shorter area than the previous movement

  • Finally, users scan the content's left side in a vertical movement

Here's how it looks:

[Heat maps of the F-shaped reading pattern. Copyright © 2006 by Jakob Nielsen. ISSN 1548-5552]

This means that we have a greater chance of reading things on the left-hand side of the screen. We can understand greater parts of the code if most of it is located closer to the left.

One may argue that the cited research studied users looking at web-sites rather than programmers looking at source code. Thus, maybe the F-pattern is not relevant WRT programming. This is a valid argument, but it needs to be verified by research. To the best of my knowledge (and I'd be glad to stand corrected), there is no support for this argument in the literature.


Chained method calls

You get a bug report from a user. The stacktrace shows a null pointer exception on a line like this:

result = m1().m2().m3().doIt();

You don't know which method call returned null. Was it m1(), m2(), or m3()? If each call were on a line of its own you'd have much more information. Sometimes this can make a huge difference.
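A runnable sketch of the difference (m1/m2 and the surrounding class are invented): when each call sits on its own line, the line number in the stack trace pinpoints exactly which intermediate result was null:

```java
// With each call on its own line, the NPE's stack trace line number
// identifies the culprit immediately.
public class SplitCalls {
    static String m1() { return "x"; }
    static String m2(String s) { return null; }   // the hidden culprit

    static int failingLine() {
        try {
            String a = m1();
            String b = m2(a);
            String c = b.trim();   // NPE is reported on this distinct line
            return -1;             // no exception: nothing to report
        } catch (NullPointerException e) {
            return e.getStackTrace()[0].getLineNumber();
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE at line " + failingLine());
    }
}
```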

A single-line if

Writing the body of an if() on the same line as the condition makes the code shorter but more difficult to debug:

for(int i = 0; i < getTop(); ++i)
  if(get(i).equals(b)) return;

Suppose you're interested in the return statement. Since you have a single-line if(), you cannot place a breakpoint on the return statement alone. Placing a breakpoint on the if() is not practical if getTop() returns a large value (even 50 is a large value in this context).

Another example:

void f(int n, int m) {
  int x = g(n);
  h(m, x);
  if(!isValid(x)) return;

  // do something useful
}
You are trying to find out why f() is not doing what it was supposed to do. So you step through f() and suddenly, surprise!, you find yourself in another method.

After some thinking you realize that you are back in f()'s caller. That's because the last thing you remember is that you were standing on "if(!isValid(x)) return;".

Now, here are your thoughts: "Aha! That's exactly what's wrong. I should not exit from f() under these circumstances. Is the problem in g() that computes x's value or in isValid() that checks it? Gee, I don't know. If only I knew what x's value is. Oh boy! I have to start the debugging session all over again"

This problem becomes a non-problem if you split the if() across two lines. During debugging you will have a visual indication of whether the if() succeeded.

Note that this discussion pertains not only to if()s but also to the ternary operator, "?:".
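Here is the split version of the search loop, fleshed out with stub data so it runs (indexOf and the sample list are illustrative):

```java
import java.util.List;

// The same search, with the body on its own line: a breakpoint can now sit
// on the return statement and fire only when the match is found.
public class SplitIf {
    static int indexOf(List<String> items, String b) {
        for (int i = 0; i < items.size(); ++i) {
            if (items.get(i).equals(b)) {
                return i;                      // breakpoint-friendly line
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf(List.of("a", "b", "c"), "b"));  // prints 1
    }
}
```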

Temporary variables explain the code

There is this refactoring step of Replace temp with query. Each application of this refactoring saves one line - the line where the variable was defined.

Martin Fowler is a very smart guy. He knows that reducing line count is not the only game in town. That's why immediately next to "Replace Temp with Query" he placed Introduce Explaining Variable. He knows that a variable that is used only once can still be useful: it explains the code, and it is worth having despite the extra line it occupies.

BTW, this duality is quite common in Fowler's Refactoring book. See, for example, Hide Delegate vs. Remove Middle Man. It seems that Fowler is well aware that every decision has its trade-offs.
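A small sketch of the explaining-variable side of that duality (the shipping rule and its numbers are made up for illustration):

```java
// Three extra lines, three named facts. The one-liner packs the whole
// condition into: return total * 0.95 > 100 && items > 3;
public class ExplainingVariable {
    static boolean freeShipping(double total, int items) {
        double discountedTotal = total * 0.95;           // explaining variable
        boolean bigEnoughOrder = discountedTotal > 100;  // explaining variable
        boolean enoughItems = items > 3;                 // explaining variable
        return bigEnoughOrder && enoughItems;
    }

    public static void main(String[] args) {
        System.out.println(freeShipping(200, 5));   // prints true
    }
}
```

Each intermediate value also becomes inspectable in a debugger, which ties back to the points above.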

Capturing return values

Many times I have found myself debugging a method without knowing its return value. This happens when the code is written by someone who wants to save a line. Instead of writing

int f() {
  // do something
  int result = g();
  return result;
}

He writes

int f() {
  // do something
  return g();
}

The problem with the latter form is that it hides f()'s return value. If this is the common style, then when you step out of f() you find that f()'s caller returns f()'s return value directly to *its* caller, which in turn...

This is so annoying. You have to step out of many methods, and when you finally get to inspect the value you realize that it is wrong - but by now you have lost the original stack frame. Think what happens if your code was called from a library whose source code is out of your reach.

(Although this is a special case of the previous point, it is related to events so annoying that it deserves a section of its own)


It should be noted that other than the first point (the F-shaped pattern) the points made here do not capture some cosmic truth about programming. They depend on the language, on the tools and probably on some cultural factors. You'll have to reconsider them when you switch to a new language/tool/team.

Finally, by all means, don't try to make your code longer. Try to make it shorter. But as you make it shorter, think about these points. They may help you avoid code that is too short.

Formalizing the Paradox

During the debate about static analysis I realized that my precise claim was not well understood. Let me make things clearer by formally proving that static type checking is at odds with quality estimation based on static detection of tangled packages.


Definitions

d(p) - Static dependency of a program p, measured as the number of tangled packages.

q(p) - Quality metric of a program p (there are almost no restrictions on it: it can be the number of defects, cost per feature, etc., calculated analytically or measured empirically).

"Equivalent programs" - Two programs that produce the same output for the same input.


Assumptions

[A1] Given two equivalent programs, x and y, if x's code provides more type information than y's, then the following holds: q(x) > q(y)

[A2] q(p) and d(p) are inversely correlated


The proof

[1] Given a statically typed program, p1, compute q(p1) and d(p1).

[2] Obtain a program p2 that upholds the following conditions: p2 is equivalent to p1; p2 has no tangled packages; p2 has less type information than p1. The transformation rules described here ensure the existence of such a p2. However, any p2 that satisfies these conditions is suitable.

[3] p2 has less type information than p1 (see [2]). Based on [A1] we have that q(p1) > q(p2)

[4] p2 has no tangled packages (see [2]) so necessarily d(p1) >= d(p2). Based on [A2] we have that q(p1) <= q(p2)

The conclusions of [3] and [4] form a contradiction: q(p1) > q(p2), q(p1) <= q(p2). Therefore we have that one of our assumptions, either [A1] or [A2] is wrong.

I am tempted to put a Q.E.D. here, but that is not the point. I do not think that either static type checking ([A1]) or static analysis ([A2]) is completely useless. However, this little proof shows that applying them indiscriminately is logically flawed. Universal statements such as "a tangle was formed, hence quality eroded" are ill-formed without additional reasoning and explanations.

Update 1: Changed the phrasing of [2] to explicitly express the requirements from p2.

Update 2: Removed an unnecessary step.

Software Formulae are Elusive

This post started as part 4 of my "mythbusting" mini-series (part 3, part 2, part 1). After a few iterations of writing I realized it deserves to have its own title.

The mini-series focused on static detection of tangled packages. This post examines a somewhat broader issue. I'll argue that in general we cannot fully understand our programs. Therefore, we find it difficult to formulate recipes for writing high quality programs.

The heart of programming

Software construction is the task of turning some vague description of "something" into real, precise, unambiguous code. In other words, turning the informal into formal is central to our profession.

Informal = any sort of description: from a wild thought running through a bright mind, to user stories, to use cases, to requirement documents, to whatever.
Formal = an unambiguous sequence of computations.

Once you have made the transition from informal to formal you have a program. Church's thesis tells us that it does not matter what language you use. In fact, there's no such thing as a formal specification which is not a program: if it is formal (that is: unambiguous) then there is a way to write an interpreter that will run it.

Note that almost-formal is actually informal. If there's even a single step left undefined, then the behavior of the whole program is undefined. You cannot predict what the computer will do when it tries to execute this undefined step. In principle, this is similar to the well known rule "if your premise is false, you can prove any conclusion": If I assume that zero equals one I can then prove that my cat is a windshield wiper.

Side note: sometimes one can analyze the formal parts and prove that the informal parts are meaningless. If you manage, for example, to show that the computer will never reach these undefined steps, then you may still reason about the well defined ones. Sadly, there is no general way to do it. Each time you'll have to devise a proof that suits the problem at hand.


What meaningful things can we say about a software project?

If one tries to reason about the informal side of the project (the vague description), there is almost nothing to hang on to. The informal description is just too fuzzy. It can be interpreted any way you want. It can mean anything and nothing at the same time. In a way, it is too promiscuous. So, you can say a lot of things, but not meaningful ones.

On the other hand, if you examine the formal side (i.e., the program) there is very little you can say. This is not just a statement; it is a real, rock-solid proof (Turing, Rice). You don't know if a program halts. You don't know if it is equivalent to some other program. You don't know if it produces a certain output.

Let me put it this way: thanks to Turing, we have a mathematically-sound proof which shows that we can have very few mathematically-sound proofs about programs. Almost like a catch-22.

How come reasoning about programs is so hard? Even without getting into the details of Turing's proof I can say that the proof relies on the fact that a program can be thought of as a machine that delivers some useful service. This service may be another program. A quick example: the JVM is a program that runs a Java program. So, in some way, we have a machine that may contain *a similar machine*. I don't think that such machines exist outside the world of programming. This amazing property is due to the fact that every programming language is expressive enough to implement every other language. This property lies at the heart of Turing's proof.

What are the devices that make programming languages so powerful? Surprisingly it does not take much to become a fully fledged programming language. You need: (a) an if()-like construct so that you can make decisions; (b) capacity to compute new values from old ones; (c) Loops and/or recursive functions; (d) dynamically allocated memory and/or dynamically created functions (a-la closures).
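These four ingredients fit in a dozen lines of Java (the examples themselves are mine):

```java
import java.util.function.IntUnaryOperator;

// The four ingredients in one place: (a) a decision, (b) computing new
// values from old ones, (c) recursion, (d) a dynamically created function.
public class Ingredients {
    static int factorial(int n) {
        if (n == 0) return 1;              // (a) decision
        return n * factorial(n - 1);       // (b) computation, (c) recursion
    }

    static IntUnaryOperator adder(int k) {
        return x -> x + k;                 // (d) a closure created at run time
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));              // prints 120
        System.out.println(adder(3).applyAsInt(4));    // prints 7
    }
}
```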

Summarizing the issue of reasoning, we see that there are very few intelligent things one can say about a software project. The informal description is too promiscuous. The formal one leaves very little reasoning space. Either way, you lose. There's also no middle ground: almost-formal is still informal. Bugger!

I am a programmer. How is this discussion related to my day job?

Here's how. Every now and then someone tries to sell you some formula with the claim that it will make your programs better. In its simplest form the formula is expressed as an imperative: "avoid classes with more than (say) five fields" (which doesn't make much sense, but I've seen it once), but it can also take other shapes and forms. Here is a more delicate paraphrasing of the same formula: "minimize your program's score on the FPC (Fields-Per-Class) metric". Sometimes the formula is embodied in a software engineering tool that scans the code and detects "problems" of many sorts.

Whenever I hear about a formula I try to confront it with the things that make it so hard to reason about programs: (a) a program can contain other programs; (b) the same thing can be expressed in several different ways. Specifically, I take the idiom (or pattern, or construct) whose usage is discouraged by the formula, and I try to find ways to emulate this idiom.

To make things concrete, let's go back to our "avoid classes with more than five fields" formula. How can we emulate fields without defining fields? Easy: define a single field which holds a property list. This minimizes the number of declared fields while still allowing the object to hold as many logical fields as you want.
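A minimal sketch of the trick (PropertyBag is an invented name, not a real API):

```java
import java.util.HashMap;
import java.util.Map;

// Gaming the "at most five fields" formula: one declared field, arbitrarily
// many logical fields hiding inside it.
public class PropertyBag {
    private final Map<String, Object> props = new HashMap<>(); // the only field

    public void set(String name, Object value) { props.put(name, value); }
    public Object get(String name) { return props.get(name); }

    public static void main(String[] args) {
        PropertyBag p = new PropertyBag();
        p.set("age", 42);           // behaves like a field, counts as none
        p.set("name", "Larry");
        System.out.println(p.get("age") + ", " + p.get("name"));  // prints 42, Larry
    }
}
```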

Having found this weakness we can devise a precondition to guard against it: "The formula is viable only with classes that do not rely on property lists". This is the simplest precondition I could think of. Alas, property lists are quite useful, so this formula may be much less universal than it seems. Clearly, someone else may come up with a less restrictive precondition. I found that it is not that easy.

Just for plain fun, let's quickly go over one more formula. If you need to restrict the number of classes you can use a single class with several closures. Thus, you can configure the behavior of individual instances just as if they came from different classes.
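A sketch of this second trick (again, the names are invented):

```java
import java.util.function.IntBinaryOperator;

// One class, many behaviors: each instance is configured with a closure, so
// two instances of the same class act as if they came from different classes.
public class Configurable {
    private final IntBinaryOperator op;

    Configurable(IntBinaryOperator op) { this.op = op; }

    int apply(int a, int b) { return op.applyAsInt(a, b); }

    public static void main(String[] args) {
        Configurable adder = new Configurable((a, b) -> a + b);
        Configurable multiplier = new Configurable((a, b) -> a * b);
        System.out.println(adder.apply(2, 3));       // prints 5
        System.out.println(multiplier.apply(2, 3));  // prints 6
    }
}
```

A class-counting formula sees one class here, no matter how many distinct behaviors are configured.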

Final thoughts

One may argue that these counter-examples are benign since they will only occur if a "clever" programmer attempts to game the system. I disagree. Once a team accepts a formula, the formula will affect the code that the team produces. The programmers will try to satisfy the formula, but not because they want to game the system. On the contrary: they want to produce the best software in the world. The formula provides them with the notion of "good", so they will try to satisfy it out of good intentions (see what Joel says about arbitrary incentives). As always, it all boils down to the lack of an absolute, measurable, objective notion of software quality. Sadly, every single formula that I have seen was very far from the real thing.

I think we humans ought to be humble. Our ability to reason about software projects is limited. As a direct result, our ability to devise software development formulae is also limited.

Finally, there is some cute irony here. The things that make it impossible for a compiler to determine whether our program halts (or: produces the desired output, etc.) are also the things that make it difficult for humans to devise general software formulae. This similarity is not coincidental: reasoning about programs is provably difficult. You'll see its implications in all sorts of places.

Update 1: Added the "Final thoughts" section

Mythbusting - Part 4

This post has moved here. My apologies for the inconvenience.