Software Formulae are Illusive

This post started as part 4 of my "mythbusting" mini-series (part 3, part 2, part 1). After a few iterations of writing I realized it deserves to have its own title.

The mini-series focused on static detection of tangled packages. This post examines a somewhat broader issue. I'll argue that in general we cannot fully understand our programs. Therefore, we find it difficult to formulate recipes for writing high quality programs.

The heart of programming

Software construction is the task of turning some vague description of "something" into real, precise, unambiguous code. In other words, turning the informal into formal is central to our profession.

Informal = any sort of description: from a wild thought running through a bright mind, on to user stories, to use cases, to requirement documents, to whatever.
Formal = An unambiguous sequence of computations

Once you made the transition from informal into formal you have a program. Church's thesis tells us that it does not matter what language you use. In fact, there's no such thing as a formal specification which is not a program: If it is formal (that is: unambiguous) then there is a way to write an interpreter that will run it.

Note that almost-formal is actually informal. If there's even a single step left undefined, then the behavior of the whole program is undefined. You cannot predict what the computer will do when it tries to execute this undefined step. In principle, this is similar to the well known rule "if your premise is false, you can prove any conclusion": If I assume that zero equals one I can then prove that my cat is a windshield wiper.

Side note: sometimes one can analyze the formal parts and prove that the informal parts are meaningless. If you manage, for example, to show that the computer will never reach these undefined steps, then you may still reason about the well defined ones. Sadly, there is no general way to do it. Each time you'll have to devise a proof that suits the problem at hand.


What meaningful things can we say about a software project?

If one tries to reason about the informal side of the project (the vague description), there is almost nothing to hang on to. The informal description is just too fuzzy. It can be interpreted any way you want. It can mean anything and nothing at the same time. In a way, it is too promiscuous. So, you can say a lot of things, but not meaningful ones.

On the other hand, if you examine the formal side (i.e: the program) there is very little you can say. This is not just a statement. This is a real, rock solid, proof (Turing, Rice). You don't know if a program halts. you don't know if it is equivalent to some other program. You don't know if it produces a certain output.

Let me put it this way: thanks to Turing, we have a mathematically-sound proof which shows that we can have very few mathematically-sound proofs about programs. Almost like a catch-22.

How come reasoning about programs is so hard? Even without getting into the details of Turing's proof I can say that the proof relies on the fact that a program can be thought of as a machine that delivers some useful service. This service may be another program. A quick example: the JVM is a program that runs a Java program. So, in some way, we have a machine that may contain *a similar machine*. I don't think that such machines exist outside the world of programming. This amazing property is due to the fact that every programming language is expressive enough to implement every other language. This property lies at the heart of Turing's proof.

What are the devices that make programming languages so powerful? Surprisingly it does not take much to become a fully fledged programming language. You need: (a) an if()-like construct so that you can make decisions; (b) capacity to compute new values from old ones; (c) Loops and/or recursive functions; (d) dynamically allocated memory and/or dynamically created functions (a-la closures).

Summarizing the issue of reasoning we see that there very are few intelligent things one can say about a software project. The informal description is too promiscuous. The formal leaves very little reasoning space. Either way, you loose. There's also no middle ground: an almost formal is still informal. Bugger!

I am a programmer. How is this discussion related to my day job?

Here's how. Every now and then someone is trying to sell you some formula with the claim that it will make your programs better. In its simplest form the formula is expressed as an imperative: "avoid classes with more than (say) five fields" (which doesn't make much sense but I've seen it once), but it can also take other shapes and forms. Here is a more delicate paraphrasing of the same formula: "minimize your program's score on the FPC (Fields-Per-Class) metric". Sometime the formula is embodied within a software engineering tool that scans the code and detects "problems" of many sorts.

Whenever I hear about a formula I try to confront it with the things that make it so hard to reason about programs: (a) a program can contain other programs; (b) the same thing can be expressed in several different ways. Specifically, I take the idiom (or pattern or construct) whose usage is discouraged by the formula. and I try to find ways to emulate this idiom.

To make things concrete, let's go back to our "avoid classes with more than five fields" formula. How can we emulate fields without defining fields? Easy. Define a single field which holds a property list. This will minimize the number of fields, where at the same time will allow the object to hold as many fields as you want.

Having found this weakness we can devise a precondition to guard against it: "The formula is viable only with classes that do not rely on property lists". This is the simplest precondition I could think of. Alas, property lists are quite useful, so this formula may be much less universal than it seems. Clearly, someone else may come up with a less restrictive precondition. I found that it is not that easy.

Just for plain fun, let's quickly go over one more formula. If you need to restrict the number of classes you can use a single class with several closures. Thus, you can configure the behavior of individual instances just as if they came from different classes.

Final thoughts

One may argue that these counter examples are benign since they will only occur if a "clever" programmer attempts to game the system. I disagree. Once a team accepts a formula, the formula will affect the code that the team produces. The programmers will try to satisfy the formula, but not because they want to game the system. On the contrary: they want to produce the best software in the world. The formula provides them with the notion of "good", so they will try to satisfy it out of good intentions (see what Joel says about arbitrary incentives). As always, it all boils down to the lack of an absolute-, measurable-, objective- notion of software quality. Sadly, every single formula that I saw was very far from the real thing.

I think we humans ought to be humble. our ability to reason about software projects is limited. As a direct result, our ability to devise software development formulae is also limited.

Finally, there is some cute irony here. The things that make it impossible for a compiler to determine whether our program halts (or: produces the desired output, etc.) are also the things that make it difficult for humans to devise general software formulae. This similarity is not coincidental: reasoning about programs is provably difficult. You'll see its implications in all sorts of places.

Update 1: Added the "Final thoughts" section

1 comments :: Software Formulae are Illusive

  1. Thought provoking! Your key point--we are limited in our ability to reason about software, and therefore software formulae--is very important ... and not recognized widely enough.

    I would challenge the notion that the best metaphor for writing software is mathematical, though. This metaphor implies that the only audience for the code is the compiler. Instead, I find the literary metaphor to be very helpful. This implies that my code is being read by my fellow developers, and by golly if they can't understand it, then my code is crap. Even if it compiles and passes automated and ad hoc tests.

    A lot of the "formulae" focus on making code more readable. For example, "method is too long" or "cyclomatic complexity score is too high" probably indicate that the code is hard to read and understand. And if that cannot be proven mathematically, so what? Good code is good literature. And I can't prove that Dickens' Christmas Carol is better literature than a transcription of a talkative Alzheimers patient, either. So I don't reject the formulae out of hand; I just try to apply them with some nuance.

Post a Comment