The Essence of Programming

What's the difference between using a calculator and writing a computer program?

I argue that the fundamental difference is in mechanisms for reuse.
If you were to write the simplest program that computes n!, your code would look like this:

if (n == 1)
    return 1;
if (n == 2)
    return 2;
if (n == 3)
    return 6;
if (n == 4)
    return 1*2*3*4;
if (n == 5)
    return 1*2*3*4*5;
...


This is very simple. If you are writing computer programs like this, then you are essentially using your computer as a calculator: anyone can write such a program, because you simply specify the output for every possible input. The code is also very easy to understand. However, it would take many days to write a program (in this style) that computes n! for every n between 0 and 10000. Moreover, if you need to change something (for example, compute (n-1)! rather than n!), you have to make many changes, which again takes a lot of time.

Fundamentally, the power of a computer program over a calculator comes from loops. We can take a sequence of computations and execute it repeatedly, many times over:
int r = 1;
for (int i = 1; i <= n; ++i)
    r *= i;
return r;


Wow! We wrote only 4 lines of code, and yet we now have a program that computes n! for practically every value of n. (Clearly, this also implies the notion of a variable; otherwise the body of the loop would compute the exact same value every time.) The downside is that this program is a bit more complicated: it takes some skill to write and to understand the code.

This loop is an example of a very simple reuse mechanism: instead of duplicating the computation of n! for every possible value of n, we repeatedly use the expression r *= i (with a different set of values in the r and i variables) so that it computes the correct result for us. This illustrates the basic property of code reuse: write once, use many times.

(Side note: the full explanation of the importance of loops goes back to the seminal works of Turing and Church: a programming model cannot be Turing complete if it does not support loops or recursion.)

Of course, there are many other mechanisms for reuse: functions, inheritance, mixins, traits, genericity, and the like. As soon as you start using these, the complexity of the code rises. With functions you can rewrite the factorial program as follows:
function factorial(n) {
    if (n == 1)
        return 1;
    else
        return n * factorial(n - 1);
}

Now the code is more elegant (that is, we reuse a bigger part of it), but again it becomes more difficult to write and read. Someone who has just started learning programming will probably be more comfortable with the loop version than with this recursive version.
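Genericity, mentioned above, follows the same write-once-use-many-times pattern: a single generic definition serves many types. Here is a minimal sketch in Java (the class and method names are mine, purely for illustration):

import java.util.List;

// A single generic method is written once and reused for lists of any
// element type, instead of being duplicated for strings, integers, and so on.
class Generics {
    static <T> T firstOrDefault(List<T> list, T fallback) {
        return list.isEmpty() ? fallback : list.get(0);
    }
}

The same code handles List<String>, List<Integer>, or any other element type, but understanding it requires yet another concept: the type parameter T.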

If we move on to inheritance, then we are in object-oriented (OO) waters. To explain OO-based reuse we need to understand objects, methods, and overriding, which is an even greater step up in complexity.
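To make this concrete, here is a small hypothetical Java sketch (the class names Report and CsvReport are mine): the superclass defines the rendering algorithm once, and the subclass reuses all of it, overriding only the one step that differs.

// The superclass is written once; each subclass reuses its logic and
// overrides only the formatting of a single row.
abstract class Report {
    String render(String[] rows) {
        StringBuilder sb = new StringBuilder(header());
        for (String row : rows)
            sb.append(formatRow(row));
        return sb.toString();
    }

    String header() { return ""; }

    abstract String formatRow(String row);
}

class CsvReport extends Report {
    @Override
    String formatRow(String row) {
        return row + "\n";   // the only part that is not reused
    }
}

Notice how much machinery (abstract classes, overriding, dynamic dispatch) is needed just to reuse one loop.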

My point is this: most of the complexity of programming comes from the need to reuse code, but we cannot get rid of this complexity because without reuse our programming language is reduced to a simple calculator.


This is something to remember when you think about visual programming environments and other tools that are supposed to make it possible for non-programmers to write programs: many of these tools support only a simple form of reuse, so heavy use of them leads to the familiar problems of duplicated code. If you try to improve the tool by introducing reuse mechanisms, it rapidly becomes so complicated that it eventually requires programming-like skills to use, which defeats its own purpose.

Dependencies

A dependency graph is often used for understanding (and evaluating) the design of a (statically typed) program. The idea is simple: take every class in your program and create a corresponding graph vertex. Then, for each class X that uses a class Y, create a graph edge between the corresponding vertices.

This graph can then be presented in various formats: a diagram, a list of pairs (of class names), etc. It is common to check for cycles in a program's graph. Cycles are usually considered to be evil, so programmers are often complimented for developing cycle-free programs.
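As an illustration, here is a minimal Java sketch of such a graph (the class and method names are mine): an adjacency map with one vertex per class name, plus a depth-first search that reports whether any cycle exists.

import java.util.*;

class DependencyGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // One edge per "class 'from' uses class 'to'" relation.
    void addDependency(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new HashSet<>());
    }

    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String v : edges.keySet())
            if (dfs(v, finished, onStack))
                return true;
        return false;
    }

    private boolean dfs(String v, Set<String> finished, Set<String> onStack) {
        if (onStack.contains(v)) return true;    // back edge: a cycle
        if (finished.contains(v)) return false;  // already fully explored
        onStack.add(v);
        for (String w : edges.get(v))
            if (dfs(w, finished, onStack))
                return true;
        onStack.remove(v);
        finished.add(v);
        return false;
    }
}

For example, calling addDependency("X", "Y") and then addDependency("Y", "X") makes hasCycle() return true.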

Having presented the background, I'd like to make two quick statements on this topic:
  1. Self dependency. Every class (and I mean every last one of them) always depends upon itself. Why? Because you cannot shield a class from changes in itself. For example, if you remove a private field (let alone a public one) from a class C, which classes will be affected? Right, C itself.

    It is not rare to see programmers who think that a class depends upon itself only if it has a recursive method or a recursive structure. This is a common fallacy. Every class has a self-dependency, even if none of its methods are recursive.

  2. Cycles. Suppose you have a cycle of two classes, that is: X depends on Y and Y depends on X. The standard trick for breaking the cycle is to extract an interface Z out of Y and then make X use Z instead of Y. You may now think that X no longer depends on Y.

    Now, ask yourself this question: "If something changes in Y, will X be affected?" The answer is: yes. When your program is running, X will still invoke Y's methods (through the Z interface), and if these methods behave differently, X will be affected. In other words, the introduction of Z eliminated the compile-time dependency but not the run-time dependency.

    Am I saying that there's no point in detecting compile-time cycles? No. Compile-time cycles are important, but for a different reason: reusability. If X and Y are cyclic, then you cannot just take X and reuse it in some other program/module. You will always have to carry class Y along as baggage (and vice versa). This restricts reusability. On the other hand, if X depends only on the (extracted) interface Z, then you can take X alone and reuse it just by plugging in some implementation of Z, as sketched below.
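    Here is a hypothetical Java sketch of that trick (the names X, Y, and Z follow the text above; the method names are mine). Z is the interface extracted out of Y, and X mentions only Z in its code, so X compiles and can be reused without Y; at run time, though, whichever implementation is plugged in still does the actual work.

    interface Z {
        int compute(int input);
    }

    class Y implements Z {
        public int compute(int input) {
            return input * 2;   // if this behavior changes, X's results change too
        }
    }

    class X {
        private final Z helper;   // compile-time dependency on Z, not on Y

        X(Z helper) { this.helper = helper; }

        int run(int input) {
            // At run time this may still execute Y's code (through Z),
            // so X is not shielded from behavioral changes in Y.
            return helper.compute(input) + 1;
        }
    }

    The original program still wires things up with new X(new Y()), but another program can reuse X by plugging in a different implementation, say new X(input -> input * 3).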