Blogging Issues

This is a short, self-reflective post.

Item 1: I started using Feedburner last week. It seems that the pinging mechanism that should keep Google Reader synchronized with my Feedburner-managed RSS feed does not work very well, to say the least. I therefore turned this blogspot's native feed back on. I now manually ping Feedburner (OK, I will have a Greasemonkey script do it for me) whenever new content is published.

Item 2: How come November was such a productive month (in terms of number of posts)?
It's kind of a long story but I can condense it into five simple steps:
1. I installed the Google Reader Watcher Firefox add-on. This made me more aware of what was taking place on the blogging scene.
2. I delved into Steve Yegge's blog. His posts are always thought provoking.
3. I discovered that reading Yegge's blog motivated me to sit down and publish all those half-completed posts that had been waiting (as "sitting drafts") for many months.
4. As a direct result I started writing. After the fourth post things became easier.
5. More content resulted in more visitor comments. This, in turn, motivated me to write even more.

That's basically it.

Mythbusting - Part 3 (The Inherent Paradox)

This is the third part in this mini-series, whose goal is to show why static dependency analysis is largely useless.

Part 1 primarily used reasoning and essay-like arguments.
Part 2 provided hard data --- that is: actual dependency graphs --- showing that tangled packages (let alone tangled classes) are common in very successful libraries and tools.

This part will probably be much more entertaining: It shows the logical flaw underlying static dependency analysis.

So, here is a Java program with two classes. The code itself is silly; I just wanted to have a program with a cyclic dependency that is doing something (no matter how silly).


package p1;
import p2.B;

public class A {
    private B b;

    public void setB(B b) { this.b = b; }
    public void f(String s) {
        b.g(s);
    }
}

package p2;
import p1.A;

public class B {
    private A a;

    public void setA(A a) { this.a = a; }
    public void g(String s) {
        if(s.length() != 0)
            a.f(s.substring(1));
    }
}


As expected, the dependency graph of packages p1, p2 is a cycle:



Now, let's rewrite this program by applying the following rules:
  1. All variables, parameters, fields, return types will have the type "Object"
  2. Every method invocation such as x.m(y,z) will be rewritten as: Util.call(x, "m", y, z). The Util class itself is shown in the fragment below.

(Reservation: I've intentionally left out rules for handling overloading, static members, and object creation, as these are not needed for the example shown here.)

Here is what the program looks like (to avoid confusion I now use packages pp1, pp2 rather than p1, p2).

package pp1;

import static pp0.Util.call;

public class A {
    private Object b;

    public void setB(Object b) { this.b = b; }
    public void f(Object s) {
        call(b, "g", s);
    }
}

package pp2;
import static pp0.Util.call;

public class B {
    private Object a;

    public void setA(Object a) { this.a = a; }
    public void g(Object s) {
        if(!call(s, "length").equals(0))
            call(a, "f", call(s, "substring", 1));
    }
}

Of course, we also need to write the static method Util.call (in package pp0). Here it is:

package pp0;

import java.lang.reflect.Method;

public class Util {
    public static Object call(Object r, String s,
            Object... args) {
        for(Method m : r.getClass().getMethods()) {
            if(!m.getName().equals(s))
                continue;

            if(m.getParameterTypes().length != args.length)
                continue;

            try {
                return m.invoke(r, args);
            } catch(Exception e) {
                throw new RuntimeException(e);
            }
        }
        return null;
    }
}


We can now take a look at the dependency graph of pp0, pp1, pp2:



(Irony mode on) Wow! No tangles! This technique is awesome. No matter how big and tangled your program is, you can apply these two very simple transformation rules and you will get this nice, loosely coupled code, where each class depends only on Util.call(). Amazing! (Irony mode off)

Do you see the paradox? Tools that detect cyclic dependencies use type information from your source code to help you find highly coupled modules. However, these tools encourage you to eradicate the very information on which they rely. In a sense, they want you to make them blind. This makes no sense.

Clearly, the transformation rules that we applied did not really eliminate coupling. They just swept it under the rug. The coupling is still there, waiting for run-time to rear its head. If one claims that cyclic dependency analysis is not valid *after* the transformation, due to the inability to determine run-time dependencies, one should also acknowledge that it was not valid *before* the transformation, for the very same reason: the inability to determine run-time dependencies.
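The transformed program, by the way, runs exactly as before. Here is a self-contained sketch (one file, no packages; the demo class name and the call counter are my additions, everything else mirrors the post's code) showing that a single a.f("abc") still ping-pongs between A and B at run time, dispatcher or no dispatcher:

```java
import java.lang.reflect.Method;

public class HiddenCoupling {
    // The reflective dispatcher from the post.
    static Object call(Object r, String s, Object... args) {
        for (Method m : r.getClass().getMethods()) {
            if (!m.getName().equals(s)) continue;
            if (m.getParameterTypes().length != args.length) continue;
            try {
                return m.invoke(r, args);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return null;
    }

    public static class A {
        private Object b;
        public void setB(Object b) { this.b = b; }
        public void f(Object s) { call(b, "g", s); }
    }

    public static class B {
        private Object a;
        private int calls = 0;          // my addition: counts round-trips
        public void setA(Object a) { this.a = a; }
        public int getCalls() { return calls; }
        public void g(Object s) {
            ++calls;
            if (!call(s, "length").equals(0))
                call(a, "f", call(s, "substring", 1));
        }
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        a.setB(b);
        b.setA(a);
        a.f("abc");                       // bounces between A and B at run time
        System.out.println(b.getCalls()); // prints 4: "abc", "bc", "c", ""
    }
}
```

If class B were absent at run time, the very first call(b, "g", s) would blow up; this is precisely the coupling that the static picture no longer shows.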

Is there a morale here? I guess there is. Think about dynamically typed languages:

  • If you believe in static analysis of dependencies you should start developing in a dynamically typed language and you will immediately boost your static analysis score through the roof: no cycles, no coupling of packages/classes/methods.

  • (otherwise) You don't believe in static analysis. You know that the static picture is only a distorted reflection of the dynamic behavior. Hence, you know that you cannot trust a tool that tells you that your program is OK. If this is what you think, then you are already writing in Smalltalk, Self, JavaScript, or Ruby.


Update 1: Added the "(Irony mode on)" remark

Update 2: Added the reservation

Mythbusting - Part 2 (in Pictures)

Preface: If a certain property is common, in a statistically significant manner, in successful projects, then it is quite natural to assume that this property is a positive trait.
The data presented here shows that tangled packages/classes are quite common. This suggests that the belief that tangling is a *negative* property is at odds with reality. (Of course, this is not a scientific study.)


In the Mythbusting post I said that tangles of packages are benign. I based this claim on the fact that many successful libraries contain such tangles. In particular, I said that 20 out of the 23 libraries that are bundled with the Spring framework contain tangles. Here are the actual graphs of five libraries/tools with which you might be acquainted.



These images were produced by Structure101.

Update 1: Added the preface (as per vinod's suggestion)

Mythbusting - Part 1

(Disclaimer: as always, the issue of reasoning about software quality is largely a matter of personal judgment)

Yes, I really like Adam and Jamie. This post is about myths that you won't see on their TV show.

Actually, this post is motivated by Ian Sutton's post titled: "Software erosion in pictures - Findbugs". That piece shows the package dependency graphs of several releases of Findbugs. The graphs were obtained by Structure101, a static analyzer that reads class files. The conclusion there is that Findbugs is eroding since the number of cycles (tangles) in the graph, as well as the size of these cycles, grows over time.

Ian's premise is that tangled packages are indicative of poor code quality.

I think this premise is based on myths that need to be carefully examined. So that is what I'll do here.

Myth #1: Tangled packages cannot be tested separately

The counter-argument is this: when the packages were not tangled, were they tested separately? The answer is likely "No". The lack of a compile-time dependency does not mean a lack of run-time dependency. Maybe the two packages were already implicitly related (say, through data passed as a String), and we just made this relation explicit (through a dedicated class).

Myth #2: Tangled packages cannot be deployed separately

This is not true in Java (and in many other modern languages) since we have dynamic loading. And again, due to hidden run-time dependencies, it may very well be the case that you will not be able to separately deploy packages even if they were *not* statically tangled.

Myth #3: Non tangled packages can be modified without affecting each other

Assume you pass an Iterator, implemented by a class from package A, to some code from package B. The packages are not statically related (they depend on Iterator but not on each other). Still, they must agree on certain things: can this iterator's next() method return null? What is the semantics of remove()? If one side breaks the contract, the other side will bleed.
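To make this concrete, here is a hypothetical sketch (the class and both methods are mine, not from any real project): the producer and the consumer share no static dependency beyond java.util.Iterator, yet they silently disagree on whether next() may return null.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ContractDrift {
    // Stands in for "package A": an iterator that may yield null
    // for missing entries.
    static Iterator<String> namesWithGaps() {
        return Arrays.asList("ann", null, "bob").iterator();
    }

    // Stands in for "package B": consumes any Iterator<String>,
    // assuming next() is never null.
    static int totalLength(Iterator<String> it) {
        int sum = 0;
        while (it.hasNext())
            sum += it.next().length(); // NPE the moment the tacit contract breaks
        return sum;
    }

    public static void main(String[] args) {
        try {
            totalLength(namesWithGaps());
        } catch (NullPointerException e) {
            System.out.println("statically unrelated, still coupled");
        }
    }
}
```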

Myth #4: If I eliminated a package level tangle, I solved some dependency issues

Not necessarily. You might have just consolidated the participating packages, so the dependency is still there, this time at the class level.

Myth #5: A similar analysis is also valuable at the class level

This is usually not true. The *full* dependency graph of many well-written classes is so cluttered that it is practically useless. Most visualization tools work around this by (a) showing (by default) only dependencies within the same package; (b) ignoring library classes. When you choose the non-default view you see how intricate the network of classes is. This happens with almost every program you examine, regardless of what you think of its quality. Additionally, class-level analysis is also ignorant of run-time dependencies.

There are a few more myths, but I think we can stop now. However, there is one more thing that I wanted to talk about: after reading Ian's post I did some field research. I took all the jar files that come along with Spring (including spring.jar itself) and fed them into Structure101, just like Ian. All in all: 23 jar files. Do you know how many of these *didn't* have tangled packages?



The answer is: 3.

Let me repeat that: 20 libraries had their packages tangled.


So, if you want your library to be used by Spring you should make the packages tangled. That's what the numbers are telling us. As long as you keep writing test cases, you have no reason to be afraid of tangles: the developers of JUnit, Dom4J, Log4J, Ant, Antlr (partial list) are happily living with tangles.

Shorter is Better?

It is widely believed that compactness is a desirable quality of code. Sadly, I often see people claiming that compactness is an absolute metric, in the sense that shorter code is always superior to longer code.

Leaving aside the issue of how length should be measured (character count? line count? token count? executable size?), here are two different JavaScript functions that fulfill the same task: translating an integer in the range 0..13 into Roman numerals.


function toRoman1(n) {
    return n == 0 ? "" :
        n % 5 == 4 ? "I" + toRoman1(n + 1) :
        n >= 5 ? String.fromCharCode(84 + Math.floor(n / 5) * 2) + toRoman1(n % 5) :
        "I" + toRoman1(n - 1)
}

function toRoman2(n) {
    if(n < 4 && n > 0)
        return "I" + toRoman2(n - 1);

    if(n == 4 || n == 9)
        return "I" + toRoman2(n + 1);

    if(n >= 10)
        return "X" + toRoman2(n - 10);

    if(n >= 5)
        return "V" + toRoman2(n - 5);

    return "";
}


The former function, toRoman1(), is much shorter than the latter. Moreover, its cyclomatic complexity is lower (since it has fewer branches) suggesting that it is simpler to understand. Still, for most programmers, toRoman1() is more difficult to read, to write, and to debug.

My advice is to go easy with this "shorter is always better" claim. Just like many other things in programming, the right degree of compactness is also a matter of striking a balance. Or, as Allen Holub puts it: "Good and bad are not absolutes".

Update 1: Fixed the no-return-value issue spotted by Yardena.

Update 2: Added the cyclomatic complexity point.

Is it a bug? Is it a feature?

In his latest post Jeff Atwood argues that there should be no distinction between a bug and a feature request. I couldn't agree more. Both bugs and feature requests are actually requests to change the behavior of the program.

I guess this distinction is based on the assumption that the implications of fixing a bug tend to be "local" (i.e., affecting smaller parts of the program) whereas those of implementing a feature tend to be "global". Thus, implementing a feature is more likely to introduce new bugs, so it makes sense to postpone it until most current bugs are solved.

I disagree with this thesis.

First, I saw many bug fixes that resulted in new bugs being introduced, thereby contradicting the "locality" assumption.

Second, in a test-infected program (that is: a program with a strong test suite), the chances of unwanted side effects being introduced due to a feature implementation are practically zero: you write the code, you run the tests, and if the bar is green then everything is likely to be fine.

One final point: it seems that this distinction, along with several other (mis)conceptions, came from the Waterfall approach. With development and debugging being two separate phases, it is only natural to classify the items on the project's to-do list as either a feature or a bug.

On Syntax and Religion

Ola Bini published a piece showing how the introduction of a special variable, the "it" variable, into the Ioke language makes regular-expression matching less verbose. In short, the it-variable captures the result of the expression examined by the enclosing "if" or "unless" decision. Thus, you can write (pseudo-code):
if(/[a-z]*/.matches("ab")) {
println(it.matchLength()); }

instead of
var x = /[a-z]*/.matches("ab");
if(x != null) { println(x.matchLength()); }


Personally, I tend to agree with Ola regarding the virtue of this mechanism.

The comments made by readers show how subjective - not to say: biased - the whole issue of "syntactic elegance" is. As of Tuesday evening (GMT), here is a short summary of the various opinions.

Marcelo prefers a Smalltalk-like style:
/[a-z]*/.ifMatches("ab", res,
{ println(res.matchLength()); },
{ println("no match"); })


Stefano wants the variable declaration to be explicit:
if(x = /[a-z]*/.matches("ab")) {
println(x.matchLength()); }


Paul advocates the introduction of "=>" as a pattern matching operator:
/[a-z]*/ => x


Antares Trader argues for a closure with a parameter:
if(/[a-z]*/.matches("ab")) { |x|
println(x.matchLength()); }


Johannes says that Option types neatly express the idiom in question.

I think that local optimizations of the syntax (in contrast to global aspects such as typing discipline, programming paradigm, object model, or standard library) are largely a waste of time. Will programmers refuse to use the language because they cannot change the name of the "it" variable? Even fifty (local) improvements to the syntax will not make a difference. The exact notation chosen for this construct will have nearly no impact on Ioke's future success.

Therefore, my rule of thumb is this: it is worthwhile to debate a local syntactic issue only if you have a knock-out argument. If you are only going to win on points, don't bother.

Crup

Lately I've been using the acronym C.R.U.P. as shorthand for "Code that is RedUndant in the Program". In this post I will explore some of the negative implications of crup. To reduce verbosity I will use the word "crup" instead of the acronym. (Too bad the word crup has no meaning. I really wish it did.)

Program development can be thought of as a process where you repeatedly add new pieces of functionality. The implementation of each such piece may depend on existing functionality but may also require the introduction of new functionality, which in turn spawns additional functionality pieces. This model can be naturally represented as a tree where child nodes represent pieces of functionality that were introduced as part of the implementation of the parent. Note that a node can be a method, a class, a collaboration of classes, a module, whatever.

Redundant functionality (that is: a redundant node, crup) is a node that provides more services than its parent expects. In other words, you can replace the subtree rooted at this node with a smaller tree and the program as a whole will still deliver the behavior it is expected to deliver.

Programmers often introduce crup into the code-base, thinking that a certain piece of functionality will be needed in the future, and that it is less effort to introduce it sooner rather than later. This resonates with the famous YAGNI debate: the pro-YAGNI people say that crup is really bad. The pro-crup people say that by predicting the future behavior of a program you can ensure your program's structure is well optimized for the long run, and that no post-facto refactoring will be able to match that.

So, let's throw some numbers into the picture: each node has 5 children, and 1 in 5 children is a crup node. The tree has 4 levels (that is: a root node, five children, 25 grandchildren, 125 great-grandchildren). The total number of nodes is therefore 1+5+25+125=156.

How much crup do we have in the program? Let's see. The root is obviously not redundant. However, one of its children (at level 2) is redundant, along with its whole subtree. That's 1+5+25 = 31 nodes.

The other four nodes at level 2 are not crup, but each of them is a proud parent of a level-3 crup node: 4*(1+5) = 24. We are left with sixteen non-crup level-3 nodes. These nodes have eighty children, of which one fifth are crup: 16.

So in this program the total crup factor is (31 + 24 + 16) / 156 = 71 / 156 = 45%. In other words, almost half of the effort went into redundant functionality. That's a lot of extra work. The programmer must possess good future-prediction skills to make this extra work pay off.

The crup factor rises with the height of the tree. If the tree had 5 levels, our crup factor would rise to 56%.
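The arithmetic above can be checked mechanically. Here is a small sketch (the class and method names are mine) that counts nodes level by level under the same assumptions: every node has 5 children, 1 in 5 children is crup, and a crup node's entire subtree counts as crup.

```java
public class CrupFactor {
    // Fraction of crup nodes in a tree of the given height.
    static double crupFactor(int height) {
        long total = 0, nonCrup = 0;
        long nodes = 1, clean = 1;   // node counts at the current level
        for (int level = 1; level <= height; ++level) {
            total += nodes;          // all nodes at this level
            nonCrup += clean;        // nodes with no crup ancestor
            nodes *= 5;              // every node has 5 children
            clean *= 4;              // 4 of the 5 children stay clean
        }
        return 1.0 - (double) nonCrup / total;
    }

    public static void main(String[] args) {
        System.out.println(crupFactor(4)); // 71/156, the ~45% above
        System.out.println(crupFactor(5)); // 440/781, the ~56% above
    }
}
```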

This little post is by no means a scientific study. There are a lot of other factors to consider if one wants to make a scientific claim regarding crup (which is kind of ironic, since empirical study of real-life software projects is a thorny issue). Here is one factor that mitigates the negative effects of crup: if a prediction regarding a crup node close to the root turns out to be correct, then the vast majority of the cruppy nodes in the program will stop being crup.

On the other hand, other factors compound the crup effect. For instance, maintenance and development costs are probably super-linear with respect to program size (e.g.: program size doubles but costs quadruple). Thus, even a small amount of residual crup has a significant negative impact on the program's complexity.

As I said, I am not trying to make a scientific claim here. I just wanted to make the intuition behind YAGNI a bit more concrete. The aforementioned example, with its 45% crup factor, suggests that you could be much more productive if you just cut the crap.

Structural Subtyping == Reduced Compile-Time Coupling

Although I've been wrapping my mind around structural subtyping in Java for quite some time now, there is one argument in favor of structural typing that hasn't gotten the attention it deserves: it is much easier to develop code against an API that relies on structural types, due to reduced compile-time dependencies.

As always I'll illustrate my point with an example. We'll start with the standard (non-structural) approach.

(Side note: In the very first lecture of the Programming Languages class I took, my teacher, Prof. Ron Pinter, made the point that a programming language is an excellent device for formally expressing ideas: "Instead of battling a natural language, hopelessly trying to get the precise meaning, just write the thing in a PL and your intention will be clear". I think this is very good advice.)

Suppose someone has finally written the huge algorithm that realizes the overwhelmingly complicated logic of the bridge-keeper from Monty Python and the Holy Grail. The algorithm is encapsulated within the Bridgekeeper class, whose protocol relies on the Knight interface.


public interface Knight {
    public String answerMe(String question);
}

class Bridgekeeper {
    public boolean canCross(Knight k) { ... }
}


My friend Graham developed an AbstractKnight class which provides some services common to all knights:


public abstract class AbstractKnight
        implements Knight {
    private final String name;

    protected AbstractKnight(String name_) {
        name = name_;
    }

    protected abstract String nonNameQuestion(String q);

    public String answerMe(String question) {
        if("What is your name?".equals(question))
            return name;
        String answer = nonNameQuestion(question);
        if(answer == null)
            answer = "I don't know that";
        return answer;
    }
}


If I want to develop a concrete subclass of AbstractKnight (Lancelot or Robin come to mind), Graham needs to send me two jars (or two source file packages):
  1. Graham.jar - Providing the definition of the AbstractKnight class
  2. Bridgekeeper.jar - Providing the definition of the Knight interface


Yes: without Bridgekeeper.jar I will not be able to compile my Lancelot class, despite the fact that Lancelot is only indirectly coupled with Knight. When you compile a class, the compiler requires all of its ancestors to be present.

Graham is therefore forced to distribute Bridgekeeper.jar to allow other programmers to subclass his AbstractKnight class. This often leads to a proliferation of downloadable packages (with/without dependencies), which is a constant headache for open source developers. Of course, if you choose to download the full package (the one that does include the dependencies) you risk having several different versions of the same library on your machine, which - again - leads to headaches.

With structural subtyping things are much simpler:

// library: Bridgekeeper.jar
public struct Knight {
    public String answerMe(String question);
}

... // everything else remains the same


// library: Graham.jar
public abstract class AbstractKnight {
    // "implements Knight" removed

    ... // everything else remains the same
}


With this approach, AbstractKnight has no compile-time dependency on Knight. Thus, you can compile your Lancelot class even if Bridgekeeper.jar is not present on your classpath:

public class Lancelot extends AbstractKnight {

    public Lancelot() {
        super("Sir Lancelot of Camelot");
    }

    protected String nonNameQuestion(String q) {
        if("What is your quest?".equals(q))
            return "To seek the Holy Grail";
        if("What is your favorite color?".equals(q))
            return "Blue";
        return null;
    }
}



Update 1: OK. I couldn't resist the temptation. Here it is, in its full glory, scene 35:

Bulletproof - II

Following my previous post, here is another example of bulletproofing your API (again, I am not saying that you should; I just wanted to provide a few concrete examples).

Simple (non bulletproof) version:

public class Resource {
    public enum Mode { READ, WRITE }
    public Resource(String fileName) { ... }
    public void open(Mode m, boolean autoFlush) { ... }
    public char read() { ... }
    public void write(char c) { ... }
}


Legal usage:

Resource res = ... ;
res.open(Resource.Mode.READ, false);
char c = res.read();


Illegal usage:

Resource res = ... ;
res.open(Resource.Mode.READ, false);
res.write('a'); // Runtime error:
// Resource was opened for READ


The bulletproofing technique remains the same: introduce new classes to make it possible for the compiler to distinguish between the different variants.


public class Readable {
    public char read() { ... }
}

public class Writable {
    public void write(char c) { ... }
}

public class Resource {
    public Resource(String fileName) { ... }
    public Readable openRead(boolean autoFlush) { ... }
    public Writable openWrite(boolean autoFlush) { ... }
}
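To show the payoff, here is a runnable sketch of the same split (the StringBuilder-backed stub bodies and the nesting are mine, and the autoFlush parameter is dropped for brevity): the illegal usage from above now fails at compile time rather than at run time.

```java
public class BulletproofDemo {
    public static class Readable {
        private final StringBuilder buf;
        private int pos = 0;
        Readable(StringBuilder buf) { this.buf = buf; }
        public char read() { return buf.charAt(pos++); }
    }

    public static class Writable {
        private final StringBuilder buf;
        Writable(StringBuilder buf) { this.buf = buf; }
        public void write(char c) { buf.append(c); }
    }

    public static class Resource {
        private final StringBuilder buf = new StringBuilder(); // stub "file"
        public Readable openRead() { return new Readable(buf); }
        public Writable openWrite() { return new Writable(buf); }
    }

    public static void main(String[] args) {
        Resource res = new Resource();
        res.openWrite().write('a');
        System.out.println(res.openRead().read()); // prints a
        // res.openRead().write('a'); // does not compile: Readable has no write()
    }
}
```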

Bulletproof

This post waited a long, long time to be written. The motivation to finally do it came from a discussion I had with Andreas Brink at OOPSLA'08. We both felt that TDD made us more productive. In the pre-TDD era we were always trying to make our APIs bulletproof, i.e., to make it really difficult to write code that will fail at run time. With TDD this is not such a big concern. You write simpler code (which is much faster than writing bulletproof code) and you rely on the test suite to keep you from making mistakes.

I don't want to get into the can-tests-really-prevent-bugs debate. My intention here is just to show what I mean by a bulletproof design. Here is a simple (non-bulletproof) Flight class:


public class Flight {
    private final int numSeats;
    private final Set<Traveler> travelers
            = new HashSet<Traveler>();

    public Flight(int numSeats_) {
        numSeats = numSeats_;
    }

    public void book(Traveler t) {
        if(travelers.size() >= numSeats)
            throw new RuntimeException("Fully booked");
        travelers.add(t);
    }

    public void unbook(Traveler t) {
        if(!travelers.contains(t))
            throw new IllegalArgumentException("t=" + t);
        travelers.remove(t);
    }
}

public class Traveler {
    ...
}


The predicament lies in the Flight.unbook() method. If we pass the wrong argument, this method will fail:

Flight f1 = ...;
Flight f2 = ...;
Traveler t = ...;
f1.book(t);
f2.unbook(t); // Illegal argument error!


And now, let's see the bulletproof approach. The trick is to use static type checking - sometimes in conjunction with encapsulation - to make it impossible to issue an incorrect unbook() request:


public class Flight {

    private final int numSeats;
    private final Set<Traveler> travelers
            = new HashSet<Traveler>();

    public Flight(int numSeats_) {
        numSeats = numSeats_;
    }

    public Reservation book(Traveler t) {
        if(travelers.size() >= numSeats)
            throw new RuntimeException("Fully booked");
        travelers.add(t);
        return new Reservation(t);
    }

    public class Reservation {
        public final Traveler traveler;
        public Reservation(Traveler t) {
            traveler = t;
        }

        public void unbook() {
            travelers.remove(traveler);
        }
    }
}


With this design unbook requests (now issued through a Reservation object) cannot fail:

Flight f1 = ...;
Flight f2 = ...;
Traveler t = ...;
Flight.Reservation r = f1.book(t);
r.unbook(); // Always safe!

Three Kinds of Bugs

I tend to classify bugs into one of these three groups:
  • Laziness
  • Lack of Creativity
  • Betrayal

The running example that I'll use here is a method that finds the largest value in a list of integers.

Laziness

public static Integer maxA(List<Integer> list) {
    int result = Integer.MAX_VALUE;
    for(int i = 0; i < list.size(); ++i)
        if(result < list.get(i))
            result = list.get(i);
    return result;
}


Do you see the problem? result's initial value is completely wrong. Such bugs often happen when the programmer copies the code of a similar method but neglects to make all the necessary adjustments.

Why is this group called "laziness"? That's because this method will not pass even the simplest test: assertEquals(new Integer(30), maxA(Arrays.asList(10, 20, 30))); In other words, if you weren't lazy you could have caught the bug with the very first test you threw at it. Actually, you don't even need a test case: just run it once on any piece of input and you'll find the bug. BTW, this "run it once" practice shows why read-eval-print shells are useful. Too bad they are common only in the arena of dynamically typed languages.

Lack of Creativity

public static Integer maxB(List<Integer> list) {
    int result = 0;
    for(int i = 0; i < list.size(); ++i)
        if(result < list.get(i))
            result = list.get(i);
    return result;
}


maxB() is clearly better than maxA(). It will not fail at the very first attempt. In fact, as long as your input contains only positive integers this method will seem fine. You need a certain degree of creativity to devise an input that will manifest the bug, such as: assertEquals(new Integer(-4), maxB(Arrays.asList(-4)));

Betrayal

public static Integer maxC(List<Integer> list) {
    int result = list.get(0);
    for(int i = 1; i < list.size(); ++i)
        if(result < list.get(i))
            result = list.get(i);
    return result;
}


This version works quite well. It will pass both of the aforementioned tests. The bug here is much more subtle. With Betrayal bugs the root of the problem is a disagreement between two methods.

Such a disagreement may occur if the method that builds the list (which is then fed into maxC()) also generates empty lists. This situation contradicts maxC()'s assumption that the list is non-empty. In a sense, maxC() is betrayed by its own program.

It is very tempting to say something like: "Well, maxC() has a clear precondition. This precondition is not met, so the bug is in the caller". However tempting, this view is overly simplistic. Suppose the code that calls maxC() looks like this:


public class Driver {
    public int n;

    public List<Integer> listBuilder() {
        List<Integer> newList = new ArrayList<Integer>();
        for(int i = 0; i < n; ++i)
            newList.add(i);
        return newList;
    }

    public Integer m() {
        return maxC(listBuilder());
    }
}


The contract of listBuilder() says that it should return a list with the integers 0..n-1 (inclusive). In particular, if n is zero the returned list should be empty. The contract of m() says that it should apply maxC() on the list created by listBuilder(). The contract of maxC() says that it should return the highest number from a non empty list.

So, every method is doing exactly what it is supposed to be doing, and yet the net effect is a runtime error. Despite the fact that maxC() is the one being betrayed we may decide that the correct way to fix the program is to change maxC() rather than listBuilder(). This is typically the case when we realize that our program needs to handle a wider range of inputs.


This type of bug cannot be detected by testing maxC() in isolation. We need a higher-level test to expose the bug.
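Here is a sketch of such a higher-level test (a single self-contained file; I fold Driver's n field into a parameter so the example runs on its own): unit-level checks of maxC() pass, while the composed listBuilder()-to-maxC() path fails for n = 0.

```java
import java.util.ArrayList;
import java.util.List;

public class BetrayalDemo {
    static Integer maxC(List<Integer> list) {
        int result = list.get(0);            // assumes a non-empty list
        for (int i = 1; i < list.size(); ++i)
            if (result < list.get(i))
                result = list.get(i);
        return result;
    }

    static List<Integer> listBuilder(int n) {
        List<Integer> newList = new ArrayList<Integer>();
        for (int i = 0; i < n; ++i)
            newList.add(i);                  // the list 0..n-1; empty when n == 0
        return newList;
    }

    public static void main(String[] args) {
        // Unit-level tests of maxC() pass...
        System.out.println(maxC(listBuilder(3))); // prints 2
        // ...but the composed path betrays maxC() when n = 0.
        try {
            maxC(listBuilder(0));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("betrayed by an empty list");
        }
    }
}
```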

One final comment: earlier I defined m()'s contract to be "invoke maxC() on whatever listBuilder() returns". I guess that some people may claim that this is not a valid contract for a method, since it specifies implementation details rather than taking a black-box approach to specifying m()'s behavior. I am sorry to disappoint you guys, but such an approach will only work with simple programs. If your program is doing complicated computations you will not be able to express the contract (purely) in terms of input vs. output data. The input data will undergo such complicated transformations that you will have to use some form of abstraction, which is exactly what I did. Given that abstraction is the process of ignoring some details, there is an inherent loss of precision when one tries to formulate the contracts of elaborate computations.

Reuse: A goal or a by-product?

I found this gem in the comp.object mailing list:
"Re-use is a happy side-effect of good designs. The primary goal is managing dependencies between modules, by making them more pluggable"

I like this statement because it captures an important insight that is too often overlooked, because we follow common conceptions without questioning them. Reuse is not a goal. Reuse is a trait of source code that usually goes hand in hand with good programs. This does not mean that a high degree of reuse is a must for achieving high-quality code.

If your program's complexity is well controlled then you're in a good position, regardless of the amount of duplicated code therein. Conversely, having no duplications does not mean that you have successfully tamed complexity. One reason is that highly-reused modules are also highly-coupled: even a tiny modification in a (highly reused) module is likely to break a good fraction of its many clients. This is sheer mathematics: the more clients you have, the greater the chances that you break something.
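To put a (hypothetical) number on that mathematics: if a change breaks any single client independently with probability p, the chance that at least one of n clients breaks is 1 - (1-p)^n. A quick sketch (the p = 0.05 figure is an arbitrary illustration, and the independence assumption is mine):

```java
public class BreakageOdds {
    // Chance that at least one of n clients breaks, if each breaks
    // independently with probability p.
    static double chanceOfBreakage(double p, int n) {
        return 1 - Math.pow(1 - p, n);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", chanceOfBreakage(0.05, 1));  // 0.05
        System.out.printf("%.2f%n", chanceOfBreakage(0.05, 20)); // 0.64
    }
}
```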

Degradation of locality is also a risk: when you reuse (for instance, by means of inheritance), the various pieces making up your class are actually defined outside the scope of the class. This will make it difficult to understand what the class is doing, because its actual definition is scattered across several locations.

These ideas are not new. They were already expressed by Richard Gabriel in his book "Patterns of Software".

Gabriel coined the term "compression" to describe reuse within a program. In the very first chapter, "Reuse vs. Compression", he says:
"Compression is a little dangerous because it requires the programmer to understand a fair amount about the context from which compressed code will take its meaning.

...The primary feature for easy maintenance is locality: Locality is that characteristic of source code that enables a programmer to understand that source by looking at only a small portion of it. Compressed code doesn’t have this property, unless you are using a very fancy programming environment."

Java Scripting

1 comments
Last week I devoted some time to going through Steve Yegge's talk on Rhino (Rhino = "...an open-source implementation of JavaScript written entirely in Java"). In particular, I liked the idea that thanks to the scripting support in Java 6, Javascript is "...finally becoming the scripting language for Java".

Naturally, I wanted to see how easy it is to make Java scriptable. So I sat down and played with the Java scripting engine and my answer is this: very easy!


package rhinodemo;

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Factorial {
    private final int n;
    private int result = 1;

    public Factorial(int n) {
        this.n = n;
    }

    public int getResult() {
        return result;
    }

    public Factorial compute() {
        if (n >= 1) {
            Factorial that = new Factorial(n - 1);
            that.compute();
            result = n * that.getResult();
        }
        return this;
    }

    public static void main(String[] args) throws Exception {
        ScriptEngineManager mgr = new ScriptEngineManager();
        ScriptEngine engine = mgr.getEngineByName("JavaScript");

        engine.eval(
            "importClass(Packages.rhinodemo.Factorial);" +
            "importClass(Packages.junit.framework.Assert);" +
            "" +
            "Assert.assertEquals(24, " +
            "  new Factorial(4).compute().getResult());" +
            "println('-ok-')");
    }
}


What we have here is a Java class, Factorial, that computes n!. The protocol of this class is deliberately cumbersome: I wanted to have a real class with a real constructor and two meaningful methods, and not just one static method. The main() method executes Javascript code that tests the Factorial class using JUnit's Assert class (you need to have junit.jar on your classpath).

Even this short program demonstrates a few key points:
  • Accessing Java libraries from Javascript
  • Passing Javascript objects to Java code
  • Creating instances of Java classes from within Javascript

An Arithmetic Expressions Solver in 64 Lines

3 comments
A few months ago Martin Fowler published a post titled Parser Fear. As always (well, almost always) I found myself agreeing with his main point: people are often intimidated by parsers.

Fowler's post advocates the use of parser generators (in particular, ANTLR). I'd like to go a little further and make the following points:
  • Writing a recursive descent parser is much easier than you think.
  • You don't need a parser generator to overcome your parser fear.
  • Get over the trauma induced by the compiler class you took.

Actually, I only need to make a good argument for point #1; the other two points naturally follow.
I will make my case using the well-established technique of proof by example: a 64-line Java class that evaluates arithmetic expressions supporting +, -, *, /, negation, and parentheses.
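For reference, here is the grammar that the class below implements, reconstructed by me from the code (the nonterminal names match the method names):

```
eval    := mul (('+' | '-') eval)?
mul     := unary (('*' | '/') mul)?
unary   := '-' unary  |  '(' eval ')'  |  literal
literal := digit+
```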

public class CompactSolver {
    private String input;

    public CompactSolver(String input_) {
        input = input_.trim();
    }

    private char peek(int offset) {
        return offset >= input.length() ? '\0' : input.charAt(offset);
    }

    private boolean consumeIf(char c) {
        if (peek(0) != c)
            return false;
        consume(1);
        return true;
    }

    private String consume(int numChars) {
        if (numChars == 0)
            throw new RuntimeException("syntax error");
        String result = input.substring(0, numChars);
        input = input.substring(numChars).trim();
        return result;
    }

    public double eval() {
        double lhs = mul();
        if (consumeIf('+'))
            return lhs + eval();
        else if (consumeIf('-'))
            return lhs - eval();
        return lhs;
    }

    private double mul() {
        double lhs = unary();
        if (consumeIf('*'))
            return lhs * mul();
        else if (consumeIf('/'))
            return lhs / mul();
        return lhs;
    }

    private double unary() {
        if (consumeIf('-'))
            return -unary();

        if (consumeIf('(')) {
            double result = eval();
            if (!consumeIf(')'))
                throw new RuntimeException("Missing ')'");
            return result;
        }

        return literal();
    }

    private double literal() {
        for (int i = 0; true; ++i)
            if (!Character.isDigit(peek(i)))
                return Integer.parseInt(consume(i));
    }
}



Update 1: Changed the precedence of the (unary) negation operator (Many thanks to Orlando).

Examining Command-Query-Separation

2 comments
I have already written at least once on the command-query-separation (CQS) principle. Today, I would like to show a class which benefits from NOT following this principle.

So, here is a snippet from a class that parses and evaluates arithmetic expressions.

public class Solver1 {
    private String input;

    public Solver1(String input_) {
        input = input_.trim();
    }

    private boolean consumeIf(String s) {
        if (!input.startsWith(s))
            return false;
        input = input.substring(s.length()).trim();
        return true;
    }

    public double eval() {
        double lhs = multiplication();
        if (consumeIf("+"))
            return lhs + eval();
        else if (consumeIf("-"))
            return lhs - eval();
        return lhs;
    }

    private double multiplication() {
        double lhs = unary();
        if (consumeIf("*"))
            return lhs * multiplication();
        else if (consumeIf("/"))
            return lhs / multiplication();
        return lhs;
    }
    ...
}

Looking at consumeIf() we note that it violates the CQS principle. This method is a query (it computes and returns a result, either true or false) that is also a command (it changes the value of a field). Let's see what the code looks like after decomposing consumeIf() into two methods, a query and a command:


public class Solver2 {
    private String input;

    public Solver2(String input_) {
        input = input_.trim();
    }

    private boolean nextIs(String s) {
        return input.startsWith(s);
    }

    private void move(int numChars) {
        input = input.substring(numChars).trim();
    }

    public double eval() {
        double lhs = multiplication();
        if (nextIs("+")) {
            move(1);
            return lhs + eval();
        }
        else if (nextIs("-")) {
            move(1);
            return lhs - eval();
        }
        return lhs;
    }

    private double multiplication() {
        double lhs = unary();
        if (nextIs("*")) {
            move(1);
            return lhs * multiplication();
        }
        else if (nextIs("/")) {
            move(1);
            return lhs / multiplication();
        }
        return lhs;
    }
    ...
}


There is a 27% increase in code size, 42 vs. 33 lines of code, but I will put this aside because I think that #LOC is not a very good estimate of code quality. (Actually, I don't believe in any code quality metric.)

Nonetheless, there is another issue that is much more significant: in the second program we have a recurring idiom of calling nextIs() and then calling move(), as in if (nextIs("/")) { move(1); ... }. These two calls are mutually dependent: the parameter passed to move() must be the length of the string passed to nextIs(). This idiom is error-prone, as a programmer may accidentally change one of the parameters while leaving the other intact.
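Here is a contrived illustration of that hazard (TokenCursor reproduces Solver2's two helpers so the snippet is self-contained; the multi-character token scenario is my addition):

```java
class TokenCursor {
    private String input;

    TokenCursor(String input) { this.input = input.trim(); }

    boolean nextIs(String s) { return input.startsWith(s); }
    void move(int numChars)  { input = input.substring(numChars).trim(); }
    String rest()            { return input; }

    public static void main(String[] args) {
        TokenCursor c = new TokenCursor("== x");
        if (c.nextIs("=="))
            c.move(1);                // bug: should be move(2) -- the two
                                      // parameters silently fell out of sync
        System.out.println(c.rest()); // "= x": a stray '=' is left behind
    }
}
```

A consumeIf()-style method cannot exhibit this bug, because the check and the advance share a single parameter.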

This shows that arbitrarily applying the CQS principle may yield bad results. A good programmer should be aware of that and should make judgment calls rather than blindly following principles. In fact, this happens a lot with many other programming principles. They are usually much closer to guidelines (and therefore require judgment) than to absolute rules. I think that Lars summarized it pretty well: "I generally dislike '-isms' because they set up rules that people try to apply without understanding the preconditions that need to be met."

Evaluating a Language

1 comments
I decided to try Ruby. Don't get me wrong. I had already written a few things in Ruby. But today I was a bit more serious about it.

My motivation was to get some more insight regarding the never-ending static vs. dynamic typing debate. So, this morning I wrote a Java program (the standard arithmetic expressions parser/evaluator) and then I tried to write it in Ruby. Unlike previous occasions I actually took the time and searched the web for a real Ruby IDE. After a few disappointments (#1, #2 and #3 - it was on my Windows machine, maybe they work better w/ Linux) I settled on NetBeans.

My Ruby skills felt somewhat rusty, so I decided to warm up by TDDing an easy kata. After the third or fourth test case I noticed a very funny thing: Ruby was much easier than I remembered. Suddenly I didn't feel so rusty. Thinking about it, I realized that this was not due to the language but rather due to the IDE. In all past attempts I was too lazy to install an IDE, so my Ruby experience was always through a text editor and a command prompt. Once I had an IDE at my disposal (and I mean a serious IDE: completion, go to declaration, templates, the works) the learning curve flattened.

For example, I wasn't falling back to Java's curly braces as much as I used to in the past, because pairs of def/end keywords were being injected by the IDE as I was typing; Fixing the (highly non-friendly) syntax error: "unexpected $end, expecting kEND" was much less painful since I had this nice red thingy on the editor's side bar; Determining whether an expression such as assert_equal(0, x.get_result) can also be written as assert_equal 0, x.get_result was a breeze.

The moral here is twofold. First, there is the trivial point that WYSIWYG-grade tools make it possible even for a complete beginner to get things done. Second, you can't really evaluate a language without an IDE. When you evaluate a language you are unconsciously comparing it against your programming experience with your current, favorite language. You surely have all the tooling support you need for your current language, so unless you get some tooling support for the new language, the comparison is unfair. Once you set up an IDE for the new language your programming experience will (dramatically) improve.

I guess there is a deep psychological issue here. We are humans. We always try to do as little work as possible. When we try a new language we tend to download the least amount of tools. We rationalize it by saying something like: "well, I am a seasoned programmer. I don't need an IDE/debugger/whatever to get a feeling for the language". This is a risky -- and usually plain wrong -- assumption.