On the similarity of agile development and concurrent programming

7 comments
This is one of those posts that I've been meaning to write for ages. The inspiration came from reading Kent Beck's Extreme Programming Explained and from a post by Stephan Schmidt.

Concurrent Programming

Let's say you need to write a program that crunches a large set of inputs. There are three different calculations that need to be carried out for each input value. Each calculation is relatively complicated and relies on data structures for looking up previously computed values.

In a single-core setting things are simple. You can either feed each input into all three algorithms or, alternatively, run the first algorithm on all inputs and then turn to the other two algorithms. Either way, your throughput will be roughly the same (modulo caching issues).

The story is different in multi-core machines. By and large, there are two design options:

  • Option 1 - Decomposition by Algorithms

    Create three threads, one for each algorithm. Each thread scans the input and feeds the values into its own algorithm.

    There are almost no data races - the data structures supporting algorithm 1 are only accessed by thread 1, thus obviating the need to guard against other threads. Cache behavior also seems quite good: each thread touches only a part of the heap, thus promoting data locality and cache-friendliness.
  • Option 2 - Decomposition by Inputs

    Create N threads (N == #cores). Each thread takes the next value from the input and feeds it, in turn, into each of the three algorithms.

    This design puts a greater burden on the programmer. Each thread touches all data structures, thus requiring careful locking. Locality is also not so good: after accessing data structures of algorithm 1, a thread will start executing algorithm 2 (for the same input) thus touching different data structures resulting in more cache misses.
It seems that decomposition by algorithms is superior to decomposition by inputs. It is simpler to implement (less locking), and is likely to be more cache friendly. Are there any advantages to using decomposition by inputs?

Yes!

  • Scalability. Option 1 is great if you have three cores on your machine. If your code runs on a machine with (say) 8 cores, then you need to redesign your code such that it has five more threads, possibly by decomposing existing algorithms into smaller, concurrently running sub-algorithms. This incurs a substantial engineering effort.

    Option 2, on the other hand, is highly scalable. If you're running on more cores, you just need to spawn more threads. You can even let your code figure out the optimal number of threads dynamically, thereby allowing it to adjust to unforeseen circumstances (such as the O/S starting a CPU-intensive process that leaves your program with fewer cores).
  • Reduced starvation. In option 1, if the execution times of the various algorithms are not equal, some threads will finish before the others and sit idle, thereby making overall throughput sub-optimal.

    In option 2, a thread may go idle only at the very end of the execution: when the number of remaining inputs is less than #cores, which is a tiny fraction of the total number of iterations.
  • Less up-front estimation. In option 1, one needs to estimate the execution times of the various algorithms, across all possible target machines, in order to minimize the effect of starvation.

    In option 2, such estimations are practically unnecessary, thanks to the reduced starvation.
  • Progress monitoring. Given that the number of inputs is far larger than the number of algorithms, predicting time-to-completion is more accurate in option 2 (cf. the Law of Large Numbers).
  • Easy On/Off. In option 2, if you need to stop in the middle, you just stop dispatching work items to threads. Pretty soon all threads will stop. In option 1, the programmer needs to build shutdown logic into each thread/algorithm (e.g.: inspect a shared, atomic boolean flag and bail out when it is raised).

    In much the same way, it is also easy to restart later: you know that all inputs (up to a certain point) were fully processed, and all remaining inputs are fully un-processed. Thus, you have perfectly valid (though partial) output.

    An "Abort" in option 1 leaves each thread in a different location, thereby making it difficult to resume later. The outputs, across the algorithms, are non-aligned (in addition to being partial).
Of course, decomposition by inputs is not always better than decomposition by algorithms. Inter-thread locking may, in certain cases, be so costly that option 1 is still faster. Nonetheless, I argue that, in general, option 2 is likely to be more scalable, yield better throughput (due to less starvation), and so on.
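
To make option 2 concrete, here is a minimal sketch of decomposition by inputs over a fixed-size thread pool. The class and the algorithm1..3 placeholders are hypothetical names, not code from a real project:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ByInputs {

    public static void main(String[] args) throws InterruptedException {
        int nThreads = Runtime.getRuntime().availableProcessors(); // scales with #cores
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);

        int[] inputs = { 1, 2, 3 }; // stand-in for the real input set
        for (final int value : inputs) {
            pool.submit(new Runnable() {
                public void run() {
                    // Each task runs all three algorithms on a single input.
                    // Any shared lookup tables must be properly synchronized.
                    algorithm1(value);
                    algorithm2(value);
                    algorithm3(value);
                }
            });
        }

        pool.shutdown(); // "Easy On/Off": simply stop dispatching work items
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void algorithm1(int v) { /* ... */ }
    static void algorithm2(int v) { /* ... */ }
    static void algorithm3(int v) { /* ... */ }
}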

Agile Development

Please re-read the above text, after applying the following substitutions:
  • Threads, Cores -> Programmers
  • Inputs -> Features
  • Algorithm -> Module, Subsystem (or any other large piece of a program)
  • Execution Time -> Development Time
  • Cache miss -> Developer switching between different parts of the code

(I believe that this is just another manifestation of the super-linearity axiom: in the world of software development, smaller tasks are more cost-effective than larger ones).

PS
I am in the process of further expanding this topic in my bliki, http://collidingobjects.herokuapp.com. Stay tuned.



Making your Swing App Mac/OSX compliant

5 comments
Running your plain-vanilla Swing app on a Mac for the first time will be quite surprising. Your app will stand out, in a bad way, among all other Mac apps because Swing's menu bar is not compliant with OS X's menu bar.

The main difficulties are as follows:
  • Your JFrame will have its own menu bar - your menu items will not be shown on the (shared) menu bar

  • Using the Command key as the standard accelerator

  • Application menu is named after the (fully qualified name of the) main class

  • Supporting the standard "Preferences..." application menu item

  • Not breaking compatibility with the other OSes (Linux/Windows)


Alvin J. Alexander has written a fairly detailed tutorial on this subject, but it is somewhat outdated now. When I tried using classes from com.apple.eawt (as suggested there), Eclipse complained:

Access restriction: The type Application is not accessible due to restriction on required library /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Classes/ui.jar

(In addition, any static dependency on such Apple-specific classes will break the program when it runs on Linux/Windows.)

On the other hand, Eirik Bjørsnøs's Macify library provides a good solution to this difficulty, but it does not address the first three points from above.

Here's a short program (github, zip) that shows how these issues can be completely solved. It was derived from both Alvin's and Eirik's articles. It contains Eirik's Macify library (as a .jar) and these two Java classes:

// Launcher.java
package com.blogspot.javadots.swingmac;

import javax.swing.*;
import org.simplericity.macify.eawt.*;

public class Launcher {

    private static void macSetup(String appName) {
        String os = System.getProperty("os.name").toLowerCase();
        boolean isMac = os.startsWith("mac os x");

        if(!isMac)
            return;

        System.setProperty("apple.laf.useScreenMenuBar", "true");
        System.setProperty("com.apple.mrj.application.apple.menu.about.name",
                appName);
    }

    public static void main(String[] args) throws Exception {
        macSetup("swing-mac");
        UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());

        SwingUtilities.invokeLater(new Runnable() {

            @Override
            public void run() {
                Application app = new DefaultApplication();
                Main main = new Main();
                app.addApplicationListener(main.getApplicationListener());

                app.addPreferencesMenuItem();
                app.setEnabledPreferencesMenu(true);
            }
        });
    }
}

// Main.java
package com.blogspot.javadots.swingmac;

import java.awt.Toolkit;
import java.awt.event.KeyEvent;
import javax.swing.*;
import org.simplericity.macify.eawt.*;

public class Main {

    private JFrame f = new JFrame();
    private MyApplicationListener listener = new MyApplicationListener();

    public Main() {

        JMenuBar mb = new JMenuBar();
        f.setJMenuBar(mb);
        JMenu m = new JMenu("File");
        mb.add(m);

        addItem(m, "Open", KeyEvent.VK_O);
        addItem(m, "Save", KeyEvent.VK_S);
        addItem(m, "Save As", KeyEvent.VK_A);
        addItem(m, "Import", KeyEvent.VK_I);
        addItem(m, "Export", KeyEvent.VK_E);

        f.setTitle("Main");
        f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        f.setSize(400, 300);
        f.setVisible(true);
    }

    private void addItem(JMenu m, String name, int accelerator) {
        JMenuItem mi = new JMenuItem(name);
        mi.setAccelerator(KeyStroke.getKeyStroke(accelerator,
                Toolkit.getDefaultToolkit().getMenuShortcutKeyMask()));
        m.add(mi);
    }

    public ApplicationListener getApplicationListener() {
        return listener;
    }

    // Must be public!!
    public class MyApplicationListener implements ApplicationListener {

        private void handle(ApplicationEvent event, String message) {
            JOptionPane.showMessageDialog(f, message);
            event.setHandled(true);
        }

        public void handleAbout(ApplicationEvent event) {
            handle(event, "aboutAction");
        }

        public void handleOpenApplication(ApplicationEvent event) {
            // Ok, we know our application started
            // Not much to do about that..
        }

        public void handleOpenFile(ApplicationEvent event) {
            handle(event, "openFileInEditor: " + event.getFilename());
        }

        public void handlePreferences(ApplicationEvent event) {
            handle(event, "preferencesAction");
        }

        public void handlePrintFile(ApplicationEvent event) {
            handle(event, "Sorry, printing not implemented");
        }

        public void handleQuit(ApplicationEvent event) {
            handle(event, "exitAction");
            System.exit(0);
        }

        public void handleReOpenApplication(ApplicationEvent event) {
            event.setHandled(true);
            f.setVisible(true);
        }
    }
}



Here are the key points. The order is highly important.
  1. Create a dedicated "Launcher" class to carry out the setup phase. A dedicated class ensures that setup takes place before any other UI interaction.
  2. Set system property apple.laf.useScreenMenuBar to true.
  3. Set system property com.apple.mrj.application.apple.menu.about.name to the application's name.
  4. Set the Look&Feel to UIManager.getSystemLookAndFeelClassName().
  5. Use Toolkit.getDefaultToolkit().getMenuShortcutKeyMask() to obtain an OS-correct accelerator key.
  6. Register an application listener with Macify's Application object. The listener's implementing class must be declared as public.
  7. Call addPreferencesMenuItem() and setEnabledPreferencesMenu(true) on that application object.


The resulting code will work just fine on Mac, Linux, and Windows alike. The Macify library uses reflection to dynamically discover OSX's runtime system and to wire your listener to it. The use of reflection allows your code to compile on any operating system and also to run fine on Linux/Windows, where the OSX runtime is clearly not present.

Top Ten Getting-Started-With-Dojo Tips

7 comments
You heard good things about the Dojo library. What is the absolute minimum you need to know in order to start coding effectively with Dojo?

Disclaimer: This post does not claim that Dojo is better than JQuery, nor the converse. Each library has its strengths. My personal view is that JQuery offers a well designed programming model (the $("selector") thing is ingenious). On the other hand, Dojo currently offers a more extensive set of standard widgets.


#1: Importing Modules

dojo.xd.js is the basic module. Once you have imported it into your page via <script src=".."> you can use dojo.require('fully.qualified.name') to import additional Dojo modules.

<html>
<head>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.require("dijit.form.Button");
</script>
</head>
<body/>
</html>


#2: addOnLoad()

addOnLoad() lets you register a function that will be called once page loading is finished. This is Dojo's cross-browser-compatible way to hook the onLoad event.

<html>
<head>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>

<script type="text/javascript">
dojo.addOnLoad(function() { alert("hi"); })
</script>
</head>
<body/>
</html>


#3: Widget creation -- Programmatic Style

There are two ways to create widgets: programmatically (shown here) and declaratively (see tip #5). Either way, you must first import the corresponding module via a dojo.require() call.

In the programmatic style you create a widget by calling its constructor, which typically takes two parameters:

  • options: a plain Javascript object specifying widget-specific options

  • id: The ID of a DOM node which will host this new widget


In the example below, a button widget is created via new dijit.form.Button({}, "click-me-button"), which means: no options; the ID of the hosting element is "click-me-button".


<html>
<head>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.require("dijit.form.Button");
dojo.addOnLoad(function() {
var button = new dijit.form.Button({}, "click-me-button");
button.attr("label", "Click Me");
});
</script>
<link rel="stylesheet" type="text/css"
href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
</head>
<body class="claro">
<div id="click-me-button"/>
</body>
</html>


#4: Overriding a Widget's Methods

Any method defined in the constructor's first parameter will be attached to the newly created widget, thereby overriding an existing method with the same name. The code below overrides the onClick method of the Button widget.

Initial values for the widget's properties can be specified in a similar manner: { label: "Click me" }


<html>
<head>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.require("dijit.form.Button");
dojo.addOnLoad(function() {
new dijit.form.Button({
onClick: function() { alert("Thank you!"); },
label: "Click me!"
}, "click-me-button");
});
</script>
<link rel="stylesheet" type="text/css"
href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
</head>
<body class="claro">
<div id="click-me-button"/>
</body>
</html>


#5: Widget creation -- Declarative Style

The declarative style lets you define widgets using HTML markup. To enable this you MUST specify djConfig="parseOnLoad: true" on the <script src="dojo.xd.js"> element.

You can then use a dojoType="dijit.form.Button" HTML-attribute to tell the Dojo parser to create a button widget that will be hosted by the enclosing HTML element.

A nested <script type="dojo/method" event="onClick" args="evt"> element will define a callback method for the onClick event. A nested <script type="dojo/connect"> element will define code that will be executed when the widget is created.


<html>
<head>
<script djConfig="parseOnLoad: true" type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.require("dijit.form.Button");
</script>

<link rel="stylesheet" type="text/css"
href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>

</head>
<body class="claro">
<div dojoType="dijit.form.Button">
<script type="dojo/connect">
this.attr("label", "Click Me!");
</script>
<script type="dojo/method" event="onClick" args="evt">
alert("Thank you!");
</script>
</div>
</body>
</html>


#6: Defining widget variables -- Declarative Style

If you add a jsId="myButton" attribute to an HTML element that defines a Dojo widget (i.e., has a dojoType attribute), the Dojo parser will assign the widget to a global variable named myButton.

This allows programmatic access to a declaratively-defined widget.


<html>
<head>
<script djConfig="parseOnLoad: true" type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.require("dijit.form.Button");
dojo.addOnLoad(function() {
alert("Press OK to change style");
myButton.attr("style", "color:red; font-weight:bold;");
});
</script>

<link rel="stylesheet" type="text/css"
href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>

</head>
<body class="claro">
<div dojoType="dijit.form.Button" jsId="myButton">
A simple button
</div>
</body>
</html>


#7: Obtaining the associated DOM node

Widgets and DOM nodes are distinct objects. In order to get the DOM node associated with a widget, use the widget's .domNode property.

                
<html>
<head>
<script djConfig="parseOnLoad: true" type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.require("dijit.form.Button");
dojo.addOnLoad(function() {
myButton.domNode.innerHTML = myButton.domNode.innerHTML.bold();
});
</script>

<link rel="stylesheet" type="text/css"
href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>

</head>

<body class="claro">
<div dojoType="dijit.form.Button" jsId="myButton">
A simple button
</div>
</body>
</html>


#8: Looking up a DOM node

dojo.byId("someId") is Dojo's cross-browser-compatible way to obtain a DOM node by its ID.


<html>
<head>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
dojo.addOnLoad(function() {
dojo.byId("some.div").innerHTML = "found it!";
});
</script>
</head>
<body>
<div id="some.div"/>
</body>
</html>


#9: The Widget -> Model -> Store Pattern

Sophisticated Dojo widgets, such as Tree or DataGrid, rely on the following structure:

The widget observes a model which observes a data store which maintains the actual data.

There are several (predefined) models that can work with each widget. The Tree widget, for example, can work with either a TreeStoreModel or a ForestStoreModel. These models can work with stores such as ItemFileWriteStore or ItemFileReadStore.


<html>
<head>
<script type="text/javascript"
        src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
  dojo.require("dojo.data.ItemFileWriteStore");
  dojo.require("dijit.Tree");

  function initPage() {
    var store = new dojo.data.ItemFileWriteStore({ data: {
      identifier: 'id',
      label: 'name',
      items: [
        { id: 1, name: 'Star Wars Saga', root: true,
          children: [{_reference: 2}, {_reference: 3}, {_reference: 4}] },
        { id: 2, name: 'Star Wars' },
        { id: 3, name: 'The Empire Strikes Back' },
        { id: 4, name: 'Return of the Jedi' }
      ]
    }});

    var treeModel = new dijit.tree.ForestStoreModel({
      store: store,
      query: { 'root': true }
    });

    var widget = new dijit.Tree({ model: treeModel, showRoot: false }, "div-tree");
  }

  dojo.addOnLoad(initPage);
</script>
<link rel="stylesheet" type="text/css"
      href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
</head>

<body class="claro">
<div id="div-tree"></div>
</body>
</html>


#10: Change the Widget's Content by Mutating its Data Store

When you're dealing with a Widget-Model-Store setup and you want to change the content displayed by the widget, the correct way to do it is to change the data store object. The changes will be propagated along the observation chain and will eventually be reflected in the UI.

Obviously, the store object must support mutations. Read-only stores (such as ItemFileReadStore) will not work for you here.

The code below invokes store.deleteItem() when the "Delete 'Return of the Jedi'" button is clicked. The associated Tree widget is automagically updated.


<html>
<head>
<script type="text/javascript"
        src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
</script>
<script type="text/javascript">
  dojo.require("dijit.form.Button");
  dojo.require("dojo.data.ItemFileWriteStore");
  dojo.require("dijit.Tree");

  function initPage() {
    var store = new dojo.data.ItemFileWriteStore({ data: {
      identifier: 'id',
      label: 'name',
      items: [
        { id: 1, name: 'Star Wars Saga', root: true,
          children: [{_reference: 2}, {_reference: 3}, {_reference: 4}] },
        { id: 2, name: 'Star Wars' },
        { id: 3, name: 'The Empire Strikes Back' },
        { id: 4, name: 'Return of the Jedi' }
      ]
    }});

    var treeModel = new dijit.tree.ForestStoreModel({
      store: store,
      query: { 'root': true }
    });

    var widget = new dijit.Tree({ model: treeModel, showRoot: false }, "div-tree");

    new dijit.form.Button({
      label: "Delete 'Return of the Jedi'",
      onClick: function() {
        store.fetchItemByIdentity({ identity: 4, onItem: function(item) {
          store.deleteItem(item);
        }});
      }
    }, "div-button");
  }

  dojo.addOnLoad(initPage);
</script>
<link rel="stylesheet" type="text/css"
      href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
</head>

<body class="claro">
<div id="div-button"></div>
<div id="div-tree"></div>
</body>
</html>

I thought that a small project does not need testing...

3 comments
(...Or: You can't over-estimate the importance of a testing infrastructure)

Over the last few weeks I had to write three small (1-2 days of work) applications: a command line utility, a Swing application that analyzes the complexity of Java methods, and a web page--a single .html file--that lets its user extract some information from a REST server (Javascript, JQuery, Ajax).

I don't think that LOC is a good code metric, but just to give a feeling of the sizes of these projects, their respective LOC values (including blanks, comments, unit tests, HTML) are: 515, 1383, 664.

Reflecting on these three applications I see a common theme: while I did unit test (and even TDD-ed) certain parts of the apps, I didn't put much emphasis on making the code testable. After all, these are very small apps which should not take very long to develop, so investing in testability seemed like a waste of time. I believed that the benefits of automatic testing would not outweigh the initial cost of putting the required scaffolding in place.

I was wrong. Even in the web page of 664 lines (including plain HTML), I quickly got to a position where the behavior of the app was quite intricate. In order to make sure that new functionality did not break existing functionality I found myself repeatedly rerunning a lengthy series of manual tests. At the next round, there was even more "existing functionality" to test...

The total testing effort is actually similar to the sum of a simple arithmetic series: Sn = 1 + 2 + 3 + ... + n = n(n+1)/2, which grows like n^2. This means that the time needed for adding a new piece of functionality rises as the app grows. Eventually it reaches a point where the time to implement a feature is determined not by the complexity of the feature but by the complexity of the app. All features, little or big, take a long time to complete because the dominant cost is the testing, not the implementation.

If I decide not to (manually) test every new increment of functionality, I run the risk of not detecting bugs the moment they are introduced, which incurs significantly longer fix times when these bugs are eventually detected.

Of course, at the beginning everything looked fine. However, after just a few hours the signs of technical debt became evident: the code grew messy. I was afraid to refactor. I felt that I was coding in "extreme caution" mode. I was no longer in control of my code. I could not move it in the direction that I wanted.

The amazing thing is how short a distance you have to walk for this effect to kick in. It usually took less than half a day for me to realize that manual testing was slowing me down.

Will I do things differently in the future? Yes. I will start with a testable skeleton of the app before adding any substantial behavior to it. The "start-from-a-skeleton" practice is already quite popular. The emphasis here is twofold:
  • It should be a testable skeleton. This will let you build testability into the system from the very start.
  • The extra cost of a testable skeleton pays off even in extra-small projects.
A thorough treatment of this topic is given in Chapter 4 of GOOS (by Steve Freeman and Nat Pryce), which talks about "Kick-Starting the Test-Driven Cycle". In particular, they argue that the goal of every first iteration should be to "Test a Walking Skeleton". Go read it.

I don't want to hear the "we-need-to-separate-tests-from-code" excuse

12 comments
Many programmers tend to place their unit tests in a dedicated source folder. I do it differently. In my projects, the unit-test of class SomeClass is called SomeClass_Tests and it is located in the very same folder as SomeClass.java.




This has all sorts of benefits: I can instantly see if a class has a test. I can instantly jump from the test to the testee and vice-versa. I don't have to maintain two parallel hierarchies of folders. It is very unlikely that I will rename the production class but not its test. Renaming a package affects both tests and production classes. Etc. There is one word that summarizes these benefits: locality. Things that change together ought to be placed as close as possible.

There are IDE plugins out there that provide similar capabilities over a project structure that has separate source folders. However, these plugins will not help you when you access your code outside your IDE (for instance, I often explore my code repository via a browser).

But even more importantly, as I got more and more (test-)infected I realized that I don't want this test-production separation. The tests are not something external that needs to be put away. Tests are a central piece of knowledge about my code. I want them nearby. Think about it: would you separate your Javadoc text from the class/method it is describing? Would you find it productive to write code in a language which dictates that fields are defined in one file and methods are defined in another file? (Actually, this is pretty much what happens in C++ ...).

Of course not.

The main difficulty is the packaging/deployment phase. I have often heard the argument that "we need to have two separate folders because otherwise our deliverable (.jar/.war/...) will include both production code and testing code, and this is bad".

Is it really bad? First, in many situations (web-apps anyone?) the size of the binary does not matter much. Second, in situations where the deliverable includes the source code, the tests can be quite handy as usage samples.

If you're in a situation where these arguments do not hold, and you absolutely cannot place test code in your deliverable, then the remainder of this post is for you.

I have often said that it is not hard to write a program that will delete the .class files of test classes. It requires some knowledge of bytecode inspection techniques, which apparently is not very common. So, as a service to the Java community and for the common good, I cleared a few hours and wrote it. The result is class-wiper (sources), a command line utility that will recursively scan the given directories and delete all .class files therein that are related to JUnit.

Specifically, it will delete a class if either: (a) it mentions the @Test annotation; or (b) it uses, directly or indirectly, a class that mentions @Test. Your production code will never meet either of these conditions. If you don't want test classes to reach your client you just need to invoke it from your build script, as follows:

 java -jar class-wiper <binary-dir1>  <binary-dir2> ...
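
For the curious, here is a minimal sketch of the kind of check behind condition (a) - detecting whether a compiled class mentions @Test - assuming the ASM bytecode library. TestClassDetector is a hypothetical name, and the transitive condition (b) requires an additional dependency scan that is not shown; class-wiper's actual implementation may differ:

import org.objectweb.asm.AnnotationVisitor;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class TestClassDetector {

    // Returns true if any method in the given .class bytes carries JUnit's @Test.
    public static boolean mentionsTestAnnotation(byte[] classBytes) {
        final boolean[] found = { false };
        new ClassReader(classBytes).accept(new ClassVisitor(Opcodes.ASM4) {
            @Override
            public MethodVisitor visitMethod(int access, String name,
                    String desc, String signature, String[] exceptions) {
                return new MethodVisitor(Opcodes.ASM4) {
                    @Override
                    public AnnotationVisitor visitAnnotation(String d, boolean visible) {
                        if ("Lorg/junit/Test;".equals(d))
                            found[0] = true;
                        return null;
                    }
                };
            }
        }, 0);
        return found[0];
    }
}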


This software is provided absolutely for free. Use it anyway you like. Change it. Tweak it. Hack it. Sell it. Whatever. I only ask one simple thing: please stop saying that you need to place your tests in a separate directory because you don't have a way to prevent your tests from reaching the deliverable.

Repeat after me: Immutable objects will not slow you down

8 comments
As noted by Stephan Schmidt, the advent of functional programming promotes the use of immutable objects even in non-functional languages. Immutable objects have many positive traits: they are safer, they throw fewer exceptions, they are not prone to problems of covariance, they prevent races when used in multi-threaded settings, etc.

The main (only?) downside of immutable objects is the fact that they make it difficult to update the data in your program. This should not be a surprise. After all, if they are immutable then it only makes sense that they will not encourage mutations.

Here is the general description of the problem: if x is an immutable object and x.f == y, and you want x.f to point at z, then you can't just set x.f to z. You need to create a new object, x', that is identical to x in all aspects except that x'.f == z. You then need to find all references to x and reroute them to x'.
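
In code, this "create a modified copy" idiom looks roughly like the following sketch (Node and withF are hypothetical names):

// A hypothetical immutable node: "changing" f means building a copy.
public final class Node {
    private final String name;
    private final Node f;

    public Node(String name, Node f) {
        this.name = name;
        this.f = f;
    }

    // Returns x': identical to this object, except that f now points at z.
    public Node withF(Node z) {
        return new Node(this.name, z);
    }
}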

In certain cases (such as many objects holding references to x) this rerouting may be quite hard and error prone. Also, if the objects pointing at x are immutable themselves, then further updates need to be carried out across your data structure. These difficulties are the reason why most C/C++/Java/C# folks tend to prefer designs with mutable objects, despite the benefits of immutability.

Side note: functional languages usually offer powerful pattern matching mechanisms that simplify the process of updating (immutable) data structures.

Anyway, even in cases where correctly rerouting references is not that big a problem, developers are sometimes reluctant to use immutability due to performance concerns. The argument goes as follows: if I make this class immutable, I will have to allocate a new object every time I update one of the fields. This extra allocation will slow down my program.

This argument is, by and large, incorrect.

Reason #1. Most of the code you're writing will not affect the performance of your program. There is no point in optimizing your code prematurely.

Reason #2. Mutable objects lead to defensive getters/setters. That is, if a method returns a field pointing at a mutable object, it usually needs to create a copy of it to prevent the caller from breaking the invariants of the class. This means that the object will be duplicated even if the caller just wants to read its content. Given that reads are more frequent than writes, immutable objects can actually yield faster programs, simply because they do not imply object copying with every get/set operation.
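
A sketch of this defensive-copy tax (Widget is a hypothetical class; java.awt.Dimension really is mutable):

import java.awt.Dimension;

public class Widget {
    private final Dimension size = new Dimension(400, 300);

    // Mutable Dimension: we must copy on every read, or the caller
    // could mutate our internal state behind our back.
    public Dimension getSize() {
        return new Dimension(size); // one allocation per read
    }

    // With an immutable size type, the getter could simply be:
    //     return size;   // safe to share, no allocation at all
}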

I experienced this effect a few years ago when I had to optimize CPU-intensive code that dealt with queries over in-memory tables. The profiler indicated that most of the time my program was busy duplicating rows of these tables which, usually, were not mutated at all. Switching into an immutable design resulted in a significant performance boost.

Still not convinced? Maybe this will help you. When Josh Bloch discusses API design decisions that create performance problems, he gives as an example Java's Dimension class, which is---wait for it---mutable. Specifically, each call to Component.getSize() must allocate a new Dimension object, which leads to numerous needless allocations. Further details are given in Josh's excellent lecture, How to Design a Good API & Why it Matters (the mutability of class Dimension is discussed ~32 min. into the talk).

So You Want to Practice your Code Reviewing Skills? - Summary

0 comments
Earlier this week I published So You Want to Practice your Code Reviewing Skills? which challenged the readers to find bugs in ~ 150 LOC.


The replies were very interesting. Here's a summary:

Concurrency/Distribution/Singleton
  • instance() method is not synchronized
  • sessionCounter needs to be volatile/guarded by synchronization (in visitorCount())
  • (Mutable) singletons cannot be distributed. If there are several instances of the webapp running, not all values will be counted, because several counters will exist and they will not be synchronized from JVM to JVM.
  • The singleton implementation is broken. See, for instance, this article.
General
  • The DataBaseHelper.get() method may return the empty string ("") if no rows are in the table, while the singleton expects either a valid number or null; the empty string will cause a NumberFormatException
  • The counter may roll over. For a 7-year-old web app this is perfectly possible.
  • I consider constructs like catch(Exception e) { sLogger.error("", e); } to be a bug in themselves.
JDBC
  • The DB change isn't being committed so it has to rely on the underlying driver's behavior.
  • The insert statement doesn't explicitly define column names, which is susceptible to failure if the order of columns in the DB table changes.
  • The update statement updates the row with param = 'SESSION_COUNT' although the insert inserts 'session_count'. Usually this would not work without special set-up in the DB and/or connection.
  • generateResult() does not close the ResultSet object. Note that the originating Statement object is hidden inside the DataBase.performSqlQuery() method (whose code is not given) so Statement also remains open. The closing of these two objects is deferred to the GC.
Final thoughts
  • Do not use singletons. A singleton is a smell that indicates the need for dependency injection.
  • In general, service objects in web-apps should not maintain state in (plain-old-Java) fields. If you need to have some state, write it to the DB. That's the only way to share state in a distributed setting.
  • The nastiest bug of all: the combination of exceptions being silently absorbed and DataBaseHelper.get() returning an empty string ("") leads to a situation where sessionCounter stays zero, which will reset the visitor count at the DB to 100, thereby losing the (correct) visitor count.

    Here's the scenario: in generateResult() an SqlException is fired (let's say due to a temporary network problem). It is silently caught by generateResult(), which then returns an empty list. get() will return an empty string, which will yield a NumberFormatException in reloadParamsFromDB(), which - again - is silently ignored.
    Therefore, no assignment to sessionCounter takes place, so it retains its default value, zero. At the 100th call to addSession(), sessionCounter's current value (100) will be written to the DB.

    Solution: add to the value at the DB instead of writing to it.
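
A sketch of that fix, expressed as the SQL issued from addSession() (this assumes the DB will treat the value column numerically; the given schema stores it as a string, so a real fix would also make the column numeric, and use the same case as the inserted 'session_count' row):

// Sketch: make the DB accumulate the count instead of overwriting it.
String sql = "UPDATE config SET value = value + 100 WHERE param = 'session_count'";

This way the DB is the single point of truth for the count, rather than being overwritten by one JVM's possibly-stale in-memory counter.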

So You Want to Practice your Code Reviewing Skills?

7 comments
The code below is taken from a real web-app that has been up and running for > 7 years. This specific fragment realizes the visitor count functionality: keeping track of the number of visitors hitting the site. Each time a new session is created, SiteInfo.instance().addSession() is called.

Your task (should you choose to accept it...) is to find bugs in this code. In other words: will the visitor count (as reported by SiteInfo.visitorCount()) always be correct? If not, what values can be seen there? How can we fix the code?

Please ignore design issues (I don't like the singleton any more than you do) or technological issues, such as: "well, you should just rewrite the whole thing with Spring + Hibernate". Just examine the code as-is and try to determine if/where it can fail.



public class SiteInfo {

    private static SiteInfo inst;

    private SiteInfo() {
        //
    };

    // This is a singleton (Yak!)
    public static SiteInfo instance() {
        if(inst == null)
            inst = new SiteInfo();
        return inst;
    }

    private int sessionCounter = 0;
    private static final String INIT = "100";

    public int visitorCount() {
        return sessionCounter;
    }

    public synchronized void reloadParamsFromDB() {
        Connection con = null;
        ConnectionPool pool = null;
        try {

            pool = ConnectionPoolFactory.getInstance();
            con = pool.getConnection();

            String countStr = DataBaseHelper.get(con, "config", "value",
                    "param", "session_count");
            if (countStr == null) {
                String q = "INSERT INTO config VALUES ('session_count','" + INIT + "')";
                try {
                    runCommand(q, con);
                    countStr = INIT;
                }
                catch (Exception e) {
                    log.error("", e);
                }
            }

            sessionCounter = Integer.parseInt(countStr);
        }
        catch (Exception e) {
            log.error("", e);
        }
        finally {
            ConnectionPoolFactory.release(pool, con);
        }
    }

    private void runCommand(String q, Connection con) throws Exception {
        Statement stmt = con.createStatement();
        try {
            stmt.executeUpdate(q);
        }
        finally {
            stmt.close();
        }
    }

    public synchronized void addSession() {
        sessionCounter++;
        if(sessionCounter % 100 != 0)
            return;

        Connection con = ConnectionPoolFactory.getInstance().getConnection();
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String sql = "UPDATE config SET value='" + sessionCounter
                    + "' WHERE param='SESSION_COUNT'";
            stmt.execute(sql);
        }
        catch (Exception ex) {
            log.error("", ex);
        }
        finally {
            try {
                if(stmt != null)
                    stmt.close();
                ConnectionPoolFactory.release(con);
            }
            catch(SQLException e) {
                log.error("", e);
            }
        }
    }
}

public class DataBaseHelper {
    public static String get(Connection con, String table,
            String columnA, String columnB, String columnAValue) {
        String result = "";
        String sqlQuery = "SELECT " + columnB + " from " + table
                + " WHERE " + columnA + " = '" + columnAValue + "'";

        // Run the query. Translate the result set into a list of maps.
        // Each map corresponds to a single row in the ResultSet
        List<Map<Object,Object>> rows = generateResult(sqlQuery, con);
        try {
            Iterator<Map<Object,Object>> iter = rows.iterator();
            if(iter.hasNext()) {
                Map<Object,Object> m = iter.next();

                result = (String) (m.get(columnB));
                if (result == null)
                    return null;
            }
        }
        catch (Exception e) {
            return null;
        }
        return result;
    }

    public static List<Map<Object,Object>> generateResult(String query, Connection con) {
        List<Map<Object,Object>> result = new ArrayList<Map<Object,Object>>();
        try {
            ResultSet resultSet = DataBase.performSqlQuery(query, con);
            if(resultSet == null)
                throw new Exception("Impossible");

            ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
            int columnCount = resultSetMetaData.getColumnCount();

            String[] columnNames = new String[columnCount];
            for(int i = 0; i < columnCount; i++)
                columnNames[i] = resultSetMetaData.getColumnName(i + 1);

            while(resultSet.next()) {
                Map<Object,Object> map = new HashMap<Object,Object>();
                for(int i = 0; i < columnCount; i++) {
                    String col = columnNames[i];
                    map.put(col, resultSet.getString(i + 1));
                }

                result.add(map);
            }
        }
        catch(Exception e) {
            sLogger.error("", e);
        }

        return result;
    }
}

JUnit Rules!

5 comments
Rules are a simple, yet amazingly powerful, mechanism introduced in JUnit version 4.7. They allow developers to easily customize JUnit's behavior by exposing meta information regarding the currently executing test. This post provides a straightforward example for writing a custom rule that augments JUnit with some useful functionality.

My subject class is IntSet: a set of integers implementing the standard operations add(), remove(), contains(), clear() in O(1) time. To meet this performance guarantee the set needs to know (in advance) the range of the values (min..max) and its size limit (the number of elements that it can accommodate).

All in all, IntSet looks something like this:

public class IntSet {
    ... // Some private fields
    public IntSet(int limit, int min, int max) { ... }
    public int size() { ... }
    public boolean contains(int n) { ... }
    public void add(int n) { ... }
    public void remove(int n) { ... }
}


One of my unit tests specifies the behavior of IntSet when its size limit is reached. If I'm only interested in the type of the exception I can specify it via the expected attribute of the @Test annotation:


@Test(expected=IllegalStateException.class)
public void shouldNotExceedCapacity() {
    IntSet s = new IntSet(2, -10, 100); // Set size limit to 2
    s.add(30);
    s.add(40);
    s.add(50); // Insertion of the 3rd element should fail
}


There are two drawbacks to this test. First, it only asserts the type of the exception. It does not check the error message specified for the exception. Second, it does not assert that the exception was triggered by the last add() call. In other words, if we have a bug and the 2nd add() call fails - with the same type of exception - the test will still pass.

To overcome this limitation we want to check the error message of the thrown exception. Specifically, we want to verify that the execution of the method fires an exception whose error message is "Cannot insert '50' - The set is full". Clearly, the chances of such an exception being thrown by the 2nd call are pretty slim.

Extending JUnit in such a manner is pretty easy thanks to the rules mechanism:


public class IntSet_Tests {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Throwing {
        public String value();
    }

    @Rule
    public MethodRule mr = new MethodRule() {
        @Override
        public Statement apply(final Statement base, FrameworkMethod m, Object o) {
            Throwing t = m.getAnnotation(Throwing.class);
            if(t == null)
                return base;

            final String message = t.value();
            return new Statement() {

                @Override
                public void evaluate() throws Throwable {
                    try {
                        base.evaluate();
                        fail("No exception was thrown");
                    }
                    catch(AssertionError e) {
                        throw e;
                    }
                    catch(Exception e) {
                        assertEquals("Incorrect exception message", message, e.getMessage());
                    }
                }
            };
        }
    };

    // All sorts of @Test methods ...


    // And now, a method that asserts the error message
    @Throwing("Cannot insert '50' - The set is full")
    @Test
    public void shouldNotExceedCapacity() {
        IntSet s = new IntSet(2, -10, 100);
        s.add(30);
        s.add(40);
        s.add(50);
    }
}


First we define a new annotation, @Throwing. Then we define a field annotated with @Rule to provide the custom handling of this annotation. Finally, we annotate the shouldNotExceedCapacity() method with a @Throwing("Cannot insert '50' - The set is full") annotation.

The mechanism works as follows: before each test method is run, JUnit creates a Statement object, which is merely a command object through which the actual method can be invoked. JUnit passes this object, along with a FrameworkMethod object (a wrapper of Java's Method) and the unit test instance, to all @Rule fields defined in the test class.

A @Rule field must be public and must implement the MethodRule interface (of course, you can instead extend one of several classes conveniently defined by JUnit). In the apply() method above, we create a new Statement object that wraps the original one. The new evaluate() method checks that, if an exception is thrown, its message matches the text specified by the @Throwing annotation attached to the method.

Obviously, there are other ways to do that. For instance, one can use the ExpectedException class (a predefined JUnit rule) to achieve a similar effect. The purpose of this post is to surface the (mighty) powers of JUnit meta programming.
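
For comparison, here is roughly how the ExpectedException route looks. This is a sketch that would sit inside the test class; note that expectMessage() matches a substring of the message rather than the exact text:

@Rule
public ExpectedException thrown = ExpectedException.none();

@Test
public void shouldNotExceedCapacity() {
    // Expectations are declared up front, before the acting code runs.
    thrown.expect(IllegalStateException.class);
    thrown.expectMessage("Cannot insert '50' - The set is full");

    IntSet s = new IntSet(2, -10, 100);
    s.add(30);
    s.add(40);
    s.add(50);
}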

Axiom: Instability

2 comments
I have already blogged about the Axioms of Programming: the fundamental rules that govern the development of every piece of (substantial) software. In this post I want to focus on the Instability Axiom:

The external behavior of a component will need to change over time

Here's a real story. I was once involved with a very small project, let's call it the PQR project: three people working part time for three weeks, putting together an HTTP server, a web client, and a GUI client - all very simple. During these three weeks we were also learning some new technologies, so the actual coding time (of all developers combined) was about 20-25 days.

During these three weeks two important changes were applied to PQR's specification:
  • The technology with which the GUI client was implemented had to be changed. Instead of implementing it over Tech.A we had to switch to Tech.B.

  • The initial specs defined the data that should be persisted by the server. As we were playing with the intermediate versions of the project, we came to realize that a decent user experience requires that additional information be persisted.

The main point of this post is not if/how we managed to support these changes. The point is that even in small projects specs are not stable. We were not able to pin down the project's goals for a three-week period, in a project which is as simple as industrial projects get. Of course, if the project were more complicated (more people, wider scope), the instability would likely have been even higher.

This example indicates that a "fire and forget" type of development (AKA "divide and conquer"), where one breaks down the desired functionality into a few large pieces, assigns each piece to a programmer, and then lets each programmer work on his task in isolation from his peers (until an "integration" milestone approaches), is broken.

First, external forces will change the specs, thus affecting the assigned tasks. In the PQR project, the change in client technology was due to some external factors (business/marketing constraints). Even though the initial specs were examined and audited by several layers of approvals, no one had predicted this change.

Second, feedback from working early versions of the product (even with partial functionality) will change our understanding of the product and its desired capabilities. In PQR, the change regarding which-information-should-be-persisted was driven by experimentation with early versions.

If we had taken a fire-and-forget approach, our ability to respond to the first change would have been very limited, as every team member would have been in the middle of his large task. Also, by the time a first working version was available, very little time would have been left to implement significant changes.

Bottom line: Software is unstable. Breaking the effort into tiny tasks with frequent integration points (I am speaking about granularity of hours not weeks) is an excellent way to cope with this inherent instability.

Programmers are Decision Makers

1 comments
So yesterday I bumped into this bug: a listener of a tree object (not Swing's JTree - a tree with nodes and labeled edges, a-la Data Structures 101) was dereferencing a null object:

public class Auditor implements TreeListener {

    private Map<String,Date> map = new HashMap<String,Date>();

    public void nodeAdded(Node n) {
        map.put(n.id(), new Date());
    }

    public void nodeRemoved(Node n) {
        Date d = map.get(n.id());
        String s = d.toString(); // <-- NPE is here!!
        map.remove(n.id());

        // some other things ...
    }
}

My unit tests told me that the Auditor and Tree classes were working fine - the tree fires the correct events and the Auditor reacts correctly. Of course, a green bar does not guarantee "no bugs", so I double checked the code and it seemed fine.

Indeed, the problem was elsewhere. It turned out that a bug in my app caused each listener to be registered twice. Auditor was not expecting to be called twice for each event, hence the NPE.

So I fixed the bug. Then I went back to the addListener() method of the Tree class. It is a very simple method:

public class Tree {

    private List<TreeListener> listeners
            = new ArrayList<TreeListener>();

    public void addListener(TreeListener arg) {
        listeners.add(arg);
    }

    // Additional methods ...
}

Looking at the code I tried to think whether this method should be changed in order to prevent future occurrences of this bug. As I was thinking about it I realized that this simple method is an excellent illustration of one of software's most fundamental truths:

Programming is about making (design) decisions.

I am not the first one to say this (see Joakim Holm's "Programming is all Design" or Alistair Cockburn). Here's the (probably partial) list of decisions that a programmer has to make when writing a one-liner method that should simply add a listener to a list.


  1. Should the listeners field be private? Maybe we want subclasses to manipulate/inspect it in some way, which calls for protected visibility.

  2. What is the order of notification? Should we use a "first-registered first-notified" policy, or maybe "last-registered first-notified"? Is order important at all? If not, then maybe we would like to force the application not to depend on the order, by randomly shuffling the listeners?

  3. Can a listener be registered with more than one Tree object? If so, then the listener interface should probably specify the originating tree, not just the node.

  4. Can a listener be registered more than once with a single Tree object? Usually the answer is no, but there are scenarios where multiple registration does make sense.

  5. Assuming each listener may be registered exactly once, how do we deal with multiple registrations?
    • Throw an exception?
    • A standard Java assertion? Could be disabled which may be good or bad depending on context.
    • Throw an AssertionError? Has a more dramatic effect than a plain Exception. Also will be reported by JUnit as a failure rather than an error. Again, may be either good or bad.
    • Silently ignore. Return from the method without adding the listener. This seems very natural but it has the drawback that it hides the problem. If I choose to silently ignore then I will get no notification about future problems. How about logging?
    • Return a Boolean value indicating whether the listener was already registered. Puts the responsibility on the client code to check this value. Will not always happen.

  6. Do I need to provide some thread protection?
    • Define the method as synchronized?
    • Define the listeners field as a CopyOnWriteArrayList?

  7. How should this list of listeners interact with the Garbage Collector? Should a listener that is only accessible from this list be collectible (by storing it as a WeakReference)? A "No" answer puts additional burden on client code: must remove listeners from the Tree object when they are no longer needed. If listeners are dynamically added as the app is running this may become a big problem.

    Note that choosing weak references is not always a good idea. Some listeners are only accessible from the listened-to object. Take for example a listener that inspects the tree and updates the title of the main window. It is created, pushed onto the listeners list and all other traces to it are lost. A weak reference will make such a listener disappear with the very first collection cycle.

  8. Do I want to refactor the whole "listeners" concern into a dedicated strategy class? This would improve testability and decoupling. (One possible set of answers is sketched below.)
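
For illustration only, here is a sketch of one possible set of answers (reject duplicate registrations loudly, use a copy-on-write list for thread safety) - a judgment call, not the "right" answer:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class Tree {

    // Copy-on-write: registration is rare, notification is frequent,
    // so cheap iteration beats cheap mutation here (decision 6).
    private final List<TreeListener> listeners =
            new CopyOnWriteArrayList<TreeListener>();

    public void addListener(TreeListener arg) {
        // Decisions 4/5: each listener may be registered exactly once;
        // a repeated registration is reported, not silently ignored.
        if (listeners.contains(arg))
            throw new IllegalArgumentException("Listener already registered");
        listeners.add(arg);
    }

    // Additional methods ...
}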

As I said, this is a partial list. The point is simple: a programming task requires many more decisions (and much more mental energy) than it initially seems. In other words: if you want to design everything upfront, you'll end up programming upfront, but without the assistance of tools such as the IDE, unit tests, refactoring, and the like. Good luck.

Katacast: String Calculator – Groovy

0 comments
The original idea behind Code Katas was that of a small programming exercise whose solution often includes some interesting challenge. On katacasts.com, Corey Haines further explains that:

Over time, the concept of Katas grew from a problem to solve to a solution to practice. Uncle Bob Martin (among others) began talking about the idea of practicing the solution to a kata until the steps and keystrokes became like second nature, and you could do them without thinking.


In TDD katas the focus is on doing the TDD cycle, Red-Green-Refactor, right. Corey recently published a screencast where yours truly solves the StringCalculator TDD kata in Groovy. It illustrates some points about TDDing (like "Fake it till you make it" or "triangulation") in a very clear way.

So, if you feel you want to better understand TDD, watch the kata on Corey's site. An explanation of the different moves is given in the accompanying text.

Another recommendation is Gary Bernhardt's screencast, where he refactors Python code with high cyclomatic complexity.

Enjoy.

Swing - Tabs to Spaces

12 comments
How do you make a Swing text component (JTextArea, JEditorPane, etc.) insert spaces whenever the user hits the "Tab" key?

The answer is below. It is based on setting a document filter on the component's document object. The (static) install() method does all the setup for you.

package com.blogspot.javadots;

import javax.swing.text.AbstractDocument;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;
import javax.swing.text.JTextComponent;

public class TabToSpaceFilter extends DocumentFilter {

    private String spaces = "";
    private JTextComponent textComponent;

    public TabToSpaceFilter(int tabSize, JTextComponent tc) {
        textComponent = tc;
        for(int i = 0; i < tabSize; ++i)
            spaces += " ";
    }

    @Override
    public void insertString(FilterBypass fb, int offset, String text,
            AttributeSet attr) throws BadLocationException {
        super.insertString(fb, offset, translate(offset, text), attr);
    }

    @Override
    public void replace(FilterBypass fb, int offset, int length,
            String text, AttributeSet attr) throws BadLocationException {
        super.replace(fb, offset, length, translate(offset, text), attr);
    }

    private String translate(int offset, String s) {
        int col = columnOf(offset);

        StringBuilder sb = new StringBuilder();
        int top = s.length();
        for(int i = 0; i < top; ++i) {
            char c = s.charAt(i);
            if(c == '\t') {
                // Pad up to the next tab stop and advance the column
                // by the padding size, not by one.
                String pad = spaces.substring(col % spaces.length());
                sb.append(pad);
                col += pad.length();
            }
            else if(c == '\n') {
                // A newline resets the column count.
                sb.append(c);
                col = 0;
            }
            else {
                sb.append(c);
                ++col;
            }
        }

        return sb.toString();
    }

    private int columnOf(int i) {
        String s = textComponent.getText();
        if(i == 0)
            return 0;
        int prev = s.lastIndexOf("\n", i-1);
        if(prev < 0)
            return i;

        return (i-prev)-1;
    }

    public static void install(int tabSize, JTextComponent area) {
        AbstractDocument ad = (AbstractDocument) area.getDocument();
        ad.setDocumentFilter(new TabToSpaceFilter(tabSize, area));
    }
}
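
Usage is a one-liner; for example, to expand tabs to 4 spaces in a JTextArea:

JTextArea area = new JTextArea();
TabToSpaceFilter.install(4, area);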

How JUnit's assertArrayEquals() should be implemented

2 comments
I find myself writing this assertion method in (almost) every project where #tests > K, where K is typically 10. In other words: every project.



public static void arrays(Object[] actual, Object... expected) {
    for(int i = 0; i < Math.min(actual.length, expected.length); ++i)
        Assert.assertEquals("Array mismatch at index " + i + ":",
                expected[i], actual[i]);

    Assert.assertEquals("Array length mismatch", expected.length,
            actual.length);
}