From 0 to Node.js/Jasmine/CoffeeScript in 10 Minutes

Context/Background
I wanted to get started with Node.js, Jasmine, and CoffeeScript on an Ubuntu system. For some reason, installing Node.js via sudo apt-get install nodejs (full instructions) didn't work for me. I overcame this by compiling it locally, but then other issues came up as I moved on to Jasmine. It took some trial-and-error, but eventually I zeroed in on the solution. A key part is the installation of npm (a package manager for Node.js).

Starting Point
You have downloaded Node.js source code (here, node-v0.6.10.tar.gz) into your ~/Downloads directory.

From .tar.gz to Jasmine...
mkdir -p /tmp/installation
sudo apt-get install npm
cd /tmp/installation
tar -zxf ~/Downloads/node-v0.6.10.tar.gz 
cd node-v0.6.10/
./configure
make
sudo make install
node -e "console.log('Node.js is up')"
curl https://npmjs.org/install.sh | sudo sh
sudo npm install jasmine-node -g
cat <<EOF > app1.spec.js
var app = require('./app1.js');

describe("my app", function() {
  it("greets you", function() {
    expect(app.helloWorld("Martin")).toEqual("Hello, Martin");
  });
})      
EOF

cat <<EOF > app1.js
exports.helloWorld = function(s) {
  return "Hello, " + s;
}
EOF

jasmine-node app1.spec.js

... to CoffeeScript
sudo npm install -g coffee-script

cat <<"EOF" > app2.spec.coffee
app = require './app2'

describe "my app", -> 
  it "greets you", -> 
    (expect app.helloWorld "Martin").toEqual "Hello, Martin"
EOF

cat <<"EOF" > app2.coffee
exports.helloWorld = (s) ->
  "Hello, " + s;
EOF

jasmine-node --coffee app2.spec.coffee

Final thoughts
This post's goal is to reduce the setup time and learning curve so that others can start using these (great) tools right away. As a Node.js noob, I didn't even know that npm existed. I also found it difficult to set up my first toy programs: a Node.js program, a Jasmine spec, a CoffeeScript program, and a Jasmine/CoffeeScript spec. There are syntactic issues and framework issues (such as Node's require()/exports mechanism) which you need to get exactly right in order to make the pieces fit together. An out-of-the-box running sample - which is what this post tries to provide - can mitigate much of this pain.

I also found it difficult to find good web pages discussing this process. Thus, for the common good, I will put a few keywords here to help future noobs find it: jasmine node.js coffeescript Ubuntu npm spec installation setup getting started hello world program.

On the similarity of agile development and concurrent programming

This is one of those posts that I've been wanting to write for ages. The inspiration came from reading Kent Beck's Extreme Programming Explained and from a post by Stephan Schmidt.

Concurrent Programming

Let's say you need to write a program that crunches a large set of inputs. There are three different calculations that need to be carried out for each input value. Each calculation is relatively complicated and relies on data structures for looking up previously computed values.

In a single-core setting, things are simple. You can either feed each input into all three algorithms or, alternatively, run the first algorithm on all inputs and then turn to the other two. Either way, your throughput will be roughly the same (modulo caching issues).

The story is different in multi-core machines. By and large, there are two design options:

  • Option 1 - Decomposition by Algorithms

    Create three threads, one for each algorithm. Each thread scans the input and feeds the values into its own algorithm.

    There are almost no data races - the data structures supporting algorithm 1 are accessed only by thread 1, thus obviating the need to guard them against other threads. Cache behavior also seems quite good: each thread touches only a part of the heap, thus promoting data locality and cache-friendliness.
  • Option 2 - Decomposition by Inputs

    Create N threads (N == #cores). Each thread takes the next value from the input and feeds it, in turn, into each of the three algorithms.

    This design puts a greater burden on the programmer. Each thread touches all data structures, thus requiring careful locking. Locality is also not as good: after accessing the data structures of algorithm 1, a thread will start executing algorithm 2 (for the same input), thus touching different data structures and incurring more cache misses.
It seems that decomposition by algorithms is superior to decomposition by inputs. It is simpler to implement (less locking), and is likely to be more cache friendly. Are there any advantages to using decomposition by inputs?

Yes!

  • Scalability. Option 1 is great if you have three cores on your machine. If your code runs on a machine with (say) 8 cores, then you need to redesign it to have five more threads, possibly by decomposing existing algorithms into smaller, concurrently running sub-algorithms. This incurs a substantial engineering effort.

    Option 2, on the other hand, is highly scalable. If you're running on more cores, you just need to spawn more threads. You can even let your code figure out the optimal number of threads dynamically, thereby allowing it to adjust to unforeseen circumstances (such as the O/S starting a CPU-intensive process that leaves your program with fewer cores).
  • Reduced starvation. In option 1, if the execution times of the various algorithms are not equal, some threads will finish before the others and sit idle, making overall throughput sub-optimal.

    In option 2, a thread may go idle only at the very end of the execution: when the number of remaining inputs is less than #cores, which is a fraction of the total number of iterations.
  • Less up-front estimations. In option 1, one needs to estimate the execution times of the various algorithms, across all possible target machines, in order to minimize the effect of starvation.

    In option 2, such estimations are practically redundant due to the reduced starvation.
  • Progress monitoring. Given that the number of inputs is far larger than the number of algorithms in a program, prediction of time-to-completion is more accurate in option 2 (cf. the Law of Large Numbers).
  • Easy On/Off. In option 2, if you need to stop in the middle, you just stop dispatching work items to threads. Pretty soon all threads will stop. In option 1, the programmer needs to build shutdown logic into each thread/algorithm (e.g., inspect a shared, atomic boolean and bail out when it becomes false).

    In much the same way, it is also easy to restart later: you know that all inputs (up to a certain point) were fully processed, and all remaining inputs are fully unprocessed. Thus, you have perfectly valid (though partial) output.

    An "Abort" in option 1 leaves each thread at a different location, thereby making it difficult to resume later. The outputs, across the algorithms, are non-aligned (in addition to being partial).
Of course, decomposition by inputs is not always better than decomposition by algorithms. Inter-thread locking may, in certain cases, be so costly that option 1 is still faster. Nonetheless, I argue that, in general, option 2 is likely to be more scalable, to yield better throughput (due to less starvation), and so on.
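To make decomposition by inputs concrete, here is a minimal Java sketch (the class name, the thread-pool choice, and the IntConsumer stand-ins for the three algorithms are mine, for illustration; they are not from the original discussion):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

public class ByInputsDecomposition {

  // Option 2: one worker per core; each task feeds a single input,
  // in turn, into each of the algorithms.
  public static void process(List<Integer> inputs, List<IntConsumer> algorithms)
      throws InterruptedException {
    int cores = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(cores);

    for (int value : inputs)
      pool.submit(() -> algorithms.forEach(a -> a.accept(value)));

    // "Easy off": stop dispatching and let the in-flight tasks drain.
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}

Note that the lookup structures shared by the algorithms would still need their own synchronization - exactly the locking burden attributed to this option above.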

Agile Development

Please re-read the above text, after applying the following substitutions:
  • Threads, Cores -> Programmers
  • Inputs -> Features
  • Algorithm -> Module, Subsystem (or any other large piece of a program)
  • Execution Time -> Development Time
  • Cache miss -> Developer switching between different parts of the code

(I believe that this is just another manifestation of the super-linearity axiom: in the world of software development, smaller tasks are more cost-effective than larger ones).

PS
I am in the process of further expanding this topic in my bliki, http://collidingobjects.herokuapp.com. Stay tuned.



Making your Swing App Mac/OSX compliant

Running your plain-vanilla Swing app on a Mac for the first time will be quite surprising. Your app will stand out, in a bad way, among all other Mac apps, because Swing's menu bar is not compliant with OS X's menu bar.

The main difficulties are as follows:
  • Your JFrame will have its own menu bar - your menu items will not be shown on the (shared) menu bar

  • Using the Command key as the standard accelerator

  • Application menu is named after the (fully qualified name of the) main class

  • Supporting the standard "Preferences..." application menu item

  • Not breaking compatibility with the other OSes (Linux/Windows)


Alvin J. Alexander has written a fairly detailed tutorial on this subject, but it is somewhat outdated now. When I tried using classes from com.apple.eawt (as suggested there), Eclipse complained:

Access restriction: The type Application is not accessible due to restriction on required library /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Classes/ui.jar

(In addition, any static dependency on such Apple-specific classes will break the program when it runs on Linux/Windows.)

On the other hand, Eirik Bjørsnøs's Macify library provides a good solution to this difficulty, but it does not address the first three points above.

Here's a short program (github, zip) that shows how these issues can be completely solved. It was derived from both Alvin's and Eirik's articles. It contains Eirik's Macify library (as a .jar) and these two Java classes:

// Launcher.java
package com.blogspot.javadots.swingmac;

import javax.swing.*;
import org.simplericity.macify.eawt.*;

public class Launcher {

  private static void macSetup(String appName) {
    String os = System.getProperty("os.name").toLowerCase();
    boolean isMac = os.startsWith("mac os x");

    if(!isMac)
      return;

    System.setProperty("apple.laf.useScreenMenuBar", "true");
    System.setProperty("com.apple.mrj.application.apple.menu.about.name",
        appName);
  }

  public static void main(String[] args) throws Exception {
    macSetup("swing-mac");
    UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());

    SwingUtilities.invokeLater(new Runnable() {

      @Override
      public void run() {
        Application app = new DefaultApplication();
        Main main = new Main();
        app.addApplicationListener(main.getApplicationListener());

        app.addPreferencesMenuItem();
        app.setEnabledPreferencesMenu(true);
      }
    });
  }
}

// Main.java
package com.blogspot.javadots.swingmac;

import java.awt.Toolkit;
import java.awt.event.KeyEvent;
import javax.swing.*;
import org.simplericity.macify.eawt.*;

public class Main {

  private JFrame f = new JFrame();
  private MyApplicationListener listener = new MyApplicationListener();

  public Main() {

    JMenuBar mb = new JMenuBar();
    f.setJMenuBar(mb);
    JMenu m = new JMenu("File");
    mb.add(m);

    addItem(m, "Open", KeyEvent.VK_O);
    addItem(m, "Save", KeyEvent.VK_S);
    addItem(m, "Save As", KeyEvent.VK_A);
    addItem(m, "Import", KeyEvent.VK_I);
    addItem(m, "Export", KeyEvent.VK_E);

    f.setTitle("Main");
    f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    f.setSize(400, 300);
    f.setVisible(true);
  }

  private void addItem(JMenu m, String name, int accelerator) {
    JMenuItem mi = new JMenuItem(name);
    mi.setAccelerator(KeyStroke.getKeyStroke(accelerator,
        Toolkit.getDefaultToolkit().getMenuShortcutKeyMask()));
    m.add(mi);
  }

  public ApplicationListener getApplicationListener() {
    return listener;
  }

  // Must be public!!
  public class MyApplicationListener implements ApplicationListener {

    private void handle(ApplicationEvent event, String message) {
      JOptionPane.showMessageDialog(f, message);
      event.setHandled(true);
    }

    public void handleAbout(ApplicationEvent event) {
      handle(event, "aboutAction");
    }

    public void handleOpenApplication(ApplicationEvent event) {
      // Ok, we know our application started
      // Not much to do about that..
    }

    public void handleOpenFile(ApplicationEvent event) {
      handle(event, "openFileInEditor: " + event.getFilename());
    }

    public void handlePreferences(ApplicationEvent event) {
      handle(event, "preferencesAction");
    }

    public void handlePrintFile(ApplicationEvent event) {
      handle(event, "Sorry, printing not implemented");
    }

    public void handleQuit(ApplicationEvent event) {
      handle(event, "exitAction");
      System.exit(0);
    }

    public void handleReOpenApplication(ApplicationEvent event) {
      event.setHandled(true);
      f.setVisible(true);
    }
  }
}



Here are the key points. The order is highly important.
  1. Create a dedicated "Launcher" class to carry out the setup phase. A dedicated class ensures that setup takes place before any other UI interaction.
  2. Set system property apple.laf.useScreenMenuBar to true.
  3. Set system property com.apple.mrj.application.apple.menu.about.name to the application's name.
  4. Set the Look & Feel to UIManager.getSystemLookAndFeelClassName().
  5. Use Toolkit.getDefaultToolkit().getMenuShortcutKeyMask() to obtain an OS-correct accelerator key.
  6. Register an application listener with Macify's Application object. The listener's implementing class must be declared as public.
  7. Call addPreferencesMenuItem() and setEnabledPreferencesMenu(true) on that application object.


The resulting code will work just fine on Mac, Linux, and Windows. The Macify library uses reflection to dynamically discover OS X's runtime system and to wire your listener to it. The use of reflection allows your code to compile on any operating system and also to run fine on Linux/Windows, where the OS X runtime is not present.
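As a hedged illustration of that reflective trick (a sketch of the idea only, not Macify's actual source):

public class MacWiring {
  // Returns the com.apple.eawt.Application instance, or null when the
  // class is absent (i.e., we are not running on OS X).
  public static Object appleApplicationOrNull() {
    try {
      return Class.forName("com.apple.eawt.Application")
          .getMethod("getApplication")
          .invoke(null);
    }
    catch (ClassNotFoundException e) {
      return null; // not on OS X - a graceful no-op
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}

Since the Apple class is named only as a string, the code compiles and loads everywhere; the lookup simply fails (harmlessly) on other platforms.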

Top Ten Getting-Started-With-Dojo Tips

You heard good things about the Dojo library. What is the absolute minimum you need to know in order to start coding effectively with Dojo?

Disclaimer: This post does not claim that Dojo is better than jQuery, nor the converse. Each library has its strengths. My personal view is that jQuery offers a well-designed programming model (the $("selector") thing is ingenious). On the other hand, Dojo currently offers a more extensive set of standard widgets.


#1: Importing Modules

dojo.xd.js is the basic module. Once you have imported it into your page via <script src=".."> you can use dojo.require('fully.qualified.name') to import additional Dojo modules.

<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
    </script>
  </head>
  <body></body>
</html>


#2: addOnLoad()

addOnLoad() lets you register a function that will be called once page loading is finished. This is Dojo's cross-browser-compatible way to hook the onLoad event.

<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>

    <script type="text/javascript">
      dojo.addOnLoad(function() { alert("hi"); });
    </script>
  </head>
  <body></body>
</html>


#3: Widget creation -- Programmatic Style

There are two ways to create widgets: Programmatically (shown here) and declaratively (see tip #5). Either way, you must first import the corresponding module via a require() call.

In the programmatic style you create a widget by calling its constructor, which typically takes two parameters:

  • options: a plain Javascript object specifying widget-specific options

  • id: The ID of a DOM node which will host this new widget


In the example below, a button widget is created via new dijit.form.Button({}, "click-me-button"), which means: no options; ID of hosting element is "click-me-button".


<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
      dojo.addOnLoad(function() {
        var button = new dijit.form.Button({}, "click-me-button");
        button.attr("label", "Click Me");
      });
    </script>
    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
  </head>
  <body class="claro">
    <div id="click-me-button"></div>
  </body>
</html>


#4: Overriding a Widget's Methods

Any method defined in the constructor's first parameter will be attached to the newly created widget, thereby overriding an existing method with the same name. The code below overrides the onClick method of the Button widget.

Initial values for the widget's properties can be specified in a similar manner: { label: "Click me" }


<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
      dojo.addOnLoad(function() {
        new dijit.form.Button({
          onClick: function() { alert("Thank you!"); },
          label: "Click me!"
        }, "click-me-button");
      });
    </script>
    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
  </head>
  <body class="claro">
    <div id="click-me-button"></div>
  </body>
</html>


#5: Widget creation -- Declarative Style

The declarative style lets you define widgets using HTML markup. To enable this you MUST specify djConfig="parseOnLoad: true" on the <script src="dojo.xd.js"> element.

You can then use a dojoType="dijit.form.Button" HTML-attribute to tell the Dojo parser to create a button widget that will be hosted by the enclosing HTML element.

A nested <script type="dojo/method" event="onClick" args="evt"> element will define a callback method for the onClick event. A nested <script type="dojo/connect"> element will define code that will be executed when the widget is created.


<html>
  <head>
    <script djConfig="parseOnLoad: true" type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
    </script>

    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>

  </head>
  <body class="claro">
    <div dojoType="dijit.form.Button">
      <script type="dojo/connect">
        this.attr("label", "Click Me!");
      </script>
      <script type="dojo/method" event="onClick" args="evt">
        alert("Thank you!");
      </script>
    </div>
  </body>
</html>


#6: Defining widget variables -- Declarative Style

If you add a jsId="myButton" attribute to an HTML element that defines a Dojo widget (i.e., has a dojoType attribute), the Dojo parser will assign the widget to a global variable named myButton.

This allows programmatic access to a declaratively-defined widget.


<html>
  <head>
    <script djConfig="parseOnLoad: true" type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
      dojo.addOnLoad(function() {
        alert("Press OK to change style");
        myButton.attr("style", "color:red; font-weight:bold;");
      });
    </script>

    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>

  </head>
  <body class="claro">
    <div dojoType="dijit.form.Button" jsId="myButton">
      A simple button
    </div>
  </body>
</html>


#7: Obtaining the associated DOM node

Widgets and DOM nodes are distinct objects. In order to get the DOM node associated with a widget, use the widget's .domNode property.

                
<html>
  <head>
    <script djConfig="parseOnLoad: true" type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
      dojo.addOnLoad(function() {
        myButton.domNode.innerHTML = myButton.domNode.innerHTML.bold();
      });
    </script>

    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>

  </head>

  <body class="claro">
    <div dojoType="dijit.form.Button" jsId="myButton">
      A simple button
    </div>
  </body>
</html>


#8: Looking up a DOM node

dojo.byId("someId") is Dojo's cross-browser-compatible way to obtain a DOM node by its ID.


<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.addOnLoad(function() {
        dojo.byId("some.div").innerHTML = "found it!";
      });
    </script>
  </head>
  <body>
    <div id="some.div"></div>
  </body>
</html>


#9: The Widget -> Model -> Store Pattern

Sophisticated Dojo widgets, such as the Tree or the DataGrid, rely on the following structure: the widget observes a model, which observes a data store, which maintains the actual data.

There are several (predefined) models that can work with each widget. The Tree widget, for example, can work with either a TreeStoreModel or a ForestStoreModel. These models can work with stores such as ItemFileWriteStore or ItemFileReadStore.


<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dojo.data.ItemFileWriteStore");
      dojo.require("dijit.Tree");

      function initPage() {
        var store = new dojo.data.ItemFileWriteStore({ data:
          {
            identifier: 'id',
            label: 'name',
            items: [
              { id: 1, name: 'Star Wars Saga', root: true,
                children: [{_reference: 2}, {_reference: 3}, {_reference: 4}] },
              { id: 2, name: 'Star Wars' },
              { id: 3, name: 'The Empire Strikes Back' },
              { id: 4, name: 'Return of the Jedi' },
            ]
          }
        });

        var treeModel = new dijit.tree.ForestStoreModel({
          store: store,
          query: { 'root': true }
        });

        var widget = new dijit.Tree({ model: treeModel, showRoot: false }, "div-tree");
      }

      dojo.addOnLoad(initPage);
    </script>
    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
  </head>

  <body class="claro">
    <div id="div-tree"></div>
  </body>
</html>


#10: Change the Widget's Content by Mutating its Data Store

When you're dealing with a Widget-Model-Store setup and you want to change the content displayed by the widget, the correct way to do it is to change the data store object. The changes will be propagated along the observation chain and will eventually be reflected in the UI.

Obviously, the store object must support mutations. Read-only stores (such as ItemFileReadStore) will not work for you here.

The code below invokes store.deleteItem() when the "Delete 'Return of the Jedi'" button is clicked. The associated Tree widget is automagically updated.


<html>
  <head>
    <script type="text/javascript"
            src="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dojo/dojo.xd.js">
    </script>
    <script type="text/javascript">
      dojo.require("dijit.form.Button");
      dojo.require("dojo.data.ItemFileWriteStore");
      dojo.require("dijit.Tree");

      function initPage() {
        var store = new dojo.data.ItemFileWriteStore({ data:
          {
            identifier: 'id',
            label: 'name',
            items: [
              { id: 1, name: 'Star Wars Saga', root: true,
                children: [{_reference: 2}, {_reference: 3}, {_reference: 4}] },
              { id: 2, name: 'Star Wars' },
              { id: 3, name: 'The Empire Strikes Back' },
              { id: 4, name: 'Return of the Jedi' },
            ]
          }
        });

        var treeModel = new dijit.tree.ForestStoreModel({
          store: store,
          query: { 'root': true }
        });

        var widget = new dijit.Tree({ model: treeModel, showRoot: false }, "div-tree");

        new dijit.form.Button({
          label: "Delete 'Return of the Jedi'",
          onClick: function() {
            store.fetchItemByIdentity({ identity: 4, onItem: function(item) {
              store.deleteItem(item);
            } });
          }
        }, "div-button");
      }

      dojo.addOnLoad(initPage);
    </script>
    <link rel="stylesheet" type="text/css"
          href="http://ajax.googleapis.com/ajax/libs/dojo/1.5/dijit/themes/claro/claro.css"/>
  </head>

  <body class="claro">
    <div id="div-button"></div>
    <div id="div-tree"></div>
  </body>
</html>

I thought that a small project does not need testing...

(...Or: You can't over-estimate the importance of a testing infrastructure)

Over the last few weeks I had to write three small (1-2 days of work) applications: a command-line utility, a Swing application that analyzes the complexity of Java methods, and a web page--a single .html file--that lets its user extract some information from a REST server (Javascript, jQuery, Ajax).

I don't think that LOC is a good code metric, but just to give a feeling for the sizes of these projects, their respective LOC (including blanks, comments, unit tests, HTML) values are: 515, 1383, 664.

Reflecting on these three applications I see a common theme: while I did unit test (and even TDD-ed) certain parts of the apps, I didn't put much emphasis on making the code testable. After all, these are very small apps which should not take very long to develop, so investing in testability seemed like a waste of time. I believed that the benefits of automatic testing would not outweigh the initial cost of putting the required scaffolding in place.

I was wrong. Even in a web page of 664 lines (including plain HTML), I quickly got to a position where the behavior of the app was quite intricate. In order to make sure that new functionality did not tamper with existing functionality I found myself repeatedly rerunning a lengthy series of manual tests. At the next round, there was even more "existing functionality" to test...

The total testing effort is actually similar to the sum of a simple arithmetic series: Sn = 1 + 2 + 3 + ... + n, whose value grows ~ n^2. This means that the time needed for adding a new piece of functionality rises as the app grows. Eventually it reaches a point where the time to implement a feature is determined not by the complexity of the feature but by the complexity of the app. All features, little or big, will take a long time to complete because the dominant cost is the testing, not the implementation.
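Spelling out that arithmetic (the standard closed form of the series):

S_n = \sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \Theta(n^2)

That is, the n-th increment costs n units of manual testing, so the cumulative cost after n increments grows quadratically.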

If I decide not to (manually) test every new increment of functionality, I run the risk of not detecting bugs the moment they are introduced, which incurs significantly longer fix times when these bugs are eventually detected.

Of course, at the beginning everything looked fine. However, after just a few hours the signs of technical debt became evident: the code grew messy. I was afraid to refactor. I felt that I was coding in "extreme caution" mode. I was no longer in control of my code. I could not move it in the direction that I wanted.

The amazing thing is how short a distance you have to walk for this effect to kick in. It usually took less than half a day for me to realize that manual testing was slowing me down.

Will I do things differently in the future? Yes. I will start with a testable skeleton of the app before adding any substantial behavior to it. The "start-from-a-skeleton" practice is already quite popular. The emphasis here is twofold:
  • It should be a testable skeleton. This will let you build testability into the system from the very start.
  • The extra cost of a testable skeleton pays off even in extra-small projects.
A thorough treatment of this topic is given in chapter four of GOOS (by Steve Freeman and Nat Pryce), which talks about "Kick-Starting the Test-Driven Cycle". In particular, the authors argue that the goal of every first iteration should be to "Test a Walking Skeleton". Go read it.

I don't want to hear the "we-need-to-separate-tests-from-code" excuse

Many programmers tend to place their unit tests in a dedicated source folder. I do it differently. In my projects, the unit-test of class SomeClass is called SomeClass_Tests and it is located in the very same folder as SomeClass.java.




This has all sorts of benefits: I can instantly see if a class has a test. I can instantly jump from the test to the testee and vice versa. I don't have to maintain two parallel hierarchies of folders. It is very unlikely that I will rename the production class but not its test. Renaming a package affects both tests and production classes. And so on. There is one word that summarizes these benefits: locality. Things that change together ought to be placed as close together as possible.

There are IDE plugins out there that provide similar capabilities over a project structure that has separate source folders. However, these plugins will not help you when you access your code outside your IDE (for instance, I often explore my code repository via a browser).

But even more importantly, as I got more and more (test) infected I realized that I don't want this test-production separation. The tests are not something external that needs to be put away. Tests are a central piece of knowledge about my code, and I want them nearby. Think about it: would you separate your Javadoc text from the class/method it is describing? Would you find it productive to write code in a language which dictates that fields are defined in one file and methods are defined in another? (Actually, this is pretty much what happens in C++...)

Of course not.

The main difficulty is the packaging/deployment phase. I often heard the argument that "we need to have two separate folders because otherwise our deliverable (.jar/.war/...) will include both production code and testing code, and this is bad".

Is it really bad? First, in many situations (web-apps, anyone?) the size of the binary does not matter much. Second, in situations where the deliverable includes the source code, the tests can be quite handy as usage samples.

If you're in a situation where these arguments do not hold, and you absolutely cannot place test code in your deliverable, then the remainder of this post is for you.

I often said that it is not hard to write a program that will delete the .class files of test classes. It requires some knowledge of bytecode-inspection techniques, which apparently is not very common. So, as a service to the Java community and for the common good, I cleared a few hours and wrote it. The result is class-wiper (sources), a command-line utility that recursively scans the given directories and deletes all .class files therein that are related to JUnit.

Specifically, it will delete a class if either (a) it mentions the @Test annotation, or (b) it uses, directly or indirectly, a class that mentions @Test. Your production code will never meet either of these conditions. If you don't want test classes to reach your client, you just need to invoke it from your build script, as follows:

 java -jar class-wiper <binary-dir1>  <binary-dir2> ...
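For the curious, here is a minimal sketch of the direct-detection half of that idea, using the ASM bytecode library (this is my illustration, not class-wiper's actual source; the transitive-use check requires an additional dependency scan):

import org.objectweb.asm.*;

public class TestClassDetector {
  // Returns true if the given class file has a method that directly
  // mentions the @org.junit.Test annotation.
  public static boolean mentionsTestAnnotation(byte[] classFile) {
    final boolean[] found = { false };
    new ClassReader(classFile).accept(new ClassVisitor(Opcodes.ASM4) {
      @Override
      public MethodVisitor visitMethod(int access, String name, String desc,
          String signature, String[] exceptions) {
        return new MethodVisitor(Opcodes.ASM4) {
          @Override
          public AnnotationVisitor visitAnnotation(String annDesc, boolean visible) {
            if ("Lorg/junit/Test;".equals(annDesc))
              found[0] = true;
            return null;
          }
        };
      }
    }, 0);
    return found[0];
  }
}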


This software is provided absolutely for free. Use it any way you like. Change it. Tweak it. Hack it. Sell it. Whatever. I only ask one simple thing: please stop saying that you need to place your tests in a separate directory because you don't have a way to prevent your tests from reaching the deliverable.

Repeat after me: Immutable objects will not slow you down

As noted by Stephan Schmidt, the advent of functional programming promotes the use of immutable objects even in non-functional languages. Immutable objects have many positive traits: they are safer, they throw fewer exceptions, they are not prone to problems of covariance, they prevent races when used in multi-threaded settings, etc.

The main (only?) downside of immutable objects is that they make it difficult to update the data in your program. This should not be a surprise: after all, if they are immutable, it only makes sense that they will not encourage mutations.

Here is the general description of the problem: if x is an immutable object and x.f == y, and you want x.f to point at z, then you can't just set x.f to z. You need to create a new object, x', that is identical to x in all aspects except that x'.f == z. You then need to find all references to x and reroute them to x'.
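Here is a minimal Java sketch of that update pattern (the class and field are hypothetical, for illustration only):

public final class Node {
  private final String f;

  public Node(String f) { this.f = f; }

  public String f() { return f; }

  // "Setting" f returns a fresh instance x', identical to x
  // except that x'.f == z.
  public Node withF(String z) { return new Node(z); }
}

Every holder of a reference to the old instance must then be rerouted to the object returned by withF() - which is exactly the difficulty described next.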

In certain cases (such as many objects holding references to x) this rerouting may be quite hard and error-prone. Also, if the objects pointing at x are immutable themselves, then further updates need to be carried out across your data structure. These difficulties are the reason why most C/C++/Java/C# folks tend to prefer designs with mutable objects, despite the benefits of immutability.

Side note: functional languages usually offer powerful pattern matching mechanisms that simplify the process of updating (immutable) data structures.

Anyway, even in cases where correctly rerouting references is not that big a problem, developers are sometimes reluctant to use immutability due to performance concerns. The argument goes as follows: if I make this class immutable, I will have to allocate a new object every time I update one of the fields. This extra allocation will slow down my program.

This argument is, by and large, incorrect.

Reason #1. Most of the code you're writing will not affect the performance of your program. There is no point in optimizing your code prematurely.

Reason #2. Mutable objects lead to defensive getters/setters. That is, if a method returns a field pointing at a mutable object, it usually needs to create a copy of it to prevent the caller from breaking the invariants of the class. This means that an object is duplicated even if the caller just wants to read its content. Given that reads are more frequent than writes, immutable objects can actually yield faster programs, simply because they do not imply object copying with every get/set operation.
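A small illustration of that defensive-copy cost (the class is hypothetical; java.util.Date is a standard mutable type):

import java.util.Date;

public class Meeting {
  private Date start = new Date();

  // Defensive getter: every read allocates a copy, because otherwise
  // callers could mutate our internal state through the returned reference.
  public Date getStart() {
    return new Date(start.getTime());
  }
}

Had start been of an immutable type, getStart() could simply return the field, so frequent reads would allocate nothing.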

I experienced this effect a few years ago when I had to optimize CPU-intensive code that dealt with queries over in-memory tables. The profiler indicated that most of the time my program was busy duplicating rows of these tables which, usually, were not mutated at all. Switching into an immutable design resulted in a significant performance boost.

Still not convinced? Maybe this will help. When Josh Bloch discusses API design decisions that create performance problems, he gives as an example Java's Dimension class, which is---wait for it---mutable. Specifically, each call to Component.getSize() must allocate a new Dimension object, which leads to numerous needless allocations. Further details are given in Josh's excellent lecture, How to Design a Good API & Why it Matters (the immutability of class Dimension is discussed ~ 32 min. into the talk).
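A small example of the getSize() issue (standard AWT API; the surrounding method is mine, for illustration):

import java.awt.Component;
import java.awt.Dimension;

class SizePolling {
  // Typically called over and over, e.g. from layout or painting code.
  static int area(Component c) {
    Dimension d = c.getSize(); // allocates a new Dimension on every call
    return d.width * d.height;
    // c.getWidth() * c.getHeight() would avoid the allocation.
  }
}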

So You Want to Practice your Code Reviewing Skills? - Summary

Earlier this week I published So You Want to Practice your Code Reviewing Skills?, which challenged readers to find bugs in ~ 150 LOC.


The replies were very interesting. Here's a summary:

Concurrency/Distribution/Singleton
  • instance() method is not synchronized
  • sessionCounter needs to be volatile/guarded by synchronization (in visitorCount())
  • (Mutable) singletons cannot be distributed: if there are several instances of the webapp running, not all values will be counted, because several counters will exist and they will not be synchronized from JVM to JVM.
  • The singleton implementation is broken. See, for instance, this article.
General
  • The DataBaseHelper.get() method may return the empty string ("") if there are no rows in the table, while the singleton only expects a valid number or null; this will cause a NumberFormatException
  • The counter may roll over. For a 7-year-old web app this is perfectly possible.
  • I consider constructs like catch(Exception e) { sLogger.error("", e); } to be a bug.
JDBC
  • The DB change isn't being committed so it has to rely on the underlying driver's behavior.
  • The INSERT statement doesn't explicitly define column names, which is susceptible to failure if the order of columns in the DB table changes.
  • The UPDATE statement updates the row with param = 'SESSION_COUNT' although the INSERT inserts 'session_count'. Usually this would not work without special setup in the DB and/or connection.
  • generateResult() does not close the ResultSet object. Note that the originating Statement object is hidden inside the DataBase.performSqlQuery() method (whose code is not given) so Statement also remains open. The closing of these two objects is deferred to the GC.
Final thoughts
  • Do not use singletons. A singleton is a smell that indicates the need for dependency injection.
  • In general, service objects in web-apps should not maintain state in (plain-old-Java) fields. If you need to have some state, write it to the DB. That's the only way to share state in a distributed setting.
  • The nastiest bug of all: the combination of exceptions being silently absorbed and DataBaseHelper.get() returning an empty string ("") leads to a situation where sessionCounter stays zero, which will reset the visitor count at the DB to 100, thereby losing the (correct) visitor count.

    Here's the scenario: in generateResult() an SQLException is fired (let's say due to a temporary network problem). It is silently caught by generateResult(), which then returns an empty list. get() will return an empty string, which will yield a NumberFormatException in reloadParamsFromDB(), which - again - is silently ignored.
    Therefore, no assignment to sessionCounter takes place, so it retains its default value, zero. At the 100th call to addSession(), sessionCounter's current value (100) will be written to the DB.

    Solution: add to the value at the DB instead of writing to it.
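A hedged sketch of that fix (table and column names follow the post's code; the in-SQL arithmetic on a textual value column assumes a lenient dialect such as MySQL):

// Inside addSession(): add the flush interval (100) to the stored value
// instead of overwriting it with the in-memory counter.
String sql = "UPDATE config SET value = value + 100 WHERE param = 'session_count'";
stmt.executeUpdate(sql);

This also sidesteps the zero-counter scenario above, since a failed reload no longer feeds a bogus value back into the table.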

So You Want to Practice your Code Reviewing Skills?

The code below is taken from a real web-app that has been up and running for > 7 years. This specific fragment realizes the visitor-count functionality: keeping track of the number of visitors hitting the site. Each time a new session is created, SiteInfo.instance().addSession() is called.

Your task (should you choose to accept it...) is to find bugs in this code. In other words: will the visitor count (as reported by SiteInfo.visitorCount()) always be correct? If not, what values can be seen there? How can we fix the code?

Please ignore design issues (I don't like the singleton any more than you do) and technological issues, such as: "well, you should just rewrite the whole thing with Spring + Hibernate". Just examine the code as-is and try to determine if/where it can fail.



public class SiteInfo {

  private static SiteInfo inst;

  private SiteInfo() {
    //
  };

  // This is a singleton (Yak!)
  public static SiteInfo instance() {
    if(inst == null)
      inst = new SiteInfo();
    return inst;
  }

  private int sessionCounter = 0;
  private static final String INIT = "100";

  public int visitorCount() {
    return sessionCounter;
  }

  public synchronized void reloadParamsFromDB() {
    Connection con = null;
    ConnectionPool pool = null;
    try {

      pool = ConnectionPoolFactory.getInstance();
      con = pool.getConnection();

      String countStr = DataBaseHelper.get(con, "config", "value",
          "param", "session_count");
      if (countStr == null) {
        String q = "INSERT INTO config VALUES ('session_count','" + INIT + "')";
        try {
          runCommand(q, con);
          countStr = INIT;
        }
        catch (Exception e) {
          log.error("", e);
        }
      }

      sessionCounter = Integer.parseInt(countStr);
    }
    catch (Exception e) {
      log.error("", e);
    }
    finally {
      ConnectionPoolFactory.release(pool, con);
    }
  }

  private void runCommand(String q, Connection con) throws Exception {
    Statement stmt = con.createStatement();
    try {
      stmt.executeUpdate(q);
    }
    finally {
      stmt.close();
    }
  }

  public synchronized void addSession() {
    sessionCounter++;
    if(sessionCounter % 100 != 0)
      return;

    Connection con = ConnectionPoolFactory.getInstance().getConnection();
    Statement stmt = null;
    try {
      stmt = con.createStatement();
      String sql = "UPDATE config SET value='" + sessionCounter
          + "' WHERE param='SESSION_COUNT'";
      stmt.execute(sql);
    }
    catch (Exception ex) {
      log.error("", ex);
    }
    finally {
      try {
        if(stmt != null)
          stmt.close();
        ConnectionPoolFactory.release(con);
      }
      catch(SQLException e) {
        log.error("", e);
      }
    }
  }
}

public class DataBaseHelper {
  public static String get(Connection con, String table,
      String columnA, String columnB, String columnAValue) {
    String result = "";
    String sqlQuery = "SELECT " + columnB + " from " + table
        + " WHERE " + columnA + " = '" + columnAValue + "'";

    // Run the query. Translate the result set into a list of maps.
    // Each map corresponds to a single row in the ResultSet
    List<Map<Object,Object>> rows = generateResult(sqlQuery, con);
    try {
      Iterator<Map<Object,Object>> iter = rows.iterator();
      if(iter.hasNext()) {
        Map<Object,Object> m = iter.next();

        result = (String) (m.get(columnB));
        if (result == null)
          return null;
      }
    }
    catch (Exception e) {
      return null;
    }
    return result;
  }

  public static List<Map<Object,Object>> generateResult(String query, Connection con) {
    List<Map<Object,Object>> result = new ArrayList<Map<Object,Object>>();
    try {
      ResultSet resultSet = DataBase.performSqlQuery(query, con);
      if(resultSet == null)
        throw new Exception("Impossible");

      ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
      int columnCount = resultSetMetaData.getColumnCount();

      String[] columnNames = new String[columnCount];
      for(int i = 0; i < columnCount; i++)
        columnNames[i] = resultSetMetaData.getColumnName(i + 1);

      while(resultSet.next()) {
        Map<Object,Object> map = new HashMap<Object,Object>();
        for(int i = 0; i < columnCount; i++) {
          String col = columnNames[i];
          map.put(col, resultSet.getString(i + 1));
        }

        result.add(map);
      }
    }
    catch(Exception e) {
      sLogger.error("", e);
    }

    return result;
  }
}

JUnit Rules!

Rules are a simple, yet amazingly powerful, mechanism introduced in JUnit version 4.7. They allow developers to easily customize JUnit's behavior by exposing meta information regarding the currently executing test. This post provides a straightforward example for writing a custom rule that augments JUnit with some useful functionality.

My subject class is IntSet: a set of integers implementing the standard operations add(), remove(), contains(), and clear() in O(1) time. To make this performance guarantee, the set needs to know (in advance) the range of the values (min..max) and its size limit (the number of elements that it will accommodate).

All in all, IntSet looks something like this:

public class IntSet {
  ... // Some private fields
  public IntSet(int limit, int min, int max) { ... }
  public int size() { ... }
  public boolean contains(int n) { ... }
  public void add(int n) { ... }
  public void remove(int n) { ... }
}


One of my unit tests specifies the behavior of IntSet when its size limit is reached. If I'm only interested in the type of the exception I can specify it via the expected attribute of the @Test annotation:


@Test(expected=IllegalStateException.class)
public void shouldNotExceedCapacity() {
  IntSet s = new IntSet(2, -10, 100); // Set size limit to 2
  s.add(30);
  s.add(40);
  s.add(50); // Insertion of the 3rd element should fail
}


There are two drawbacks to this test. First, it only asserts the type of the exception; it does not check the error message specified for the exception. Second, it does not assert that the exception was triggered by the last add() call. In other words, if we have a bug and the 2nd add() call is failing - with the same type of exception - the test will still pass.

To overcome this limitation we want to check the error message of the thrown exception. Specifically, we want to verify that the execution of the method fires an exception whose error message is "Cannot insert '50' - The set is full". Clearly, the chances of such an exception being thrown by the 2nd call are pretty slim.

Extending JUnit in such a manner is pretty easy thanks to the rules mechanism:


public class IntSet_Tests {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Throwing {
    public String value();
  }

  @Rule
  public MethodRule mr = new MethodRule() {
    @Override
    public Statement apply(final Statement base, FrameworkMethod m, Object o) {
      Throwing t = m.getAnnotation(Throwing.class);
      if(t == null)
        return base;

      final String message = t.value();
      return new Statement() {

        @Override
        public void evaluate() throws Throwable {
          try {
            base.evaluate();
            fail("No exception was thrown");
          }
          catch(AssertionError e) {
            throw e;
          }
          catch(Exception e) {
            assertEquals("Incorrect exception message", message, e.getMessage());
          }
        }
      };
    }
  };

  // All sorts of @Test methods ...


  // And now, a method that asserts the error message
  @Throwing("Cannot insert '50' - The set is full")
  @Test
  public void shouldNotExceedCapacity() {
    IntSet s = new IntSet(2, -10, 100);
    s.add(30);
    s.add(40);
    s.add(50);
  }
}


First we define a new annotation, @Throwing. Then we define a field annotated with @Rule to provide the custom handling of this annotation. Finally, we annotate the shouldNotExceedCapacity() method with a @Throwing("Cannot insert '50' - The set is full") annotation.

The mechanism works as follows: before each test method is run, JUnit creates a Statement object, which is merely a command object through which the actual method can be invoked. JUnit passes this object, along with a FrameworkMethod object (a wrapper of Java's Method) and the unit-test instance, to all @Rule fields defined in the test class.

A @Rule field must be public and must implement the MethodRule interface (of course, you can instead extend one of several classes conveniently defined by JUnit). In the apply() method above, we create a new Statement object that wraps the original one. The new evaluate() method checks that, if an exception is thrown, its message matches the text specified by the @Throwing annotation attached to the method.

Obviously, there are other ways to achieve this. For instance, one can use the ExpectedException class (a predefined JUnit rule) to a similar effect. The purpose of this post is to surface the (mighty) powers of JUnit meta-programming.
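For comparison, here is roughly how the same assertion looks with ExpectedException (standard JUnit 4.7+ API; note that expectMessage() matches a substring of the message):

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;

public class IntSet_ExpectedExceptionTests {

  @Rule
  public ExpectedException thrown = ExpectedException.none();

  @Test
  public void shouldNotExceedCapacity() {
    thrown.expect(IllegalStateException.class);
    thrown.expectMessage("Cannot insert '50' - The set is full");

    IntSet s = new IntSet(2, -10, 100);
    s.add(30);
    s.add(40);
    s.add(50); // only this call is expected to throw
  }
}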