Java: Developing On The Streets

I don't want to hear the "we-need-to-separate-tests-from-code" excuse

Many programmers tend to place their unit tests in a dedicated source folder. I do it differently. In my projects, the unit-test of class SomeClass is called SomeClass_Tests and it is located in the very same folder as SomeClass.java.

This has all sorts of benefits: I can instantly see whether a class has a test. I can instantly jump from the test to the testee and vice versa. I don't have to create two parallel hierarchies of folders. It is very unlikely that I will rename the production class but not its test. Renaming a package affects both tests and production classes. And so on. There is one word that summarizes these benefits: locality. Things that change together ought to be placed as close together as possible.

There are IDE plugins out there that provide similar capabilities over a project structure that has separate source folders. However, these plugins will not help you when you access your code not through your IDE (for instance, I often explore my code repository via a browser).

But even more importantly, as I got more and more (test) infected I realized that I don't want to have this test-production separation. The tests are not something external that needs to be put away. Tests are a central piece of knowledge about my code. I want them to be nearby. Think about it: Would you separate your Javadoc text from the class/method it is describing? Would you find it productive to write code in a language which dictates that fields are defined in one file and methods are defined in another file? (Actually, this is pretty much what happens in C++ ...).

Of course not.

The main difficulty is the packaging/deployment phase. I have often heard the argument that "we need to have two separate folders because otherwise our deliverable (.jar/.war/...) will include both production code and testing code, and this is bad".

Is it really bad? First, in many situations (web-apps anyone?) the size of the binary does not matter much. Second, in situations where the deliverable includes the source code, the tests can be quite handy as usage samples.

If you're in a situation where these arguments do not hold, and you absolutely cannot place test code in your deliverable, then the remainder of this post is for you.

I have often said that it is not hard to write a program that deletes the .class files of test classes. It requires some knowledge of bytecode inspection techniques, which apparently is not very common. So, as a service to the Java community and for the common good, I cleared a few hours and wrote it. The result is class-wiper (sources), a command line utility that recursively scans the given directories and deletes all .class files therein that are related to JUnit.

Specifically, it will delete a class if either: (a) it mentions the @Test annotation; or (b) it uses, directly or indirectly, a class that mentions @Test. Your production code will never meet either of these conditions. If you don't want test classes to reach your client you just need to invoke it from your build script, as follows:

 java -jar class-wiper <binary-dir1>  <binary-dir2> ...
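To give a feel for what condition (a) involves, here is a minimal sketch. It is not class-wiper's actual code; the class and method names are made up for illustration. It relies on one fact about the class file format: an annotation's type is recorded in the constant pool as a descriptor string, which for JUnit 4's @Test is "Lorg/junit/Test;". The sketch simply searches the raw bytes for that string, so it would also be fooled by a string literal with the same text, and it does not attempt condition (b).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of class-wiper: a crude check for condition (a).
final class TestAnnotationScanner {
    // Descriptor under which JUnit 4's @Test shows up in a class file.
    private static final byte[] MARKER =
        "Lorg/junit/Test;".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the raw bytes of a .class file contain the @Test descriptor.
    static boolean mentionsTestAnnotation(byte[] classBytes) {
        outer:
        for (int i = 0; i + MARKER.length <= classBytes.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (classBytes[i + j] != MARKER[j])
                    continue outer;
            }
            return true;
        }
        return false;
    }
}
```

A real tool would parse the class file properly and follow references between classes; the byte search only shows that the information is there for the taking.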

This software is provided absolutely for free. Use it any way you like. Change it. Tweak it. Hack it. Sell it. Whatever. I only ask one simple thing: please stop saying that you need to place your tests in a separate directory because you don't have a way to prevent your tests from reaching the deliverable.

Repeat after me: Immutable objects will not slow you down

As noted by Stephan Schmidt, the advent of functional programming promotes the use of immutable objects even in non-functional languages. Immutable objects have many positive traits: they are safer, they throw fewer exceptions, they are not prone to problems of covariance, they prevent races when used in a multi-threaded setting, etc.

The main (only?) downside of immutable objects is the fact that they make it difficult to update the data in your program. This should not be a surprise. After all, if they are immutable then it only makes sense that they will not encourage mutations.

Here is the general description of the problem: if x is an immutable object and x.f == y, and you want x.f to point at z, then you can't just set x.f to z. You need to create a new object, x', that is identical to x in all aspects except that x'.f == z. You then need to find all references to x and reroute them to x'.
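A minimal sketch of this update-by-copy step, using hypothetical classes (Node and Holder are invented for illustration; f is the field from the description above):

```java
// Hypothetical immutable class used only for illustration.
final class Node {
    private final String name;
    private final Node f;   // the field called "f" in the text

    Node(String name, Node f) {
        this.name = name;
        this.f = f;
    }

    String name() { return name; }
    Node f() { return f; }

    // Returns x': identical to this object except that f points at newF.
    Node withF(Node newF) {
        return new Node(name, newF);
    }
}

// Something that references x and therefore has to be rerouted to x'.
final class Holder {
    private Node ref;

    Holder(Node ref) { this.ref = ref; }

    Node ref() { return ref; }
    void reroute(Node newRef) { this.ref = newRef; }
}
```

With x = new Node("x", y), the call x.withF(z) leaves x untouched and returns x'. Every Holder that pointed at x still sees the old object until reroute() is called on it, which is exactly the bookkeeping burden described here.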

In certain cases (such as many objects holding references to x) this rerouting may be quite hard and error prone. Also, if the objects pointing at x are immutable themselves, then further updates need to be carried out across your data structure. These difficulties are the reason why most C/C++/Java/C# folks tend to prefer designs with mutable objects, despite the benefits of immutability.

Side note: functional languages usually offer powerful pattern matching mechanisms that simplify the process of updating (immutable) data structures.

Anyway, even in cases where correctly rerouting references is not that big a problem, developers are sometimes reluctant to use immutability due to performance concerns. The argument goes as follows: If I make this class immutable, I will have to allocate a new object every time I update one of the fields. This extra allocation will slow down my program.

This argument is, by and large, incorrect.

Reason #1. Most of the code you're writing will not affect the performance of your program. There is no point in optimizing your code prematurely.

Reason #2. Mutable objects lead to defensive getters/setters. That is, if a method returns a field pointing at a mutable object, it usually needs to create a copy of it to prevent the caller from breaking the invariants of the class. This means that an object will be duplicated even if the caller just wants to read its content. Given that reads are more frequent than writes, immutable objects can actually yield faster programs, simply because they do not require object copying with every get/set operation.
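A small sketch of the difference, with two hypothetical classes (MutableEvent and ImmutableEvent are invented names). java.util.Date is mutable, so a careful getter has to copy it; java.time.Instant is immutable, so the same reference can be handed out freely:

```java
import java.time.Instant;
import java.util.Date;

// Mutable field type: the class must copy to protect its own state.
final class MutableEvent {
    private final Date when;

    MutableEvent(Date when) {
        this.when = new Date(when.getTime());   // defensive copy on the way in
    }

    Date when() {
        return new Date(when.getTime());        // defensive copy on every read
    }
}

// Immutable field type: sharing the reference is safe.
final class ImmutableEvent {
    private final Instant when;

    ImmutableEvent(Instant when) {
        this.when = when;
    }

    Instant when() {
        return when;                            // no allocation
    }
}
```

Every call to MutableEvent.when() allocates, even when the caller only wants to look at the value.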

I experienced this effect a few years ago when I had to optimize CPU-intensive code that dealt with queries over in-memory tables. The profiler indicated that most of the time my program was busy duplicating rows of these tables which, usually, were not mutated at all. Switching into an immutable design resulted in a significant performance boost.

Still not convinced? Maybe this will help you. When Josh Bloch discusses API design decisions that create performance problems, he gives as an example Java's Dimension class which is (wait for it) mutable. Specifically, each call to Component.getSize() must allocate a new Dimension object, which leads to numerous needless allocations. Further details are given in Josh's excellent lecture, How to Design a Good API & Why it Matters (the mutability of class Dimension is discussed about 32 minutes into the talk).

So You Want to Practice your Code Reviewing Skills? - Summary

Earlier this week I published So You Want to Practice your Code Reviewing Skills? which challenged the readers to find bugs in ~ 150 LOC.

The replies were very interesting. Here's a summary:

Concurrency/Distribution/Singleton

instance() method is not synchronized
sessionCounter needs to be volatile/guarded by synchronization (in visitorCount())
(Mutable) singletons cannot be distributed. If several instances of the webapp are running, not all values will be counted, because several counters will exist and they will not be synchronized from JVM to JVM.
The singleton implementation is broken. See, for instance, this article.
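For reference, here is a sketch of how the first two items are commonly addressed within a single JVM. It uses a hypothetical SafeSiteInfo class, not the code under review, and it does nothing about the distribution problem:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a hypothetical SafeSiteInfo, not the class from the challenge.
final class SafeSiteInfo {
    private SafeSiteInfo() { }

    // The JVM initializes Holder lazily and exactly once,
    // so instance() needs no explicit synchronization.
    private static final class Holder {
        static final SafeSiteInfo INSTANCE = new SafeSiteInfo();
    }

    static SafeSiteInfo instance() {
        return Holder.INSTANCE;
    }

    // AtomicInteger makes both the increment and the read safe across threads.
    private final AtomicInteger sessionCounter = new AtomicInteger();

    int addSession() {
        return sessionCounter.incrementAndGet();
    }

    int visitorCount() {
        return sessionCounter.get();
    }
}
```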

General

DataBaseHelper.get() may return the empty string ("") if no rows are in the table, whereas the singleton expects only a valid number or null; this will cause a NumberFormatException.
The counter may roll over. For a 7 year old web app this is perfectly possible.
I consider constructs like this catch(Exception e) { sLogger.error("", e); } to be a bug.

JDBC

The DB change isn't being committed so it has to rely on the underlying driver's behavior.
The insert statement doesn't explicitly define column names, which makes it susceptible to failure if the order of columns in the DB table changes.
The update statement updates the row with param = 'SESSION_COUNT' although the insert inserts 'session_count'. Usually this would not work without special set-up in the db and/or connection.
generateResult() does not close the ResultSet object. Note that the originating Statement object is hidden inside the DataBase.performSqlQuery() method (whose code is not given) so Statement also remains open. The closing of these two objects is deferred to the GC.

Final thoughts

Do not use singletons. A singleton is a smell that indicates the need for dependency injection.
In general, service objects in web-apps should not maintain state in (plain-old-Java) fields. If you need to have some state, write it to the DB. That's the only way to share state in a distributed setting.
The nastiest bug of all: the combination of exceptions being silently absorbed, and DataBaseHelper.get() returning an empty string (""), leads to a situation where sessionCounter stays zero, which will reset the visitor count at the DB to 100, thereby losing the (correct) visitor count.

Here's the scenario: In generateResult() an SQLException is fired (let's say due to a temporary network problem). It is silently caught by generateResult(), which then returns an empty list. get() will return an empty string, which will yield a NumberFormatException in reloadParamsFromDB(), which (again) is silently ignored.
Therefore, no assignment to sessionCounter takes place, so it retains its default value, zero. At the 100th call to addSession(), sessionCounter's current value (100) will be written to the DB.

Solution: add to the value at the DB instead of writing to it.
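A sketch of that solution, with the database replaced by a hypothetical in-memory stand-in so that the idea can be run in isolation. FakeConfigTable and DeltaSessionCounter are invented names; against a real database, addTo() would correspond to an UPDATE that increments the stored value, assuming the column can be treated as a number.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the config table: param -> numeric value.
final class FakeConfigTable {
    private final Map<String, Long> rows = new HashMap<>();

    void put(String param, long value) {
        rows.put(param, value);
    }

    long get(String param) {
        return rows.getOrDefault(param, 0L);
    }

    // Plays the role of an UPDATE that adds delta to the stored value.
    void addTo(String param, long delta) {
        rows.merge(param, delta, Long::sum);
    }
}

final class DeltaSessionCounter {
    private static final int BATCH = 100;
    private final FakeConfigTable table;
    private int pending = 0;   // sessions seen since the last flush

    DeltaSessionCounter(FakeConfigTable table) {
        this.table = table;
    }

    synchronized void addSession() {
        pending++;
        if (pending < BATCH)
            return;
        table.addTo("session_count", pending);   // add, never overwrite
        pending = 0;
    }
}
```

Because only the delta is written, a failed or skipped load of the initial value can no longer wipe out the count already stored at the DB.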

So You Want to Practice your Code Reviewing Skills?

The code below is taken from a real web-app that has been up and running for > 7 years. This specific fragment implements the visitor count functionality: keeping track of the number of visitors hitting the site. Each time a new session is created SiteInfo.instance().addSession() is called.

Your task (should you choose to accept it...) is to find bugs in this code. In other words: will the visitor count, (as reported by SiteInfo.visitorCount()) always be correct? If not, what values can be seen there? How can we fix the code?

Please ignore design issues (I don't like the singleton any more than you do), or technological issues, such as: "well you should just rewrite the whole thing with Spring + Hibernate". Just examine the code as-is and try to determine if/where it can fail.



public class SiteInfo {
   
   private static SiteInfo inst;
      
   private SiteInfo() {
      //
   };
   
   // This is a singleton (Yak!)
   public static SiteInfo instance() {
      if(inst == null) 
         inst = new SiteInfo();
      return inst;
   }

   private int sessionCounter = 0; 
   private static final String INIT = "100";
      
   public int visitorCount() {
      return sessionCounter;
   }
      
   public synchronized void reloadParamsFromDB() {
      Connection con = null;
      ConnectionPool pool = null;
      try {

         pool = ConnectionPoolFactory.getInstance();
         con = pool.getConnection();

         String countStr = DataBaseHelper.get(con, "config", "value",
            "param", "session_count"); 
         if (countStr == null) {
            String q = "INSERT INTO config VALUES ('session_count','" + INIT + "')";
            try {
               runCommand(q, con);
               countStr = INIT;
            }
            catch (Exception e) {
               log.error("", e);
            }
         }

         sessionCounter = Integer.parseInt(countStr);
      }
      catch (Exception e) {
         log.error("", e);
      }
      finally {
         ConnectionPoolFactory.release(pool, con);
      }
   }
   
   private void runCommand(String q, Connection con) throws Exception {
      Statement stmt = con.createStatement();
      try {
         stmt.executeUpdate(q);         
      }
      finally {
         stmt.close();
     }
   }
   
   public synchronized void addSession() {
      sessionCounter++;
      if(sessionCounter % 100 != 0)
         return;

      Connection con = ConnectionPoolFactory.getInstance().getConnection();
      Statement stmt = null;
      try {
         stmt = con.createStatement();
         String sql = "UPDATE config SET value='" + sessionCounter 
            + "' WHERE param='SESSION_COUNT'";
         stmt.execute(sql);
      }
      catch (Exception ex) {
         log.error("", ex);
      }
      finally {
         try {
            if(stmt != null)
               stmt.close();
            ConnectionPoolFactory.release(con);
         }
         catch(SQLException e) {
            log.error("", e);
         }
      }
   }   
}

public class DataBaseHelper {
   public static String get(Connection con, String table, 
      String columnA, String columnB, String columnAValue) {
      String result = "";
      String sqlQuery = "SELECT " + columnB + " from " + table
            + " WHERE " + columnA + " = '" + columnAValue + "'";

      // Run the query. Translate the result set into a list of maps.
      // Each map corresponds to a single row in the ResultSet
      List<Map<Object,Object>> rows = generateResult(sqlQuery, con);
      try {
         Iterator<Map<Object,Object>> iter = rows.iterator();
         if(iter.hasNext()) {
           Map<Object,Object> m = iter.next();
         
           result = (String) (m.get(columnB));
           if (result == null)
              return null;
         }
      } 
      catch (Exception e) {
         return null;
      }
      return result;
   }  

   public static List<Map<Object,Object>> generateResult(String query, Connection con) {
      List<Map<Object,Object>> result = new ArrayList<Map<Object,Object>>();
      try {
         ResultSet resultSet = DataBase.performSqlQuery(query, con);
         if(resultSet == null)
            throw new Exception("Impossible"); 
            
         ResultSetMetaData resultSetMetaData = resultSet.getMetaData();         
         int columnCount = resultSetMetaData.getColumnCount();
         
         String[] columnNames = new String[columnCount];
         for(int i = 0; i < columnCount; i++)
            columnNames[i] = resultSetMetaData.getColumnName(i + 1);
            
         while(resultSet.next()) {
            Map<Object,Object> map = new HashMap<Object,Object>();
            for(int i = 0; i < columnCount; i++) {
               String col = columnNames[i];
               map.put(col, resultSet.getString(i + 1));
            }

            result.add(map);
         }
      }
      catch(Exception e) {
         sLogger.error("", e);
      }

      return result;
   }
}

JUnit Rules!

Rules are a simple, yet amazingly powerful, mechanism introduced in JUnit version 4.7. They allow developers to easily customize JUnit's behavior by exposing meta information regarding the currently executing test. This post provides a straightforward example for writing a custom rule that augments JUnit with some useful functionality.

My subject class is IntSet: A set of integers implementing the standard operations of add(), remove(), contains(), clear() in O(1) time. To make this performance guarantee the set needs to know (in advance) the range of the values (min..max) and its size limit (number of elements that it will accommodate).

All in all, IntSet looks something like this:


public class IntSet {
  ... // Some private fields
  public IntSet(int limit, int min, int max) { ... }
  public int size() { ... }
  public boolean contains(int n) { ... }
  public void add(int n) { ... }
  public void remove(int n) { ... }
}
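For readers wondering how such O(1) guarantees can be met, one classic option is a "sparse set" layout: a dense array holding the elements (its length is the size limit) plus a sparse array indexed by value (its length is the range). The sketch below is an assumption made for illustration, not necessarily how IntSet is implemented; the class name SparseIntSet is invented. The error message is the one quoted later in this post.

```java
// A sketch of one way to get O(1) operations; not the post's actual IntSet.
final class SparseIntSet {
    private final int min;
    private final int[] dense;    // the elements, packed in slots 0..size-1
    private final int[] sparse;   // sparse[n - min] = slot claimed by n in dense
    private int size = 0;

    SparseIntSet(int limit, int min, int max) {
        this.min = min;
        this.dense = new int[limit];
        this.sparse = new int[max - min + 1];
    }

    int size() { return size; }

    boolean contains(int n) {
        if (n < min || n - min >= sparse.length)
            return false;
        int slot = sparse[n - min];
        // The claim is valid only if the slot is live and really holds n
        return slot < size && dense[slot] == n;
    }

    void add(int n) {
        if (n < min || n - min >= sparse.length)
            throw new IllegalArgumentException("Out of range: " + n);
        if (contains(n))
            return;
        if (size == dense.length)
            throw new IllegalStateException("Cannot insert '" + n + "' - The set is full");
        dense[size] = n;
        sparse[n - min] = size;
        size++;
    }

    void remove(int n) {
        if (!contains(n))
            return;
        int slot = sparse[n - min];
        int last = dense[size - 1];
        dense[slot] = last;           // move the last element into the freed slot
        sparse[last - min] = slot;
        size--;
    }

    void clear() {
        size = 0;                     // stale entries fail the check in contains()
    }
}
```

Note that clear() does not touch the arrays at all, which is what keeps it constant-time.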

One of my unit tests specifies the behavior of IntSet when its size limit is reached. If I'm only interested in the type of the exception I can specify it via the expected attribute of the @Test annotation:


  @Test(expected=IllegalStateException.class)
  public void shouldNotExceedCapacity() {
    IntSet s = new IntSet(2, -10, 100);  // Set size limit to 2
    s.add(30);
    s.add(40);
    s.add(50);   // Insertion of the 3rd element should fail
  }

There are two drawbacks with this test. First, it only asserts the type of the exception. It does not check the error message specified for the exception. Second, it does not assert that the exception was triggered by the last add() call. In other words, if we have a bug and the 2nd add() call is failing with the same type of exception, the test will still pass.

To overcome this limitation we want to check the error message of the thrown exception. Specifically, we want to verify that the execution of the method fires an exception whose error message is "Cannot insert '50' - The set is full". Clearly, the chances of such an exception being thrown by the 2nd call are pretty slim.

Extending JUnit in such a manner is pretty easy thanks to the rules mechanism:


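import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.MethodRule;
import org.junit.runners.model.FrameworkMethod;
import org.junit.runners.model.Statement;

public class IntSet_Tests {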

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Throwing {
    public String value();
  }
  
  @Rule
  public MethodRule mr = new MethodRule()
  {
    @Override
    public Statement apply(final Statement base, FrameworkMethod m, Object o) {
      Throwing t = m.getAnnotation(Throwing.class);
      if(t == null)
        return base;
      
      final String message = t.value();
      return new Statement() {
        
        @Override
        public void evaluate() throws Throwable {
          try {
            base.evaluate();
            fail("No exception was thrown");
          }
          catch(AssertionError e) {
            throw e;
          }
          catch(Exception e) {
            assertEquals("Incorrect exception message", message, e.getMessage());
          }
        }
      };
    }    
  };

  // All sort of @Test methods ...


  // And now, a method that asserts the error message
  @Throwing("Cannot insert '50' - The set is full")
  @Test
  public void shouldNotExceedCapacity() {
    IntSet s = new IntSet(2, -10, 100);
    s.add(30);
    s.add(40);
    s.add(50);
  }
}

First we define a new annotation, @Throwing. Then we define a field annotated with @Rule to provide the custom handling of this annotation. Finally, we annotate the shouldNotExceedCapacity() method with a @Throwing("Cannot insert '50' - The set is full") annotation.

The mechanism works as follows: before each test method is run, JUnit creates a Statement object, which is merely a command object through which the actual method can be invoked. JUnit passes this object, along with a FrameworkMethod object (a wrapper of Java's Method) and the unit test instance, to all @Rule fields defined in the test class.

A @Rule field must be public and must implement the MethodRule interface (of course, you can instead extend one of several classes conveniently defined by JUnit). In the apply() method, above, we create a new Statement object that wraps the original one. The new evaluate() method will check that if an exception is thrown its message matches the text specified by the @Throwing annotation attached to the method.
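The wrapping idea itself does not depend on JUnit. Here is a minimal, framework-free sketch of it, in which Step is a hypothetical stand-in for JUnit's Statement and MessageCheck is an invented helper:

```java
// Hypothetical stand-in for JUnit's Statement: a command that runs the test body.
interface Step {
    void evaluate() throws Throwable;
}

final class MessageCheck {
    // Wraps base so that the result passes only if base throws
    // an Exception carrying exactly the given message.
    static Step expecting(final String message, final Step base) {
        return new Step() {
            @Override
            public void evaluate() throws Throwable {
                try {
                    base.evaluate();
                }
                catch (Exception e) {
                    if (!message.equals(e.getMessage()))
                        throw new AssertionError(
                            "Incorrect exception message: " + e.getMessage());
                    return;   // the expected failure was observed
                }
                throw new AssertionError("No exception was thrown");
            }
        };
    }
}
```

An AssertionError raised by the wrapped body is not an Exception, so it passes straight through the catch clause, just as in the rule above.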

Obviously, there are other ways to do that. For instance, one can use the ExpectedException class (a predefined JUnit rule) to achieve a similar effect. The purpose of this post is to surface the (mighty) powers of JUnit meta programming.

Axiom: Instability

I have already blogged about the Axioms of programming: the fundamental rules that govern the development of every piece of (substantial) software. In this post I want to focus on the Instability Axiom:

The external behavior of a component will need to change over time

Here's a real story. I was once involved with a very small project, let's call it the PQR project: three people working part time for three weeks, putting together an HTTP server, a web client and a GUI client, all of them very simple. During these three weeks we were also learning some new technologies, so actual coding time (of all developers combined) was about 20-25 days.

During these three weeks two important changes were applied to PQR's specification:

The technology with which the GUI client was implemented had to be changed. Instead of implementing it over Tech.A we had to switch to Tech.B.
The initial specs defined the data that should be persisted by the server. As we were playing with the intermediate versions of the project, we came to realize that a decent user experience required additional information to be persisted.

The main point of this post is not if/how we managed to support these changes. The point is that even in small projects specs are not stable. We were not successful in defining the project's goals for a three week period in a project which is as simple as industrial projects get. Of course, if the project were more complicated (more people, wider scope) then the instability would likely be even higher.

This example indicates that a "fire and forget" type of development (AKA: "divide and conquer" ) where one breaks down the desired functionality into a few large pieces, assigns each piece to a programmer, and then lets each programmer work on his task in isolation of his peers (until an "integration" milestone is approaching) is broken.

First, external forces will change the specs, thus affecting the assigned tasks. In the PQR project, the change in client technology was due to some external factors (business/marketing constraints). Even though the initial specs were examined and audited by several layers of approvals, no one had predicted this change.

Second, feedback from working early versions of the product (even with partial functionality) will change our understanding of the product and its desired capabilities. In PQR, the change regarding which-information-should-be-persisted was driven by experimentation with early versions.

If we had taken a Fire-and-Forget approach then our ability to respond to the first change would have been very limited, as every team member would have been in the middle of his large task. Also, by the time a first working version became available, very little time would have been left to implement significant changes.

Bottom line: Software is unstable. Breaking the effort into tiny tasks with frequent integration points (I am speaking about granularity of hours not weeks) is an excellent way to cope with this inherent instability.

Programmers are Decision Makers

So yesterday I bumped into this bug: a listener of a tree object (not Swing's JTree. A tree with nodes and labeled edges, a-la Data Structures 101) was referencing a null object:


public class Auditor implements TreeListener {

  private Map<String,Date> map = new HashMap<String,Date>();

  public void nodeAdded(Node n) {
    map.put(n.id(), new Date());
  }

  public void nodeRemoved(Node n) {
    Date d = map.get(n.id());
    String s = d.toString();  // <-- NPE is here!!
    map.remove(n.id());

    // some other things ...
  }
}

My unit tests told me that the Auditor and Tree classes are working fine: the tree fires the correct events and the Auditor reacts correctly. Of course, a green bar does not guarantee "no bugs" so I double checked the code and it seemed fine.

Indeed the problem was elsewhere. Turns out that a bug in my app caused each listener to be registered twice. Auditor was not expecting to be called twice for each event, hence the NPE.

So I fixed the bug. Then I went back to the addListener() method of the Tree class. It is a very simple method:

public class Tree {

  private List<TreeListener> listeners
    = new ArrayList<TreeListener>();

  public void addListener(TreeListener arg) {
    listeners.add(arg);
  }

  // Additional methods ...
}

Looking at the code I tried to think if this method should be changed in order to prevent future occurrences of this bug. As I was thinking about it I realized that this simple method is an excellent illustration of one of software's most fundamental truths:

Programming is about making (design) decisions.

I am not the first one to say this (see Joakim Holm's "Programming is all Design" or Alistair Cockburn). Here's the (probably partial) list of decisions that a programmer has to make when writing a one-liner method that should simply add a listener to a list.

Should the listeners field be private? Maybe we want subclasses to manipulate/inspect it in some way, which calls for protected visibility.
What is the order of notification? Should we use a "first-registered first-notified" policy? or maybe "last-registered first-notified". Is order important at all? If not then maybe we would like to force the application not to depend on the order by random shuffling?
Can a listener be registered with more than one Tree object? If so, then the listener interface should probably specify the originating tree, not just the node
Can a listener be registered more than once with a single Tree object? Usually the answer is no, but there are scenarios where multiple registration does make sense.
Assuming each listener may be registered exactly once, how do we deal with multiple registrations?
- Throw an exception?
- A standard Java assertion? Could be disabled which may be good or bad depending on context.
- Throw an AssertionError? Has a more dramatic effect than a plain Exception. Also will be reported by JUnit as a failure rather than an error. Again, may be either good or bad.
- Silently ignore. Return from the method without adding the listener. This seems very natural but it has the drawback that it hides the problem. If I choose to silently ignore then I will get no notification about future problems. How about logging?
- Return a Boolean value indicating whether the listener was already registered. Puts the responsibility on the client code to check this value. Will not always happen.
Do I need to provide some thread protection?
- Define the method as synchronized?
- Define the listeners field as a concurrent collection (such as CopyOnWriteArrayList)?
How should this list of listeners interact with the Garbage Collector? Should a listener that is only accessible from this list be collectible (by storing it as a WeakReference)? A "No" answer puts additional burden on client code: must remove listeners from the Tree object when they are no longer needed. If listeners are dynamically added as the app is running this may become a big problem.

Note that choosing weak references is not always a good idea. Some listeners are only accessible from the listened-to object. Take for example a listener that inspects the tree and updates the title of the main window. It is created, pushed onto the listeners list and all other traces to it are lost. A weak reference will make such a listener disappear with the very first collection cycle.
Do I want to refactor the whole "listeners" concern into a dedicated strategy class? This will improve testability and decoupling.
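To make a few of these decisions concrete, here is a sketch of one possible combination: fail fast on duplicate registration, notify in registration order, and use a copy-on-write list for thread safety. The types are hypothetical and simplified (Listener and ListenerRegistry are invented names, and a node is reduced to its id); other answers to the questions above are just as legitimate.

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified listener type used only for this sketch.
interface Listener {
    void nodeAdded(String nodeId);
}

final class ListenerRegistry {
    // A copy-on-write list tolerates registration while a notification loop is running.
    private final CopyOnWriteArrayList<Listener> listeners =
        new CopyOnWriteArrayList<>();

    // Chosen policy: registering the same listener twice is reported, not hidden.
    void addListener(Listener l) {
        if (!listeners.addIfAbsent(l))
            throw new IllegalStateException("Listener already registered: " + l);
    }

    // Chosen policy: first-registered, first-notified.
    void fireNodeAdded(String nodeId) {
        for (Listener l : listeners)
            l.nodeAdded(nodeId);
    }
}
```

With this policy the double registration that caused the NPE in Auditor would have surfaced at the second addListener() call rather than deep inside an event handler.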

As I said, this is a partial list. The point is simple: a programming task requires many more decisions (and much more mental energy) than it initially seems. In other words: If you want to design everything upfront you'll end up programming upfront but without the assistance of tools such as an IDE, unit tests, refactoring, and the like. Good luck.

Katacast: String Calculator – Groovy

The original idea behind Code Katas was that of a small programming exercise whose solution often includes some interesting challenge. In katacasts.com Corey Haines further explains that:

Over time, the concept of Katas grew from a problem to solve to a solution to practice. Uncle Bob Martin (among others) began talking about the idea of practicing the solution to a kata until the steps and keystrokes became like second nature, and you could do them without thinking.

In TDD Katas the focus is on doing the TDD cycle, Red-Green-Refactor, right. Corey has recently published a screencast where yours truly solves the StringCalculator TDD Kata in Groovy. It illustrates some points about TDDing (like "Fake it till you make it" or "triangulation") in a very clear way.

So, if you feel like you want to better understand TDD you can watch the Kata on Corey's site. An explanation of the different moves is given in the accompanying text.

Another recommendation is to watch Gary Bernhardt's screencast where he refactors cyclomatic complexity code in Python.

Enjoy.

Swing - Tabs to Spaces

How do you make a Swing text component (JTextArea, JEditorPane, etc.) insert spaces whenever the user hits the "Tab" key?

The answer is below. It is based on setting a document filter on the component's document object. The (static) install() method can do all the setup for you.

package com.blogspot.javadots;

import javax.swing.text.AbstractDocument;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;
import javax.swing.text.JTextComponent;

public class TabToSpaceFilter extends DocumentFilter {
   
   private String spaces = "";
   private JTextComponent textComponent;
   
   public TabToSpaceFilter(int tabSize, JTextComponent tc) {
      textComponent = tc;
      for(int i = 0; i < tabSize; ++i)
         spaces += " ";         
   }
   
   @Override
   public void insertString(FilterBypass fb, int offset, String text,
      AttributeSet attr) throws BadLocationException {
      super.insertString(fb, offset, translate(offset, text), attr);
   }

   @Override
   public void replace(FilterBypass fb, int offset, int length,
      String text, AttributeSet attr) throws BadLocationException {
      super.replace(fb, offset, length, translate(offset, text), attr);
   }

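   private String translate(int offset, String s) {
      // replace() may be handed a null string (a pure deletion)
      if(s == null)
         return null;

      int tabSize = spaces.length();
      int col = columnOf(offset);

      StringBuilder sb = new StringBuilder();
      int top = s.length();
      for(int i = 0; i < top; ++i) {
         char c = s.charAt(i);
         if(c == '\t' && tabSize > 0) {
            // Pad up to the next tab stop and advance the column accordingly
            String pad = spaces.substring(col % tabSize);
            sb.append(pad);
            col += pad.length();
         }
         else {
            sb.append(c);
            // A line break inside the inserted text restarts the column count
            col = (c == '\n') ? 0 : col + 1;
         }
      }

      return sb.toString();
   }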

   private int columnOf(int i) {
      String s = textComponent.getText();
      if(i == 0)
         return 0;
      int prev = s.lastIndexOf("\n", i-1);
      if(prev < 0)
         return i;
      
      return (i-prev)-1;
   }
   
   public static void install(int tabSize, JTextComponent area) {
      AbstractDocument ad = (AbstractDocument) area.getDocument();
      ad.setDocumentFilter(new TabToSpaceFilter(tabSize, area));
   }
}
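The tab-stop arithmetic is easier to check away from Swing. The following standalone sketch shows the same idea: a tab at column col expands to tabSize - (col % tabSize) spaces. TabStops is a name invented for this example, and tabSize is assumed to be positive.

```java
// Standalone sketch of the tab-stop arithmetic; TabStops is an invented name.
final class TabStops {
    // Expands tabs in text, assuming the text starts at column startCol
    // and that tabSize is positive.
    static String expand(String text, int startCol, int tabSize) {
        StringBuilder sb = new StringBuilder();
        int col = startCol;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                int pad = tabSize - (col % tabSize);   // distance to the next tab stop
                for (int k = 0; k < pad; k++)
                    sb.append(' ');
                col += pad;
            }
            else {
                sb.append(c);
                col = (c == '\n') ? 0 : col + 1;       // a newline restarts the column count
            }
        }
        return sb.toString();
    }
}
```

For instance, with a tab size of 4, expand("a\tb", 0, 4) pads the tab with three spaces so that b lands on column 4, while a tab typed at column 2 becomes two spaces.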

How JUnit's assertArrayEquals() should be implemented