Java: Developing On The Streets: 2006

Inline temps? I think not

Are these two snippets equivalent?

Snippet A:

  this.binding.type 
    = this.initialization.resolvedType;
  this.type.resolvedType = this.binding.type;     
  this.initialization.setExpectedType(
    this.initialization.resolvedType);
  this.initialization = new CastExpression(this.binding.type);

Sinppet B:

  TypeBinding temp 
    = this.initialization.resolvedType;
  this.binding.type = temp;
  this.type.resolvedType = temp;     
  this.initialization.setExpectedType(temp);
  this.initialization = new CastExpression(temp);

Which of the two snippets is easier to read?

The reason that the second piece of code is more straight forward is due to the local variable "temp" which gives a symbolic name to some value. Such a symbolic name makes it easy to understand the code and follow the flow of data therein.

Too many people advocate the eliminatation of temporary variables. Even the (otherwise excellent) refactroing catalog shows you how to "Inline Temps" and "Replace Temps with Queries". True, the motivation there is to transform the code such that further refactoring steps are made possible, but still, it leaves the wrong impression.

My two cents: Introduce temps! Replace queries with temps!

Safe and easy elimination of dead code

1 comments

One of my projects is an extension to an open source Java project. A new version of the open source project has been released so I sat down to upgrade my baseline code to this new version.
This transition created some compilation problems, which were quite easy to fix (quite easy ~ 2 hours). Naturally, I used this opportunity to do some refactoring.

In particular, I introduced a global factory, and changed the code such that instantiations of major classes are now delegated to a factory method in this factory object, instead of having an explicit, hard-wired, new T() expressions.

Having completed that I noticed I still have a compilation error in one of my classes. It was a small factory class, which I had used for similar purposes as the global one, but to a smaller extent.

Anyway, I thought that this class is not needed anymore since its functionality is already provided by the global factory. If this is indeed that case, then I don't need to bother about fixing this class, I just need to delete it.

The problem: How can I determine whether my program uses a certain class?
The solution: Just delete it. If no new compiler errors appear, then this class is a dead class, and can stay in the trash forever. If not, you can always restore it from the CVS.

So I deleted the class. A new compiler error appeared at another class, but (again) this class seemed to be redundant in the new design. So I deleted this other class, and that's it: All my compiler errors disappeared, indicating that these classes are really, absolutely, definitely dead ("These classes wouldn't "voom" if you put four million volts through it!").

The moral: The easiest way to know if you can delete a class is to delete it. Let the compiler tell you if it really needs it.

The insight: Can you really do it in a dynamic language (SmallTalk, Ruby and the rest of the gang)? Would you 100% trust some tool saying that your Smalltalk program does not use a certain class?

Bad idiom: Returning a parameter

2 comments

Some Background

It is common to design a method whose return type is the same as the enclosing class. Logger.log is such a method:

class Logger {
   int n = 0;
   Logger log(String s) { 
      System.out.println("[" + (n++) + "] " + s);  
      return this; 
   }
   void flush() { System.out.flush(); }
}

Basically, there are two reasons for doing that. First, this design makes it possible to send several messages to an object in a single compact expression, as in:

logger.log("abc").log("def").flush();

The second reason is related to immutable classes. In such classes a "state changing" method is actually a method that returns a new object. This style of programming (which I personally quite fond of) is ripped off the world of funtional-programming. Here is how it comes about in an immutable Java List:

class List<T> {
   public final T t;
   public final List<T> ts;
   public List(T x, List<T> xs) { t = x; ts = xs; }
   public List<T> prepend(T x) { 
      return new List<T>(x, this); 
   }
}

The actual sin (i.e.: Returning a paramete)

So we now know what are chain methods. The bad idiom I want to talk about is that of returning a parameter:

class SomeClass {
   public StringBuilder dump(StringBuilder sb) {
      sb.append(this.toString());
      return sb;
   }
}

Why would a programmer design such a method?
Well, (again) it makes it possible to write a compact expression with several method calls:

void someClient(SomeClass o1) { 
   // Assumes that dump(x) returns x
   StringBuilder builder = new StringBuilder("o1=");
   o1.dump(builder);
   builder.append("\n");
   System.out.println(builder.toString());
}

So what's the problem?
It is completely unclear whether such a method (a-la SomeClass.dump()) should return its parameter or a new copy of it.

someClient() assumes that dump() returns the object that is passed in as the sb parameter, so it ignores since it already has this object pointed by the builder variable. Consequently someClient() would break if SomeClass.dump() were to return a new object.

In the other case, if the client expects the method to return a new object, the caller may mutate the object (passed as parameter) without knowing it is effectively mutating the result:

void someClient(SomeClass o1) { 
   // Assumes that a new object is returned from dump(x)
   StringBuilder builder = new StringBuilder("o1=");
   StringBuilder result = o1.dump(builder);
   builder.clear(); builder.append("....");
   System.out.println(builder.toString() + " " + result.toString());
}

Just imagine a class with such a method, which is also overridden in a sublcass. The superclass' method takes the first approach where the overriding method takes the second approach. A nightmare, isn't it?

Aftermath

Why is a method retuning a parameter more risky than a chain method (as described in the background)?
A misunderstanding between a client and a provider can occur also with chain methods. But, there is one difference that make chain methods somewhat safer: the protocol of a class usually indicates whether it is an immutable class (in which case the return values should not be ignored) or not.

In the case of a method returning a parameter the indication is buried inside the body of the method (which is usually not available to the client), so we are much more likely to make the wrong guess.

One may argue that this problem can be easily solved using proper documentation. As a principle, design should not rely on comments, for two reasons: (1) comments are written in plain English so they tend to unclear and imprecise. (2) When the code changes there is nothing that forces the developer to keeps the comment in sync with the changes.

Finally, It seems that (imperative) programming languages need to support a "non-ignorable-return-type" annotation. The compiler would issue an error if the return value of method carrying such an annotation is not assigned to a variable. This would turn the problem described in this post into a non-problem.

Bad idiom: assign and return

0 comments

Here is the first example for an ill-designed method. I encountered this idiom as I was solving a bug (gross estimate: 60 minutes), in a project based on the Eclipse Java compiler (JDT).

The problem is illustrated by the computeTotal(Pricer p) method below (in Eclipse JDT it occurs in the Expression.resolveType(BlockScope s) method). Can you spot the bug?

class Item { ... }
class Pricer { public double priceOf(Item i) { ... } }

abstract class Cart {

   Cart(List<Item> is) { items = is; }
   double computeTotal(Pricer p) {
      double result = 0.0;
      for(Item i : items)
         result += p.priceOf(i);
      return result;
   }

   protected final List<Item> items;
   public double total;
}

class DiscountCart extends Cart {

   DiscountCart(List<Item> is) { super(is); }
   void add(Item i) { is.add(i); }

   public double computeTotal(Pricer p) {
      return super.computeTotal(p)*0.85;
   }
}

computeTotal() must return a value. so you typically write code the computes total price and returns it. There is nothing that reminds you to store this result at the total field (inherited from the superclass).
Even in this small snippet it is likely to miss this subtle issue, nonetheless if Cart has many more methods, and computeTotal() is just one of serveral methods which you override/implement.

The underlying problem here is that of overlapping conerncs: We expect a single method to act both as a getter and as a mutator. The developer of subclasses is likely to overlook one of these concerns.

It seems that the best solution is to split the responsibilites: Change the return type of computeTotal() to void, thereby indicating that it is a mutator (of the total field) rather than a getter. This will make the method more coherent, thus clarifying its contract.

Note that the problem is not due to Cart.total being a public field. Even if we had a getter method that encapsulates Cart.total, such a method could not have calculated the total value without a Pricer object. So you'd have to add a Pricer parameter to your getter method, which then becomes just a renamed version of computeTotal().

Javac Internals

4 comments

In my research project I develop the Whiteoak compiler. Whiteoak is a language that extends Java with features like structural conformance and type classes.

Here is some information to help you better undersrand the design of the javac compiler (version 1.5). This should be helpful for anyone trying to hack javac to support additional functionality.

Parsing (package: com.sun.tools.javac.parser)

The lexical scanner is writen by hand. Javac does not use an autmatically generated scnner (for performance reasons). If you need to add a keyword/operator/etc. open the Scanner class.
For the same reasons, the Parse class is also a manually written recursive descent parser.
The superclass of all AST classes is com.sun.tools.javac.tree.Tree. Subclasses are defined as inner classes of Tree (you CAN define your own nodes outside of the Tree class).
AST nodes can be instantiated simply by calling the constructor. You don't have to use the factory class com.sun.tools.javac.tree.TreeMaker. Nonetheless, TreeMaker is convenient because it sets the pos field of the generated nodes, thereby making sure error messages are assoicated with the correct source line.

The basic architecture

The main class: com.sun.tools.javac.main.Main. The main method there is compile()
A single context object is created for each inovcation of Main.compile(). This context object represents the current compilation session.
An AST class provides just the basic information about a node. The compilation process is carried out through several "modules" that are implemented externanlly to the AST classes, E.g.: type checking, bytecode generation, inference of type paramters, ...
Some of the modules are implemented as visitors. Other modules are used by visitor modules but are not visitors themselves.
The modules are singletons with respect to a compilation session and are cached by the Context object. Thus, if you need to access (for example) the symtab module use Symtab.instance(context) to get this instance.

Major modules

Here are the modules that do most of the compilation work.

com.sun.tools.javac.comp.Lower - Rewrites AST nodes (e.g.: generates calls to Integer.valueOf()whenever a boxing of an int is in order);
com.sun.tools.javac.comp.Attr - Augments the AST with type/symbol infromation
com.sun.tools.javac.comp.Enter - Defines symbols for encountered definitions
com.sun.tools.javac.jvm.Gen - Generates bytecode
com.sun.tools.javac.code.Types - Type system (subclassing, unboxed type, ...)
com.sun.tools.javac.code.Symtab - Predefined symbols (primitive types, Object type, Class type)
com.sun.tools.javac.code.Check -Type checking helper class
com.sun.tools.javac.util.Name.Table - String table

Utility Classes (package: com.sun.tools.javac.util)

List - A generic immutable (functional) list. Note that many methods return a new list so you cannot ignore the return type. For example, prepending the integer 5 to the list ms: List ns = ms.prepend(5);
ListBuffer - A builder for a List
Name - Similar to Java's String, but uses a string table to ensure that two identical strings will be represented by the same Name object. Use one of the fromXXX() static methods to obtain an object. Use Name.Table.instance(Context) to obtain the Table module.

Symbols vs. Nodes

Some of the AST nodes have a symbol field (in particular: Tree.Ident as well as the various nodes that represent declarations). The Symbols hierarchy is somewhat similar to the AST hierarchy so it is important to understand the differences to avoid confusion.

The first step in the compilation process is lexical scanning+parsing which outputs a TREE structure (the AST). A tree has no cycles so if your program has three references to a variable named 'x' the resulting AST will have three identifier nodes holding the token 'x'.

In some point during compilation the compiler realizes that these references all refer to the same entity: the variable x. Obviously, it is much easier to work with a single node than three (for example, you don't want to record x's type in three different locations).
To this end, the compiler generates a GRAPH structure where each definition in the program is represented exactly once. In our simple example, the compiler creates a VarSymbol object to represent the x variable. Then, the symbol field in the three AST nodes (the three references to 'x') is set to point at this VarSymbol object.
Later on, when x's type is resolved, the type field in the VarSymbol vertex will be assigned with a ClassSymbol object, which in turn points to symbols representing its superclass, superinterfaces, fields, methods, etc.

To conclude, the symbol field in an AST node associates the TREE node with the corresponding vertex from the GRAPH of symbols.

Subclassing a module

Let's assume you want to subcalsss the Symtab modue.

First, define a new subclass of the Symtab class: NewSymtab.
Add a constructor that takes a Context object as a parameter.

Add a preRegister() method to the NewSymtab class:

public static void preRegister(final Context context)
   {
      context.put(symtabKey, new Context.Factory()
      {
         public Symtab make()
         {
            return new NewSymtab(context);
         }
      });
   }

Add a call to NewSymtab.preRegister(context) immediately after the new Context object is created in Main (Subclassing Main to do that is a good idea).
Note that NewSymtab will inherit a static instance(Context) method from its Symtab. The caching framework ensures that this method will return the correct object (i.e., a NewSymtab object) but the return type is the superclass type: Symtab. Therefore, you should redefine the static instance() method as follows:
public static NewSymtab instance(Context context)
{
return (NewSymtab) Symtab.instance(context);
}
If your subclass needs to access other modules, add these modules as fields, and initialize them in the constructor by calling the corresponding instance(Context) method. In some cases you can inherit such fields from the super-class but usually the visibility level is private so the subclass cannot access these.

Developing on the streets

0 comments

A strange thing happens with programming. On the one hand, modern programming languages are so much better than what we programmers used to have twenty-, ten- and even five- years ago. On the other hand, developing a serious program is still a difficult and time consuming task. It seems that Object Oritented Programming does not live up to its promises: Many times it is a long time to develop even the initial prototype, and then it is difficult to change the program, or even just to understand your own source code.

Is oop really the answer? Is aop any better?
Is static type safety really necessary? Aren't we better off with dynamic languages?
Is is true that Haskell programs are simpler than Java programs?
Is there a plain and simple answer to all these nagging little problems of Programming, Debugging and Everything?!

Probably, the answer to these questions is 42.

Therefore, the (only?) plausible goal for this blog is to make these questions a bit more concrete, hoping that someday we will understand what the answer means.

Therefore, in this blog I will collect specific examples of programming difficulties on the one hand, and nice design solutions on the other hand. From time to time some insights (regaeding software design or programming langues design) will stem from these examples.

And, yes, I will also write about my day to day programming task.

Java: Developing On The Streets

Inline temps? I think not

Safe and easy elimination of dead code

Bad idiom: Returning a parameter

Bad idiom: assign and return

Javac Internals

Developing on the streets

About Me

Blog Archive

My Blog List