Bad idiom: assign and return

Here is the first example of an ill-designed method. I encountered this idiom while fixing a bug (rough estimate: 60 minutes) in a project based on the Eclipse Java compiler (JDT).

The problem is illustrated by the computeTotal(Pricer p) method below (in Eclipse JDT it occurs in the Expression.resolveType(BlockScope s) method). Can you spot the bug?

class Item { ... }
class Pricer { public double priceOf(Item i) { ... } }

abstract class Cart {

    Cart(List<Item> is) { items = is; }

    double computeTotal(Pricer p) {
        double result = 0.0;
        for (Item i : items)
            result += p.priceOf(i);
        return result;
    }

    protected final List<Item> items;
    public double total;
}

class DiscountCart extends Cart {

    DiscountCart(List<Item> is) { super(is); }
    void add(Item i) { items.add(i); }

    public double computeTotal(Pricer p) {
        return super.computeTotal(p) * 0.85;
    }
}

computeTotal() must return a value, so you typically write code that computes the total price and returns it. Nothing reminds you to also store this result in the total field (inherited from the superclass).
Even in this small snippet it is easy to miss this subtle issue, let alone when Cart has many more methods and computeTotal() is just one of several methods that you override or implement.

The underlying problem here is that of overlapping concerns: we expect a single method to act both as a getter and as a mutator. The developer of a subclass is likely to overlook one of these concerns.

It seems that the best solution is to split the responsibilities: change the return type of computeTotal() to void, thereby indicating that it is a mutator (of the total field) rather than a getter. This makes the method more coherent and clarifies its contract.
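
A minimal sketch of this split, reusing the class names from the example. The Item constructor, its price field, and the trivial Pricer body are assumptions added only so that the sketch compiles on its own; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins so the sketch is self-contained.
class Item {
    final double price;
    Item(double price) { this.price = price; }
}

class Pricer {
    public double priceOf(Item i) { return i.price; }
}

abstract class Cart {
    protected final List<Item> items;
    public double total;

    Cart(List<Item> is) { items = is; }

    // A pure mutator: its only job is to update the total field.
    void computeTotal(Pricer p) {
        double result = 0.0;
        for (Item i : items)
            result += p.priceOf(i);
        total = result;
    }
}

class DiscountCart extends Cart {
    DiscountCart(List<Item> is) { super(is); }

    void add(Item i) { items.add(i); }

    // With a void return type there is no value to hand back, so the
    // only way to make this override useful is to update total.
    @Override
    void computeTotal(Pricer p) {
        super.computeTotal(p);
        total *= 0.85;
    }
}
```

A caller now invokes cart.computeTotal(pricer) and then reads cart.total, so the two concerns no longer share one method.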

Note that the problem is not due to total being a public field. Even if we had a getter method that encapsulates the field, such a method could not calculate the total value without a Pricer object. So you'd have to add a Pricer parameter to your getter method, which then becomes just a renamed version of computeTotal().

Javac Internals

In my research project I develop the Whiteoak compiler. Whiteoak is a language that extends Java with features like structural conformance and type classes.

Here is some information to help you better understand the design of the javac compiler (version 1.5). This should be helpful for anyone trying to hack javac to support additional functionality.

Parsing (package:
  • The lexical scanner is written by hand. Javac does not use an automatically generated scanner (for performance reasons). If you need to add a keyword/operator/etc., open the Scanner class.
  • For the same reasons, the Parser class is also written by hand, as a recursive descent parser.
  • The superclass of all AST classes is Tree. Subclasses are defined as inner classes of Tree (but you CAN define your own nodes outside of the Tree class).
  • AST nodes can be instantiated simply by calling the constructor. You don't have to use the factory class TreeMaker. Nonetheless, TreeMaker is convenient because it sets the pos field of the generated nodes, thereby making sure error messages are associated with the correct source line.
The basic architecture
  • The main class: Main. The main method there is compile().
  • A single context object is created for each invocation of Main.compile(). This context object represents the current compilation session.
  • An AST class provides just the basic information about a node. The compilation process is carried out through several "modules" that are implemented externally to the AST classes, e.g., type checking, bytecode generation, inference of type parameters, ...
  • Some of the modules are implemented as visitors. Other modules are used by visitor modules but are not visitors themselves.
  • The modules are singletons with respect to a compilation session and are cached by the Context object. Thus, if you need to access (for example) the symtab module use Symtab.instance(context) to get this instance.
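
The per-session singleton idea can be sketched as follows. MiniContext and MiniSymtab are hypothetical, heavily simplified stand-ins written for this illustration; they are not javac's own classes, and the real implementation may differ in its details.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a compilation session; NOT javac's Context class.
class MiniContext {
    static class Key<T> {}

    private final Map<Key<?>, Object> table = new HashMap<>();

    <T> void put(Key<T> key, T value) { table.put(key, value); }

    @SuppressWarnings("unchecked")
    <T> T get(Key<T> key) { return (T) table.get(key); }
}

// Hypothetical module: at most one instance per MiniContext.
class MiniSymtab {
    private static final MiniContext.Key<MiniSymtab> KEY = new MiniContext.Key<>();

    static MiniSymtab instance(MiniContext context) {
        MiniSymtab result = context.get(KEY);
        if (result == null)
            result = new MiniSymtab(context);  // the constructor registers itself
        return result;
    }

    private MiniSymtab(MiniContext context) {
        context.put(KEY, this);
    }
}
```

In this sketch, two calls to MiniSymtab.instance() with the same context yield the same object, while two different contexts (two compilation sessions) get separate objects.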
Major modules

Here are the modules that do most of the compilation work.
  • - Rewrites AST nodes (e.g., generates calls to Integer.valueOf() whenever a boxing of an int is in order)
  • - Augments the AST with type/symbol information
  • - Defines symbols for encountered definitions
  • - Generates bytecode
  • - Type system (subclassing, unboxed type, ...)
  • - Predefined symbols (primitive types, Object type, Class type)
  • - Type checking helper class
  • - String table
Utility Classes (package:
  • List - A generic immutable (functional) list. Note that many methods return a new list, so you cannot ignore the return value. For example, prepending the integer 5 to the list ms: List ns = ms.prepend(5);
  • ListBuffer - A builder for a List
  • Name - Similar to Java's String, but uses a string table to ensure that two identical strings will be represented by the same Name object. Use one of the fromXXX() static methods to obtain an object. Use Name.Table.instance(Context) to obtain the Table module.
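
To see why the return value of a method such as prepend() cannot be ignored, here is a minimal sketch of an immutable list. ConsList is a hypothetical class written for this illustration; it is not javac's List, and its method set is an assumption.

```java
// Hypothetical minimal immutable list; NOT javac's List class.
final class ConsList<A> {
    final A head;            // null for the empty list
    final ConsList<A> tail;  // null for the empty list

    private ConsList(A head, ConsList<A> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <A> ConsList<A> nil() { return new ConsList<A>(null, null); }

    boolean isEmpty() { return tail == null; }

    // Returns a NEW list; the receiver is left unchanged.
    ConsList<A> prepend(A x) { return new ConsList<A>(x, this); }

    int length() {
        int n = 0;
        for (ConsList<A> l = this; !l.isEmpty(); l = l.tail) n++;
        return n;
    }
}
```

With this sketch, calling ms.prepend(5) and discarding the result leaves ms exactly as it was; only a variable assigned from the call, as in ns = ms.prepend(5), sees the new element.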
Symbols vs. Nodes

Some of the AST nodes have a symbol field (in particular: Tree.Ident as well as the various nodes that represent declarations). The Symbols hierarchy is somewhat similar to the AST hierarchy so it is important to understand the differences to avoid confusion.

The first step in the compilation process is lexical scanning and parsing, which outputs a TREE structure (the AST). A tree has no cycles, so if your program has three references to a variable named 'x', the resulting AST will have three identifier nodes holding the token 'x'.

At some point during compilation the compiler realizes that these references all refer to the same entity: the variable x. Obviously, it is much easier to work with a single node than with three (for example, you don't want to record x's type in three different locations).
To this end, the compiler generates a GRAPH structure where each definition in the program is represented exactly once. In our simple example, the compiler creates a VarSymbol object to represent the x variable. Then, the symbol field in the three AST nodes (the three references to 'x') is set to point at this VarSymbol object.
Later on, when x's type is resolved, the type field in the VarSymbol vertex will be assigned a ClassSymbol object, which in turn points to symbols representing its superclass, superinterfaces, fields, methods, etc.

To conclude, the symbol field in an AST node associates the TREE node with the corresponding vertex from the GRAPH of symbols.
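
The relationship between the tree and the graph can be sketched with a few lines of Java. IdentNode, VarSym, and SymbolDemo are hypothetical, heavily simplified stand-ins for illustration only; javac's real node and symbol classes carry far more information, and the type is modeled here as a plain string.

```java
// Hypothetical, simplified stand-ins; NOT javac's node and symbol classes.
class VarSym {
    final String name;
    String type;   // recorded once, visible from every reference
    VarSym(String name) { this.name = name; }
}

class IdentNode {
    final String token;  // produced by the parser
    VarSym symbol;       // filled in later, during symbol resolution
    IdentNode(String token) { this.token = token; }
}

class SymbolDemo {
    static IdentNode[] resolve() {
        // Parsing: three references to x yield three distinct tree nodes.
        IdentNode[] refs = { new IdentNode("x"), new IdentNode("x"), new IdentNode("x") };

        // Resolution: a single symbol represents the variable x.
        VarSym x = new VarSym("x");
        for (IdentNode ref : refs)
            ref.symbol = x;

        // Recording the type once makes it available through all three nodes.
        x.type = "int";
        return refs;
    }
}
```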

Subclassing a module

Let's assume you want to subclass the Symtab module.
  1. First, define a new subclass of the Symtab class: NewSymtab.
  2. Add a constructor that takes a Context object as a parameter.
  3. Add a preRegister() method to the NewSymtab class:
    public static void preRegister(final Context context) {
        context.put(symtabKey, new Context.Factory<Symtab>() {
            public Symtab make() {
                return new NewSymtab(context);
            }
        });
    }
  4. Add a call to NewSymtab.preRegister(context) immediately after the new Context object is created in Main (Subclassing Main to do that is a good idea).
  5. Note that NewSymtab will inherit a static instance(Context) method from Symtab. The caching framework ensures that this method will return the correct object (i.e., a NewSymtab object), but the return type is the superclass type: Symtab. Therefore, you should redefine the static instance() method as follows:
    public static NewSymtab instance(Context context) {
        return (NewSymtab) Symtab.instance(context);
    }
  6. If your subclass needs to access other modules, add these modules as fields, and initialize them in the constructor by calling the corresponding instance(Context) method. In some cases you can inherit such fields from the superclass, but usually their visibility level is private, so the subclass cannot access them.
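
The mechanism behind the steps above can be tried out in isolation with the following sketch. Ctx, BaseModule, and NewModule are hypothetical stand-ins (playing the roles of Context, Symtab, and NewSymtab); the method name putFactory and the way the factory is resolved inside get() are assumptions of this sketch, not javac's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the session object; NOT javac's Context class.
class Ctx {
    static class Key<T> {}
    interface Factory<T> { T make(); }

    private final Map<Key<?>, Object> table = new HashMap<>();

    <T> void put(Key<T> key, T value) { table.put(key, value); }
    <T> void putFactory(Key<T> key, Factory<T> factory) { table.put(key, factory); }

    @SuppressWarnings("unchecked")
    <T> T get(Key<T> key) {
        Object o = table.get(key);
        if (o instanceof Factory) {
            o = ((Factory<?>) o).make();  // first request: build the module
            table.put(key, o);            // cache it for later requests
        }
        return (T) o;
    }
}

class BaseModule {
    protected static final Ctx.Key<BaseModule> KEY = new Ctx.Key<>();

    static BaseModule instance(Ctx ctx) {
        BaseModule m = ctx.get(KEY);
        if (m == null) {
            m = new BaseModule(ctx);
            ctx.put(KEY, m);
        }
        return m;
    }

    protected BaseModule(Ctx ctx) {}

    String describe() { return "base"; }
}

class NewModule extends BaseModule {
    // Must be called before anyone asks for the module in this session.
    static void preRegister(final Ctx ctx) {
        ctx.putFactory(KEY, new Ctx.Factory<BaseModule>() {
            public BaseModule make() { return new NewModule(ctx); }
        });
    }

    // Hides BaseModule.instance() and narrows the return type.
    static NewModule instance(Ctx ctx) {
        return (NewModule) BaseModule.instance(ctx);
    }

    protected NewModule(Ctx ctx) { super(ctx); }

    @Override
    String describe() { return "new"; }
}
```

After NewModule.preRegister(ctx), both BaseModule.instance(ctx) and NewModule.instance(ctx) return the same NewModule object, which is why existing code that asks for the base module transparently picks up the subclass.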

Developing on the streets

A strange thing happens with programming. On the one hand, modern programming languages are so much better than what we programmers had twenty, ten, and even five years ago. On the other hand, developing a serious program is still a difficult and time-consuming task. It seems that Object Oriented Programming does not live up to its promises: it often takes a long time to develop even the initial prototype, and then it is difficult to change the program, or even just to understand your own source code.

Is OOP really the answer? Is AOP any better?
Is static type safety really necessary? Aren't we better off with dynamic languages?
Is it true that Haskell programs are simpler than Java programs?
Is there a plain and simple answer to all these nagging little problems of Programming, Debugging and Everything?!

Probably, the answer to these questions is 42.

Therefore, the (only?) plausible goal for this blog is to make these questions a bit more concrete, hoping that someday we will understand what the answer means.

To that end, in this blog I will collect specific examples of programming difficulties on the one hand, and nice design solutions on the other hand. From time to time some insights (regarding software design or programming language design) will stem from these examples.

And, yes, I will also write about my day-to-day programming tasks.