General musings on programming languages, and Java.

Wednesday, June 13, 2007

Improving Your Visitors, and a Request For More Autoboxing in Java 7

Prior to Java 5, there was one really common way of implementing the visitor pattern. I'll use my network simulator as an example, defining a visitor that can visit any kind of network component (a card, a cable, a hub or a computer):

public interface Visitor
{
    void visit(Card card);
    void visit(Hub hub);
    void visit(Cable cable);
    void visit(Computer computer);
}
and an example accept method:
public void accept(Visitor visitor)
{
    visitor.visit(this);
}
For those not initiated with how this works, here's a typical example:
public static void drawAComponent
       (NetworkComponent component,final Graphics graphics)
{
    component.accept(new Visitor()
    {
        public void visit(Card card)
        {
            graphics.drawRect(10,10,20,20);
        }

        public void visit(Computer computer)...
    });
}
It works quite well, and you can adapt it to make it work over a collection. In IPSim there is a tree of components (each component knows its children, and there is a collection of top-level components). So you can do:
network.visitAll(new Visitor(){etc.});
If we want to inspect the object (get a value back from it) rather than do something, then we end up with code like this:
public static Dimension getSize(NetworkComponent component)
{
    Dimension[] result={null};

    component.accept(new Visitor()
    {
        public void visit(Card card)
        {
            result[0]=new Dimension(50,60);
        }
        etc.
    });

    return result[0];
}
Clearly it would be useful if the visitor could return something instead of having to set some value. We could make the Visitor's methods return an int, but then we'd have to write it again if we later needed another type. There are two solutions to this: 1. Make the methods return Object. This obviously results in casting, which we all know is for magicians. Let's move on to Java 5. 2. Make the methods return T, where T is a type parameter on Visitor. Ok, easy:
public interface Visitor<T>
{
    T visit(Card card);
    T visit(Cable cable);
    T visit(Hub hub);
    T visit(Computer computer);
}
and the accept method:
public T accept(Visitor<T> visitor)
{
    return visitor.visit(this);
}
The only problem with that is that T is undeclared in the accept method. Don't make the mistake of declaring it on the class (class Card<T> implements NetworkComponent<t>), because then you'll only be able to use one type of Visitor with each instance you create. Declare it on the method:
public <T> T accept(Visitor<T> visitor)
{
    return visitor.visit(this);
}
Now you've got something more useful, the above clunky code becomes:
public static Dimension getSize(NetworkComponent component)
{
    return component.accept(new Visitor<Dimension>()
    {
        public Dimension visit(Card card)
        {
            return new Dimension(50,60);
        }
        etc.
    });
}
This becomes much more interesting when you apply it across a collection. Instead of writing code that iterates over all the components and does something, we can write code that generates a result. For example, suppose I want to know if any of my components are a crossover cable. I want to visit all the components, and return true if any of them are a crossover cable. I could store a list of booleans, to process later, but that would be wasteful and indirect. What I want to do is apply the || operator between each pair of results.
public static boolean areAnyCrossover(Network network)
{
    return network.visitAll(new Visitor<Boolean>()
    {
        public Boolean visit(Cable cable)
        {
            return cable.isCrossover();
        }

        public Boolean visit(Computer computer)
        {
            return false;
        }

        public Boolean visit(Card card)
        {
            return false;
        }

        public Boolean visit(Hub hub)
        {
            return false;
        }
    },new Combinator<Boolean,Boolean>()
    {
        public Boolean combine(Boolean one,Boolean two)
        {
            return one||two;
        }
    });
}
Whew. This is an implementation (or at least behind the scenes there is one) of the 'reduce' algorithm, plus a bit of mapping going on first. It's very flexible. There's quite a bit of boilerplate above though, and as the number of component types grows, the amount of boilerplate grows and grows. The usual solution is to make an abstract class with empty (return null;) methods for each visit method, which you subclass to customise. That works pretty well (remember the @Override annotation!), but I've been avoiding subclassing for some time now, so I immediately started to wonder whether there was another way to do this. I'm not a zealot, if I didn't find another way, I'd just subclass. I found another way..
public static boolean areAnyCrossover(Network network)
{
    return network.visitAll(new DefaultVisitor<Boolean>()
        .visitCards(new Function<Card,Boolean>()
        {
            public Boolean invoke(Card card)
            {
                return card.isCrossover();
            }
        })
        .visitAnythingElse(new Function<NetworkComponent,
            Boolean>()
        {
            public Boolean invoke(NetworkComponent any)
            {
                return false;
            }
        }),new Combinator<Boolean,Boolean>()
        {
            public Boolean combine
                (Boolean one,Boolean two)
            {
                return one||two;
            }
        }
    });
}
Ok, it doesn't look brilliant, but it does look a little like functional programming. Let's make these anonymous classes look more like functions:
public static boolean areAnyCrossover(Network network)
{
    return network.visitAll(new DefaultVisitor<Boolean>()
        .visitCards({Card card => card.isCrossover()})
        .visitAnythingElse(
            {NetworkComponent component => false}),
        {Boolean one,Boolean two => one||two});
}
One more improvement would be if it were possible to get at || as a function. Let's imagine that #|| works for this. Let's also imagine that closures allowed you to elide unused parameters, that type inference were supported, and that method references were supported (!):
return network.visitAll(defaultVisitor
    .visitCards(Card#isCrossover())
    .visitAnythingElse({ => false}),#||);
Interestingly, it still looks like Java, and it doesn't get bloated whenever you add a type. I don't suppose for an instant that Java 7 will go this far. For an example of the <T> visitor pattern in current Java, see the source code for javac. It's littered with them, and it's quite nice to work with, but at least for now, remember your @Override.. And back to normal Java, suppose you implement Visitor<T>, but then come across those cases where you don't want to return something.
void doSomething(NetworkComponent component)
{
    component.accept(new Visitor<Void>()
    {
        public Void visit(Card card)
        {
            whatever.
            return null;
        }
        etc.
    };
}
I'd like to be able to use my Visitor type with a <void> type parameter, but I cannot. Please, if you got this far, and are Neal Gafter or someone else relevant, consider allowing primitive generic type parameters. As far as I can see, this can all be done at the compiler level, making a Visitor<void> appear as a Visitor<Void> in bytecode. At the moment, I often implement two Visitor types for each hierarchy - one returning void, and one T. Perhaps that's overkill, but it reminds me that it's a workaround.

16 comments:

swankjesse said...

Nice post. It would be nice syntactic sugar if for all methods that return "Void", a 'return null' was inserted automatically as necessary.

Neal Gafter said...

The current closure spec allows you to elide the result expression if the closure result type is Void. It's the first sub-bullet in the section entitled "Closure Conversion".

Ricky Clarkson said...

Neal, does it allow the same for an ordinary method (i.e., not a closure)? What I'd like is:

public Void x() { }

or

'void' as a valid type parameter.

Unknown said...

Talking about improvements to Visitor, another technique that is used in functional languages is Pattern Matching. Scala implements a more powerful Visitor through case classes and pattern matching. And since Scala runs on the JVM, you can get the benefits of an improved Visitor through much less boilerplates.
Just another thought ..

hlovatt said...

Two comments:

1. C3S infers type Void for all methods not just inner classes/closures, i.e. it does what you want. Rule 13 in:


http://www.artima.com/weblogs/viewpost.jsp?thread=182412


2. The pattern enforcing compiler has multiple dispatch as a pattern. Multiple dispatch is easier to follow than visitor.


http://pec.dev.java.net/nonav/compile/index.html

David R. MacIver said...

I second the call for multiple dispatch. Visitor patterns are an amazingly ugly way of faking a worse version of multiple dispatch. :-) (indeed I generally don't even write them in favour of using instanceof checks for specialisation)

The only problem with introducing multiple dispatch to Java would be the potential for really really subtle bugs with legacy code - all previous code will still be valid, it just might do something very slightly different. Maybe a @Multiple annotation. (Gee, is it just my imagination or do all these "It would be great if we had an annotation which would let me do Foo" comments sound like "It would be great if we had an annotation driven macro system").

Properly working Void type parameters allowing void return types are clearly a good idea though, yes. Primitive type parameters would be nice too.

Ricky Clarkson said...

I've used pattern matching in Haskell before, but it seems more useful for things that are not expressed in types, than for dispatch based on type.

I like Common Lisp's dynamic dispatch support.

(defclass card () ())
(defclass computer () ())
(defmethod draw ((card card)) "drawing a card")
(defmethod draw ((computer computer)) "drawing a computer")

> (draw (make-instance 'computer))
"drawing a computer"

The same syntax works for multiple parameters. I like it because it eliminates the choice as to which class a method that operates on two types goes in.

Alex Buckley suggested that I file an RFE, here it is: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6569520

Anonymous said...

See javax.lang.model packages for visitors like this which allow <Void> as a return type.

Anonymous said...

I'd say, allowing void as Generic Parameter would be a bit tricky:

class Holder<T> {
´ ´ private T t;
´ ´ public T get() {
´ ´ ´ ´ return t;
´ ´ }
}
new Holder<void>();

which would introduce a field of type void and return from within a void method.
Optional return on Void might work better.

David R. MacIver said...

A field of type void is hardly a problem. There's already a Void parameter after all. There's just no way to instantiate it to anything non-null.

Anonymous said...

Well, void is no type (yet) and would be the first primitive type to be nullable (which is its only value, of course).
For returning one could make "return" be an alias for "return null", so this could work, too.
Not sure, if null is the same as void, though. And the result of a Void method is assignable while a void method isn't (yet).

Ricky Clarkson said...

After I submitted the RFE, Alex Buckley (Java 7 lead) sent an email - here's a portion:

"I don't want to touch generics, so need to keep Void as the type arg and then omit 'return null' on Void methods."

I think I agree.

hlovatt said...

In C3S the proposal is that if a method has type Void (any method not just inner classes and not just generic return types) then return or falling off the end is equivalent to return null. I think this is a nice solution, but like all extensions to Java you not only have to ask whether the solution works but also if it is worth it.

I included the idea in C3S because I thought it was worth it. But many other people feel that too many changes are proposed.

hlovatt said...

Here is a multiple dispatch example using the PEC compiler. The PEC compiler uses interfaces as markers, in this case the marker is MultipleDispatch. Also the PEC compiler introduces **no** new syntax, so you need to add dummy methods that act as place holders. (Both these restrictions might change in a future version of PEC.)

First define an interface with multiple dispatch methods:

interface Numeric extends MDEquatable, MDComparable, MultipleDispatch {
__Numeric plus( Numeric rhs );

__Numeric minus( Numeric rhs );

__Numeric times( Numeric rhs );

__Numeric divide( Numeric rhs );
}

MDEquatable give a multiple dispatch equals method and similarly MDComparable a multiple dispatch compare to. As mentioned above the interface MultipleDispatch marks the methods as using multiple dispatch. Then define an abstract class that contains the dummy methods needed to keep the code source conforming to standard Java and hence allow interoperation with existing code and existing tools.

public abstract class AbstractNumeric implements Numeric {
__// Dummy methods for Numeric - body replaced by compiler,
__// therefore body can be anything that will compile
__public final int compareTo( final Object notUsed ) { throw new AssertionError(); }

__// As above for equals, plus, minus, times, divide

__// Default methods for Numeric - the action to take if no closer match method is found
__// These are multiple dispatch methods and are therefore declared as static methods
__public static final boolean equals( final MDEquatable notUsed, final Object notUsed2 ) {
____return false;
__}

__public static final int compareTo( final Comparable lhs, final Object rhs ) {
____throw new ClassCastException( ( rhs == null )
______? "Argument is null"
______: "No multiple-dispatch method that can compare " + lhs + " to " + rhs );
__}

__// No default methods for plus, minus, times, and divide - they throw IllegalArgumentException
}

Then you can write different implementations and put the methods in **any** class. The methods are declared as static methods and all arguments are explicit, since you dispatch on all arguments and the methods can be in any class. E.G. a Double type

public class DoubleMD extends AbstractNumeric {
__private final double value;

__public DoubleMD( final double value ) { this.value = value; }

__// Multiple-dispatch methods for DoubleMDs – note they are static methods
__public final static int compareTo( final DoubleMD lhs, final DoubleMD rhs ) {
____return Double.compare( lhs.value, rhs.value );
__}

__public final static boolean equals( final DoubleMD lhs, final DoubleMD rhs ) {
____return Double.compare( lhs.value, rhs.value ) == 0;
__}

__public final static Numeric plus( final DoubleMD lhs, final DoubleMD rhs ) {
____return new DoubleMD( lhs.value + rhs.value );
__}

__// minus, times, and divide are similar to plus above

__ // Other methods, e.g. normal single-dispatch methods
}

The DoubleMD class isn't that interesting on its own, but now suppose we add a Complex class:

public class ComplexMD extends AbstractNumeric {
__private final double real;

__private final double imaginary;

__public ComplexMD( final double real, final double imaginary ) {
____this.real = real;
____this.imaginary = imaginary;
__}

__// Multiple-dispatch methods for ComplexMD and DoubleMD arguments in the **same** class
__public final static int compareTo( final DoubleMD lhs, final ComplexMD rhs ) {
____return compareTo( new ComplexMD( lhs ), rhs );
__}

__public final static int compareTo( final ComplexMD lhs, final DoubleMD rhs ) {
____return compareTo( lhs, new ComplexMD( rhs ) );
__}

__public final static int compareTo( final ComplexMD lhs, final ComplexMD rhs ) {
____return Double.compare( lhs.abs(), rhs.abs() );
__}

__// As above for equals, plus, minus, times, and divide

__// Other methods, e.g. normal single dispatch
}

Note:

1.That compareTo( DoubleMD, ... ) methods can be in ComplexMD, they don't have to be in DoubleMD.

2.How you can control the action of compareTo by having multiple methods with different argument types.

The code when run selects the best fit method on the basis of the actual type, not the declared type like normal overloading does, e.g.:

__Numeric d = new DoubleMD( 1 );
__Numeric c = new DoubleMD( 2, 3 );
__d.compareTo( c ); // calls compareTo( DoubleMD, ComplexMD )

With single dispatch the above call would call method compareTo( Numeric ) in DoubleMD because the receiver (hidden this) is of type DoubleMD and the argument, c, is declared as Numeric, the fact that c is a ComplexMD would be ignored. Hence in single dispatch the visitor pattern would be used, but the code for a visitor is obscure.

More info at:


https://pec.dev.java.net/nonav/compile/javadoc/pec/compile/multipledispatch/package-summary.html


If any one tries out the PEC compiler let me know how you go (howard dot lovatt at iee dot org).

Ricky Clarkson said...

Howard, that seems like a reasonable implementation. I wonder why you didn't use annotations instead of interface inheritance, and why the compareTo has Object instead of T. Is there a technical limitation, or is it just that it was written before Java 5?

I'm not sure that I've used a statically-typed system that supports dynamic dispatch other than using the visitor pattern, so I have nothing to compare your implementation to. It reminds me of how RIFE implements continuations though.

hlovatt said...

The current PEC that I have released is pre-generics and pre-annotations and I am working on an improved version that will use both. However erasure is a pain for multiple dispatch and annotations have some problems for this application also.

RE erasure: I am still working on this one, I haven't found my ideal solution yet.

RE annotations: For many design patterns you want a type, e.g. Immutable, so that you can declare methods that deal with immutables or lists of immutables. Unfortunately annotations aren't types, they aren't inherited from interfaces, and you can't extend an annotation. Therefore the new design uses a mixture of annotations and marker interfaces.

Blog Archive

About Me

A salsa dancing, DJing programmer from Manchester, England.