General musings on programming languages, and Java.

Wednesday, October 31, 2007

Java 7 Example - Pattern Matching

One way you can measure a programming language is by taking a feature, either one that it has, or not, and seeing how close you can get to implementing that feature without writing your own parser, etc. Let's take a look at (and copy!) one of Scala's (and many other languages') features; pattern matching:

animal match {
    case Dog() => "woof"
    case Cat() => "meow"
}
It's very similar to a switch statement, and the above really doesn't show how flexible pattern matching is, but it will suffice for now. The usual way of writing the above code in Java would be with some nested instanceof checks and casts, or by adding a speak() method to the Animal type hierarchy.

It's not always practical to add methods to types directly in Java.. e.g., if you're dealing with classes you don't control. And instanceof is fallible - you don't want to have: if (animal instanceof Car) pass compile-time checks, but it does. So let's see how close we can get to the above pattern matching using the Java 7 prototype. I hope you can do better than me!

Here's what I'd ideally want in Java:

Animal animal=Math.random()<0.33 ? new Cat() : Math.random()<0.5 ? new Dog() : new Horse();

System.out.println(match(animal,
                        {Cat c=>"meow"},
                        {Dog d=>d.name+" says woof"},
                        {Horse h=>"neigh"}));
What type would match take as parameters? For now, let's imagine that we're always dealing with Animals, and always returning Strings.

public String match(Animal animal,{? extends Animal=>String}... cases)

That's not legal - {? is an "illegal start of type" according to the compiler. Neal said that because the things on the left hand side of a closure automatically have ? super in their types, I can't also have ? extends. If I replace it with the interface that javac generates it compiles fine:

public String match(Animal animal,OO<? extends Animal,String,null>... cases)

(the null part there specifies the exception types that may be thrown - in a function type you can omit that)

We can't actually call that method without getting at least a warning from the compiler, thanks to varargs using arrays instead of generics. I'm sure that must have made sense to someone at the time. If I convert it to List<OO<? extends Animal,String,null>> cases, I still have a problem. Inside match, I don't know what type ? is, so I can't do anything with it. Next trick then:

class Case<T>
{
    public final Class<T> clazz;
    public final {T=>String} function;

    public Case(Class<T> clazz,{T=>String} function)
    {
        this.clazz=clazz;
        this.function=function;
    }
}
Now I could have List<Case<? extends Animal>> as the parameter type. Here's some sample usage then:
System.out.println(match(animal,
                         new ArrayList<Case<? extends Animal>>(){{
                             add(new Case<Cat>(Cat.class,{Cat c=>"meow"}));
                             add(new Case<Dog>(Dog.class,{Dog d=>"woof"}));
                             add(new Case<Horse>(Horse.class,{h=>"neigh"}));
                         }}));
Note that I'm taking advantage of a little trick for creating lists inline by creating an anonymous subclass of ArrayList and providing a static initialiser. Verbose? Yes. Ugly? Yes. But enough about me. That code will be gone in a moment.

Look at each line. I've got Cat three times in the same line. Ugh. In fact if I make match generic, so that it can return something other than String, then I'll have to add a type parameter to Case for that, and then those lines become even worse. There must be a better way.

I can use a fluent interface/embedded DSL, or, erm, in real words, "readable code" by making match(animal) return a matcher, and putting a method on it, add, that has a type parameter, U, and takes a Class<U> and a U=>String. In fact, let's go the whole generic hog and make it take a Class<U> and a U=>R, where R is the return type we want. When we've finished adding cases, we call done(), which returns R.

Usage:

System.out.println(match(animal)
        .add(Cat.class,{Cat c=>"meow"})
        .add(Dog.class,{Dog d=>"woof"})
        .add(Horse.class,{Horse h=>"neigh"})
        .done());
One step at a time then. match is a simple wrapper method around a type called Matcher, that lets us avoid writing new Matcher<Animal,String>:
public static <T,R> Matcher<T,R> match(T t)
{
    return new Matcher<T,R>(t);
}
Matcher is a class that stores the T passed into its constructor, and an R to return, which is initially null. It has a method, add, that has a type parameter, U, which extends T. Having U extends T guarantees I don't add an invalid case, such as one that tests whether an animal is a Car.

class Matcher<T,R>
{
    public final T t;
    public R r;

    public Matcher(T t)
    {
        this.t=t;
    }

    public <U extends T> Matcher<T,R> add(Class<U> aCase,{U=>R} f)
    {
        if (aCase.isInstance(t))
            r=f.invoke(aCase.cast(t));

        return this;
    }

    public R done()
    {
        return r;
    }
}
The type parameter U is declared to extend T so that you cannot add a case that is impossible, e.g., .add(Car.class,{Car c=>"beep"}) (which makes no sense unless a Car is an Animal).

Get this code, as one compilable/runnable file.

9 comments:

Eugene Kuleshov said...

My first thought after looking at Matcher class was "is that really Java?" :-)

BTW, instead of new ArrayList() {{ }} you can use Arrays.asList() and take advantage of vararg params.

Also, you still have type specified two times in there, but type is already available in the closure class, so you don't really need to wrap that into the type token (aka "Neal gadget"). You can get type using something like closure.getClass().getDeclaredMethods()[0].getParameterTypes() but it would be nice to have better reflection api.

Ricky Clarkson said...

Eugene,

I'd love to use Arrays.asList, but then I get warnings, because function types are generic types.

Specifically, the following gives a warning:

asList({int x=>x*x},{int x=>x*2});

On your second point, I didn't realise that such generic type parameters were available for reflection. It only works that way where you have the type literally in the source, and not in the case where the type parameter you're passing is itself a type parameter. Thanks for that.

I don't think I'll change this blog post to reflect that, at least not right now, but you certainly made me think!

Unknown said...

Don't you have an erasure issue with closure.getClass().getDeclaredMethods()[0].getParameterTypes() ?
Did not test it, just a thought.

Ricky Clarkson said...

Frederic,

No, the parameter type is present at runtime if it was present in source. In other words:

{String s=>s.hashCode()}.getClass()..etc. gives String.class, but:

{T t=>t.hashCode()}.getClass()..etc. will give Object.class (or whatever the upper bound of T is), I think.

I've tested the first one, and found it very useful, but not tested the second.

Eugene Kuleshov said...

Ricky, isn't second example going to be compiled into a concrete class? If so, compiler will have to create a bridge method to handle covariant types. Yes, there will be second invoke() method with erased type info, but there will be also one with concrete types (my example assumed it is actually first one declared, as created by compiler).

Anonymous said...

if (animal instanceof Car) pass compile-time checks, but it does

Will this compile? I thought that instanceof arguments had to be in the same inheritance tree to be compilable...

Anonymous said...

Looks like Java is becoming the next playfield for intellectual masturbation with those C++ like yucky templates. Seriously, there are better languages like Scala. Why revive something that won't work anyways.

Ricky Clarkson said...

Eugene,

I'm afraid I don't understand much of your comment. All closures are compiled into anonymous classes, at least in the current implementation.

Generics can handle covariant types:

{Integer=>String} blob; gets compiled into:

OO<? extends Integer,? super String>, with no bridge method.

Anonymous1, about 'animal instanceof Car',

You're correct, but all it takes is for Car to be an interface instead of a class, and the compiler can't prove that instanceof will always fail. It can always show that Car is not a subtype of Animal, however.

Anonymous2, your comment belongs firmly in 2005, and even then, is pretty misinformed. Scala has very similar generics to Java, very similar closures to this proposal. If you look at the credits for BGGA, Martin Odersky, Scala's creator, is in there.

Java 5's generics are only very superficially like C++ templates - there's no code generation. If all you want is superficiality, then yes, "Java is becoming the next playfield for intellectual masturbation" is a great comment.

Eugene Kuleshov said...

Ricky, OO is an interface from the the javax.lang.function package and you are right that it doesn't have bridge method. However, there is another anonymous class created for the closure instance that will implement OO interface, and that class is the one with bridge method. You can try this:

public static void main(String[] args) {
  foo({Integer i => "3"});
}

static void foo({Integer=>String} o) {
  Class c = o.getClass();
  Method[] mm = c.getDeclaredMethods();
  System.err.println(Arrays.asList(mm));
}

Blog Archive

About Me

A salsa dancing, DJing programmer from Manchester, England.