Ricky's technical blog: October 2007

Wednesday, October 31, 2007

Java 7 Example - Pattern Matching

One way you can measure a programming language is by taking a feature, either one that it has, or not, and seeing how close you can get to implementing that feature without writing your own parser, etc. Let's take a look at (and copy!) one of Scala's (and many other languages') features; pattern matching:

animal match {
    case Dog() => "woof"
    case Cat() => "meow"
}

It's very similar to a switch statement, and the above really doesn't show how flexible pattern matching is, but it will suffice for now. The usual way of writing the above code in Java would be with some nested instanceof checks and casts, or by adding a speak() method to the Animal type hierarchy.

It's not always practical to add methods to types directly in Java.. e.g., if you're dealing with classes you don't control. And instanceof is fallible - you don't want to have: if (animal instanceof Car) pass compile-time checks, but it does. So let's see how close we can get to the above pattern matching using the Java 7 prototype. I hope you can do better than me!

Here's what I'd ideally want in Java:

Animal animal=Math.random()<0.33 ? new Cat() : Math.random()<0.5 ? new Dog() : new Horse();

System.out.println(match(animal,
                        {Cat c=>"meow"},
                        {Dog d=>d.name+" says woof"},
                        {Horse h=>"neigh"}));

What type would match take as parameters? For now, let's imagine that we're always dealing with Animals, and always returning Strings.

public String match(Animal animal,{? extends Animal=>String}... cases)

That's not legal - {? is an "illegal start of type" according to the compiler. Neal said that because the things on the left hand side of a closure automatically have ? super in their types, I can't also have ? extends. If I replace it with the interface that javac generates it compiles fine:

public String match(Animal animal,OO<? extends Animal,String,null>... cases)

(the null part there specifies the exception types that may be thrown - in a function type you can omit that)

We can't actually call that method without getting at least a warning from the compiler, thanks to varargs using arrays instead of generics. I'm sure that must have made sense to someone at the time. If I convert it to List<OO<? extends Animal,String,null>> cases, I still have a problem. Inside match, I don't know what type ? is, so I can't do anything with it. Next trick then:

class Case<T>
{
    public final Class<T> clazz;
    public final {T=>String} function;

    public Case(Class<T> clazz,{T=>String} function)
    {
        this.clazz=clazz;
        this.function=function;
    }
}

Now I could have List<Case<? extends Animal>> as the parameter type. Here's some sample usage then:

System.out.println(match(animal,
                         new ArrayList<Case<? extends Animal>>(){{
                             add(new Case<Cat>(Cat.class,{Cat c=>"meow"}));
                             add(new Case<Dog>(Dog.class,{Dog d=>"woof"}));
                             add(new Case<Horse>(Horse.class,{h=>"neigh"}));
                         }}));

Note that I'm taking advantage of a little trick for creating lists inline by creating an anonymous subclass of ArrayList and providing a static initialiser. Verbose? Yes. Ugly? Yes. But enough about me. That code will be gone in a moment.

Look at each line. I've got Cat three times in the same line. Ugh. In fact if I make match generic, so that it can return something other than String, then I'll have to add a type parameter to Case for that, and then those lines become even worse. There must be a better way.

I can use a fluent interface/embedded DSL, or, erm, in real words, "readable code" by making match(animal) return a matcher, and putting a method on it, add, that has a type parameter, U, and takes a Class<U> and a U=>String. In fact, let's go the whole generic hog and make it take a Class<U> and a U=>R, where R is the return type we want. When we've finished adding cases, we call done(), which returns R.

Usage:

System.out.println(match(animal)
        .add(Cat.class,{Cat c=>"meow"})
        .add(Dog.class,{Dog d=>"woof"})
        .add(Horse.class,{Horse h=>"neigh"})
        .done());

One step at a time then. match is a simple wrapper method around a type called Matcher, that lets us avoid writing new Matcher<Animal,String>:

public static <T,R> Matcher<T,R> match(T t)
{
    return new Matcher<T,R>(t);
}

Matcher is a class that stores the T passed into its constructor, and an R to return, which is initially null. It has a method, add, that has a type parameter, U, which extends T. Having U extends T guarantees I don't add an invalid case, such as one that tests whether an animal is a Car.


class Matcher<T,R>
{
    public final T t;
    public R r;

    public Matcher(T t)
    {
        this.t=t;
    }

    public <U extends T> Matcher<T,R> add(Class<U> aCase,{U=>R} f)
    {
        if (aCase.isInstance(t))
            r=f.invoke(aCase.cast(t));

        return this;
    }

    public R done()
    {
        return r;
    }
}

The type parameter U is declared to extend T so that you cannot add a case that is impossible, e.g., .add(Car.class,{Car c=>"beep"}) (which makes no sense unless a Car is an Animal).

Get this code, as one compilable/runnable file.

Tuesday, October 30, 2007

Java 7 Prototype - Surprises and Expectations

In my first week of testing the prototype, I found quite a lot of bugs, and some more subjective defects.

Neal Gafter seems amazingly fast at fixing bugs, so there's probably not much point reporting any bugs on this blog. I'll run through some surprises instead. If you want to see the bugs I found, take a look in the test/ directory in the prototype. Neal put some of my bug reports in as Clarkson2.java etc. It looks like he uses some automated testing on those, so if he breaks one he should know before I report it again!

If there's anything in this code that you don't understand, just ask. I mainly wrote it to test the prototype, so I wasn't overly concerned with writing good code. Most of these code samples are from bigger programs that I chopped down whenever I found a bug.

And now to the meat.

1. Definite assignment rules make it difficult to write recursive closures.

(1)
{int=>int} factorial={int x=>x<2 ? x : x*factorial.invoke(x-1)};

The above only works if factorial is a field, not a local variable, because factorial is not 'definitely assigned' on the right hand side of the =. I've been playing with Scala a bit recently (which has given me some good fuel for testing this prototype with), and it doesn't have this restriction. After a conversation with David MacIver, a Scala programmer rather than a dabbler like myself, we came to the conclusion that it's possible to cause a NullPointerException in similar code to the above in Scala.

Lo and behold, Neal replied with much the same argument - if the definite assignment rules were relaxed, the following code could fail, or even do undefined things:

(2) {int=>int} factorial=f({int x=>x<2 ? x : x*factorial.invoke(x-1)});

If f() invoked the closure, then factorial would be accessed before it had been assigned a value. In Scala that is guaranteed to give a NullPointerException, because all variables are assigned to null/0/false before being assigned their real value. In Java, the verifier would not load the code, because it would possibly use values that had been in memory before.

However, for the actual case above (not (2), (1)), it's easy to see that the closure is not invoked before factorial has been assigned a value. I think this idiom is useful enough to make a special case for, but then I'm not involved in writing either the code or the specification so I can't really estimate that task.

The obvious workaround would be:

{int=>int} factorial=null; factorial={int x=>x<2 ? x : x*factorial.invoke(x-1)};

..except that local variable capture is not supported yet. The current workaround is to make factorial a field instead of a local variable.

If I ever get my head around combinators properly I think I'll have a better workaround.

2. Java 5's foreach statement cannot take a closure.

This code fails:

for (final String s2: {=>asList(args).iterator()});

This code works:

Iterable it={=>asList(args).iterator()}; for (final String s2: it);

The only difference is that I explicitly made closure conversion happen. Typecasting would not fix it (ClassCastException). This is apparently as specified, as a foreach is only specified to take an Iterable or an array. It's not a huge issue, and I can't think of any use cases, but it's still a surprise.

3. Autoboxing doesn't apply to function types.

{Integer=>Integer} f={int x=>x*2}; is a compile error.
{int=>int} g={Integer x=>x*2}; is a compile error. This doesn't look insane on its own, but because generic type parameters cannot be primitive types, it has repercussions. I can't write a generic map function and use it like this:

Iterable<Integer> mapped=map(asList(1,5,11,26),{int x=>x*x});

I have to use the more verbose and more indirect-looking {Integer x=>x*x}.

There's no way of writing a single generic method that converts a primitive function to a wrapper function, again because generic type parameters cannot be primitive types. I could write 8 individual methods, e.g.:

public static {Integer=>Integer} lift({int=>int} f)
{
 return {Integer i=>f.invoke(i)};
}

etc.

However, there's again no way of generalising those to make them work with {int,int=>int}, etc. There is no subtyping relationship between different function types, except the covariance and contravariance rules, quickly demonstrated here:

{Number=>Number} f={Object o=>5};

The above compiles because any Number input is obviously also an Object, and because the output, 5, is obviously a Number (autoboxing taken for granted).

The lack of autoboxing also means that sometimes you'll have to explicitly supply the generic types. Here's an example:

class Main
{
        public static void main(String[] args)
        {
                int i=id().invoke(5);
        }

        public static <T> {T=>T} id()
        {
                return {T t=>t};
        }
}

This code won't compile, because the compiler judges the <T> in the call to id() from main to be int, which cannot be a type parameter. Neal says this isn't a bug. This version compiles:

class Main
{
        public static void main(String[] args)
        {
                int i=Main.<Integer>id().invoke(5);
        }

        public static <T> {T=>T} id()
        {
                return {T t=>t};
        }
}

I think this is a bug; it seems that inference should pick Integer instead of int in the first case.

Another implication of the lack of autoboxing is that when the closure invocation syntax arrives, that Neal promised would let you reimplement the foreach loop yourself as a method, you won't be able to. I expect this will work, if you write your own foreach method:

foreach(Integer i: asList(3,4,5))
 System.out.println(i*i);

but this won't:

foreach(int i: asList(3,4,5))
 System.out.println(i*i);

4. Closures cannot implement generic methods.

This one is much easier to describe in code than words. Given:

interface Identity
{
 <T> T id(T t);
}

..it is impossible to implement this with a closure. I think Haskell allows something like this. Scala doesn't. A use for this would be in implementing some kind of Either type, where an Either<X,Y> may hold an X or a Y, and you don't want to actually have to test which it is, just call a method in Either, supplying a function to run if it holds an X, and a function to run if it holds a Y. The return type of that method should be generic.

Stay tuned for some examples that actually do work with the prototype, at least one of which is actually interesting!

Monday, October 22, 2007

Life in the old Java yet - Closures Prototype

I get home from eating some overpriced underportioned seafood, and drinking a few stouts, phone my girlfriend and check my email. Nothing out of the ordinary, nothing interesting.. but wait! An email from Neal Gafter with a closures prototype! Wow!

So I download it, follow the instructions and it fails. Hmm. I'll try on Linux. javac -version works. I write my first working Java program with closures. I write my second! I mention it in an IRC channel or two, excitedly! I remember that I'm still on the phone and have no idea what she said for the past 5 minutes, so I lower my laptop lid. I get off the phone as quickly as possible, which is to say we talk for another half hour, which is to say she talks for another half hour.

An hour or two after that I send Neal a list of about 5 or 6 problems with the prototype, but go to bed happy, especially having drunk a Spitfire. A beer, not a WWII aeroplane.

First thing in the morning, he's fixed the problems, so I download again. The same problems are there. Hmm. Oh, right, the filename included the date. What's today's date (it was 2am today that I sent the emails)? 2007-10-22. But Neal's in America, and they're backward, so 2007-10-21. Ok, got the right file now. Still fails in Windows, and at least one of my reported failures still fails but for a different reason. I start playing again, and some more code fails for the same reason as the first, so I give up for now.

There's much more fun in breaking a prototype compiler than a production one!

Ok, new day, and I'm at my girlfriend's house for the evening. Neal's just sent another prototype, and it still doesn't work in Cygwin. I can't get my laptop online to connect to Eugene Ciurana's Linux box that I was testing on, so I start looking at the bin/javac script. I see and fix the not-on-Windows problem. It's passing a full path to lib/javac.jar to javac, and that includes /cygdrive/c, Cygwin paths, which Java doesn't know about. I find a fresh batch of bugs and surprises. After finding the same bug twice from unrelated pieces of code (and getting pestered to watch a film) I stop trying. Neal confirms the bugs, but nothing new arrives the next day.

Over the next few days I find more and more bugs, and Neal fixes them all. There are a few bugs that turn out to be features; they're actually in the spec that way.

Now that I have a fresh set of bugs waiting to be emailed to Neal, he's just published the closures prototype. You can find it over at his blog.

I'll shortly publish some surprises from playing with the prototype, then some examples of what you can (and can't) do with it. In a less commentative style..

Tuesday, October 16, 2007

IDEA 7 - A Release Too Early?

I downloaded the new IDEA 7 release today, and pondered buying the upgrade licence, but went with the 30 day trial first. Here's a little video I made of an editing bug that I noticed immediately.

(If anybody can figure out how to get a .swf into a blogger page, please tell me how. Blogger just mangled any HTML I used to do it.)

Another niggle is that, including the milestone releases, the inspections that use @Nullable and @NotNull seem not to be present, and with no mention of why. Hopefully they're now just in some plugin that I don't happen to have installed. That's a bit of a show-stopper for me, I changed my coding style to use those annotations. I'd have to convert to FindBugs' equivalent to continue using that coding style.

Wednesday, October 10, 2007

Why Java Needs Closures (It Already Has Them)

Some bloggers and people I've talked with think of closures in Java as something unnecessary - e.g., "why must we copy C#?". As it happens, most languages have closures, including Java.

A closure is an executable block of code that can refer to free variables from the enclosing scope. Here's an example in Java:


public void example()
{
        final double use=Math.random()*10000;

        SwingUtilities.invokeLater(new Runnable()
        {
                public void run()
                {
                        System.out.println(use);
                }
        });
}

It's not a great example of how closures are useful. The instance of that anonymous class can reference the variable 'use'. In fact, there's a leak in the implementation that gets into the language - 'use' has to be final. That doesn't stop the anonymous class from being a closure. Let's consider how life would be if Java didn't have closures:

public void example()
{
        double use=Math.random()*1000;

        class MyRunnable implements Runnable
        {
                private double use;

                public MyRunnable(double use)
                {
                        this.use=use;
                }

                public void run()
                {
                        System.out.println(use);
                }
        }

        SwingUtilities.invokeLater(new MyRunnable(use));
}

In fact, that's not far off what the first code sample gets compiled to. In Java 1.0, inner/nested/local/anonymous classes didn't exist, so you'd have to take the above MyRunnable, and put it in a separate file, but the code would be the same otherwise. If you were going to complain about Java getting closures, Java 1.1 was the time to do it!

The above two code samples work in today's Java, and some of you might even favour the second, because it's more explicit - it matches the bytecode more closely.

The way I think is that you should, at least as a thought experiment, take anything like "favour explicit code over abstract code" to an extreme, to test it out. The most explicit you can be in programming is to write assembly code, and none of us wants to do that. Let's flip it around. The most abstract you can be in programming is to write in Lisp, and none of us wants to do that. Both of those statements have holes in them, of course, there are some people who love writing assembly, and there are some who love writing in Lisp. I'm one of the latter.

As it turns out, abstraction is so prevalent in programming that most of us program in languages that are closer to Lisp than they are to assembly. C is the lowest-level programming language that I know, and even it has huge abstractions over the underlying assembly.

Java programmers are already programming on an abstraction, the Java Virtual Machine, before they even worry about syntax. It seems that we already favour abstract code over explicit code. Abstract doesn't mean imprecise, or vague, it's another way of saying "general". We can use abstractions to stop repetition.

Consider the above code samples as templates of some kind:

        final $TYPE $VAR=$VAL;

        SwingUtilities.invokeLater(new Runnable()
        {
                public void run()
                {
                        $ACTION($VAR);
                }
        });

and

        $TYPE $VAR=$VAL;

        class $BLAH implements Runnable
        {
                private $TYPE $VAR;

                public MyRunnable($TYPE $VAR)
                {
                        this.$VAR=$VAR;
                }

                public void run()
                {
                        $ACTION($VAR);
                }
        }

        SwingUtilities.invokeLater(new $BLAH($VAR));

The first template has 4 parameters, $TYPE, $VAR, $VAL, $ACTION. It mentions $VAR twice, the others once each.

The second has 5 parameters, $TYPE, $VAR, $VAL, $ACTION, $BLAH. It mentions $TYPE 3 times, $VAR 6 times, $VAL once and $ACTION once. We're always taught not to repeat ourselves in programming, and we can easily see that the second template is more repetitive than the first. A less well-known rule is to avoid unnecessary names. MyRunnable is a name that's defined once and used once - a bad sign.

Extra repetition means extra scope for errors. You might change 'use' between calling invokeLater and invokeLater actually happening - now you've got a sync problem. This is because you had to copy your variables yourself. Admittedly, thanks to the 'final' restriction, Java doesn't help much there, but at least it stops broken code from compiling.

Let's briefly return to the restriction that the local variables that anonymous classes capture must be final. Does that stop anonymous classes from being closures? Changing the values of variables is kind of an optional feature in programming languages. Mathematics seems to have managed without x++ for many years. Java's anonymous classes place some restrictions on what kind of variables are considered free, but that doesn't stop them from being closures. Haskell definitely has closures, and definitely doesn't have mutable variables.

Ok, with all that out of the way, we now know that Java has closures, and they aren't disappearing anytime soon. Let's take a moment to laugh at the blog posts (actually most of them are anonymous comments on blogs) saying that Java doesn't need closures.

There. Now let's briefly examine why Java needs better closures, by returning to our template. This time we're going to read it with boilerplate glasses:

        boilerplate double use=Math.random()*10000;
        SwingUtilities.invokeLater(boilerplate System.out.println(use));

What we'd like to do now is to make that boilerplate disappear. Scala has an interesting way of doing that:

        val use=Math.random*10000
        invokeLater(System.out.println(use))

Scala methods can have lazy parameters, that is, parameters that are not evaluated at call time, but are evaluated when the method wants to. You can use that to write your own if method, e.g.:

        myIf(Math.random<5,System.out.println("Still here"),System.exit(0))

Of course, myIf is just an interesting result, and not actually useful, but in the case of invokeLater it is useful; it gets rid of a lot of our boilerplate.

Java 7 closures let us do pretty much the same thing:

        double use=Math.random()*10000;

        SwingUtilities.invokeLater({=> System.out.println(use));

There is another syntax for the last line:

        SwingUtilities.invokeLater()
        {
                System.out.println(use);
        }

We still have a little boilerplate with both these syntaxes, but not enough to need the boilerplate glasses. Sadly, despite it being technically possible for IDEs to become boilerplate glasses, they haven't done so. Folding an anonymous class in IDEA looks like:

        new Runnable(){...}

They've folded the wrong thing. Duh. It should look more like:

        new ...{System.out.println(use);}

Some people think that Java doesn't need better closures because Java as a language is already broken in many ways. Often Scala is quoted as the next Java. I've tried Scala out - it's very impressive, and similar enough to Java to not have too many surprises. So in one sense I agree, but I also think that Java should have better closures to move the status quo of programmers up a notch. Programmers should consciously favour abstraction over boilerplate, instead of the current situation, where many use abstractions all the time but are reluctant to invent their own.

Being free of boilerplate lets you think in different ways. My post on point-free programming shows one route you could take - there are many others.

Ricky's technical blog