General musings on programming languages, and Java.

Wednesday, October 10, 2007

Why Java Needs Closures (It Already Has Them)

Some bloggers and people I've talked with think of closures in Java as something unnecessary - e.g., "why must we copy C#?". As it happens, most languages have closures, including Java.

A closure is an executable block of code that can refer to free variables from the enclosing scope. Here's an example in Java:


public void example()
{
        final double use=Math.random()*10000;

        SwingUtilities.invokeLater(new Runnable()
        {
                public void run()
                {
                        System.out.println(use);
                }
        });
}

It's not a great example of how closures are useful, but it shows the mechanism: the instance of that anonymous class can reference the local variable 'use'. There is a leak from the implementation into the language, in that 'use' has to be final, but that doesn't stop the anonymous class from being a closure. Let's consider how life would be if Java didn't have closures:

public void example()
{
        double use=Math.random()*10000;

        class MyRunnable implements Runnable
        {
                private double use;

                public MyRunnable(double use)
                {
                        this.use=use;
                }

                public void run()
                {
                        System.out.println(use);
                }
        }

        SwingUtilities.invokeLater(new MyRunnable(use));
}

In fact, that's not far off what the first code sample gets compiled to. In Java 1.0, inner, nested, local and anonymous classes didn't exist, so you'd have to move MyRunnable into a separate file, but the code would otherwise be the same. If you were going to complain about Java getting closures, Java 1.1 was the time to do it!
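
To see that copy for yourself, a small reflection sketch can list the fields the compiler adds to the anonymous class. The class name CaptureInspector and its methods are made up for this illustration, and the shape of the generated field (javac calls it val$use) is a compiler implementation detail that I'm assuming here, not something the language promises.

```java
import java.lang.reflect.Field;

final class CaptureInspector {
    // Builds the same kind of anonymous Runnable as the first sample.
    static Runnable capture(final double use) {
        return new Runnable() {
            public void run() {
                System.out.println(use);
            }
        };
    }

    // Counts compiler-generated (synthetic) fields of type double.
    // Assumption: the compiler stores the captured value in such a field.
    static int syntheticDoubleFields(Object o) {
        int count = 0;
        for (Field f : o.getClass().getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == double.class) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Runnable r = capture(42.5);
        // Lists whatever fields the compiler generated for the anonymous class.
        for (Field f : r.getClass().getDeclaredFields()) {
            System.out.println(f.getType() + " " + f.getName());
        }
    }
}
```

If the compiler behaves as assumed, the anonymous class carries its own double field, much like the hand-written MyRunnable.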

The above two code samples work in today's Java, and some of you might even favour the second, because it's more explicit - it matches the bytecode more closely.

My approach is to take any rule like "favour explicit code over abstract code" to an extreme, at least as a thought experiment, to test it out. The most explicit you can be in programming is to write assembly code, and none of us wants to do that. Let's flip it around. The most abstract you can be in programming is to write in Lisp, and none of us wants to do that. Both of those statements have holes in them, of course: there are some people who love writing assembly, and there are some who love writing in Lisp. I'm one of the latter.

As it turns out, abstraction is so prevalent in programming that most of us program in languages that are closer to Lisp than they are to assembly. C is the lowest-level programming language that I know, and even it has huge abstractions over the underlying assembly.

Java programmers are already programming on an abstraction, the Java Virtual Machine, before they even worry about syntax. It seems that we already favour abstract code over explicit code. Abstract doesn't mean imprecise or vague; it's another way of saying "general". We can use abstractions to stop repetition.

Consider the above code samples as templates of some kind:

        final $TYPE $VAR=$VAL;

        SwingUtilities.invokeLater(new Runnable()
        {
                public void run()
                {
                        $ACTION($VAR);
                }
        });

and

        $TYPE $VAR=$VAL;

        class $BLAH implements Runnable
        {
                private $TYPE $VAR;

                public $BLAH($TYPE $VAR)
                {
                        this.$VAR=$VAR;
                }

                public void run()
                {
                        $ACTION($VAR);
                }
        }

        SwingUtilities.invokeLater(new $BLAH($VAR));

The first template has 4 parameters, $TYPE, $VAR, $VAL, $ACTION. It mentions $VAR twice, the others once each.

The second has 5 parameters, $TYPE, $VAR, $VAL, $ACTION, $BLAH. It mentions $TYPE 3 times, $VAR 7 times, $VAL once and $ACTION once. We're always taught not to repeat ourselves in programming, and we can easily see that the second template is more repetitive than the first. A less well-known rule is to avoid unnecessary names. MyRunnable is a name that's defined once and used once - a bad sign.

Extra repetition means extra scope for errors. You might change 'use' between calling invokeLater and the Runnable actually running - now the Runnable is working from a stale copy, and you've got a sync problem. This is because you had to copy your variables yourself. Admittedly, thanks to the 'final' restriction, Java doesn't help much there, but at least it stops broken code from compiling.
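
Here's a minimal sketch of that stale-copy problem. StaleCopyDemo and the 'seen' field are invented for the illustration, and a direct call to run() stands in for the deferred invokeLater call so the example stays free of Swing.

```java
final class StaleCopyDemo {
    // Hand-written stand-in for the closure: copies the value when constructed.
    static final class MyRunnable implements Runnable {
        private final double use;
        double seen; // records what run() observed, for the demo only

        MyRunnable(double use) {
            this.use = use;
        }

        public void run() {
            seen = use;
        }
    }

    static double demo() {
        double use = 1.0;
        MyRunnable task = new MyRunnable(use); // copy taken here
        use = 2.0;                             // later change to the local
        task.run();                            // stands in for the deferred call
        return task.seen;                      // still the old value
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1.0
    }
}
```

With the anonymous class version, the assignment use = 2.0 would be rejected by the compiler, because 'use' would have to be final.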

Let's briefly return to the restriction that the local variables that anonymous classes capture must be final. Does that stop anonymous classes from being closures? Changing the values of variables is kind of an optional feature in programming languages. Mathematics seems to have managed without x++ for many years. Java's anonymous classes place some restrictions on what kind of variables are considered free, but that doesn't stop them from being closures. Haskell definitely has closures, and definitely doesn't have mutable variables.
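
As a rough illustration that a closure doesn't need mutable variables, here's an anonymous class that closes over a final parameter and outlives the method that created it. AdderDemo and the IntOp interface are hypothetical names made up for this sketch.

```java
final class AdderDemo {
    // Hypothetical single-method interface for the illustration.
    interface IntOp {
        int apply(int x);
    }

    // Returns a closure over the final parameter n.
    static IntOp adder(final int n) {
        return new IntOp() {
            public int apply(int x) {
                return x + n; // n is free here, bound in the enclosing scope
            }
        };
    }

    public static void main(String[] args) {
        IntOp addFive = adder(5);
        System.out.println(addFive.apply(10)); // prints 15
    }
}
```

Nothing is ever reassigned, yet addFive still remembers the 5 from a call to adder that has already returned.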

Ok, with all that out of the way, we now know that Java has closures, and they aren't disappearing anytime soon. Let's take a moment to laugh at the blog posts (actually most of them are anonymous comments on blogs) saying that Java doesn't need closures.

There. Now let's briefly examine why Java needs better closures, by returning to our template. This time we're going to read it with boilerplate glasses:

        boilerplate double use=Math.random()*10000;
        SwingUtilities.invokeLater(boilerplate System.out.println(use));

What we'd like to do now is to make that boilerplate disappear. Scala has an interesting way of doing that:

        val use=Math.random*10000
        invokeLater(System.out.println(use))

Scala methods can have by-name parameters, that is, parameters that are not evaluated at call time, but are evaluated whenever the method body uses them. You can use that to write your own if method, e.g.:

        myIf(Math.random<0.5,System.out.println("Still here"),System.exit(0))

Of course, myIf is just an interesting result, and not actually useful, but in the case of invokeLater it is useful; it gets rid of a lot of our boilerplate.
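
For comparison, here's roughly what the same myIf costs in today's Java, where each branch has to be wrapped in a Runnable by hand so that it only runs if chosen. MyIfDemo and the choose helper are names invented for this sketch, and the branches append to a log instead of printing or exiting so the result is easy to check.

```java
final class MyIfDemo {
    // Each branch arrives wrapped in a Runnable, so only the chosen one runs.
    static void myIf(boolean condition, Runnable ifTrue, Runnable ifFalse) {
        if (condition) {
            ifTrue.run();
        } else {
            ifFalse.run();
        }
    }

    // Records which branch ran.
    static String choose(boolean condition) {
        final StringBuilder log = new StringBuilder();
        myIf(condition,
            new Runnable() {
                public void run() { log.append("then"); }
            },
            new Runnable() {
                public void run() { log.append("else"); }
            });
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(choose(true));  // prints then
        System.out.println(choose(false)); // prints else
    }
}
```

The call to myIf takes seven lines here where the Scala version takes one, which is the boilerplate in question.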

The BGGA closures proposed for Java 7 let us do pretty much the same thing:

        double use=Math.random()*10000;

        SwingUtilities.invokeLater({=> System.out.println(use); });

There is another syntax for the last line:

        SwingUtilities.invokeLater()
        {
                System.out.println(use);
        }

We still have a little boilerplate with both these syntaxes, but not enough to need the boilerplate glasses. Sadly, despite it being technically possible for IDEs to become boilerplate glasses, they haven't done so. Folding an anonymous class in IDEA looks like:

        new Runnable(){...}

They've folded the wrong thing. Duh. It should look more like:

        new ...{System.out.println(use);}

Some people think that Java doesn't need better closures because Java as a language is already broken in many ways. Often Scala is quoted as the next Java. I've tried Scala out - it's very impressive, and similar enough to Java to not have too many surprises. So in one sense I agree, but I also think that Java should have better closures to move the status quo of programmers up a notch. Programmers should consciously favour abstraction over boilerplate, instead of the current situation, where many use abstractions all the time but are reluctant to invent their own.

Being free of boilerplate lets you think in different ways. My post on point-free programming shows one route you could take - there are many others.

10 comments:

Stephan said...

My opinion for some time has been the same as yours: IDEs in Java should filter away line noise like anonymous inner classes. And while we're at it, they should filter types and annotations away and just show them when I need them :-)

Peace
-stephan

--
Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
http://www.reposita.org
Blog at http://stephan.reposita.org - No signal. No noise.

Casper Bang said...

Ehhh... so according to you even anonymous inner classes should be written as how the bytecode is emulating it? That will make for some rather long and complicated parameter lists.

Anyway, when people refer to closures they usually have more powerful thinking in mind (continuations, lambda expressions etc.) than the rather loose definition you're adhering to.

Not really sure what message you are trying to get across, I suspect it might be "why must we copy C#?" as you mention yourself on the very first line.

Howard L said...

I agree with your sentiment that Java already has inner classes (closures) and they are really useful. I don't see why we need an extra construct, why not build upon the existing construct:

1. You have to declare the free variable final - remove that restriction

2. The syntax is long - change the syntax

There are many advantages of inner classes that you haven't touched on:

1. You can inherit implementation via extending other classes

2. You can have fields

3. You can have more than one method

4. You can call other inherited method or other methods defined within itself

5. You can have static data

In summary it is a class just like everything else - rather than something that is sort of like a class but has a number of annoying restrictions.

A BGGA style closure is a strict sub-set of an inner class - why take away useful features?

Bruce Chapman said...

"A BGGA style closure is a strict sub-set of an inner class"


Inner classes don't close over the meaning of break, continue, return etc like the BGGA proposal does.

Howard L said...

The BGGA proposal in places closes over break, continue, and return and in other places doesn't! For example there are two types of return that are distinguished by one ending in a ; and the other not having the ;. In my C3S proposal:

http://www.artima.com/weblogs/viewpost.jsp?thread=182412

I suggest the following to clearly distinguish use cases:

boolean find( IntList list, int required ) {
    list.each( method( value ) { if ( value == required ) find.return true } );
    return false
}

Note how return is named to indicate what it is returning from. Similarly, break and continue.

But the more important point is that if you want the facility to close over break, etc. then add it to inner classes. Don't invent another construct that misses some features out and adds in others. Particularly if the utility of these new features, like closing over break, is of dubious value.

Ricky Clarkson said...

casper bang said:

"so according to you even anonymous inner classes should be written as how the bytecode is emulating it?"

No, absolutely not. That would merely be the situation if we didn't have closures already in Java.

"Anyway, when people refer to closures they usually have more powerful thinking in mind"

Agreed, there are multiple meanings for the word 'closure'. Lambda expression is the main other meaning.

The message I'm getting across is that we already have closures, but that they could use some improvement. I think you could only think my opinion is "why must we copy C#?" if you didn't read the words.

howard l said:

"I don't see why we need an extra construct, why not build upon the existing construct:"

BGGA closures do build upon the existing construct, at least in implementation terms, as far as we can speak of an unimplemented spec.

"There are many advantages of inner classes that you haven't touched on"

I don't doubt that inner classes will be useful alongside BGGA closures, though I personally think aiming toward lambda expressions in general is a good thing. I've written some Scala code recently that uses both inner classes and BGGA-ish closures side-by-side.

"A BGGA style closure is a strict sub-set of an inner class - why take away useful features?"

Nothing is being removed. A new syntax is being proposed. It does some things better, but will not replace all inner class usage.

"The BGGA proposal in places closes over break, continue, and return and in other places doesn't! For example there are two types of return that are distinguished by one ending in a ; and the other not having the ;"

It seems more like there are two types - explicit returns with the return keyword, and last-line semi-colon return. I don't think 5; will be a valid return statement anywhere.

Howard, your find.return does not nest. If you have a closure within a closure, and the inner wants to return from the outer, there is no way to disambiguate that, as the outer has no name. That's what Neal was talking about wrt Tennent's Correspondence Principle.

"But the more important point is that if you want the facility to close over break, etc. then add it to inner classes."

Inner classes consist mainly of methods, they're not intended to be blocks of code. They already have semantics for break and continue - I think the principle of least astonishment suggests not adding new semantics to break in existing code (albeit code that currently would not compile). return is even worse to overload. When adding new syntax, such as BGGA closures, you are more free to define new semantics (in this case the semantics of the enclosing code are being retained, a good thing).

I may have been too harsh in this comment, no time to review it before getting on a bus!

Howard L said...

@Ricky,

> Nothing is being removed. A new
> syntax is being proposed. It does
> some things better, but will not
> replace all inner class usage.

The main thing it does better is to shorten the syntax. So why not shorten the syntax of inner classes and not lose the really useful stuff that inner classes do and closures don't? That way you don't have to debate whether a closure or an inner class should be used.

> Howard, your find.return does not
> nest. If you have a closure
> within a closure, and the inner
> wants to return from the outer,
> there is no way to disambiguate
> that, as the outer has no name.

No, everything has a name, since it is an inner class and all the methods have a name. You may not have had to type the name, i.e. the name may be inferred, but it does have a name.

With BGGA things are confusing, given:

static int apply({int=>int} f, int x) { return f.invoke(x); }

what do these do:

{int=>int} f1 = {int x=>x * 42};

{int=>int} f2 = {int x=>x * 42;};

{int=>int} f3 = {int x=>return x * 42};

{int=>int} f4 = {int x=>return x * 42;};

when used like this:

int test() {return apply(fx, 1);} // fx means for all the fs

and like this:

{int=>int} fTemp = {int x=>apply(fx, x)}; // fx means one for each f
int test() {return apply(fTemp, 1);}

It isn't clear, is it? Two of them are syntax errors, and they look a lot like the ones that aren't! And the two that aren't syntax errors behave differently!!

Ricky Clarkson said...

"The main thing it does better is to shorten the syntax"

In the case where only one method is being implemented, yes. Otherwise, an inner class is not actually bad, syntactically.

"No everything has a name, since it is an inner class and all the methods have a name"

So in your proposal, the interface to implement is inferred, but the method name is still present in source. That could be inferred too - I don't think it makes sense to force the name to be present when most use cases won't need the name.

Also, it still doesn't nest, in the case that the inner closure and the outer closure are implementing the same name of method.

Wrt your examples of BGGA syntax, the ones that are syntax errors jumped out at me quite quickly, and I don't think it's too bad that invalid code looks similar to valid code. The arguably difficult points are where two valid pieces of code with differing semantics look almost identical. For example, taken from virtually any beginner Java programmer:

if (x)
{
...stuff
}

and

if (x);
{
...stuff
}

Anonymous said...

@Ricky,

> So in your proposal, the
> interface to implement is
> inferred, but the method name is
> still present in source. That
> could be inferred too - I don't
> think it makes sense to force the
> name to be present when most use
> cases won't need the name.

No, both are inferred for the simple case of a single abstract method.

> Also, it still doesn't nest, in
> the case that the inner closure
> and the outer closure are
> implementing the same name of
> method.

You can nest by multiple qualification, just like you can say super.super.name(). EG:

interface Method1<R, A> { R call( A a ); }

interface MyList<E> {
    MyList<E> filter( Method1<Boolean, E> predicate );
    ...
}

...
MyList<MyList<Integer>> list = …;
list.filter method (l) {
    l.filter method (e) { call.call.return true }
};


I can’t see how you could do the example above in BGGA, which is not to say you can’t (I find the proposal obscure in places). Can you show the equivalent BGGA syntax?

RE the obscurity of the return syntax I am not alone in disliking it, EG:

http://crazybob.org/2007/02/will-closures-carry-as-much-complexity.html

and

http://weblogs.java.net/blog/cayhorstmann/archive/2007/04/whats_so_taxing.html

sbwoodside said...

You can also get return values right now in java closures.

Closures with return values in Java

About Me

A salsa dancing, DJing programmer from Manchester, England.