General musings on programming languages, and Java.

Wednesday, April 04, 2007

Is Josh Bloch the biggest problem for closures?

CORRECTION: Neal has since informed me that, after giving the presentation this post references, he learned that there is no such JCP rule. I'll keep this post anyway, because it makes some useful points (and so do the comments).

Last year Josh Bloch, with "crazy" Bob Lee and Doug Lea, proposed Concise Instance Creation Expressions (CICE) as their solution for closures in Java.

They didn't address returning from the enclosing scope from within the closure. Theirs is a trivial syntax change, probably fairly straightforward to implement, but it wouldn't address simple cases such as finding the first line in a file that matches an expression. If you're already familiar with why CICE is insufficient, feel free to skip past the code here.

Here's the pseudocode:


forEachLine ln in "/etc/passwd" do
   if ln contains "root"
      return ln

Here's the usual Java 5:

BufferedReader reader=null;
try
{
    reader=...;
    String line;
    while ((line=reader.readLine())!=null)
    {
        if (line.contains("root")) //matches("root") would only accept a line that is exactly "root"
            return line;
    }
}
finally
{
    if (reader!=null)
        try
        {
            reader.close();
        }
        catch (IOException exception)
        {
            log(exception); //but don't rethrow it
        }
}

So, of course, as the good programmers we are, we write an abstraction for that:

forEachLine("/etc/passwd",new LineProcessor()
{
    public void processLine(String line)
    {
        if (line.contains("root"))
            return line; //won't compile
    }
});

There's a problem there: the 'return line;' returns from the processLine method, not the enclosing method, and processLine is declared void. OK, we can solve that. Let's make processLine return a String, which is normally null unless we want to end the loop. Then forEachLine can also return a String, which will be the return value of the enclosing method.

return forEachLine("/etc/passwd",new LineProcessor()
{
    public String processLine(String line)
    {
        return line.contains("root") ? line : null;
    }
});

This works, albeit with some strange semantics, which could be made more obvious by adding a class to hold the result, but that's not important for this post.
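
For reference, a forEachLine with these semantics might look like the following minimal sketch. It is not code from any proposal or library: LineProcessor is the interface implied above, the class name LineProcessorSketch is made up, and it takes a Reader instead of a file name (an assumption, so that it can be tried without touching /etc/passwd). Wrapping the IOException in a RuntimeException is also just one possible choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

//Hypothetical callback: return null to carry on, non-null to stop early.
interface LineProcessor
{
    String processLine(String line);
}

class LineProcessorSketch
{
    //Feeds each line to the processor and stops at the first non-null result.
    //Returns that result, or null if every call returned null.
    static String forEachLine(Reader source,LineProcessor processor)
    {
        BufferedReader reader=new BufferedReader(source);
        try
        {
            String line;
            while ((line=reader.readLine())!=null)
            {
                String result=processor.processLine(line);
                if (result!=null)
                    return result; //the early return that forEachLine has to know about
            }
            return null;
        }
        catch (IOException exception)
        {
            throw new RuntimeException(exception); //assumption: report read failures unchecked
        }
        finally
        {
            try
            {
                reader.close();
            }
            catch (IOException exception)
            {
                //a real version would log this, as in the Java 5 code; don't rethrow
            }
        }
    }
}
```

A caller that only wants to count or print lines still has to end its processLine with 'return null;', even though it never returns early.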

So forEachLine now has to know that the LineProcessor it runs might want to return early. Callers of forEachLine that don't want to return early still have to return null every time. To avoid punishing those callers, we now need two versions of forEachLine, one that supports early return and one that doesn't. That's confusing. The bad smell wafts up, making you either overload, or use names like "forEachLineWithAPossibleEarlyReturn".

What does CICE do for this?


return forEachLine("/etc/passwd",LineProcessor(String line)
{
    return line.contains("root") ? line : null;
});

That's right. It removes a bit of punctuation and the 'new' keyword. Without underestimating that contribution, it doesn't actually help to make the end result resemble the pseudocode. It would be hard to get back from this end result to the pseudocode; something is lost in translation. It isn't expressive.

The BGGA proposal allows this:


forEachLine(Line line: "/etc/passwd")
    if (line.contains("root"))
        return line;

Now just to remind you of the pseudocode again:

forEachLine ln in "/etc/passwd" do
   if ln contains "root"
      return ln
It doesn't need a special version of forEachLine for each use case. It's generally better, more expressive, with one wart that I'll mention at the end.
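
To see what returning from the enclosing method buys you, here is one way to fake it in today's Java: the callback throws an exception carrying the result, and the enclosing method catches it. This is only a sketch to illustrate the idea, and makes no claim about how BGGA specifies or implements it. LineAction, EarlyReturn and NonLocalReturnSketch are made-up names, and forEachLine takes a Reader rather than a file name (an assumption, so that the sketch can run anywhere).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

//Hypothetical callback that returns nothing, so forEachLine needs no early-return protocol.
interface LineAction
{
    void apply(String line);
}

//Hypothetical carrier for the value being "returned" through forEachLine.
class EarlyReturn extends RuntimeException
{
    final String value;

    EarlyReturn(String value)
    {
        this.value=value;
    }
}

class NonLocalReturnSketch
{
    //One general-purpose loop; it knows nothing about early return.
    static void forEachLine(Reader source,LineAction action)
    {
        BufferedReader reader=new BufferedReader(source);
        try
        {
            String line;
            while ((line=reader.readLine())!=null)
                action.apply(line);
        }
        catch (IOException exception)
        {
            throw new RuntimeException(exception); //assumption: report read failures unchecked
        }
        finally
        {
            try
            {
                reader.close();
            }
            catch (IOException exception)
            {
                //a real version would log this; don't rethrow
            }
        }
    }

    //Returns the first line containing text, or null if there is none.
    static String firstLineContaining(Reader source,final String text)
    {
        try
        {
            forEachLine(source,new LineAction()
            {
                public void apply(String line)
                {
                    if (line.contains(text))
                        throw new EarlyReturn(line); //stands in for 'return line;'
                }
            });
            return null; //no line matched
        }
        catch (EarlyReturn early)
        {
            return early.value;
        }
    }
}
```

The try/catch and the carrier class are exactly the boilerplate that a real return from the enclosing method would make unnecessary.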

So why does this matter? Neal Gafter, the proposal lead, used to work for Sun and is well respected, so he can easily make this into a JSR. Or not.

Neal Gafter now works for Google. All the work he's put into BGGA has been on his own time; it's not allowed to be his "20% project" (Google employees spend 20% of their paid time on ideas of their own). He is not Google's JCP representative. Surprise, surprise, Josh Bloch is, and Josh has his own more limited, less expressive proposal, as I've just demonstrated.

The JCP has a rule that disallows individuals from being on the JCP if their employer is on it. I felt sorry for Neal at the end of his presentation on closures at Google. Here's a quote from the last few minutes (emphasis from original):

From the audience: "Is someone planning on opening up a JSR on this?"

Neal: "There's a long answer and, I'm afraid, a longer answer. I was planning on writing a JSR for this. Sun Microsystems actually asked me to write a JSR proposal for this. But I can't do that, because I am not Google's JCP representative. And I cannot do it as an individual contributor to the Java Community Process, because as a Google employee I cannot be an individual contributor to the Java Community Process.

Google's JCP representative is Joshua Bloch, and he has other ideas about what should be done in the Java language in this space. As far as I know he is not currently planning on submitting a JSR on this. My hope is that creating a prototype, which by the way, I'm doing on my own time, will be something that Sun can use as a justification for Sun creating a JSR to do this into the language, because I think that's the only way this will happen in Java at this point."

Here's a little of Josh Bloch on closures:

"I like closures. I think they work very well in languages that are designed around them, like Smalltalk and Scheme and so forth. I think closures in java are a good thing. We've basically had them since 1.1 in the form of anonymous class instances and those are a bit clunky so I definitely see a place for improving support for closures; on the other hand some of the proposals that I've seen floating around are overly complex; they involve massive changes to the type system, things like function types and I'm severely worried about pushing the complexity of the language beyond the point where Joe Java can't use the language anymore.

If we add more support for closures I think it has to be in the spirit of the current support, which means that closures should take the form of instances, of types that have a single abstract method, whether they are interface types such as Runnable , or class types such as TimerTask. I think it would be nice if a better syntax in the current anonymous class syntax were adopted and if requirement with regards to Final variables were somehow made more tangible, which doesn't mean necessarily relaxed; I think it's actually good that you can not close over non Final variables, but it's a pain to actually mark them final, so if they automatically marked themselves final which would be nice."

His points are all good, except that function types aren't really a big overhaul of the type system - they will be resolved into interfaces. They're no bigger a conceptual problem than array types are.

Also, I think that if I could write a forEachLine in the BGGA style, I'd be more than happy for novices to use it.

The wart I mentioned is that BGGA violates Tennent's Correspondence Principle when a closure throws a checked exception and the interface that closure conversion converts it to doesn't declare any checked exceptions.

Neal proposes an extension to the generic type system to allow writing interfaces whose single method throws 0, 1 or many checked exceptions, but for those cases where the interface is already fixed, such as SwingUtilities.invokeAndWait(Runnable), the exceptions cannot be passed back to the caller. The code fails to compile. This is an inconsistency with the rest of the proposal. You can return from within a Runnable closure, but you can't throw a checked exception from within it.
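
Today's generics can already express the one-exception case, which gives a feel for what the extension would generalise. The sketch below is plain Java 5, not BGGA syntax; ThrowingBlock, runTwice and the class name are invented for illustration.

```java
import java.io.IOException;

//Hypothetical interface whose single method throws exactly one exception type E.
interface ThrowingBlock<E extends Exception>
{
    void run() throws E;
}

class ExceptionTransparencySketch
{
    //Whatever the block may throw, runTwice declares too: the exception passes through.
    static <E extends Exception> void runTwice(ThrowingBlock<E> block) throws E
    {
        block.run();
        block.run();
    }

    //E is RuntimeException here, so the caller needs no try/catch.
    static int countRuns()
    {
        final int[] runs={0};
        runTwice(new ThrowingBlock<RuntimeException>()
        {
            public void run()
            {
                runs[0]++;
            }
        });
        return runs[0];
    }

    //E is IOException here, so the compiler makes the caller deal with it.
    static String checkedMessage()
    {
        try
        {
            runTwice(new ThrowingBlock<IOException>()
            {
                public void run() throws IOException
                {
                    throw new IOException("disk full");
                }
            });
            return "no exception";
        }
        catch (IOException exception)
        {
            return exception.getMessage();
        }
    }
}
```

Runnable.run() has no such type parameter and no throws clause, which is why the invokeAndWait case cannot pass a checked exception back.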

I think that it would be possible to achieve exception transparency without needing anything special on the interface, as checked exceptions are purely a compile-time concept.
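
As a small illustration that the checking really is done by the compiler alone, the well-known "sneaky throw" trick below gets a checked IOException out of an ordinary Runnable using nothing but an unchecked generic cast. It is a sketch to support the point, not a recommendation and not part of BGGA; SneakyThrowSketch and its method names are made up.

```java
import java.io.IOException;

class SneakyThrowSketch
{
    //After erasure the cast to E does nothing at run time, so any Throwable is thrown as-is,
    //while the compiler believes only E can escape.
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable throwable) throws E
    {
        throw (E)throwable;
    }

    //Returns whatever escapes from run(), or null if nothing does.
    static Throwable whatEscapes(Runnable runnable)
    {
        try
        {
            runnable.run();
            return null;
        }
        catch (Throwable throwable)
        {
            return throwable;
        }
    }

    static Throwable demo()
    {
        return whatEscapes(new Runnable()
        {
            public void run() //note: no throws clause
            {
                //Choosing E=RuntimeException satisfies the compiler.
                SneakyThrowSketch.<RuntimeException>sneakyThrow(new IOException("from inside run()"));
            }
        });
    }
}
```

The Throwable that demo() hands back is the original IOException, unwrapped, even though run() never declared it.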

12 comments:

fcmmok said...

forEachLine(Line line: "/etc/passwd")
{
if (line.matches("root"))
return line;
}

it should be

forEachLine("/etc/passwd", Line line: )
{
if (line.matches("root"))
line
}

Btw, as it is titled BGGA, What the evil are the other three people?

They can raise a JSR, right?

Ricky Clarkson said...

No, the syntax I showed is the same as in the specification: http://www.javac.info/closures-v05.html

It's the same syntax as the enhanced for loop, roughly.

The other three are Gilad Bracha, who left Sun and who has generally moved onto other things, James Gosling, who seems to have an overseeing role, and Peter von der Ahé, who, as far as I know, is the javac tech lead.

However, it would make most sense to have the leader of the BGGA project, Neal, be the one in control of the JSR.

Mike A said...

I too am very disappointed to hear that Neal might be prevented from submitting his JSR due to internal Google politics. But I have to agree with the other poster: if the alternative to having Neal on the JSR is no JSR at all, I'd say let's go with one of the other guys as the next best solution. After all, they do need to earn having their initial on the proposal. Neal could still play a supporting or behind-the-scenes role, couldn't he?

Bob said...

You can rewrite your pure Java solution more concisely and with less awkward semantics like so:

  return findFirstLine("/etc/passwd", new Predicate<String>() {
    public boolean matches(String line) {
      return line.matches("root");
    }
  });

I think you're being unfair to Josh. Why do we need to rush such a risky and controversial feature into the language? Once it's in, we're stuck with it. Just look at C++.

Ricky Clarkson said...

Bob, your findFirstLine is awkward primarily because it is a special case. It is possible, with the BGGA proposal, to write a method that iterates over each line, without caring what it does with each line.

Further, while I didn't address this in the blog, in the case that the relevant line isn't found in /etc/passwd, we don't usually want null to be returned.

The approach you show requires that both eachLine and findFirstLine are implemented, but they are so close in implementation that there is no benefit in keeping them separate.

To be more accurate, the JCP rules are the biggest problem for closures, rather than Josh Bloch in particular. It makes no sense to prevent Neal Gafter from proposing a JSR, regardless of his employer. However, I can see that it makes some sense to disallow him from having a vote independent of his employer.

Anonymous said...

This example sucks. Without code for forEachLine it is meaningless. You pretend that it can be more concise than regular Java by leaving out half the implementation. And you give out about special cases; isn't forEachLine a special case?

A more sensible Java option:

IterableFile f = null;
try {
    f = new IterableFile("/etc/passwd");
    return selectFirst(f, new Predicate<String>(){
        public boolean check(String line){return line.matches("root");}
    });
}
finally { closeNotNull(f);}

Select is a much more sensible and non special case operation than for each line.

Ricky Clarkson said...

Given that there will be one implementation of forEachLine and an unlimited number of uses of it, the implementation of forEachLine isn't really relevant to conciseness.

forEachLine is so special a case that there's an entire special case program devoted to it. It's called grep.

Your 'more sensible Java option' still has repetition. If you use it twice, you repeat a lot of it.

You could implement forEachLine in terms of selectFirst, of course. That's a low-hanging fruit generalisation, and doesn't affect the analysis of CICE.

Anonymous said...

I still think the example is not the best, selectFirst is a better way of expressing the problem and is similar to the standard functional map, select etc., but I am coming around to the need for this behaviour in closures. It would be really useful for using ReentrantLocks.

Howard said...

I have put a comparison of different closure proposals at:

http://www.artima.com/weblogs/viewpost.jsp?thread=202004

Tim Vernum said...


forEachLine is so special a case that there's an entire special case program devoted to it. It's called grep.


That's not true.
Grep isn't a "forEachLine" (although sed is), it's a "findAllMatching", which you think is a special case that shouldn't be needed.

You seem to be arguing that we should take the more complex closure proposal to avoid having to write "forEach" "findFirst" and "findAll".

A whole lot of languages (perl being the first one that jumps to mind) split their list processing along similar lines.

I personally think that using findFirst( ... ) is a better code style than forEach if you want to find the first matching line. Regardless of which closure proposal ends up being implemented, it will still be better for the code to read in accordance with the intent which is to "find the first matching line", not to do something for each line.

I happen to agree that CICE is an insufficient implementation, but I think you've picked a really bad example to argue it.

Held Wetti said...

Maybe I'm blind, but I don't see, where you define in your proposals:
- that "/etc/passwd" is a file's path,
- how to read them (byte-wise, line-wise, buffered/unbuffered, what encoding)
- what should happen when an exception is thrown

Please show a complete example to compare it with the existing code. I guess that, when using existing code, it will be easier to read.

Ricky Clarkson said...

I define those things in the documentation for forEachLine. For your convenience, I'll make up some:

"/etc/passwd" is a file path, they're being read line-wise, that's why you see a String and not a byte/int, and the platform default encoding is used.

When an exception is thrown, it will be passed back to the caller of forEachLine. Both BGGA and JCA support this without any special code.

About Me

A salsa dancing, DJing programmer from Manchester, England.