General musings on programming languages, and Java.

Monday, December 18, 2006

Why Closures in Dolphin is a Good Idea



On Javalobby, Mikael Grev argues that, while he personally likes closures and would use them, he would not want them to exist in Java. (Dolphin is the codename for Java 7.)

He makes quite a big deal of keypresses - namely, the excessive number of keypresses required for an anonymous class in Java. However, this somewhat misses the point: if something takes many keypresses to type, it will take many 'brain cycles' to parse. In fact, even in the simplest code, an anonymous class is so distracting from the intent of the code that it all but prohibits some excellent coding styles. This article demonstrates that.

Apologies for the code formatting - posting from Google Docs seems a bit dodgy. Here's the original. If you're not familiar with use cases for closures, I strongly suggest you take a look at Neal Gafter's blog and the links on it; even if you disagree with the authors' points, the articles are at least very entertaining.

Closures Make Code Easier to Understand


Consider the following Haskell code:

map (+2) [1..10]

This is the usual Java code that's roughly equivalent:

int[] list=seq(1,10); // seq is an assumed helper returning {1, 2, ..., 10}
for (int a=0;a<list.length;a++)
    list[a]+=2;


Fairly readable, but it's not as expressive. It doesn't say 'add 2 to each element of a list from 1 to 10'. Instead, it says 'make a list. For each element of that list, add 2'.

That's two sentences, and the second has a sub-clause.

If I try to write the more expressive form (the Haskell version) in Java, I get something like:

result=map(new Function<Integer,Integer>()
{
    public Integer run(Integer input)
    {
        return input+2;
    }
},seq(1,10));


As you can see, the original, less expressive form actually maps onto Java better than this more expressive style does, and not just in number of keypresses, but in readability. After all, I wrote it, you're just reading it, and you're probably cringing.
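To make the snippet above concrete, here is a minimal runnable sketch. Function, map and seq are assumptions on my part: the post doesn't define them, so these are hypothetical helpers (not JDK classes), written in the same anonymous-class style and using a boxed List<Integer> rather than an int[].

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch; Function, map and seq are hypothetical helpers, not JDK APIs.
final class MapSketch {
    // A one-argument function, in the spirit of the post's Function<T,R>.
    interface Function<T, R> {
        R run(T input);
    }

    // Returns the integers from 'from' to 'to' inclusive, like Haskell's [from..to].
    static List<Integer> seq(int from, int to) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = from; i <= to; i++) {
            result.add(i);
        }
        return result;
    }

    // Applies f to every element and returns a new list; the input is left untouched.
    static <T, R> List<R> map(Function<T, R> f, List<T> input) {
        List<R> result = new ArrayList<R>();
        for (T element : input) {
            result.add(f.run(element));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> result = map(new Function<Integer, Integer>() {
            public Integer run(Integer input) {
                return input + 2;
            }
        }, seq(1, 10));
        System.out.println(result); // prints [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    }
}
```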

The (+2) syntax from Haskell is an operator section: the + operator with one of its operands pre-filled. There is a more verbose, and probably more familiar, syntax that more closely resembles Java's impending closures:

map (\x -> x+2) [1..10]

This is an anonymous function that takes a value, x, and returns x+2. The word anonymous is important. The moment we 'simplify' things by giving names to such code snippets, e.g., a named class, or a field that holds the function, or even a local variable that holds the function, we're not actually simplifying; we're increasing the number of sentences needed to express ourselves.

Function<Integer,Integer> addTwo=new Function<Integer,Integer>()
{
    public Integer run(Integer input)
    {
        return input+2;
    }
};

result=map(addTwo,seq(1,10));

Yes, this starts to look more attractive, but it's not really. It just means that we need to understand what addTwo is, as well as what map is and what seq is. We're adding to the number of things we either need to commit to the subconscious, or hold in primary mental space.

For this extremely trivial example, you might wonder what the big deal is. If this was all the benefit one could get from closures, I'd agree with Mikael. However, by making trivial code exceedingly trivial, you can make less trivial code trivial, and complex code, well, readable. Being able to understand more code at once means that you can spot mistakes in it better.

Closures help to keep your code DRY, and encourage excellence.


DRY is Don't Repeat Yourself. By making the above more expressive code also more attractive, you open yourself up to all sorts of optimisations (removal of repetition - I'm not talking about performance, though that does come into it somewhat).

Most operations on lists or Strings can be expressed in terms of mapping or folding (also called reducing). For example, joining a list of Strings to add colons in between is a fold:

result=fold(new String[]{"root","0","/bin/bash"},new Function<Pair<String,String>,String>()
{
    public String run(Pair<String,String> pair)
    {
        return pair.first()+":"+pair.second();
    }
});

Now, at first glance, that code is garbage. Let's add closures:

result=fold(new String[]{"root","0","/bin/bash"},{first,second => first+":"+second});

Now, if you understand that to 'fold' is to run a function on the first and second elements of a list, then run the function on the result of that and the third element, and so on, then you'll probably quickly understand the code above; the excess notation in the anonymous inner class version makes it harder to grasp. This makes using fold unattractive. fold and map are some of the best techniques available for working with lists of data. They are immensely flexible and scalable. Google's famous MapReduce framework takes both its name and its core idea from them.
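A minimal sketch of the fold used above, assuming it behaves like Haskell's foldl1: no initial value, with the first two elements combined first, exactly as the previous paragraph describes. Function, Pair and fold are hypothetical helpers written for this example, not JDK APIs.

```java
// Hypothetical helpers sketching the fold described above (like Haskell's foldl1).
final class FoldSketch {
    interface Function<T, R> {
        R run(T input);
    }

    // An immutable pair of values.
    static final class Pair<A, B> {
        private final A first;
        private final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        A first() { return first; }
        B second() { return second; }
    }

    // Combines elements left to right: f(f(a0, a1), a2), and so on.
    // An empty array is rejected, because there is no initial value to return.
    static <T> T fold(T[] elements, Function<Pair<T, T>, T> f) {
        if (elements.length == 0) {
            throw new IllegalArgumentException("fold of an empty array");
        }
        T accumulator = elements[0];
        for (int i = 1; i < elements.length; i++) {
            accumulator = f.run(new Pair<T, T>(accumulator, elements[i]));
        }
        return accumulator;
    }

    public static void main(String[] args) {
        String result = fold(new String[] {"root", "0", "/bin/bash"},
            new Function<Pair<String, String>, String>() {
                public String run(Pair<String, String> pair) {
                    return pair.first() + ":" + pair.second();
                }
            });
        System.out.println(result); // prints root:0:/bin/bash
    }
}
```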

So, without closures, we are not likely to come up with algorithms like MapReduce - that is, we are actively discouraged from writing the best code. Of course, we are able to think outside of the programming language that we use, but it tends to be slightly harder to do. I doubt that many Java programmers think in terms of folds and then convert that into a suitable Java version. Instead, we think in terms of the Java version, and maybe realise later that it was another hand-coded fold implementation.

Further, by keeping code DRY, you keep maintenance costs low, e.g., if you have a bug in your withLock() implementation, your using() implementation, your withResource() implementation, etc., you can fix it in one place. If you didn't use those, but hand-coded (or IDE-generated) it every time, then you're fixing it in many, many places.
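To make the withLock() example concrete, here is a sketch of such a helper. Its name and signature are assumptions (the post doesn't show them); it takes the JDK's java.util.concurrent.locks.Lock and a plain Runnable, so the lock/try/finally/unlock sequence is written, and debugged, in exactly one place.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: the locking idiom lives here, so a bug in it is fixed once.
final class LockSketch {
    static void withLock(Lock lock, Runnable body) {
        lock.lock();
        try {
            body.run();
        } finally {
            lock.unlock(); // released even if body throws
        }
    }

    static int counter = 0;

    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        withLock(lock, new Runnable() {
            public void run() {
                counter++;
            }
        });
        System.out.println(counter); // prints 1
    }
}
```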

I once looked through some of the JDK source, and found that most of the resource allocations don't follow the suggested best practice - the try..finally{try{close}catch{log}} idiom. I wager that this would not have been the case had closures existed from the start. Reusable solutions would have been more attractive - more convenient.
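Here is a sketch of how the try..finally{try{close}catch{log}} idiom might be written once and reused. withResource and ResourceUser are hypothetical names, and logging through java.util.logging is just one possible choice; the point is that each call site only supplies the code that uses the resource.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper capturing the try..finally{try{close}catch{log}} idiom in one place.
final class ResourceSketch {
    private static final Logger LOG = Logger.getLogger(ResourceSketch.class.getName());

    // The code that uses an already-open resource.
    interface ResourceUser<C extends Closeable, R> {
        R use(C resource) throws IOException;
    }

    static <C extends Closeable, R> R withResource(C resource, ResourceUser<C, R> user)
            throws IOException {
        try {
            return user.use(resource);
        } finally {
            try {
                resource.close();
            } catch (IOException e) {
                // Logged rather than thrown, so it cannot mask an exception from use().
                LOG.log(Level.WARNING, "Failed to close resource", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String firstLine = withResource(new BufferedReader(new StringReader("hello\nworld")),
            new ResourceUser<BufferedReader, String>() {
                public String use(BufferedReader reader) throws IOException {
                    return reader.readLine();
                }
            });
        System.out.println(firstLine); // prints hello
    }
}
```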

And Now to Refute Some Points in General

These are from Mikael:

"the benefits must be proven to be measurably greater than the costs". It's impossible to prove that, as the benefits and the costs both have humans as part of their variables.

"I would guess that the more advanced coders, the ones that is usually on the closure side, does this". In that statement, 'this' meant auto-generating code using an IDE for anonymous classes. That is probably true, but auto-generating code is a workaround for a missing language feature (not necessarily a feature that should be there though - it's only with this clause that I can make the generalisation). More advanced coders probably get a slight pang of 'this sucks' whenever they auto-generate an anonymous class, or getters and setters.

"That is unless you have to use one of the proposed syntaxes for handling exceptions thrown in the closure or have some funky return structure". Clearly, anonymous classes aren't going to be removed from the language, so if you find the syntax hard to understand, you can always revert to anonymous classes. I expect IDEs will provide automated routes to and from closures and anonymous classes.

"The solution to this aesthetics problem isn't closures though, it can be solved without adding complexity by just allowing a little syntactic sugar for the AICs." Even the syntactic sugar for AICs (anonymous [inner] classes) detracts from the expressiveness, and still discourages DRY and excellent code in the same way that AICs do now. Consider:

result=fold(new String[]{"root","0","/bin/bash"},new Function<Pair<String,String>,String>()
{
        return pair.first()+":"+pair.second();
});


It's not much better; it still has a lot of verbosity that could be inferred (the type parameters to Function, the word Function itself). It's still distracting.

"Closures can do many things that AICs can't. Change the variables outside their scope for instance". As with autoboxing, you could conceivably configure your IDE to prevent yourself from doing this. For most cases it won't matter. Neal Gafter explained the reasoning behind making code that's inside a closure behave the same as code that's outside it. It doesn't break the WYSIWYG nature of Java, because it's damn obvious that, say, invokeAndWait{frame=new JFrame();} will assign to the nearest variable called frame.

"I still think that the AIC should only be working on a copy of the value." That would only promote out-of-sync bugs, where the code inside the closure and the code around it see different values for the same variable.

"The primary cost here is that Java developers need to learn new constructs. Constructs that are not very Java-ish and therefore will take some time to getting used to." That's not a cost; it's a benefit. Learning how to use generics has benefited those who have done so. Generics didn't look very Java-ish either, but they worked well. Learning new concepts actually helps programmers.

"Remember that not all are as bright as you and you gain nothing from alienating the Java-Joes however good that feels for your ego." Actually, I do teach some new Java programmers, and I'd be much more comfortable introducing them to:

invokeLater{frame.setVisible(true);}

than:

invokeLater(new Runnable()
{
    public void run()
    {
        frame.setVisible(true); //and, er, you'll have to make frame final.
    }
});


Closures are simpler, for all levels of programmers.
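The "make frame final" comment in the anonymous class version is the same restriction behind the debate about AICs working on a copy of the value: an anonymous class may read an enclosing local variable only if it is final (or, since Java 8, effectively final), and it can never assign to it. A common workaround is to box the value in a one-element array, sketched below; CaptureSketch, SideEffect, forEach and sum are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the one-element-array workaround for mutating an enclosing local variable.
final class CaptureSketch {
    interface SideEffect<T> {
        void run(T input);
    }

    static <T> void forEach(List<T> list, SideEffect<T> effect) {
        for (T element : list) {
            effect.run(element);
        }
    }

    static int sum(List<Integer> numbers) {
        // 'int total = 0;' followed by 'total += n;' inside the anonymous class would not
        // compile, because it cannot assign to an enclosing local. The array is a mutable box.
        final int[] total = {0};
        forEach(numbers, new SideEffect<Integer>() {
            public void run(Integer n) {
                total[0] += n;
            }
        });
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3))); // prints 6
    }
}
```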

"Take the much loved Collection framework. If it'd been closure-enabled from day one it would've been even better. Now you need to squeeze in closures". Or make sure that closures are implemented in such a way that they are useful with the framework. For example, we can implement a Comparator as a closure, and don't even have to say the word 'Comparator'. It's inferred. Type inference is good.

Collections.sort(list,{x,y => y.intValue()-x.intValue()});

If the JDK had to include another version of sort that was closure compatible, which it doesn't, then I would agree with you.
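For comparison, here is the same descending sort written with an anonymous Comparator, as a runnable sketch (SortSketch and sortDescending are hypothetical names). It uses Integer.compare, available since Java 7, rather than subtraction, because y.intValue()-x.intValue() can overflow when the two values are far apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class SortSketch {
    // Returns a copy of the input sorted in descending order, via an anonymous Comparator.
    static List<Integer> sortDescending(List<Integer> input) {
        List<Integer> list = new ArrayList<Integer>(input);
        Collections.sort(list, new Comparator<Integer>() {
            public int compare(Integer x, Integer y) {
                // Integer.compare avoids the overflow that y - x can hit.
                return Integer.compare(y, x);
            }
        });
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sortDescending(Arrays.asList(3, 1, 2))); // prints [3, 2, 1]
    }
}
```

The word Comparator appears three times here; in the closure version above it isn't mentioned at all.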

"You could argue the same way [against] for anything that gives more power to the developer. #DEFINE is such a thing.". The use cases for #DEFINE in C and C++, such as inclusion of header files, portability (hoho), definitions of function-like macros, are largely eliminated by simpler features in Java. What features combined are simpler than closures, for all (or most) use cases that closures have?

"With closures you can code Java that doesn't look like Java and that isn't something I'd like for the Java community.". You could replace 'closures' with 'generics' in that statement, rewind a few years, rinse and repeat.

And just a humorous note: This is from Mikael's top ten tips on how to become a Rock Star Programmer: "Write smart cool compressed code constructs". He's joking, but surely that's, well, closures. From the same place: "Less code, in a smart way, means less to maintain." Agreed. Smart doesn't mean 'hard to understand'.

"Frankly (sort of) defining new keywords on a developer level scares the bejesus out of me." I wonder whether there was a time that allowing developers to write their own functions (rather than having them hard-wired into the machine) was scary.

Shai Almog said:

"Generally I tend to be wary from features that are designed for "experts"". Closures in Java appear not to be designed for experts, but for programmers. It looks to me like every effort is going into making programming in Java better. The usage syntax is very compelling.

"VM changes are that much worse even worse than half baked implementations (e.g. generics)." I've found that generics were cooked for just long enough. They could use some extra features, some garnish, but they taste nice. The worst thing about them is that they're awkward to talk about in comments on other people's blogs, where every < has to be typed as &lt;.

This one's hard to quote, but Shai conjectured that a closure-accepting method would be hard to maintain, because average programmers wouldn't understand the method.

1. Don't let code into your codebase that is above ALL your staff.
2. IDEs could probably refactor it into an equivalent interface-accepting method anyway with no change to the use site.
3. It's already possible to write code that is above the level of other programmers, e.g., with generics, enums, finally (yes, there are programmers who don't get finally), etc.

Friday, December 15, 2006

Preventing NullPointerExceptions, Maybe

Null has always bothered me. I can fairly easily write code that doesn't cause NullPointerExceptions, but without the techniques documented here, some still slip through. Of course, my automated tests are entirely comprehensive (joke), so there's no problem, right?

Wrong. Writing tests doesn't solve the problem that null exists in the first place. If we place a bollard in the middle of a street, and test all the cars to make sure that they can get around it without hitting the houses, that doesn't make the bollard acceptable.

One rule absolutely solves this. Assign a value to each field as soon as it's declared. A non-null value. To be picky, you'd also have to ban the new Object[x] form of array creation, and never give a local variable a null value. Let's not be picky.

The instinctive reaction to this is to say that you don't always have a value to put in the field, and therefore that null is the best value.

Partly true. However, null is not the best value. The likely first thought is the NullObject pattern. For example, if we have a java.sql.Connection field, we might set it up with java.lang.reflect.Proxy, so that we can call methods on the Connection, though they do nothing. This only hides the problem behind obscure runtime behaviour. Usually, we'd rather see clear runtime behaviour (a NullPointerException) than obscure runtime behaviour ("I thought I'd saved to the DB, but it was the NullConnection"). NullObject isn't going to help.
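A minimal sketch of the Proxy-based NullObject just described, to show how quietly it fails: every method does nothing and returns a default value (false, zero or null). NullConnection is a hypothetical name; Proxy.newProxyInstance and InvocationHandler are the standard JDK reflection API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical NullObject for java.sql.Connection: every call silently does nothing.
final class NullConnection {
    static Connection create() {
        return (Connection) Proxy.newProxyInstance(
            NullConnection.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            new InvocationHandler() {
                public Object invoke(Object proxy, Method method, Object[] args) {
                    if (method.getName().equals("toString")) {
                        return "NullConnection";
                    }
                    return defaultValue(method.getReturnType());
                }
            });
    }

    // What a do-nothing method returns: false or zero for primitives, null otherwise.
    private static Object defaultValue(Class<?> type) {
        if (type == boolean.class) return false;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == short.class) return (short) 0;
        if (type == byte.class) return (byte) 0;
        if (type == char.class) return '\0';
        if (type == float.class) return 0f;
        if (type == double.class) return 0d;
        return null; // void and reference types
    }

    public static void main(String[] args) throws SQLException {
        Connection connection = create();
        connection.setAutoCommit(false);
        connection.commit(); // "saves" nothing, and nothing complains
        System.out.println(connection.prepareStatement("INSERT ...")); // prints null
    }
}
```

The null returned by prepareStatement is exactly the kind of obscure behaviour mentioned above: the eventual NullPointerException turns up far from the real cause.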

It's better to have a real distinction between a useful value and a useless value - one that forces you to 'check', or even checks for you. There are a couple of ways of doing this. The @NotNull and @Nullable annotations introduced by IntelliJ IDEA are one, though I haven't used those myself. Another way is possible using only the Java language, though it has to be said that the Java 7 language will make this more comfortable.

And Now To The Meat

The following concept was shamelessly stolen from Haskell.

Given a field that may have a Connection, or null, I'll change it to 'maybe a Connection', or Maybe<Connection>. There are two implementations of Maybe - one of them does have a Connection (well, T), and one of them has Nothing.

Then, rather than testing it to see whether it really has a Connection, I tell it what I want it to do if it has a Connection, and what I want it to do if it doesn't have a Connection. Oh, and for maximum flexibility, return me the result.

Let's go with a less flexible version for a moment, as an explanation.


interface Maybe<T>
{
    void apply(SideEffect<T> runThisIfTheresAnObject,Runnable runThisIfThereIsnt);
}

interface SideEffect<T>
{
    void run(T input);
}

So, when I call maybeConnection.apply(saveStuffToDB,initialiseConnectionAndSaveStuff), if maybeConnection is 'just' a Connection, it will call saveStuffToDB.run(connection), and if it is Nothing, it will call initialiseConnectionAndSaveStuff.run().
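A minimal sketch of the two implementations of this first, side-effecting version of Maybe. The class names Just and Nothing follow Haskell's naming; they are my own choice for illustration, not necessarily what Functional Peas uses.

```java
// Sketch of the side-effecting Maybe: Just holds a value, Nothing doesn't.
final class SideEffectMaybeSketch {
    interface SideEffect<T> {
        void run(T input);
    }

    interface Maybe<T> {
        void apply(SideEffect<T> runThisIfTheresAnObject, Runnable runThisIfThereIsnt);
    }

    static final class Just<T> implements Maybe<T> {
        private final T value;

        Just(T value) {
            if (value == null) {
                throw new NullPointerException("Just needs a real value");
            }
            this.value = value;
        }

        public void apply(SideEffect<T> ifObject, Runnable ifNot) {
            ifObject.run(value); // there is an object, so only the first argument runs
        }
    }

    static final class Nothing<T> implements Maybe<T> {
        public void apply(SideEffect<T> ifObject, Runnable ifNot) {
            ifNot.run(); // no object, so only the second argument runs
        }
    }

    public static void main(String[] args) {
        Maybe<String> maybeName = new Just<String>("Fred");
        maybeName.apply(new SideEffect<String>() {
            public void run(String name) {
                System.out.println("Hello, " + name);
            }
        }, new Runnable() {
            public void run() {
                System.out.println("No name");
            }
        }); // prints Hello, Fred
    }
}
```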

However, there are two problems with this approach. One is that I tend towards functional programming, and this stuff relies on side effects, so it irritates me. The other is that programming side effects with anonymous classes can really be a pain in Java, thanks to the 'final' requirement on enclosing local variables.

So what I really want to do is to change apply so that it returns something. I could make it return Object, but then I'm just reintroducing the old ClassCastException possibility. I could make Maybe take two type parameters, T and R, R being the return type of apply. However, that would mean that each Maybe would only be able to run 'functions' that return one type - impractical.

Generics allow you to declare type parameters on methods, not just whole classes/interfaces, so let's do that. If you don't like the look of this, skip to the bottom and eye up the alternative implementation (visitor).


interface Maybe<T>
{
    <R> R apply(Function<T,R> ifT,R ifNothing);
}

Let's just walk through that syntax. <R> just declares a type parameter; if you don't like that, simply ignore it. apply takes a Function, which has one method, R run(T), and an R. If there is a 'real' object, a T, the Function's run method will be invoked, and the R that it returns will be returned from apply. If there isn't a real object, then ifNothing is returned.

It's rather like encapsulating an if statement. By taking the responsibility for checking null away from the user of Maybe, we're taking the possible bug away too. Note that we're only taking it as far as Maybe - of course, if the two implementations of Maybe are broken, then the bug will be everywhere.
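A minimal sketch of the two implementations of this generic Maybe. The factory methods just and nothing are assumptions, standing in for MaybeUtility.nothing() and whatever Functional Peas actually provides; Function is redeclared locally with the R run(T) method described above.

```java
// Sketch of Maybe<T> with a generic apply; just/nothing are hypothetical factories.
final class MaybeSketch {
    interface Function<T, R> {
        R run(T input);
    }

    interface Maybe<T> {
        <R> R apply(Function<T, R> ifT, R ifNothing);
    }

    // A Maybe that holds a real value: apply runs the function on it.
    static <T> Maybe<T> just(final T value) {
        if (value == null) {
            throw new NullPointerException("just() needs a real value");
        }
        return new Maybe<T>() {
            public <R> R apply(Function<T, R> ifT, R ifNothing) {
                return ifT.run(value);
            }
        };
    }

    // A Maybe with no value: apply returns the fallback without calling the function.
    static <T> Maybe<T> nothing() {
        return new Maybe<T>() {
            public <R> R apply(Function<T, R> ifT, R ifNothing) {
                return ifNothing;
            }
        };
    }

    public static void main(String[] args) {
        Function<String, Integer> length = new Function<String, Integer>() {
            public Integer run(String s) {
                return s.length();
            }
        };
        Maybe<String> some = just("hello");
        Maybe<String> none = nothing();
        System.out.println(some.apply(length, -1)); // prints 5
        System.out.println(none.apply(length, -1)); // prints -1
    }
}
```

Note that the null check happens exactly once, inside the two implementations; callers never write it.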

And now for example usage:


Maybe<Connection> maybeConnection=MaybeUtility.nothing();
// ... some code, might set maybeConnection to something else, might not.
String outputToUser=maybeConnection.apply(new Function<Connection,String>()
{
    public String run(Connection connection)
    {
        // some code that uses a PreparedStatement etc. and returns a String, e.g.:
        return "Saved."; // placeholder for the real result
    }
},"Er, some fool forgot to connect to the database.  Fire Fred");

What we're doing here is implementing dynamic dispatch. It's another way of implementing the visitor pattern. In fact, Maybe can be implemented easily via the standard idiom for the visitor pattern - the only reason I don't is that I like single-method interfaces. I find that they fit my thinking better. They also fit the closure proposal better, which is probably worth bearing in mind now.

Here's Maybe implemented with a more obvious visitor approach:


interface Maybe<T>
{
    <R> R accept(MaybeVisitor<T,R> visitor);
}

interface MaybeVisitor<T,R>
{
    R ifJust(T t); //in Haskell, the opposite of Nothing is Just, in terms of the Maybe type.
    R ifNothing();
}

Maybe and friends can all be found in Functional Peas, which is currently a placeholder for some useful bits and pieces of functional (or nearly-functional) code.
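A sketch of the two implementations for the visitor form, with a usage example. The just/nothing factories and the describe method are hypothetical, written for this example rather than taken from Functional Peas.

```java
// Sketch of the visitor-style Maybe: each case is a method on one visitor object.
final class VisitorMaybeSketch {
    interface MaybeVisitor<T, R> {
        R ifJust(T t);
        R ifNothing();
    }

    interface Maybe<T> {
        <R> R accept(MaybeVisitor<T, R> visitor);
    }

    static <T> Maybe<T> just(final T value) {
        return new Maybe<T>() {
            public <R> R accept(MaybeVisitor<T, R> visitor) {
                return visitor.ifJust(value);
            }
        };
    }

    static <T> Maybe<T> nothing() {
        return new Maybe<T>() {
            public <R> R accept(MaybeVisitor<T, R> visitor) {
                return visitor.ifNothing();
            }
        };
    }

    // Hypothetical usage: turns a maybe-name into a greeting.
    static String describe(Maybe<String> maybeName) {
        return maybeName.accept(new MaybeVisitor<String, String>() {
            public String ifJust(String name) {
                return "Hello, " + name;
            }

            public String ifNothing() {
                return "Nobody here";
            }
        });
    }

    public static void main(String[] args) {
        Maybe<String> none = nothing();
        System.out.println(describe(just("Fred"))); // prints Hello, Fred
        System.out.println(describe(none)); // prints Nobody here
    }
}
```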

Yeah, but...

If you think that this is wasteful in terms of programmer time, I might agree with you - until we have good syntax for closures, using Maybe isn't syntactically that attractive. This can be dealt with to an extent, such as by reducing the need for null in the original code, or choosing the visitor approach. I will blog about techniques for doing that, probably under the heading 'Reducing Mutability'. Another way is to prefer function composition over always writing 'closures', which I'm personally not very good at yet.

If you think this is useless, because people don't make mistakes if they test enough, I refer you back to the bollard analogy at the beginning.

If you think this is useless, because I am not on a large team, I'm young, I work in a University, and therefore don't know what I'm talking about, then please don't bother commenting, and have a nice life.

If you think that this is useful, but that your colleagues won't understand or agree, just discuss it with them. They might have a better idea.

Saturday, December 02, 2006

Making equals(Object) type-safe

Update - Jean-Francoise Briere came up with a better solution; I've included it at the bottom.

It's easy to end up comparing two obviously incomparable objects using the equals(Object) method, because equals is not generic. By incomparable, I mean that it is possible to tell from the source that they are incomparable, such as new StringBuffer().equals("").

The contract of equals does not permit throwing an exception when incomparable objects are compared, because comparing two incomparable objects is not a problem in itself; comparing two incomparable references is. That is, if you have a List<Object>, add an Integer to it, then look in it for a String, it makes sense for the equals method to be called, so it should behave normally (in a hash-based collection such as a HashSet, hashCode would be consulted before equals).

So we can't really ask the objects themselves to help us out, unless we create lots of overloaded equals methods; e.g., an Integer would need equalTo(Integer), equalTo(Number) and equalTo(Object). Clearly a pain in the proverbial. We can instead ask the static type system to help us out.

First Bad Solution - Type-Specific Statics

Wrap equals calls up, say, in static methods.

public class Equaliser
{
   public static boolean equals(String first,String second)
   {
       return first.equals(second);
   }
}

There are two problems with this:

1. Repetition - you'd have to do this for every type.
2. It doesn't work if you put a supertype in; e.g., equals(Object,Object) would match any call and you wouldn't notice.

Second Bad Solution - Naive Application of Generics

public static <T> boolean equals(T first,T second)
{
   return first.equals(second);
}

This will actually allow a comparison between an Integer and a String, because T is inferred as a common supertype of the two (in effect, Object). Only the clunky qualifying syntax, ClassName.<Integer>equals(1,"blah"), makes the compiler reject it, and that syntax is bad enough to be worth avoiding in most cases. Your favourite IDE will show what T was inferred as when you hover over the unqualified call.
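A small runnable sketch of that loophole, using the naive generic method as written above inside a hypothetical class, NaiveEqualsSketch. The mismatched call compiles and simply returns false; the rejected, explicitly qualified form is shown only as a comment, since it would stop the file compiling.

```java
// The "naive" generic equals, showing that mismatched types still compile.
final class NaiveEqualsSketch {
    static <T> boolean equals(T first, T second) {
        return first.equals(second);
    }

    public static void main(String[] args) {
        // Compiles: T is inferred as a common supertype of Integer and String,
        // so the mistake goes unnoticed and the answer is quietly false.
        System.out.println(equals(1, "blah")); // prints false
        // NaiveEqualsSketch.<Integer>equals(1, "blah") would be a compile-time error.
        System.out.println(equals("a", "a")); // prints true
    }
}
```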

First Not-So-Bad Solution - Less Naive Application of Generics


interface Equalator<T>
{
    boolean isEqualTo(T other);
}

public static <T> Equalator<T> equalator(final T first)
{
   return new Equalator<T>()
   {
       public boolean isEqualTo(T second)
       {
           return first.equals(second);
       }
   };
}

This is called as: equalator(someReference).isEqualTo(someOtherReference), and will catch more bad comparisons, because T is fixed from the first reference alone, before the second one is looked at. Of course, if your references (not your objects) are actually of type Object, then this won't be useful at all.
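A runnable sketch of the Equalator in use, wrapping the code above in a hypothetical class, EqualatorSketch. The bad comparison that is now caught appears only as a comment, because it would not compile; the Object-typed case shows the limitation just mentioned.

```java
// The Equalator approach: T is fixed by the first call, before the second argument is seen.
final class EqualatorSketch {
    interface Equalator<T> {
        boolean isEqualTo(T other);
    }

    static <T> Equalator<T> equalator(final T first) {
        return new Equalator<T>() {
            public boolean isEqualTo(T second) {
                return first.equals(second);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(equalator("abc").isEqualTo("abc")); // prints true
        // equalator(1).isEqualTo("blah") does not compile: T is already Integer.
        Object a = 1;
        Object b = "blah";
        // With Object-typed references, T is Object and nothing is caught.
        System.out.println(equalator(a).isEqualTo(b)); // prints false
    }
}
```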

And Finally, The Same Thing But More Reusable

In my own code, I implement this as a (badly named) method, equalT:

    public static <T> Function<T,Boolean> equalT(final T first)
    {
        return new Function<T,Boolean>()
        {
            public Boolean run(final T second)
            {
                return first==second || first.equals(second);
            }
        };
    }

Now it's called as: equalT(one).run(two). I should probably get rid of the first==second part of that. Function is a type I defined in the publicly available functionalpeas package; it represents a function with one argument and one return value. Now I can pass equalT(someString) to a method that expects a Function<String,Boolean>, which can be handy, and explains the 'reusable' part of this section's heading. One really cool thing about it is that if you replace all your calls to equals AND all your uses of == with this, including primitive==primitive, then you won't get any of those pesky problems caused by comparing references instead of objects.

Have fun. Next time I'll look at eliminating NullPointerExceptions. No, really.

Update: Jean-Francoise Briere's solution is based on my 'naive application of generics', but is less naive:

public static <T,U extends T> boolean equalT(T t,U u)
{
    return t.equals(u);
}

This is better than my final solution, because it doesn't rely on creating a new object for each comparison, and it takes fewer keypresses. Thanks, Jean-Francoise!
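To illustrate the 'reusable' point about the earlier equalT, here is a self-contained sketch that passes equalT(...) to a method expecting a Function<T,Boolean>. Function and filter are hypothetical helpers redeclared for this example, and the first==second check is dropped, as suggested earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of equalT used as a reusable predicate; Function and filter are hypothetical.
final class EqualTSketch {
    interface Function<T, R> {
        R run(T input);
    }

    static <T> Function<T, Boolean> equalT(final T first) {
        return new Function<T, Boolean>() {
            public Boolean run(final T second) {
                return first.equals(second);
            }
        };
    }

    // Keeps the elements for which the predicate returns true.
    static <T> List<T> filter(List<T> input, Function<T, Boolean> predicate) {
        List<T> result = new ArrayList<T>();
        for (T element : input) {
            if (predicate.run(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "pear", "apple");
        System.out.println(filter(words, equalT("apple")).size()); // prints 2
        // Compares values, not references, even for boxed numbers outside the Integer cache.
        System.out.println(equalT(1000).run(1000)); // prints true
    }
}
```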


About Me

A salsa dancing, DJing programmer from Manchester, England.