General musings on programming languages, and Java.

Monday, January 21, 2008

A Trivial Display of How Scala's Type System Beats Java's

This article shows that Scala lets you add methods to objects, depending on their generic types. In Scheme, and other lisps and languages influenced by lisp, there is a construct called a cons, which is a pair of values. Linked lists can be built up by 'consing' pairs together. The first part of a cons can be called the 'car' (if used as a list, this is the head), and the second the 'cdr' (if used as a list, this is the tail). So if you have a cons with a cons as its second element, you can think of that as a compound data structure with three elements. E.g.:

(cons 1 (cons 2 3))

The first value of the second part of that, 2, the head of the tail, the car of the cdr, is known as the cadr. Let's write a Cons class in Java and Scala and then look at how to add the cadr.

Consider this Java code:

class Cons<A,D>
{
    public final A car;
    public final D cdr;

    public Cons(A car,D cdr)
    {
        this.car=car;
        this.cdr=cdr;
    }
}
equivalent to this Scala:
case class Cons[A,D](car: A,cdr: D)
(yes, that's all)

Now let's think about cadr. If a Cons has a car of type A and a cdr of type D, what type is the cadr? It's the type of the car of the cdr, which we haven't got a type parameter for. We could try this:

class Cons<A,D extends Cons<E,F>> but there's a recursion problem here -- F must extend Cons<E,F>>, not going to work. Plus, even if that worked, we could no longer use Cons for simple pairs.

We can make a static method elsewhere:

public static <A,D,E> D cadr(Cons<A,Cons<D,E>> cons)
{
    return cons.cdr.car;
}
That works fine, but it makes cadr somewhat a second-class citizen. It's a shame we can't do pair.cadr. In steps Scala.
implicit def addCadr[A,D,E](cons: Cons[A,Cons[D,E]])=new { def cadr=cons.cdr.car }

Cons(1,Cons(2,3)).cadr gives 2
addCadr, when in scope (it can be imported if it's not written where we need it) defines an implicit conversion from Cons[A,Cons[D,E]] to an anonymous class that implements cadr. So any time we ask for cadr the compiler (not the runtime) looks at our Cons type, doesn't see cadr and looks for any implicit conversions that result in a type that implements cadr. In this case it finds one.

The above is probably not a very practical use, let's think of some. You can make a List[Char] have the same methods as a String, for convenience. You can make a List[Double] have a total method.

(The worst thing about Java generics is how annoying they are to type in blogger!)

12 comments:

Anonymous said...

One of the reasons people find lisp programmers elitist bastards is this insane insistence on naming simple pairs 'cons', and instead of 'head' and 'tail', or other more descriptive words, using car and cdr. Please stop doing that. LISP afficionados will like you when you do so, language afficionados don't care; they understand cons/car/cdr just as well as list or pair/head/tail, and everyone else just looks at you, completely befuddled, and thinks you're a daft idiot for suggesting that what THEY use creates unreadable messes whereas lisp (or, in this case, scala) creates beautiful code, when you can't even call a pair a 'pair', and instead decide to make up a non-sequitur term for it.

The only composition of caaadddaaar you used here is cadr, a.k.a. the second element. So what's wrong with pairList.second?

We don't call assignment a 'MOV' anymore either for obscure assembler-related reasons. We moved on a loong long time ago.

Normally I wouldn't whine about this, but this blog post seems like it wants to target java programmers and explain to them that the world is a large place. That's a good thing, but this post will not convince anyone except those already convinced. In fact, it does the opposite; it confirms that scala is for that stereotypical language buff that's still stuck in the LISP golden age.

A more pressing complaint, perhaps, is that this doesn't scale; you can manually add a cadr/second method quite easily in scala, but it seems unlikely you'd define caar, cddr, caadr, caddr, caaar, cdddr, caaaar, caaadr, etcetera. Scala in fact does not really have the facilities to do such things. Dynamic languages can do it by programatically tossing a whole slew of methods into the prototype and/or defining missing_method or some similar construct and parsing the call.


Put differently: Is this -really- the best example you could come up with for the power of implicit def?

Ricky Clarkson said...

Reinier,

Lisp programmers don't insist on car and cdr. Common Lisp requires first, second, third, etc. to be defined. It also requires car, cdr, cadr, etc., (up to cddddr) but any modern text on Common Lisp discourages the use of these. If I had asked lispers to proofread this blog post (instead of Java and Scala programmers), they'd likely have told me to use first, second, etc., and they'd have missed the point too.

I don't want my blog to be used to spread the lie that lisp programmers are elitist bastards. Spend a bit of time with lisp and its community and you'll agree that they aren't.

SICP does not use first and second, however. In the video lectures, Sussman points out that in the early days of lisp/Scheme, programmers would talk on the telephone, and they would find caddr and cadar easier to say than "first part of the tail of the tail". car and cdr are like mathematical operators. If "+" was to be written "arithmetically added to", we'd find arithmetic annoying to write. By making list extraction really short, through mnemonics that, arguably, mean little, we have a little vocabulary that isn't domain-specific, and encourages us to use lists until we have a better abstraction.

Further, if you are not representing a linked list, but instead a pair of pairs, e.g.:

(cons (cons 3 4) (cons 5 6))

then first, second, etc., are misleading and indirect: to access 5 you would need (first (second the-cons)), whereas you would think of 5 as being the 3rd element in this data structure. Having a meaningful English name doesn't always help.

You said that this doesn't scale - it's true that there is no way to automatically generate those methods whenever you call them, but that really doesn't depend on dynamic typing. Common Lisp and Scheme have car, cdr, built in and do not generate them in a method_missing kind of way. Not all dynamic languages are Rubyish.

In fact, a conceivable static language could quite easily generate the methods - I could knock up a source processor to do it. This really is unrelated to typing.

I'm sorry to have offended you with such a useless example. I actually said it was useless in the post, and offered some useful ones. It's sad to have to point at one's own words.

Anonymous said...

I never said that *I* found lisp programmers elitist bastards. I said that it's a common sentiment, and I then suggested that this cons/car/cdr sort of thing might not be helping the cause very much.

I also don't see you addressing any of my points; if you are trying to convince java-only programmers that there are other ways of doing things (a noble cause), this is exactly the kind of post that sends them running for the hills, and I really cant fault them for doing it.

If the phone thing is true, I will now personally insult lispers and call them somewhat dense. How is 'cdddadr' easier to pronounce again? Even the batman intro would have some trouble inserting that amongst the onomatopoeias. But, thanks for proving my point. How is 'phone discussion' in any way or form a relevant factor in the 21st century?

Common Lisp does have car, cdr, and the whole composition thing built right in. It's not defined as a macro AFAIK, which means they copped out of the typing system altogether and just wrote it right into the language spec. Okay. Sure. Any language could do that.

Yes, some sort of macro or AST or other kind of (pre)processor could easily do the right thing, and such things are certainly not restricted to dynamic languages. However, with a dynamic language-like 'missing_method' construction, it's certainly a lot easier. Which again highlights how this example actually might convince one that scala's type system is worse. I know it isn't, but this post isn't helping.

Thus, the example isn't just useless, it also fails to make your point. Given that the title of your blog post is 'A trivial display of how scala's type system beats java's", I presume you meant 'useless' in that its practical function is not all that useful, and not 'useless' in the sense that this entire blog post fails to make its point. You hit 'post' for some reason or other. I feel it well within common sense to presume you didn't mean that the example is useless in supporting your premise.

But, hey, it's your show. If you want to torch me for pointing out some flaws, you go right ahead. I'm effectively at barrel's end here.

Ricky Clarkson said...

About the lisp programmers comment; you're right, I misread.

I'm not trying to convince Java-only programmers, at least in this post. I'm just pointing out something that I find useful, and something I found irritating about Java.

The phone thing is true. cdddadr is pronounced cudududadder, and yes, cdddadr is beyond the usual length. You should probably be made aware that the video lecture to which I referred was recorded in 1986.

"Common Lisp does have car, cdr, and the whole composition thing built right in. It's not defined as a macro AFAIK, which means they copped out of the typing system altogether and just wrote it right into the language spec. Okay. Sure. Any language could do that."

Well, there is no typing system as such. Java could not do that, at least if cadr etc were to be defined as methods. (not strictly true, with a load of casting it could).

I'll try to write the blog post in one line, in a precise way, so that you can't blame cons/car/cdr/lisp/me for you not understanding it (I can't tell whether you understand it or not). Here goes:

Given a type X[T], where T is a type parameter, not a concrete type, it is possible to add methods to it only for X[A], where A is a concrete type.

This is not written as generally as it could be, I expect ScalaReference covers it better.

I gave two useful cases at the end of the original text.

Anonymous said...

I know what implicit def can do. I just thought it was a spectacularly bad way of showing it off, and felt the need to comment. Actually going through the motions of adding, say, a total() method that has the right type for all Number classes produces a useful result and would be just as illuminative as to the ability of implicit def to work on generic types where you insert specific types for some or all generics parameters.

Ricky Clarkson said...

Well, post something better and if I like it I'll make the start of this post point at it :)

Anonymous said...

In a dynamic language, you don't even need to toss a whole slew of methods into the class - you can just define the attributes dynamically:


class Cons:
  def __init__(self, car, cdr):
    self.car = car
    self.cdr = cdr

  def __getattr__(self, attrname):
    if re.match("c[ad]+r", attrname):
      target = self.car
      if attrname[-2] == 'd':
        target = self.cdr

      if hasattr(target, attrname[:-2]+'r'):
        return getattr(target, attrname[:-2]+'r')
      else:
        raise AttributeError, attrname

Ricky Clarkson said...

Anonymous,

As I've already pointed out to Reinier, one could do that in a static language; it is unrelated to whether the language is static or dynamic.

ngu said...

Ricky,
I am probably missing the point, which wouldn't be a great surprise, as I'm not very experienced in this area...

But what is actually the point of adding all these methods to the classes in Java? Is it just a kind of syntax sugar, an ability to avoid using either your:

static <A,D,E> D cadr(Cons<A,Cons<D,E>> cons)

, or:

class MyLongList extends List<Double> implements Summable<Double>{ Double total(){...} } ?

Thanks!

saintx said...

@Ricky: I am a Java programmer. And a LISP programmer. At the University of Minnesota, Sussman's SICP book is still used to teach the very first programming course that all undergrad CSCI majors are required to take. And this post is helping me become a Scala programmer. I am especially grateful for the fact that it specifically used "cons", "cdr" and "car", because I am specifically looking for simple lisp-like structures whose absence (among other things) are causing me to leave Java. My Google search actually specifically included those terms, and this post was precisely the sort of example that I found useful.

Thank you for the excellent post.

saintx said...

@Reinier: When in Rome, do as the Romans.

Anonymous said...

Thank you, anonymous, for working out the very nice "class Cons" example.
Its correctness would be improved, however, by adding a dollar sign.
Compare re.match("c[ad]+r$", "cdark") with the regex in __getattr__().

Blog Archive

About Me

A salsa dancing, DJing programmer from Manchester, England.