General musings on programming languages, and Java.

Wednesday, July 18, 2007

Remaining Nameless

In college, I came across some budding Pascal programmers with hundreds of lines something like this:

    gotoxy(12,13);
    write('|');
    gotoxy(12,14);
    write('|');

That's right, they were hand-coding menu systems in ASCII. I actually didn't spot this as being such a great problem at the time, because I couldn't see past how ridiculous it was not to make a procedure called, say, printxy, to do the above.

    printxy(12,13,'|');
    printxy(12,14,'|');

I got a couple of these students to do that, but they didn't really know why. After all, their editor had copy and paste, and everyone else was doing the same as them. Probably their teacher was, too, but I never looked into that.

What happened when they realised that all their '|'s were off by one? They edited each line of code.
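As a rough illustration of why the procedure helps, here is a minimal Java sketch rather than the students' Pascal. The character grid standing in for the DOS screen, the names blankScreen and X_OFFSET, and the assumption that the off-by-one error was a single column are all hypothetical. The point is only that the correction lives in one place instead of on every line.

```java
import java.util.Arrays;

class PrintXy {
    // Assumed correction for the off-by-one: change it here, once.
    static final int X_OFFSET = 1;

    // Hypothetical stand-in for a 25x80 text screen, filled with spaces.
    static char[][] blankScreen() {
        char[][] screen = new char[25][80];
        for (char[] row : screen) {
            Arrays.fill(row, ' ');
        }
        return screen;
    }

    // Java analogue of printxy: position, then write.
    static void printxy(char[][] screen, int x, int y, String text) {
        for (int i = 0; i < text.length(); i++) {
            screen[y][x + X_OFFSET + i] = text.charAt(i);
        }
    }

    public static void main(String[] args) {
        char[][] screen = blankScreen();
        printxy(screen, 12, 13, "|");
        printxy(screen, 12, 14, "|");
        // Column where the bar ended up on row 13.
        System.out.println(new String(screen[13]).indexOf('|')); // prints 13
    }
}
```

With hundreds of copied gotoxy/write pairs, the same fix means hundreds of edits.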

Some of those students are now successful (in that they make money) IT professionals (in that they make money), so I can only assume that they learned something in the intervening years.

The rest of education, in the meantime, has changed a little. Students now come along with the word 'reuse' on their tongues. It's a magical thing, and they joke that it means copying and pasting, but they know it doesn't. They can't do it though - the novices anyway. They still copy and paste, like those of my time, sometimes guiltily. Mainly they don't actually understand the alternatives. Even more experienced programmers will struggle to avoid copying and pasting in some situations, and introduce false abstractions to do that. Languages are partially judged on how much repetition they force you into.

Going back to the Pascal example before - printxy wasn't a particularly bad name, but imagine that you're a bit of a weak programmer, and you have a small friend group who usually help each other, the infinite monkeys' approach to assignments. Your friends have never heard of printxy before, and look at your code like it was written by a Martian. You don't want that, so you'll keep your code as it is, despite what that weird Ricky suggested at lunchtime.

At this point, the text editor isn't really very useful. You cannot keep your friend group and me happy at the same time. You could probably write an editor macro that let you type printxy(x,y,'['); and converted it into the appropriate code, but it would be a one-way transformation.

What you'd like, because you want to get help from me and from your peer group (infinite monkeys plus a programmer might equal higher marks!), is to be able to maintain two views of the same code.

That is, you'd like to type printxy(x,y,'[') and have it converted into a gotoxy and write call, but have the editor still know that it's a printxy call, so that you can flick a switch and Ricky can see 'his' version.

This never happened, because the editor didn't support it. The only students who I spoke to who ended up using printxy were smart enough that they didn't depend on a peer group. Except Dave, who tried to depend on me instead.

Suppose that printxy was a particularly bad name, or the implementation of it was a bit more intricate (this is perfectly possible - it would make sense to reset the cursor to 0,0 after writing each bit of text, because on DOS screens the cursor is usually visible - having it flick around the screen is particularly distracting).

It'd be good if, whenever people wrote similar code (either through finger-typing or copy and paste), their environment picked up on it and made it into an abstraction. E.g.:

   gotoxy(12,13);
   write('[');
   cursor(0,0);
   gotoxy(12,14);
   wr..

At that point, the environment could start autocompleting the write and cursor calls. If the user started to type gotoxy again after the next write and cursor calls, the environment could complete the whole three-line form, placing the cursor in the right places to edit the parts that differed between the first and second times.

Further, the environment could fold the blocks of code, just showing the first one and then the differences:

   gotoxy(12,13);
   write('[');
   cursor(0,0);
   ..gotoxy(12,14)..
   ..gotoxy(12,15)..

Then if the user starts editing one of the forms, the others should be edited in parallel, unless the user makes a specific gesture not to, or manually 'ungroups' a form. Groups could be shown via highlights in the margin, just as they are in some environments for an individual method, etc.
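One way the detection and folding might work is sketched below in Java. It is only a guess at the idea, not a real editor feature. The class FormFolder, the fixed form length, and the rule that two lines "match" when they differ only in numeric literals are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class FormFolder {
    // Replace every run of digits with '#', so that lines differing only
    // in numeric literals have the same shape.
    static String shape(String line) {
        return line.trim().replaceAll("\\d+", "#");
    }

    // True if the formLength lines starting at a and at b have equal shapes.
    static boolean sameShape(List<String> lines, int a, int b, int formLength) {
        if (b + formLength > lines.size()) {
            return false;
        }
        for (int k = 0; k < formLength; k++) {
            if (!shape(lines.get(a + k)).equals(shape(lines.get(b + k)))) {
                return false;
            }
        }
        return true;
    }

    // Show the first form of each group in full, and each directly
    // following repeat as just its first line between dots.
    static List<String> fold(List<String> lines, int formLength) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            if (sameShape(lines, i, i + formLength, formLength)) {
                out.addAll(lines.subList(i, i + formLength));
                int next = i + formLength;
                while (sameShape(lines, i, next, formLength)) {
                    String head = lines.get(next).trim().replaceAll(";$", "");
                    out.add(".." + head + "..");
                    next += formLength;
                }
                i = next;
            } else {
                out.add(lines.get(i));
                i++;
            }
        }
        return out;
    }
}
```

Calling fold with a form length of 3 on three consecutive gotoxy/write/cursor forms gives the first form in full followed by ..gotoxy(12,14).. and ..gotoxy(12,15).., as in the folded view above. A real environment would have to discover the form length itself and show differences on any line, not just the first.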

This would allow new programmers to start seeing abstractions appear before they even know what the word abstraction means, and more importantly, without even having to give a name to it. printxy would be something they applied later.

The missing half-dimension of code

In most programming environments, you're restricted to what I'll call one and a half dimensions. The vertical scrollbar is fine: most people will happily scroll a couple of screens down, but never sideways. In most languages, 'arrow-like' code is discouraged as being hard to read. It's a bit like humans wandering around in 2 dimensions, not really being able to wander freely in the air. Code is the same - take it away from the top-level unit (function, class, method, etc.) and you're just waiting for it to collapse back down.

The more nested the code is, the harder it is to understand, because you need to take the more polluted namespace into account. If you declare x outside a form, declare x1 inside it, then sometimes forget to use x1, you'll be using the wrong value. That can only happen because the nested form has no independent namespace.
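The x and x1 slip can be made concrete with a small Java sketch. The method names and the arithmetic are invented for the example. Both versions compile, because the outer x is still visible inside the loop.

```java
class NestedNames {
    // Intended behaviour: add x to each value, then sum the results.
    static int sumShifted(int x, int[] values) {
        int total = 0;
        for (int v : values) {
            int x1 = x + v; // inner name derived from the outer x
            total += x1;
        }
        return total;
    }

    // The slip: x1 is declared, but the outer x is used by mistake.
    static int sumShiftedBuggy(int x, int[] values) {
        int total = 0;
        for (int v : values) {
            int x1 = x + v;
            total += x; // meant x1; compiles because x is in scope here
        }
        return total;
    }
}
```

For x = 10 and the values 1, 2, 3, the first method returns 36 and the second returns 30. At most, a compiler might warn that x1 is unused.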

Programming environments could help a little with this by warning you when you use something from the surrounding namespace, but sometimes you mean to. They could use a different colour instead of warning you. I think it's worth looking at why arrow-like code happens, to find a solution.

I think it's because there's nowhere to store unnamed snippets of code. Imagine a programming environment where, when you typed a nested form, such as a lambda expression (or anonymous class, or anonymous function), the environment made it appear dislocated from the code that uses it, possibly connected via dotted lines, etc. Any variable that is used from the surrounding namespace can be shown specially, or as some kind of parameter. You could drag the snippet to another block of code, from where it could be used again. Any variables that the snippet took from the first scope can either be given defaults or be linked to variables in the new scope. This wouldn't even need the snippet to have its own names for variables.

Past the user interface layer, Lisp macros with generated (and hidden) names could easily be used.

You may be wondering why ordinary methods/functions as in most languages can't be used - they can, but not in all the same ways. For example:

    for (int a=0;a<100;a++)
    {
        if (x[a]<50)
            return;
        print("here");
        if (y[a]<25)
            return;
    }

Here the if..return is the repeating part - there's no way of encapsulating that in a method/function, without changing the code a lot.
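To show what 'changing the code a lot' can look like, here is a hedged Java sketch. The helper name below, the counter standing in for print("here"), and the use of the array length instead of 100 are assumptions made for the example. The helper can only report that the caller should return, so the if..return skeleton still appears at every call site.

```java
class EarlyReturn {
    // Hypothetical helper: it cannot return from its caller,
    // only tell the caller that it should.
    static boolean below(int value, int limit) {
        return value < limit;
    }

    // Counts how many times "here" would have been printed.
    static int countHere(int[] x, int[] y) {
        int printed = 0;
        for (int a = 0; a < x.length; a++) {
            if (below(x[a], 50))
                return printed;
            printed++; // stands in for print("here")
            if (below(y[a], 25))
                return printed;
        }
        return printed;
    }
}
```

The condition has moved into a method, but the repeated part, the if followed by return, has not.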

I'm not saying that naming abstractions is bad, but that being forced to name them is bad. I plan to come up with an environment for editing code in this way. It probably will be based on an existing language, using generated names that the developer doesn't see.

Slava Pestov pointed me at subtext as an existing implementation of something like what I want. It looks interesting, but I don't think there's a huge gap between it and what Excel would be if you made cells part of a tree instead of a grid.

8 comments:

Neal Gafter said...

> Here the if..return is the
> repeating part - there's no
> way of encapsulating that
> in a method/function

Except in languages like Smalltalk and Scala (and the proposed closures extension for Java) where you just wrap the code in a closure, return statements and all, and it just works.

Ricky Clarkson said...

It doesn't work, at least in the proposed closures extension for Java, if the two if..return blocks are in different methods.

That is, I can't pass {if(cond)return;} into a method, and have it mean to return from that method.

Anonymous said...

I appreciate you going through the thought process of a young programmer. It was eerily familiar to the time when I first started programming at age 11.

Hamlet D'Arcy said...

Sorry in advance if I turn this into a Java closures debate, but...

Couldn't you make {if(cond)return;} a closure and then make cond a closure itself? Something like {if({condition})return;}? A closure within a closure?

Ricky Clarkson said...

This would be a compile error:

    public static final {{=>boolean}=>} returnIt={{=>boolean} cond =>
        if (cond.invoke())
            return;
    };

Because return, whether in closures or not, returns from the nearest method, lexically, not dynamically.

That is, the enclosing method (here there is none, hence the error) is the one that return returns from.

Return doesn't return from whichever method you pass the closure to, but whichever method it's syntactically within.

Anonymous said...

I see our last chat has sent you off into interesting forays. Automatically changing the 'view' of a given bit of code from expanded to abstracted depending on the preferences of the viewer is what I had in mind. Automatically suggesting abstractions lies a bit beyond how far I took it, but it's a logical next step.

Aside from self contained efforts like subtext, I wonder where the real IDE innovation is coming from these days. As nice as eclipse / netbeans / IDEA are, none of them are even glancing sideways at two-way macros (macros which 'know' they are macros and can be viewed in folded macro view for you and anyone else that has configured their template/macro engine to do so).

Java in particular is rife with excellent places to do this (@Getter and @Setter or some such silently expanding to the usual getter/setter, or even more fancy stuff with change listeners and the like, as well as automated equals, hashCode, and constructor-from-fields generation - just those would chop 90% of the lines in a standard immutable POJO 'struct'-like class).

Even without addressing hairy implementation issues, there's a lot of low hanging fruit in plenty of programming languages.

The last time I took a stab at working with the java-specific bits of Eclipse, the code was a mess (in sharp contrast to just about anything else in eclipse which is quite nicely modular and fairly easy to work with). Diving into netbeans' java support is still high on my TODO list; I've worked a little bit with javac before and apparently netbeans mostly uses that to build its ASTs and the like.

Ricky Clarkson said...

Heh, Reinier, this is an old post. I just fixed up the comments, and didn't realise that the RSS would be republished.

It predates any of our discussions. You made some good points. I'm not really thinking along the same lines now, and I wish this post had been more focussed around the desire for languages not to force you to give names to things.

Ricky Clarkson said...

I mean, I just fixed up the formatting.

About Me

A salsa dancing, DJing programmer from Manchester, England.