General musings on programming languages, and Java.

Wednesday, July 18, 2007

Remaining Nameless

In college, I came across some budding Pascal programmers who had written hundreds of lines like this:

    gotoxy(12,13);
    write('|');
    gotoxy(12,14);
    write('|');

That's right, they were hand-coding menu systems in ASCII. I actually didn't spot this as being such a great problem at the time, because I couldn't see past how ridiculous it was not to make a procedure called, say, printxy, to do the above.

   printxy(12,13,'|');
   printxy(12,14,'|');

I got a couple of these students to do that, but they didn't really know why. After all, their editor had copy and paste, and everyone else was doing the same as them. Probably their teacher was, too, but I never looked into that.

What happened when they realised that all their '|'s were off by one? They edited each line of code.
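
For comparison, here is a rough sketch of what printxy buys you. It is in Java rather than Pascal, and the TextScreen class is a hypothetical in-memory stand-in for the real screen, so treat the names and the 1-based coordinates as assumptions rather than the students' actual code.

```java
import java.util.Arrays;

// Hypothetical stand-in for the Pascal screen calls: a tiny in-memory text screen.
final class TextScreen {
    private final char[][] cells;
    private int cursorX = 1; // 1-based, as assumed for gotoxy
    private int cursorY = 1;

    TextScreen(int width, int height) {
        cells = new char[height][width];
        for (char[] row : cells) {
            Arrays.fill(row, ' ');
        }
    }

    void gotoxy(int x, int y) {
        cursorX = x;
        cursorY = y;
    }

    void write(String text) {
        for (char c : text.toCharArray()) {
            cells[cursorY - 1][cursorX - 1] = c;
            cursorX++;
        }
    }

    // The one place that knows how "move, then write" is done.
    void printxy(int x, int y, String text) {
        gotoxy(x, y);
        write(text);
    }

    char charAt(int x, int y) {
        return cells[y - 1][x - 1];
    }

    public static void main(String[] args) {
        TextScreen screen = new TextScreen(80, 25);
        screen.printxy(12, 13, "|");
        screen.printxy(12, 14, "|");
        System.out.println(screen.charAt(12, 13));
    }
}
```

With this shape, an off-by-one in every column could be corrected by adjusting the single gotoxy call inside printxy instead of editing each call site.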

Some of those students are now successful (in that they make money) IT professionals (in that they make money), so I can only assume that they learned something in the intervening years.

The rest of education, in the meantime, has changed a little. Students now come along with the word 'reuse' on their tongues. It's a magical thing, and they joke that it means copying and pasting, but they know it doesn't. They can't do it though - the novices anyway. They still copy and paste, like those of my time, sometimes guiltily. Mainly they don't actually understand the alternatives. Even more experienced programmers will struggle to avoid copying and pasting in some situations, and introduce false abstractions to do that. Languages are partially judged on how much repetition they force you into.

Going back to the Pascal example before - printxy wasn't a particularly bad name, but imagine that you're a bit of a weak programmer, and you have a small friend group who usually help each other, the infinite monkeys' approach to assignments. Your friends have never heard of printxy before, and look at your code like it was written by a Martian. You don't want that, so you'll keep your code as it is, despite what that weird Ricky suggested at lunchtime.

At this point, the text editor isn't really very useful. You cannot keep your friend group and me happy at the same time. You could probably write an editor macro that let you type printxy(x,y,'['); and converted it into the appropriate code, but it would be a one-way transformation.

What you'd like, because you want to get help from me and from your peer group (infinite monkeys plus a programmer might equal higher marks!), is to be able to maintain two views of the same code.

That is, you'd like to type printxy(x,y,'[') and have it converted into a gotoxy and write call, but have the editor still know that it's a printxy call, so that you can flick a switch and Ricky can see 'his' version.

This never happened, because the editor didn't support it. The only students I spoke to who ended up using printxy were smart enough that they didn't depend on a peer group. Except Dave, who tried to depend on me instead.

Suppose that printxy was a particularly bad name, or the implementation of it was a bit more intricate (this is perfectly possible - it would make sense to reset the cursor to 0,0 after writing each bit of text, because on DOS screens the cursor is usually visible - having it flick around the screen is particularly distracting).

It'd be good if, whenever people wrote similar code (either through finger-typing or copy and paste), their environment picked up on it and made it into an abstraction. E.g.:

   gotoxy(12,13);
   write('[');
   cursor(0,0);
   gotoxy(12,14);
   wr..

At that point, the environment could start autocompleting the write and cursor calls, and if the user started to type gotoxy again after the next write and cursor calls, the environment could complete the whole three-line form, placing the cursor in the right places to edit the parts that were different between the first and second times.

Further, the environment could fold the blocks of code, just showing the first one and then the differences:

   gotoxy(12,13);
   write('[');
   cursor(0,0);
   ..gotoxy(12,14)..
   ..gotoxy(12,15)..

Then if the user starts editing one of the forms, the others should be edited in parallel, unless the user makes a specific gesture not to, or manually 'ungroups' a form. Groups could be shown via highlights in the margin, just as they are in some environments for an individual method, etc.
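
One very naive way an environment might spot and fold such repeats is sketched below in Java. Everything here is an assumption made for illustration: the RepeatFolder class, the idea of treating two lines as the same "shape" when they differ only in their integer literals, and the restriction to a buffer that consists of nothing but the repeated form. A real editor would need to work on parsed code rather than text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: find a repeating group of lines and fold the repeats.
final class RepeatFolder {

    // Lines that differ only in their integer literals get the same shape.
    static String shape(String line) {
        return line.trim().replaceAll("\\d+", "#");
    }

    // Smallest group size p such that every line has the same shape as the
    // line p places before it. Returns lines.size() if nothing repeats.
    static int period(List<String> lines) {
        for (int p = 1; p <= lines.size() / 2; p++) {
            boolean repeats = true;
            for (int i = p; i < lines.size() && repeats; i++) {
                repeats = shape(lines.get(i)).equals(shape(lines.get(i - p)));
            }
            if (repeats) {
                return p;
            }
        }
        return lines.size();
    }

    // Show the first group in full, then only the lines of later groups
    // whose text differs from the matching line of the first group.
    static List<String> fold(List<String> lines) {
        int p = period(lines);
        List<String> folded = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String current = lines.get(i).trim();
            if (i < p) {
                folded.add(current);
            } else if (!current.equals(lines.get(i % p).trim())) {
                folded.add(".." + current.replaceAll(";$", "") + "..");
            }
        }
        return folded;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "gotoxy(12,13);", "write('[');", "cursor(0,0);",
                "gotoxy(12,14);", "write('[');", "cursor(0,0);",
                "gotoxy(12,15);", "write('[');", "cursor(0,0);");
        fold(source).forEach(System.out::println);
    }
}
```

Run on the nine lines in main, this prints the first three-line form followed by the two differing gotoxy lines, matching the folded view above. Note that no name for the repeated form is ever needed.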

This would allow new programmers to start seeing abstractions appear before they even know what the word abstraction means, and more importantly, without even having to give a name to it. printxy would be something they applied later.

The missing half-dimension of code

In most programming environments, you're restricted to what I'll call one and a half dimensions. The vertical scrollbar is fine, most people will happily scroll a couple of screens down, but never sideways. In most languages, 'arrow-like' code is discouraged as being hard to read. It's a bit like humans wandering around in 2 dimensions, not really being able to wander freely in the air. Code is the same - take it away from the top-level unit (function, class, method, etc.), and you're just waiting for it to collapse back down.

The more nested the code is, the harder it is to understand, because you need to take the more polluted namespace into account. If you declare x outside a form, declare x1 inside it, then sometimes forget to use x1, you'll be using the wrong value. That can only happen because the nested form has no independent namespace.
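
A small Java illustration of that slip follows. The class and method names are hypothetical, and the example assumes the intent was to double every element. Java will not let the lambda parameter reuse the name x, which is exactly how names like x1 come about.

```java
import java.util.Arrays;

// Hypothetical illustration of reading the outer name by mistake.
final class NestedNames {

    // The nested form declares x1, but its body reads the outer x,
    // so every element becomes twice the first element.
    static int[] doubledWrong(int[] values) {
        int x = values[0]; // outer name, still visible inside the lambda
        return Arrays.stream(values)
                .map(x1 -> x * 2) // bug: meant x1 * 2
                .toArray();
    }

    // The intended version uses only the nested form's own name.
    static int[] doubledRight(int[] values) {
        return Arrays.stream(values)
                .map(x1 -> x1 * 2)
                .toArray();
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3};
        System.out.println(Arrays.toString(doubledWrong(input)));
        System.out.println(Arrays.toString(doubledRight(input)));
    }
}
```

Both versions compile without complaint, which is the problem: nothing marks the outer x as a visitor from another scope.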

Programming environments could help a little with this by warning you when you use something from the surrounding namespace, but sometimes you mean to. They could use a different colour instead of warning you. I think it's worth looking at why arrow-like code happens, to find a solution.

I think it's because there's nowhere to store unnamed snippets of code. Imagine a programming environment, where, when you typed a nested form, such as a lambda expression (or anonymous class, or anonymous function), the environment made it appear dislocated from the code that uses it, possibly connected via dotted lines, etc. Any variable that is used from the surrounding namespace can be shown specially, or as some kind of parameter. You could drag the snippet to another block of code, from where it could be used again. Any variables that the snippet needs that the first scope provided, can be either defaulted or new links can be made. This wouldn't even need the snippet to have its own names for variables.
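
In a language like Java, the closest manual equivalent is to lift the snippet out and turn each captured variable into a parameter. The sketch below is only an approximation of the idea: the name generatedSnippet1 is hypothetical and stands for a name the environment would generate and hide, and the counting methods are invented for the example.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical sketch: a captured variable shown as a parameter of a lifted snippet.
final class LiftedSnippet {

    // As typed: an unnamed snippet that silently uses `limit`
    // from the surrounding scope.
    static long countInline(int[] values, int limit) {
        return Arrays.stream(values).filter(v -> v < limit).count();
    }

    // What an environment might store behind the scenes: the same snippet,
    // with the captured variable made explicit. The name is assumed to be
    // generated and never shown to the developer.
    static IntPredicate generatedSnippet1(int limit) {
        return v -> v < limit;
    }

    // A second block of code reusing the snippet, linking its own
    // variable to the snippet's parameter.
    static long countReused(int[] other, int threshold) {
        return Arrays.stream(other).filter(generatedSnippet1(threshold)).count();
    }

    public static void main(String[] args) {
        System.out.println(countInline(new int[] {10, 60, 40}, 50));
        System.out.println(countReused(new int[] {1, 2, 3}, 2));
    }
}
```

Doing this by hand forces a name on the snippet, which is the step the imagined environment would take care of.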

Past the user interface layer, Lisp macros with generated (and hidden) names could easily be used.

You may be wondering why ordinary methods/functions as in most languages can't be used - they can, but not in all the same ways. For example:

for (int a=0;a<100;a++)
{
   if (x[a]<50)
      return;
   print("here");
   if (y[a]<25)
      return;
}

Here the if..return is the repeating part - there's no way of encapsulating that in a method/function, without changing the code a lot.
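
To make that concrete, here is roughly the best an ordinary Java method can do. The helper name below is hypothetical, and scan returns a count of the "here" lines printed purely so the sketch can be checked; the original loop returned nothing and ran to a fixed 100.

```java
// Hypothetical sketch: a helper can hold the test, but not the return.
final class EarlyReturn {

    // The helper can compute the condition...
    static boolean below(int[] values, int index, int limit) {
        return values[index] < limit;
    }

    // ...but a `return` written inside the helper would only leave the
    // helper, so each call site still spells out `if (...) return`.
    static int scan(int[] x, int[] y) {
        int printed = 0;
        for (int a = 0; a < x.length; a++) {
            if (below(x, a, 50)) {
                return printed;
            }
            System.out.println("here");
            printed++;
            if (below(y, a, 25)) {
                return printed;
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(scan(new int[] {60, 60, 10}, new int[] {30, 10, 30}));
    }
}
```

The repetition has shrunk but not gone: the control flow itself, the part that actually repeats, stays at the call site.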

I'm not saying that naming abstractions is bad, but that being forced to name them is bad. I plan to come up with an environment for editing code in this way. It probably will be based on an existing language, using generated names that the developer doesn't see.

Slava Pestov pointed me at Subtext as an existing implementation of something like what I want. It looks interesting, but I don't think there's a huge gap between it and what Excel would be if you made cells part of a tree instead of a grid.

About Me

A salsa dancing, DJing programmer from Manchester, England.