General musings on programming languages, and Java.

Friday, February 23, 2007

Google without the Search - Google Reader

I use quite a few Google services, but I've only just noticed that Google Reader has no search facility. I sometimes forget to bookmark something I've replied to, and then have to hunt for it again if I want to check for updates. It seems fairly odd for a search company to omit search. Yesterday, for a few hours, Google Reader's List View didn't show article titles unless I clicked on them. I just switched to the Expanded View for that period, so I'm glad I only had a couple of hundred posts to scroll through (and a mousewheel!). I also find Google Docs a bit weird: it has started telling me occasionally that another collaborator has made some edits, and that the edits from the last few seconds will be removed. Not a huge problem, but I'm the only person with the rights to edit the document in question. Making text bold, changing fonts and so on sometimes affects text other than the bit I've selected, too. Perhaps Google's code quality is suffering as the organisation grows, or perhaps there's too little (none?) testing before features are pushed out to beta projects. After all, Google is using beta users as testers, but we're also relying on these services. The reason everyone was happy to use Gmail in beta was that it was so damn good. It's a shame that other Google services aren't treated as killer apps the way Gmail was.

Tuesday, February 20, 2007

Sanitising some code

A simple but valuable refactor - converting an interface plus two similar implementations into a final class. This is taken from real code; I wrote this document while refactoring. Here I have an interface that represents an entry in an ARP table.


public interface ArpEntry extends Stringable
{
 MacAddress getMacAddress() throws CheckedIllegalStateException;
 void age();
 boolean dead();
}
Stringable has one method, asString(). I call that instead of toString(), so that I never get java.lang.SomeType@89345aef3 in logs, etc. getMacAddress() may fail because there are two kinds of ARP entry: some have a MAC address and some do not (in real implementations, the second kind have a zero MAC address). The entries without a MAC address record that a request has been sent, which is why getMacAddress() can throw an exception. age() ages the entry by one unit, approximately one second, although as we use this simulation for teaching, we may let slower students slow this down. dead() tests whether the entry is dead (aged beyond a certain limit). Currently, I have two implementations, both anonymous classes:

public class ArpEntryUtility
{
 public static final int START_TTL=20;

 public static ArpEntry arpEntry(final MacAddress macAddress)
 {
  return new ArpEntry()
  {
   int timeToLive=START_TTL;

   public MacAddress getMacAddress()
   {
    return macAddress;
   }

   public void age()
   {
    timeToLive--;
   }

   public boolean dead()
   {
    return timeToLive<=0;
   }

   public String asString()
   {
    return macAddress.getRawValue()+"; expires in "+timeToLive+" seconds";
   }
  };
 }

 public static ArpEntry incompleteArpEntry()
 {
  return new ArpEntry()
  {
   private int timeToLive=START_TTL;

   public void age()
   {
    timeToLive--;
   }

   public boolean dead()
   {
    return timeToLive<=0;
   }

   public MacAddress getMacAddress() throws CheckedIllegalStateException
   {
    throw new CheckedIllegalStateException();
   }

   public String asString()
   {
    return "incomplete ARP entry - expires in "+timeToLive+" seconds";
   }
  };
 }
}
The duplication here is easy to see. One approach would be to make the interface a superclass, or to add a superclass that implements the interface, as a base for common code. However, that isn't the only choice. Let's look at a more classic form of code reuse: calling a method. I'll add two methods, getTimeToLive() and setTimeToLive(), to the interface. They will do no validation, just pass values through. Don't panic, these methods won't live long.

public interface ArpEntry extends Stringable
{
 MacAddress getMacAddress() throws CheckedIllegalStateException;

 void age();
 boolean dead();

 int getTimeToLive();
 void setTimeToLive(int ttl);
}
Now we can implement age() and dead() as static methods in ArpEntryUtility, and call them from the two implementations. Of course, those method calls are still duplication, so we can call the static methods directly and remove age() and dead() from the interface. IDEA's Make Static refactoring does everything in this paragraph, and it sorts out the callers for you too, changing entry.age() to ArpEntryUtility.age(entry). Now almost the only case-specific code is getMacAddress(). If I change getMacAddress() so that it returns Maybe<MacAddress>, then there's no need for the exception, and hence no need for getMacAddress() to be implemented differently in each implementation.

public interface ArpEntry extends Stringable
{
 Maybe<MacAddress> getMacAddress();
 int getTimeToLive();
 void setTimeToLive(int ttl);
}
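A hedged sketch of the intermediate state after Make Static and the change to Maybe, before the two implementations are merged. It assumes java.util.Optional in place of Maybe and a plain String in place of MacAddress, and declares asString() directly instead of inheriting it from Stringable; none of these are the real types. age() and dead() now live in ArpEntryUtility and only use the accessors, so the anonymous classes differ only in what getMacAddress() returns and in asString().

```java
import java.util.Optional;

// Stand-in for the intermediate ArpEntry interface. Assumptions: String replaces
// MacAddress, Optional replaces Maybe, and asString() replaces Stringable.
interface ArpEntry {
    Optional<String> getMacAddress();
    int getTimeToLive();
    void setTimeToLive(int ttl);
    String asString();
}

final class ArpEntryUtility {
    static final int START_TTL = 20;

    private ArpEntryUtility() {}

    // Moved out of the implementations by Make Static:
    // callers now write ArpEntryUtility.age(entry) instead of entry.age().
    static void age(ArpEntry entry) {
        entry.setTimeToLive(entry.getTimeToLive() - 1);
    }

    static boolean dead(ArpEntry entry) {
        return entry.getTimeToLive() <= 0;
    }

    static ArpEntry arpEntry(final String macAddress) {
        return new ArpEntry() {
            private int timeToLive = START_TTL;

            public Optional<String> getMacAddress() { return Optional.of(macAddress); }
            public int getTimeToLive() { return timeToLive; }
            public void setTimeToLive(int ttl) { timeToLive = ttl; }

            public String asString() {
                return macAddress + "; expires in " + timeToLive + " seconds";
            }
        };
    }

    static ArpEntry incompleteArpEntry() {
        return new ArpEntry() {
            private int timeToLive = START_TTL;

            public Optional<String> getMacAddress() { return Optional.empty(); }
            public int getTimeToLive() { return timeToLive; }
            public void setTimeToLive(int ttl) { timeToLive = ttl; }

            public String asString() {
                return "incomplete ARP entry - expires in " + timeToLive + " seconds";
            }
        };
    }
}
```

The remaining duplication (the timeToLive field and its accessors) is exactly what the next step removes by passing the Maybe in at construction.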
Looking at all the use sites, I see that getMacAddress() is only used once, and there the exception is caught and converted to a Maybe anyway, so by chance I've made the use site simpler too. ArpEntry looks almost like a struct now; the only case-specific code left is in the asString() implementations. I can make asString() a static method that does different things based on the Maybe<MacAddress>. The two implementations now differ only in how they are constructed: one is passed a MacAddress, the other isn't. That's easy to solve: pass a Maybe, and now we only have one implementation. One interface with one implementation is needless indirection, so let's change the interface to a final class, and the factory methods to a plain constructor call. Finally, we can get rid of the getter and setter, making macAddress a public final field and timeToLive a public field. That removes some more needless indirection.

public final class ArpEntry
{
 public final Maybe<MacAddress> macAddress;
 public int timeToLive=20;

 public ArpEntry(Maybe<MacAddress> macAddress)
 {
  this.macAddress=macAddress;
 }
}

public class ArpEntryUtility
{
 public static void age(final ArpEntry arpEntry)
 {
  arpEntry.timeToLive--;
 }

 // arpEntry must be final: the anonymous classes below capture it (Java 5/6 rule)
 public static String asString(final ArpEntry arpEntry)
 {
  return arpEntry.macAddress.apply(new Lazy<String>()
  {
   public String invoke()
   {
    return "incomplete ARP entry - expires in "+arpEntry.timeToLive+" seconds";
   }
  },new Function<MacAddress,String>()
  {
   public String run(MacAddress macAddress)
   {
    return macAddress.getRawValue()+"; expires in "+arpEntry.timeToLive+" seconds";
   }
  });
 }

 public static boolean dead(ArpEntry entry)
 {
  return entry.timeToLive<=0;
 }
}
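A runnable approximation of the final ArpEntry and ArpEntryUtility. It assumes java.util.Optional in place of the author's Maybe and a plain String in place of MacAddress; Optional's map and orElseGet take over the roles of the Function and Lazy passed to Maybe.apply. This is a substitution for illustration, not the Maybe API itself.

```java
import java.util.Optional;

// Approximation of the final ArpEntry: a plain data holder.
// Assumptions: String stands in for MacAddress, Optional for Maybe.
final class ArpEntry {
    public final Optional<String> macAddress;
    public int timeToLive = 20;

    public ArpEntry(Optional<String> macAddress) {
        this.macAddress = macAddress;
    }
}

// The 'bag of functions', kept separate from the instantiable class.
final class ArpEntryUtility {
    private ArpEntryUtility() {}

    public static void age(ArpEntry arpEntry) {
        arpEntry.timeToLive--;
    }

    public static boolean dead(ArpEntry arpEntry) {
        return arpEntry.timeToLive <= 0;
    }

    // map(...) formats the complete case; orElseGet(...) lazily builds the
    // incomplete case, mirroring the Function and Lazy given to Maybe.apply.
    public static String asString(ArpEntry arpEntry) {
        return arpEntry.macAddress
                .map(mac -> mac + "; expires in " + arpEntry.timeToLive + " seconds")
                .orElseGet(() -> "incomplete ARP entry - expires in " + arpEntry.timeToLive + " seconds");
    }
}
```

Because the lambdas never reassign arpEntry, Java 8 treats it as effectively final, so the explicit final modifier needed by the anonymous-class version is not required here.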
You might now decide to restrict the users of ArpEntry so that they have to access everything via the provided static methods. That's fairly simple: just merge the classes and make the fields private. However, I like to keep my 'bags of functions' separate from my instantiable classes. While I was refactoring, one of my automated tests started to fail, and it took me about two hours to fix the problem. Inadvertently, I had made some of ArpEntry's client code, the code that decides what to do with an outgoing ARP packet from a computer, more logical. However, the code that calls it (the code that decides whether to send an outgoing ARP packet) had a logic error, which I'd never noticed before. This refactoring wasn't as straightforward as it could have been, mainly because IDEA doesn't know much about the Maybe type. But overall, I'm pleased I spotted the logic error now, rather than when I'm closer to a release!
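For comparison, here is a hedged sketch of the merged alternative described above, which the post deliberately does not take: one class whose private fields can only be reached through age(), dead() and asString(). As before, java.util.Optional stands in for Maybe and String for MacAddress.

```java
import java.util.Optional;

// Hypothetical merged version: the fields are private, so callers must go
// through the methods. String stands in for MacAddress, Optional for Maybe.
final class ArpEntry {
    private static final int START_TTL = 20;

    private final Optional<String> macAddress;
    private int timeToLive = START_TTL;

    ArpEntry(Optional<String> macAddress) {
        this.macAddress = macAddress;
    }

    void age() {
        timeToLive--;
    }

    boolean dead() {
        return timeToLive <= 0;
    }

    String asString() {
        return macAddress
                .map(mac -> mac + "; expires in " + timeToLive + " seconds")
                .orElseGet(() -> "incomplete ARP entry - expires in " + timeToLive + " seconds");
    }
}
```

The trade-off is the one the post names: encapsulation is stronger, but data and the functions over it are no longer kept apart.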

Sunday, February 18, 2007

Ricky's Properties for Java

While I think there should be some language change to make properties really useful, it's worth looking at how close we can get to properties without changing the language, to work out what needs changing. Suppose you were to write a method called, say, setField, or setf for short, that takes a property and a value, and sets the property to that value. That is a reasonable thing to want to do with arbitrary properties, for example in a GUI.

  // Like Lisp's setf, returns the value it assigned.
  <T> T setf(Property<T> property,T value)
  {
      property.set(value);
      return value;
  }
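A minimal sketch of the types setf assumes, so that it can actually run. The post never shows Property, so the get/set interface below and the DirectProperty class (which keeps its value in its own field, like the DirectProperty used for Person later on) are assumptions for illustration. Each DirectProperty is a separate heap object, which is where the per-property overhead comes from.

```java
// Hypothetical Property abstraction: the post does not define one, so this
// assumes the simplest shape, a getter and a setter.
interface Property<T> {
    T get();
    void set(T value);
}

// Hypothetical simplest implementation: the value lives inside the Property
// object, so every property of every bean costs one extra heap object.
final class DirectProperty<T> implements Property<T> {
    private T value;

    DirectProperty(T initialValue) {
        this.value = initialValue;
    }

    public T get() {
        return value;
    }

    public void set(T value) {
        this.value = value;
    }
}

final class PropertyDemo {
    private PropertyDemo() {}

    // Works with any Property implementation; like Lisp's setf,
    // it returns the value that was assigned.
    static <T> T setf(Property<T> property, T value) {
        property.set(value);
        return value;
    }

    public static void main(String[] args) {
        Property<String> name = new DirectProperty<String>("unknown");
        setf(name, "Alice");
        System.out.println(name.get()); // prints Alice
    }
}
```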
This approach relies on having one object per property, so it's easy to mistake it for a memory leak. It's not actually a memory leak; it's a memory overhead. Suppose that you create 1000 objects, each of which normally takes 100 bytes. Now you change them to expose Property objects as public final fields, and each object now takes 1000 bytes. It's only a (potential) leak if you need to create new objects every time you use a Property. What we do have now, though, is n extra objects per bean, where n is the number of properties. This isn't a hopeless situation; a possible VM-based solution is to hold Property objects directly as part of the object they're in, i.e., inline rather than behind a pointer. This requires the size of a Property to be known by the verifier, so the actual Property implementation would need to be known at load time. While this might seem like a hopelessly early optimisation, it's worth thinking about now, because if properties do get implemented in the language and they are completely flexible (so that you can replace a Property object at runtime), we'll lose the possibility of this optimisation. A halfway house would be to make it possible to prevent a Property object from being replaced, or for the VM to be smart enough to tell which Properties aren't going to be replaced.

Obtaining a Property object is tricky

If we want setf to work with properties defined by existing code, we should be able to recognise the getX/setX convention, and make those into Properties.  Let's look at how we can create a legacy Property using current (Java 5/6) code:
  Property<String> nameProperty=new LegacyProperty<String>(new GetterAndSetter<String>()
  {
    public String get()
    {
      return object.getName();
    }
  
    public void set(String s)
    {
      object.setName(s);
    }
  });
This gets a bit shorter with the BGGA closures syntax, or even method references, but it doesn't get any sweeter. It's still pointless duplication.

Another implementation would use reflection:

  Property<String> nameProperty=new ReflectiveProperty<String>("name",object);

Obviously there are the usual problems with this: type safety is not guaranteed at compile time, performance suffers, and the programmer needs tool support to be certain that it is refactor-proof. There is an extra problem caused by erasure of generic types: there's no way of knowing that nameProperty really is a Property<String>, because setName could take an instance of some 'Name' class. This is not simply a case of choosing dynamic typing over static typing, because erasure doesn't give us a choice. The reflective solution is not typesafe at all unless we either implement reification or give ReflectiveProperty a 'type token', in this case String.class:
  Property<String> nameProperty=new ReflectiveProperty<String>(object,String.class,"name");
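The two ways of obtaining a Property for an existing bean can be sketched as follows. Property, GetterAndSetter and LegacyPerson are hypothetical stand-ins, and this ReflectiveProperty takes its arguments in the (object, type token, name) order of the type-token example. The lookup is hand-rolled with Class.getMethod, and as written it handles neither isX getters nor primitive-typed properties (int.class is not Integer.class).

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical Property and GetterAndSetter types; the post does not define them.
interface Property<T> {
    T get();
    void set(T value);
}

interface GetterAndSetter<T> {
    T get();
    void set(T value);
}

// Delegates to hand-written calls to an existing getter and setter.
final class LegacyProperty<T> implements Property<T> {
    private final GetterAndSetter<T> accessors;

    LegacyProperty(GetterAndSetter<T> accessors) { this.accessors = accessors; }

    public T get() { return accessors.get(); }
    public void set(T value) { accessors.set(value); }
}

// Looks up getX/setX by reflection. The Class<T> token lets the constructor
// check that the bean's property really has type T; erasure alone cannot.
final class ReflectiveProperty<T> implements Property<T> {
    private final Object bean;
    private final Class<T> type;
    private final Method getter;
    private final Method setter;

    ReflectiveProperty(Object bean, Class<T> type, String name) {
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        Method g;
        Method s;
        try {
            g = bean.getClass().getMethod("get" + suffix);
            // Only matches a setter whose parameter type is exactly T.
            s = bean.getClass().getMethod("set" + suffix, type);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no " + type.getName() + " property '" + name + "'", e);
        }
        if (g.getReturnType() != type) {
            throw new IllegalArgumentException(g.getName() + " does not return " + type.getName());
        }
        this.bean = bean;
        this.type = type;
        this.getter = g;
        this.setter = s;
    }

    public T get() {
        try {
            return type.cast(getter.invoke(bean));
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public void set(T value) {
        try {
            setter.invoke(bean, value);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}

// A legacy bean that follows the getX/setX convention.
class LegacyPerson {
    private String name = "unknown";

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

With the token, asking for name as an Integer fails when the ReflectiveProperty is constructed, rather than later with a ClassCastException somewhere far from the mistake.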
It doesn't work with legacy bean-manipulating code

Suppose I write a new class and don't write getters or setters, but instead expose my fields via Properties:
  class Person
  {
    public final Property<String> name=new DirectProperty<String>("unknown");
  }
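The JavaBeans Introspector gives a quick way to see the problem. This hedged sketch repeats hypothetical minimal Property and DirectProperty types so that it compiles on its own; Introspector.getBeanInfo, with Object.class as the stop class, reports no properties at all for this Person, because it has no getX/setX methods.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal Property types (the post does not define them).
interface Property<T> {
    T get();
    void set(T value);
}

final class DirectProperty<T> implements Property<T> {
    private T value;

    DirectProperty(T initialValue) { this.value = initialValue; }

    public T get() { return value; }
    public void set(T value) { this.value = value; }
}

// The Person from the post: state exposed only through a Property field.
class Person {
    public final Property<String> name = new DirectProperty<String>("unknown");
}

final class IntrospectionCheck {
    private IntrospectionCheck() {}

    // Names of the bean properties the Introspector finds. Stopping at
    // Object.class hides the read-only 'class' property from getClass().
    static List<String> visibleProperties(Class<?> beanClass) {
        try {
            List<String> names = new ArrayList<String>();
            for (PropertyDescriptor d : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                names.add(d.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visibleProperties(Person.class)); // prints []
    }
}
```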
Now any code that reflects on Person looking for getX/setX methods won't find any. One could argue that such code should use Introspector rather than reflecting directly, and hence that I could provide a BeanInfo class for Person, but not all the code that manipulates beans uses Introspector.

Erasure could make a List of Properties useless

Suppose you asked a bean for all its properties, either directly or via some introspector. You'd get a List<Property<What?>>. It cannot be Property<Object>, because generics are invariant: a Property<String> is not a Property<Object>. It can be Property<?>, though this would prevent set from being called with anything but null. It can be a raw type. In any case, erasure will stop us from seeing the actual type of the property, unless we add a type token, as mentioned earlier.

What needs to change?

Now let's take the above and make it convenient to use by changing the language a little.

1. All getX/setX pairs are properties. This includes isX, read-only properties and write-only properties. This allows new code to work with existing beans.
2. All explicit property declarations generate getX/setX or isX/setX pairs at compile time. This allows new beans to work with existing code.
3. The generated code simply calls the Property's get/set methods. There is no generated field in the declaring class, other than for the Property itself.
4. A syntax is provided for getting at a named Property given the name of a bean. This is statically checked for correctness.
5. A syntax is provided for getting or setting the value of a property; the '.' operator will suffice. A field and a property cannot exist with the same name, which avoids compatibility issues with existing code.

The easiest argument against this is also the easiest to refute, namely that it calls non-obvious code. The same argument could be used to reject polymorphism. Plus, there are already precedents in Java. array[index]=value is an assignment that does more than it appears to: it checks bounds. String concatenation calls toString() on objects. Arithmetic with + promotes byte, short and char operands to int. These are all good things; there's nothing fundamentally wrong with calling non-obvious code, as long as it is possible to discover what code is actually called.

The strongest argument in favour is readability - there should be no readability price for using properties.  Currently there is a price.
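To make points 2 and 3 of the list concrete, here is a hand-written guess at what a compiler might generate for an explicit String property called name. No such Java feature exists, so the Property types and the nameProperty field (standing in for whatever syntax point 4 would provide) are hypothetical. The only field is the Property itself, and getName()/setName() simply delegate to it, so existing bean code sees an ordinary property while new code can hand the Property object to something like setf.

```java
// Hypothetical minimal Property types (the post does not define them).
interface Property<T> {
    T get();
    void set(T value);
}

final class DirectProperty<T> implements Property<T> {
    private T value;

    DirectProperty(T initialValue) { this.value = initialValue; }

    public T get() { return value; }
    public void set(T value) { this.value = value; }
}

// A guess at the desugared form of an explicit 'name' property.
class Person {
    // The Property object is the only storage (point 3).
    public final Property<String> nameProperty = new DirectProperty<String>("unknown");

    // Generated accessors (point 2): they only delegate, so legacy code that
    // looks for getName/setName still works.
    public String getName() {
        return nameProperty.get();
    }

    public void setName(String name) {
        nameProperty.set(name);
    }
}

final class DesugarDemo {
    private DesugarDemo() {}

    public static void main(String[] args) {
        Person person = new Person();
        person.setName("Alice");                     // old-style access
        System.out.println(person.nameProperty.get()); // prints Alice
    }
}
```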

About Me

A salsa dancing, DJing programmer from Manchester, England.