General musings on programming languages, and Java.

Saturday, December 02, 2006

Making equals(Object) type-safe

Update - Jean-Francoise Briere came up with a better solution - I've included it at the bottom. It's easy to end up comparing two obviously incomparable objects using the equals(Object) method. This is because equals is not generic. By incomparable, I mean that it is possible to tell from the source that they are incomparable, such as new StringBuffer().equals(""). The contract of equals does not permit throwing an exception when incomparable objects are compared, because comparing two incomparable objects is not a problem; comparing two incomparable references is. That is, it makes sense if you have an List<Object> and add an Integer to it, then look in it for a String, for the equals method to be called, so it should behave normally (of course, it's quite likely that the hashCode method will be called instead). So we can't really ask the objects themselves to help us out, unless we create lots of overloaded equals methods. E.g., an Integer would need equalTo(Integer), equalTo(Number) and equalTo(Object). Clearly a pain in the proverbial. We can instead ask the static type system to help us out.

First Bad Solution; Type-Specific Statics

Wrap equals calls up, say, in static methods.

public class Equaliser
{
   public static boolean equals(String first,String second)
   {
       return first.equals(second);
   }
}
There are two problems with this: 1. Repetition - you'd have to do this for every type. 2. It doesn't work if you put a supertype in, e.g., equals(Object,Object) would match any calls and you wouldn't notice.

Second Bad Solution - Naive Application of Generics

public static <T> boolean equals(T first,T second)
{
   return first.equals(second);
}
This will actually allow a comparison between an Integer and a String, because T is resolved to Object, unless you use the clunky qualifying syntax - ClassName.<Integer>equals(1,"blah"), which is so bad it's worth avoiding in most cases. Your favourite IDE will confirm that the unqualified version resolves T to Object when you hover over the call.

First Not-So-Bad Solution - Less Naive Application of Generics


interface Equalator<T>
{
    boolean isEqualTo(T other);
}

public static <T> Equalator<T> equalator(final T first)
{
   return new Equalator<T>()
   {
       public boolean isEqualTo(T second)
       {
           return first.equals(second);
       }
   };
}
This is called as: equalator(someReference).isEqualTo(someOtherReference), and will catch more bad comparisons. Of course, if your references (not your objects) are actually of type Object, then this won't be useful at all.

And Finally, The Same Thing But More Reusable

In my own code, I implement this as a (badly named) method, equalT:

    public static <T> Function<T,Boolean> equalT(final T first)
    {
        return new Function<T,Boolean>()
        {
            public Boolean run(final T second)
            {
                return first==second || first.equals(second);
            }
        };
    }
Now it's called as: equalT(one).run(two). I should probably get rid of the first==second part of that. Function is a type I defined in the publically-available functionalpeas package - it represents a function with one argument and one return value. Now I can pass equalT(someString) to a method that expects a Function<String,Boolean>, which can be handy, and explains the 'reusable' part of this section's subtitle. One really cool thing about it is that if you replace all your calls to equals AND all your code to == with this, including primitive==primitive, then you won't get any of those pesky problems caused by comparing references instead of objects. Have fun. Next time I'll look at eliminating NullPointerExceptions. No, really. Update Jean-Francoise Briere's solution is based on my 'naive implementation using generics', but is less naive:

public static <T,U extends T> boolean equalT(T t,U u)
{
    return t.equals(u);
}
This is better than my final solution, because it doesn't rely on creating a new object for each comparison, and there are less keypresses. Thanks, Jean-Francoise!

35 comments:

Curt said...

Here's my proposed solution: blind casting.

@Override public boolean equals(Object o) {
Foo foo = (Foo) o;
...
}

Rationale: the object wasn't designed to be compared against any other type of object ever. Doing so is unsupported and represents a bug. This is often appropriate for classes that are confined to a single containing class or package.

The extent to which this constitutes a violation of the equals contract is also debatable. Blindly casting like this has helped me catch a few very elusive bugs.

Ricky Clarkson said...

The problem with the blind casting approach is that it breaks the contract defined by equals(Object), which doesn't define any exceptions. So you couldn't use your objects in a general Set<Object>, or rather, you could, but sometimes hashCode collisions may occur, and your equals method might actually be invoked.

That is, it's not easy to know at compile time, or even in automated testing, whether your object's equals method will ever be called.

Making hashCode also throw an exception might help in that, but you should be wary of overriding the contracts that you inherit.

I'd rather have the compiler help me out here.

Slava Pestov said...

Jumping through hoops like this is considered acceptable in the Java community?

Ricky Clarkson said...

How does your favourite compiler or runtime help you in this case (accidentally comparing two objects that will never be equal), Slava?

Note that I only propose an idea - unless the Java community consists of just myself, then I don't see that it's been accepted by the Java community.

Jumping through hoops, maybe, in setting up equalT, etc., but the actual invocation isn't so bad - equalT(one).run(two).

Curt said...

The problem you cite is not a problem. Most classes should be private to a class or package (minimize visibility). There is seldom a need for them to work in a heterogeneous collection. If they are put in one, it is likely to be a bug.

If the class is public, then this all changes.

Ricky Clarkson said...

Curt, most of my instantiable classes are anonymous anyway - ideally all, but I have actual features to implement and bugs to fix. ;)

This is not related to collections.

The contract of equals is a constant, whether you use collections or not.

By a heterogenous collection, I assume you mean that you would grab an object, cast it to the type you think it is, then work on it; I can see cases for this (although I prefer a visitor-based approach instead of actual casts, emulating dynamic dispatch).

Slava Pestov said...

Factor is dynamically typed, so it doesn't help there, ricky. But I'd rather fix a trivial runtime type error once in a while than write contorted code to appease some corporate programmer's sick idea of a static type system. Haskell wouldn't be so bad, though. Type-classes can achieve what you did with equality testing quite easily. Which reminds me, you really want Java to be Haskell, why not just bite the bullet and learn it? I bet you could rewrite Netsim in Haskell, with no loss of functionality, a 4x decrease in amount of code and less total bugs.

Ricky Clarkson said...

"But I'd rather fix a trivial runtime type error once in a while than write contorted code to appease some corporate programmer's sick idea of a static type system."

I'd rather learn from my mistakes and prevent them next time. If my solution was prohibitively expensive in some way then I'd just think of another way, at some point.

"Haskell wouldn't be so bad, though. Type-classes can achieve what you did with equality testing quite easily."

Yes, I've been looking at the Eq class recently, but haven't quite got my head around it yet. Well, I thought I had, but the compiler disagreed with me.

"Which reminds me, you really want Java to be Haskell, why not just bite the bullet and learn it?"

I'm more interested in learning from Haskell than learning Haskell. It's certainly an interesting language, but I'm not ready to start a project in it just yet.

"I bet you could rewrite Netsim in Haskell, with no loss of functionality, a 4x decrease in amount of code and less total bugs."

It got renamed a couple of years ago to IPSim. ;)

As it stands, I only know how to do calculations and list operations properly in Haskell, nothing with side effects, like making a window appear on a screen, etc., so IPSim in Haskell would be hard.

I think implementing the IP stack in Haskell would be fun though.

Anonymous said...

I still have no idea what problem this is solving? The #equals contract seems to work fine. An integer will not equal a string, what's wrong with that?

Ricky Clarkson said...

Consider this code:

String s="h";
Integer t=5;
boolean b=s==t; //this line fails to compile
boolean c=s.equals(t); //this doesn't, it should.
boolean d=equalT(s).run(t); //this line fails to compile, problem solved.

That is, when we use equals, we don't get as strong static type checking as when we use ==. This blog post addresses this.

Anonymous said...

Of course this compile time checking for == only works until those objects have been cast to Object. Can you tell me the use case you are trying to solve with compile time checking for equals operations?

Ricky Clarkson said...

Use case? Sure.

In an implementation of an IP stack, I occasionally got the order wrong in constructing an IP packet - new IPPacket(sourceIP,destIP,etc.).

So I made two wrapper types - SourceIPAddress and DestIPAddress, which both compose an IPAddress. I made IPPacket need those, went through and fixed all the compile errors, but things didn't work properly, because I had some stray IPAddress.equals(SourceIPAddress) calls, etc., which will always return false.

Even though the problem doesn't happen very often, and tests will typically catch it, I'm interested in catching my mistakes as early as possible, so I had a look at what generics can do for me.

If I did that refactor now, I'd use the builder pattern, emulating named parameters - packetBuilder.withSource(IPAddress).withDest(IPAddress).etc.

Anonymous said...

I suppose I would have just changed my constructor signature to take a Source and Destination wrapper; why wouldn't that provide the compile-time checking you were looking for?

If you couldn't for some reason, why not do this?

assert src instanceof SourcePacket;
assert dest instanceof DestinationPacket;

Then you need not worry if arguments have been downcasted.

The thing you have to ask yourself is how badly you need compile time versus runtime checking in this case. Normally runtime checking will be sufficient and much more flexible (i.e. require less jumping through hoops).

Ricky Clarkson said...

"I suppose I would have just changed my constructor signature to take a Source and Destination wrapper"

That's what I did. I also stored them as that, and had IPPacket return SourceIPAddress for its getSourceIPAddress(), etc., so the change didn't stay in that one place.

"The thing you have to ask yourself is how badly you need compile time versus runtime checking in this case."

There is no possible runtime checking that can tell you whether you are making a bad comparison between references, regarding the equals method. So the question is more like "how badly do you need this checking?".

That's much more subjective. I've probably wasted two hours of my life debugging code that would have failed to compile if I'd used equalT. It took me less than 5 minutes to come up with equalT. I think that's a nice trade-off, to avoid having to waste another two similar hours later.

oxbow said...

Ricky: I don't mean to be rude, but if you wrote code like that in an interview, you would be a NO HIRE. This represents your own personal idea of what Java should be doing. It's unreadable and difficult to maintain. This is not a problem in the REAL WORLD. I've never ever come across this "problem" in nearly 10 years of programming and it's certainly not worth ugly code (with unecessary boxing).

I note from your profile that you are a relatively inexperienced developer. In a few years' time, you will cringe at the mere thought of this blog entry.

oxbow said...

It doesn't work anyway!

LinkedList<Integer> l1 = new LinkedList<Integer>();
ArrayList<Integer> l2 = new ArrayList<Integer>();
equalT(l1).run(l2);//Oh dear, this doesn't compile

I need:

StupidEquals.<<List<Integer>>equalT(l1).run(l2);

This is great!

Ricky Clarkson said...

"I don't mean to be rude, but if you wrote code like that in an interview, you would be a NO HIRE"

If I had an interview at the kind of place that shunned *having an idea* then I wouldn't want the job. Moron.

Change your reference types to be List<Integer> and it compiles. In fact, you only need to change the first one. There's no way for equalT to know that it makes sense to compare a LinkedList reference to an ArrayList reference. The static type system simply isn't that powerful.

oxbow said...

I'm not shunning people that have ideas. If you can demonstrate you have good ideas then that's great and is exactly what we are looking for. This is not an "idea"; it is bad practice.

As for the linked- and array-list examples. Go and read the equals contract for the List interface. Two lists are considered equal if they have the same elements in the same order, regardless of implementation. Using your mechanism, I have to generate unreadable code in order to compile a simple equals check.

I'm not the first person here to tell you that this is not a problem developing in the real world. Maybe you should take the hint.

Ricky Clarkson said...

How is it bad practice?

It's documented bad practice to use the implementation type when you can use the interface type. Hence you can avoid the ugly syntax by avoiding the bad practice of having LinkedList references.

List<Integer> one=new LinkedList<Integer>();

List<Integer> two=new ArrayList<Integer>();

equalT(one).run(two) works fine now. Nothing unreadable about that. You only get unreadable code from this if you're trying to.

I said the static type system wasn't powerful enough to detect such comparable reference types - the compiler does not read the contract for LinkedList's equals method.

I'm not interested in your 'real world' garbage - equals(Object) is not as type-safe as it could be, this post only looks at how it can be made type-safe. I'm not advocating it as a general practice. Food for thought. Take the hint. Think.

wlievens said...

I can't understand what this is addressing. In my opinion it's perfectly possible for two objects to be conceptually equal, if they are of a different class. They must be of the same conceptual type however, but since that is a design issue, you can't formalize it (in java?).

So you don't want/need static type checking on equals().

oxbow said...

OK. I need to type l1 as a LinkedList because I want to call addFirst() on it, a method which doesn't exist on the List interface. Now if l2 is typed as List:

equalT(l2).run(l1)

compiles, but

equalT(l1).run(l2)

does not. You've just broken the symmetry of the equals operator.

Consider yourself. You're a young research assistant who has never actually had to work on a wide variety of projects, written and maintained by a number of various programmers. Just imagine what that would be like if everyone's pet "look ma, I can fly" idea has been crammed in to those projects. Don't get me wrong, I've done this kind of daft thing myself. I have learned the hard way that it is wrong.

Maybe, just maybe, the fact that I've 10 years of experience working in Java programming gives me some kind of right to point out to you that you are wrong. As I said, if you took that idea into most interviews you'd be laughed out of the room. And I want to meet the company who's impressed by it - because I never want to work there.

Ricky Clarkson said...

"In my opinion it's perfectly possible for two objects to be conceptually equal, if they are of a different class."

It is possible, but not usual. A Car and a Bike will never be equal. Things like ArrayList and LinkedList are the exception, rather than the rule.

"You're a young research assistant who has never actually had to work on a wide variety of projects, written and maintained by a number of various programmers."

1. I'm mainly a programmer. Research Assistant is only the job title. Most people in the same role as me are studying for a PhD. I'm not. I'm interested in programming, not in writing papers.

2. I've worked on a wide variety of projects. Not in a huge capacity, that's yet to come in my career. I'm using the time I have now to work out how I want to program when I get back into industry.

"the fact that I've 10 years of experience working in Java programming"

This is laughable. Really. I've got 20 years experience of programming (most of it as a hobbyist). Therefore I'm better than you. Oh, and my brother is bigger than yours.

Use logic instead of ad hominem arguments, please.

I'm more interested in how well you learned than how long you've been doing it for. You can't use "I've been doing it longer" to automatically win arguments. Sorry.

oxbow said...

Fair enough. I also don't agree with the "I've been doing it for N years" argument in general as you've no idea of the competence of the person saying it. I'm a technical architect at a technology-driven hedge fund in London (to give you some background) so I'd hope you think I might at least be semi-competent.

The 20 years programming isn't relevant. What is relevant is having worked on systems which were there when you arrived, are developed by a number of people, and will be there when you're gone.

When I evaluate a candidate, I want to know that they can distinguish between good ideas and bad ones. Coding your own replacement for the equals method is not a good idea. It's not standard, no-one will be able to follow the resultant code and the problem is an imaginary one.
(Your solution also has the drawback of creating a new Object on every invocation.)

If you want to know when an object of one class might be equal to an Object of another; work with libraries that take advantage of code-generation (eg. Hibernate).

Oh, your brother probably is bigger than mine; he's only 5ft 8in. But then so am I.

Ricky Clarkson said...

"I'd hope you think I might at least be semi-competent."

I'm not forming an opinion about anyone. I'm talking about syntax.

"The 20 years programming isn't relevant."

Sure it is. I know the knock-on effects of my solution.

"What is relevant is having worked on systems which were there when you arrived, are developed by a number of people, and will be there when you're gone."

Been there, done that. Admittedly not enough yet, that's yet to come in my career.

"When I evaluate a candidate, I want to know that they can distinguish between good ideas and bad ones. Coding your own replacement for the equals method is not a good idea. It's not standard, no-one will be able to follow the resultant code and the problem is an imaginary one.
(Your solution also has the drawback of creating a new Object on every invocation.)"

Not being standard isn't a problem in my project. I am the only developer. Anyone competent will be able to follow the resultant code.

The problem is not imaginary. I encountered the problem. The solution might not be tasteful, but that's the nature of taste. It varies.

The allocation of an extra object might not be a problem, as it's normally a very short lived object. Indeed I haven't yet noticed a problem with it. Part of the reason I've implemented this in my project is to find out whether it's practical. Because there are no other developers, I experiment. If I make a mess, I just work extra hours to sort it out. Doesn't happen often. It's nice to have this freedom; and it doesn't mean I'd do the same in a large project. I also don't do it near a deadline.

"If you want to know when an object of one class might be equal to an Object of another; work with libraries that take advantage of code-generation (eg. Hibernate)."

No. I think this flags up an issue with Java. Even if equals was rewritten now to take advantage of generics, the problem would not be soluble.

E.g., Integer can't implement Eq<Integer> and Eq<Number>, generics doesn't allow this. Er, this point isn't too well thought through, but I'm about to head out..

"Oh, your brother probably is bigger than mine; he's only 5ft 8in. But then so am I."

We're both 6 feet tall. Grr..

oxbow said...

"Anyone competent will be able to follow the resultant code"

if (MyEquals.<Iterator<? extends Map.Entry<? extends K, ? extends V>>>equalT(i1).run(i2)) {
System.out.println("Clear as day");
}

We can leave it there. I know this is irritating to hear but, believe me, in a few years time you will simply not be doing this and think back on it as a mistake. Really.

The reason I've commented is that you've blogged about it. For your own sake, if you're going for a job, don't work this one into the conversation.

Bradley said...

Is it not a good enough solution to check the type of the parameter to equals()?, as in

class Foo {
String field1;
boolean equals(Object object) {
return (object instanceof Foo) && ((Foo)object).getField().equals(this.field) ;
}

Aside perhaps from the details, is there a problem with such an approach? Yes it's possible that a String variable "5" could logically equal the same thing as an Integer with value 5, but is it often necessary to make such a comparison?

hlovatt said...

You can have a type safe equals in a class by just having a different name for equals. In particular you could use Comparable which already gives you a different name, but if ordering wasn't possible then mimic the way Comparable works, e.g.:

interface Equatable< T extends Equatable< T > > { boolean equate( T other ); }
...
class Int implements Equatable< Int > {
public final boolean equate( Int other ) { ... }
public final boolean equals( Object other ) {
return other instanceof Int ? equate( (Int)other ) : false;
}
public final int hashCode() { ... }
...
}

Ricky Clarkson said...

> if (MyEquals.<Iterator<? extends Map.Entry<? extends K, ? extends V>>>equalT(i1).run(i2)) {
> System.out.println("Clear as day");
> }

The blog post mentions that this qualified syntax is ugly, this is nothing new. Here are two ways of removing it for the case presented:

1. Implicit upcast (introduce an explaining variable).

Iterator<? extends Map.Entry<? extends K, ? extends V>> iter=i1; //implicit upcast

if (equalT(iter).run(i2))
whatever.

2. Explicit cast.

if (equalT((Iterator<? extends Map.Entry<? extends K, ? extends V>>)i1).run(i2))

I'd go with the first one, or just use equals as usual.

"We can leave it there. I know this is irritating to hear but, believe me, in a few years time you will simply not be doing this and think back on it as a mistake. Really."

Spotting a problem and looking at potential solutions is not a mistake. Elsewhere (javalobby) a better approach using generics has been pointed out (though it still is awkward under the case of LinkedList.equals(ArrayList)), and it's also been pointed out that IDEA has a code inspection that can detect these bad calls.

"The reason I've commented is that you've blogged about it. For your own sake, if you're going for a job, don't work this one into the conversation."

Don't worry about me and job interviews. Really. Not only do I know how the above works, but I know its drawbacks too. Even if the interviewer had read this blog, my only criticism of myself would be that I responded to the trolls.

Ricky Clarkson said...

"They must be of the same conceptual type however, but since that is a design issue, you can't formalize it (in java?)."

Programming is not just about communicating with the machine, but about communicating with others (and even oneself).

If Java can't represent a design, then there's too much space between Java and the design.

Bradley, I think you misread the post and the comments. I'm not discussing the implementation of equals. Your implementation is fine with me.

Howard said:

"You can have a type safe equals in a class by just having a different name for equals."

This is true. I've experimented in the past with that. With Java's current implementation of generics, one cannot make an Integer Equalable<Integer> and Equalable<Double>, for example - the type system doesn't allow implementing the same type twice with different type parameters. Perhaps if/when reification arrives this will be made legal.

However, one could simply make separate IntegerEqualable, DoubleEqualable, etc. I don't like this solution, because it's repetitive, and expensive in comparison to the expense of the problem in the first place, in my opinion.

hlovatt said...

@Rick,

As you correctly point out you cannot implement the same interface twice in Java with different generic types. This is because Java does single dispatch not multiple dispatch. I have written a Pattern Enforcing Compiler, PEC, that adds support for the multiple dispatch pattern amongst other patterns. Using this compiler you can elegantly solve the problem:

1. Make a numeric equate that has multiple dispatch semantics (i.e. it extends MultipleDispatch):

public interface NumEquate extends MultipleDispatch { boolean equate( NumEquate other ); }


2. Implement a convenience base class, i.e. you don't have to use this class but it is convenient to do so, that provides a default implementation and a hook to allow a single dispatch call to become multiple dispatch:

public abstract class Num implements NumEquate {
public final boolean equate( NumEquate other ) { throw new AssertionError(); } // Body of method replaced by compiler with a multiple dispatch call - hook for normal method call!
public final static boolean equate( NumEquate n1, NumEquate n2 ) { return false; } // Base case
public final boolean equals( Object other ) { return this == other || other instanceof Num ? equate( (Num)other ) : false; } // equals for old code
public final int hashCode() { ... } // Consistent with equals
}

3. Implement the classes you want and provide equate methods as static methods that define both argument types (receiver and declared argument):

public class Int extends Num {
public static final boolean equate( Int i1, Int i2 ) { ... }
...
}
public class Doub extends Num {
public static final boolean equate( Doub i1, Doub i2 ) { ... }
public static final boolean equate( Int i1, Doub i2 ) { ... }
public static final boolean equate( Doub i1, Int i2 ) { ... }
...
}

NB The static methods that implement equate can be in any class, literally any loaded class. Provided that they don't need access to private data; if they need private data then they need to be in the class containing that data.

Ricky Clarkson said...

"As you correctly point out you cannot implement the same interface twice in Java with different generic types. This is because Java does single dispatch not multiple dispatch."

As far as I know, it's simply because of erasure.

PEC looks interesting, odd how it doesn't use annotations and apt, but instead chooses marker interfaces.

What did you use to give you an AST?

hlovatt said...

@Ricky,

"PEC looks interesting"

Thanks

"odd how it doesn't use annotations and apt, but instead chooses marker interfaces."

It predates annotations and the apt. I am working on a future version that uses Jackpot to manipulate Java source. However interfaces have a number of advantages over annotations and will therefore be retained in some places. In particular interfaces:

1. Are types (vital for some patterns, e.g. Immutable)

2. Are always inherited (an annotated interface does not annotate a class that implements the interface)

3. Can be extended (this allows you to build one pattern by inheritance from another, e.g. Singleton inherits from StaticFactory)

"What did you use to give you an AST?"

I currently modify class files, not source, and use Javassist to modify them. Javassist gives a reflection style access to the program, like Class etc., but with added methods that allow modification.

Slava Pestov said...

"chris", I also disagree with the usefulness of such an implementation of equality in Java, but it doesn't mean there is anything wrong with ricky trying out new ideas. I understand in the corporate world, new ideas in programming languages are considered dangerous (look how long it took GC and OOP to catch on, and how much resistance there is to functional programming now) however not all of us aspire to mediocrity. I understand you don't have ambitions or goals beyond maintaining some pile of corporate IT spaghetti web app, however in the "real world" good programmers don't do this type of sweatshop work and instead try to solve interesting problems.

Ricky Clarkson said...

"however in the "real world" good programmers don't do this type of sweatshop work and instead try to solve interesting problems."

And sometimes mediocre programmers like me do it too.

Anonymous said...

Hello.

A quick link regarding your blog entry.

Safe Type Objects at coder's log

We also have articles on other topics pertaining to building large scale java applications as well as common configurations.

Blog Archive

About Me

A salsa dancing, DJing programmer from Manchester, England.