#376 Contract for equals and compare

brian Tue 21 Oct 2008

I haven't really formalized the contract for the equals and compare methods yet because I thought that eventually I might do something special with them. But now that I'm reworking comparison for value-types, I think we need to nail the specifications for these methods down.

At issue is how equals and compare should handle a null argument and argument of different type.

For example a common Java-styled way to implement equals is like this:

class Point
{
  Bool equals(Obj that)
  {
    if (pt isnot Point) return false
    pt := (Point)that
    return pt.x == this.x && pt.x == this.y
  }
}

This pattern correctly handles (and allows) both null and objects of different types. However it handles subclasses of Point incorrectly:

class Point3D : Point
{
  Bool equals(Obj that)
  {
    pt := that as Point3D
    if (pt == null) return false
    return pt.x == this.x && pt.x == this.y && pt.z == this.z
  }
}

In this case, passing an Point3D to Point.equals might return true, but not vise versa. That seems incorrect to me. In fact the Java specification for Object.equals requires equals to be symmetric and transitive, but looking thru many classes they all seem to incorrectly handle subclasses (since they just check instanceof).

So I think the specification for Fan should be that equals returns true only if the types are exactly the same. In which case the pattern should be:

Bool equals(Obj that)
{
  if (type != that?.type) return false
  pt := (Point)that
  return pt.x == this.x && pt.x == this.y
}

My proposed rules:

  1. the equal and compare methods must accept null
  2. if you use equals and compare on a null object you will get a NullErr (normal method call semantics apply unless we define stronger nullable rules)
  3. you can pass any object type to equals - but false must be returned if the types are not exactly the same
  4. you can pass any object type to compare, but if types are not an exact match you must return Obj.super.compare
  5. if you use one of the comparison operators == != < etc then the lhs can be null (plus can be optimized for non-nullable and value types)
  6. if you use one the comparison operators the compiler will do type checking (if it knows the two types are not comparable)

JohnDG Tue 21 Oct 2008

Bool equals(Obj that)

This is more or less the pattern in Java, but leads to large amounts of boilerplate. I would prefer the compiler insert the boilerplate, and require the user supply a signature of the following form:

Bool equals(MyClass that)

Then you can eliminate the need for a "contract" that will surely be violated.

Better still would be requiring a compare signature:

Bool compare(MyClass that)

Then you automatically implement equals as compare(that) == 0. No more contract violations between equals & compare (compareTo in Java).

brian Wed 22 Oct 2008

Then you automatically implement equals as compare(that) == 0.

I don't think you can really do that because many things have the notion of equality, but not comparison. I guess you should just always return -1 as not equal - but seems a bit weird.

prefer the compiler insert the boilerplate

I agree - it is two lines of boiler plate, and my original thought was do something like that. But because Fan doesn't allow overloading a method by parameter types, this would have to be some special case. Is this worth creating a special case in the type system?

alexlamsl Wed 22 Oct 2008

I guess you should just always return -1 as not equal - but seems a bit weird.

Hmmm.... sort()?

Bool compare(MyClass that)

How about Bool compare(This that)? Given that we have unique slot names and automatic casting, this one should be pretty straight forward...

jodastephen Wed 22 Oct 2008

The signature for equals should somehow guarantee that only comparisons to the correct type will compile. This eliminates a whole category of bugs (as a new language, we should target anything that old languages have in their bug inspection tools like FindBugs).

Given the boilerplate nature of much equals code, it makes sense to target it if possible. There has been a lot of literature written about this IIRC.

One concept is to have a default implementation that is auto-generated. How about causing that to be generated from a keyword attached to the fields?

class ColouredPoint {
  equals Int x = 0;
  equals Int y = 0;
  Colour? col;
}

where we don't care about the colour for equals() purposes. (ie. this is a declarative approach to equals)

The opposite way to write this would be to declare the fields that you don't want in the equals, but that interferes with the default for equals() which should be identity comparison.

Users should still be able to write their own equals() method, but if they do then they get no auto-generation assistance.

Comparisons could be done similarly, but you'd have to specify an order:

class ColouredPoint {
  compare(1) Int x = 0;
  compare(2) Int y = 0;
  Colour? col;
}

This is a lot more messy, and thus it might be more appropriate to force the user to write compare methods. This fits with comparison being optional.

katox Wed 22 Oct 2008

I agree that something like Stephen's suggestion covers 95% of equals/compare methods. I like this declarative approach since it is very refactoring friendly (unlike Java contracts) and it keeps equals and compare consistent (this is very hard to achieve in a boilerplate code that noone actually reads).

I'm a bit unusure about the equals and subclasses. What if we had a list of some objects and we wanted to check if some other object is already in there (by equals) - if the object was actually a subclassed instance the response would be always false. This sounds weird.

Having == asymetrical is also counter-intiutive. Point x is equal to Point3D y because the 2D part is the same - but that holds only for x.equals(y). To achieve a symetric relation we would have to use equals from the most specialized common parent class. That is certainly doable for == but again quite counter-intuitive for y.equals(x) becase ((Point)y).equals(x) would be called.

helium Wed 22 Oct 2008

You'd need multimethods to get equals right:

class Point2D { ... }
class Point3D : Point2D { ... }

generic Bool equals(Point2D a, Point2D b);

Bool equals(Point2D a, Point2D b)
{
   return a.x == b.x && a.y == b.y
}

Bool equals(Point3D a, Point2D b)
{
   return a.z == 0 && a.x == b.x && a.y == b.y
}

Bool equals(Point2D a, Point3D b)
{
   return equals(b, a)
}

Damn, there should be a shortcut syntax for functions that return a single value. Something like

Return name(Arg arg) = expr;

meaning

Return name(Arg arg) 
{
   return expr
}

So the example becomes

generic Bool equals(Point2D a, Point2D b);

Bool equals(Point2D a, Point2D b) = a.x == b.x && a.y == b.y;

Bool equals(Point3D a, Point2D b) = a.z == 0 && a.x == b.x && a.y == b.y;

Bool equals(Point2D a, Point3D b) = equals(b, a)

But I don't think that would make it into Fan, right?

JohnDG Wed 22 Oct 2008

I don't think you can really do that because many things have the notion of equality, but not comparison. I guess you should just always return -1 as not equal - but seems a bit weird.

It's weird but it's also DRY. It has the side-effect of forcing an ordering on objects that don't really have a good notion of order.

I'm not sure if you can return -1 for inequality because then two different objects are less than each other -- which might trip up some sorting algorithms (though you really shouldn't be sorting unsortable objects to begin with).

In Java code, I return hashCode() - that.hashCode().

I agree - it is two lines of boiler plate, and my original thought was do something like that. But because Fan doesn't allow overloading a method by parameter types, this would have to be some special case. Is this worth creating a special case in the type system?

I don't think so. Not just for equals. However, you can at least automatically insert the checks for type and null, so that inside the equals method, the user may cast to the target type. This makes equals a special case, unlike the other operator methods, but if you don't treat it specially in some way, then you're going to have contract violations and lots of boilerplate.

As for the other discussion on this thread, I don't really like the idea of adding new keywords for equality, since the majority of classes don't need or care about equality. I do, however, like the idea of a compiler-supplied default for equals, which calls the equals methods of all non-null fields (the non-null fields are generally the ones that are important for equality testing). This nicely handles all cases where data is stored in normal form.

The symmetry requirement is hard to fulfill because if you fulfill it in the trivial way (classes must be exactly the same), it's not very useful in open type systems (very many frameworks, such as Harmony, are incompatible with the symmetry requirement). If you don't fulfill it, then code becomes fragile and bugs creep in.

Consider the following case:

A -> B -> C
A -> D -> F

In my view, C is "equal" to F iff the following predicate holds: C.compare(F) == 0 || F.compare(C) == 0. This choice of predicate guarantees symmetry, even in open type systems where subclasses have no knowledge of the classes that derive them.

More generally, I think I would be in favor of making most of (all?) the operators magical. If you use an operator, you get a symmetrical version; e.g.: a < b is translated to a.compare(b) < 0 || b.compare(a) > 0. Then we get the benefit of guaranteed symmetry even in the presence of open types.

This lets you say, for example, that 5 < 5.1, point2d == point3d (if Point3D decides, for example, that z = 0 corresponds to a Point2D point), and lots of other nice relations that are currently too dangerous to express in Java/C#.

brian Wed 22 Oct 2008

You'd need multimethods to get equals right

Good idea Helium, but I don't think multimethods are in Fan's future :)

Damn, there should be a shortcut syntax for functions that return a single value. Something like

Technically to be wholly consistent with blocks, we should allow methods to be declared with a single statement without curly braces. Although I suspect that might cause problems with IDEs and AST repair algorithms. I definitely love that style in functional languages, but not sure it is for Fan.

It's weird but it's also DRY. It has the side-effect of forcing an ordering on objects that don't really have a good notion of order.

Well I do like the notion of just having a compare instead of an equals method. It would indeed be DRY. The key is deciding how to enforce ordering when it doesn't make sense. Using hash codes actually seems like a pretty interesting idea.

What do other people think of making equals route to compare? Maybe even make equals final?

And the fundamental issue - yes we like the idea of having the compiler generate boiler plate code for the type check. The idea of having the special keywords for fields seems neat, but seems little too much of a special case. However, having the compiler ensure the correct type is something I'd like to figure out, just not sure how.

In general compare and equals are the prototypical contra-variant case where you'd like to use This. However a real contra-variant solution would be to insert a cast and have the method throw CastErr if called with the wrong type. Maybe that is what compare should do, but that isn't what equals should do - it should gracefully handle that and return false. But I suspect the right solution probably involves using the This type and some sort of special contra-variance.

helium Wed 22 Oct 2008

What about some very limited form of multi methods:

class Base {}
class Derived : Base {}

class Foo {
   void method(Base x is Derived)
   {
      echo("Called with Derived")
   }
   void method(Base x)
   {
      echo("Fallback case")
   }
}

The compiler simply merges all these multi methods into one method with simple type tests.

class Foo {
   void method(Base x)
   {
      if (x is Derived)
         echo("Called with Derived")
      else
         echo("Fallback case")
   }
}

This still works with subclassing

class FurtherDerived : Derived {}

class Bar : Foo {
   void method(Base x is FurtherDerived)
   {
      echo("Called with FurtherDerived")
   }
}

The compiler would translate that to

class Bar : Foo {
   void method(Base x)
   {
      if (x is FurtherDerived)
         echo("Called with FurtherDerived")
      else
         base.method(x)
   }
}

Using this you can write equals like this:

Bool equals(Obj that is This)
{
   return this.whatever = that.whatever
}

If the other object has not the same type, base.equals will be called.

But that still wouldn't handle the Point2D == Point3D problem right. Here the pseudo code that shows why it wouldn't work:

class Point2D { 
   Bool equals(Obj that is Point2D)
   { return x == that.x && y == that.y }
}
class Point3D : Point2D { 
   Bool equals(Obj that is Point3D)
   { return x == that.x && y == that.y && z == that.z }
   Bool equals(Obj that is Point2D)
   { return x == that.x && y == that.y && z == 0 }   
}

p2D := Point2D()
p3D := Point3D()

p3D.equals(p2d)  // works
p2D.equals(p3d)  // doesn't work

You'd need to extend my Multimethods with something like isExactly, so Point2D's equals doesn't deliver flasely true.

Than you could use JohnDG's

p2D.equals(p3d) || p3D.equals(p2d)

jodastephen Wed 22 Oct 2008

Maybe that is what compare should do, but that isn't what equals should do - it should gracefully handle that and return false

I disagree. Under what circumstances is it useful for an equals or compare check to an incompatible type to even compile? (Note, I agree that equals should not throw a runtime error if an incompatible type is passed in - I just don't think that it should ever get as far as compiling.)

The points about equals() delegating to compare() may be worth considering. Unless we do this, there will be a discrepancy in the operators where == and != will clash with <, >, <= and >=. This would be pretty hard to explain.

However, delegating to compare doesn't make sense for the vast majority of classes, as they simply don't have an order.

How about a Mixin for comparability. This would specify the equals() and compare() methods, where equals() is final and delegates to the abstract compare(). Thus, you can implement equals() alone (< and > operators don't work) or compare() alone (and == delegates to compare). You can't implement both equals() and compare().

jodastephen Wed 22 Oct 2008

For subclasses, I think that equals and compare should compare at the level of the lowest common class. This way, your subclasses cannot mess up your equals/compare implementation. This probably requires the generation of code by the compiler to check the input type and call the superclass:

// written by the user
class Point2D {
  Bool equals(This other) {
    return x == other.x && y == other.y
  }
}
// generated by the compiler:
Bool equals(Obj that) {
  if (that isnot Point2D) {
    super.equals(that)
  }
  other := that as Point2D
  return x == other.x && y == other.y
}

Subclasses then naturally follow:

// written by the user
class Point3D {
  Bool equals(This other) {
    return x == other.x && y == other.y && z == other.z
  }
}
// generated by the compiler:
Bool equals(Obj that) {
  if (that isnot Point3D) {
    super.equals(that)
  }
  other := that as Point3D
  return x == other.x && y == other.y && z == other.z
}

Thus, comparing a Point2D to a Point3D either way around compares using the logic of Point2D.

I believe that this pattern with one parameter of This could be a general one, not specific to this use case.

tompalmer Wed 22 Oct 2008

Thus, comparing a Point2D to a Point3D either way around compares using the logic of Point2D.

But that seems so wrong. I like either the rule (enforced or not) that they should be of exactly the same type, or just live with asymmetry.

helium Thu 23 Oct 2008

Thus, comparing a Point2D to a Point3D either way around compares using the logic of Point2D.

But that seems so wrong.

It does because I made a HUGE mistake! Why nobody told me that I talk total bullshit. Point3D <: Point2D makes no sense. A 2D point is a special case of a 3D point not the other way around (a 2D point is a 3D point with z = 0). You can't substitude a 3D point everywhere a 2D point is expected.

If you do this correctly everything works just fine.

jodastephen Thu 23 Oct 2008

Just so the debate isn't confused, I'm proposing that given two objects A and B, the comparison is performed using the logic written in their lowest common class.

This really makes a lot of sense once you start thinking about how real subclass relationships should work (after all inheritance is one of the weaker OO ideas).

Consider how you might want to sort a list of Foo, where one entry is a SubFoo. You want the comparison to be fully consistent no matter what order the comparisons are done. This really implies that you are comparing based on the Foo-ness of an object, and that the SubFoo-ness doesn't matter.

helium Thu 23 Oct 2008

The Liskov substitution principle.

To make that clear: I now agree with jodastephen. It was just because of that wrong 2D and 3D point example. Now my limited multimethods work perfektly for equals even without that isExactly rubbish I suggested.

Wait ... I just notice this 2D/3D point example didn't came from me, so it's not completly my fault :)

brian Thu 23 Oct 2008

I disagree. Under what circumstances is it useful for an equals or compare check to an incompatible type to even compile?

It might not make sense in an expression like x == y - and in fact the compiler will type check that already. But there are lots of cases where equals is used more dynamically. For example maybe the most important case is dealing with lists of Objects. Calling something like list.findIndex is going to use equals and expect equals to handle mismatched types. Since equals defaults to === this actually works out pretty well.

There are many good ideas, so let me try to summarize (let me know if I missing any):

  • Heliums idea for limited multi-methods is really interesting; ideally we want to turn whatever we do for equals/compare into a general purpose feature; in the end if we do any contra-variance the compiler will indeed be generating something like that
  • Stephan's idea to walk up the class hierarchy until we find the common class actually makes a lot of sense to me. Tom disagrees with those semantics. So key question: given two disparate types are they automatically unequal or equal based on common type's rules?
  • I think all of the discussion comes back to allowing the This type to be used some sort of contra-variant/multi-method parameter.

So we really have two discussions going on:

  1. What really should be the logical semantics for equals and compare when given disparate types. Options are:
    1. Throw exception
    2. Automatically assume they are unequal (for compare this is harder)
    3. Route to common base class implementation
  2. What is the right syntax and language features to avoid generating boiler plate type comparisons

JohnDG Thu 23 Oct 2008

Stephan's idea to walk up the class hierarchy until we find the common class actually makes a lot of sense to me.

This is completely bogus. It's bogus because of the "open types" problem. A rectangle and a polygon are both a Shape (and nothing else), but Shape doesn't have enough information to determine if two shapes are the same. It's up to Polygon or Rectangle or some combination of them to determine if a rectangular polygon is equivalent (for some purpose) to a rectangle (for example, in managing a dirty rectangle list).

Similarly, ComputerAddress might be the common ancestor of URL and JXTAChannel, but ComputerAddress would never be able to determine if two fundamentally different types of communication represent the same physical computer.

In general, superclasses do not have enough information to determine a good notion of equality for subclasses. So going with equals for the common superclass will result in a useless definition of equality.

Now here's my proposal for equality and comparison. First, compare has a default definition:

Int compare(Obj that) {
    return this.hashCode - that.hashCode
}

(More ambitious would be defining equality in terms of non-null fields equality and using the hash code only for sorting in non-equal cases, but the above is more compatible with the existing notion of equality.)

Second, equals is final and is defined in terms of compare:

Bool equals(Obj that) {
    if (that == null) return false
    return compare(that) == 0
}

This ensures no divergence is possible, that no information is repeated and no contracts are violated. Nice and dry.

When a user wants to define their own notion of equality, they do so as follows:

override Int compare(Obj obj) {
    This that = obj;

    if (this.x == that.x && this.y == that.y) return true;

    return super.compare(obj)
}

Nice and simple, and most importantly, no boilerplate aside from the casting (which is necessary because of no contra-variance) and the return of the superclass (which handles sorting if the user doesn't care about order).

Finally, one more change should be made. For all relational operators -- that is, for all operators defined in terms of the compare method (<, >, <=, >=, !=, and ==), a symmetric definition should be used:

<    =>    a.compare(b) < 0  || b.compare(a) > 0
>    =>    a.compare(b) > 0  || b.compare(a) < 0
<=   =>    a.compare(b) <= 0 || b.compare(a) >= 0
>=   =>    a.compare(b) >= 0 || b.compare(a) <= 0
!=   =>    a.compare(b) != 0 && b.compare(a) != 0
==   =>    a.compare(b) == 0 || b.compare(a) == 0

These definitions are consistent, symmetrical, and solve the open type problem.

helium Thu 23 Oct 2008

I think you didn't mean return true but rather return 0.

brian Thu 23 Oct 2008

John,

You make a good point about using the common base class (although in some sense this is how much Java code works today). So that leads back the semantic rule that equals returns false if lhs.type != rhs.type.

I like the basic strategy of what you are proposing for collapsing everything down to compare. However we do need to handle the case where there is a hash code collision - that shouldn't result in zero. Maybe the default algorithm is something like this:

Int compare(Obj? that)
{
  if (this === that) return 0
  hash := this.hash - that.hash
  return hash === 0 ? -1 : hash
}

I'm not sure I fully understand or like two method calls on comparison. I think that can be handled by equals and compare overrides returning false/super if the types are different.

helium Thu 23 Oct 2008

He thinks about this:

Assume you have two classes one is a subclass of the other:

class Base {}
class Derived : Base {}

Base doesn't know about Derived so it's compare method is possibly wrong when you compare Base objects with Derived objects. But as Derived does know about Base, the author can implement a correct compare method.

Let's assume two instances

base := Base()
derived := Derived()

base.equals(derived) might return false because it lacks the knowledge of how to correctly compare a Base with a Derived (not because the result actually should be false). In that case base.equals(derived)||derived.equals(base) would work.


But then imagine the the other case: base.equals(derived) returns true where it realy should return false because of lacking knowledge. In that case base.equals(derived)||derived.equals(base) doesn't help.


This is all very abstract and we'd need some real-world examples to show whether this is realy a problem or if this all just hypothetical.

jodastephen Fri 24 Oct 2008

It shouldn't surprise us that this debate is complex as it keeps academics amused. Fan is a pragmatic language, and needs a pragmatic solution. One point we should bear in mind is that inheritance is the weakest OO feature - big inheritance hierarchies are a bad sign.

Firstly, I don't buy the compare forwards and backwards approach of JohnDG. This feels counter-intuitive and potentially very slow for the vast majority of cases.

Secondly, I dislike the hash code based comparison approach intensely. Hash codes are an internal artifact, effectively random numbers. We should not be exposing a whole new behaviour based on such a flaky foundation - there is a whole class of bugs waiting there. In addition, ordered comparison is rarely needed on a class, and should be an optional feature.

A rectangle and a polygon are both a Shape (and nothing else), but Shape doesn't have enough information to determine if two shapes are the same. It's up to Polygon or Rectangle or some combination of them to determine if a rectangular polygon is equivalent (for some purpose) to a rectangle (for example, in managing a dirty rectangle list).

If you want to model this using inheritance then why not make Rectangle a subclass of Polygon - it is a specialisation. My opinion is that most of the simple textbook examples of inheritance won't help us here.

Similarly, ComputerAddress might be the common ancestor of URL and JXTAChannel, but ComputerAddress would never be able to determine if two fundamentally different types of communication represent the same physical computer.

Neither should the equals() method. This is way too hard a problem for equals/compare methods to be tackling. An equals/compare method should be simple. Really simple. And just derived from the state of an object. In this case, I'd suggest a dedicated closure or class to be used to compare these two types.

And there is a simple remedy for a superclass to define the values to check of a subclass. The superclass can define abstract methods that obtain the value required for the comparison from each subclass.

Thus, I posit that in the vast majority of cases, there is no need to define equals/compare at more than one point in the hierarchy (in fact Java blocks compare at two points these days using generics).

Finally, as you might tell, I'm no longer a big fan of inheritance. This is one reason why I hope Fan will end up including something to make composition easier.

jodastephen Fri 24 Oct 2008

BTW, Joshua Bloch says of Java - "There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract". His solution - use composition.

brian Tue 28 Oct 2008

I think we all agree that identity is fundamentally a really hard problem. I've given this some thought, and I don't think I'm really to tackle any substantial changes to how equals and compare work right now. I'd love to spend some time really trying to solve this, but I think some of these other things like interop and with-blocks/ctors are higher priority.

So all I'm going to do right now is fix the code such that equals is guaranteed to handle and return false if this.type != that.type. That seems safer than what I'm doing right now using the is operator.

For compare I'm thinking the safest thing do is throw CastErr if this.type != that.type. Anything else that might result in the use of multiple comparison algorithms would end up with weird sorting. The result of that change would be that if you want to sort a list of heterogeneous types you would be required to pass in a sort closure.

Login or Signup to reply.