#392 Quick composites

jodastephen Thu 13 Nov 2008

I blogged recently about composite classes. This is related to previous discussions around equals/compare/wrapper classes. I found the blog comments re other languages very interesting.

The basic blog is about creating a type safe version of a string identifier. Java has lots of code:

public final class PoolId {
  private final String poolName;

  public PoolId(String poolName) {
    if (poolName == null) {
      throw new IllegalArgumentException();
    }
    this.poolName = poolName;
  }
  public String getPoolName() {
    return poolName;
  }
  public boolean equals(Object obj) {
    if (obj == this) {
      return true;
    }
    if (obj instanceof PoolId == false) {
      return false;
    }
    PoolId other = (PoolId) obj;
    return poolName.equals(other.poolName);
  }
  public int hashCode() {
    return poolName.hashCode();
  }
  public String toString() {
    return poolName;
  }
}

Ruby has this (no non-null checks):

PoolId=Struct.new(:poolName) 

Scala has this (no non-null checks):

case class PoolId(poolName:String, configName:String)

Groovy has this (no non-null checks):

@Immutable
final class PoolId { String poolName, configName } 

Currently, Fan has this:

class PoolId {
  Str poolName
  Str configName
  PoolId(Str poolName, Str configName) {
    this.poolName = poolName;
    this.configName = configName;
  }
  Bool equals(Obj that) {
    if (type != that?.type) return false
    other:= (PoolId ) that
    return other.poolName == this.poolName && other.configName == this.configName
  }
  Int hashCode() {
    return poolName.hashCode() + configName.hashCode()
  }
}

Put simply, I don't think Fan is currently competitive for producing simple value objects. Given Fan's focus on immutability, we need to be able to produce meaningful little classes quickly and easily. My suggested syntax was:

class PoolId {
  state Str poolName
  state Str configName
}  

which generates the constructor/equals/hashcode and maybe the compare/toString.
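
For comparison, a hand-written Java equivalent of the two-field class gives a rough idea of what the two state declarations would have to generate. This is only a sketch: the null checks, the accessor names and the toString format are assumptions, not part of the proposal.

```java
import java.util.Objects;

// Hand-written equivalent of what two `state` fields might expand to.
final class PoolId {
    private final String poolName;
    private final String configName;

    PoolId(String poolName, String configName) {
        // Null checks are an assumption; the proposal does not say whether they are generated.
        this.poolName = Objects.requireNonNull(poolName, "poolName");
        this.configName = Objects.requireNonNull(configName, "configName");
    }

    String getPoolName() { return poolName; }
    String getConfigName() { return configName; }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) return true;
        if (!(obj instanceof PoolId)) return false;
        PoolId other = (PoolId) obj;
        return poolName.equals(other.poolName) && configName.equals(other.configName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(poolName, configName);
    }

    @Override
    public String toString() {
        // Illustrative format only.
        return "PoolId[" + poolName + ", " + configName + "]";
    }
}
```

Every line of it is derived mechanically from the two field declarations, which is the argument for letting the compiler write it.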

helium Thu 13 Nov 2008

Why do the Java and Ruby versions only have poolName, while all other versions have an additional configName?


Anyway, in Haskell you could write (including non-null checks done completely at compile time):

data PoolId = PoolId { poolName :: String, configName :: String }
   deriving (Eq, Show)

deriving tells the compiler to automatically create instances for the listed type classes (you don't have to know what type classes are to get my point). You can compare it (Eq) and convert it to a string, as with the toString method (Show). If you want ordering you could add Ord. ...

So you can specify what should be automatically implemented. I don't want the compiler to magically add compare as many things don't have a natural order.


The name state is IMO irritating when used for immutable objects, and you explicitly mentioned "Fan's focus on immutability". (BTW, I think Fan isn't focused on immutability at all. A language that is focused on immutability would default to immutability and not to mutability.)

andy Thu 13 Nov 2008

That seems more like a Type concept than a field concept, so I would prefer something along these lines:

@someFacet class PoolId {
  Str poolName
  Str configName
  @transient Str notUsed   // not used for equals/hash
}

Might be nice to use a Mixin there:

class PoolId : Simple { ... }

But that doesn't work unless the compiler injected the router calls for equals/hash.

jodastephen Thu 13 Nov 2008

@helium, the Java version only has poolName because I copied it from my blog, and couldn't be bothered to type all the additional verbose rubbish required to add configName.

I agree that compare shouldn't be autogenerated as not everything has an order.

The Haskell code looks interesting, as do Andy's ideas to inherit behaviour that can reference the fields of the subclass.

The idea that this is type based rather than field based brings me back to another old question - what is so special about enums that they deserve such special treatment? Are they perhaps a special case of a bigger concept?

brian Thu 13 Nov 2008

I've been giving it some thought since you wrote your blog article. There are actually a couple different things in there:

  1. auto-generation for constructor field setters
  2. auto-generation of equals/hash
  3. how to easily create these little helper classes

Although they are indeed inter-related, I do think they can all be examined orthogonally too.

Figuring out a pattern/syntax for constructor/setters is something I think we can do independently and would be nice. It isn't as pressing since often you use with-blocks, but still seems to occur often enough to think about a nicer solution.

The equals/hash seems nice for larger classes. Although in the kind of little class you blogged about, I rarely care about equals or hash. Typically they are more of an internal helper class.

But getting back to the real meat of the problem you describe - I think what you really want is not composite classes, but first class tuples. Often in Java/C#/Fan those little classes are really just a workaround for the lack of first class tuples. This is very similar to the reason why first class function types remove much of the need for the small interface classes used in Java. So I'd say the ideal solution is just something like:

(Str, Str)              // unnamed
(Str pool, Str config)  // named

We had a discussion about tuples before. That discussion touched on anonymous classes and structural typing.

alexlamsl Thu 13 Nov 2008

Tuples feel more natural to me than those auto-generated classes - less magical, if you see what I mean.

In fact, I think Perl has some pretty good examples of how tuples can be pretty powerful - for instance, multiple return values.

jodastephen Fri 14 Nov 2008

I'd say that tuples were definitely not what I was thinking of when writing the blog. The key aspect that is required is that it is a real class, with a real name, that can be extended with real behaviour later without any big refactoring.

In this sense, it's similar to an enum in Java, where methods can be added later. Can't remember if Fan enums allow that.

This should also link into a more in depth look at classes formed by composition (eg. JSR-310 where a class like OffsetDate is a composite of LocalDate and ZoneOffset).

BTW, for my example (PoolId) the equals/hash were very important, as the PoolId is used as the key in a map.
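
To illustrate why equals/hash matter for a map key, here is a minimal Java sketch. The class names RawPoolId, ValuePoolId and MapKeyDemo are made up for the example. A key type without value-based equals/hashCode falls back to reference identity, so a freshly built key never finds the stored entry.

```java
import java.util.HashMap;
import java.util.Map;

// Key type WITHOUT equals/hashCode: inherits reference identity from Object.
final class RawPoolId {
    final String poolName;
    RawPoolId(String poolName) { this.poolName = poolName; }
}

// Key type WITH value-based equals/hashCode.
final class ValuePoolId {
    final String poolName;
    ValuePoolId(String poolName) { this.poolName = poolName; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof ValuePoolId && poolName.equals(((ValuePoolId) obj).poolName);
    }

    @Override
    public int hashCode() { return poolName.hashCode(); }
}

final class MapKeyDemo {
    private MapKeyDemo() {}

    // Looks up a new, field-equal key; fails because the two keys are different objects.
    static boolean rawLookupWorks() {
        Map<RawPoolId, String> pools = new HashMap<>();
        pools.put(new RawPoolId("main"), "connection pool");
        return pools.containsKey(new RawPoolId("main"));
    }

    // Same lookup with the value-based key; succeeds because equals/hashCode compare poolName.
    static boolean valueLookupWorks() {
        Map<ValuePoolId, String> pools = new HashMap<>();
        pools.put(new ValuePoolId("main"), "connection pool");
        return pools.containsKey(new ValuePoolId("main"));
    }
}
```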

I agree with your analysis of the three aspects to the problem though.

Finally, I'm not opposed to tuples, but have never actually found the lack of them in Java to be a major issue for me.

helium Fri 14 Nov 2008

Tuples don't have named fields; those things are called records.

Generating useful versions of methods for a class is not the same thing as having records and/or tuples in the language.

@jodastephen:

> Finally, I'm not opposed to tuples, but have never actually found the lack of them in Java to be a major issue for me.

Were you ever used to using them?

> what is so special about enums that they deserve such special treatment? Are they perhaps a special case of a bigger concept?

They are a simple way to define some related constants, but they have some similarity to (pretty castrated) algebraic datatypes.

katox Fri 14 Nov 2008

I agree that simplification of simple composition classes would be very nice. I always find it a major annoyance in Java to write huge boilerplate just to temporarily put some values together (internally).

I'd be pretty opposed to some kind of magic autogeneration of code. helium's idea sounds much more reasonable and also more flexible.

It is also not that uncommon that you want to make a simple wrapper class just to alter the behaviour of some algorithm (mostly TreeMap or TreeSet) - this should be easy (but it is really overly verbose and error prone in Java).

> Finally, I'm not opposed to tuples, but have never actually found the lack of them in Java to be a major issue for me.

But it is quite common. See bugs.sun.com

I think allowing tuples is a very logical requirement in a language that uses f(a, b, c) syntax. You don't have to put a, b and c into a common class in order to pass them to a function, so why should the compiler force you to create a single return object? The argument that having a tuple (or MRV, as a crippled tuple) as a return type is not true OO is bullshit.

Having MRV is just a half step (though it is the thing that is mostly needed). But restricting algebra just because users might misuse the feature mostly leads to a terrible and more complicated design later.

The other thing is whether to allow named records instead of tuples. That might work (and could be useful to avoid Fan's current usage of f([A:a, B:b, C:c]) untyped expressions) but it sort-of collides with the class approach...

@jodastephen: Java enums are just a plain disaster. Most of our code had to drop enums and use public static final again for various reasons. What is the point of having functions around constants when you can't use the constants in constant expressions at all? Ugh...

jodastephen Fri 14 Nov 2008

Perhaps we are at a point where the diverging forces pull Fan in one direction or another. I don't believe Fan is a functional language. I see it as an OO language that takes the best ideas and what we've learnt from Java and C#.

Multiple return values have always come up as a solution looking for a problem in my world. Values can be returned as a map, array, or dedicated little class. But allowing tuples opens a door that many, many developers will abuse like nobody's business. Like all language design, it's about the tradeoff, and I think tuples are very rarely needed, and very easily abused. Hence we should leave them out.

This thread shouldn't be interpreted as just being about simple one or two field classes. We should consider easy creation of larger domain objects as a goal. The fields of Fan greatly help this over Java, but we need some way to get equals/hashCode/constructor behaviours too. The examples I started with demonstrate this - for simple classes I should not need to write an equals method.

brian Sat 15 Nov 2008

> tuples are very rarely needed, and very easily abused. Hence we should leave them out.

I'm not sure I agree that they are rarely needed. I use them a lot in Python. Then again I don't miss them hugely per se in Fan - that might be because most of my work is in public APIs, which tend toward more formal interfaces. My gut is that tuples aren't really right for Fan.

However, I do really want something to create "structures" on the fly more easily (this will probably turn out to be a big topic for me in Dec). And I don't see that as a static typing problem, but rather a dynamic one.

> This thread shouldn't be interpreted as just being about simple one or two field classes.

I think one of the problems with this thread is that we don't really have a very clear problem statement. I come back to my three issues:

  1. easier constructors - no real proposals for that yet
  2. equals/hash support - this seems like a nice to do, but not something I run across very often; seems like a case the existing OO mechanisms could handle with a simple mixin:
    mixin Struct
    {
      override Int hash()
      {
        h := 33
        type.fields.each |Field f| { v := f.get(this); if (v != null) h ^= v.hash }
        return h
      }  
    
      override Bool equals(Obj? that)
      { 
        if (this.type != that?.type) return false
        return type.fields.all |Field f->Bool| { return f.get(this) == f.get(that) } 
      }
    }   
  3. smaller classes - other than tuples, isn't this really just solving 1 and 2?
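
For readers more at home in Java, the reflection-based idea in item 2 can be sketched roughly as follows. This is an illustration under assumptions, not a port of the Fan mixin: ReflectiveStruct is a hypothetical helper name, and a static helper is used because a Java interface default method is not allowed to override equals or hashCode. Only fields declared directly on the class are considered.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Hypothetical helper; the names are illustrative and not part of any library.
final class ReflectiveStruct {
    private ReflectiveStruct() {}

    // XOR the hashes of all non-null instance fields, mirroring the mixin's hash.
    static int hashByFields(Object self) {
        int h = 33;
        for (Field f : self.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            Object v = read(f, self);
            if (v != null) h ^= v.hashCode();
        }
        return h;
    }

    // Equal when both objects have exactly the same class and every instance field is equal.
    static boolean equalsByFields(Object self, Object that) {
        if (that == null || self.getClass() != that.getClass()) return false;
        for (Field f : self.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            if (!Objects.equals(read(f, self), read(f, that))) return false;
        }
        return true;
    }

    private static Object read(Field f, Object target) {
        try {
            f.setAccessible(true); // allow reading private fields
            return f.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Example value class that delegates to the helper.
final class PoolId {
    private final String poolName;
    private final String configName;

    PoolId(String poolName, String configName) {
        this.poolName = poolName;
        this.configName = configName;
    }

    @Override public boolean equals(Object obj) { return ReflectiveStruct.equalsByFields(this, obj); }
    @Override public int hashCode() { return ReflectiveStruct.hashByFields(this); }
}
```

The class still has to write two one-line overrides, and every call walks the field list reflectively.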

jodastephen Mon 17 Nov 2008

Whilst your mixin would work, it would also be slow. Essentially you are using reflection for equals and hash which isn't pretty.

This is a common case IMO. Most of the time that you want to write an equals method it will be derived from some or all of the fields of the class. It's pretty rare that it is anything else. So, the common case should be made easy.

Finally, how easy would it be to allow a mixin (or similar) to generate code (fcode or source-code) in the subclass? This would solve the equals/hash and constructor case I suspect.

brian Mon 17 Nov 2008

> Whilst your mixin would work, it would also be slow. Essentially you are using reflection for equals and hash which isn't pretty.

Reflection is a pretty common technique in Fan (especially with dynamic calls). Once we move reflection to method handles, then theoretically performance should be pretty close to static calls. So I agree we should consider reflection performance, but I don't think we should discount elegant solutions which are reflection/dynamic based.

> Finally, how easy would it be to allow a mixin (or similar) to generate code (fcode or source-code) in the subclass?

Not sure what you are asking - are you asking about some type of code generation macros?

jodastephen Mon 17 Nov 2008

We don't have method handles yet, so reflection is slower. It also generates a lot of classes in the background. I'm arguing for a neater solution.

(I think what we are actually disputing is how common this use case is. I contend that over 90% of equals/hash are derived directly from the state of an object. Is there an easy way to examine the Fan source code to test that theory?)

> Finally, how easy would it be to allow a mixin (or similar) to generate code (fcode or source-code) in the subclass?

> Not sure what you are asking - are you asking about some type of code generation macros?

Thinking of some kind of plugin to the compiler that can generate the code in the subclass. It might be a more generic language change - for example, a generator/plugin/mixin could add log entry/exit to subclasses - like AOP, but integrated into the source code rather than declared outside using a separate language.

brian Tue 18 Nov 2008

> Is there an easy way to examine the Fan source code to test that theory?

I wrote a quick script to count how many classes override equals, then looked thru the code to see if equals was a straight test of each field:

  • total classes: 1499
  • total equals overrides: 33 (2% of total)
  • mapped to all fields: 10 (0.7% of total, 30% of equals)
  • mapped to some fields: could probably make just about all of them work that way

I excluded tests from the total and primitives from equals count. So empirical evidence would suggest that mapping equals/hash to fields probably would work most of the time. But then again it only comes up in 2% of classes. Based on looking at the code, it didn't seem there was a lot to gain by auto-generating equals/hash. The strongest cases were in fwt for things like Pen, Font, etc which are most like structures.

> Thinking of some kind of plugin to the compiler that can generate the code in the subclass.

It definitely makes me think of macros. I think something like that could be powerful and used for good. But also seems like the kind of thing that could easily be abused.

helium Tue 18 Nov 2008

How would it work with method handles? Who would create your list of methods handles that you could call in equals/hash?

For macros you might want to look at Nemerle.

brian Tue 18 Nov 2008

> How would it work with method handles? Who would create your list of methods handles that you could call in equals/hash?

I haven't quite figured out how we'll use method handles. But the issue is that most of the reflection overhead is in packaging the arguments up into an Object[]. Method handles let us call the method without that Object[] boxing. In the case of trap we can push that all the way back to the original call site. For reflection we'd want to do something similar (at the call site). It is probably something to figure out sooner rather than later because it probably affects the specification of how trap works.
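
The difference between the two call styles can be sketched in Java, assuming the method handles meant here are the JVM's java.lang.invoke API. CallStyles and its method names are made up for the illustration, and nothing here says how Fan would actually wire this up.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Illustrative only: calls String.indexOf(String) in two different ways.
final class CallStyles {
    private CallStyles() {}

    // Classic reflection: the argument is packed into an Object[] (via varargs)
    // and the int result comes back boxed as an Object.
    static int viaReflection(String text, String needle) {
        try {
            Method m = String.class.getMethod("indexOf", String.class);
            return (Integer) m.invoke(text, needle);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Method handle: the call site is statically typed as (String, String) -> int,
    // so invokeExact passes the arguments directly without an Object[].
    static int viaMethodHandle(String text, String needle) {
        try {
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    String.class, "indexOf", MethodType.methodType(int.class, String.class));
            return (int) mh.invokeExact(text, needle);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Both calls return the same result. The sketch shows only the shape of the call site, not any performance measurement.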

helium Tue 18 Nov 2008

I withdraw my question. I was thinking in the wrong direction.

helium Tue 18 Nov 2008

Can't you use LINQ expression trees somehow, at least on the .Net side? See "LINQ Expressions as Fast Reflection Invoke".

jodastephen Wed 19 Nov 2008

> I wrote a quick script to count how many classes override equals, then looked thru the code to see if equals was a straight test of each field:
>
>   • total classes: 1499
>   • total equals overrides: 33 (2% of total)
>   • mapped to all fields: 10 (0.7% of total, 30% of equals)
>   • mapped to some fields: probably make just about all work that way

These figures look roughly like what might be expected for system-level code, i.e. system-level code will naturally have a low rate of equals/hash. Domain objects tend to have more hand-coded equals/hash. The interesting figure would be the possible 100% hit rate for equals derived from fields.

I'll try and do something similar for my work source base, although in Java it's not that easy to work it out I suspect.
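
A rough Java sketch of the per-class check is below. EqualsCounter is a hypothetical name, and the sketch deliberately takes the list of classes as input, since enumerating every class in a code base is the awkward part in Java. It only detects that equals(Object) is declared on the class; whether that equals is derived from the fields still needs a manual look.

```java
import java.util.List;

// Hypothetical helper for counting classes that declare their own equals(Object).
final class EqualsCounter {
    private EqualsCounter() {}

    // True if the class itself declares equals(Object) rather than inheriting it.
    // Note: Object.class itself also returns true, so callers should exclude it.
    static boolean declaresEquals(Class<?> type) {
        try {
            type.getDeclaredMethod("equals", Object.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Counts how many of the given classes declare their own equals(Object).
    static long countDeclaringEquals(List<Class<?>> types) {
        return types.stream().filter(EqualsCounter::declaresEquals).count();
    }
}
```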

helium Wed 19 Nov 2008

Just some numbers for .Net:

Quick and dirty in C#:

using System;
using System.Linq;
using System.Reflection;

var assemblys = AppDomain.CurrentDomain.GetAssemblies();

var count = 0;
var totalCount = 0;

foreach (var assembly in assemblys)
{
    foreach (var type in assembly.GetTypes().Where(t => !t.IsInterface))
    {
        ++totalCount;
        var equals = type.GetMethods(BindingFlags.Instance | BindingFlags.Public)
            .Where(m => m.Name == "Equals" && m.DeclaringType != typeof(object));

        if (equals.Count() > 0)
            ++count;
    }
}

Console.WriteLine(count + " of " + totalCount + " classes have their own Equals");

The result in an otherwise empty project is: 1932 of 4356 classes have their own Equals. That's about 44%.

When I start it with the debug host a lot more is loaded so I get 3347 of 8515 classes have their own Equals which still is about 40%.

An empty WinForms application (i.e. what VS2008 automatically creates): 3076 of 7722 classes have their own Equals. Again about 40%.

An empty WPF application: 5188 of 10255 classes have their own Equals. That's about 50%.

jodastephen Thu 20 Nov 2008

Figures for JSR-310 (date and time) library:

  • Total classes: 119
  • Total defining non reference equals: 34 (28%)
  • Total of equals derivable from fields: 30 (88%)
  • Total of equals that could be derivable from fields: 34 (100%)

Figures for Joda-Time (date and time) library:

  • Total classes: 222
  • Total defining non reference equals: 26 (11%)
  • Total defining or inheriting non reference equals: 44 (19%)
  • Total of equals derivable from fields: 40 (90%)
  • Total of equals that could be derivable from fields: 40 (90%)

A separate count of inherited equals was needed for Joda-Time - JSR-310 doesn't use inheritance really.
