All Topics

#2255 equality => identity for const objects

tomcl Sun 6 Apr 2014

I'm trying to work out how best to get high performance using as keys to a map const objects which model simple structs of other const objects. A simple example (my actual use case has const class fields instead of Ints):

const class Pair { const Int x; const Int y }

What I'd like to do is to modify the constructor for these classes to add caching and return a previous equal value instance if one has ever been created so that:

if a,b are instances of Pair

(a.x == b.x && a.y == b.y) => a === b

I am a bit sad that this is not the case anyway. I can see little reason for not ensuring referential uniqueness over all value equal const objects.

My problem is that I can't see any neat way to do this at user level with the mutable cache state (a Map) referenced by a make method in class Pair, because static fields must be const.

It can obviously be done with a helper class instantiated once containing the cache data but this is pretty ugly.

The alternative, discussed in a few threads here, is to implement a value-related hash function and equality based on value. That is a not much code, but less efficient.

brian Sun 6 Apr 2014

The alternative, discussed in a few threads here, is to implement a value-related hash function and equality based on value. That is a not much code, but less efficient.

Actually the way HotSpot optimizes, my experience is that its pretty much always faster to create new instances rather than check and populate an internalized cache. So I would recommend only caching when you are really trying to save on memory for long-lived objects or your constructor is really expensive.

In Fantom you would build the cache using an Actor (for mutating), potentially using an AtomicRef for concurrent read access. But like I said, I find the added complexity doesn't buy you any performance improvements unless object construction is expensive.

Here is a snippet of code that uses an actor to build a cache for parsed strings:

const class Foo
{
  ** Parse a foo.  On failure raise ParseErr
  ** or return null based on check flag.
  static new fromStr(Str s, Bool checked := true)
  {
    cache := (Str:Obj)cacheRef.val
    val := cache[s]
    if (val == null) val = cacheActor.send(s).get(1min)
    if (val is Foo) return val
    if (checked) throw val
    return null
  }

  private static const AtomicRef cacheRef := AtomicRef(Str:Obj[:].toImmutable)
  private static const Actor cacheActor := Actor(ActorPool()) |msg| { onCache(msg) }
  private static Obj? onCache(Str s)
  {
    Obj? val := null
    try
      val = FooParser(s).parse
    catch (ParseErr e)
      val = e
    catch (Err e)
      val = ParseErr(s, e)
    cache := (Str:Obj)cacheRef.val
    cacheRef.val = cache.rw.set(s, val).toImmutable
    return val
  }
}

tomcl Mon 7 Apr 2014

Thanks, that is interesting. I can see that it makes sense theoretically to use an Actor here, since static mutable state needs some contract so that it cannot harm concurrency, and in Fantom that contract is made for Actors.

On the other hand it seems a little bit contrived as a way to implement the needed static mutable state for single-threaded computation?

You have the Actor framework which is useful to implement message-passing concurrency, and you restrict static mutability so that it is more difficult to break that: I'm still not clear how beneficial that restriction is.

Two other issues:

(1) Implementing a value-based hash and equality function - I take your point that this is maybe a better solution - the hash function cannot be a field or a once method. That seems a shame? I guess though it makes a fairly small difference to overall performance

(2) Does Fantom run faster on a 64 bit JVM? I guess your 64 bit Floats and Ints might be quite a bit quicker, even though address stuff will be slower.

SlimerDude Mon 7 Apr 2014

(Actors) seems a little bit contrived as a way to implement the needed static mutable state for single-threaded computation?

You could just put the map in an atomic Ref - though you may calculate the same val twice in a race condition. See From One Thread to Another....

I'm still not clear how beneficial that restriction is.

I see it as keeping everything clean. If you really need to pass data between threads, then Fantom goes the extra distance to ensure you do it in a safe manner. Concurrency causes so many problems in Java.

the hash function cannot be a field or a once method.

Why not? These classes compile ok:

const class Hash1 {
  override const Int hash := 42
}

class Hash2 {
  override once Int hash() {
    42
  }
}

Does Fantom run faster on a 64 bit JVM?

That's a bit subjective, see Does Java 64bit perform better than the 32bit version? on Stackoverflow and Should I use a 32- or a 64-bit JVM? on D-Zone.

tomcl Mon 7 Apr 2014

Ah I see - I had not understood the power of Atomic Ref to subvert things. OK that is fine, and I agree that for anything concurrent the Actor solution is needed, and fully justified.

I thought the built-in Map uses hash and equals as defined: I now realise looking at it that equals and hash can be over-ridden and I guess that a field can override a non-void method!

I can see why an Actor framework has merits, and agree that arbitrary concurrency is a nightmare.

tomcl Thu 10 Apr 2014

So here is my best attempt to solve the "use reference for equality" Pairs problem in Fantom. I got into a lot of trouble originally wanting to subclass normal pairs as reference-equality ones!

It is surprisingly neat, but only because of careful use of the Map shortcut methods.

It has to construct new SpPair objects even when these are not needed because they have been cached: the problem is that single objects can be sent as messages but not two separate objects, as would be best here. Serialising would not help efficiency!

I'd welcome comments about better ways to do it...

const class SpPair

{
  new make(Student s, Project p)
  {
    student = s
    project = p
  }
  
  static new makeUnique(Student s, Project p)
  {
    return mapState.send(make(s,p)).get()
  }
  
  const Student student
  
  const Project project
  
  static const Actor mapState := Actor(ActorPool()) |SpPair spp -> SpPair| {
    cache := ([Student:[Project:SpPair]]) Actor.locals.getOrAdd("Sp1Map", 
                      |->[Student:[Project:SpPair]]| {[:]} )
    cacheS := cache.getOrAdd(spp.student, |->[Project:SpPair]|{[:]})
    return cacheS.getOrAdd(spp.project, |->SpPair| {spp})     
  }  
}

SlimerDude Thu 10 Apr 2014

Yeah, it's pretty good - I'd only have a couple of minor points...

I would use it-blocks to shorten the creator funcs, so

Actor.locals.getOrAdd("Sp1Map", |->[Student:[Project:SpPair]]| {[:]} )

becomes

Actor.locals.getOrAdd("Sp1Map") { [:] }

You could also use a getOrAdd for the last cache too.

And lastly, you could send an immutable list if you didn't want to create an SpPair instance. My take on the code would then look like:

using concurrent

const class SpPair {
	
  const Student student  
  const Project project

  private static const Actor mapState := Actor(ActorPool()) |Obj[] list -> SpPair| {
    student := (Student) list[0]
    project := (Project) list[1]
    cache   := ([Student:[Project:SpPair]]) Actor.locals.getOrAdd("Sp1Map") { [:] }
    cacheS  := cache.getOrAdd(student) { [:] }
    return cacheS.getOrAdd(project) { SpPair.make(student, project) }
  }  
	
  private new make(Student s, Project p) {
    student = s
    project = p
  }
  
  static new makeUnique(Student s, Project p) {
    return mapState.send([s, p].toImmutable).get()
  }
}

const class Student { }
const class Project { }

I would also consider creating a unique key out of the Student and Project objects, and send that over. It would also mean the cache would be a plain Map instead of a Map of Maps. But given SpPair looks like what I'd imagine the key to look like - it's a little academic!

tomcl Thu 10 Apr 2014

Many thanks.

Yes, I realised about getOrAdd() for the last cache in parallel with your post. Now shown.

I also tried the unique key thing but as you say it is cart before horse.

It and With, With-add blocks in Fantom are incredibly useful for getting rid of noise! I think I have not yet got the hang of "It" usages as here. They are one of the features of the language not much highlighted but very important. One reason Hertz's book would be a good idea is that there are a number of idiomatic Fantom things that are very powerful but need more long-winded tutorials than is provided in the standard documentation.

I was also expecting that Map() would be the same as [:] but for reasons I don't understand that is not the case.

Good idea using a list and I think more efficient. Also, as important, it is easy to read.

On that topic do the little closures used here all over the place make things less efficient?

SlimerDude Thu 10 Apr 2014

do little closures used all over the place make things less efficient?

Probably a bit, as they involve a few extra method calls, so adding those calls to the program stack may eat up a few nano-secs. But if that's really a concern, you should probably be programming in native C or assembler!

As for Map() not being the same as [:], um, it should be! Map() won't get type inferred by the compiler though, it'll always be Obj?:Obj?.

Good idea using a list

I was going to use a wrapper object - but then, that's what SpPair is! Back to the horse and cart again!

It and With, With-add blocks in Fantom are incredibly useful ...

I think I've only used a with block once, and felt dirty afterwards - it's not explicit as to what's going on. And I'm not sure what a with-add block is? Do you mean the add operator for declaring lists and maps?

Hertz's book would be a good idea is that there are a number of idiomatic Fantom things that are very powerful but need more long-winded tutorials than is provided in the standard documentation.

Yup, I agree. Personally I'm more excited about seeing Fantom specific topics in the Book than general programming techniques. (But that's getting off-topic!)

tomcl Thu 10 Apr 2014

OK, I'm still getting the hang of the thing.

By with-add I mean the declarative thing, very useful in GUIs and similarly but less spectacularly in Lists and Maps.

By with I mean like With-Add but without the comma, and used to set fields.

I'm probably using the wrong names!

tomcl Sat 12 Apr 2014

Time comparison identity equality vs component compare equality

Some rough measurements. On a Core 2 Duo without hyperthreading under 64 bit W7 with 32 bit 1.7 JVM

As you'd expect these are very dependent on RAM allowed for the JVM. I have this set much higher than the maximum used heap, and I average the measurement over 10 runs to minimise errors due to other CPU activity. It took a while to get consistent results.

Cached pairs with identity for equality

construction (with actor cache): 33700 ns/pair

comparison: 31ns/pair

Simple pairs using component comparison for equality

construction: 244ns/pair

comparison: 55ns/pair

The very high time for the cached pair construction can't be the Maps, so it is probably the communication overhead with the Actor?