I'm trying to work out how best to get high performance using as keys to a map const objects which model simple structs of other const objects. A simple example (my actual use case has const class fields instead of Ints):
const class Pair { const Int x; const Int y }
What I'd like to do is to modify the constructor for these classes to add caching and return a previous equal value instance if one has ever been created so that:
if a,b are instances of Pair
(a.x == b.x && a.y == b.y) => a === b
I am a bit sad that this is not the case anyway. I can see little reason for not ensuring referential uniqueness over all value equal const objects.
My problem is that I can't see any neat way to do this at user level with the mutable cache state (a Map) referenced by a make method in class Pair, because static fields must be const.
It can obviously be done with a helper class instantiated once containing the cache data but this is pretty ugly.
The alternative, discussed in a few threads here, is to implement a value-related hash function and equality based on value. That is a not much code, but less efficient.
brianSun 6 Apr 2014
The alternative, discussed in a few threads here, is to implement a value-related hash function and equality based on value. That is a not much code, but less efficient.
Actually the way HotSpot optimizes, my experience is that its pretty much always faster to create new instances rather than check and populate an internalized cache. So I would recommend only caching when you are really trying to save on memory for long-lived objects or your constructor is really expensive.
In Fantom you would build the cache using an Actor (for mutating), potentially using an AtomicRef for concurrent read access. But like I said, I find the added complexity doesn't buy you any performance improvements unless object construction is expensive.
Here is a snippet of code that uses an actor to build a cache for parsed strings:
const class Foo
{
** Parse a foo. On failure raise ParseErr
** or return null based on check flag.
static new fromStr(Str s, Bool checked := true)
{
cache := (Str:Obj)cacheRef.val
val := cache[s]
if (val == null) val = cacheActor.send(s).get(1min)
if (val is Foo) return val
if (checked) throw val
return null
}
private static const AtomicRef cacheRef := AtomicRef(Str:Obj[:].toImmutable)
private static const Actor cacheActor := Actor(ActorPool()) |msg| { onCache(msg) }
private static Obj? onCache(Str s)
{
Obj? val := null
try
val = FooParser(s).parse
catch (ParseErr e)
val = e
catch (Err e)
val = ParseErr(s, e)
cache := (Str:Obj)cacheRef.val
cacheRef.val = cache.rw.set(s, val).toImmutable
return val
}
}
tomclMon 7 Apr 2014
Thanks, that is interesting. I can see that it makes sense theoretically to use an Actor here, since static mutable state needs some contract so that it cannot harm concurrency, and in Fantom that contract is made for Actors.
On the other hand it seems a little bit contrived as a way to implement the needed static mutable state for single-threaded computation?
You have the Actor framework which is useful to implement message-passing concurrency, and you restrict static mutability so that it is more difficult to break that: I'm still not clear how beneficial that restriction is.
Two other issues:
(1) Implementing a value-based hash and equality function - I take your point that this is maybe a better solution - the hash function cannot be a field or a once method. That seems a shame? I guess though it makes a fairly small difference to overall performance
(2) Does Fantom run faster on a 64 bit JVM? I guess your 64 bit Floats and Ints might be quite a bit quicker, even though address stuff will be slower.
SlimerDudeMon 7 Apr 2014
(Actors) seems a little bit contrived as a way to implement the needed static mutable state for single-threaded computation?
You could just put the map in an atomic Ref - though you may calculate the same val twice in a race condition. See From One Thread to Another....
I'm still not clear how beneficial that restriction is.
I see it as keeping everything clean. If you really need to pass data between threads, then Fantom goes the extra distance to ensure you do it in a safe manner. Concurrency causes so many problems in Java.
the hash function cannot be a field or a once method.
Why not? These classes compile ok:
const class Hash1 {
override const Int hash := 42
}
class Hash2 {
override once Int hash() {
42
}
}
Ah I see - I had not understood the power of Atomic Ref to subvert things. OK that is fine, and I agree that for anything concurrent the Actor solution is needed, and fully justified.
I thought the built-in Map uses hash and equals as defined: I now realise looking at it that equals and hash can be over-ridden and I guess that a field can override a non-void method!
I can see why an Actor framework has merits, and agree that arbitrary concurrency is a nightmare.
tomclThu 10 Apr 2014
So here is my best attempt to solve the "use reference for equality" Pairs problem in Fantom. I got into a lot of trouble originally wanting to subclass normal pairs as reference-equality ones!
It is surprisingly neat, but only because of careful use of the Map shortcut methods.
It has to construct new SpPair objects even when these are not needed because they have been cached: the problem is that single objects can be sent as messages but not two separate objects, as would be best here. Serialising would not help efficiency!
I'd welcome comments about better ways to do it...
You could also use a getOrAdd for the last cache too.
And lastly, you could send an immutable list if you didn't want to create an SpPair instance. My take on the code would then look like:
using concurrent
const class SpPair {
const Student student
const Project project
private static const Actor mapState := Actor(ActorPool()) |Obj[] list -> SpPair| {
student := (Student) list[0]
project := (Project) list[1]
cache := ([Student:[Project:SpPair]]) Actor.locals.getOrAdd("Sp1Map") { [:] }
cacheS := cache.getOrAdd(student) { [:] }
return cacheS.getOrAdd(project) { SpPair.make(student, project) }
}
private new make(Student s, Project p) {
student = s
project = p
}
static new makeUnique(Student s, Project p) {
return mapState.send([s, p].toImmutable).get()
}
}
const class Student { }
const class Project { }
I would also consider creating a unique key out of the Student and Project objects, and send that over. It would also mean the cache would be a plain Map instead of a Map of Maps. But given SpPair looks like what I'd imagine the key to look like - it's a little academic!
tomclThu 10 Apr 2014
Many thanks.
Yes, I realised about getOrAdd() for the last cache in parallel with your post. Now shown.
I also tried the unique key thing but as you say it is cart before horse.
It and With, With-add blocks in Fantom are incredibly useful for getting rid of noise! I think I have not yet got the hang of "It" usages as here. They are one of the features of the language not much highlighted but very important. One reason Hertz's book would be a good idea is that there are a number of idiomatic Fantom things that are very powerful but need more long-winded tutorials than is provided in the standard documentation.
I was also expecting that Map() would be the same as [:] but for reasons I don't understand that is not the case.
Good idea using a list and I think more efficient. Also, as important, it is easy to read.
On that topic do the little closures used here all over the place make things less efficient?
SlimerDudeThu 10 Apr 2014
do little closures used all over the place make things less efficient?
Probably a bit, as they involve a few extra method calls, so adding those calls to the program stack may eat up a few nano-secs. But if that's really a concern, you should probably be programming in native C or assembler!
As for Map() not being the same as [:], um, it should be! Map() won't get type inferred by the compiler though, it'll always be Obj?:Obj?.
Good idea using a list
I was going to use a wrapper object - but then, that's what SpPair is! Back to the horse and cart again!
It and With, With-add blocks in Fantom are incredibly useful ...
I think I've only used a with block once, and felt dirty afterwards - it's not explicit as to what's going on. And I'm not sure what a with-add block is? Do you mean the add operator for declaring lists and maps?
Hertz's book would be a good idea is that there are a number of idiomatic Fantom things that are very powerful but need more long-winded tutorials than is provided in the standard documentation.
Yup, I agree. Personally I'm more excited about seeing Fantom specific topics in the Book than general programming techniques. (But that's getting off-topic!)
tomclThu 10 Apr 2014
OK, I'm still getting the hang of the thing.
By with-add I mean the declarative thing, very useful in GUIs and similarly but less spectacularly in Lists and Maps.
By with I mean like With-Add but without the comma, and used to set fields.
I'm probably using the wrong names!
tomclSat 12 Apr 2014
Time comparison identity equality vs component compare equality
Some rough measurements. On a Core 2 Duo without hyperthreading under 64 bit W7 with 32 bit 1.7 JVM
As you'd expect these are very dependent on RAM allowed for the JVM. I have this set much higher than the maximum used heap, and I average the measurement over 10 runs to minimise errors due to other CPU activity. It took a while to get consistent results.
Cached pairs with identity for equality
construction (with actor cache): 33700 ns/pair
comparison: 31ns/pair
Simple pairs using component comparison for equality
construction: 244ns/pair
comparison: 55ns/pair
The very high time for the cached pair construction can't be the Maps, so it is probably the communication overhead with the Actor?
tomcl Sun 6 Apr 2014
I'm trying to work out how best to get high performance using as keys to a map const objects which model simple structs of other const objects. A simple example (my actual use case has const class fields instead of Ints):
const class Pair { const Int x; const Int y }
What I'd like to do is to modify the constructor for these classes to add caching and return a previous equal value instance if one has ever been created so that:
if a,b are instances of Pair
(a.x == b.x && a.y == b.y) => a === b
I am a bit sad that this is not the case anyway. I can see little reason for not ensuring referential uniqueness over all value equal const objects.
My problem is that I can't see any neat way to do this at user level with the mutable cache state (a Map) referenced by a make method in class Pair, because static fields must be const.
It can obviously be done with a helper class instantiated once containing the cache data but this is pretty ugly.
The alternative, discussed in a few threads here, is to implement a value-related hash function and equality based on value. That is a not much code, but less efficient.
brian Sun 6 Apr 2014
Actually the way HotSpot optimizes, my experience is that its pretty much always faster to create new instances rather than check and populate an internalized cache. So I would recommend only caching when you are really trying to save on memory for long-lived objects or your constructor is really expensive.
In Fantom you would build the cache using an Actor (for mutating), potentially using an AtomicRef for concurrent read access. But like I said, I find the added complexity doesn't buy you any performance improvements unless object construction is expensive.
Here is a snippet of code that uses an actor to build a cache for parsed strings:
tomcl Mon 7 Apr 2014
Thanks, that is interesting. I can see that it makes sense theoretically to use an Actor here, since static mutable state needs some contract so that it cannot harm concurrency, and in Fantom that contract is made for Actors.
On the other hand it seems a little bit contrived as a way to implement the needed static mutable state for single-threaded computation?
You have the Actor framework which is useful to implement message-passing concurrency, and you restrict static mutability so that it is more difficult to break that: I'm still not clear how beneficial that restriction is.
Two other issues:
(1) Implementing a value-based hash and equality function - I take your point that this is maybe a better solution - the hash function cannot be a field or a once method. That seems a shame? I guess though it makes a fairly small difference to overall performance
(2) Does Fantom run faster on a 64 bit JVM? I guess your 64 bit Floats and Ints might be quite a bit quicker, even though address stuff will be slower.
SlimerDude Mon 7 Apr 2014
You could just put the map in an atomic Ref - though you may calculate the same val twice in a race condition. See From One Thread to Another....
I see it as keeping everything clean. If you really need to pass data between threads, then Fantom goes the extra distance to ensure you do it in a safe manner. Concurrency causes so many problems in Java.
Why not? These classes compile ok:
That's a bit subjective, see Does Java 64bit perform better than the 32bit version? on Stackoverflow and Should I use a 32- or a 64-bit JVM? on D-Zone.
tomcl Mon 7 Apr 2014
Ah I see - I had not understood the power of Atomic Ref to subvert things. OK that is fine, and I agree that for anything concurrent the Actor solution is needed, and fully justified.
I thought the built-in Map uses hash and equals as defined: I now realise looking at it that equals and hash can be over-ridden and I guess that a field can override a non-void method!
I can see why an Actor framework has merits, and agree that arbitrary concurrency is a nightmare.
tomcl Thu 10 Apr 2014
So here is my best attempt to solve the "use reference for equality" Pairs problem in Fantom. I got into a lot of trouble originally wanting to subclass normal pairs as reference-equality ones!
It is surprisingly neat, but only because of careful use of the Map shortcut methods.
It has to construct new SpPair objects even when these are not needed because they have been cached: the problem is that single objects can be sent as messages but not two separate objects, as would be best here. Serialising would not help efficiency!
I'd welcome comments about better ways to do it...
SlimerDude Thu 10 Apr 2014
Yeah, it's pretty good - I'd only have a couple of minor points...
I would use it-blocks to shorten the creator funcs, so
becomes
You could also use a
getOrAdd
for the last cache too.And lastly, you could send an immutable list if you didn't want to create an
SpPair
instance. My take on the code would then look like:I would also consider creating a unique key out of the
Student
andProject
objects, and send that over. It would also mean the cache would be a plainMap
instead of aMap
ofMap
s. But givenSpPair
looks like what I'd imagine the key to look like - it's a little academic!tomcl Thu 10 Apr 2014
Many thanks.
Yes, I realised about getOrAdd() for the last cache in parallel with your post. Now shown.
I also tried the unique key thing but as you say it is cart before horse.
It and With, With-add blocks in Fantom are incredibly useful for getting rid of noise! I think I have not yet got the hang of "It" usages as here. They are one of the features of the language not much highlighted but very important. One reason Hertz's book would be a good idea is that there are a number of idiomatic Fantom things that are very powerful but need more long-winded tutorials than is provided in the standard documentation.
I was also expecting that Map() would be the same as [:] but for reasons I don't understand that is not the case.
Good idea using a list and I think more efficient. Also, as important, it is easy to read.
On that topic do the little closures used here all over the place make things less efficient?
SlimerDude Thu 10 Apr 2014
Probably a bit, as they involve a few extra method calls, so adding those calls to the program stack may eat up a few nano-secs. But if that's really a concern, you should probably be programming in native C or assembler!
As for
Map()
not being the same as[:]
, um, it should be!Map()
won't get type inferred by the compiler though, it'll always beObj?:Obj?
.I was going to use a wrapper object - but then, that's what
SpPair
is! Back to the horse and cart again!I think I've only used a
with
block once, and felt dirty afterwards - it's not explicit as to what's going on. And I'm not sure what awith-add
block is? Do you mean the add operator for declaring lists and maps?Yup, I agree. Personally I'm more excited about seeing Fantom specific topics in the Book than general programming techniques. (But that's getting off-topic!)
tomcl Thu 10 Apr 2014
OK, I'm still getting the hang of the thing.
By with-add I mean the declarative thing, very useful in GUIs and similarly but less spectacularly in Lists and Maps.
By with I mean like With-Add but without the comma, and used to set fields.
I'm probably using the wrong names!
tomcl Sat 12 Apr 2014
Time comparison identity equality vs component compare equality
Some rough measurements. On a Core 2 Duo without hyperthreading under 64 bit W7 with 32 bit 1.7 JVM
As you'd expect these are very dependent on RAM allowed for the JVM. I have this set much higher than the maximum used heap, and I average the measurement over 10 runs to minimise errors due to other CPU activity. It took a while to get consistent results.
Cached pairs with identity for equality
construction (with actor cache): 33700 ns/pair
comparison: 31ns/pair
Simple pairs using component comparison for equality
construction: 244ns/pair
comparison: 55ns/pair
The very high time for the cached pair construction can't be the Maps, so it is probably the communication overhead with the Actor?