All Topics

#1837 Ordered Map litteral

tcolar Wed 14 Mar 2012

Is there a way to define a map litteral (not empty) that is ordered.

say:

Str:Str map := ["a":"1", "b":"2", "c":"3", ...... ]

Since "ordered=true" can only be applied to an empty map and I can't think of a way to apply that "ordered=true" before hand (it block would do it after) ... then I have no choice but to do something like:

Str:Str map := [,] {ordered=true}
// then add all the items one at a time
....

Or am I missing something ? I have a feeling I am :)

brian Wed 14 Mar 2012

You are correct, you haven't use a literal for creating an ordered map. Simplest code to achieve that might be something like:

[:] { ordered = true }.addAll([...])

tcolar Wed 14 Mar 2012

Well I actually had tried that but that doesn't help either because the Litteral you pass to addAll is itself not sorted either

KevinKelley Wed 14 Mar 2012

Hey, I remember reading it that way too -- that "ordered" might mean "sorted" which would have to mean that the map was based on a tree-map, rather than hash-based...

But the docs says explicitly that it's a hash-map, so I guess that "insertion order" is about all that ordered could mean.

I think you're probably wishing for something like a red-black tree, something that maintains order and still gives fast insert/delete. Some time back Ivan put out a Fantom collections package, which I'm pretty sure has TreeMap in it.

@Ivan, by the way, thank you for that.

tcolar Thu 15 Mar 2012

Thanks, but in this case i was not looking for it to be sorted, just to keep the order of insertion.

ivan Thu 15 Mar 2012

I wanted something like this long time ago. The compromise I could find was like this:

fansh> class Foo {
> static Obj:Obj? orderedMap(Obj?[][] list) 
> {
>   result := [:] { ordered = true }
>   list.each { result[it.first] = it.last }
>   return result
> }
> }
sh17::Foo

fansh> Foo.orderedMap(
> [
>   ["a", "b"],
>   ["c", "d"]
> ])
[a:b, c:d]

fansh>

ivan Thu 15 Mar 2012

Thanks, Kevin!

Now this package in fact is moved under xored account here – https://bitbucket.org/xored/collections

Mostly I use it now for json objects:

fansh>  constJson := Json.makeWith {
> it->foo = [1,2,3,5]
> it->bar = ["a": 43]
> }
{
  "foo": [1,2,3,5],
  "bar": {
    "a": 43
  }
}

fansh> persistentCopy := constJson.mutate {
> it->bar->a = 42
> }
{
  "foo": [1,2,3,5],
  "bar": {
    "a": 42
  }
}
fansh>

KevinKelley Thu 15 Mar 2012

I need to use that stuff more -- const everywhere, and the persistent types where modified instances share memory.

Actors are great, and make for solid code. But they've got that "pervasiveness" thing going -- where everything they touch ends up needing to deal with constness. So if you weren't thinking about that up front, then you have issues like lots of naive copying. Since I often bang out a quick test implementation of something, just to get it clear in my head what I'm trying to do, I often find myself coming back and reworking everything to be const-safe.

It's a lot nicer to have good cheap persistent types in the first place.

Ivan, I was just looking again at your multiple dispatch example -- https://bitbucket.org/ivan_inozemtsev/mdispatch. (There's some neat stuff I'm playing around with, PEG parsers like in OMeta and in vpri's Ian Piumarta's Maru, that wants multiple-dispatch). Anyway I don't know enough about how MD ought to work... the mdispatch DSL's call implementation seems to just take the first-matching-definition. Is that a useful scheme, or is it just that the example is intended to be about DSLs, not MD?

I'm trying to get my head around the "best-fit" idea: if you've got defined a draw(shape, canvas), draw(circle, printer), draw(ellipse, screen)... what should the algorithm look like, to find and call the most-specific method.

object
- shape
  - circle
    - ellipse
  - square...
- canvas
  - printer
  - screen

I guess you'd probably want to do the "exclude methods that don't fit actual-args" as in the DSL example, and then do a left-to-right selection of the most-specific matching method...

Anyway, not really a question, just thinking out loud about this interesting code!

brian Thu 15 Mar 2012

Well if anyone has a way we could improve the situation, suggestions welcome. I've often wanted to use ordered/case insensitive with literals myself.

tcolar Thu 15 Mar 2012

Maybe a facet might help (for literals like list/map):

@Ordered Str:Str map := ["a":"1"]

Then the compiler could make it "ordered" right at construction time before adding the values

Maybe the facet could be more generic

@LiteralOption(ordered=true) // to set that option on the object at construction before setting the values
Str:Str map := ["a":"1"]

Or something among those lines ... can't really think of much else.

jstaudemeyer Thu 15 Mar 2012

The simplest solution is to let the map literal always generate something like the Java LinkedHashMap. Groovy went that way, it was a nonbreaking change that didn't harm anybody but proved to be very useful.

StephenViles Thu 15 Mar 2012

+1 for jstaudemeyer's idea: maps instantiated by a literal are ordered.

DanielFath Thu 15 Mar 2012

I think what people would like most (at least people that I've talked to on other blogs) is that Map could easily be replaceable with a different implementation. In other words ideally you could choose or plug in different Map types depending on the need. It's probably too broad but I think Map litteral syntax should be just a syntax sugar over a Map class or a mixin, which you could extend and changed as you wish. It's probably way too broad for this particular solution, but it should be one solution we strive for.

dobesv Fri 16 Mar 2012

jstaudemeyer and StephenViles seem to be on to something - the map is separated by commas, it might be intuitive that it would be ordered by default. People are already accustomed to this in javascript and apparently Groovy. For the few performance hotspots where using an ordered map is a problem, we could have a way to make an unordered map, for example by using the less friendly syntax Map().add().add().add()

Another approach that comes to mind is to have an it-block like approach where you can put a map literal after something and it adds the keys and values into the map in a standard way. Perhaps

MyMap() ["foo":"bar", "baz":"quux"]
// =>
MyMap().add("foo","bar").add("baz", "quux")
// or
Map() { ordered = true } ["foo":"bar", "baz":"quux"]
// =>
Map() { ordered = true }.add("foo","bar").add("baz", "quux")

This would work for lists as well:

LinkedList() ["foo", "bar", "buzz"]
// =>
LinkedList().add("foo").add("bar").add("buzz") }

This could also support a syntax where you choose a different method to call besides add, for example:

MyQueue.enqueue ["foo", "bar", "baz"]
// =>
MyQueue.enqueue(foo).enqueue(bar).enqueue(baz)

Thoughts?

Akcelisto Fri 16 Mar 2012

I think we may extend current conventions. Let's that custom classes use ":"-syntax as class Map like custom classes use ","-syntax as class List.

OrderedMap{"a":"1", "b":"2", "c":"3", ......}

We already have

Class definition:

@Serializable { collection = true }
class Person
{
  Void add(Person kid) { kids.add(kid) }
  Void each(|Person kid| f) { kids.each(f) }
  Str name
  @transient private Person[] kids := Person[,]
}

Creating and serialization:

Person
{
  name = "Homer Simson"
  Person { name = "Bart" },
  Person { name = "Lisa" },
  Person { name = "Maggie" },
}

Compiles to:

Person
{
  name = "Homer Simson"
  this.add(Person { name = "Bart" })
  this.add(Person { name = "Lisa" })
  this.add(Person { name = "Maggie" })
}

I suggest

Class definition:

@Serializable { map = true }
class Person
{
  Void set(Str key, Person kid) { kids[key]=kid }
  Void each(|Str key, Person kid| f) { kids.each(f) }
  Str name
  @transient private Str:Person kids := Str:Person[:]
}

Creating and serialization:

Person
{
  name = "Homer Simson"
  "key1":Person { name = "Bart" },
  "key2":Person { name = "Lisa" },
  "key3":Person { name = "Maggie" }
}

Compiles to:

Person
{
  name = "Homer Simson"
  this["key1"] = Person { name = "Bart" })
  this["key1"] = Person { name = "Lisa" })
  this["key1"] = Person { name = "Maggie" })
}

brian Fri 16 Mar 2012

I've actually been thinking along lines of @Akcelisto myself. One thing I did our product's scripting language was use just a simple colon instead of ":=" to declare new variables and I've found a prefer that, so been thinking of myself if/how just colon gets used in Fantom, since we don't have labels like Java its actually a free operator.

The other benefit of maybe using colon is that in that case we could say that identifier keys are treated as strings:

Str:Int[:] { one: 1, two: 2, three: 3 }

Akcelisto Fri 16 Mar 2012

Str:Int[:] { one: 1, two: 2, three: 3 }

But this may clash with named params of methods. Or named params is not important?

KevinKelley Fri 16 Mar 2012

Add named params workalike with

sys::Func.callMap({one: 1, two: 2, three: 3})

Get rid of <map> literal from grammar; add

<def>  ::= <id> ":" <expr>

with <def> appearing somewhere in <expr> tree,

then use it-block syntax to construct a map...

andy Fri 16 Mar 2012

{ one: 1, two: 2, three: 3 }

I think about this more as the anonymous object syntax. Maybe just using Maps is our poor mans implementation for anonymous objects. But then we lose out on the consumption side (which is really what we want it for):

obj := { msg:"Foo", result:12 }

echo("$obj.msg: $obj.result")             // this is alot nicer...
echo(obj["foo"] + ": " + obj["result"])   // ...than this

This probably remains my #1 feature missing from Fantom.

dobesv Sun 18 Mar 2012

Ah I didn't know about that existing syntax for calling add() implicitly. Extending that syntax to support a map version seems like a good idea.

dobesv Sun 18 Mar 2012

andy: I wonder if Map could override trap() so that you can do:

obj := [ "msg":"Foo", "result":12 ]

echo("$obj->msg: $obj->result")             // this is alot nicer...
echo(obj["foo"] + ": " + obj["result"])   // ...than this

Not as big a saving in characters as your version, but a smaller delta from what's already there.

andy Mon 19 Mar 2012

I wonder if Map could override trap()

We do that for a similar construct in our commercial software. Main reason we haven't added it to Map, is because it conflicts with reflection: map.size vs map["size"]

SlimerDude Sun 20 Apr 2014

Hmm... this would still be nice to have!

As Maps already have a special literal syntax, would it not be easiest to postfix an it-block at the end of the declaration. Sorta like:

map := ["a":"1", "b":"2", "c":"3", ...] { ordered = true }

ivan Tue 22 Apr 2014

Hmm, that might be a nice idea for collection literals -- execute it-blocks after object creation, but before filling it with items. I took a look to a compiler and it seems that this would require not so many changes:

Add a field ClosureExpr? itBlock to MapLiteralExpr
Add an optional setting of it at the end of Parser#mapLiteralExpr method:
```
if (peekt === Token.lbrace) 
{
  consume
  map.itBlock = tryItBlock
}
```

If itBlock is set, add an invocation of with in CodeAsm#mapLiteral:

if (map.itBlock != null) 
{
  expr(map.itBlock);
  op(FOp.CallVirtual, fpod.addMethodRef(ns.objWith));
}

I made a very primitive search through existing sources of Fantom libraries and few other projects to see whether there is a code which can potentially broken by this change, and couldn't find anything dangerous -- mostly it-blocks are used after empty collection literals, except ObixUtil, but still it-block is used just to set a default value, so it's safe to execute it-block before adding elements.

Just for a reference, to search through sources I used this line in bash:

grep -r "] {" . --include="*.fan" | grep -vE "\*\*|,]|:]"

brian Tue 22 Apr 2014

@ivan's suggestion is not that bad, just say that an it-block after a list/map literal is applied before the adds. Although it feels a little icky to me

SlimerDude Sun 18 May 2014

Although it feels a little icky to me

Maybe the implementation does, but from a syntax point of view it is a sound progression. Where you once declared an empty map, you now declare the map values:

map := [:] { ordered = true }
map := [1:2, 3:4] { ordered = true }

The map values are still contained within [ and ], only with an empty map it reduces to a single :

Compare to the current alternative:

map := [:] { ordered = true }.add(1, 2).add(3, 4)

Now that is icky!

Or maybe when you declare a map with multiple values, it could default to being ordered? For that's usually what you would expect?

brian Wed 21 May 2014

I talked to Andy and he likes that change. However, I am not on board yet...

I don't like making map literals ordered by default, because that internal data structure isn't as optimal as simple hash map. Maybe some performance testing is in order though since that would be simplest solution ...

Although it-block syntax is ok, the fundamental problem is that to implement it means that you run the it-block before adding the elements. You are just focused one use case, an it-block where ordered is set to true. But the compiler and language must be unambiguous in all cases. For example what would this code do:

map := [1:2, 3:4] { remove(1) }

Today that works and does what you would expect.

fansh> map := [1:2, 3:4] { remove(1) }
[3:4]

So when I say its icky, I mean its an extremely ugly semantic change to how it-blocks work with map literals. And we would we say the same thing for lists? Consistency would dictate that we do that too. So we would effectively be saying the it-block which comes after the declaration is run before.

I would like to solve the problem, but honestly this solution seems much much worse than the pain cased by the problem.

SlimerDude Wed 21 May 2014

You are focused on one use case, an it-block where ordered is set to true.

Yep, I am! Or rather was. Thanks for the update and clarification, I now see your dilemma... Hmm... -ponders over Map it-block usage-

map literals ordered by default

It was just an afterthought, but as you mentioned it I looked up some Java Map performance stats. A good comparison is given by Bruce Eckel:

http://www.artima.com/weblogs/viewpost.jsp?thread=122295

---------- HashMap ----------
 size     put     get iterate
   10     281      76      93
  100     179      70      73
 1000     267     102      72
10000    1305     265      97

------- LinkedHashMap -------
 size     put     get iterate
   10     354     100      72
  100     273      89      50
 1000     385     222      56
10000    2787     341      56

LinkedHashMap tends to be slower than HashMap for insertions because it maintains
the linked list (to preserve insertion order) in addition to the hashed data 
structure. Because of this list, iteration is faster.

Stackoverflow says the same, LinkedHashMaps have faster iteration, but slower insertion.

HashMap vs LinkedHashMap performance in iteration over values()

brian Thu 22 May 2014

The more I think about it, having map literals ordered by default would be ideal. But those performance differences in get are pretty significant. We use map literals in our code extensively since we are always working with dynamic associative arrays. So I'd be disinclined to say we should make a compiler change to always use ordered maps.

I still really hate the idea of running a map literal's it-block before the adds. So I think we need alternate ideas... Maybe something related to how the comma operator implicitly calls add:

map { ordered = true; a: b, c: d }

I think Scala made that work using some special magic with a colon operator

SlimerDude Fri 12 Sep 2014

map { ordered = true; a: b, c: d }

Hmm... that doesn't look too pretty to me. I'd like to still use square brackets [...] to define the Map data - otherwise it gets confusing.

(As an aside, I'm still not clear on where the current comma / add operator works and where it doesn't!)

I'd be disinclined to say we should make a compiler change to always use ordered maps.

I don't really know how Maps are defined internally, but...

How about then, Maps are ordered by default and in the Map's ctor, after running the it-block, if ordered is still false then the LinkedHashMap contents are copied into a new HashMap object.

Plus, you'd only need to create the LinkedHashMap if an it-block is supplied, thus normal Map constructs don't incur any extra overhead.

I'll use pseudo code for the map ctor to explain better:

new make(|This f|? f := null) {
  map = (f == null) ? new HashMap() : new LinkedHashMap()

  f?.call(this)

  if (ordered == false && map is LinkedHashMap)
    map = new HashMap().addAll(map)
}

which would work like:

map = [:]                            // no it-block so HashMap created by default
map = [1:2, 3:4]                     // no it-block so HashMap created by default
map = [1:2, 3:4] { ordered = true }  // LinkedHashMap created and kept
map = [:] { caseInsensitve = true }  // LinkedHashMap created and replaced with HashMap

That way, a slight overhead is only ever incurred by those times the internal map is replaced. And when declaring maps literally in code, they're never that big so the overhead should be negligible.

andrey Sun 14 Sep 2014

Different implementations of hash maps is what we shall keep in mind in regard with this topic. We have ordinary hash maps, linked hash maps, need to have immutable hash maps, some would like to have tree maps (ordered, by keys), etc. And there is only one literal now to handle all this stuff.

Also, ideally, I would like to provide own hash map implementation, and still benefit from hash map literal.

Clojure experience with persistent (immutable) structures, lead to mutable structures to be used for immutable structure initialization, e.g. you can prepare mutable list/hashmap and pass to immutable constructor, which in some cases is much faster.

So what about to leave current hash map literal as is to construct ordinary hash maps, and use them as a constructor parameter for other structures.

peristent := ImmutableMap([“a”:”b”])
linked := LinkedHashMap([“a”:”b”])

And so on… No need to change anything. Stuff like { ordered = true } is confusing: it is not clear if keys ordered by value (TreeMap) or by the order in which they are added (LinkedHashMap). Basically any options supplied with literal will affect on implementation class, so I prefer to define implementation class explicitly.

SlimerDude Wed 17 Sep 2014

linked := LinkedHashMap([1:"a", 2:"b"])

Ah, I also went down that line of thinking but it doesn't work because it requires the compiler to know of the different Map types / implementations and interpret the syntax differently for each.

Different implementations of hash maps is what we shall keep in mind

Why? From a user and API point of view an ordered (by insertion) map is all you could ever want.

@Brain: map literals ordered by default would be ideal

Yup, the only reason why it's not (and why you might want other implementations) is performance. Sigh...

some would like to have tree maps (ordered, by keys)

I would argue that Map should have:

Map.sortByKey()
Map.sortByVal()

But that's a different topic! (I've actually wanted these methods in the past!)

I would like to provide own hash map implementation

You pretty much can... you have @Get and @Set operators, then to define and use your own Map:

class MyMap {

  new make(Map map) {
    ...
  }

  ...
}

MyMap([1:"a", 2:"b"])

But of course, MyMap doesn't extend Map - but is that actually a problem?

As for multiple implementations...Objective C went for the one generic, multi-purpose Array and Map with NSArray and NSDictionary which many find superior to Java's tangled web they call the Collections Framework.

Me, I'll just be happy if I can shoe-horn ordered map literals into the Fantom syntax somehow!

rasa Thu 18 Sep 2014

@SlimerDude

some would like to have tree maps (ordered, by keys)

I would argue that Map should have:

Map.sortByKey()

Map.sortByVal()

Tree map should be sorted implicitly, not explicitly.

SlimerDude Sun 25 Jan 2015

After recently playing with ElasticSearch and JSON I ran into this ordered map literal issue again. Ivan's JSON library is quite cool, but getting the following Fantom syntax to work would be much cooler:

map := [1:2, 3:4] { ordered = true }

Combining my last idea with Ivan's code insight, I think the following is a viable solution:

The Idea

Let's look at the Map syntax and how often each is / will be used.

map = [:]                               || most used
map = [1:2, 3:4]                        ||
map = [1:2, 3:4] { it.ordered = true }  ||
map = [1:2, 3:4] {   ... other ...   }  \/ least used

Map literals with it-blocks aren't used that much, but when they are it is mostly to set map flags. So lets use this knowledge to tweak how Maps are created...

Before the Code Assembler creates a Map, it checks if an it-block is supplied or not. If it is supplied then, guessing the map will eventually become ordered, it creates a Map with a LinkedHashMap. If an it-block is not supplied then it creates a standard Map with a HashMap. (Note the it-block has not been called yet.)

The Code Assembler may then set the literal values as usual.

Next. If an it-block was supplied (and only if) then the Code Assembler calls a special method on the Map, passing in the it-block. This method runs the it-block and checks the status of the ordered flag.

If ordered is now true then happy days, because the map is already backed by a LinkedHashMap.

If ordered is still false then we guessed wrong and we revert the map back to a HashMap, resetting the literal values.

Analysis

Looking back to our map usage, the only time we ever guess wrong is for the bottom, least used example. That is, when an it-block it used and the ordered flag is not set.

Is this a problem? I believe not because:

Such constructs are rarely used,
The extra work is only creating a HashMap and re-setting 5 or 10 map literals.

All things considered, it's hardly an overhead.

On the plus side, all current functionality stays exactly the same. Even the it-block is called after the literals are set.

The Implementation

The implementation should be pretty simple (I've been looking at the compiler code). It starts by following Ivan's points of:

Adding a ClosureExpr? itBlock field to MapLiteralExpr
Have the Map literal Parser look for an it-block
Altering the Code Assembler to call an extra map method.

The Map implementations would also need that extra method to run the it-block and check the ordered flag. (Should only be 3 or 4 lines.) Granted this extra method is a bit messy, but it can easily be hidden from public view.

Testing ordered map literals is then as simple as:

map := Int:Int[5:5, 4:4, 3:3, 2:2, 1:1] { ordered = true }
verifyEq(map.keys, [5, 4, 3, 2, 1])

Ta daa! How about it?

brian Thu 29 Jan 2015

I think the fundamental problem is that we should just make map literals ordered by default and that is the end of the story. From a programming perspective that is most elegant with no real rules. The only reason not to do that is performance of LinkedHashMap is a little slower than HashMap.

We use an insane number of map literals in our code so I'm very sensitive to performance. However, 90% of the time we end up never adding additional items to them and/or we turn them into immutable maps. So I actually think there is opportunity to optimize this and make it much faster and more memory efficient. For example the optimal way to store a map of two key/value pairs is like this:

class Map2
{
  Obj? get(Obj key) 
  {
    if (key == key0) return val0
    if (key == key1) return val1
    return null
  }

  Obj key0
  Obj? val0
  Obj key1
  Obj? val1
}

I do some of this optimization already in our code at a higher level, and that is always much faster with far less RAM/GC usage up to about 8-12 pairs. But I've been thinking if I bring it down into Java that I can super optimize it for sys::Map directly and make toImmutable much faster. With some compiler analysis we might even be able to make choices on optimal backing store at compile time.

So that is where I really want to go, but its going to take a little while to get there.