#2652 List.groupBy()

SlimerDude Tue 10 Oct 2017

Following on from this topic I think the following (parametrised) utility method would be a really useful addition to List:

** Groups items according to the return value of the given function.
** Items that return the same object are grouped together.
** All groups are placed in a Map, keyed off the object they were grouped by.
**
** Example:
**   animals := ["cat", "cow", "donkey", "dog", "snake"]
**   animals.groupBy |Str item -> Str| { item.chars.first.toChar }
**     // --> [c:[cat, cow], d:[donkey, dog], s:[snake]]
**
** Note the returned map is ordered, allowing you to split a List into
** arbitrary sections by using a function that operates on the index.
**
** Example:
**   [1,2,3,4,5].groupBy |v, i| { i / 3}.vals  // --> [[1,2,3], [4,5]]
**   [1,2,3,4,5].groupBy |v, i| { i % 2}.vals  // --> [[1,3,5], [2,4]]
**
[Obj:V[]] groupBy(|V item, Int index -> Obj| c)

I've found the exact same method here:

And methods that replicate taking the vals (in the 2nd split example) are found here:

I find a lot of use for the function, often to categorise data (as per the 1st animal example) and to split data into display columns (as per the 2nd split example).

As for Fantom, it has potential to be used in:

  • sys::Uri.query()
  • web::WebUtil.parseHeaders()
  • web::WebUtil.parseQVals()

And probably many other places once you know it exists.

Because groupBy() returns a parametrised Map, the compiler pod first needs to be updated to handle these types:

diff -r a60c8b8decfb src/compiler/fan/namespace/GenericTypes.fan
--- a/src/compiler/fan/namespace/GenericTypes.fan	Fri Oct 06 09:21:02 2017 -0400
+++ b/src/compiler/fan/namespace/GenericTypes.fan	Tue Oct 10 17:27:55 2017 +0100
@@ -86,10 +86,17 @@
     {
       t = parameterizeFuncType(nn)
     }
+    else if (nn is MapType)
+    {
+      t = parameterizeMapType(nn)
+    }
+    else if (nn.name.size == 1)
+    {
+      t = doParameterize(nn.name[0])
+    }
     else
     {
-      if (nn.name.size != 1) throw Err("Cannot parameterize $t.qname")
-      t = doParameterize(nn.name[0])
+      throw Err("Cannot parameterize $t.qname - $nn.typeof $nn")
     }
     return nullable ? t.toNullable : t
   }
@@ -106,6 +113,13 @@
     return FuncType(params, t.names, ret)
   }
 
+  private CType parameterizeMapType(MapType t)
+  {
+    k := parameterize(t.k)
+    v := parameterize(t.v)
+    return MapType(k, v)
+  }
+
   abstract CType doParameterize(Int ch)
 
 //////////////////////////////////////////////////////////////////////////

And here's a patch for List.groupBy() complete with Java and Javascript implementations, and a couple of unit tests.

diff -r a60c8b8decfb src/sys/fan/List.fan
--- a/src/sys/fan/List.fan	Fri Oct 06 09:21:02 2017 -0400
+++ b/src/sys/fan/List.fan	Tue Oct 10 17:34:51 2017 +0100
@@ -544,6 +544,25 @@
   **
   L intersection(L that)
 
+  **
+  ** Groups items according to the return value of the given function.
+  ** Items that return the same object are grouped together.
+  ** All groups are placed in a Map, keyed off the object they were grouped by.
+  **
+  ** Example:
+  **   animals := ["cat", "cow", "donkey", "dog", "snake"]
+  **   animals.groupBy |item -> Str| { item.chars.first.toChar }
+  **     // --> [c:[cat, cow], d:[donkey, dog], s:[snake]]
+  **
+  ** Note the returned map is ordered, allowing you to split a List into
+  ** arbitrary sections by using a function that operates on the index.
+  **
+  ** Example:
+  **   [1,2,3,4,5].groupBy |v, i| { i / 3}.vals  // --> [[1,2,3], [4,5]]
+  **   [1,2,3,4,5].groupBy |v, i| { i % 2}.vals  // --> [[1,3,5], [2,4]]
+  **
+  [Obj:V[]] groupBy(|V item, Int index -> Obj| c)
+
 //////////////////////////////////////////////////////////////////////////
 // Utils
 //////////////////////////////////////////////////////////////////////////
diff -r a60c8b8decfb src/sys/java/fan/sys/List.java
--- a/src/sys/java/fan/sys/List.java	Fri Oct 06 09:21:02 2017 -0400
+++ b/src/sys/java/fan/sys/List.java	Tue Oct 10 17:34:51 2017 +0100
@@ -14,6 +14,7 @@
 import java.util.Collection;
 import java.util.Comparator;
 import java.util.HashMap;
+import java.util.LinkedHashMap;
 import fanx.serial.*;
 import fanx.util.*;
 
@@ -917,6 +918,35 @@
     return acc;
   }
 
+  public final Map groupBy(Func f)
+  {
+    Type r = f.returns();
+    if (r == Sys.VoidType) throw ArgErr.make("func must return a value");
+
+    HashMap groups = new LinkedHashMap(size);
+    Object  key;
+    Object  val;
+    List    list;
+    for (int i=0; i<size; ++i)
+    {
+      val = values[i];
+
+      if (f.arity() == 1)
+        key = f.call(val);
+      else
+        key = f.call(val, Long.valueOf(i));
+
+      if (!groups.containsKey(key))
+        groups.put(key, new List(of, 16));
+
+      list = (List) groups.get(key);
+      list.add(val);
+    }
+
+    MapType mapType = new MapType(r, typeof());
+    return new Map(mapType, groups);
+  }
+
 //////////////////////////////////////////////////////////////////////////
 // Utils
 //////////////////////////////////////////////////////////////////////////
diff -r a60c8b8decfb src/sys/js/fan/List.js
--- a/src/sys/js/fan/List.js	Fri Oct 06 09:21:02 2017 -0400
+++ b/src/sys/js/fan/List.js	Tue Oct 10 17:34:51 2017 +0100
@@ -756,6 +756,32 @@
   return acc;
 }
 
+fan.sys.List.prototype.groupBy = function(f)
+{
+  var r = f.returns();
+  if (r == fan.sys.Void.$type) throw fan.sys.ArgErr.make("func must return a value");
+
+  var groups = new fan.sys.Map.make(r, this.$typeof());
+  groups.ordered$(true);
+  var  key, val, list;
+  for (var i=0; i<this.m_size; ++i)
+  {
+    val = this.m_values[i];
+
+    if (f.m_params.size() == 1)
+      key = f.call(val);
+    else
+      key = f.call(val, i);
+
+    if (!groups.containsKey(key))
+      groups.set(key, fan.sys.List.make(this.m_of));
+
+    list = groups.get(key);
+    list.add(val);
+  }
+  return groups;
+}
+
 //////////////////////////////////////////////////////////////////////////
 // Utils
 //////////////////////////////////////////////////////////////////////////
diff -r a60c8b8decfb src/testSys/fan/ListTest.fan
--- a/src/testSys/fan/ListTest.fan	Fri Oct 06 09:21:02 2017 -0400
+++ b/src/testSys/fan/ListTest.fan	Tue Oct 10 17:34:51 2017 +0100
@@ -1142,6 +1142,35 @@
   }
 
 //////////////////////////////////////////////////////////////////////////
+// GroupBy
+//////////////////////////////////////////////////////////////////////////
+
+  Void testGroupBy()
+  {
+    verifyErrMsg(ArgErr#, "func must return a value")
+    {
+      [,]->groupBy |Obj o| { }
+    }
+
+    animals := ["cat", "cow", "donkey", "dog", "snake"]
+    groups  := animals.groupBy |Str item -> Int| { item.chars.first }
+
+    verifyEq(groups.typeof, Int:Str[]#)
+    verifyEq(groups.ordered, true)
+    verifyEq(groups['a'], null)
+    verifyEq(groups['c'], ["cat", "cow"])
+    verifyEq(groups['d'], ["donkey", "dog"])
+    verifyEq(groups['s'], ["snake"])
+    verifyEq(groups.size, 3)
+
+    split := [1,2,3,4,5].groupBy(|Int v, Int i->Int| { i / 3}).vals
+    verifyEq(split, [[1,2,3], [4,5]])
+
+    split = [1,2,3,4,5].groupBy(|Int v, Int i->Int| { i % 2}).vals
+    verifyEq(split, [[1,3,5], [2,4]])
+  }
+
+//////////////////////////////////////////////////////////////////////////
 // Sort
 //////////////////////////////////////////////////////////////////////////

Jur Tue 4 Jun 2019

+1 - I find this very useful! Thanks SlimerDude!

brian Wed 1 Jul 2020

Ticket promoted to #2652 and assigned to brian

still not 100% sure about that one, but we should track

SlimerDude Thu 16 Jul 2020

Thanks.

It's essentially the beginnings of a Bucket sort algorithm but without the final sort.

To put it in Axon terms, it's the beginnings of a hisRollup where the returned Map vals are all ready to be folded.

With the recent addition of:

  • List.flatMap()
  • List.containsSame()
  • List.addIfNotNull()

And the possible addition of:

  • List.getSafeRange()
  • List.getClippedRange()

I believe this too could greatly enhance the core Fantom API.

brian Fri 7 May 2021

I have checked in the implementation of this new method. Slightly different from your changeset Steve, but basically the same (I had forgotten you actually provided a whole changeset)

Looking thru some of my own use cases though, it was also important to support the use case where the map was already created. Sometimes this was because the map was created sorted/case-insensitive. Other times its because we are grouping several lists. To support this case without creating intermediate maps, I also added a groupByInto method which takes an existing map to group into. That also frees up the groupBy from always having to create an ordered map so you don't need the performance hit unless you need it (Steve's proposed behavior).

With the compiler bug Steve noted, we will need to wait a cycle to keep 3.0.76 bootstrap buildable by 3.0.75. So this change will be in 3.0.77

brian Tue 11 May 2021

Ticket resolved in 1.0.77

Now that build 1.0.76 is posted, I've uncommented this and it will officially be in 1.0.77

SlimerDude Wed 12 May 2021

That's great news, thanks!

Login or Signup to reply.