#580 List cdr/eachRange method

brian Thu 7 May 2009

I keep finding myself wanting a method on List which iterates from index 1 to the end (skipping the first item) - kind of like cdr in Lisp.

Dummy example (if we didn't have List.join):

s = list.first.toStr
list.something |x| { s += ",$x" }

What should we call such a method?

Another option would be eachRange:

s = list.first.toStr
list.eachRange(1..-1) |x| { s += ",$x" }

andy Thu 7 May 2009

Yeah I've wanted to iterate a subrange a few times myself. I think eachRange is what I'd like to have.

qualidafial Thu 7 May 2009

Functionally this would be equivalent to list.slice(range).each, but without the copy overhead, correct?

brian Thu 7 May 2009

Functionally this would be equivalent to list.slice(range).each, but without the copy overhead, correct?

Correct. Right now you have to use a for loop which seems a little un-Fan.

qualidafial Thu 7 May 2009

I was just thinking that slice on a read-only list should share the same backing array as the original read-only list, the same way java.lang.String.substring shares its char[] array with the original string.

JohnDG Fri 8 May 2009

What you really want, qualidafial, is probably something akin to the JFC's subList, which is merely a view on the original collection. This works for any collection.

s = list.first.toStr
list.sub(1..-1).each |x| { s += ",$x" }

Of course, slice on an immutable list (not just read-only -- that's insufficient) should indeed share the same backing.

KevinKelley Fri 8 May 2009

@Brian - I've ran into that thing too, where you want to hit all but the first (or last) of a list. Usually it's in a situation where I need both an element and its preceding (or following) element. Which is a bit awkward; each acts as an iterator on the list, but doesn't give you its neighbors. So, I've done

list.each |item, index| { if (index>0) doSomething(item, list[index-1]) }

more or less.

So I'm ambivalent on the Lisp cdr thing; On the one hand, when I found Lisp and its minimalist variants I really got stoked about how everything could be expressed so elegantly. But on the other hand, when I tried to put it to practical use, I kept finding that it was hard to turn practical problems into elegant answers. At least in reasonable time.

eachRange could be useful; but it's not complete. It helps sometimes but doesn't fix the problem.

I guess eachRange is just an inversion of something like:

(1..<list.size).do |index| { elt := list[index]; prev := list[index-1] ... }

...so maybe I'm suggesting, subsume this into a request for a Range.do method. or something. Range.each gives you:

(1..<list.size).each |i| { echo(list[i]) }

which to me anyway, seems to generalize better.

brian Fri 8 May 2009

Regarding slice - today it does not share the same backing array (since I don't maintain an offset). Although it is a certainly an optimization we can add in the future, it doesn't effect the API semantics. Read-only lists do support the same backing store, and then copy on write to the original mutable list. That tends to be the really common case, which I optimized heavily for.

But I think that is really orthogonal to this discussion, I don't think you should be required to do slice[...].each. There should be a dedicated method for that. So I'm kind of thinking eachRange is the right way to go (which seems more OO than new methods on Range).

JohnDG Fri 8 May 2009

What do you think about List.sub or List.range, retrieving a view of the list, ala java.util.List.subList? This would give you much more than just eachRange.

Helper methods are great, but even greater still are generic methods that can be used to do a lot of different things (and my experience with the JCF suggests that sublists are a very powerful abstraction).

// Loop over everything after first
list.range(1..-1).each |x| { ... }

// Clear everything after first
list.range(1..-1).clear

brian Fri 8 May 2009

What do you think about List.sub or List.range, retrieving a view of the list, ala java.util.List.subList? This would give you much more than just eachRange.

How is that different than slice? Are you suggesting some expected semantics that it tracks modifications to the backing store? That seems kind of confusing to me.

JohnDG Fri 8 May 2009

A sublist is a view of a list backed by the original list.

If you clear a sublist, you clear the portion of the original list that it represents. If you add items to a sublist, they are added to place in the original list that corresponds to the end of the sublist. Etc.

From the outside, a sublist appears to be an ordinary list, and all methods have normal semantics, it's just that the modifications actually occur to a region of the original list.

It's a very powerful construct that allows you to write much cleaner code (instead of keeping track of start/end indices, you just operate on lists, which are sublists of other lists).

JohnDG Fri 8 May 2009

A slice is a copy of a region. A sublist is a view of a region. Both are lists, and were you to discard the original, a sublist would be semantically equivalent to a slice. But the whole point of sublists is that whatever you do to them affects the original list.

JohnDG Fri 8 May 2009

Oh and one more thing: a sublist has virtually no overhead to construct (i.e. no copying of data).

brian Fri 8 May 2009

A sublist is a view of a list backed by the original list.

Are you proposing a mutable sub-view over a mutable list? That seems really confusing to me, and I can't ever imagine a situation where I'd want that.

However, having an efficient way to get an read-only sub-view on a read-only list seems really useful - and doesn't cause any weird side-effects. In which case that is just an optimization on slice for read-only/immutable lists:

subList := list.ro[0..10]

But regardless, I still think we need a eachRange - that seems common enough that it shouldn't warrant multiple methods.

andy Fri 8 May 2009

I think I side with Brian's comments on the sub-view concept. But regardless, eachRange is definitely something common enough to include in the API.

KevinKelley Fri 8 May 2009

The book Beautiful Code has a nice chapter on doing multi-dimensional array iterators for NumPy -- an N-dimensional array package for Python.

The author of that came up with a pretty well-generalized solution, apparently, with the concept of slicing to get access to a subset of the original array, without copying. One example use case that pops to mind is doing image processing -- iterate over an image array, taking 3x3 slices for blur/sharpen/whatever.

So, I think it's a powerful and useful idea. But it's not like it's a quick API change, it's deeper than just sugar.

I would suggest that, if somebody thinks it's useful, start with NumPy as an example and make a pod.

JohnDG Fri 8 May 2009

Are you proposing a mutable sub-view over a mutable list? That seems really confusing to me, and I can't ever imagine a situation where I'd want that.

There's a reason it's in the JCF.

But regardless, I still think we need a eachRange

Fine with me, as long as it stops somewhere. Wouldn't want each sys class to have 50 special purpose methods when 5 generic ones used combinatorially would suffice.

brian Fri 8 May 2009

I added an eachRange method - I think no matter what else we do that is useful.

qualidafial Fri 8 May 2009

There's a reason it's in the JCF.

There's also some problems with the way it's done in JCF. Removals on the sublist are supported but additions are not. Structural changes on the superlist at or before the upper end of the sublist range do not shift the range accordingly on the sublist like you'd hope--it's just a shifted view with no correlation to events that happen in the original. Put in Fan terms:

Int[] a = [1,2,3,4,5]
Int[] b = a.sub(1..3)
echo(b) => [2,3,4]
a.removeAt(0)
echo(b) => [3,4,5]

Even if you could make the list track the original in a light-weight manner, there's still a question of whether additions on the superlist should be included in the sublist at either edge of the range:

Int[] a = [1,2,3,4,5]
Int[] b = a.sub(1..3)
echo(b) => [2,3,4]
a.insert(2, 0)
echo(b) => [2,0,3,4]
a.insert(1, -1)
echo(b) => [-1,2,0,3,4] or [2,0,3,4] ??

Fine with me, as long as it stops somewhere. Wouldn't want each sys class to have 50 special purpose methods when 5 generic ones used combinatorially would suffice.

I totally agree here--it feels like API bloat is starting to creep in. There's a more elegant solution to this somewhere, we just haven't found it yet.

brian Fri 8 May 2009

I totally agree here--it feels like API bloat is starting to creep in. There's a more elegant solution to this somewhere, we just haven't found it yet.

I should note I have a philosophy of a few classes, but as many convenience methods on a given class as makes sense. So I have no problem with 50 methods on List if they are useful - I think that is way better than having methods which operator on Lists spread out across multiple classes (ie I hate the way Java's collection APIs do this). I subscribe to the "HumaneInferace" school of thought. What belongs and what doesn't is really a matter of taste.

But obviously combinatorial methods is better when it makes sense - which is why I tend to have convenience methods on all the key built-in types to translate into other types.

alexlamsl Fri 8 May 2009

Just make sure that it doesn't end up like Perl whereby no two programmer can understand each other's writings ;-)

Login or Signup to reply.