#86 Substring boundary handling

andy Thu 15 Jun 2006

Here is how a few other languages handle substrings on the boundary of the source string:

Ruby:

  irb(main):001:0> s = "four"
  => "four"
  irb(main):002:0> v = s[4..-1]
  => ""
  irb(main):003:0> v = s[5..-1]
  => nil

Java: 

  String s1 = "four";
  String s2 = s1.substring(4);
  System.out.println("@@@ s2.length: " + s2.length()); 

  @@@ s2.length: 0

  String s3 = s1.substring(5);
  System.out.println("@@@ s3.length: " + s3.length()); 

  java.lang.StringIndexOutOfBoundsException: String index out of range: -1
          at java.lang.String.substring(String.java:1768)
          at java.lang.String.substring(String.java:1735)

C#:

  string s = "four";
  string v = s.Substring(4);
  Console.WriteLine("@@@ v.length: " + v.Length);

  @@@ v.length: 0

  v = s.Substring(5);
  Console.WriteLine("$$$ v.length: " + v.Length);  

  System.ArgumentOutOfRangeException: startIndex cannot be larger than 
    length of string.
  Parameter name: startIndex
     at System.String.InternalSubStringWithChecks(Int32 startIndex, 
       Int32 length, Boolean fAlwaysCopy)
     at HelloNameSpace.HelloWorld.Main(String[] args)

I think we should be consistent with these languages on the first case:

  s := "four"
  v := s[4..-1]  // returns zero-length Str ""

For the other case (a start index past the end of the string), it is probably safer to throw an exception.

andy Tue 20 Jun 2006

So I think we should handle Range for Str.slice as follows:

  if (startIndex == Str.size)
    return ""

  else if (startIndex < 0 || startIndex > Str.size)
    throw Err

  else if (endIndex < 0 || endIndex > Str.size)
    throw Err

  else
    normal slice behavior
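For illustration, here is a rough Java sketch of the rules proposed above, applied to java.lang.String. It is not Fan's implementation: the class ProposedSlice is hypothetical, ranges are assumed to be inclusive with negative indices counted from the end (so -1 is the last index), and IndexOutOfBoundsException stands in for Err. Because the range is assumed inclusive, the sketch treats an end index equal to size as out of bounds; that is one reading of the `endIndex > Str.size` check, not something the proposal spells out.

```java
// Hypothetical sketch of the proposed Str.slice rules, written against
// java.lang.String. Assumptions (not from the thread):
//   - ranges are inclusive, as in Fan's start..end
//   - a negative index is resolved by adding size, so -1 is the last index
//   - IndexOutOfBoundsException stands in for Fan's Err
class ProposedSlice {

    // Resolve a possibly negative index against the string size.
    static int resolve(int index, int size) {
        return index < 0 ? index + size : index;
    }

    // Return s[start..end] (inclusive) under the proposed boundary rules.
    static String slice(String s, int start, int end) {
        int size = s.length();
        int st = resolve(start, size);
        int en = resolve(end, size);

        // Rule 1: start == size yields "" and skips the end check entirely.
        if (st == size) {
            return "";
        }
        // Rule 2: any other start outside the string is an error.
        if (st < 0 || st > size) {
            throw new IndexOutOfBoundsException("start out of range: " + start);
        }
        // Rule 3: the end must be a valid index (inclusive, so at most size - 1).
        if (en < 0 || en >= size) {
            throw new IndexOutOfBoundsException("end out of range: " + end);
        }
        // Normal slice; substring itself rejects a reversed range.
        return s.substring(st, en + 1);
    }

    public static void main(String[] args) {
        System.out.println("[" + slice("four", 0, -1) + "]"); // [four]
        System.out.println("[" + slice("four", 4, -1) + "]"); // []
        System.out.println("[" + slice("four", 4, 99) + "]"); // [] (start trumps end)
        try {
            slice("four", 5, -1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("threw: " + e.getMessage()); // threw: start out of range: 5
        }
    }
}
```

The third call shows the consequence of letting startIndex == size take precedence: the end index is never examined, so even a wildly out-of-range end is accepted.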

brian Wed 21 Jun 2006

Your comment doesn't match your code: if startIndex == size, do you skip the check for endIndex or not?

andy Wed 21 Jun 2006

That is correct: it trumps the endIndex check. I'll clean up the code.

andy Thu 22 Jun 2006

Actually, it worked out that a single change in Range (allowing startIndex == size to be in range and valid) made List, Buf, and Str work without any other changes, as long as endIndex is in range.

So the new behavior:

  "abc"[3..-1]   -> ""
  [1,2,3][3..-1] -> [,]

  buf := Buf.make
  buf.setCharset(Charset.utf16BE)
  buf.write(0xaa).write(0xbb).write(0xcc)
  buf[3..-1].toStr -> "0x[]"

But if the endIndex is out of bounds, an IndexErr will still be thrown.
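A rough Java sketch of why one change in range resolution is enough: strings, lists, and byte buffers all slice through the same resolve step, so relaxing the start check there fixes all three at once. This is not Fan's Range code; the class RangeSlice and its methods are hypothetical, ranges are assumed inclusive with negative indices counted from the end, and IndexOutOfBoundsException stands in for IndexErr. The reversed-range check is a choice made for the sketch so that all three types fail with the same exception.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one shared range-resolution step reused by strings,
// lists, and byte arrays. Assumptions (not from the thread): inclusive
// ranges, negative indices resolved by adding size, and
// IndexOutOfBoundsException standing in for Fan's IndexErr.
class RangeSlice {

    // Resolve start..end (inclusive) against size and return the
    // half-open bounds {from, to} that Java's slicing methods expect.
    static int[] resolve(int start, int end, int size) {
        int st = start < 0 ? start + size : start;
        int en = end < 0 ? end + size : end;
        // The single relaxation: start may equal size (one past the end).
        if (st < 0 || st > size) {
            throw new IndexOutOfBoundsException("start " + start + " for size " + size);
        }
        // The end index must still be a real index into the sequence.
        if (en < 0 || en >= size) {
            throw new IndexOutOfBoundsException("end " + end + " for size " + size);
        }
        // Reject reversed ranges; st == en + 1 is allowed and gives an empty slice.
        if (en + 1 < st) {
            throw new IndexOutOfBoundsException("reversed range " + start + ".." + end);
        }
        return new int[] {st, en + 1};
    }

    static String slice(String s, int start, int end) {
        int[] r = resolve(start, end, s.length());
        return s.substring(r[0], r[1]);
    }

    static <T> List<T> slice(List<T> list, int start, int end) {
        int[] r = resolve(start, end, list.size());
        return list.subList(r[0], r[1]);
    }

    static byte[] slice(byte[] buf, int start, int end) {
        int[] r = resolve(start, end, buf.length);
        return Arrays.copyOfRange(buf, r[0], r[1]);
    }

    public static void main(String[] args) {
        System.out.println("[" + slice("abc", 3, -1) + "]");   // []
        System.out.println(slice(List.of(1, 2, 3), 3, -1));     // []
        byte[] buf = {(byte) 0xaa, (byte) 0xbb, (byte) 0xcc};
        System.out.println(slice(buf, 3, -1).length);           // 0
        try {
            slice("abc", 3, 5); // end is out of bounds, so this still throws
        } catch (IndexOutOfBoundsException e) {
            System.out.println("threw: " + e.getMessage()); // threw: end 5 for size 3
        }
    }
}
```

Unlike the earlier proposal, a start equal to size no longer trumps the end check here, so `"abc"[3..5]` throws instead of returning "".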
