#1528 Proposal: value-type improvements

MoOm Fri 13 May 2011

I'd like to propose 3 improvements for value-types that I think Fantom would really benefit from.

Char

Currently, characters in Fantom are represented via an Int (i.e. a 64-bit int). I think it has 2 major drawbacks:

It makes the sys API not as clear as it could be. For example, some of the methods of sys::Int only make sense for actual integers (e.g. mult, mod, pow, shiftl, negate...) and some other methods only make sense for characters. Actually, only a small number of the sys::Int methods make sense for both a character and an integer. To me, those two types are very different, and I think that having only one unified type leads to confusion.

An example of confusion that it causes is the toDigit / fromDigit methods. I always mix up those two methods (i.e. which one is the digit, is it the integer or the char?). If we add a real Char type, the API could become:

class Int
{
  Char? toDigit(Int radix := 10)
  static Int? fromDigit(Char ch, Int radix := 10)
}

which I personally find a lot clearer.

Using 64 bits for a character that is actually 16 bits wide is a waste of memory. It doesn't matter much, as we rarely store a lot of raw characters (we usually use a Str), but it could lead to performance issues if we want to write an application that does heavy character operations (imagine a word processor, for example).
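As a rough illustration of the toDigit / fromDigit point, here is a minimal Java sketch of the same pair of conversions written against a distinct character type. It is an analogue, not Fantom code: DigitDemo and its two methods are hypothetical names, and only Character.forDigit and Character.digit are real JDK calls.

```java
// Illustrative Java analogue, not Fantom code. DigitDemo is a hypothetical name.
final class DigitDemo {
    // Number -> digit character, or null if the value is not a digit in this radix.
    static Character toDigit(int value, int radix) {
        char ch = Character.forDigit(value, radix); // '\0' signals "no such digit"
        if (ch == '\0') {
            return null;
        }
        return ch;
    }

    // Digit character -> number, or null if the character is not a digit in this radix.
    static Integer fromDigit(char ch, int radix) {
        int value = Character.digit(ch, radix); // -1 signals "not a digit"
        if (value < 0) {
            return null;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDigit(15, 16));    // f
        System.out.println(fromDigit('f', 16)); // 15
        System.out.println(toDigit(99, 10));    // null
    }
}
```

With separate types in the signatures, the direction of each conversion can be read off the parameter and return types alone.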

Int32 / Float32

Int and Float are 64 bits wide in Fantom (at least for the Java/.NET implementations). I think this choice was really wise since, as stated in the intro doc, it makes Fantom "future-proof". But sometimes I'd like to do heavy computations that don't require more than 32 bits, and having to use 64-bit numbers would really hurt performance. For example, if I want to write a 3D game for smartphones in Fantom, I cannot use 64-bit floats for object positions, matrices or vertices. Even if the performance overhead is just 30%, it's hardly acceptable. I can of course use native methods for heavy computations, but then I lose portability.

What I propose would be to keep the sys API using Int/Float everywhere, but give the possibility to the user to use a Float32 or an Int32 whenever he feels it is necessary. Implicit conversion between Int and Int32 should occur whenever an Int is stored into an Int32, or the other way around. The same between Float and Float32.
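To make the trade-off concrete, the following Java sketch compares the payload size of 32-bit and 64-bit element storage and shows what a narrowing conversion does to a value. It is only an analogue of the proposal: WidthDemo and its methods are hypothetical names, and Java itself requires an explicit cast for the narrowing direction that the proposal would make implicit.

```java
// Illustrative Java sketch; WidthDemo and its methods are hypothetical names.
final class WidthDemo {
    // Payload size in bytes for n elements of the given width (array headers ignored).
    static long payloadBytes(int n, int bytesPerElement) {
        return (long) n * bytesPerElement;
    }

    // 64-bit to 32-bit integer: keeps only the low 32 bits.
    static int narrow(long value) {
        return (int) value;
    }

    // 64-bit to 32-bit float: rounds to the nearest representable float.
    static float narrow(double value) {
        return (float) value;
    }

    public static void main(String[] args) {
        int coordinates = 3 * 1_000_000; // x, y, z for one million vertices
        System.out.println(payloadBytes(coordinates, Float.BYTES));  // 12000000
        System.out.println(payloadBytes(coordinates, Double.BYTES)); // 24000000
        System.out.println(narrow(1L << 32));   // 0, the high bits are dropped
        System.out.println(narrow(0.1) == 0.1); // false, precision was lost
    }
}
```

The last two lines show why an implicit conversion from the 64-bit type to the 32-bit one can silently change a value, which any such design would have to account for.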

List of value-types should be represented by a value-type array

This last point is maybe more related to the current implementation rather than to the language itself.

The Java implementation of Fantom represents a Fantom Float[] by a Java Object[]. For obvious performance reasons, it would be really great if it were implemented by a Java double[]. We would get rid of boxing in List, which would be a big win.

I'm not sure whether some features of the language actually prevent this from happening. To me, List covariance seems to make it really hard to implement. How could we implement Obj[] list := Int[1, 2, 3] if Fantom Int[] were implemented by Java long[]? I don't know if some other features make this impossible, but I'd definitely prefer to lose covariance on value-type lists if I get no boxing when I add/get an Int into a list.
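The covariance obstacle can be seen directly on the JVM. The sketch below is plain Java rather than Fantom, and CovarianceDemo and its helpers are hypothetical names: an array of boxed values can be viewed as Object[], but a primitive array such as long[] cannot, so exposing it as a list of objects means making a boxed copy.

```java
import java.util.Arrays;

// Illustrative Java sketch; CovarianceDemo and its methods are hypothetical names.
final class CovarianceDemo {
    // True only for arrays of references; primitive arrays are not Object[].
    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }

    // The only way to hand a long[] to code expecting Object[]: box every element into a new array.
    static Object[] boxedCopy(long[] values) {
        return Arrays.stream(values).boxed().toArray();
    }

    public static void main(String[] args) {
        Long[] boxed = {1L, 2L, 3L};
        long[] primitive = {1L, 2L, 3L};

        Object[] view = boxed; // compiles: reference arrays are covariant
        System.out.println(view.length);              // 3
        System.out.println(isObjectArray(boxed));     // true
        System.out.println(isObjectArray(primitive)); // false
        System.out.println(boxedCopy(primitive).length); // 3
    }
}
```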

Real value-type arrays and 32-bit numbers would allow Fantom programs to get the same performance as a program written in Java.

What do you think about all of this?

vkuzkokov Sat 14 May 2011

Char

Introduction of Char will likely reduce the need for methods like StrBuf.addChar and create one canonical representation of character instead of Int and one-character Str. What I think should be considered is the representation of characters beyond U+FFFF (which doesn't work properly now).
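The U+FFFF concern can be illustrated with a short Java sketch, assuming the JVM runtime stores strings as UTF-16 like java.lang.String does. CodePointDemo is a hypothetical name; the String and Character methods are standard JDK calls.

```java
// Illustrative Java sketch; CodePointDemo is a hypothetical name.
final class CodePointDemo {
    // U+1D11E (musical symbol G clef), written as its UTF-16 surrogate pair.
    static final String G_CLEF = "\uD834\uDD1E";

    // Number of 16-bit code units, which is what length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of actual Unicode characters (code points).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(G_CLEF)); // 2
        System.out.println(codePoints(G_CLEF)); // 1
        System.out.println(Integer.toHexString(G_CLEF.codePointAt(0))); // 1d11e
    }
}
```

A single 16-bit character type cannot hold this code point, so any Char design has to decide whether a Char is a code unit or a full code point.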

Int32 / Float32

Luckily, I didn't have to do optimization in Fantom other than algorithmic. This proposal seems to work best with the next one.

List of value-types should be represented by a value-type array

The only thing I know of that will stop working is

Obj?[] list := Int[1, 2, 3]
list.add(null)

which shouldn't have worked in the first place. Implementation-wise, it can be done similarly to java.util.EnumSet. Fantom even has an advantage here: there's less difference between a constructor and a factory. Also, we will lose the List(Int#, 10) syntax, which isn't widely used anyway. All in all, implementing this proposal shouldn't break much.
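A minimal Java sketch of the EnumSet-style approach, under the assumption that a factory picks the backing storage from the element type. All names here (SimpleList, LongBackedList, BoxedList, Lists.make) are hypothetical and are not taken from Fantom's runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; every name here is hypothetical, not Fantom's runtime.
interface SimpleList {
    void add(Object value);
    Object get(int index);
    int size();
}

// Stores 64-bit integers unboxed in a long[], so null cannot be represented.
final class LongBackedList implements SimpleList {
    private long[] items = new long[4];
    private int size;

    public void add(Object value) {
        if (!(value instanceof Long)) { // also rejects null
            throw new IllegalArgumentException("only non-null Long values fit in a long[]");
        }
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow the primitive array
        }
        items[size++] = (Long) value; // unboxed on the way in
    }

    public Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index]; // boxed again on the way out of the generic accessor
    }

    public int size() {
        return size;
    }
}

// Fallback for every other element type: ordinary boxed storage, null allowed.
final class BoxedList implements SimpleList {
    private final List<Object> items = new ArrayList<>();

    public void add(Object value) { items.add(value); }
    public Object get(int index) { return items.get(index); }
    public int size() { return items.size(); }
}

final class Lists {
    // Factory in the style of EnumSet.noneOf: the caller never names the implementation.
    static SimpleList make(Class<?> elementType) {
        return elementType == Long.class ? new LongBackedList() : new BoxedList();
    }
}
```

In this sketch, adding null to the long-backed list fails at run time, which mirrors the one case listed above as no longer working. Note that get still boxes, because the shared interface is written in terms of Object.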

That brings me to an idea: maybe we should allow

Constructor syntax for factories

Sth(x) will call Sth.make if it's a static method and (possibly) its return type fits Sth. This way we'll be able to keep source-level backward compatibility if we choose to implement the last proposal, and it will help with other cases of the "Introduce factory" refactoring.

brian Mon 16 May 2011

Regarding Char vs Int, I personally think that was a great simplification. A character as a Unicode code point is just an integer. Sure you could argue that deserves some sort of special "Int subclass", but it is just one of many such cases (I think Stephen has argued for some mechanism like that). But at this point the main issue is that a massive breaking change like that is out of the question.

Regarding alternate int/float sizes, I still think we made the right design decision to keep them out of the core. At the time (circa 2005), I thought most servers would be switched over to 64-bit machines by now, just like about every 16-bit machine switched to 32-bit. But I think due to the focus on lower power, that 32-bit machines will continue to play an important role for many years to come.

The difference between 32-bit and 64-bit primitives comes into play when you have lists of them. And in that case, since we box both Int and Float, the boxing is a much bigger hit than 32-bit vs 64-bit. We have previously discussed ways List could implicitly store Ints/Floats without boxing, but there are some technical challenges with how that works without still having to box/unbox through the generic methods.

I think eventually a dedicated math pod that defines optimized arrays of various bit widths would probably be the best solution. Although for performance reasons it wouldn't be able to leverage something like generic sys::List.

MoOm Mon 16 May 2011

Thanks for your answers.

Regarding Char vs Int, I personally think that was a great simplification. A character as a Unicode code point is just an integer. Sure you could argue that deserves some sort of special "Int subclass", but it is just one of many such cases (I think Stephen has argued for some mechanism like that). But at this point the main issue is that a massive breaking change like that is out of the question.

A char is indeed just an integer, but to me, what defines a type is the set of methods that can be invoked on it rather than its representation. And in this regard, a char is definitely different from an integer. 2030.toLower just makes no sense. And neither does `a` % `b`. What I had in mind was not "Int subclassing", but making Char an independent type with no relationship with Int at all (except that it could be explicitly converted from/to an Int). The current Int API could be easily split into two almost-distinct sets of methods. But I agree that it's definitely a massive breaking change, so I won't insist on it. :) It's just that it is easier to do it right now than later (if it should ever happen).

Regarding alternate int/float sizes, I still think we made the right design decision to keep them out of the core. At the time (circa 2005), I thought most servers would be switched over to 64-bit machines by now, just like about every 16-bit machine switched to 32-bit. But I think due to the focus on lower power, that 32-bit machines will continue to play an important role for many years to come.

I wouldn't mind alternate-sized ints/floats not being part of sys if it had no consequences for performance. But adding them in an extra native pod makes it impossible to get them as value-types, and we would lose the ability to get optimized code generation from the compiler/VM. It would just be like boxing a Java int or float.

Of course, we could have native higher-level objects such as vectors and matrices, but what if I want to write my own numerical algorithm? What if I want to write a physics engine, a collision detection algorithm or a graph algorithm? I know that Fantom won't get me the same performance as C, but it's a pity that I can't get performance similar to Java's.

But in any case, I agree with you that as long as sys::List does boxing, there is no point in trying to get better performance by using 32-bit numbers.

jodastephen Mon 16 May 2011

I agree that Char and Int are two separate types. When evaluating types I always look at the valid operations, and the operations on those two concepts are very different. And preventing stupid errors definitely should be part of the API's job.

While I appreciate there is a Fantom style of fewer classes, I can't say I agree with it in some cases - this being one of them.

(BTW MoOm, a Char would need to be 32 bit due to Unicode restrictions, but doing so in Fantom would make the "premature optimiser" group of developers choose it as an int to avoid the "overhead of 64 bit Int". My preference would be constraints on ints, such as Int<1..31>, which allows the compiler/runtime to pick an appropriate underlying size.)

brian Tue 17 May 2011

A char is indeed just an integer, but to me, what defines a type is the set of methods that can be invoked on it rather than its representation. And in this regard, a char is definitely different from an integer.

I think you can definitely make a case (either way). But I think at this point the main issue is the time is past for such a big breaking change.

DanielFath Tue 17 May 2011

If not now, then when? The later the change occurs, the more difficult it will be to make it.

brian Tue 17 May 2011

If not now, then when? The later the change occurs, the more difficult it will be to make it.

I think the answer is never, unless we decide to make some breaking 2.0 release.