All Topics

#369 Primitives and Nullable Types

brian Mon 6 Oct 2008

Case for Primitives

This is a continuing discussion from previous topics:

One of the big decisions I made early on in Fan was to stick with a pure OO type system. This means that currently all objects are boxed object references including numerics. But since the JVM summit, I've been thinking deeply about elegant, efficient interop with Java and .NET libraries. Since then I've done a 180 on support for primitive types. I think Fan needs primitive support for two reasons:

Fan needs to be as efficient as Java and C#, even it means a less pure type system
Fan needs primitives for clean, efficient interop with Java/.NET libraries

We can't support arbitrary value types like .NET, since the JVM doesn't support value types (at least in the foreseeable future). So we can start with the premise that only 3 value types will be supported: Bool, Int, and Float.

Primitive Types

So the question is: how does Fan introduce primitives into its type system?

In the JVM there is a fundamental difference between primitives and their boxed object - for example boolean and java.lang.Boolean. The CLR is a bit more sophisticated in that value types can be boxed/unboxed without creating new fundamental types. C# 2.0 introduced the notion of nullable value types which are really a struct containing a value type and a null flag.

C# makes a distinction between boxing/unboxing and nullability. This makes sense when you have true value types. But this distinction doesn't buy us anything on the JVM. So to me the notion of boxing and nullability is the same problem. By boxing to an object, I get nullable support.

So original question is how to we introduce primitives into the type system. But if I broaden the question to how do I introduce nullable into the type system, then we can kill two birds with one stone.

Nullable Types

So now the question is: how does Fan introduce nullable types into its type system?

Previously I've been lukewarm to the idea of true nullable types. The current proposal is based on Stephen's prototype back in Aug. Now with primitives on the table, I've flipped positions and think we should embrace true nullable types, where not-null is the default. If we stick with nullable as our default, then we don't reap the full benefits of primitives. And as it has been pointed out, not-null is most often the developer intent.

So my proposal is to allow any type to be annotated with the "?" character. The "?" char is used by C# already, and seems to be the defacto standard for nullable types:

Int   =>  Int which is never null
Int?  =>  Int which may be null

With this change we can map to the Java runtime as follows:

sys::Bool    =>  boolean
sys::Bool?   =>  java.lang.Boolean
sys::Int     =>  long
sys::Int?    =>  java.lang.Long
sys::Float   =>  double
sys::Float?  =>  java.lang.Double

Every other type would be always be a reference type, so nullable would be purely an artifact of the Fan type system, not the Java type system. We need to do a little research to see the best way to map this to the CLR type system since we are expecting to combine the notion of boxed and nullable.

Implementation

One of the changes we made to the Fan type system since the original nullable discussion was that Fan implicitly casts anything that would require you to insert a manual cast. Following this convention thru with nullable types:

Str x; Str? y
y = x   // safe assignment
x = y   // implicitly means x = (Str)y

If we are working with Bool, Int, or Float then the cast is really a boxing/unboxing operation:

Int x; Int? y
y = x   // really means y = Int.box(x)
x = y   // really means x = Int.unbox(y)

In the case of primitives we really need to trap going from both nullable and non-nullable to do our boxing/unboxing. We could make a distinction at the Fan level between primitive and reference types, but that doesn't seem right. Fan should just care about nullable and non-nullable, then leave it to the Java or CLR runtime to decide how best to optimize. This leaves the door open for the CLR to map Fan types to value-types, and same for the JVM if it ever gets value-types.

So my proposed implementation is to define two new fcodes ToNullable and FromNullable. The compiler will insert these opcodes whenever converting between a nullable and non-nullable type. The JVM runtime will implement these opcodes as follows:

ToNullable: if Bool, Int, or Float then box the primitive; otherwise no-op
FromNullable: if Bool, Int, or Float then unbox; otherwise throw NullErr if reference is null

Both of these opcodes will provide a type argument which allows us to emit the right bytecode for efficiency. We will need to decide a similar strategy for .NET once we do a little investigation. This lets Fan treat everything uniformly using its nullable type system, and the JVM runtime still has the information it needs to optimize into primitives. Plus conversions from Obj? to Obj will fail fast where they happen instead of propagating into other sections of code.

Conversion Rules

So as stated above, no explicit casting is required to convert between Obj and Obj?. However a special opcode is generated for each conversion to permit the runtime to do boxing/unboxing and null checks. So the nullable type is more about annotations for humans and runtime optimizations then static type checking.

However, I do think there are some things which the compiler should enforce. The obvious rule is that you can't use the null literal with a non-nullable type. For example this would result in a compile time error:

Int x := null

We should also figure out how to automate invariant checking for fields which are declared as non-nullable types. For fields explicitly assigned, we will throw an exception if an attempt to assign them to null. However we also need to figure how this all works using with-blocks. That is still a very open issue despite extensive discussion. But however we solve that problem should also allow us to ensure that once an object is fully constructed all its non-nullable fields are set correctly so that we don't end up with NPE in some other part of the code. I think for starters, a safe rule is that the compiler reports an error if any non-nullable field is not set by the constructor(s).

So that is my proposal - I plan on starting this work very soon. So if you have any feedback, the sooner the better!

alexlamsl Mon 6 Oct 2008

Compiler errors I can think of so far:

Obj a = null (including field assignment and with-block)
|->Obj| {return null}
(|Obj->| a)(null)

Your field not assigned case feels like final fields in Java, which I use a lot personally.

I thought about Obj? a = null; Obj b = a; - but that's too much dark magic to me. The compiler error could end up being as cryptic as those C/C++ ones.

Interesting to think about the "hierarchical" structure of Num?, Num, Int? and Int:

Num?   Num   Int?  Num?
 |      |     |     |
Int?   Int   Int   Num

          Num?
       ____|____
      |         |
     Num       Int?
      |_________|
           |
          Int

jodastephen Tue 7 Oct 2008

Excellent news! As you might expect I'm fully behind this proposal, and the direction which Fan is now taking.

I don't think I've much to add at this point, and I agree that we will need to revisit the constructor/with issues once this is done.

BTW, I also think that this will allow for tools to easily add another layer of checking to the Fan system - full null-safe proving. Using the fcode FromNullable, and an evaluation of the code, it will be possible for a FindBugs type tool to identify probable NullErrs. This would of course be separate from the Fan compiler itself - well unless the Fan compiler allows plugin extensions at some point in the future (think of a structured way to write tools to extend the compiler...).

alexlamsl Tue 7 Oct 2008

Continued from that hierarchical train of thought - now can we just do implicit ?. and ?-> (Safe Invokes) based on whether a nullable instance is on the left?

In which case, ?: (Elvis operator) on a non-nullable instance should be reported as a compiler error...

tompalmer Tue 7 Oct 2008

Wow. Great news. I thought you had some concern about making assignment and such like possibly non-atomic on 32-bit systems? But still, I'd rather take this route. Seems likely to make cleaner code and provide easier, fast math.

As another note on this, you'll need to make === compare value rather than pointer equality for Bool, Int, and Float (and for their nullable varieties). This will occasionally cause confusion when dealing with Java and .NET libraries but will make Fan more consistent when on its own (vs. Java's evil == autoboxing inconsistencies).

brian Tue 7 Oct 2008

Interesting to think about the "hierarchical" structure of Num?, Num, Int? and Int:

I will post shortly - but I don't consider nullability in the "type hierarchy". It is really just another dimension.

Continued from that hierarchical train of thought - now can we just do implicit ?. and ?-> (Safe Invokes) based on whether a nullable instance is on the left?

I don't think that is quite right. There is a semantic difference between using "." and getting a NPE versus using "?." silently skipping the method call.

In which case, ?: (Elvis operator) on a non-nullable instance should be reported as a compiler error

agreed - probably a lot of low hanging fruit like that

I thought you had some concern about making assignment and such like possibly non-atomic on 32-bit systems?

Not sure I recall that. My previous position was that primitives where a trade-off in complexity versus performance. I initially chose simplicity. My change of heart was deciding the extra complexity is necessary in order to achieve performance.

As another note on this, you'll need to make === compare value rather than pointer equality

I don't understand this one. I think === always compares reference equality - that is what is does by definition. However 99% of the time you should be using == which will work as you'd expect (unlike Java).

brian Tue 7 Oct 2008

I've checked in the first phase of nullable types. The new type? syntax is supported by the compiler and available end-to-end via the reflection APIs:

t := Int?#
t.base       =>  sys::Num
t.qname      =>  "sys::Int"
t.signature  =>  "sys::Int?"
t.isNullable =>  true

Right now the Type.fits call ignores nullability (not sure if this is right yet):

Int#.fits(Int?#) => true
Int?#.fits(Int#) => true

Couple things to note:

Int? extends from Num, not Int - we don't consider nullability part of the type hierarchy.
an expression such as x is Int ? y : z requires a space b/w type name and question mark
Int:Str? means [Int:Str?], otherwise you need to use [Int:Str]?' if the whole map is nullable

At this point nullability is purely an annotation of the type system - we don't do anything with it yet. Next up I'll add the checks to the compiler and begin the process of slogging through the APIs to mark things nullable.

Once we've got all our APIs sanitized with nullable types, I'll begin the work to use primitives in the Java runtime.

katox Tue 7 Oct 2008

Great news indeed - thumbs up!

It doesn't seem to me that this proposal would affect Fan's type system purity. I'd be cautious with more implicit rules. Judging from XQuery/XPath example they can harm optimizations and general understandability in many (quite unpredictable) ways.

I agree that unitialized fields (or assigned with null) should be reported as a compiler error during object construction (including with blocks) on non-null types.

I'd be opposed to implicit casting . to ?. on nullable types. If . and ?. are the same on non-null types and if . are automatically casted to ?. on non null types we don't need both of them and we could stick with just single . operator. But I don't like it much as it could hide some subtle errors (not calling a method, missing a side effect or something). I think NPE is better in such cases.

Using operator ?. or ?-> with non-null type should be reported as a warning. I'd also rather see Elvis operator ?: to issue a warning instead of an error when used with non-null types. Obj o; if (o != null) doSomething(o) is allowed so why Elvis operator shouldn't be? Actually, a warning could be reported in both cases.

I'm not quite sure about === same operator. I can see the Tom's point for Bool, Int and Float but what about Num? Would it be a different beast?

alexlamsl Tue 7 Oct 2008

I thought the whole idea is to get rid of hidden NPEs by declaring whether you are expected null at a certain site in your code. So as much as it is pointless but not strictly illegal to use ?. for non-null instances, it is arguably harmful to allow -> for nullable instances.

If you are really concerned about propagation of null down the call chain, I would go as far as only allowing . & -> for non-nullables and ?. & ?-> for nullables.

katox Tue 7 Oct 2008

Allowing ?. for non-null instances could be useful to allow fast changes from possibly null to non-null in APIs. You wouldn't have to care about the implementation (replacing all ?. for . in the code immediately). The ? part of the operator would be clearly redundant but the semantics would remain to be the same. You can clean up the code later (by hunting warnings in implementation code). The same can be said about ?-> and ?: for non-null types - no harm, no change of semantics - they just cover the case that can't happen (ok, issue a warning).

Dynamic invoke operator -> on null types is different. It can be quite easily replaced by Obj? o; (o as Obj)->someMethod or Obj? o; o?->someMethod depending on your intention.

But implicit casting from -> to ?-> is also harmful. You wouldn't know if o->someMethod is called or not depending on the type of o -- you would always have to look up the declaration. But there doesn't have to be any. o could be declared using type inference o := getMyO; o->someMethod -- is someMethod always called or could it be silently ignored? You have to go to look at getMyO declaration. Ugh. Better than fast-failing NPE? I don't think so.

tompalmer Tue 7 Oct 2008

The issue with === is like so:

Int a = someRandomNumber
Int? b = a
Int? c = a

Now, imagine more complicated cases going into and out of functions. One would expect that b === c, but you'll need some super-clever flyweight retrieval to actually guarantee they are the same object.

brian Tue 7 Oct 2008

I think the important part is to get the nullable type infrastructure in place and to start annotating all the APIs correctly. We can play with the rules over time as we get experience with them.

Using ?. ?-> and ?: on non-nullable types doesn't hurt and as katox said might make prototyping easier. I don't have a good feeling about error, warning, or ignore yet.

Although many APIs in Fan will be non-nullable, the most likely scenario is that all the of the Java APIs will be typed as nullable (since we don't have any meta-data to tell us otherwise). Plus all of the existing code assumes nullable types - so I don't think it makes sense to change of the existing semantics for operators like ?., etc.

Regarding the === issue - that really is a rare operator to use, and you only use to compare references. Maybe it doesn't make sense to do that for something like Int?, but I can see scenarios when it might. I definitely don't think it should have special rules. Previously I guaranteed that Ints were interned between -128 to 1024. With the move to Long I intern via Long.valueOf which interns -128 to 128. Although once we start calling against Java libraries there is a good chance someone might create via the Long constructor. So we can't make any interning guarantees anymore. But remember the goal here is that == and != are the operators you should be using and they work as expected.

jodastephen Wed 8 Oct 2008

I agree with Brian's step by step approach here. I also agree that we should leave === alone, because everyone will be using == and != successfully. I also think that using ?. on a non-null type seems OK if pointless.

On the other issues, these are really a question of how tough do you want the compiler to be. Different users will have different expectations depending on whether they are comfortable/like a more dynamic world or are prototyping, vs those that want to eliminate NullErr.

To handle this, I'd like to consider the possibility of pluggable elements in the compiler, which would allow additional (tougher) rules to be added at that very low level. Thus, if you work somewhere where you want to eliminate NullErr, you turnon/plugin the harsh null checker. (This is all for the future, but does provide a way for decisions like this to flesh out over time)

tompalmer Wed 8 Oct 2008

I'd like to consider the possibility of pluggable elements in the compiler

Strict static code analysis tools exist for most semi-popular languages. This sounds like it would be similar.

I think a more general approach would be to make sure that the compiler API exposes plenty of nice details. This allows for sweet IDE tools like we've come to expect for Java (vs. simple and often wrong syntax highlighting or hacked autocomplete), and it would also make it easier to create 3rd party static analysis tools. Note that I haven't reviewed the currently exposed compiler or syntax tree APIs while making this comment.

brian Wed 8 Oct 2008

I'm 100% on board with a pluggable compiler. I plan to use the Fan compiler itself for background compiling in IDEs (so that we have an AST available for auto-completion, etc) - so that will probably require some rework. The Java and .NET libraries will be handled as plugins in their own pods outside the core compiler. And I'm quite willing to work with anyone interested in adding additional checking into the compiler. In fact, I've already been bitten by auto-casting during some refactoring. So I'm wondering myself about having various levels of strictness for static type checking. The compiler code is pretty clean (not exactly a unbiased option though :) - so I don't foresee adding plug-points as that big a deal.

JohnDG Wed 8 Oct 2008

This looks like a fantastic proposal, with which I'm in 100% agreement.

I would suggest making it a compiler error to use . on a nullable type. Instead, -> or .? should be used for this purpose.

As for a pluggable compiler, I think the reason why compiler parsers are generally not used for syntax analysis is the conflicting needs of compilers versus real-time syntax analysis tools. IDEs require an incremental parser that can deal with and gracefully repair any number of errors, while still yielding a valid (and useful) AST. Compilers, on the other hand, generally try for the fastest parser they possible can (which is seldom an incremental parser), and are generally not considered too much with repairing errors or generating a "best guess" AST.

Not to say there aren't good, compelling reasons for a single parser...

alexlamsl Thu 9 Oct 2008

Whilst I can understand the desire to be more flexible when changing code, one would argue that, with the introduction of a facility which specifies "nullability", the compiler should be more strict to bring out the strength of such a feature.

If you want to change a type from nullable to non-nullable (and vice versa), you should review the places where this is being used in order to make such a breaking change.

Although encourage transition from nullable types to non-nullable types would probably not be a bad idea, i.e. allowing non-nullable instances to use ?., but not nullable to use ->.

brian Thu 9 Oct 2008

I would suggest making it a compiler error to use . on a nullable type. Instead, -> or .? should be used for this purpose.

I'm definitely not understanding that request. That is like saying every use of "." in normal Java code should be "?.". In a perfect type system we wouldn't actually need "?." - but I still thinks it makes sense to keep both and allow nullable types to use both. They are two very different intents.

On another note, which is right?

Bool equals(Obj that)
Bool equals(Obj? that)

These will map to normal Java Object.equals, so I'm inclined to say that we take a nullable parameter since I believe that is the Java contract.

katox Thu 9 Oct 2008

I think Obj? is correct. You can compare to null because ? is an orthogonal concept to types in Fan (it is not a type hierarchy). The first signature would require special handling in code which is not very desirable.

JohnDG Thu 9 Oct 2008

I'm definitely not understanding that request. That is like saying every use of "." in normal Java code should be "?.".

That's indeed true, but in Fan code, the normal usage would be ., and ?. would be very rare indeed (and so would NPEs).

The whole point of nullable types (aside maybe from performance on non-nullable primitives) is to make code safe from NPEs. The only way you can do that is to prevent people from invoking a method on an object that might be null -- unless, perhaps, they do some check beforehand:

Obj? obj = getObj

if (obj != null) {
   obj.foo // Legal since compiler knows that obj cannot be null

   Obj obj2 = obj // Legal for same reason
}

If you introduce nullable types but then let people use all instances in the manner they have become accustomed to, then you get all of the drawbacks and few of the benefits (in particular, no reduction in NPEs since although you'll be annotating the types, you won't be using that information in the very place where NPEs are generated).

JohnDG Thu 9 Oct 2008

By the way, I disagree with autocasting from Object? to Object. Of course, casting the other way around is perfectly safe and should be done automatically.

In my opinion, the only way to get from an Object? to an Object should be via a conditional. For example:

Obj obj = (otherObject != null) ? otherObject : defaultObject;

By autocasting from Object? to Object, you push the check for null into the runtime system, which will just result in more NPEs. By forcing people to use conditionals, you push the check into the compiler, which results in fewer NPEs.

brian Thu 9 Oct 2008

By the way, I disagree with autocasting from Object? to Object. Of course, casting the other way around is perfectly safe and should be done automatically.

I'm having trouble seeing how auto-casting for types should be different than auto-casting for nullable. John your comment on that topic was:

If you don't autocast, then the developer's just going to do it manually. You won't be introducing more errors, just making the developer's life easier.

So is there really a difference between these two? It seems inconsistent to make a distinction.

Furthermore consider this code:

str[str.index(":")..-1]

If I happened to know that the string contains an a colon (or already have a catch block), do you really want to make me write:

str[(Int)str.index(":")..-1]

So I guess I would frame the debate more generally: how do we support auto-casting consistently?

JohnDG Thu 9 Oct 2008

I'm having trouble seeing how auto-casting for types should be different than auto- casting for nullable. John your comment on that topic was:

You're right -- of course, I meant that casting should be impossible, not limited to auto-casting. The only way to cast from Object? to Object should be inside a conditional where it is statically guaranteed to be safe.

If I happened to know that the string contains an a colon (or already have a catch block), do you really want to make me write:

No, I want that cast to be illegal.

Str.index could return Int, in which case no casting would be necessary (with, for example, the convention that -1 means the string was not found, as per java.lang.String.indexOf), or if it returns Int?, then the proper way to write the code is as follows:

index := str.index(":")

if (index != null) {
   str[index..-1]
}
else {
   // Do something here
}

The above code simply can't throw an NPE. In fact, I think that generally, not allowing casting (manual or auto) from non-nullable to nullable and requiring ?. for nullable types eliminates NPEs.

Now if this were to not happen for some reason, and explicit casts were allowed from Object? to Object, then I do think auto-casting should be supported. But I don't think any kind of cast should be allowed, and misspoke above when I limited it to auto.

katox Thu 9 Oct 2008

I think this sort of "magical values" by convention is something to be avoided. It leads to a mess (specially in poorly documented code).

Allowing (nullable to non-nullable or other) conversion only if the compiler can prove that it is legal is an interesting idea. I've seen tons of code where a developer "just knew" that some feature is true - but others did not. Or they changed it later - a nice example of a code rot where NPE jumped out of nowhere in "stable" code.

If the code analyzis was smart enough this proposal could lead to more robust code without too much hassle. But dynamic parts could be painful...

JohnDG Thu 9 Oct 2008

I think this sort of "magical values" by convention is something to be avoided. It leads to a mess (specially in poorly documented code).

That's true, but null is itself a magical value.

The general solution is to refactor the API. For example, in the case of index, provide a method that captures everything from the specified string to the end of the string (fromStrToEnd), and which simply returns an empty string if the specified string was not found (several variations would be needed on this method).

Allowing (nullable to non-nullable or other) conversion only if the compiler can prove that it is legal is an interesting idea.

It's not just interesting, it's actually used in some languages, even a variant of Java I can't recall at the moment (Nice???).

I've seen tons of code where a developer "just knew" that some feature is true - but others did not. Or they changed it later - a nice example of a code rot where NPE jumped out of nowhere in "stable" code.

Exactly. There's no such thing as knowing, only assuming.

Aside from elimination of NPEs, the #1 thing this is going to do to code is dramatically simplify it, because no longer will programmers attempt to preemptively check every single value for null (often failing to do the correct thing in case the value is null, which leads to its own sorts of problems that are seldom reproduced in-house).

jodastephen Thu 9 Oct 2008

Quick thought - NPE elimination is a worthy goal, but often too heavy for many people. Fan as a language encourages a little flexibility, really just the right amount.

The point about plugin compilers, or compiler switches, is that harsher NPE checking can be added in if, and only if, a development team needs that harsh level of checking.

I know I want to be able to write this and live with any NullErr.

str[str.index(":")..-1]

helium Thu 9 Oct 2008

Tehre is the ?: operator so even without casts it's pretty convenient

obj? foo = ...
obj bar = foo ?: default

Can I use throw as expression in Fan? Than you could still do

obj bar = foo ?: throw NPE

OK, in the case shown above it might not be perfect

... = str[str.index(":")?:0 .. -1]

buut you can still introduce names to make it more readable

start := str.index(":") ?: 0
... = str[start .. -1]

Calling methods on nullable objects could be done like this:

nullable.method()   // message-eating, was ?.

nullable!.method()  // "I know what I do", was .

JohnDG Thu 9 Oct 2008

Quick thought - NPE elimination is a worthy goal, but often too heavy for many people.

There's little benefit to a nullable type system unless you're going to reduce NPEs. According to the current proposals, where casting is supported from Object? to Object and . is allowed for nullables, there will be no reduction in NPEs. So why all the extra complexity if there will be no gain (aside from primitive performance)?

The point about plugin compilers, or compiler switches, is that harsher NPE checking can be added in if, and only if, a development team needs that harsh level of checking.

That's nice in theory but it's already possible, even with Java (annotations and such can be used with static code analyzers to verify that code is NPE safe), but the reality is that anything which is not baked into the core and on by default will not be used in code in the wild (i.e. 99.9% of code).

I know I want to be able to write this and live with any NullErr.

And so you can if index is declared to return Int and not Int?. This is an API design and is quite orthogonal to this discussion.

JohnDG Thu 9 Oct 2008

Calling methods on nullable objects could be done like this:

I don't think any special syntax is necessary. . for non-nullable and ?. for nullable will ensure the vast majority of APIs are NPE safe, because developers simply won't want to type an extra character that makes the code look ugly.

Any case where . is used on a nullable instance or where a nullable instance is "casted" to a non-nullable instance is a case where the code cannot be proven to be correct. In other words, it's an NPE waiting to happen, with the right instance passed in, or with some change to the code at a higher level. If the code is provably correct, then the instance need not be nullable, and therefore should not be declared so.

After some trivial modifications to the Fan APIs, I don't see any reason why existing client code should not look almost exactly like it does today. And much client code would be dramatically simpler and provably more robust.

helium Thu 9 Oct 2008

I don't think any special syntax is necessary. . for non-nullable and ?. for nullable will ensure the vast majority of APIs are NPE safe, because developers simply won't want to type an extra character that makes the code look ugly.

I don't get it.

nullable?.method()

There is an extra character (a questionmark) and this is special syntax to call a method (the usul way is just to use a dot). I suggested to remove the questionmark.

The !. is optional if you still want some way in the language to perform unsafe possibly NPE throwing calls.

I suggest that this

nullable.method()

is a message-eating call that can't throw a NPE but returns null in case of nullable being null.

But I just notice a problem. Lets assume this class:

class Foo {
   Int method() { return 42 }
}

Now if I call method on a nullable Foo a message-eating call will return null but the method is guaranteed not to return null.

Foo? foo := null
bar := foo.method()   // with my syntax or foo?.method() with the current syntax

bar has type Int? not Int.

JohnDG Thu 9 Oct 2008

For ANY method returning a value, the following would hold:

Obj? obj := nullable?.method // Legal

Obj  obj := nullable?.method // Illegal, since if 'nullable' 
                             // is null, obj would be null

Obj? obj := nullable.method  // Illegal, since nullable might be null

Obj  obj := nullable.method  // Illegal, since nullable might be null

For any method returning a NULLABLE, the following would hold:

Obj? obj := nonNullable?.method // Legal, but pointless

Obj  obj := nonNullable?.method // Illegal, return value might be null

Obj? obj := nonNullable.method  // Legal

Obj  obj := nonNullable.method  // Illegal, return value might be null

For any method returning a NON-NULLABLE, the following would hold:

Obj? obj := nonNullable?.method // Legal, but pointless

Obj  obj := nonNullable?.method // Legal, but pointless

Obj? obj := nonNullable.method  // Legal, but pointless

Obj  obj := nonNullable.method  // Legal, best way

helium Thu 9 Oct 2008

The table for my suggestion would be

obj := optionalObject.method()

The type of obj is the nullable version of the return type (just the return type if already nullable).

obj := object.method()

The type of obj is the same as the return type.

And if you'd add the unsafe !. you'd additionally get

obj := optionalObject!.method()

The type of obj is the same as the return type.

alexlamsl Thu 9 Oct 2008

Str.index could return Int, in which case no casting would be necessary (with, for example, the convention that -1 means the string was not found, as per java.lang.String.indexOf), or if it returns Int?, then the proper way to write the code is as follows:

That looks really, really verbose :-)

I know of 2 things that needs to be allowed without any casting on the Type System, for certain:

Obj a;
Obj? b;

// Case 1
Obj c;
a = b?:c;

// Case 2
interface T {
  Obj m();
}

T? d;
b = d?.m;

So I think ?: would naturally serve as the "casting" operator from nullables to non-nullables; whereas ?. and ?-> do it in the opposite sense.

So brian's code would read:

Int i = str[(str.index(":")?:0)..-1]

Which looks much clearer and error-free to me. It's arguably better than the Java contract (of returning -1 if not found), since it is written there in front of your eyes.

katox Thu 9 Oct 2008

helium:

nullable?.method()

There is an extra character (a questionmark) and this is special syntax to call a method (the usul way is just to use a dot). I suggested to remove the questionmark.

That'd be inconsitent and confusing (specially if you played with nullable and non-nullable objects and switched their types back and forth). In Java (and whole C based family) . always used to mean run a method (or die trying). Now just by switching to a nullable you change that to best effort. That means something different and thus it deserves a different syntax.

So far the best solution seems to be to disallow . on nullables completely if the usage can't be proven to be correct. This would certainly reduce NPE caused by false expectations.

Str f() { return "hello" }
Str? g() { return "hello" }
Str? h() { return null }

Str x := f()
x.hash  // ok
x?.hash // ok, the same

Str? xN := f()
xN.hash  // ok
xN?.hash // ok, the same

Str? y := g()
y.hash  // illegal, error: y might be null
if (y != null) y.hash // ok, can be proven
y?.hash // ok, best effort

Str? z := h()
z.hash // illegal, error: z might be null
if (z != null) z.hash // ok, could warn about inaccessible code
z?.hash // ok, best effort -> always null in this case

It would not be that harsh. As you already noted you could use fallback syntax

... = str[str.index(":")?:0 .. -1]

which covers the intent quite clearly.

The same principle could be used for autocasting

if (x is Str) Str xs := x; // ok
if (y is Str || y is Num) Str ys := y // illegal

alexlamsl Thu 9 Oct 2008

Oh, another thing:

Obj a;
Int? b = a as Int;

In addition, the is and isnot operators can now be evaluated compile-time for non-nullables.

JohnDG Thu 9 Oct 2008

I forgot about the ?: operator. It's really a beautiful way to handle this case because it forces the programmer to make explicit their intention in a scenario which would otherwise lead to an NPE (or worse).

helium Thu 9 Oct 2008

That'd be inconsitent and confusing

How so? Where is the inconsistency?

brian Thu 9 Oct 2008

I think it is safe say that opinions are all over the map for what kind of rules nullable types should impose. I don't think we can please everything, so I do come back to what Stephen said about a pluggable compiler. Although I'm thinking more along the lines of a "strict" flag. Production pods could be compiled with the strict flag, and scripts would default to non-strict. I'm kind of thinking I want that for myself anyhow.

Str.index could return Int, in which case no casting would be necessary (with, for example, the convention that -1 means the string was not found

I do want to point out that the reason I think index should continue to return null is that negative numbers mean something with most APIs (for example str[-1] means last char). It seems better to really use null and fail fast or require static checking, versus getting some wrong answer. So this isn't a contrived example or something easily "fixed" with API changes.

Quick thought - NPE elimination is a worthy goal, but often too heavy for many people. Fan as a language encourages a little flexibility, really just the right amount.

Honestly this is where my head is at right now. BTW I actually place a huge amount of value on primitive performance and human nullable annotations. Although I do want to figure out how to have the compiler make use of that info also. Just to recount how we got here:

I implemented the "->" operator for dynmaic calls
"->" was awkward to use because it returned Obj, so I changed to allow auto-casting from Obj
we decided last summer to enhance auto-casting to include any downcast
it seem logical to follow thru same rules for nullable

I've never used a language with full-on nullable types. I know Nice has it, and Cobra over in the .NET world has it. But I get very worried about it becoming a checked-exception-like-nightmare. But I'm willing to try out various rules and see how painful they are actually use. Or even better maybe some of you guys can help hacking the compiler trying out various rules.

katox Thu 9 Oct 2008

Although I do want to figure out how to have the compiler make use of that info also.

This is the only way how get rid of NPEs and make developers' life easier instead of the opposite. Let the compiler do the dirty work. It is the same as with function signatures or variable names. If there is no help there shouldn't be restrictions. It could be thought of as refactoring on steroids.

JohnDG Thu 9 Oct 2008

I think it is safe say that opinions are all over the map for what kind of rules nullable types should impose.

Actually, I think katox, alexlamsl, and myself are all on the same page, and I expect Stephen will come around with his next post. :-)

So this isn't a contrived example or something easily "fixed" with API changes.

Fair enough, but others have suggested ?:, which I like a lot.

BTW I actually place a huge amount of value on primitive performance and human nullable annotations.

Performance reasons aside, nullable annotations are worthless unless they're used by the compiler to reduce NPEs. And nullable annotations are not required to achieve high performance (there are other approaches). So my feeling on this subject is that unless we actually achieve the promise of nullability, the overhead does not justify the cost.

But I get very worried about it becoming a checked-exception-like-nightmare.

Code has to throw exceptions, and is generally cleaner when it does so. Code doesn't have to use null values (by and large), and it's generally more fragile and harder to understand when it does. Thus the distinction.

But I'm willing to try out various rules and see how painful they are actually use.

Does this mean if someone else gets true nullable types into the compiler, you'll consider adopting it?

brian Fri 10 Oct 2008

And nullable annotations are not required to achieve high performance (there are other approaches)

I'm curious what your thoughts are there. Since Java doesn't support value types I embarked upon this project to make a distinction b/w Int and Int? so that we can compile down to primitives.

Does this mean if someone else gets true nullable types into the compiler, you'll consider adopting it?

My priority right now is to get the basics working so the nullable infrastructure is in place. That will include the really simple rules (like don't use null literal). Then immediately after that I'm going to work on moving to primitives. Then after that I'm going to start the interop work (which is really driving all of this).

So I'm not planning on prototyping various nullable checks. I really think it is more of an experiment to see what works (unless we can draw on some real-world language experience). But I'll definitely considering adopting additional rules if they make the language safer at an acceptable expense. I just don't want nullable rules which require writing a lot of boiler plate code just to make the compiler happy. So prototyping some rules, then seeing what they do to the current codebase seems like an interesting experiment.

helium Fri 10 Oct 2008

I've never used a language with full-on nullable types.

Haskell, O'Caml, SML, ... they are all there. Just download and play with them.

Actually, I think katox, alexlamsl, and myself are all on the same page

Well, acually I am, too. Just as you I'm not happy with the current meaning of . on nullables. So I suggested changing the meaning of . in the case of null (rendering ?. useless) and you suggested prohibiting . on nullables (and thus only allowing ?.). So the only little difference between the two suggestions is a very little bit of syntax (=> "do I have to type that extra ?"). And I'd be absolutly happy either way.

Actually you have to use ?. on optional values in my toy language, but that's because optional values are just a union type. An optional Int would be Int|Unit where Unit is the type of null (I unified Void and the type of null into just one type Unit to see how that works). And I have Int? as syntactic suggar for Int|Unit.

About your rules: You already defined that you disallow the cast from T? to T for all T, so a lot of your illigal cases just come from that rule and have nothing to do with . and ?..

katox Fri 10 Oct 2008

"do I have to type that extra ?"

You want to, actually.

class X {
  Int x = 0;

  Void sideEffect(Int newX) {
      x = newX
  }

  static X getX() {
     ...
  }

}
...
class Y {
    Void runMethod() {
        x := X.getX
        ...
        x.sideEffect
    }
}

Imagine that you want to change getX to return X? instead of X. With one syntax (just .) you would have to search for all usages of getX and look around how the output is used. If you missed the bottom line in runMethod in Y the code would compile just fine but the side effect won't happen if getX returned null.

With different syntax you had to only ensure that the method is called x?.sideEffect or if (x != null) x.sideEffect; else doSomethingElse; (or similar) but the compiler would force you to notice that - by compile error ;).

JohnDG Fri 10 Oct 2008

I'm curious what your thoughts are there. Since Java doesn't support value types I embarked upon this project to make a distinction b/w Int and Int? so that we can compile down to primitives.

Lower the max value of Int (and raise the min value) by exactly one integer. Then use max value + 1 to represent null. Similar changes for other primitives. All code uses primitives everywhere, except they are boxed when necessary (for example, when used as a key in a map).
Do whole-program static analysis. In all cases where it can be proven that an Int is not used for storing null, replace it with a primitive. Similarly for other primitives.
Same as (2), but only inside methods, and with run-time conversions to primitives where appropriate.

Perhaps others can suggest additional options.

But I'll definitely considering adopting additional rules if they make the language safer at an acceptable expense. I just don't want nullable rules which require writing a lot of boiler plate code just to make the compiler happy.

There are only one set of rules that are safe and consistent with existing operators. Far from arbitrary, they're pretty much the only sensible way to define nullable rules in the case that the goal is a reduction of NPEs. Which is why, I think, there are now 4 people in favor of them of the 6 people who have posted on this issue.

To summarize the rules again:

For ANY method returning a value, the following would hold:

Obj? obj := nullable?.method // Legal

Obj  obj := nullable?.method // Illegal, since if 'nullable' 
                             // is null, obj would be null

Obj? obj := nullable.method  // Illegal, since nullable might be null

Obj  obj := nullable.method  // Illegal, since nullable might be null

For any method returning a NULLABLE, the following would hold:

Obj? obj := nonNullable?.method // Legal, but pointless

Obj  obj := nonNullable?.method // Illegal, return value might be null

Obj? obj := nonNullable.method  // Legal

Obj  obj := nonNullable.method  // Illegal, return value might be null

For any method returning a NON-NULLABLE, the following would hold:

Obj? obj := nonNullable?.method // Legal, but pointless

Obj  obj := nonNullable?.method // Legal, but pointless

Obj? obj := nonNullable.method  // Legal, but pointless

Obj  obj := nonNullable.method  // Legal, best way

Conditional provability:

Obj  a
Obj? b
Obj  c

a = b?:c

// Equivalent to:
if (b != null) a = b
else a = c

if (b != null) {
   c = b.foo // Assuming foo returns non-nullable
}

Conditional casting:

Obj a
Int? b = a as Int

// Equivalent to:
if (a is Int) b = a
else b = null

Assignment to Null:

Obj a = null // Illegal

Obj? b = null // Legal

tompalmer Fri 10 Oct 2008

I greatly prefer NullErr to extra compiler checks. Well, I don't mind the compiler complaining for "always wrong" cases. But for "maybe wrong", I'd rather just be allowed to code it how I want.

JohnDG Fri 10 Oct 2008

But for "maybe wrong", I'd rather just be allowed to code it how I want.

Under either scenario, you can still code how you want. The difference is that with nullability rules, you use an extra ? when you want to write code that may or may not work. And the advantage is that you can code a whole method without worrying whether something is null or not, doing your checks only at the end (if necessary, and in some cases it wouldn't be necessary), because one null value can automatically ripple through lots of assignments and invocations.

brian Sat 11 Oct 2008

With different syntax you had to only ensure that the method is called x?.sideEffect or if (x != null) x.sideEffect; else doSomethingElse; (or similar) but the compiler would force you to notice that - by compile error ;).

Katox has a great example here of why the dot operator should have different semantics with nullable types. To me "." and "?." are two very different things. Remember that virtually all local variables use type inference.

My gut tells me that full-on static provability will be too much to swallow - but like I said, I just really want to see how various restrictions work against the current codebase. I would not be surprised either way - too painful or actually works out ok.

However I do use dynamic invoke, which by definition returns Obj? - so I know I'd find it annoying to wrap dynamic invokes in an condition.

I'm also worried about interop. Do we agree that Java and .NET APIs should be brought in the Fan type system as nullable? In which case would you really want to have a provable non-null condition for every single method which returned a reference? We can't ignore that problem.

jodastephen Sat 11 Oct 2008

Actually, I think katox, alexlamsl, and myself are all on the same page, and I expect Stephen will come around with his next post. :-)

Not really. I want the ability for the compiler to determine null safety to be present. But I do not want that to be the default or forced. Nor do I think that such forcing is in the style of Fan.

Most of the time, its enough for developers to just write code and rely on their own knowledge to work through issues like null checking. Fan isn't a terribly strict language, with casting, reflection and so on - thus light null checking fits too.

Anything that a compiler does that causes developers to have to write "boilerplate" code (code just to satisfy the compiler) is demotivating and frustrating to many if not most developers. Null-checks, of any kind, are often in this category, whether it be the if statements avoided by ?. or the weird ?: statements discussed above to convert nulls to non-nulls.

However, there are clearly some environments where it is useful to have a higher level of proof/safety in the code.

Bear in mind that part of the reason I requested this was to move knowledge out of comments and into the code - in my work we use javadoc to describe whether each parameter and return type is nullable or not. That is messy and I hate all comment based systems. The solution of marking types with ? performs the same role but does allow much more detailed analysis if desired.

One possibility rather than a command line switch or pluggable compiler is an annotation. This could annotate a class or method to say whether nulls should be fully checked or not: @NullChecks / @NoNullChecks.

JohnDG Sat 11 Oct 2008

Anything that a compiler does that causes developers to have to write "boilerplate" code (code just to satisfy the compiler) is demotivating and frustrating to many if not most developers.

It's not boilerplate, though, because in any case where you would have to use a ? operator, the value may in fact be null. That's the whole point of strong nullable rules. While a programmer may assert that she "knows it won't be null", if that were in fact the case, it would be possible to refactor the code in such a way that the ? operator were not required.

Lots of languages have such rules, or further still, no concept of null (Haskell). These languages have a reputation for being easier to program in, with less boilerplate than other languages, and the code tends to have much higher reliability.

Null-checks, of any kind, are often in this category, whether it be the if statements avoided by ?. or the weird ?: statements discussed above to convert nulls to non-nulls.

Have you seen code written by the average Java programmer? It's littered with an abundance of if statements testing for null. Why? Because every time an NPE bug is found, the programmer adds one -- and often not one, but lots of extras for "good measure". As a result, null checks are everywhere, even where they shouldn't be, which leads to even more null checks -- because looking at the code, it seems that some variable can sometimes be null (because there's a null check), so you have to propogate the checks wherever that variable is passed to other code.

It's a nightmare. Tons of confusing and contradicting boilerplate code that does not need to exist at all.

Strong nullability rules mean you do not need to check for null in the vast majority of cases. They simply eliminate the need. Moreover, as proposed, the rules also let you code without caring if something is null. In many methods, if a parameter is null, you skip the code that interacts with that parameter. But these rules allow you to code as if the parameter were not null, because the conditional invocation operator ?. will not invoke a method on a null object.

You don't want boilerplate. Well, take a look at source code written by the average Joe. You will not see more boilerplate than for null checking. The situation is atrocious. Even managers know what null pointer exceptions are -- they are that common. Fan can promise the power to scrap that boilerplate, streamline code bases, and eliminate NPEs once and for all. Or it can promise more of the same.

Honestly, the current proposal serves only to enable high-performance primitives. If I declare a "non-nullable" instance, I have no guarantees it will be non-null. More strongly, there is absolutely no relation to an instance's declared type and whether or not it can be null. So why bother declaring stuff nullable or non-nullable? It's a waste of my time and doesn't buy me anything that can't be bought with other tactics (e.g. "magic primitive" chosen beyond the declared min/max to represent null).

JohnDG Sat 11 Oct 2008

Katox has a great example here of why the dot operator should have different semantics with nullable types. To me "." and "?." are two very different things. Remember that virtually all local variables use type inference

Actually, his example shows that in the presence of strong nullable rules, you cannot use . with nullable types without nasty side-effects. There are no issues with ?., and indeed, the conditional invocation operator nicely propagates null and therefore reduces the amount of special case code needed to deal with null.

brian Sat 11 Oct 2008

Let me summarize the debate as I see it. Everybody is in agreement that nullable types are a good thing. No one seems to be debating that point, which is good because the work is almost done.

The real debate is how far the static type checking should take nullable types. The ideal outcome is fully provable nullable type checking, which would 100% eliminate null pointer exceptions. We know we can't achieve 100% perfection and still keep the dynamic features of Fan like reflection/trap. But there are varying levels of static checking which can be done.

I'm not ready to commit to full nullable type checking/provability unless I know that it doesn't require boiler plate. That isn't a theoretic debate we can have. As Helium pointed out many esoteric languages have it, but those are mostly functional languages very unlike Fan. So to me, the only way to see the effect is to implement various rules and see how the codebase reacts. If full provability works and really doesn't require a lot of boiler plate code, how could we argue against that? I suspect the reality will be somewhere in the middle and it will come down to a trade-off b/w elegant code and NPE elimination.

I'm also interested in the Java interop angle. No one has really answered my question - do Java libraries come in as nullable types? If so and people are using Fan as a better Java, I don't see how full nullable type checking could possibly work.

JohnDG Sat 11 Oct 2008

Everybody is in agreement that nullable types are a good thing.

I personally see no benefit if non-nullable types can be null with impunity.

I'm also interested in the Java interop angle. No one has really answered my question - do Java libraries come in as nullable types?

No. You do bytecode analysis to determine nullability. This is not academic but has been shown to be both possible and straightforward.

Fan has so much meta information that I think you're going to have to do some analysis and computation and cache the results in order to make Java usable from Fan.

jodastephen Sat 11 Oct 2008

Honestly, the current proposal serves only to enable high-performance primitives. If I declare a "non-nullable" instance, I have no guarantees it will be non-null.

Is that really so? My understanding is that a non-null variable can never hold null, as a NullErr will be thrown if you try and do so:

Str? a := null
Str b := a   // NullErr

do Java libraries come in as nullable types?

Yes. But only until we know better.

By that, I mean we should pursue a strategy of bytecode analysis (JohnDG) or manual inspection (just read the OpenJDK) to build up a meta-data map of Java method to nullability.

JohnDG Sat 11 Oct 2008

Is that really so? My understanding is that a non-null variable can never hold null, as a NullErr will be thrown if you try and do so.

That's no benefit at all. Wait a few lines of code, and a NullErr will be thrown anyway when someone tries to use b in a way that assumes it is non-null. In both cases, NullErr is thrown. And in both cases, the check is runtime and defects will be detected long after injection.

helium Sat 11 Oct 2008

That's no benefit at all. Wait a few lines of code, and a NullErr will be thrown anyway when someone tries to use b in a way that assumes it is non-null. In both cases, NullErr is thrown. And in both cases, the check is runtime and defects will be detected long after injection.

See my other post: b cannot be null. It's decalred non-nullable. The assignet itself will throw not the later usage.

JohnDG Sat 11 Oct 2008

See my other post: b cannot be null. It's decalred non-nullable. The assignet itself will throw not the later usage.

No, I mean that without nullable annotations, if you wait a few lines of code, then NullErr will be thrown anyway. In other words, unenforced nullable annotations do not add value by themselves, at least with respect to reducing NPEs.

helium Sat 11 Oct 2008

Ok, I seem to have missed something. I thought everybody agrees on adding the distinction between nullable and non-nullable variables and we only discuss the semantics (assured by the compiler or by additional compiler-generated runtime checks).

brian Sat 11 Oct 2008

By that, I mean we should pursue a strategy of bytecode analysis (JohnDG) or manual inspection (just read the OpenJDK) to build up a meta-data map of Java method to nullability.

This would be a pretty interesting project. If we could analyze Java and .NET code for nullability (or someone created manual maps for popular APIs), then we could bring those APIs into Fan with the correct nullability. I would think this might be useful to a lot of the alternate JVM languages.

alexlamsl Sat 11 Oct 2008

The former seems to be a much more attractive option - especially if it is bullet-proof and the result is cached. May be like Java stubs from pods we can get Fan stubs from JARs ;-)

I have to admit that without those rudimentary checks, non-nullable types would feel more like const in C/C++ - not that we don't want to use it, but it is so confusing that I would just put ? everywhere to keep the compiler happy...

(Seems like it's not far away before I'm tempted to write Fan scripts calling C libraries through JNA...)

Fantom

#369 Primitives and Nullable Types

brian Mon 6 Oct 2008

Case for Primitives

Primitive Types

Nullable Types

Implementation

Conversion Rules

alexlamsl Mon 6 Oct 2008

jodastephen Tue 7 Oct 2008

alexlamsl Tue 7 Oct 2008

tompalmer Tue 7 Oct 2008

brian Tue 7 Oct 2008

brian Tue 7 Oct 2008

katox Tue 7 Oct 2008

alexlamsl Tue 7 Oct 2008

katox Tue 7 Oct 2008

tompalmer Tue 7 Oct 2008

brian Tue 7 Oct 2008

jodastephen Wed 8 Oct 2008

tompalmer Wed 8 Oct 2008

brian Wed 8 Oct 2008

JohnDG Wed 8 Oct 2008

alexlamsl Thu 9 Oct 2008

brian Thu 9 Oct 2008

katox Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

brian Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

katox Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

jodastephen Thu 9 Oct 2008

helium Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

helium Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

helium Thu 9 Oct 2008

alexlamsl Thu 9 Oct 2008

katox Thu 9 Oct 2008

alexlamsl Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

helium Thu 9 Oct 2008

brian Thu 9 Oct 2008

katox Thu 9 Oct 2008

JohnDG Thu 9 Oct 2008

brian Fri 10 Oct 2008

helium Fri 10 Oct 2008

katox Fri 10 Oct 2008

JohnDG Fri 10 Oct 2008

tompalmer Fri 10 Oct 2008

JohnDG Fri 10 Oct 2008

brian Sat 11 Oct 2008

jodastephen Sat 11 Oct 2008

JohnDG Sat 11 Oct 2008

JohnDG Sat 11 Oct 2008

brian Sat 11 Oct 2008

JohnDG Sat 11 Oct 2008

jodastephen Sat 11 Oct 2008

JohnDG Sat 11 Oct 2008

helium Sat 11 Oct 2008

JohnDG Sat 11 Oct 2008

helium Sat 11 Oct 2008

brian Sat 11 Oct 2008

alexlamsl Sat 11 Oct 2008