#353 Not null - compiler change

jodastephen Sat 30 Aug 2008

I recently mentioned that I was working on a compiler change. That change was to try out a form of not null in Fan. In a couple of sessions (one at the airport ;-) I successfully changed the language, and the code worked almost first time. This is what I can now support:

Void anyMethod(Str param1, Int! param2) {
  // param1 can be null
  // param2 cannot be null
}
// syntax sugar for
Void anyMethod(Str param1, Int param2) {
  if (param2 === null) {
    throw NullErr("Parameter 'param2' must not be null")
  }
}

As can be seen, this is an incredibly simple change (and a final language change would be more complex). Basically, if you add the ! symbol, then the parameter is checked to be non-null. This ensures that code within the method does not have to worry about whether the parameter is null or not. Note that this does not change the calling code, so you can still pass null to the method. It also doesn't prevent the method from re-assigning the variable to be null (which may or may not be a good thing).
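For readers who want to try the effect of the inserted check outside Fan, here is a minimal Python sketch of the same desugaring. It is only an analogy: None stands in for null, ValueError stands in for NullErr, and the function names are invented for illustration.

```python
def any_method(param1, param2):
    # Hand-written equivalent of the check the compiler would insert for `Int! param2`.
    if param2 is None:
        raise ValueError("Parameter 'param2' must not be null")
    # The body can rely on param2 having been non-null at method entry.
    return param2 + 1


def try_call(param1, param2):
    # Callers are not checked up front; a null argument only fails when the call runs.
    try:
        return any_method(param1, param2)
    except ValueError as err:
        return str(err)
```

Note that param1 may still be null, and nothing stops the body from re-assigning param2 to null after the check, which matches the limits described above.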

The real question is how far would it make sense to extend this feature.

Clearly, all variables (fields, locals, parameters) make sense as places to declare non-null. Fields could add code to the setter to block nulls. Locals and parameters require alternate solutions. Return types could also be documentable as non null, but I see less advantage to doing this at the moment.

The full alternative to my approach is to make the non-null formally part of the type system. I've tried to avoid that, as it complicates the type system, and forces casting (or auto-casting) from nullable to non-null types.

So, my general approach has been to try and find a way to protect against nulls, without causing the cascading effects of full null protection (where one change can affect the whole application, a la checked exceptions).

It should be noted that the benefits go beyond avoiding NullErr. Done properly, this could allow primitive long and double to be used, enhancing performance. And it definitely raises something which should at least be documented to be a proper language feature.

I would post a patch, but it's a little tricky without a public VCS ;-)

brian Sun 31 Aug 2008

This is pretty interesting. At first glance Int! looks like we are trying to add the notion of null into the type system - that is something that scares me.

But what you are doing here is much less ambitious - in essence it is a bit of design-by-contract or maybe better described as compiler sugar to insert the checking you would do manually. That sort of feature I could really dig. And I would say adding it to return would be very handy, because the compiler could trap all returns with a check to fail fast versus waiting for the error to propagate into other sections of code.

My initial thoughts immediately leap towards using facets instead of !. And if using facets, why not make it more general purpose and pluggable? In a way I almost think about this as parameter based decorators. Annotating a parameter with a function facet runs that function on the parameter on method entry, field on set, etc.

Void anyMethod(Str param1, @nonnull Int param2) { ... }
@nonnull Str field
// syntax sugar for
Void anyMethod(Str param1, @nonnull Int param2) { nonnull(param2); ... }
Str field { set { nonnull(val); @field = val } }

I have no idea how the @nonnull facet gets mapped to a function (might be a good reason to switch to typed facets), but something like that has been brewing in my mind.
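One way such a mapping could work, sketched in Python rather than Fan: a check function is attached to a named parameter and run on method entry. The names check_params and nonnull are hypothetical, and ValueError stands in for NullErr; this says nothing about how Fan facets would actually be bound to functions.

```python
import functools
import inspect


def nonnull(value):
    # Hypothetical check function that a @nonnull facet might map to.
    if value is None:
        raise ValueError("value must not be null")
    return value


def check_params(**checks):
    # Hypothetical stand-in for parameter facets: maps parameter names to check functions.
    def wrap(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Run each check on its parameter at method entry.
            for name, check in checks.items():
                check(bound.arguments[name])
            return func(*args, **kwargs)

        return wrapper

    return wrap


@check_params(param2=nonnull)
def any_method(param1, param2):
    return (param1, param2)
```

Because the check is an ordinary function, the same mechanism could carry other checks besides null, which is the pluggable part of the idea.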

katox Sun 31 Aug 2008

Stephen, I believe that exactly the same idea is covered in detail by this OOPSLA ’03 paper.

Brian: note that the example (from the paper) is still problematic in Fan

class A {
    @nonnull
    Str a := "aStr"

    new make() {
        type.log.info("makeA")
        ctorDisaster
    }

    virtual Void ctorDisaster() { 
        type.log.info("$a")
    }
}
class B : A {
    @nonnull
    Str b := "bStr"

    new make() : super() {
        type.log.info("makeB")
    }

    override Void ctorDisaster() {
        type.log.info("$a and $b"); // yields "aStr and null" in make()
    }
}
class Main
{
  static Void main() { B() }
}

However, this example is sick not only from a null/nonnull point of view. It allows a programmer to run a method on a B instance before B's constructor is entered. This can lead to nasty hidden bugs.
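The same trap can be reproduced in other languages. Below is a rough Python analogue, offered as a sketch only: the class-level `b = None` stands in for an uninitialized field, a list replaces the logger, and the initialization order of real Fan constructors may differ in detail.

```python
class A:
    def __init__(self):
        self.a = "aStr"
        self.log = []
        self.log.append("makeA")
        # Calls the most-derived override, even though B's constructor body has not run yet.
        self.ctor_disaster()

    def ctor_disaster(self):
        self.log.append(self.a)


class B(A):
    b = None  # class-level default, standing in for a field that is not initialized yet

    def __init__(self):
        super().__init__()  # runs A's constructor first
        self.b = "bStr"
        self.log.append("makeB")

    def ctor_disaster(self):
        # Observes b before B's constructor has assigned it.
        self.log.append(f"{self.a} and {self.b}")


obj = B()
```

After construction obj.b holds "bStr", yet the override already ran once and saw the null value, so a field declared non-null was visible as null.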

The authors of the paper solved the construction problem by introducing raw (uninitialized) types but it seems that it only circumvents the ill-designed constructor pattern in Java/C#.

I agree the general idea is quite good in that it is not too ambitious. It is non-intrusive and can help to identify some stupid bugs in code (see section 8.1).

I don't see using facets as a good idea though. null is already a language construct and Fan behaves differently on == and equals methods. It is rather weird that some fundamental static type and code analyses would depend on a facet. I'd make it a proper language construct - it's hammered too deep into Fan design.

On the other hand I can imagine a system of contract facets like @Positive for Int, Float, @NonEmpty for Str etc. But this could be something above the basic language.

jodastephen Sun 31 Aug 2008

The first question is why !. This is simply because null is a language level feature. Everything else (integer range, not empty, allowed string characters etc.) is just a feature of the state of an object.

In addition, aesthetically, this will occur very often, and an @nonnull very much gets in the way when visually reading the code.

I'm also happy to see the ! on the return type.

The facet mapping to a function is clever, and would be useful, but doesn't subvert the need for not-null being specially handled. Further, I think that the way to handle object state (like not empty or positive) is lightweight object wrappers rather than separated meta-data (UserName decorates Str, Age decorates Int), i.e. a proper OO solution.

brian Mon 1 Sep 2008

The facet mapping to a function is clever, and would be useful, but doesn't subvert the need for not-null being specially handled.

I'm not sure I agree that because null is a language feature we need to make a special case for checking it. This proposal is not to extend the type system, but rather to annotate types - and annotation seems like a job for facets. Although I agree that using something like @nonnull would be extremely verbose and annoying. Maybe @nn instead? With a space that is 4 chars versus 1 with !? I like the !, but at the same time I don't like creating a special feature if a general purpose feature would do the job - so I'm on the fence.

jodastephen Mon 1 Sep 2008

I think the second reason for using ! is the potential to use it as part of the type, to enable primitives.

We all know that boxed object maths is slower than primitive maths. Allowing ! on Int and Double allows primitives to be used underneath the covers. This would have a marked effect on certain performance critical pieces of code.

For method signatures, it might make sense to overload the method in the bytecode - one version as objects only in the signature, which checks and delegates to the version including primitives. Within a method, a local non-null variable can of course easily be converted to a primitive.

Now, there is some talk of enhancing the JVM to allow for fast object based numbers. So, maybe this argument is moot. Brian, perhaps you'll find out more if you're still going to the JVM language summit?

BTW, Fan already recognises that null is special. It has the ?., ?-> and ?: operators. The ! is merely a similar extension.

Of course for consistency, there is an argument for making Int? mean "may be null" and Int mean "non-null"...

brian Tue 2 Sep 2008

Allowing ! on Int and Double allows primitives to be used underneath the covers.

Not sure I understand this - number operators today will throw NullErr if used with nulls (although it has never actually happened to me while writing code). So I don't think we need a special syntax to tag those APIs to do optimization. We could do escape analysis and inlining today. Yes I'm still going to the JVM Summit, so we'll see what the HotSpot guys are thinking (beyond what has been posted today).

Of course for consistency, there is an argument for making Int? mean "may be null" and Int mean "non-null"

From a purity standpoint, I agree with you. Although from a practical standpoint, I think we need to make null the default, especially if Int! means auto-generate some non-standard code.

katox Tue 2 Sep 2008

Although from a practical standpoint, I think we need to make null the default

Why? Is the distribution of null/non-null in Fan sources very different from a Java program?

Which semantic is used more is certainly the topic of many Internet flame wars but fortunately there has been at least one reasonable attempt to provide some real numbers.

Unfortunately, it has been our experience in specifying moderately large code bases that the use of non-null annotations is more labor intensive than it should be. Motivated by this experience, we conducted an empirical study of 5 open source projects (including the Eclipse 3.3 JDT Core) totaling over 700 KLOC.

...

Hence the study results clearly support the hypothesis that in Java code, over 2/3 of declarations that are of reference types are meant to be non-null—in fact, it is closer to 3/4.

jodastephen Thu 4 Sep 2008

So I don't think we need a special syntax to tag those APIs to do optimization.

The ! allows optimisation between methods, not just within a method.

I know that of all the blogs I wrote about possible language changes in Java, the concept of better handling for null was pretty universally popular.

I'd suggest a thought experiment as the next step. Anyone who codes in Fan should consider whether they mean a variable to allow null or not, and imagine adding the ! symbol. My gut feel is that it is useful in all types of code - system and business level.

(I certainly know our coding standard asks every developer to document the null/nonNull status of every parameter and return value in javadoc in our entire system. That shouldn't really be in javadoc, but annotations are just too heavyweight and ugly.)

alexlamsl Fri 5 Sep 2008

Allow me to be devil's advocate.

I think the enforcement of a non-null contract can actually be harmful. null in object space is like 0 in number space - a common crossing point that all kinds of Types share.

Imagine the pain of driving around every single case of non-null when doing quick prototyping - it's like trying to do some I/O in Java, where the meat is just 10 lines, but all the try-blocks with useless (or even wrong) catch blocks just make the whole thing look like a final-year computer science project (!) already.

And to discriminate against null over other invalid parameter values is just unfair. Just because methods that do not throw IllegalArgumentException yet will give NullErr by default does not mean null is more wrong than other values - worse yet, you get unspecified behaviour and wrong answers which continue to propagate through the rest of the program.

Don't get me wrong, I don't like "stupid" NullErrs either. So I do like a?.b and other facilities that do not hinder me from writing potentially useful code just because the contract I'm overriding did not think 0 was meaningful enough to be included, yet would love to see NaN rip the rest of the system apart.

Just my 2 cents :P

jodastephen Fri 5 Sep 2008

I think you've misunderstood the current proposal. The ! is optional. You can still write code exactly as you do today, and experience NullErr in exactly the same way. No effect on prototyping. No additional pain.

This change benefits the point at which you want to be clear about your contracts and document the contract to the caller. Even then, it still doesn't block the caller from erroneously calling your method with a null value.

The difference between null and other IllegalArgErr type issues is the difference between a language level feature (null) and an API level feature (the valid values or state of an object). This also manifests itself in the complexity of the check - the compiler can write a null check, but developers would need to write plugin IllegalArg checks.

alexlamsl Fri 5 Sep 2008

From a point of view of someone writing programs or designing an API, it does not matter whether a wrong input is a language level "feature" or an API level "feature" - they are wrong, and they (may) break things.

Yes, the proposal at the moment inserts and throws NullErr rather than forcing me to bind to a contract, it seems. Yet it isn't - if I tried to call a method which the original implementor did not see null as useful for a particular parameter and decided to just add an extra ! character to it, I will suddenly need to think really hard to try and get round that hole.

If you really mean it, i.e. if this parameter is null then the problem will crop up somewhere else some moment later with completely irrelevant stack trace, then I am sure I will type in the full null-check code willingly.

And from experience, I have been living really happily so far without typing that boilerplate code too many times myself.

katox Sat 6 Sep 2008

alexlamsl: Let's suppose we have not null Int by default and Int? as optional Int + null.

null in object space is like 0 in number space - a common crossing point that all kinds of Types share.

No, it isn't. You can do almost any operation with zero (ok, except division and other undefined cases like log etc.). On the other hand Int x := null; Int y := 1; y += x gives you NullErr. If you consider Int an object, not a primitive, then it is clear that 0 is not null because you have both. You would need a Null Object to get that. Convenience operators like Groovy's ?. just alleviate the pain of handling an invalid reference, i.e. null, which is more similar to NaN.
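The difference is easy to demonstrate outside Fan. The Python sketch below is an analogy only: Python raises TypeError for int + None where Fan would raise NullErr, and NullInt is a hypothetical Null Object invented for illustration.

```python
def add_to(y, x):
    # Plain addition: works for any number, including zero.
    return y + x


def describe(y, x):
    try:
        return add_to(y, x)
    except TypeError:
        # Python's reaction to int + None, analogous to NullErr in Fan.
        return "error"


class NullInt:
    # Hypothetical Null Object: acts as an additive identity instead of failing.
    def __radd__(self, other):
        return other
```

Zero takes part in the arithmetic, null aborts it, and only an explicit Null Object makes "nothing" behave like a neutral value.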

Imagine the pain of driving around every single case of non-null when doing quick prototyping

I see no pain. You can use Int? whenever you like if you don't care about null. I don't consider writing a single extra character a pain.

Moreover Int? is a supertype of Int, so what if

  • some API method supported the nullable Int? type? Then you can pass an Int or an Int? right into it,
  • some API method supported just the non-null Int type? Then, obviously, you can't just pass an Int? into it, which is a Good Thing. The API clearly states that the original author didn't think of accepting null as an argument of that method. You can either
    • ensure that you pass Int instead (that needs some work but it will work with no changes to the method itself)
    • relax the original API to accept Int? -- but in this case you must also ensure that the method won't fail on null input (that may be non-trivial).

And to discriminate against null over other invalid parameter values is just unfair. Just because methods that do not throw IllegalArgumentException yet will give NullErr.

An exchange of IllegalArgumentException for NullErr is no win. Stephen's proposal (with or without not null by default) would give you a compile time error.

worse yet, you get unspecified behaviour and wrong answers which continue to propagate through the rest of the program.

Of course. But it is pure ignorance to pass some default value to an API which stated it really needs something reasonable. You get what you deserve then.

Yes, the proposal at the moment inserts and throws NullErr rather than forcing me to bind to a contract, it seems. Yet it isn't - if I tried to call a method which the original implementor did not see null as useful for a particular parameter and decided to just add an extra ! character to it, I will suddenly need to think really hard to try and get round that hole.

Very true. But you can be sure that null works in such a case instead of taking chances.

On the other hand this also gives you an opportunity to support your laziness if you are the implementor. You just mark in the signature that you didn't bother to think of that more complex case or that you didn't bother to handle a nonsensical input.

Not null by default would also prevent you from writing unnecessary defensive code like this

Int getMyInt () {
   return 2;
}

x = getMyInt()
if (x != null) {
   // but x is already never null
   y += x
}

instead of simply

x = getMyInt()
y += x

You have to check the documentation (which could be missing or wrong) or the implementation code (which could change later) -- in either case you would probably leave the if in place, cluttering your calling code. Or you just don't check -- maybe it's fine when prototyping but what if you wanted to put the snippet into production code? Review everything or take chances again?

If you really mean it, i.e. if this parameter is null then the problem will crop up somewhere else some moment later with completely irrelevant stack trace

That could be tricky and can pop up at the most unwanted time. Why bother with that?

And more, if you want to be more confident about your code when you are shifting towards production, you would probably add more and more annotations (@nonnull) which make everything much less readable and you'd type yourself to death because it's likely you would do that for more than half of your APIs.

And it is probably not the not-null-by-default case that forces you to put a lot of work into it -- it is not null added later or always checked at runtime.

This proposal is nothing like C++ references which can't be reassigned and where you have to cast yourself to death...

brian Sat 6 Sep 2008

I'm pretty staunchly opposed to making nullable a true part of the type system. But I really like how Stephen's proposal attempts to provide a decent solution without extra type system complexity. Specifically his proposal does two important things - it auto-generates the null check code to fail fast and it clearly documents the developer intent in a way that is machine processable.

Honestly even though I like the proposal, I'm still a bit on the fence. The proposed solution seems a little too specific, and not general purpose enough. So I've been trying to think how this feature can dovetail into my desire for decorator functions. The more thought I've given to it, the more I like the approach I wrote up above. Let's assume that @nonnull is a decorator facet somehow mapped to the function &NullErr.check(Obj) which throws an err if its argument is null. I can apply this facet to any slot or method parameter:

  • If I annotate a method parameter, then the function is applied to the parameter upon method entry.
  • If I apply the facet to the entire method, then the function wraps the return of the method (just like Python decorators).
  • If I apply the facet to a field, then the function is applied to the value before set

Trapping fields might warrant some more options like setter veto, or post set traps. But as a general purpose mechanism this could solve null error checking, field change handlers, and enable an aspect oriented programming style when appropriate.
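To make the method-level and field-level cases from the list above concrete, here is a hedged Python sketch. It is not Fan and not a proposal for the implementation: null_check stands in for NullErr.check(Obj), ValueError for NullErr, and on_return, CheckedField and Account are hypothetical names. The parameter case would run the same function on method entry.

```python
import functools


def null_check(value):
    # Hypothetical stand-in for NullErr.check(Obj).
    if value is None:
        raise ValueError("null not allowed")
    return value


def on_return(check):
    # Facet on a whole method: the check wraps the returned value.
    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return check(func(*args, **kwargs))
        return wrapper
    return wrap


class CheckedField:
    # Facet on a field: the check runs on the value before it is stored.
    def __init__(self, check):
        self.check = check

    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.slot, None)

    def __set__(self, obj, value):
        setattr(obj, self.slot, self.check(value))


class Account:
    owner = CheckedField(null_check)

    def __init__(self, owner):
        self.owner = owner

    @on_return(null_check)
    def lookup(self, table, key):
        return table.get(key)
```

A rejected set leaves the old field value in place, which is roughly what a "setter veto" option would look like; post-set traps would need a second hook after the store.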

I agree that @nonnull is probably so common that it warrants a bit of syntax sugar, so I'd be up for saying that:

Int! foo()   =>  @nonnull Int foo()
Void foo(Int! x) => Void foo(@nonnull Int x)

But under the covers it would just be sugar for the general purpose @nonnull facet. Although I don't like the postfix notation - I'd like to come up with a prefix notation since it is sugar for a prefixed facet. I think it always makes sense to have the compiler aware of that facet to double check that you don't pass the null literal to a @nonnull method - but I'm not favorable to further compile time checking.

So I'm proposing a more general purpose facet decorator feature, of which @nonnull and some syntax sugar would be included. However, I'm also watering down the original proposal a bit, because this feature wouldn't apply to local variables, only slots and parameters. Although remember that in Fan 95% of your locals use type inference, so you don't actually have a true declaration.

If we like my proposal, then the next step is to figure out how @nonnull is declared to be a decorator that calls NullErr.check - this might be the reason that convinces me to switch facets to true type declarations.

jodastephen Sun 7 Sep 2008

I think that with the syntax sugar concept this could be a winner. It provides a simple to understand way to add true "facets" to code, while the syntax sugar for non-null provides for the essential common case.

(For the record, I believe non-null by default is the better approach in general for large systems, however I would say that optional non-null probably suits the simple type system of Fan better)

BTW, I don't see what you've described as a decorator, because the code implementing the facet doesn't call the original code, it is called on demand before or after a method or slot set. A true decorator of a method would call an invoke() or similar to invoke the wrapped code:

aDecoratorFacet {
  // do something before
  result := invoke()
  // do something after
}

A true decorator would also face the non-local return problem, because the invoked code might throw an exception or contain a return. Thus, if you want true decorators, I suspect this debate links to the non-local return one. However, simple "invoke at specific point" functions are much simpler to implement, so maybe we should start with those.
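The distinction can be sketched in Python, with the caveat that this is an illustration rather than Fan: true_decorator, entry_hook, reject_none and the trace list are all hypothetical names. In the first form the wrapper itself invokes the original code and has to cope with it throwing; in the second, the hook is just called at a fixed point and never sees the wrapped code.

```python
import functools

trace = []


def true_decorator(func):
    # Wraps the original code: the wrapper decides when to invoke it.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace.append("before")
        try:
            return func(*args, **kwargs)  # plays the role of invoke()
        finally:
            # Runs even if the wrapped code raises.
            trace.append("after")
    return wrapper


def entry_hook(check):
    # "Invoke at a specific point": the hook runs on entry and never calls the original code.
    def wrap(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            check(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper
    return wrap


def reject_none(*args, **kwargs):
    if any(arg is None for arg in args):
        raise ValueError("null argument")


@true_decorator
def divide(a, b):
    return a / b


@entry_hook(reject_none)
def double(x):
    return x * 2
```

The try/finally in the first form is the extra machinery a true decorator needs; the hook form has none, which is why it is the simpler starting point.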

On the non-null side, I agree that non-null should block passing in null, and agree with avoiding any further compile-time checking. Also, slots that are defined as non-null should be checked at the end of the construction process (of course we don't have a way to do this right now...). There may be a market in the future for additional tools (like FindBugs) that could use the non-null info to try and avoid any NPEs by deducing unchecked nulls passed in, but that doesn't need to be part of the Fan compiler. (Although that might be an argument for plugins to the Fan compiler)
