#604 Operator overloading with multiple types

brian Fri 22 May 2009

Original discussion in #601.

Today operator overloading is done by creating a shortcut method, for example:

a + b  =>  a.plus(b)

This only allows a type to implement a given operator against one type. This has proved quite awkward. For example:

class DateTime
{
  Duration minus(DateTime time)
  DateTime plus(Duration duration)
}

That requires shenanigans like this to subtract a duration from a date time:

DateTime.now + (-1hr)

The proposed fix is to allow a shortcut operator to bind against multiple potential methods using a naming convention. Given expr a typed as A and expr b typed as B, the compiler will translate a+b according to these rules:

  1. If A.plus exists and fits B, then it is used
  2. If A.plusB exists and fits B, then it is used
  3. If A.plusX exists where X is a base class of B, then it is used
  4. If A.plusM exists where M is a mixin of B, then it is used

In all cases the naming convention uses the unqualified type name of B and its super types.
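As a rough illustration, the four rules can be modeled outside the compiler. The sketch below is Python rather than Fantom, and `TypeInfo`, `resolve`, and the "nearest base class first" ordering are all assumptions made for the example, not part of the proposal.

```python
class TypeInfo:
    """Hypothetical stand-in for the compiler's view of a type."""

    def __init__(self, name, base=None, mixins=()):
        self.name = name
        self.base = base
        self.mixins = tuple(mixins)

    def base_chain(self):
        """Yield base classes, nearest first."""
        t = self.base
        while t is not None:
            yield t
            t = t.base

    def all_mixins(self):
        """Yield mixins declared on this type or on any base class."""
        for t in (self, *self.base_chain()):
            yield from t.mixins

    def fits(self, param):
        """True if a value of this type may be passed where `param` is expected."""
        return (param is self
                or param in self.base_chain()
                or param in self.all_mixins())


def resolve(op, methods, b):
    """Pick the method on A for `a <op> b`.

    `methods` maps each method name on A to its parameter TypeInfo.
    Returns the chosen method name, or None if no rule applies.
    """
    candidates = [op]                                      # rule 1
    candidates.append(op + b.name)                         # rule 2
    candidates += [op + x.name for x in b.base_chain()]    # rule 3
    candidates += [op + m.name for m in b.all_mixins()]    # rule 4
    for name in candidates:
        param = methods.get(name)
        if param is not None and b.fits(param):
            return name
    return None


obj = TypeInfo("Obj")
date_time = TypeInfo("DateTime", base=obj)
duration = TypeInfo("Duration", base=obj)
methods = {"minus": date_time, "minusDuration": duration}

print(resolve("minus", methods, date_time))  # minus
print(resolve("minus", methods, duration))   # minusDuration
```

With a hypothetical minusDuration added to DateTime, `DateTime.now - 1hr` would fall through rule 1 (minus takes a DateTime) and bind under rule 2.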

brian Fri 22 May 2009

Promoted to ticket #604 and assigned to brian

qualidafial Fri 22 May 2009

+1.

Let's have rule 3 also state that subclasses take precedence over superclasses. e.g. if F extends E which in turn extends D, and A declares both plusD and plusE, then a + f resolves to a.plusE(f). I presume you meant this to be the case, but the documents should spell this out fully.

Rule 4 should show a compiler error if A.plusM is satisfied more than once, e.g. A declares plusP and plusQ and B mixes in both P and Q. In this case B must be cast to P or Q to compile, e.g. a + b becomes a + (P) b.

KevinKelley Fri 22 May 2009

This sounds interesting, and like it might work out. If we can have something that works right, I'm way in favor of it.

brian Fri 22 May 2009

@qualidafial: yes, I will basically use the same rules as Java uses to resolve the best fit, and report an ambiguity error if there are multiple fits (same code we already use in Java FFI for resolving to an overloaded method).

JohnDG Fri 22 May 2009

This looks fine to me, but an alternative that just popped into my mind bears mentioning: introducing a form of multiple dispatch.

Suppose that methods truly are unique, that is, there is only one A#plus, but suppose you can declare more than one with the same number of arguments:

mdispatch A plus(A a) {
    ...
}

mdispatch A plus(B b) {
    ...
}

This is translated into the following:

Obj plus(Obj a) {
    if (a is A) {
        return ...
    }
    else if (a is B) {
        return ...
    }
    throw new Err()
}

which is the type reflected through A#plus.
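The same translation can be sketched in a dynamic language. The following is Python, not Fantom; `mdispatch` here is a hypothetical helper and `Money` a made-up example type, so treat it as an illustration of the generated if/else chain rather than a proposed design.

```python
def mdispatch(*branches):
    """Merge (type, function) branches into one method.

    The merged method tests the argument against each type in order,
    mirroring the generated if/else chain.
    """
    def dispatch(self, arg):
        for expected, fn in branches:
            if isinstance(arg, expected):
                return fn(self, arg)
        raise TypeError(f"no branch accepts {type(arg).__name__}")
    return dispatch


class Money:
    """Hypothetical example type."""

    def __init__(self, cents):
        self.cents = cents


# Attached after the class body so the branches can refer to Money itself.
Money.plus = mdispatch(
    (Money, lambda self, other: Money(self.cents + other.cents)),
    (int, lambda self, cents: Money(self.cents + cents)),
)

total = Money(100).plus(Money(50)).plus(25)
print(total.cents)  # 175
```

There is still exactly one plus slot on the type; the branch is chosen at run time, and an argument that matches no branch raises an error.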

This would be more generally useful, as many times it's convenient for a function to operate on two or more types that do not share a common superclass other than Obj, and in such cases, it's still desirable to obtain type safety.

Meanwhile, it would still preserve the unique name per slot, and keep reflection nice and simple.

brian Fri 22 May 2009

This looks fine to me, but an alternative that just popped into my mind bears mentioning: introducing a form of multiple dispatch.

Multi-dispatch was in my head as a potentially more generic solution. Although in normal methods you don't get trapped because you can just pick another method name. Operators are special in that you only get one name.

I am not going to rush into this feature though, I want to think about it for a while.

Another thing this feature would enable is auto-coercion with math between things like Int and Float. Today you have to convert to the same type first. That used to be a good thing when everything was boxed. But now it is a bit annoying.

qualidafial Fri 22 May 2009

Another thing this feature would enable is auto-coercion with math between things like Int and Float. Today you have to convert to the same type first. That used to be a good thing when everything was boxed. But now it is a bit annoying.

Why not just have separate methods for each type of primitive?

class Int
{
  Int plus(Int b)
  Float plusFloat(Float b) { toFloat + b }
  Decimal plusDecimal(Decimal b) { toDecimal + b }
}

class Float
{
  Float plus(Float b)
  Float plusInt(Int b) { plus(b.toFloat) }
  Decimal plusDecimal(Decimal b) { toDecimal + b }
}

class Decimal
{
  Decimal plus(Decimal b)
  Decimal plusNum(Num b) { plus(b.toDecimal) }
}

Sure it makes the API a bit noisy but it makes arithmetic expressions less noisy everywhere else.

brian Fri 22 May 2009

Why not just have separate methods for each type of primitive?

That was what I was trying to say, with this feature we can do that.

qualidafial Mon 8 Jun 2009

Ping.

brian Mon 8 Jun 2009

I'm still thinking about this. I think it may be a while before I tackle it (especially until after we get the breaking changes done).

qualidafial Wed 8 Jul 2009

I've been thinking about the order of precedence and I think in practice it could be confusing. Suppose you have:

Num plus(Num a)
Int plusInt(Int a)
Float plusFloat(Float a)

I would expect a+b to resolve to a.plusInt(b) if b is an Int, since Int is the most specific known type. However it resolves to a.plus(b) instead because there is a plus method.

A couple alternatives:

  • Resolve to the method with the most specific type.
  • Move rule 1 down to #4 so that the most specifically typed method wins first. State in the policy that the untyped method should accept the most generic type.
  • Make untyped operator overloading mutually exclusive to typed operator overloading. That is, if you have both plus(Num) and plusInt(Int) then the compiler will not convert + expressions to .plus for you.

brian Wed 8 Jul 2009

The issue of method resolution (which is effectively very similar to method overloading in Java) is a sticky point for me which is why I haven't done this.

I'm actually thinking of doing a more explicit syntax such as:

Num +(Num a) 
Int +(Int b) 

Which would generate the plusNum and plusInt explicit (so it was always named with its type argument). That would be a form of overloading for just operators.

Or we could just require it to be "<name><type>"

But I don't really like either one of those.

qualidafial Wed 8 Jul 2009

Do we have to require that methods be named "<operator><type>"? Why not just "<operator><UniqueIdentifier>"?

Num plusA(Num a)
Int plusB(Int a)

Another option is to mark operator methods with facets/symbols, allowing you to name the methods whatever you want:

@plus Num foo(Num a)
@plus Int bar(Int a)
@plus Float baz(Float a)

The compiler would pick the best fit, similar to how Java resolves overloaded methods.

qualidafial Wed 8 Jul 2009

Personally my vote is to drop the order of precedence and just allow any method starting with plus to be used for +, then let the compiler decide which method to invoke based on which is the most specific to the argument.

tompalmer Wed 8 Jul 2009

any method starting with plus to be used for +

You'd at least need to require a non-lowercase letter next, so for example divert isn't taken as having a div prefix. I think this kind of logic probably leads to the rule of needing to match the plain type name. I'm not sure which style I think best.
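A minimal sketch of that prefix check, written in Python for illustration. The prefix list and the `operator_prefix` name are assumptions for the example; only the "next character must not be lowercase" rule comes from the post above.

```python
# Illustrative list of shortcut prefixes; the real set is up to the language.
OPERATOR_PREFIXES = ("plus", "minus", "mult", "div", "mod")


def operator_prefix(method_name):
    """Return the operator prefix of `method_name`, or None.

    A name counts only if the prefix is the whole name or is followed
    by a non-lowercase character, so `divert` is not a `div` method.
    """
    for prefix in OPERATOR_PREFIXES:
        if method_name.startswith(prefix):
            rest = method_name[len(prefix):]
            if rest == "" or not rest[0].islower():
                return prefix
    return None


print(operator_prefix("divInt"))  # div
print(operator_prefix("divert"))  # None
print(operator_prefix("plus"))    # plus
```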

JohnDG Thu 9 Jul 2009

Multiple dispatch is really the generalized solution to this problem. The logic is the same and it handles this case and others very well (and in a statically type safe manner). I'll do a detailed proposal if there's any interest.

brian Thu 9 Jul 2009

Multiple dispatch is really the generalized solution to this problem.

Not sure I follow that - Java style single dispatch would solve this problem if we allowed method overloading. How would multiple dispatch work without overloading?

qualidafial Thu 9 Jul 2009

With multiple dispatch you lose the guarantee on the return type. e.g. DateTime.minus(Duration) returns a DateTime, but DateTime.minus(DateTime) returns a Duration. If both operations are inside one method then you have Obj minus(Obj a) and you lose all that type information.

qualidafial Wed 12 May 2010

Ping.

Coming back to this conversation, I like the annotations option best--it's just replacing one convention (naming) with another, freeing the programmer to choose whichever method names are convenient.

One possible benefit is that you can make the primary operator method public and the rest of them private. e.g.

class Int
{
  @plus Int plus(Int a)
  @plus private Num plusNum(Num a)
  @plus private Float plusFloat(Float a)
}

This way in certain special cases you can allow use of operators without making the API docs so noisy with all the overloaded alternatives.

Thus when you type (Int a) + (Num b) the compiler could still resolve the expression to (Int a).plusNum((Num b)). It's possible that this breaks some accessibility rule in the virtual machine though, I'm not entirely sure. It would be cool though.

qualidafial Fri 17 Sep 2010

Ping. Any interest in tackling this for 1.0?

brian Fri 17 Sep 2010

Any interest in tackling this for 1.0?

It's one of those on-the-fence features. I would like to tackle it because it might result in a breaking change. And I believe it might result in a solution for dealing with Java overrides for methods overridden by parameters. But I haven't really been super happy with anything proposed so far.

katox Fri 17 Sep 2010

Do we need the solution to be compile time safe? It seems that JohnDG's solution isn't (unless I'm missing something). But multiple dispatch would be useful for other things like double-dispatch implementation of the visitor pattern. Right now a Fantom implementation would look similar to Java - that is not a good thing.

The main UC seems to be complicated by units too - is it sure that function overloading (or an alternative) would solve it completely?

katox Thu 23 Sep 2010

I stumbled upon this interesting discussion full of insightful links at StackOverflow.

As I have never needed method overloading in Fantom - with the exception of double dispatch which is more general than method overloading and thus not supported anyway - I'm inclined to solutions which handle just operator overloading (by convention, facets or similar) rather than full method overloading.

Complicating the dynamic side of the language just because of this is definitely too expensive a trade-off.

qualidafial Thu 23 Sep 2010

Brian, I also proposed using facets to mark operator methods:

class DateTime
{
  @Minus Duration minusDateTime(DateTime)
  @Minus DateTime minusDuration(Duration)
}

This way you just declare as many methods per operator as are required. When the compiler sees:

dateTime - duration

It will scan DateTime for unary methods annotated with @Minus, and pick the method which is the closest fit. If there is ambiguity, the compiler can emit a message to the effect that you need to cast one argument to remove the ambiguity.

Methods like this could even go in their own Operators section of the fandoc.

Brian, could you comment on what in this particular proposal you disliked?

brian Thu 23 Sep 2010

@qualidafial

I think your proposal is probably the way to go - it's clean and simple. The main reason I've held back is because this problem is so closely related to handling Java method overloading in subclassing. But maybe we should keep them as orthogonal problems and pick the most elegant solution for this problem.

There is also a part of me that dislikes creating a whole language feature around what is mostly the special case of DateTime/Duration. Other than that annoyance, I haven't found this to really be a problem. Is that justification to complicate the simple system we have now?

Although if we did do it, then we could overload arithmetic operators for Int, Float, and Decimal to handle the different coercion cases.

qualidafial Thu 23 Sep 2010

I think you just made your own case with the primitive thing.

There's also the case of my fan-measure project which aims to provide a fluent API for converting different measurements:

dist := Dist.km(80)
time := 2hr
speed := dist / time  => 40km/hr
klics := dist / Dist.km(1) => 40 (Int)

dmoebius Fri 24 Sep 2010

Why not simply use:

class DateTime
{
  Duration -(DateTime)
  DateTime -(Duration)
}

dmoebius Fri 24 Sep 2010

Sorry. Just saw that Brian already proposed that. Should've read the whole thread.

brian Fri 24 Sep 2010

BTW, I'm thinking instead of polluting the sys namespace with a bunch of marker interfaces for different operators, of having a single Operator facet:

@Operator { unary = "-" }
Int negate(Int i)

@Operator { binary = "+" }
Int plus(Int i)

@Operator { binary = "-" }
Int minus(Int i)

DanielFath Fri 24 Sep 2010

Will this be allowed:

@Operator { binary = "-"}
Bool myMinus (Bool a)

or will the name still need to start with minus/negate/etc.?

Overall I like the idea with the operator, still the nice thing about minus/plus is that it usually forces you into a specific context, preventing most operator abuse. I think just requiring the method to start with minus would be a nice compromise.

PS. Does this mean all shortcut operators (get/set/equals) will be instanced via annotations or just the few problematic ones?

rfeldman Fri 24 Sep 2010

+1 to Brian's proposed syntax of the @Operator facet

Although I think you could do away with the unary and binary distinction - since these methods are non-static, isn't the argument for unary meaningless anyway?

e.g.

@Operator { value = "-" }
Int negate() // zero arguments implies unary

@Operator { value = "-" }
Int minus(Int other) // one argument implies binary

yachris Fri 24 Sep 2010

+1 as well.

rfeldman, wouldn't the annotation help the compiler? If not, you have to put something there; saying "unary" or "binary" is a lot more meaningful than "value", plus magic you have to infer from argument counts. Clearer is better :-)

rfeldman Fri 24 Sep 2010

I see your point, but I'm more concerned with what's easy for the developer than what's easy for the compiler.

Requiring the developer to spell out unary vs. binary seems like it would be prone to copy/paste errors, wherein you change the method name and signature but forget to swap "unary" for "binary" and end up with a failed build for a pretty silly reason.

yachris Fri 24 Sep 2010

Well, I guess we have differing definitions of easy for the developer :-)

I think it'd probably be just as easy to paste in an extra argument (or forget it) and get the wrong kind that way.

brian Fri 24 Sep 2010

Yeah I suppose the unary or binary could be easily implied by whether the method takes a parameter. And it probably makes sense to force naming conventions so that your prefix has to be minus, plus, etc. In that case then Operator can just be a marker facet and the symbol is derived from the prefix:

@Operator minus(DateTime)
@Operator minusDuration(Duration)

That seems pretty simple and forces a standard naming convention.

qualidafial Tue 28 Sep 2010

Actually I think this is worse. API designers should have to do one thing for a method to be considered an operator.

I'm picturing myself staring at a method with an @Operator tag and a subtle naming error, and scratching my head about why the compiler won't let me use this operator:

class Foo
{
  @Operator Foo inc(Foo) // should be named 'increment'
}

This scenario applies whether or not you have an @Operator facet.

We can avoid this in one of several ways:

  1. Have the compiler flag an error if a method is tagged @Operator but the name does not match an operator
  2. Add a field to the facet identifying the type of operator @Operator { op = Op.minus } (necessitating a new Op enum)
  3. Introduce a separate facet per operator.

Personally I still prefer option 3 because while it does pollute the sys namespace, it tidies up the code at every call site:

class Foo
{
  @Increment Foo inc(Foo)
}

brian Tue 28 Sep 2010

I don't think you will be scratching your head as long as the compiler tells you exactly what is wrong with your implementation. Whatever the design there are still lots of things that must be checked (arity, non-Void return, naming conventions, etc).

While I don't personally fall into the "operator overloading is evil" camp, I do strongly discourage it for anything beyond the math definitions. So I think naming conventions should be required.

brian Wed 10 Nov 2010

Ticket resolved in 1.0.56

This feature is posted to hg and will be in the next build. This turned out to be a fairly massive feature covering nine changesets.

Under the new design operator methods are annotated with the @Operator facet. Binary operators support overloading by parameter type. Naming conventions are enforced by the compiler and used to determine the operator symbol.

Breaking Changes

To implement this feature properly required some breaking changes:

  1. All slice methods were renamed getRange (Str, List, Buf, Uri); I have left old slice methods, but deprecated them
  2. Renamed Date.minus to minusDate, DateTime.minus to minusDateTime
  3. It is possible that if you were using an operator such as + or [] with auto-casting that your code will now report an error since these operators are now overloaded; it is easy to fix by inserting an explicit cast
  4. If you were creating your own operators, you will need to apply the @Operator facet
  5. If you have code like Obj+Str you will now get a compiler error which can be fixed:
    obj + "xxx"        // won't work anymore
    "${obj}xxx"        // switch to interpolation
    "" + obj + "xxx"   // add leading ""
    obj.toStr + "xxx"  // call toStr if non-nullable

Sys API Operator Enhancements

I added several new operator methods which previously required awkward workarounds:

Date.minus(Duration)     // can now just use 'Date.today - 1day'
DateTime.minus(Duration) // can now just use 'DateTime.now - 1hr'

Number operators now support all the different combinations:

Int + Int         =>  Int
Int + Float       =>  Float
Int + Decimal     =>  Decimal
Float + Int       =>  Float
Float + Float     =>  Float
Float + Decimal   =>  Decimal
Decimal + Int     =>  Decimal
Decimal + Float   =>  Decimal
Decimal + Decimal =>  Decimal

Same for all the other operators mult, div, mod, and minus. In most cases they compile down directly into a few Java bytecodes.
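One way to read the table is that the result is always the wider of the two operand types under the ordering Int < Float < Decimal. That ordering is inferred from the table, not a statement about how the compiler implements it. A small Python check of that reading, with `RANK` and `result_type` as made-up names:

```python
# Assumed widening order, inferred from the table above.
RANK = {"Int": 0, "Float": 1, "Decimal": 2}


def result_type(left, right):
    """Return the wider of the two operand type names."""
    return max(left, right, key=RANK.__getitem__)


for left in RANK:
    for right in RANK:
        print(f"{left} + {right} => {result_type(left, right)}")
```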

Design (from Updated docLang)

Fantom supports operator overloading using operator methods. Operator methods are just normal methods which are annotated with the @Operator marker facet. The following naming conventions are enforced for determining which operator is used by the method:

prefix     symbol    degree
------     ------    ------
negate     -a        unary
increment  ++a       unary
decrement  --a       unary
plus       a + b     binary
minus      a - b     binary
mult       a * b     binary
div        a / b     binary
mod        a % b     binary
get        a[b]      binary
set        a[b] = c  ternary

In the case of the unary and ternary operators the method name must match exactly. For the binary operators, the method must only start with the given name. This allows binary operators to be overloaded by parameter type:

class Foo
{
  @Operator Int plusInt(Int x) { ... }
  @Operator Float plusFloat(Float x) { ... }
}

Foo + Int    =>  calls Foo.plusInt and yields Int
Foo + Float  =>  calls Foo.plusFloat and yields Float

The compiler performs method resolution of operators using a very simple algorithm. If there are multiple potential matches the compiler will report an error indicating the operator resolves ambiguously. The compiler does not take class hierarchy into account to attempt to find the "best" match.
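The resolution described here can be approximated with a short model. This is Python rather than Fantom, and `resolve_binary`, `AmbiguousOperatorError`, and the stand-in classes are assumptions for illustration; the real compiler may differ in details.

```python
class AmbiguousOperatorError(Exception):
    """Raised when more than one operator method accepts the argument."""


def resolve_binary(prefix, operator_methods, arg_type):
    """Pick the @Operator method for a binary operator.

    `operator_methods` maps method name -> parameter type. Every method
    with the right prefix whose parameter accepts `arg_type` is a match;
    no attempt is made to rank matches by specificity.
    """
    matches = sorted(
        name for name, param in operator_methods.items()
        if name.startswith(prefix) and issubclass(arg_type, param)
    )
    if len(matches) > 1:
        raise AmbiguousOperatorError(f"{prefix} resolves ambiguously: {matches}")
    return matches[0] if matches else None


# Python classes standing in for the Fantom types.
class Num: pass
class Int(Num): pass
class Float(Num): pass


foo = {"plusInt": Int, "plusFloat": Float}
print(resolve_binary("plus", foo, Int))    # plusInt
print(resolve_binary("plus", foo, Float))  # plusFloat

# Two methods accept an Int here, so the model reports an error
# instead of preferring the more specific plusInt.
overlapping = {"plusNum": Num, "plusInt": Int}
try:
    resolve_binary("plus", overlapping, Int)
except AmbiguousOperatorError as err:
    print(err)
```

In this model a class that wants a catch-all simply declares one method taking the supertype and no narrower overloads.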

qualidafial Wed 10 Nov 2010

This is great news! Can't wait for the next build...

I just want to say thank you Brian and Andy, for sharing Fantom with us, and for putting in so much time and effort to make it great.

Fantom is well poised to become the next big language; it's exciting to watch it unfold and to take part in it.

jodastephen Wed 10 Nov 2010

Although I proposed a different syntax for indicating it is an operator recently, I believe the annotation works just fine. The naming prefix convention works well too.

Can you give an example of the simple matching rules ignoring class hierarchy? I'd like to understand what the compromise is.

brian Wed 10 Nov 2010

Although I proposed a different syntax for indicating it is an operator recently, I believe the annotation works just fine. The naming prefix convention works well too.

The syntax proposed on IRC doesn't really work because every method still has to have a valid name for dynamic calls. This design makes it explicit in the docs what the name is for every method.

Can you give an example of the simple matching rules ignoring class hierarchy? I'd like to understand what the compromise is.

Consider:

class Foo
{
  @Operator Num plusNum(Num n) { ... }
  @Operator Int plusInt(Int n) { ... }
}

In the code above Foo + Int will not compile because there are two potential matches for the operator. In Java the compiler then drops down and starts checking the class hierarchy. But I am not ready to do that in Fantom, nor do I really think it is a good application of operators either. In this case if you want a catch-all for Num, then just create one method that takes Num.

jodastephen Wed 10 Nov 2010

I think the lack of class hierarchy lookup is fine then, except in the case of evolution.

If you start with a plusInt() method, and later want to change it to plusNum(), keeping the original as deprecated, you can't. Doing so would break existing code.

There are solutions to this - class hierarchy lookup, not matching methods marked as deprecated, or (my favourite, but the hardest) encoding changes between versions in a form such that the compiler can make the change automatically.

brian Wed 10 Nov 2010

I think the lack of class hierarchy lookup is fine then, except in the case of evolution.

I agree, we can always revisit it later. Right now we do the strictest thing which lets us loosen up the rule in the future.

I also want to detail one more fairly big breaking change I made as part of this feature. I removed the special case of string concatenation. Consider:

obj + "str"

Technically Obj doesn't have a plus(Str) operator, so this code should not work. Previously I made it a special case. However with this new design, I think we should keep everything strictly according to the operator design so that class designers can decide how they handle this + Str.

If you have code like Obj+Str you will now get a compiler error which can be fixed:

obj + "xxx"        // won't work anymore
"${obj}xxx"        // switch to interpolation
"" + obj + "xxx"   // add leading ""
obj.toStr + "xxx"  // call toStr if non-nullable
