Blog of Gavin King

Object-oriented != imperative

Dear FP community: one of the things I really like about you folks is the rigor you've brought to the field of programming language design. Compared to the kind of magical and folklore-based thinking we've grown accustomed to in the field of computing, your approach to problems is a massive breath of fresh air. But there's one area where you guys seem to have fallen into some rather fuzzy and unfounded rhetoric. What I'm talking about is your continued conflation of object orientation with imperative programming.

When we program with classes and objects, we have the choice between expressing ourselves using:

  • mutable objects, or
  • immutable objects.

This is no different to programming using functions, where we have the choice between:

  • impure functions, or
  • pure functions.

The use of mutable objects and impure functions is called imperative programming and is the norm in business and scientific computing. Another way to describe it is "programming with side-effects". Savvy programmers from both the OO and FP traditions have long understood that side-effects make it more difficult to reason about complex code, and that a design based upon immutable objects and/or pure functions is often superior.

Indeed, I've been hearing the advice to prefer immutable objects, and avoid side-effects, for as long as I've been programming using object-oriented languages.
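To make the choice concrete, here's a small sketch in TypeScript (the point is language-agnostic, and the Point classes are hypothetical): the same abstraction written first as a mutable object, then as an immutable one whose "mutating" operation is really a pure function returning a new instance.

```typescript
// A mutable point: translate() is a side-effecting operation.
class MutablePoint {
    constructor(public x: number, public y: number) {}
    translate(dx: number, dy: number): void {
        this.x += dx;  // mutates state in place
        this.y += dy;
    }
}

// An immutable point: translate() is effectively a pure function that
// returns a new instance, leaving the original untouched.
class Point {
    constructor(readonly x: number, readonly y: number) {}
    translate(dx: number, dy: number): Point {
        return new Point(this.x + dx, this.y + dy);
    }
}

const p = new Point(1, 2);
const q = p.translate(3, 4);
// p is still (1, 2); q is (4, 6)
```

Both are classes, and both are objects; only the second is free of side-effects.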

Oh, but wait, you might respond: isn't my point totally specious, when so many of the objects that people actually write in practice are mutable? Well, no, I don't think so. The truth is that almost as many of the functions that people actually write in practice are impure. Programming with functions, and without objects, does not in itself inoculate us against side-effects. Indeed, the disciplined use of immutability that we see in some parts of the FP community is simply not the norm in business or scientific computing, even in languages which don't support object orientation. Trust me, the old Fortran code I used to mess about with when I was doing scientific computing work back in university was certainly no more free of side-effects than typical object-oriented code I've worked with since.

Perhaps a source of the confusion here is that we say that objects "hold state and behavior". When some people who aren't so familiar with OO read this, they imagine that by "state", we mean mutable state. But that's not quite right. What this statement is saying is that an object holds references to other objects, along with functions that make use of those references. Thus, we can distinguish one instance of a class from another instance, by the different references (state) it holds. We don't mean, necessarily, that those references are mutable; indeed, they're very often immutable, especially in well-designed code.

"So", replies my imaginary FP interlocutor, "how then would such an immutable object be any different to a closure?" Well, at some level it's not any different, and that's the point! There's nothing unfunctional about objects! You can think of a class as a function that returns the closure of its own exported nested declarations. (Indeed, this is precisely how we think of a class in Ceylon.)
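To illustrate, here's a sketch in TypeScript (the Counter abstraction is hypothetical, invented just for this example): a plain function that returns the closure of its nested declarations behaves exactly like an instance of a class.

```typescript
// A "class" written as a function that returns the closure of its own
// exported nested declarations. The captured variable plays the role of
// the object's (private) state; the returned functions are its behavior.
function Counter(start: number) {
    let count = start;  // state held by the closure
    return {
        increment: () => { count += 1; },
        current: () => count,  // behavior that uses the state
    };
}

const c = Counter(10);
c.increment();
// c.current() is now 11; each call to Counter() produces independent state
```

Whether we spell this `function Counter` or `class Counter` is a matter of surface syntax, not of paradigm.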

Among the modern languages, what really distinguishes a language as object-oriented is whether it supports subtype polymorphism (or structural polymorphism, which I consider just a special kind of subtyping). Thus, some languages that people consider "functional" (ML, Haskell) aren't object-oriented. Others (OCaml, F#) are.

A request: FP folks, please, please improve your use of terminology, because I've seen your misuse of the term "object-oriented" creating a lot of confusion in discussions about programming languages.

The signature of reduce() in Ceylon

The Iterable interface defines a method named fold() with this signature:

Result fold<Result>(Result initial)
        (Result accumulating(Result partial, Element element))

Where Element is the element type of the Iterable. This method accepts an initial value, and an accumulator function which is applied to each element of the iterable object in turn. For example:

Integer sum = (1..10).fold(0)(plus);

Sometimes, we don't need the initial value, since we can simply start accumulating from the first element. Following the convention used by Scala and F#, let's call this function reduce(). Then we would like to be able to write:

Integer sum = (1..10).reduce(plus);

But what should the signature of this method be? A first stab might give us:

Element reduce(Element accumulating(Element partial, Element elem))

But this signature is a bit more restrictive than it should be. It's perfectly reasonable for the result type of reduce() to be a supertype of the element type. Scala handles this using a lower bound type constraint. Transliterating this to Ceylon, using an imaginary syntax for lower bounds, it would look like:

Result reduce<Result>(Result accumulating(Result partial, Element elem))
        given Result abstracts Element

Here the lower bound constraint ensures that the first element is assignable to the first parameter of the accumulator function.

But Ceylon doesn't have lower bound type constraints. Why? Well, because it seems that we can in practice almost always use union types to achieve the same effect. So let's try that:

Result|Element reduce<Result>(
        Result accumulating(Result|Element partial, Element element))

Now let's try to implement this signature. One possibility would be:

Result|Element reduce<Result>(
        Result accumulating(Result|Element partial, Element element)) {
    assert (!empty, is Element initial = first);
    variable Result|Element partial = initial;
    for (elem in rest) {
        partial = accumulating(partial, elem);
    }
    return partial;
}

The assertion handles the case of an empty Iterable, resulting in an AssertionException if the iterable object has no first element.

Alternatively, we might prefer to return null in the case of an empty Iterable, which suggests the following implementation:

Result|Element|Null reduce<Result>(
        Result accumulating(Result|Element partial, Element element)) {
    if (!empty, is Element initial = first) {
        variable Result|Element partial = initial;
        for (elem in rest) {
            partial = accumulating(partial, elem);
        }
        return partial;
    }
    else {
        return null;
    }
}

Going back to Scala, we notice that Scala has two versions of reduce(), which are exactly analogous to the two possibilities we've just seen. The first version throws an exception in the empty case, and the second version, reduceOption(), returns an instance of the wrapper class Option.

But in Ceylon, we can do better. In Ceylon, Iterable has a slightly mysterious-looking second type parameter, named Absent, with an upper bound given Absent satisfies Null. An Iterable<T,Null>, which we usually write {T*}, is a possibly-empty iterable. An Iterable<T,Nothing>, which we usually write {T+}, is an iterable we know to be nonempty.

Thus we arrive at the following definition of reduce():

Result|Element|Absent reduce<Result>(
        Result accumulating(Result|Element partial, Element element)) {
    value initial = first;
    if (!empty, is Element initial) {
        variable Result|Element partial = initial;
        for (elem in rest) {
            partial = accumulating(partial, elem);
        }
        return partial;
    }
    else {
        return initial;
    }
}

Now, for a "spanned" range expression like 1..n, which is nonempty, we get a non-null return type:

Integer sum = (1..n).reduce(plus);

On the other hand, for a "segmented" range expression like 1:n, which is possibly-empty, we get an optional return type:

Integer? sum = (1:n).reduce(plus);

Best of all, it never throws an exception. This is, I humbly submit, Pretty Damn Nice.

Notice just how much work union types are doing for us here. Compared to Scala's reduce()/reduceOption(), they let us eliminate:

  • a lower bound type constraint,
  • a second, effectively overloaded, version of the method, and
  • the wrapper Option class.

I've added this definition of reduce() to Iterable, and it will be available in the next release of Ceylon.

On three-legged elephants

I've often argued that good design—of a language, library, or framework—isn't about packing as many features as possible into your solution; rather, it's about discovering a small set of interlocking features that act to reinforce each other. That is to say, we want to maximize the power/expressiveness of our solution, while simultaneously minimizing the surface area. I am, furthermore, quite often willing to sacrifice some flexibility for elegance. It's my view that elegant solutions are easier to learn, more enjoyable to use, and easier to abstract over.

With this in mind, I would like to consider a problem that language designers have been working on for at least two decades: how to combine subtype polymorphism with parametric polymorphism (generics). This is a central problem faced by any object-oriented language with static typing. Recent languages have come progressively closer to a satisfying solution, but I would like to submit, if it doesn't sound too self-serving, that Ceylon offers the most satisfying solution so far.

Our mascot is Trompon the elephant, because an elephant has four legs and would fall over if one of his legs were missing. Ceylon's type system is exactly like this! (Yes, this is a baxplanation.)

The four legs of the type system are:

  • declaration site variance
  • ad hoc union and intersection types
  • type inference based on principal typing
  • covariant refinement and principal instantiation inheritance

If we were to take away any one of those four characteristics, all of a sudden stuff that Just Works would simply not work anymore, or, even if we could make it work, it would turn out way more complex, and involve reasoning that is difficult for the programmer to reproduce.

Consider this really simple line of code:

value animals = ArrayList { Cat(), Dog(), Person() };

The inferred type of animals is ArrayList<Cat|Dog|Person>.

  • If we were to take away declaration site covariance, then animals would not be assignable to List<Animal>.
  • If we were to take away union and intersection types, then the process for inferring the type argument would be ambiguous and much more complex. (Ceylon's type argument inference algorithm is defined in two pages of pseudocode in the language specification, which sounds like a lot, until you realize how problematic and underspecified this algorithm is in other languages, and that the actual implementation of the algorithm is not much longer.)
  • If we were to take away type inference, or principal typing, we would need to explicitly write down some uninteresting types in this line of code.

Minus any one of these characteristics, we're left with a three-legged elephant.
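TypeScript makes a similar inference for heterogeneous collections, so the effect is easy to try outside Ceylon (the Cat, Dog, and Person classes here are stand-ins):

```typescript
class Cat { meow(): string { return "meow"; } }
class Dog { bark(): string { return "woof"; } }
class Person { speak(): string { return "hi"; } }

// The inferred element type is the union Cat | Dog | Person; no common
// supertype needs to be declared up front.
const animals = [new Cat(), new Dog(), new Person()];

// Because read-only arrays are covariant in their element type, the list
// is assignable to a read-only view at the union type:
const view: readonly (Cat | Dog | Person)[] = animals;
```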

Principal instantiation inheritance is a kind-of hidden feature of Ceylon that we don't talk about much, even though we use it extensively throughout the design of our container types. It lets us say, for example, that a List<Element> is a Ranged<List<Element>>, and that a Sequence<Element> is a List<Element>&Ranged<Sequence<Element>>. Principal instantiation inheritance meshes really nicely with declaration-site covariance, and with ad hoc union/intersection types. Consider the following code:

List<String>|Integer[] ranged = ... ;
value span = ranged.span(0,max); 

Here, the inferred type of span is List<String>|Integer[]. Isn't that nice? The typechecker has reasoned that the principal supertype instantiation of Ranged for List<String>|Integer[] is the type Ranged<List<String>|Integer[]>, and thereby determined the perfect return type for span().

If we were to take away any one of declaration site covariance, principal instantiation inheritance, or union types, then this reasoning would no longer be sound. The elephant would fall on his ass.
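A rough TypeScript analogue of the span() example (the Ranged interface and the two values here are hypothetical): calling a method on a union of instantiations yields the union of the corresponding return types.

```typescript
interface Ranged<Span> {
    span(from: number, to: number): Span;
}

const strings: Ranged<string[]> = {
    span: (from, to) => ["a", "b", "c"].slice(from, to + 1),
};
const integers: Ranged<number[]> = {
    span: (from, to) => [1, 2, 3].slice(from, to + 1),
};

// ranged has a union type; the checker computes that calling span() on
// it returns the union string[] | number[].
const ranged: Ranged<string[]> | Ranged<number[]> =
    Date.now() >= 0 ? strings : integers;
const result = ranged.span(0, 1);  // string[] | number[]
```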

A paradox about typesafety and expressiveness

It's often argued that dynamic typing is more expressive, and I am, for the most part, willing to go along with that characterization. By nature, a language with dynamic typing places fewer constraints on me, the programmer, and lets me "express myself" more freely. I'm not going to, right now, re-argue the case for static typing, which is anyway quite clear to anyone who has ever written or maintained a decent-sized chunk of code using an IDE. However, I would like to point out a particular sense in which dynamic typing is much less expressive than static typing, and the consequences of that.

(Now, please bear with me for a bit, because I'm going to take a rather scenic route to getting to my real point.)

In a nutshell, dynamic typing is less expressive than static typing in that it doesn't fully express types.

Of course that's a rather silly tautology. But it's still worth saying. Why? Because types matter. Even in a dynamically-typed language like Python, Smalltalk, or Ruby, types still matter. To understand and maintain the code, I still need to know the types of things. Indeed, this is true even in weakly-typed JavaScript!

The consequence of this is that dynamically-typed code is far less self-documenting than statically typed code. Quick, what can I pass to the following function? What do I get back from it?

function split(string, separators)

Translated to Ceylon, this function signature is immediately far more understandable:

{String*} split(String string, {Character+} separators)

What this means, of course, is that dynamic typing places a much higher burden on the programmer to conscientiously comment and document things. And to maintain that documentation.

On the other hand, static typing forces me to maintain the correct type annotations on the split() function, even when its implementation changes, even when I'm in a hurry, and my IDE will even help me automatically refactor them. No IDE on Earth offers the same kind of help maintaining comments!
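For comparison, the same signature rendered in TypeScript; the implementation body is an illustrative guess (splitting on any of the given separator characters), not the author's:

```typescript
// The types alone document the contract: a string in, a list of
// single-character separators, a list of strings out.
function split(s: string, separators: readonly string[]): string[] {
    // Hypothetical implementation; assumes the separators need no
    // regex escaping (e.g. "," and ";", not "]" or "\\").
    const pattern = new RegExp("[" + separators.join("") + "]");
    return s.split(pattern);
}

const parts = split("a,b;c", [",", ";"]);
// parts is ["a", "b", "c"]
```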

Now consider what else this extra expressiveness buys me. Suppose that Foo and Bar share no interesting common supertype. In any dynamic language on earth, I could write the following code:

class Super {
    function fun() { return Foo(); }
}

class Sub extends Super {
    function fun() { return Bar(); }
}

But if I did write such a thing, my team-mates would probably want to throttle me! There's simply too much potential here for a client calling fun() to only handle the case of Foo, and not realize that sometimes it returns a Bar instead. Generally, most programmers will avoid code like the above, and make sure that fun() always returns types closely related by inheritance.

As a second example, consider a well-known hole in the typesystem of Java: null. In Java, I could write:

class Super {
    public Foo fun() { return new Foo(); }
}

class Sub extends Super {
    public Foo fun() { return null; }
}

Again, this is something Java developers often avoid, especially in public APIs. Since it's not part of the signature of fun() that it may return null, a caller might obliviously fail to handle that case, resulting in an NPE somewhere further down the track. That's why it's often considered good practice to throw an exception instead of returning null from a method belonging to a public API.

Now let's consider Ceylon.

class Super() {
    shared default Foo? fun() => Foo();
}

class Sub() extends Super() {
    shared actual Null fun() => null;
}

Since the fact that fun() can return null is part of its signature, and since any caller is forced by the typesystem to account for the possibility of a null return value, there's absolutely nothing wrong with this code. So in Ceylon it is often a better practice to define functions or methods to just return null instead of throwing a defensive exception. Thus, we arrive at an apparent paradox:

By being more restrictive in how we handle null, we make null much more useful.

Now, since a "nullable type" in Ceylon is just a special case of a union type, we can generalize this observation to other union types. Consider:

class Super() {
    shared default Foo|Bar fun() => Foo();
}

class Sub() extends Super() {
    shared actual Bar fun() => Bar();
}

Again, there's absolutely nothing wrong with this code. Any client calling Super.fun() is forced to handle both possible concrete return types.

What I'm saying is that we achieved a nett gain in expressiveness by adding static types. Things that would have been dangerously error-prone without static typing have become totally safe and completely self-documenting.

Intersections and variance

Warning: the following post is for people who enjoy thinking about types, and want to understand how the Ceylon compiler reasons about them. This post is partly an attempt to demystify §3.7 of the spec. To understand the following, you first need to understand how variance works in Ceylon.

Snap quiz!

Consider:

MutableList<Topic> | MutableList<Queue> list;

  1. What is the type of list.first?
  2. What is the type of the parameter of list.add()?

Background

One of the key goals of Ceylon's type system is to make subtyping and generics play nicely together. Almost every object-oriented language since C++ and Eiffel has tried this, and more recent languages have been getting gradually closer to a really convincing solution, without, in my opinion, completely nailing it. Ceylon's contribution, of course, is to introduce first-class union and intersection types to the mix. In this post we'll see one of the ways in which they help.

Four useful identities

To begin to fully understand how the Ceylon typechecker reasons about generic types, you need to grasp the central importance of the following identities:

  1. Given a covariant type List<Element>, the union List<String>|List<Integer> is a subtype of List<String|Integer>.
  2. Given a contravariant type ListMutator<Element>, the union ListMutator<Queue>|ListMutator<Topic> is a subtype of ListMutator<Queue&Topic>.
  3. Given a covariant type List<Element>, the intersection List<Queue>&List<Topic> is a subtype of List<Queue&Topic>.
  4. Given a contravariant type ListMutator<Element>, the intersection ListMutator<String>&ListMutator<Integer> is a subtype of ListMutator<String|Integer>.

Note that in each identity, we take a union or intersection of two instantiations of the same generic type, and produce an instantiation of the generic type that is a supertype of the union or intersection.

These identities are actually quite intuitive. No? OK then, let's consider them one at a time, and convince ourselves that they make sense:

  1. If we have a list that, when iterated, either produces Strings or produces Integers then clearly every object it produces is either a String or an Integer.
  2. If we have a list in which we can either insert Queues or insert Topics, then we clearly have a list in which we can insert something that is both a Queue and a Topic.
  3. If we have a list that, when iterated, produces only Queues, and, when iterated, produces only Topics, then clearly every object it produces must be both a Queue and a Topic.
  4. If we have a list in which we can both insert a String and insert an Integer, then we clearly have a list in which we can insert something that we know is either a String or an Integer.
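The covariant and contravariant identities can even be checked mechanically in TypeScript, using a pure producer type for the covariant case and a pure consumer type for the contravariant case (the Queue and Topic shapes here are hypothetical):

```typescript
type Producer<T> = () => T;            // covariant in T, like List
type Consumer<T> = (item: T) => void;  // contravariant in T, like ListMutator

interface Queue { enqueue(): void; }
interface Topic { publish(): void; }

// Identity 1: a union of producers is a producer of the union.
const p: Producer<string> | Producer<number> = () => "hello";
const p1: Producer<string | number> = p;  // accepted by the checker

// Identity 2: a union of consumers is a consumer of the intersection.
const c: Consumer<Queue> | Consumer<Topic> = (item: Queue) => item.enqueue();
const c2: Consumer<Queue & Topic> = c;    // accepted by the checker

// Something that is both a Queue and a Topic can safely be consumed:
const both: Queue & Topic = {
    enqueue() {},
    publish() {},
};
c2(both);
```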

Satisfied? Good.

Generic supertypes and principal instantiations

So, how are these identities useful? Well, imagine that we have a type T whose supertypes include multiple distinct instantiations of a generic type. For example, imagine that T has the supertypes List<String> and List<Integer>. And suppose we want to determine the type of one of the members inherited by T from the generic type, say List.first.

Then we would first need to form a principal instantiation of the generic type List. In this case the principal instantiation would be List<String|Integer>, according to identity 1. It's a "principal" instantiation because every other instantiation of List that is a supertype of T is also a supertype of List<String|Integer>. That it is even possible to form a principal instantiation like this is one of the things that makes Ceylon's type system special. Now we can determine the type of first by substituting String|Integer for the type parameter of List. For the record, the result is the type String|Integer|Null.

A type like T arises when we explicitly write down an intersection like List<Queue>&List<Topic>, or a union like List<String>|List<Integer>, but it also arises through the use of inheritance.

Principal instantiation inheritance

In Java, a type can only inherit a single instantiation of a supertype. A class can't inherit (either directly or indirectly) both List<Queue> and List<Topic> in Java. We call this inheritance model single instantiation inheritance. Ceylon features a more flexible model called principal instantiation inheritance[1], where this restriction does not apply. I'm allowed to write:

interface Queues satisfies List<Queue> { ... }
interface Topics satisfies List<Topic> { ... }

class Endpoints() satisfies Queues & Topics { ... }

In which case, Endpoints is a subtype of List<Queue>&List<Topic>. Then the typechecker uses identity 3 above to automatically infer that Endpoints is a List<Queue&Topic>. Other languages can't do this.
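Identity 3 is visible structurally in TypeScript too; a small sketch, with hypothetical Queue and Topic shapes and first made non-optional to keep the types simple:

```typescript
interface Queue { queueName: string; }
interface Topic { topicName: string; }

interface List<Element> { readonly first: Element; }

interface Queues extends List<Queue> { /* queue-specific operations */ }
interface Topics extends List<Topic> { /* topic-specific operations */ }

// An Endpoints-like value is a List<Queue> and a List<Topic> at once;
// reading first from the intersection yields Queue & Topic (identity 3).
const endpoints: Queues & Topics = {
    first: { queueName: "q1", topicName: "t1" },
};
const item: Queue & Topic = endpoints.first;
```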

Problem: invariant types

Now notice something about the identities above: they don't say anything about invariant types. Unfortunately, I simply can't form a principal instantiation of MutableList that is a supertype of MutableList<Queue>&MutableList<Topic>. Or at least I can't within Ceylon's type system as it exists today.

Caveat: I'm not a fan of use-site variance. I've been too burned by it in Java. However, if we do ever decide to introduce use-site variance in Ceylon, which does remain a real possibility, what I've just said will no longer apply, since this instantiation:

MutableList<in Queue&Topic out Queue|Topic> 

would be a principal supertype for MutableList<Queue>&MutableList<Topic>. But who the hell wants to have to deal with nasty-looking types like that?

Conclusion

The significance of this is that we should, where reasonable, be careful with how we use invariant types in Ceylon. To be sure, we still need invariant types, especially for concrete implementations like the class Endpoints above, but we should try to declare public APIs on covariant and contravariant types.

Consider the case of MutableList. What I'm saying is that its interesting operations should all be inherited from the covariant List and the contravariant ListMutator. It doesn't matter much whether we declare MutableList as a mixin interface like this:

shared interface MutableList<Element> 
        satisfies List<Element> & 
                  ListMutator<Element> { ... }

Or just as a type alias, like this:

shared alias MutableList<Element> 
        => List<Element> & ListMutator<Element>;

In either case, the following code[2] is accepted by the typechecker, assuming the interesting operations first and add() are defined on List and ListMutator respectively:

void doSomeStuff(MutableList<Topic>|MutableList<Queue> list) {
    Topic|Queue|Null topicOrQueue = list.first;
    if (!topicOrQueue exists) {
        Topic&Queue topicAndQueue = .... ;
        list.add(topicAndQueue);
    }
    ...
}

Here we're calling both "producer" and "consumer" operations of the invariant type MutableList on an intersection of distinct instantiations of MutableList. The typechecker determined that the principal instantiation of the covariant supertype is List<Topic|Queue> and that the principal instantiation of the contravariant supertype is ListMutator<Topic&Queue>, and that therefore the code is sound.
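The whole scenario ports to TypeScript, with the variance falling out of the structure of the two interfaces, and with the list value below being a toy stand-in:

```typescript
interface Topic { topicName: string; }
interface Queue { queueName: string; }

// Producer side: structurally covariant, like List.
interface List<Element> {
    readonly first: Element | undefined;
}
// Consumer side: structurally contravariant, like ListMutator.
interface ListMutator<Element> {
    add(element: Element): void;
}
type MutableList<Element> = List<Element> & ListMutator<Element>;

const added: unknown[] = [];

function doSomeStuff(list: MutableList<Topic> | MutableList<Queue>): void {
    // Reading uses the covariant side: first is Topic | Queue | undefined.
    const topicOrQueue: Topic | Queue | undefined = list.first;
    if (topicOrQueue === undefined) {
        // Writing uses the contravariant side: add() accepts Topic & Queue.
        const topicAndQueue: Topic & Queue = { topicName: "t", queueName: "q" };
        list.add(topicAndQueue);
    }
}

const topics: MutableList<Topic> = {
    first: undefined,
    add: (element) => { added.push(element); },
};
doSomeStuff(topics);
// added now contains one element that is both a Topic and a Queue
```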

[1] The notion of principal instantiation inheritance is mainly based on Ross Tate's research. As far as I know, the idea of combining principal instantiation inheritance with union and intersection types is original to Ceylon.

[2] Thanks to Reinier Zwitserloot for this example and for inspiring this post.