Blog of Gavin King

Why build your own type system?

In Ceylon 1.2 we've factored out the type system of Ceylon as an independent module, with minimal dependencies and a clean API. The ceylon-model project incorporates:

  • an extensible object-oriented model of the type system in Ceylon,
  • algorithms for reasoning about types at compile time—or even at runtime in a system with reified generics—and
  • a framework for model loading, that is, for loading a model representing the types in a program or module from binary artifacts, source artifacts, or anything else—this is a necessary feature for incremental compilation and modularity.

Why might this be interesting to someone outside the Ceylon team? Well, if you're thinking of creating a programming language with nominal typing, you could consider reusing ceylon-model in your own language, saving yourself the problem of implementing all the hairy typing algorithms that we've already spent years getting right. That way, you can concentrate on the stuff that's really different about your language, that is, the syntax, the compiler backend, the runtime semantics, and the basic language types.

ceylon-model implements an extremely complete type system, including all the following features:

  • nominal typing with single-inheritance of classes and multiple-inheritance of interfaces,
  • bounded parametric polymorphism (generics with constraints),
  • sum types,
  • union and intersection types, with sophisticated reasoning about principal types and canonicalization,
  • reasoning about subtyping,
  • reasoning about coverage and disjoint types,
  • principal instantiation inheritance (more powerful than the single-instantiation inheritance found in other languages with subtyping and generics),
  • declaration-site and use-site variance,
  • type constructor polymorphism (higher-kinded types),
  • higher-rank / impredicative polymorphism,
  • type aliases including generic type aliases,
  • self types, and
  • some support for overloading.

It doesn't include any functionality for type inference, flow-sensitive typing, or pattern matching, since in Ceylon that stuff is implemented within the typechecker, and anyway your language probably has different rules to ours.

It seems to me that unless your type system is based on structural typing or dependent types, ceylon-model is an awesome starting point. It might already have everything you need, and, even if it doesn't, it provides a framework that would let you incorporate other features into the type system. For example, I imagine that primitively-defined function, array, tuple, or record types would be easy to incorporate.

We're already reusing this model between the typechecker, two compiler backends, two IDE projects, and the runtime, so we know its API is reasonable.

The only stumbling block is that the project is implemented in Java, so if you want to use it in a compiler written in a different language, you'll probably want to translate it to that language first. The code is pretty clean and straightforward, so that's probably not a massive hurdle to overcome.

The license for ceylon-model is Apache 2.0, so you can basically do what you like with it. But if you decide to use it in a project, please drop us a note, because it makes us happy to see people reuse our work.

Constructors in Ceylon

Since the earliest versions of Ceylon, we've supported a streamlined syntax for class initialization where the parameters of a class are listed right after the class name, and initialization logic goes directly in the body of the class.

class Color(shared Integer rgba) {

    assert (0 <= rgba <= #FFFFFFFF);

    function encodedValue(Integer slot)
            => rgba.rightLogicalShift(8*slot).and(#FF);

    shared Integer alpha => encodedValue(3);

    shared Integer red => encodedValue(2);
    shared Integer green => encodedValue(1);
    shared Integer blue => encodedValue(0);

    function hex(Integer int) => formatInteger(int, 16);

    string => "Color { \ 
               alpha=``hex(alpha)``, \ 
               red=``hex(red)``, \ 
               green=``hex(green)``, \ 
               blue=``hex(blue)`` }";

}

We can instantiate a class like this:

Color red = Color(#FFFF0000);

The ability to refer to parameters of the class directly from the members of the class really helps cut down on verbosity, and most of the time this is a really comfortable way to write code.

However, as we've seen over the past few years of writing Ceylon code, there are moments when we would really appreciate the ability to write a class with multiple initialization paths, something like constructors in Java, C#, or C++. To be clear, in the overwhelmingly common case—something more than 90% of classes, I would estimate—constructors are unnecessary and uncomfortable. But we still need a good solution for the remaining trickier cases.

A couple of especially compelling cases we ran into were:

  • the class Array in ceylon.language, which can be allocated with a list of elements, or with a size and a single element value, and
  • cloning copy constructors, used, for example, to implement HashMap.clone() and HashSet.clone().

Unfortunately, I've always found the design of constructors that Java and C# inherited from C++ to be a bit strange and inexpressive. So before I tell you what we've done about constructors in Ceylon 1.2, let me start by explaining what I think is wrong with constructors in Java.

What's wrong with constructors in Java?

As alluded above, the biggest problem with the constructor syntax in languages that borrow from C++ is that in the common case of a class with just one constructor, the parameters of that constructor aren't available in the body of the class, leading to awful code like the following:

class Point {
    public float x;
    public float y;
    public Point(float x, float y) {
        this.x = x;
        this.y = y;
    }
    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}

This hurts. Fortunately, we've already made that pain go away in Ceylon.

class Point(shared Float x, shared Float y) {
    string => "(``x``, ``y``)";
}

So let's look at some additional issues with constructors in Java.

To begin with, the syntax is irregular. In C-like languages, the grammar for a declaration is:

Modifier* (Keyword|Type) Identifier OtherStuff

Constructors, strangely, don't conform to this general schema, having been bolted on later.

Second, the constructors of a class are all forced to have the same name. This seems like a quite bizarre restriction:

  • If they all have the same name, why not declare them with a keyword instead of an identifier? DRY, anyone?
  • It's a restriction that robs me of expressiveness. Instead of new ColorWithRGBAndAlpha(r,g,b,a), giving me a clue as to the semantics of the arguments, I write just new Color(r,g,b,a), and the reader is left guessing.
  • Constructors thus run into Java's totally broken support for overloading. I can't have a constructor that takes a List<Float> and another which takes a List<Integer>, since these two parameter types have the same erasure.
  • Constructor references (Class::new in Java) can be ambiguous, depending on the context.

Third, constructors aren't forced to initialize the instance variables of the class. All Java types have a "default" (zero or null) value, and if you forget to initialize an instance variable in Java, you'll get a NullPointerException, or, worse, an incorrect zero value at runtime. These problems most certainly belong to the class of problems that I expect a static type system to be able to detect, and, indeed, in other contexts Java does detect uninitialized variables.

Further note that this would be an even bigger problem in Ceylon, because most types don't have null as an instance, so there is no obvious "default" value.

As usual, my purpose here isn't to bash Java, but to justify why we've done things differently in Ceylon.

Named constructors and default constructors

By contrast, the newly introduced syntax for constructors in Ceylon is regular, expressive, and doesn't rely on overloading (which Ceylon doesn't support, except when interoperating with native Java code). Here's the basic syntax for a constructor:

new withFooAndBar(Foo foo, Bar bar)
        extends anotherConstructor(foo) {
    //do stuff
}

When the class to which the constructor belongs directly extends Basic, the extends clause is optional.

And here's an example of how it's used:

class Color {

    shared Integer rgba;

    //default constructor
    shared new (Integer rgba) {
        assert (0 <= rgba <= #FFFFFFFF);
        this.rgba = rgba;
    }

    //named constructor
    shared new withRGB(
        Integer red, Integer green, Integer blue, 
        Integer alpha = #FF) {
        assert (0 <= red <= #FF, 
                0 <= green <= #FF, 
                0 <= blue <= #FF);
        rgba = 
                alpha.leftLogicalShift(24) +
                red.leftLogicalShift(16) +
                green.leftLogicalShift(8) +
                blue;
    }

    //another named constructor
    shared new withRGBIntensities(
        Float red, Float green, Float blue, 
        Float alpha = 1.0) {
        assert (0.0 <= red <= 1.0, 
                0.0 <= green <= 1.0, 
                0.0 <= blue <= 1.0);
        function int(Float intensity) 
                => (intensity*#FF).integer;
        rgba = 
                int(alpha).leftLogicalShift(24) +
                int(red).leftLogicalShift(16) +
                int(green).leftLogicalShift(8) +
                int(blue);
    }

    function encodedValue(Integer slot)
            => rgba.rightLogicalShift(8*slot).and(#FF);

    shared Integer alpha => encodedValue(3);

    shared Integer red => encodedValue(2);
    shared Integer green => encodedValue(1);
    shared Integer blue => encodedValue(0);

    function hex(Integer int) => formatInteger(int, 16);

    string => "Color { \ 
               alpha=``hex(alpha)``, \ 
               red=``hex(red)``, \ 
               green=``hex(green)``, \ 
               blue=``hex(blue)`` }";

}

Constructor declarations are indicated with the keyword new, and have a name which begins with a lowercase letter. We call a constructor like this:

Color red = Color.withRGBIntensities(1.0, 0.0, 0.0);

Or, using named arguments, like this:

Color red = 
    Color.withRGBIntensities { 
        red = 1.0; 
        green = 0.0;
        blue = 0.0;
    };

A function reference to a constructor has a natural syntax:

Color(Float,Float,Float) createColor
        = Color.withRGBIntensities;

A class may have a constructor, called the default constructor, with no name. Instantiation via the default constructor works just like instantiation of a class without constructors:

Color red = Color(#FFFF0000);

A class isn't required to have a default constructor, but most classes will have one.

Why do we need the concept of a default constructor? Well, because a class with constructors may not have a parameter list. Wait, let's stop and reemphasize that caveat, because it's an important one:

You can't add constructors to a class with a parameter list! Instead, you must first rewrite the class to use a "default constructor" for its "normal" initialization logic.

However, a class with constructors may still have initialization logic directly in the body of the class. For example, the following is perfectly legal:

class Color {

    shared Integer rgba;

    shared new (Integer rgba) {
        this.rgba = rgba;
    }

    shared new withRGB(
        Integer red, Integer green, Integer blue, 
        Integer alpha = #FF) {
        assert (0 <= red <= #FF, 
                0 <= green <= #FF, 
                0 <= blue <= #FF);
        rgba = 
                alpha.leftLogicalShift(24) +
                red.leftLogicalShift(16) +
                green.leftLogicalShift(8) +
                blue;
    }

    shared new withRGBIntensities(
        Float red, Float green, Float blue, 
        Float alpha = 1.0) {
        assert (0.0 <= red <= 1.0, 
                0.0 <= green <= 1.0, 
                0.0 <= blue <= 1.0);
        function int(Float intensity) 
                => (intensity*#FF).integer;
        rgba = 
                int(alpha).leftLogicalShift(24) +
                int(red).leftLogicalShift(16) +
                int(green).leftLogicalShift(8) +
                int(blue);
    }

    //executed for every constructor
    assert (0 <= rgba <= #FFFFFFFF);

    //other members
    ...
}

The last assert statement is executed every time the class is instantiated.

Value constructors

The constructors we've just seen are termed callable constructors in the language specification, because they declare parameters. We also have value constructors, which don't declare parameters, and which are executed once, the first time the constructor is evaluated in the context to which the class belongs. For a toplevel class, a value constructor is a singleton.

class Color {

    shared Integer rgba;

    //default constructor
    shared new (Integer rgba) {
        this.rgba = rgba;
    }

    //value constructors

    shared new white {
        rgba = #FFFFFFFF;
    }

    shared new red {
        rgba = #FFFF0000;
    }

    shared new green {
        rgba = #FF00FF00;
    }

    shared new blue {
        rgba = #FF0000FF;
    }

    //etc
    ...

}

We can use a value constructor like this:

Color red = Color.red;

Sometimes the constructors of a class share certain initialization logic. If that logic doesn't depend upon the parameters of the class, we can put it directly in the body of the class, as we've already seen. But if it does depend upon the parameters, we often need to take a different tack.

Constructor delegation

To facilitate reuse of initialization logic within a class, it's useful to allow a constructor to delegate to a different constructor of the same class. For this, we use the extends clause:

Integer int(Float intensity) 
        => (intensity*#FF).integer;

class Color {

    shared Integer rgba;

    shared new (Integer rgba) {
        this.rgba = rgba;
    }

    //value constructors delegate to the default constructor

    shared new white 
            extends Color(#FFFFFFFF) {}

    shared new red 
            extends Color(#FFFF0000) {}

    shared new green 
            extends Color(#FF00FF00) {}

    shared new blue 
            extends Color(#FF0000FF) {}

    shared new withRGB(
        Integer red, Integer green, Integer blue, 
        Integer alpha = #FF) {
        assert (0 <= red <= #FF, 
                0 <= green <= #FF, 
                0 <= blue <= #FF);
        rgba = 
                alpha.leftLogicalShift(24) +
                red.leftLogicalShift(16) +
                green.leftLogicalShift(8) +
                blue;
    }

    shared new withRGBIntensities(
        Float red, Float green, Float blue, 
        Float alpha = 1.0) 
            //delegate to other named constructor
            extends withRGB(int(red), 
                            int(green), 
                            int(blue),
                            int(alpha)) {}

    assert (0 <= rgba <= #FFFFFFFF);

    //other members
    ...
}

A constructor may only delegate to a constructor defined earlier in the body of the class.

Note that we've written extends Color(#FFFFFFFF) to delegate to the default constructor of Color.

Definite initialization and partial constructors

An ordinary constructor like Color.withRGB() or Color.withRGBIntensities() has a responsibility to initialize every value reference belonging to the class that is either:

  • shared, or
  • used ("captured") by another member of the class.

The Ceylon compiler enforces this responsibility at compile time and will reject the code unless it can prove that every value reference has been fully initialized, either:

  • by every ordinary constructor, or
  • in the body of the class itself.

This rule would make it difficult to factor out common logic contained in constructors if it weren't for the notion of a partial constructor. For a partial constructor, the requirement that all references are fully initialized is relaxed. But a partial constructor may not be used to directly instantiate the class. It may only be called from the extends clause of another constructor of the same class. A partial constructor is indicated by the abstract annotation:

Here's a contrived example:

class ColoredPoint {
    shared Point point;
    shared Color color;

    //partial constructor
    abstract new withColor(Color color) {
        this.color = color;
    }

    shared new forCartesianCoords(Color color, 
        Float x, Float y) 
            //delegate to partial constructor
            extends withColor(color) {
        point = Point.cartesian(x, y);
    }

    shared new forPolarCoords(Color color, 
        Float r, Float theta) 
            //delegate to partial constructor
            extends withColor(color) {
        point = Point.polar(r, theta);
    }

    ...

}

So far, we've only seen how to delegate to another constructor of the same class. But when a class extends a superclass, every constructor must ultimately delegate—perhaps indirectly—to a constructor of the superclass.

Constructors and inheritance

A class may extend a class with constructors, for example:

class ColoredPoint2(color, Float x, Float y) 
        extends Point.cartesian(x, y) {
    shared Color color;
    ...
}

A more interesting case is when the extending class itself has constructors:

class ColoredPoint extends Point {
    shared Color color;

    shared new forCartesianCoords(Color color, 
        Float x, Float y)
            //delegate to Point.cartesian()
            extends cartesian(x, y) {
        this.color = color;
    }

    shared new forPolarCoords(Color color, 
        Float r, Float theta)
            //delegate to Point.polar()
            extends polar(r, theta) {
        this.color = color;
    }

    ...
}

In this example, the constructors delegate directly to constructors of the superclass.

Ordering of initialization logic

With constructor delegation, together with initialization logic defined directly in the body of the class, you must be imagining that initialization can get pretty convoluted.

Well, no. The general principle of initialization in Ceylon remains unchanged: initialization always flows from top to bottom, allowing the typechecker to verify that every value is initialized before it is used.

Consider this class:

class Class {
    print(1);
    abstract new partial() {
        print(2);
    }
    print(3);
    shared new () extends partial() {
        print(4);
    }
    print(5);
    shared new create() extends partial() {
        print(6);
    }
    print(7);
}

Calling Class() results in the following output:

1
2
3
4
5
7

Calling Class.create() results in this output:

1
2
3
5
6
7

All quite orderly and predictable!

In the comments, David Hagen suggests a way of understanding how constructor delegation and ordering works.

Using value constructors to emulate enums

You might already have noticed that if a class only has value constructors, it's very similar to a Java enum.

shared class Day {
    shared actual String string;
    abstract new named(String name) {
        string = name;
    } 
    shared new sunday extends named("SUNDAY") {}
    shared new monday extends named("MONDAY") {}
    shared new tuesday extends named("TUESDAY") {}
    shared new wednesday extends named("WEDNESDAY") {}
    shared new thursday extends named("THURSDAY") {}
    shared new friday extends named("FRIDAY") {}
    shared new saturday extends named("SATURDAY") {}
}

We therefore let you use value constructors in a switch statement. The ability to switch over value constructors can be viewed as an extension of the pre-existing facility for switching over literal values of types like Integer, Character, and String.

Day day = ... ;
String message;
switch (day)
case (Day.friday) { 
    message = "thank god"; 
}
case (Day.sunday | Day.saturday) { 
    message = "we could be having this conversation with beer"; 
}
else {
    message = "need more coffee";
}
print(message);

But when we noticed the similarity to Java enums, we decided to take the idea a little further than what Java offers here. Java enumerations are open, in the sense that a switch statement which covers all enumerated values of an enum must still include a default case to be considered exhaustive by definite assignment and definite return checking.

But in Ceylon we also have the notion of closed enumerated types, where the "default" case—the else clause of Ceylon's switch statement—may be omitted, but the whole switch will still be considered exhaustive.

Note: API designers should be careful to only "close" an enumeration that won't grow new value constructors in future revisions of the API. Day and Boolean are good examples of closed enumerations. ErrorType is an examples of an open enumeration.

If we add an of clause to Day, it will be considered a closed enumeration.

shared class Day 
        of sunday | monday | tuesday | wednesday | 
           thursday | friday | saturday {
    shared actual String string;
    abstract new named(String name) {
        string = name;
    } 
    shared new sunday extends named("SUNDAY") {}
    shared new monday extends named("MONDAY") {}
    shared new tuesday extends named("TUESDAY") {}
    shared new wednesday extends named("WEDNESDAY") {}
    shared new thursday extends named("THURSDAY") {}
    shared new friday extends named("FRIDAY") {}
    shared new saturday extends named("SATURDAY") {}
}

Now Ceylon will consider a switch statement that covers all the value constructors as an exhaustive switch, and we can write the following code without needing an else clause:

Day day = ... ;
String message;
switch (day)
case (Day.monday | Day.tuesday | 
      Day.wednesday | Day.thursday) { 
    message = "need more coffee"; 
}
case (Day.friday) { 
    message = "thank god"; 
}
case (Day.sunday | Day.saturday) { 
    message = "we could be having this conversation with beer"; 
}
print(message);

This is an alternative to the existing pattern for emulating Java-style enums.

A final word

The design I've presented here is the final result of a thought process that spanned five years. I personally found this to be a surprisingly difficult problem to address in a principled way. For a time, I hoped to not even need to have constructors in the language at all. But ultimately I'm very happy with the end result. It seems to me not only principled and consistent with the rest of the language, but also very expressive and powerful.

A little more about type functions

My previous post about type functions generated some interesting discussion, here, and on reddit.

Therefore, I think it's worth tying up several loose ends from the earlier post. So here's a collection of further observations about type functions.

Warning: this post addresses some very technical details of how we've incorporated type functions into Ceylon's type system. Don't even bother continuing any further until you've read the earlier post.

The "why" of this

The most well-known application of higher-order generics is for representing high-level abstractions of container types: functors, monads, and friends. That's not what motivated me to experiment with higher-order generics in Ceylon, and as a practical matter I still have little interest in these abstractions, though they're fun to play with.

No, what bothered me was not that Ceylon's type system wasn't powerful enough to represent Functor or Monad, but rather that Ceylon's type system wasn't powerful enough to represent Ceylon. I'll show you what I mean by that in just a second. But first I want to argue that type functions can be seen as a regularization of the language.

From a purely syntactic point of view, it's always seemed a little strange that every sort of type declaration in Ceylon can have a list of type parameters, except for a type parameter itself. Furthermore, it's noticeable that I can take a reference, or meta reference to any program element unless it has a list of type parameters. Now, such restrictions might seem reasonable if a parameterized type parameter or a reference to a generic declaration were not meaningful notions at a fundamental level. But they clearly are meaningful, and even at least somewhat useful.

Exactly how useful is a different question—the jury's still out, at least in my mind. And so perhaps we'll ultimately conclude that this stuff isn't worth its weight. The weight being, substantially, the time it takes for programmers new to Ceylon to understand this stuff.

In the original post, I showed how type functions were necessary to represent the type of a reference to a generic function. One place where this problem arises is with one of Ceylon's more unique features: its typesafe metamodel.

A use case for generic function reference types

Usually, I can obtain a metamodel object that represents a class or function and captures its type signature. For example, the expression `String` evaluates to a metamodel object that captures the type and initializer parameters of the class String:

Class<String,[{Character*}]> stringClass = `String`;

For a generic declaration, I can do a similar thing, as long as I'm prepared to nail down the type arguments. For example, I can write `Singleton<String>` to get a metamodel object representing the class Singleton after applying the type argument String:

Class<Singleton<String>,[String]> stringSingletonClass
        = `Singleton<String>`

But in Ceylon as it exists today, I can't obtain a typed metamodel object that represents just `Singleton`, because to represent the type of that metamodel object I would necessarily need a type function.

Now, with the new experimental support for type functions, the type of the expression `Singleton` could be <T> => Class<Singleton<T>,[T]>(), allowing code like this:

value singletonGenericClass = `Singleton`;
...
Class<Singleton<String>,[String]> stringSingletonClass 
        = singletonGenericClass<String>();

That's just one example of how allowing references to generic functions makes Ceylon feel more "complete".

Two use cases for anonymous type functions

I get the impression that the "scariest" bit of what I've presented in the previous post is the notation for anonymous type functions. That is, the following syntax:

<X> => X
<X> => [X,X,X]
<X,Y> => X|Y
<T> given T satisfies Object => Category<T>(T*)

But I'm convinced that this notation is not really that hard to understand. The reason I assert this is because if I give each of these type functions a name, then most of you guys have no problem understanding them:

alias Identity<X> => X;
alias Triple<X> => [X,X,X];
alias Union<X,Y> => X|Y;
alias CategoryCreator<T> given T satisfies Object => Category<T>(T*);

But—one might reasonably enquire—why do we even need them, if the named versions are easier to read?

Well, we need them:

  1. in order to be able to denote the type of a reference to a generic function—remember, we don't have undenotable types in Ceylon—and
  2. to make it easy to partially apply a named type function like Map.

For example, we want to be able to write stuff like <T> => Map<String,T> when working with higher-order generics, thus turning a type function of two type parameters into a type function of one type parameter.

Are type functions "type types"?

One thing I should have made very clear, and forgot, is that type functions don't represent an additional meta level. In Ceylon's type system, type functions are types, in the very same sense that function are values in Ceylon and other modern languages.

There simply is no additional meta-type system for types in Ceylon. The closest thing we have to a "type type" is a generic type constraint, but that's an extremely impoverished sort of type type, since Ceylon provides no facilities at all to abstract over type constraints—I can't even assign an alias to a type constraint and reuse it by name.

Ceylon reasons about type constraints and assignability of types to type variables using hardcoded rules written primitively into the language spec and type checker, not by abstraction over the types of types.

Type functions and subtyping

But if a type function is a type, what are its subtyping relationships with other types and other type functions?

Well, first, recall that some type functions have instances: generic function references. We didn't want to introduce values into the language that aren't Objects, so we've declared that every type function is a subtype of Object. This preserves the useful property that our type system has a single root type Anything.

Next, recall that an ordinary function type is covariant in its return type, and contravariant in its parameter types. For example, the function type:

String(Object, Object)

is a subtype of:

Object(String, String)

Since if a function accepts two Objects and returns a String, then it's clearly also a function that accepts two Strings and returns an Object.

Given two function types with one parameter:

F(P)
G(Q)

Then F(P) is a subtype of G(Q) iff P is a supertype of Q and F is a subtype of G.

Similar rules apply to type functions. Consider two type functions of one type parameter:

alias A<X> given X satisfies U => F<X>
alias B<Y> given Y satisfies V => G<Y>

Then A is a subtype of B iff:

  • the upper bound U on X is a supertype of the upper bound V on Y, and
  • for any type T, F<T> is a subtype of G<T>.

That is to say, if A<X> accepts every type argument T that B<Y> accepts, and for each such T, the applied type A<T> is a subtype of the applied type B<T>, then we can soundly replace B with A in well-typed code.

(Of course, these rules generalize to type functions with multiple type parameters.)

Generic function types and subtyping

Now let's narrow our attention to consider only type functions that represent the types of generic functions. To make it easier, we'll consider generic functions of the following form, with just one type parameter and just one value parameter:

F<X> f<X>(P<X> p) given X satisfies U => ... ;

Here, F<X> is the return type, a type expression involving the type parameter X, and P<X> is the parameter type, which also involves X.

The type of this generic function—as we saw in the previous post— is the type function:

<X> given X satisfies U => F<X>(P<X>)

So let's consider two type functions of the general form we're considering:

alias A<X> given X satisfies U => F<X>(P<X>)
alias B<Y> given Y satisfies V => G<Y>(Q<Y>)

Then we see quickly that A is a subtype of B iff:

  • the upper bound U on X is a supertype of the upper bound V on Y, and
  • for any type T, the return type F<T> of A is a subtype of the return type G<T> of B, and the parameter type P<T> of A is a supertype of the parameter type Q<T> of B.

For example, this generic function type:

<X> => X&Object(X|Object)

is a subtype of this generic function type:

<X> given X satisfies Object => X(X)

Take a minute to convince yourself that this is correct intuitively.

(Again, these rules generalize naturally to functions with multiple type parameters and/or multiple value parameters.)

Type functions and type inference

When we call a first-order generic function in Ceylon, we don't usually need to explicitly specify type arguments. Instead, we can usually infer them from the value arguments of the invocation expression. For example, if we have this generic function:

List<Out> map<In,Out>(Out(In) fun, List<In> list) => ... ;

Then we can always safely infer In and Out, because there's a unique most-precise choice of type arguments:

value list = map(Integer.string, ArrayList { 10, 20, 30 });

In this example, we can safely infer that In is Integer, and Out is String, without any loss of precision.

Unfortunately, once higher-order generics come into play, inference of type functions is a much more ambiguous problem. Consider this second-order generic function, which abstracts the map() function away from the container type, by introducing the type function variable Box to represent the unknown container type:

Box<Out> fmap<Box,In,Out>(Out(In) fun, Box<In> box) 
        given Box<Element> { ... }

And now consider the following invocation of this function:

fmap(Integer.string, ArrayList { 10, 20, 30 })

What type should we infer for Element, and what type function for Box?

  • Integer and List?
  • Integer and Iterable?
  • Integer and ArrayList?
  • Integer and MutableList?
  • Integer and ListMutator?
  • Integer and Collection?
  • Object and Category?

In general, there might be several different reasonable choices, and no really good criteria for choosing between them. So in this case, we require that the type arguments be specified explicitly:

fmap<List,Integer,String>(Integer.string, ArrayList { 10, 20, 30 })

However, there is a pattern we can use to make type function inference possible. In this case, we could define the following interface:

interface Functor<Box,Element> given Box<Value> { ... }

Now let's imagine that our ArrayList class inherits Functor, so that any ArrayList<Element> is a Functor<List,Element>.

And let's redefine fmap() like this:

  Box<Out> fmap<Box,In,Out>(Out(In) fun, Functor<Box,In> box) 
        given Box<Element> { ... }

Then, finally, for the same instantiation expression we had before:

fmap(Integer.string, ArrayList { 10, 20, 30 })

we can now unambiguously infer that Box is List and In is Integer, since those types are encoded as type arguments to Functor in the principal supertype instantiation of Functor for the expression ArrayList { 10, 20, 30 }.

Instances of type functions

In the original post, we noted that a type function that returns a function type is the type of a generic function. For example, the type function:

<X> given X satisfies Object => X(X)

Is the type of this generic function:

X f<X>(X x) given X satisfies Object => x;

But then it's natural to enquire: if some type functions are the types of generic functions, what are the other type functions the types of?

Well, if you reflect for a second on the relationship between types and values, I think you'll see that they must be the types of generic value declarations. That is, this type function:

<X> => List<X>

would be the type of this value, written in pseudo-Ceylon:

List<X> list<X> => ... ;

That is, when presented with a type X, list<X> evaluates to a List<X>.

Of course there are no actual generic values in Ceylon, the closest thing we have is a nullary generic function:

List<X> list<X>() => ... ;

whose type is actually:

<X> => List<X>()

There's no plan to ever introduce generic values into Ceylon, so types like <X> => List<X> have no instances. They're useful only as type arguments to higher-order generic types.

Type functions and principal typing

Finally, let's address a rather technical point.

A very important property of Ceylon's type system is the ability to form a principal instantiation of any union or intersection of different instantiations of a generic type.

For, example, for the covariant type List<T>:

  • List<X> | List<Y> has the principal instantiation List<X|Y>
  • List<X> & List<Y> has the principal instantiation List<X&Y>

For the contravariant type Comparable<T>:

  • Comparable<X> | Comparable<Y> has the principal instantiation Comparable<X&Y>
  • Comparable<X> & Comparable<Y> has the principal instantiation Comparable<X|Y>

Naturally, it's important that we can do the same tricks for intersections and unions of instantiations of higher-order types. As at happens, this works out extremely naturally, using the following identities:

  • <<X> => F<X>> | <<Y> => G<Y>> is a subtype of <T> => F<T> | G<T>, and
  • <<X> => F<X>> & <<Y> => G<Y>> is a subtype of <T> => F<T> & G<T>.

Thus, if we have the following covariant second-order type:

interface Functor<out Element, out Container>
        given Container<E> { ... }

Then we obtain the following principal instantiations:

  • Functor<E,A> | Functor<F,B> has the principal instantiation Functor<E|F,<T> => A<T>|B<T>>, and
  • Functor<E,A> & Functor<F,B> has the principal instantiation Functor<E&F, <T> => A<T>&B<T>>.

You don't need to know these identities when you're writing code in Ceylon, but it's nice to know that type functions don't undermine the basic algebraic properties which are the reason Ceylon's type system is so nice to work with. Everything fits together here, without weird holes and corner cases.

A word about "rank"

In the previous post I described our support for references to generic functions as "arbitrary rank" polymorphism, which prompted a short discussion about how to measure the rank of a generic type. I now think that the term "rank" probably isn't very meaningful in connection to Ceylon, since there's nothing special about our function types: they're just instantiations of the perfectly ordinary generic type Callable. As suggested by Kamatsu on reddit, it seems to me that a better word to use is probably "impredicative".

Programming with type functions in Ceylon

I've recently been working on some experimental new features of Ceylon's already extremely powerful type system. What I'm going to explain in this post is known, technically, as:

  • higher order generic types (or type constructor polymorphism, or higher kinds), and
  • higher rank generic types (or rank-N polymorphism).

Please don't worry about this jargon salad. (And please don't try to google any of those terms, because the explanations you'll find will only make these pretty straightforward notions seem confusing.) Stick with me, and I'll do my best to explain the concepts in intuitive terms, without needing any of the above terminology.

But first, let's start with pair of examples that illustrate a motivating problem.

This function simply returns its argument:

Object pipeObject(Object something) => something;

This function adds Floats:

Float addFloats(Float x, Float y) => x+y;

Modern programming language let us treat either of these functions as a value and pass it around the system. For example, I can write:

Object(Object) pipeObjectFun = pipeObject;

Or:

Float(Float,Float) addFloatsFun = addFloats;

Where Object(Object) and Float(Float,Float) represent the types of the functions, and pipeObject and addFloats are a references to the functions. So far so good.

But sometimes it's useful to have a function that abstracts away from the concrete data type using generics. We introduce a type variable, to represent the "unknown" type of thing we're dealing with:

Any pipe<Any>(Any anything) => anything;

And:

Number add<Number>(Number x, Number y)
            given Number satisfies Summable<Number>
        => x+y;

Sometimes, as in add(), the unknown type is constrained in some way. We express this using a type constraint:

given Number satisfies Summable<Number>

This is Ceylon's way of denoting that Number may only be a type which is a subtype of the upper bound Summable<Number>, i.e. that it is a type to which we can apply the addition operator +.

Now, what if I want to pass around a reference to this function. Well, one thing I can typically do is nail down the unknown type to a concrete value:

Object(Object) pipeObjectFun = pipe<Object>;

Or:

Float(Float,Float) addFloatsFun = add<Float>;

But that's a bit disappointing—we've lost the fact that add() was generic. Now, in object-oriented languages it's possible to define the generic function as a member of a class, and pass an instance of the class around the system. This is called the strategy pattern. But it's inconvenient to have to write a whole class just to encapsulate a function reference.

It would be nicer to be able to write:

TypeOfPipe pipeFun = pipe;

And:

TypeOfAdd addFun = add; 

Where TypeOfPipe and TypeOfAdd are the types of the generic functions. The problem is that there's no way to represent these types within the type system of most languages. Let's see how we can do that in Ceylon 1.2.

Introducing type functions

I promised to avoid jargon, and avoid jargon I will. The only bit of terminology we'll need is the idea of a type function. A type function, as its name implies, is a function that accepts zero or more types, and produces a type. Type functions might seem exotic and abstract at first, but there's one thing that will help you understand them:

You already know almost everything you need to know about type functions, because almost everything you know about ordinary (value) functions is also true of type functions.

If you stay grounded in the analogy to ordinary functions, you'll have no problems with the rest of this post, I promise.

So, we all know what an ordinary function looks like:

function addFloats(Float x, Float y) => x+y;

Let's break that down, we have:

  • the function name, and list of parameters, to the left of a fat arrow, and
  • an expression on the right of the fat arrow.

A type function doesn't look very different. It has a name and (type) parameters on the left of a fat arrow, and a (type) expression on the right. It looks like this:

alias Pair<Value> => [Value,Value];

Aha! This is something we've already seen! So a type function is nothing more than a generic type alias! This particular type function accepts a type, and produces a tuple type, a pair, whose elements are of the given type.

Actually not every type function is a type alias. A generic class or interface is a type function. For example:

interface List<Element> { ... }

This interface declaration accepts a type Element, and produces a type List<Element>, so it's a type function.

I can call a function by providing values as arguments:

pipe("hello")
add(1.0, 2.0)

These expressions produce the values "hello" and 3.0.

I can apply a type function by providing types as arguments:

Pair<Float>

This type expression produces the type [Float,Float], by applying the type function Pair to the type argument Float.

Similarly, I can apply the type function List:

List<String>

This type expression just literally produces the type List<String>, by applying the type function List to the type argument String.

On the other hand, I can take a reference to a value function by just writing its name, without any arguments, for example, pipe, or add. I can do the same with type functions, writing Pair or List.

Back in the quotidian world of ordinary values, I can write down an anonymous function:

(Float x, Float y) => x+y

In the platonic world of types, I can do that too:

<Value> => [Value,Value]

Finally, an ordinary value function can constrain its arguments using a type annotation like Float x. A type function can do the same thing, albeit with a more cumbersome syntax:

interface List<Element> 
        given Element satisfies Object 
{
    //define the type list
    ...
}

Even an anonymous type function may have constraints:

<Value> given Value satisfies Object 
        => [Value,Value]

Jargon watch: most people, including me, use the term type constructor instead of "type function".

Type functions are types

Now we're going to make a key conceptual leap. Recall that in modern languages, functions are treated as values. I can

  1. take a function, and assign it to a variable, and then
  2. call that variable within the body of the function.

For example:

void secondOrder(Float(Float,Float) fun) {
    print(fun(1.0, 2.0));
}

//apply to a function reference
secondOrder(addFloats);

//apply to an anonymous function
secondOrder((Float x, Float y) => x+y);

We call functions which accept functions higher order functions.

Similarly, we're going to declare that type functions are types. That is, I can:

  1. take a type function and assign it to a type variable, and then
  2. apply that type variable in the body of the declaration it parameterizes.

For example:

interface SecondOrder<Box> given Box<Value> {
    shared formal Box<Float> createBox(Float float);
}

//apply to a generic type alias
SecondOrder<Pair> something;

//apply to a generic interface
SecondOrder<List> somethingElse;

//apply to an anonymous type function
SecondOrder<<Value> => [Value,Value]> somethingScaryLookin;

The type constraint given Box<Value> indicates that the type variable Box accepts type functions with one type argument.

Now, there's one thing to take note of here. At this point, the notion that type functions are types is a purely formal statement. An axiom that defines what kinds of types I can write down and expect the typechecker of my programming language to be able to reason about. I have not—yet—said that there are any actual values of these types!

Jargon watch: the ability to treat a type function as a type is called higher order generics.

The type of a generic function is a type function

Let's come back to our motivating examples:

Any pipe<Any>(Any anything) => anything;

Number add<Number>(Number x, Number y)
            given Number satisfies Summable<Number>
        => x+y;

If you squint, you'll see that these are actually functions with two parameter lists. The first parameter lists are:

<Any>

And:

<Number> given Number satisfies Summable<Number>

Which both accept a type. The second parameter lists are:

(Any anything)

And:

(Number x, Number y)

Therefore, we can view each generic function as a function that accepts a type and produces an ordinary value function. The resulting functions are of type Any(Any) and Value(Value,Value) respectively.

Thus, we could write down the type of our first generic function pipe() like this:

<Any> => Any(Any)

And the type of add() is:

<Number> given Number satisfies Summable<Number>
        => Number(Number,Number)

Phew. That looks a bit scary. But mainly because of the type constraint. Because generic function types like this are pretty verbose, we can assign them aliases:

alias AdditionLikeOperation
        => <Number> given Number satisfies Summable<Number>
                => Number(Number,Number);

Or, equivalently, but more simply:

alias AdditionLikeOperation<Number> 
        given Number satisfies Summable<Number>
                => Number(Number,Number);

That was the hard part—we're almost done.

References to generic functions

Now we can use these types as the types of references to generic functions:

<Any> => Any(Any) pipeFun = pipe;

AdditionLikeOperation addFun = add;

And we can apply these function references by providing type arguments:

String(String) pipeString = pipeFun<String>;
Object(Object) pipeObject = pipeFun<Object>;

Float(Float,Float) addFloats = addFun<Float>;
Integer(Integer,Integer) addInts = addFun<Integer>;

Or, alternatively, we can just immediately apply the generic function references to value arguments, and let Ceylon infer the type arguments, just as it usually does when you call a function directly:

String hi = pipeFun("hello");
Integer zero = pipeFun(0);

Float three = addFun(1.0, 2.0);
String helloWorld = addFun("Hello", "World");

And now we've solved the problem posed at the beginning!

Now for the kicker: the types <Any> => Any(Any) and AdditionLikeOperation are both type functions. Indeed, the type of any generic function is a type function of this general form:

<TypeParameters> => ReturnType(ParameterTypes)

Similarly, every type function of this general form is the type of some generic function.

So we've now shown that some type functions are not only types, they're the types of values—the values are references to generic functions like pipe and add.

Jargon watch: the ability to treat a generic function as a value is called higher rank generics.

Abstraction over generic functions

Finally, we can use all this for something useful. Let's consider a scanner library that is abstracted away from all of:

  • the character type,
  • the token type,
  • the kind of container that tokens occur in.

Then we might have a scan() function with this sort of signature:

"Tokenize a stream of characters, producing
 a stream of tokens."
Stream<Token> scan<Char,Token,Stream>
        (grammar, characterStream, newToken, newStream)
            //Note: Stream is a reference to a type function!
            given Stream<Element> satisfies {Element*} {

    //parameters:

    "The token grammar."
    Grammar grammar;

    "The character stream to tokenize."
    Stream<Char> characterStream;

    "Constructor for tokens, accepting a
     substream of characters."
    Token newToken(Stream<Char> chars);

    "Generic function to construct a stream
     of characters or tokens."
     //Note: newStream is a reference to a generic function!
     Stream<Elem> newStream<Elem>({Elem*} elements);

    //implementation:

    Stream<Token> tokenStream;

    //do all the hard work
    ...

    return tokenStream;
}

Here:

  • Char is the unknown character type,
  • Token is the unknown token type,
  • Stream is a type function representing an unknown container type that can contain either characters or tokens,
  • newToken is a function that accepts a substream of characters and creates a Token,
  • characterStream is a stream of characters, and, most significantly,
  • newStream is a generic function that constructs streams of any element type, used internally to create streams of characters as well as tokens.

We could use this function like this:

//Let's use String as the token type, Character as the
//character type, and Iterable as the stream type.

//input a stream of characters
{Character*} input = ... ;

//and some token grammar
Grammar grammar = ... ;

//get back a stream of Strings
{String*} tokens =
        scan<Character, String, Iterable>
            (grammar, input, String,
                    //Note: a generic anonymous function! 
                    <Elem>({Elem*} elems) => elems);

Or like this:

//use LinkedList as the stream type
import ceylon.collection { LinkedList }

//we don't need Unicode support, so let's use
//Ceylon's 8-bit Byte as our character type
alias Char => Byte;

//define our own token type
class BasicToken(LinkedList<Char> charList) {
    string => String { for (b in charList) 
                       b.unsigned.character };
}

//input a linked list of characters
LinkedList<Char> input = ... ;

//and some token grammar
Grammar grammar = ... ;

//get back a linked list of BasicTokens
LinkedList<BasicToken> tokens =
        scan<Char, BasicToken, LinkedList>
            (grammar, input, BasicToken,
                    //Note: a generic function ref! 
                    LinkedList);

As you can see, our parsing algorithm is now almost completely abstracted away from the concrete types we want to use!

Compiling this code

In Ceylon 1.2, programming with type functions is an experimental feature that only works in conjunction with the JavaScript backend. So you can run code that uses type functions on a JavaScript virtual machine, but not on the JVM. This is something we're inviting you to play with to see if the whole community agrees it is useful. But right now it's not covered in the language specification, and it's not supported by the Java backend.

The typechecker itself supports extremely sophisticated reasoning about higher order and higher rank types. Type functions are fully integrated with Ceylon's powerful system of subtype polymorphism, including with union and intersection types, and with type inference and type argument inference. There's even limited support for type function inference! And there's no arbitrary upper limits here; not only rank-2 but arbitrary rank types are supported.

If there's enough interest, I'll cover that material in a future post.

Unique approach to observer/observable pattern in Ceylon

The essence of the famous observer/observable pattern is that you have an observable object that produces events of various kinds, and one or more observer objects that register themselves as interested in notification when these events occur.

Of course, we represent each kind of event as a type, usually a class, though nothing prevents us from using an interface type as an event type.

For example:

class Started() {}
class Stopped() {}

An event type may even be generic:

class Created<out Entity>
        (shared Entity entity) 
        given Entity satisfies Object {
    string => "Created[``entity``]";
}

class Updated<out Entity>
        (shared Entity entity) 
        given Entity satisfies Object {
    string => "Updated[``entity``]";
}

class Deleted<out Entity>
        (shared Entity entity) 
        given Entity satisfies Object {
    string => "Deleted[``entity``]";
}

Of course, we have powerful mechanisms for abstracting over event types, for example:

alias Lifecycle<Entity> 
        given Entity satisfies Object
        => Created<Entity> |
           Updated<Entity> |
           Deleted<Entity>;

An observer, usually, is in essence nothing more than a function that accepts a certain type of event as a parameter.

For example, this anonymous function observes the creation of Users:

(Created<User> userCreated) 
        => print("new user created: " + userCreated.entity.name)

This anonymous function observes lifecycle events of any kind of entity:

(Lifecycle<Object> event) 
        => print("something happened: " + event)

Union and intersection types give us a nice way to express conjunction and disjunction of event types:

void (Created<User>|Deleted<User> userEvent) {
    switch (userEvent)
    case (is Created<User>) {
        print("user created: " + userEvent.entity.name);
    }
    case (is Deleted<User>) {
        print("user deleted: " + userEvent.entity.name);
    }
}

Now here's where we can do something really cute. Typically, in other languages, the observable object provides various observer registration operations, one for each kind of event the object produces. We're going to define a generic class Observable that works for any event type, and uses reified generics to map events to observer functions.

shared class Observable<in Event>() 
        given Event satisfies Object {
    ...
}

The type parameter Event captures the various kinds of events that this object produces, for example, an Observable<Lifecycle<User>> produces events of type Created<User>, Updated<User>, and Deleted<User>.

We need a list to store observers in:

value listeners = ArrayList<Anything(Nothing)>();

Here, Anything(Nothing) is the supertype of any function with one parameter.

The addObserver() method registers an observer function with the Observable:

shared void addObserver<ObservedEvent>
        (void handle(ObservedEvent event))
        given ObservedEvent satisfies Event
        => listeners.add(handle);

This method only accepts observer functions for some subset of the events actually produced by the Observable. This constraint is enforced by the upper bound given ObservedEvent satisfies Event.

The raise() method produces an event:

shared void raise<RaisedEvent>(RaisedEvent event)
        given RaisedEvent satisfies Event
        => listeners.narrow<Anything(RaisedEvent)>()
            .each((handle) => handle(event));

Again, the upper bound enforces that this method only accepts event objects that are of an event type produced by the Observable.

This method uses the new narrow() method of Iterable in Ceylon 1.2 to filter out observer functions that don't accept the raised event type. This method is implemented using reified generics. Here's its definition in Iterable<Element>:

shared default {Element&Type*} narrow<Type>() 
        => { for (elem in this) if (is Type elem) elem };

That is, if we have a stream of Elements, and we call narrow<Type>(), explicitly passing an arbitrary type Type, then we get back a stream of all elements of the original stream which are instances of Type. This is, naturally, a stream of Element&Types.

Now, finally, if we define an instance of Observable:

object userPersistence 
        extends Observable<Lifecycle<User>>() {

    shared void create(User user) {
        ...
        //raise an event
        raise(Created(user));
    }

    ...
}

Then we can register observers for this object like this:

//observe User creation events
userPersistence.addObserver(
        (Created<User> userCreated) 
        => print("new user created: " + userCreated.entity.name));

//observe User creation and deletion events
userPersistence.addObserver(
        void (Created<User>|Deleted<User> userEvent) {
    switch (userEvent)
    case (is Created<User>) {
        print("user created: " + userEvent.entity.name);
    }
    case (is Deleted<User>) {
        print("user deleted: " + userEvent.entity.name);
    }
});

Notice how with union and intersection types, subtyping, and variance, we find ourselves with a powerful expression language for specifying exactly which kinds of events we're interested in, in a typesafe way, right in the parameter list of the observer function.

For the record, here's the complete code of Observable:

shared class Observable<in Event>() 
        given Event satisfies Object {
    value listeners = ArrayList<Anything(Nothing)>();

    shared void addObserver<ObservedEvent>
            (void handle(ObservedEvent event))
            given ObservedEvent satisfies Event
            => listeners.add(handle);

    shared void raise<RaisedEvent>(RaisedEvent event)
            given RaisedEvent satisfies Event
            => listeners.narrow<Anything(RaisedEvent)>()
                .each((handle) => handle(event));

}

Finally, a caveat: the precise code above does not compile in Ceylon 1.1, because the narrow() method is new, and because of a fixed bug in the typechecker. But it will work in the upcoming 1.2 release of Ceylon.