Team blog

Why you might want to choose Ceylon

In a couple of days Ceylon 1.2 will be released. Ceylon 1.2 is out now, after a year of development. That’s exciting for us, but we think it would be interesting to summarize our thoughts about why you should be excited about Ceylon, and why you might consider it over other programming languages designed to run on the Java and JavaScript virtual machines.

Ceylon is a rather ambitious programming language, so sometimes when people ask us to summarize its advantages, it can be a bit difficult to know where to start!

A truly cross-platform language

Ceylon is a language that is equally at home on the JVM and on any JavaScript VM. Furthermore, implementation of a compiler backend for the Dart VM is already quite advanced, and work has begun on a native backend based on LLVM. The architecture of the compiler and tooling make addition of a new runtime environment, if not easy, at least straightforward.

Also important: Ceylon is rigorously defined by a complete language specification. The very act of writing a specification forces the language designer to really think through the semantics and corner cases, and acts as an essential reference as to what the behavior of the compiler should be.

Unlike other languages designed first for the JVM and then, belatedly, ported to JavaScript, Ceylon doesn’t feature JVM-specific numeric types with semantics that aren’t relevant to—or can’t be honored on—other VMs. And its language module isn’t polluted with dependencies to types like java.lang.Exception that don’t exist in other runtime environments. Instead, Ceylon’s language module completely abstracts the underlying runtime, and offers a set of elegant APIs that vastly improve on those available natively.

But this abstraction doesn’t come at the cost of lowest-common-denominator performance characteristics. On the contrary, you can expect Java-like performance when Ceylon executes on the JVM, Dart-like performance when Ceylon executes on the Dart VM, and JavaScript-like performance when Ceylon executes on a JavaScript VM like Node.js.

Nor does this abstraction limit the potential for interoperation with native code. Ceylon features excellent interoperation with Java and Maven, and it’s very easy to use most Java libraries in Ceylon. Similarly, Ceylon’s battery of dynamic constructs make interoperation with native JavaScript straightforward. With the new native functions and classes in Ceylon 1.2, it’s even possible to write a cross-platform module that interoperates with native Java and JavaScript code!

Finally, Ceylon's module system is already compatible with OSGi, with Maven, with Vert.x, with npm, and with requirejs. When Jigsaw finally arrives, we'll support it too.

This all amounts to an impressive engineering achievement, especially when taking into account the sophistication of the language itself.

Truly disciplined static typing

Chances are you have some experience writing code in a language with static typing. In combination with the right tooling, static typing makes code more robust, more understandable, and much more maintainable. But most languages go out of their way to include features and APIs which undermine their own static type system. A handful of languages such as ML and Haskell eschew this, offering a principled, disciplined approach to static typing. And programmers working with these languages report a curious thing: that their programs have a strange tendency to work first or second time they run them. Sure, it takes a little longer to produce a program that the compiler accepts, but once the compiler is satisfied, so many common bugs have already been eliminated, that the program is often already correct or at least almost correct.

Ceylon follows this same philosophy and, even though it’s a very different sort of language to ML, our experience is that our programs have the very same tendency to just work, almost immediately. You surely won’t believe this until you experience it yourself.

Ceylon’s type system itself is state-of-the-art, including generic types with both use-site and declaration-site covariance and contravariance, mixin (multiple) inheritance, principal instantiation inheritance, sum types, disjointness analysis, experimental support for higer-order and higher-rank generics, and, best of all, union and intersection types.

Union and intersection types

Ceylon was the first modern language to introduce a full implementation of union and intersection types, and the first language to realize that union and intersection types make flow-typing and type inference, especially generic type inference, very much simpler, more predictable, and more elegant. As of today, it’s still the language with the most sophisticated support for unions and intersections.

What’s most difficult to explain to folks who’ve never written any significant amount of code in Ceylon is just how powerful union and intersection types really are. Far from being an exotic construct that you encounter occasionally in tricky library code, they form an essential part of the everyday experience of writing code in Ceylon, and there is almost no significant Ceylon program which doesn’t use union types somewhere. It’s no exaggeration to say they will totally change the way you think about types.

Flow-sensitive typing

Ceylon was also the first modern language to embrace the notion of flow-sensitive typing, where the conditional logic in the body of a function affects the types inferred by the compiler. This approach eliminates a significant amount of verbosity and a significant number of runtime type casting errors. In combination with Ceylon’s powerful coverage and disjointness analysis, it also helps detect certain other kinds of logic errors.

The cleanest solution to the problem of null

The null value has perplexed language designers for decades. For a long time the “best” solution to the problem was an ML- or Haskell-style Maybe type, which offered type safety and eliminated the hated “null pointer” error. That solution works quite well for languages without subtyping, but it’s not the best approach for a language with subtyping and union types. Ceylon’s approach to null is just as type safe, but, with flow-sensitive typing, is more user-friendly, and it doesn’t require the overhead of separate Option-style wrapper objects. Unlike some other languages, Null isn’t a special case in Ceylon’s type system. Rather, it’s just one further example of the power of union types.

An elegant and powerful representation of tuple and function types

Like most other modern languages, Ceylon features tuples and function references. Unlike other languages Ceylon doesn’t represent these things as special cases in the type system, nor as an explosion of interface types, one for each tuple/function -arity. Instead, there is just one class Tuple, and one interface Callable which expresses function types in terms of tuple types. This approach allows the language to abstract over tuple or function -arity, and write higher-order functions that simply can’t be expressed in many other modern languages.

Reified generics

Along with Gosu, Ceylon is one of only two languages to offer reified generics on the JVM. And it’s the only language to offer reified generics on JavaScript virtual machines. You’ll sometimes run into attempts to “backsplain” the lack of reified generics in Java or other JVM languages: claims that reified generics perform badly, aren’t useful, or don’t work correctly with variance or with higher-order generics. Ceylon is an existence proof that none of these assertions is true.

The system of reified generics backs Ceylon's unique typesafe metamodel, which enables powerful runtime metaprogramming (reflection). It also makes debugging easier: you can see the runtime type arguments of every generic type or generic function in the Ceylon debugger!

A clean, efficient, highly readable syntax

A program written in idiomatic Ceylon is usually much more compact than the same program written in Java. But it’s also usually more readable. Ceylon’s syntax appears, at first glance, to be rather boring and conservative: it uses familiar (and cleaner) prefix type annotations instead of postfix types; it uses plain English keywords instead of cryptic strings of symbols; its naming convention eschews the use of abbreviations. When you look at a program written in Ceylon, you’ll probably understand more or less what it’s doing, even if you have only a passing familiarity with the language.

But that first impression misses a surprising amount of syntactic flexibility. Type inference, named arguments, comprehensions, tuples and sequences, “fat arrow” function definitions, anonymous functions, let, switch, if, case, and object expressions, flow-sensitive typing, and a very powerful facility for stream processing all work together to lend a level of expressiveness rarely seen outside of dynamic languages.

Operator overloading is a feature that, while useful, is widely abused in languages that offer it in unadulterated form. Instead of untrammelled operator overloading, Ceylon supports operator polymorphism, a more disciplined approach, where its operators have fixed, well-defined semantics, but are defined in terms of interfaces that user-written types may satisfy. Thus, you can have the + operator apply to your own class, but it always means some kind of addition-like operation.

We often don’t need to write down the types of things explicitly (type inference takes care of that), but when we do, Ceylon features a uniquely sophisticated expression syntax for types, along with type aliases, eliminating the verbosity of generic type declarations in other languages.

Awesome modularity and tooling

Ceylon’s suite of command line tools, including an incremental compiler, program runner, test runner, documentation compiler, code formatter, and WAR archive packager, are all accessible via the ceylon command, which features a plugin architecture, and excellent usability.

But Ceylon really comes into its own when authoring code using its powerful Eclipse-based IDE. Ceylon’s IDE has a feature set that easily surpasses other modern languages, and competes with much more mature languages like Java and C#. For those who prefer IntelliJ, an IntelliJ-based IDE is already in the works, with a first release planned in the next few months.

If you’ve ever spent time waiting for mvn to finish, you’ll immediately appreciate Ceylon’s powerful module architecture, which comes fully integrated into the command line tools and IDE. Modularity is a problem that is utterly central to the discipline of software engineering, and Ceylon, at least arguably, does it better than any other language out there. You don’t usually need to interact directly with the module repository architecture, since the command line compiler, IDE, and runtime all know how to fetch dependencies transparently.

Finally, Ceylon Herd is an fantastic tool that makes it ridiculously easy to share your work with other Ceylon developers.

Seamless web development

Ceylon is evolving into a compelling platform for web development, allowing you to reuse code written for a Java server on the JavaScript client and vice-versa, while still interoperating cleanly with native code on both sides. With our new serialization facility, you'll be able to transparently move objects between client and server. The Ceylon SDK comes with a HTTP server module based on Undertow, and a transactions module based on JBoss Transactions, but if you're looking for something "heavier", Ceylon modules can already be deployed on RedHat's OpenShift cloud platform, on WildFly, or on Vert.x 2 (with Vert.x 3 support coming soon). Alternatively you could use Ceylon in Node.js!

Developing web applications in Ceylon is actually enjoyable. With Ceylon, you'll spend much less time redeploying and/or refreshing things, or wrestling with inadequate tooling, and more time focused on writing elegant code.

A helpful and open-minded community

The Ceylon community is friendly and helpful, and you can usually get your questions answered quickly at just about any time of day. And we're extremely open to feedback, and don't mind explaining the reasoning behind decisions we've made. Of course we have the traditional mailing lists like any other open source project, but actually we're much more active on Gitter and on the GitHub issue tracking system. So come join us on our Gitter channel, and ask us anything!

If you're motivated to contribute, there's plenty of places to get involved. Some fairly significant parts of the Ceylon ecosystem were originally written by community contributors. Currently, the development of two new compiler backends is being driven by the community. Note that a whole lot of development, including the IntelliJ IDE, and the new compiler backends, is now being done in Ceylon, so you can learn Ceylon by contributing to the Ceylon project itself!

Why build your own type system?

In Ceylon 1.2 we've factored out the type system of Ceylon as an independent module, with minimal dependencies and a clean API. The ceylon-model project incorporates:

  • an extensible object-oriented model of the type system in Ceylon,
  • algorithms for reasoning about types at compile time—or even at runtime in a system with reified generics—and
  • a framework for model loading, that is, for loading a model representing the types in a program or module from binary artifacts, source artifacts, or anything else—this is a necessary feature for incremental compilation and modularity.

Why might this be interesting to someone outside the Ceylon team? Well, if you're thinking of creating a programming language with nominal typing, you could consider reusing ceylon-model in your own language, saving yourself the problem of implementing all the hairy typing algorithms that we've already spent years getting right. That way, you can concentrate on the stuff that's really different about your language, that is, the syntax, the compiler backend, the runtime semantics, and the basic language types.

ceylon-model implements an extremely complete type system, including all the following features:

  • nominal typing with single-inheritance of classes and multiple-inheritance of interfaces,
  • bounded parametric polymorphism (generics with constraints),
  • sum types,
  • union and intersection types, with sophisticated reasoning about principal types and canonicalization,
  • reasoning about subtyping,
  • reasoning about coverage and disjoint types,
  • principal instantiation inheritance (more powerful than the single-instantiation inheritance found in other languages with subtyping and generics),
  • declaration-site and use-site variance,
  • type constructor polymorphism (higher-kinded types),
  • higher-rank / impredicative polymorphism,
  • type aliases including generic type aliases,
  • self types, and
  • some support for overloading.

It doesn't include any functionality for type inference, flow-sensitive typing, or pattern matching, since in Ceylon that stuff is implemented within the typechecker, and anyway your language probably has different rules to ours.

It seems to me that unless your type system is based on structural typing or dependent types, ceylon-model is an awesome starting point. It might already have everything you need, and, even if it doesn't, it provides a framework that would let you incorporate other features into the type system. For example, I imagine that primitively-defined function, array, tuple, or record types would be easy to incorporate.

We're already reusing this model between the typechecker, two compiler backends, two IDE projects, and the runtime, so we know its API is reasonable.

The only stumbling block is that the project is implemented in Java, so if you want to use it in a compiler written in a different language, you'll probably want to translate it to that language first. The code is pretty clean and straightforward, so that's probably not a massive hurdle to overcome.

The license for ceylon-model is Apache 2.0, so you can basically do what you like with it. But if you decide to use it in a project, please drop us a note, because it makes us happy to see people reuse our work.

Constructors in Ceylon

Since the earliest versions of Ceylon, we've supported a streamlined syntax for class initialization where the parameters of a class are listed right after the class name, and initialization logic goes directly in the body of the class.

class Color(shared Integer rgba) {

    assert (0 <= rgba <= #FFFFFFFF);

    function encodedValue(Integer slot)
            => rgba.rightLogicalShift(8*slot).and(#FF);

    shared Integer alpha => encodedValue(3);

    shared Integer red => encodedValue(2);
    shared Integer green => encodedValue(1);
    shared Integer blue => encodedValue(0);

    function hex(Integer int) => formatInteger(int, 16);

    string => "Color { \ 
               alpha=``hex(alpha)``, \ 
               red=``hex(red)``, \ 
               green=``hex(green)``, \ 
               blue=``hex(blue)`` }";

}

We can instantiate a class like this:

Color red = Color(#FFFF0000);

The ability to refer to parameters of the class directly from the members of the class really helps cut down on verbosity, and most of the time this is a really comfortable way to write code.

However, as we've seen over the past few years of writing Ceylon code, there are moments when we would really appreciate the ability to write a class with multiple initialization paths, something like constructors in Java, C#, or C++. To be clear, in the overwhelmingly common case—something more than 90% of classes, I would estimate—constructors are unnecessary and uncomfortable. But we still need a good solution for the remaining trickier cases.

A couple of especially compelling cases we ran into were:

  • the class Array in ceylon.language, which can be allocated with a list of elements, or with a size and a single element value, and
  • cloning copy constructors, used, for example, to implement HashMap.clone() and HashSet.clone().

Unfortunately, I've always found the design of constructors that Java and C# inherited from C++ to be a bit strange and inexpressive. So before I tell you what we've done about constructors in Ceylon 1.2, let me start by explaining what I think is wrong with constructors in Java.

What's wrong with constructors in Java?

As alluded above, the biggest problem with the constructor syntax in languages that borrow from C++ is that in the common case of a class with just one constructor, the parameters of that constructor aren't available in the body of the class, leading to awful code like the following:

class Point {
    public float x;
    public float y;
    public Point(float x, float y) {
        this.x = x;
        this.y = y;
    }
    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}

This hurts. Fortunately, we've already made that pain go away in Ceylon.

class Point(shared Float x, shared Float y) {
    string => "(``x``, ``y``)";
}

So let's look at some additional issues with constructors in Java.

To begin with, the syntax is irregular. In C-like languages, the grammar for a declaration is:

Modifier* (Keyword|Type) Identifier OtherStuff

Constructors, strangely, don't conform to this general schema, having been bolted on later.

Second, the constructors of a class are all forced to have the same name. This seems like a quite bizarre restriction:

  • If they all have the same name, why not declare them with a keyword instead of an identifier? DRY, anyone?
  • It's a restriction that robs me of expressiveness. Instead of new ColorWithRGBAndAlpha(r,g,b,a), giving me a clue as to the semantics of the arguments, I write just new Color(r,g,b,a), and the reader is left guessing.
  • Constructors thus run into Java's totally broken support for overloading. I can't have a constructor that takes a List<Float> and another which takes a List<Integer>, since these two parameter types have the same erasure.
  • Constructor references (Class::new in Java) can be ambiguous, depending on the context.

Third, constructors aren't forced to initialize the instance variables of the class. All Java types have a "default" (zero or null) value, and if you forget to initialize an instance variable in Java, you'll get a NullPointerException, or, worse, an incorrect zero value at runtime. These problems most certainly belong to the class of problems that I expect a static type system to be able to detect, and, indeed, in other contexts Java does detect uninitialized variables.

Further note that this would be an even bigger problem in Ceylon, because most types don't have null as an instance, so there is no obvious "default" value.

As usual, my purpose here isn't to bash Java, but to justify why we've done things differently in Ceylon.

Named constructors and default constructors

By contrast, the newly introduced syntax for constructors in Ceylon is regular, expressive, and doesn't rely on overloading (which Ceylon doesn't support, except when interoperating with native Java code). Here's the basic syntax for a constructor:

new withFooAndBar(Foo foo, Bar bar)
        extends anotherConstructor(foo) {
    //do stuff
}

When the class to which the constructor belongs directly extends Basic, the extends clause is optional.

And here's an example of how it's used:

class Color {

    shared Integer rgba;

    //default constructor
    shared new (Integer rgba) {
        assert (0 <= rgba <= #FFFFFFFF);
        this.rgba = rgba;
    }

    //named constructor
    shared new withRGB(
        Integer red, Integer green, Integer blue, 
        Integer alpha = #FF) {
        assert (0 <= red <= #FF, 
                0 <= green <= #FF, 
                0 <= blue <= #FF);
        rgba = 
                alpha.leftLogicalShift(24) +
                red.leftLogicalShift(16) +
                green.leftLogicalShift(8) +
                blue;
    }

    //another named constructor
    shared new withRGBIntensities(
        Float red, Float green, Float blue, 
        Float alpha = 1.0) {
        assert (0.0 <= red <= 1.0, 
                0.0 <= green <= 1.0, 
                0.0 <= blue <= 1.0);
        function int(Float intensity) 
                => (intensity*#FF).integer;
        rgba = 
                int(alpha).leftLogicalShift(24) +
                int(red).leftLogicalShift(16) +
                int(green).leftLogicalShift(8) +
                int(blue);
    }

    function encodedValue(Integer slot)
            => rgba.rightLogicalShift(8*slot).and(#FF);

    shared Integer alpha => encodedValue(3);

    shared Integer red => encodedValue(2);
    shared Integer green => encodedValue(1);
    shared Integer blue => encodedValue(0);

    function hex(Integer int) => formatInteger(int, 16);

    string => "Color { \ 
               alpha=``hex(alpha)``, \ 
               red=``hex(red)``, \ 
               green=``hex(green)``, \ 
               blue=``hex(blue)`` }";

}

Constructor declarations are indicated with the keyword new, and have a name which begins with a lowercase letter. We call a constructor like this:

Color red = Color.withRGBIntensities(1.0, 0.0, 0.0);

Or, using named arguments, like this:

Color red = 
    Color.withRGBIntensities { 
        red = 1.0; 
        green = 0.0;
        blue = 0.0;
    };

A function reference to a constructor has a natural syntax:

Color(Float,Float,Float) createColor
        = Color.withRGBIntensities;

A class may have a constructor, called the default constructor, with no name. Instantiation via the default constructor works just like instantiation of a class without constructors:

Color red = Color(#FFFF0000);

A class isn't required to have a default constructor, but most classes will have one.

Why do we need the concept of a default constructor? Well, because a class with constructors may not have a parameter list. Wait, let's stop and reemphasize that caveat, because it's an important one:

You can't add constructors to a class with a parameter list! Instead, you must first rewrite the class to use a "default constructor" for its "normal" initialization logic.

However, a class with constructors may still have initialization logic directly in the body of the class. For example, the following is perfectly legal:

class Color {

    shared Integer rgba;

    shared new (Integer rgba) {
        this.rgba = rgba;
    }

    shared new withRGB(
        Integer red, Integer green, Integer blue, 
        Integer alpha = #FF) {
        assert (0 <= red <= #FF, 
                0 <= green <= #FF, 
                0 <= blue <= #FF);
        rgba = 
                alpha.leftLogicalShift(24) +
                red.leftLogicalShift(16) +
                green.leftLogicalShift(8) +
                blue;
    }

    shared new withRGBIntensities(
        Float red, Float green, Float blue, 
        Float alpha = 1.0) {
        assert (0.0 <= red <= 1.0, 
                0.0 <= green <= 1.0, 
                0.0 <= blue <= 1.0);
        function int(Float intensity) 
                => (intensity*#FF).integer;
        rgba = 
                int(alpha).leftLogicalShift(24) +
                int(red).leftLogicalShift(16) +
                int(green).leftLogicalShift(8) +
                int(blue);
    }

    //executed for every constructor
    assert (0 <= rgba <= #FFFFFFFF);

    //other members
    ...
}

The last assert statement is executed every time the class is instantiated.

Value constructors

The constructors we've just seen are termed callable constructors in the language specification, because they declare parameters. We also have value constructors, which don't declare parameters, and which are executed once, the first time the constructor is evaluated in the context to which the class belongs. For a toplevel class, a value constructor is a singleton.

class Color {

    shared Integer rgba;

    //default constructor
    shared new (Integer rgba) {
        this.rgba = rgba;
    }

    //value constructors

    shared new white {
        rgba = #FFFFFFFF;
    }

    shared new red {
        rgba = #FFFF0000;
    }

    shared new green {
        rgba = #FF00FF00;
    }

    shared new blue {
        rgba = #FF0000FF;
    }

    //etc
    ...

}

We can use a value constructor like this:

Color red = Color.red;

Sometimes the constructors of a class share certain initialization logic. If that logic doesn't depend upon the parameters of the class, we can put it directly in the body of the class, as we've already seen. But if it does depend upon the parameters, we often need to take a different tack.

Constructor delegation

To facilitate reuse of initialization logic within a class, it's useful to allow a constructor to delegate to a different constructor of the same class. For this, we use the extends clause:

Integer int(Float intensity) 
        => (intensity*#FF).integer;

class Color {

    shared Integer rgba;

    shared new (Integer rgba) {
        this.rgba = rgba;
    }

    //value constructors delegate to the default constructor

    shared new white 
            extends Color(#FFFFFFFF) {}

    shared new red 
            extends Color(#FFFF0000) {}

    shared new green 
            extends Color(#FF00FF00) {}

    shared new blue 
            extends Color(#FF0000FF) {}

    shared new withRGB(
        Integer red, Integer green, Integer blue, 
        Integer alpha = #FF) {
        assert (0 <= red <= #FF, 
                0 <= green <= #FF, 
                0 <= blue <= #FF);
        rgba = 
                alpha.leftLogicalShift(24) +
                red.leftLogicalShift(16) +
                green.leftLogicalShift(8) +
                blue;
    }

    shared new withRGBIntensities(
        Float red, Float green, Float blue, 
        Float alpha = 1.0) 
            //delegate to other named constructor
            extends withRGB(int(red), 
                            int(green), 
                            int(blue),
                            int(alpha)) {}

    assert (0 <= rgba <= #FFFFFFFF);

    //other members
    ...
}

A constructor may only delegate to a constructor defined earlier in the body of the class.

Note that we've written extends Color(#FFFFFFFF) to delegate to the default constructor of Color.

Definite initialization and partial constructors

An ordinary constructor like Color.withRGB() or Color.withRGBIntensities() has a responsibility to initialize every value reference belonging to the class that is either:

  • shared, or
  • used ("captured") by another member of the class.

The Ceylon compiler enforces this responsibility at compile time and will reject the code unless it can prove that every value reference has been fully initialized, either:

  • by every ordinary constructor, or
  • in the body of the class itself.

This rule would make it difficult to factor out common logic contained in constructors if it weren't for the notion of a partial constructor. For a partial constructor, the requirement that all references are fully initialized is relaxed. But a partial constructor may not be used to directly instantiate the class. It may only be called from the extends clause of another constructor of the same class. A partial constructor is indicated by the abstract annotation:

Here's a contrived example:

class ColoredPoint {
    shared Point point;
    shared Color color;

    //partial constructor
    abstract new withColor(Color color) {
        this.color = color;
    }

    shared new forCartesianCoords(Color color, 
        Float x, Float y) 
            //delegate to partial constructor
            extends withColor(color) {
        point = Point.cartesian(x, y);
    }

    shared new forPolarCoords(Color color, 
        Float r, Float theta) 
            //delegate to partial constructor
            extends withColor(color) {
        point = Point.polar(r, theta);
    }

    ...

}

So far, we've only seen how to delegate to another constructor of the same class. But when a class extends a superclass, every constructor must ultimately delegate—perhaps indirectly—to a constructor of the superclass.

Constructors and inheritance

A class may extend a class with constructors, for example:

class ColoredPoint2(color, Float x, Float y) 
        extends Point.cartesian(x, y) {
    shared Color color;
    ...
}

A more interesting case is when the extending class itself has constructors:

class ColoredPoint extends Point {
    shared Color color;

    shared new forCartesianCoords(Color color, 
        Float x, Float y)
            //delegate to Point.cartesian()
            extends cartesian(x, y) {
        this.color = color;
    }

    shared new forPolarCoords(Color color, 
        Float r, Float theta)
            //delegate to Point.polar()
            extends polar(r, theta) {
        this.color = color;
    }

    ...
}

In this example, the constructors delegate directly to constructors of the superclass.

Ordering of initialization logic

With constructor delegation, together with initialization logic defined directly in the body of the class, you must be imagining that initialization can get pretty convoluted.

Well, no. The general principle of initialization in Ceylon remains unchanged: initialization always flows from top to bottom, allowing the typechecker to verify that every value is initialized before it is used.

Consider this class:

class Class {
    print(1);
    abstract new partial() {
        print(2);
    }
    print(3);
    shared new () extends partial() {
        print(4);
    }
    print(5);
    shared new create() extends partial() {
        print(6);
    }
    print(7);
}

Calling Class() results in the following output:

1
2
3
4
5
7

Calling Class.create() results in this output:

1
2
3
5
6
7

All quite orderly and predictable!

In the comments, David Hagen suggests a way of understanding how constructor delegation and ordering works.

Using value constructors to emulate enums

You might already have noticed that if a class only has value constructors, it's very similar to a Java enum.

shared class Day {
    shared actual String string;
    abstract new named(String name) {
        string = name;
    } 
    shared new sunday extends named("SUNDAY") {}
    shared new monday extends named("MONDAY") {}
    shared new tuesday extends named("TUESDAY") {}
    shared new wednesday extends named("WEDNESDAY") {}
    shared new thursday extends named("THURSDAY") {}
    shared new friday extends named("FRIDAY") {}
    shared new saturday extends named("SATURDAY") {}
}

We therefore let you use value constructors in a switch statement. The ability to switch over value constructors can be viewed as an extension of the pre-existing facility for switching over literal values of types like Integer, Character, and String.

Day day = ... ;
String message;
switch (day)
case (Day.friday) { 
    message = "thank god"; 
}
case (Day.sunday | Day.saturday) { 
    message = "we could be having this conversation with beer"; 
}
else {
    message = "need more coffee";
}
print(message);

But when we noticed the similarity to Java enums, we decided to take the idea a little further than what Java offers here. Java enumerations are open, in the sense that a switch statement which covers all enumerated values of an enum must still include a default case to be considered exhaustive by definite assignment and definite return checking.

But in Ceylon we also have the notion of closed enumerated types, where the "default" case—the else clause of Ceylon's switch statement—may be omitted, but the whole switch will still be considered exhaustive.

Note: API designers should be careful to only "close" an enumeration that won't grow new value constructors in future revisions of the API. Day and Boolean are good examples of closed enumerations. ErrorType is an examples of an open enumeration.

If we add an of clause to Day, it will be considered a closed enumeration.

shared class Day 
        of sunday | monday | tuesday | wednesday | 
           thursday | friday | saturday {
    shared actual String string;
    abstract new named(String name) {
        string = name;
    } 
    shared new sunday extends named("SUNDAY") {}
    shared new monday extends named("MONDAY") {}
    shared new tuesday extends named("TUESDAY") {}
    shared new wednesday extends named("WEDNESDAY") {}
    shared new thursday extends named("THURSDAY") {}
    shared new friday extends named("FRIDAY") {}
    shared new saturday extends named("SATURDAY") {}
}

Now Ceylon will consider a switch statement that covers all the value constructors as an exhaustive switch, and we can write the following code without needing an else clause:

Day day = ... ;
String message;
switch (day)
case (Day.monday | Day.tuesday | 
      Day.wednesday | Day.thursday) { 
    message = "need more coffee"; 
}
case (Day.friday) { 
    message = "thank god"; 
}
case (Day.sunday | Day.saturday) { 
    message = "we could be having this conversation with beer"; 
}
print(message);

This is an alternative to the existing pattern for emulating Java-style enums.

A final word

The design I've presented here is the final result of a thought process that spanned five years. I personally found this to be a surprisingly difficult problem to address in a principled way. For a time, I hoped to not even need to have constructors in the language at all. But ultimately I'm very happy with the end result. It seems to me not only principled and consistent with the rest of the language, but also very expressive and powerful.

A little more about type functions

My previous post about type functions generated some interesting discussion, here, and on reddit.

Therefore, I think it's worth tying up several loose ends from the earlier post. So here's a collection of further observations about type functions.

Warning: this post addresses some very technical details of how we've incorporated type functions into Ceylon's type system. Don't even bother continuing any further until you've read the earlier post.

The "why" of this

The most well-known application of higher-order generics is for representing high-level abstractions of container types: functors, monads, and friends. That's not what motivated me to experiment with higher-order generics in Ceylon, and as a practical matter I still have little interest in these abstractions, though they're fun to play with.

No, what bothered me was not that Ceylon's type system wasn't powerful enough to represent Functor or Monad, but rather that Ceylon's type system wasn't powerful enough to represent Ceylon. I'll show you what I mean by that in just a second. But first I want to argue that type functions can be seen as a regularization of the language.

From a purely syntactic point of view, it's always seemed a little strange that every sort of type declaration in Ceylon can have a list of type parameters, except for a type parameter itself. Furthermore, it's noticeable that I can take a reference, or meta reference to any program element unless it has a list of type parameters. Now, such restrictions might seem reasonable if a parameterized type parameter or a reference to a generic declaration were not meaningful notions at a fundamental level. But they clearly are meaningful, and even at least somewhat useful.

Exactly how useful is a different question—the jury's still out, at least in my mind. And so perhaps we'll ultimately conclude that this stuff isn't worth its weight. The weight being, substantially, the time it takes for programmers new to Ceylon to understand this stuff.

In the original post, I showed how type functions were necessary to represent the type of a reference to a generic function. One place where this problem arises is with one of Ceylon's more unique features: its typesafe metamodel.

A use case for generic function reference types

Usually, I can obtain a metamodel object that represents a class or function and captures its type signature. For example, the expression `String` evaluates to a metamodel object that captures the type and initializer parameters of the class String:

Class<String,[{Character*}]> stringClass = `String`;

For a generic declaration, I can do a similar thing, as long as I'm prepared to nail down the type arguments. For example, I can write `Singleton<String>` to get a metamodel object representing the class Singleton after applying the type argument String:

Class<Singleton<String>,[String]> stringSingletonClass
        = `Singleton<String>`

But in Ceylon as it exists today, I can't obtain a typed metamodel object that represents just `Singleton`, because to represent the type of that metamodel object I would necessarily need a type function.

Now, with the new experimental support for type functions, the type of the expression `Singleton` could be <T> => Class<Singleton<T>,[T]>(), allowing code like this:

value singletonGenericClass = `Singleton`;
...
Class<Singleton<String>,[String]> stringSingletonClass 
        = singletonGenericClass<String>();

That's just one example of how allowing references to generic functions makes Ceylon feel more "complete".

Two use cases for anonymous type functions

I get the impression that the "scariest" bit of what I've presented in the previous post is the notation for anonymous type functions. That is, the following syntax:

<X> => X
<X> => [X,X,X]
<X,Y> => X|Y
<T> given T satisfies Object => Category<T>(T*)

But I'm convinced that this notation is not really that hard to understand. The reason I assert this is because if I give each of these type functions a name, then most of you guys have no problem understanding them:

alias Identity<X> => X;
alias Triple<X> => [X,X,X];
alias Union<X,Y> => X|Y;
alias CategoryCreator<T> given T satisfies Object => Category<T>(T*);

But—one might reasonably enquire—why do we even need them, if the named versions are easier to read?

Well, we need them:

  1. in order to be able to denote the type of a reference to a generic function—remember, we don't have undenotable types in Ceylon—and
  2. to make it easy to partially apply a named type function like Map.

For example, we want to be able to write stuff like <T> => Map<String,T> when working with higher-order generics, thus turning a type function of two type parameters into a type function of one type parameter.

Are type functions "type types"?

One thing I should have made very clear, and forgot, is that type functions don't represent an additional meta level. In Ceylon's type system, type functions are types, in the very same sense that function are values in Ceylon and other modern languages.

There simply is no additional meta-type system for types in Ceylon. The closest thing we have to a "type type" is a generic type constraint, but that's an extremely impoverished sort of type type, since Ceylon provides no facilities at all to abstract over type constraints—I can't even assign an alias to a type constraint and reuse it by name.

Ceylon reasons about type constraints and assignability of types to type variables using hardcoded rules written primitively into the language spec and type checker, not by abstraction over the types of types.

Type functions and subtyping

But if a type function is a type, what are its subtyping relationships with other types and other type functions?

Well, first, recall that some type functions have instances: generic function references. We didn't want to introduce values into the language that aren't Objects, so we've declared that every type function is a subtype of Object. This preserves the useful property that our type system has a single root type Anything.

Next, recall that an ordinary function type is covariant in its return type, and contravariant in its parameter types. For example, the function type:

String(Object, Object)

is a subtype of:

Object(String, String)

Since if a function accepts two Objects and returns a String, then it's clearly also a function that accepts two Strings and returns an Object.

Given two function types with one parameter:

F(P)
G(Q)

Then F(P) is a subtype of G(Q) iff P is a supertype of Q and F is a subtype of G.

Similar rules apply to type functions. Consider two type functions of one type parameter:

alias A<X> given X satisfies U => F<X>
alias B<Y> given Y satisfies V => G<Y>

Then A is a subtype of B iff:

  • the upper bound U on X is a supertype of the upper bound V on Y, and
  • for any type T, F<T> is a subtype of G<T>.

That is to say, if A<X> accepts every type argument T that B<Y> accepts, and for each such T, the applied type A<T> is a subtype of the applied type B<T>, then we can soundly replace B with A in well-typed code.

(Of course, these rules generalize to type functions with multiple type parameters.)

Generic function types and subtyping

Now let's narrow our attention to consider only type functions that represent the types of generic functions. To make it easier, we'll consider generic functions of the following form, with just one type parameter and just one value parameter:

F<X> f<X>(P<X> p) given X satisfies U => ... ;

Here, F<X> is the return type, a type expression involving the type parameter X, and P<X> is the parameter type, which also involves X.

The type of this generic function—as we saw in the previous post— is the type function:

<X> given X satisfies U => F<X>(P<X>)

So let's consider two type functions of the general form we're considering:

alias A<X> given X satisfies U => F<X>(P<X>)
alias B<Y> given Y satisfies V => G<Y>(Q<Y>)

Then we see quickly that A is a subtype of B iff:

  • the upper bound U on X is a supertype of the upper bound V on Y, and
  • for any type T, the return type F<T> of A is a subtype of the return type G<T> of B, and the parameter type P<T> of A is a supertype of the parameter type Q<T> of B.

For example, this generic function type:

<X> => X&Object(X|Object)

is a subtype of this generic function type:

<X> given X satisfies Object => X(X)

Take a minute to convince yourself that this is correct intuitively.

(Again, these rules generalize naturally to functions with multiple type parameters and/or multiple value parameters.)

Type functions and type inference

When we call a first-order generic function in Ceylon, we don't usually need to explicitly specify type arguments. Instead, we can usually infer them from the value arguments of the invocation expression. For example, if we have this generic function:

List<Out> map<In,Out>(Out(In) fun, List<In> list) => ... ;

Then we can always safely infer In and Out, because there's a unique most-precise choice of type arguments:

value list = map(Integer.string, ArrayList { 10, 20, 30 });

In this example, we can safely infer that In is Integer, and Out is String, without any loss of precision.

Unfortunately, once higher-order generics come into play, inference of type functions is a much more ambiguous problem. Consider this second-order generic function, which abstracts the map() function away from the container type, by introducing the type function variable Box to represent the unknown container type:

Box<Out> fmap<Box,In,Out>(Out(In) fun, Box<In> box) 
        given Box<Element> { ... }

And now consider the following invocation of this function:

fmap(Integer.string, ArrayList { 10, 20, 30 })

What type should we infer for Element, and what type function for Box?

  • Integer and List?
  • Integer and Iterable?
  • Integer and ArrayList?
  • Integer and MutableList?
  • Integer and ListMutator?
  • Integer and Collection?
  • Object and Category?

In general, there might be several different reasonable choices, and no really good criteria for choosing between them. So in this case, we require that the type arguments be specified explicitly:

fmap<List,Integer,String>(Integer.string, ArrayList { 10, 20, 30 })

However, there is a pattern we can use to make type function inference possible. In this case, we could define the following interface:

interface Functor<Box,Element> given Box<Value> { ... }

Now let's imagine that our ArrayList class inherits Functor, so that any ArrayList<Element> is a Functor<List,Element>.

And let's redefine fmap() like this:

  Box<Out> fmap<Box,In,Out>(Out(In) fun, Functor<Box,In> box) 
        given Box<Element> { ... }

Then, finally, for the same instantiation expression we had before:

fmap(Integer.string, ArrayList { 10, 20, 30 })

we can now unambiguously infer that Box is List and In is Integer, since those types are encoded as type arguments to Functor in the principal supertype instantiation of Functor for the expression ArrayList { 10, 20, 30 }.

Instances of type functions

In the original post, we noted that a type function that returns a function type is the type of a generic function. For example, the type function:

<X> given X satisfies Object => X(X)

Is the type of this generic function:

X f<X>(X x) given X satisfies Object => x;

But then it's natural to enquire: if some type functions are the types of generic functions, what are the other type functions the types of?

Well, if you reflect for a second on the relationship between types and values, I think you'll see that they must be the types of generic value declarations. That is, this type function:

<X> => List<X>

would be the type of this value, written in pseudo-Ceylon:

List<X> list<X> => ... ;

That is, when presented with a type X, list<X> evaluates to a List<X>.

Of course there are no actual generic values in Ceylon, the closest thing we have is a nullary generic function:

List<X> list<X>() => ... ;

whose type is actually:

<X> => List<X>()

There's no plan to ever introduce generic values into Ceylon, so types like <X> => List<X> have no instances. They're useful only as type arguments to higher-order generic types.

Type functions and principal typing

Finally, let's address a rather technical point.

A very important property of Ceylon's type system is the ability to form a principal instantiation of any union or intersection of different instantiations of a generic type.

For, example, for the covariant type List<T>:

  • List<X> | List<Y> has the principal instantiation List<X|Y>
  • List<X> & List<Y> has the principal instantiation List<X&Y>

For the contravariant type Comparable<T>:

  • Comparable<X> | Comparable<Y> has the principal instantiation Comparable<X&Y>
  • Comparable<X> & Comparable<Y> has the principal instantiation Comparable<X|Y>

Naturally, it's important that we can do the same tricks for intersections and unions of instantiations of higher-order types. As at happens, this works out extremely naturally, using the following identities:

  • <<X> => F<X>> | <<Y> => G<Y>> is a subtype of <T> => F<T> | G<T>, and
  • <<X> => F<X>> & <<Y> => G<Y>> is a subtype of <T> => F<T> & G<T>.

Thus, if we have the following covariant second-order type:

interface Functor<out Element, out Container>
        given Container<E> { ... }

Then we obtain the following principal instantiations:

  • Functor<E,A> | Functor<F,B> has the principal instantiation Functor<E|F,<T> => A<T>|B<T>>, and
  • Functor<E,A> & Functor<F,B> has the principal instantiation Functor<E&F, <T> => A<T>&B<T>>.

You don't need to know these identities when you're writing code in Ceylon, but it's nice to know that type functions don't undermine the basic algebraic properties which are the reason Ceylon's type system is so nice to work with. Everything fits together here, without weird holes and corner cases.

A word about "rank"

In the previous post I described our support for references to generic functions as "arbitrary rank" polymorphism, which prompted a short discussion about how to measure the rank of a generic type. I now think that the term "rank" probably isn't very meaningful in connection to Ceylon, since there's nothing special about our function types: they're just instantiations of the perfectly ordinary generic type Callable. As suggested by Kamatsu on reddit, it seems to me that a better word to use is probably "impredicative".

Programming with type functions in Ceylon

I've recently been working on some experimental new features of Ceylon's already extremely powerful type system. What I'm going to explain in this post is known, technically, as:

  • higher order generic types (or type constructor polymorphism, or higher kinds), and
  • higher rank generic types (or rank-N polymorphism).

Please don't worry about this jargon salad. (And please don't try to google any of those terms, because the explanations you'll find will only make these pretty straightforward notions seem confusing.) Stick with me, and I'll do my best to explain the concepts in intuitive terms, without needing any of the above terminology.

But first, let's start with pair of examples that illustrate a motivating problem.

This function simply returns its argument:

Object pipeObject(Object something) => something;

This function adds Floats:

Float addFloats(Float x, Float y) => x+y;

Modern programming language let us treat either of these functions as a value and pass it around the system. For example, I can write:

Object(Object) pipeObjectFun = pipeObject;

Or:

Float(Float,Float) addFloatsFun = addFloats;

Where Object(Object) and Float(Float,Float) represent the types of the functions, and pipeObject and addFloats are a references to the functions. So far so good.

But sometimes it's useful to have a function that abstracts away from the concrete data type using generics. We introduce a type variable, to represent the "unknown" type of thing we're dealing with:

Any pipe<Any>(Any anything) => anything;

And:

Number add<Number>(Number x, Number y)
            given Number satisfies Summable<Number>
        => x+y;

Sometimes, as in add(), the unknown type is constrained in some way. We express this using a type constraint:

given Number satisfies Summable<Number>

This is Ceylon's way of denoting that Number may only be a type which is a subtype of the upper bound Summable<Number>, i.e. that it is a type to which we can apply the addition operator +.

Now, what if I want to pass around a reference to this function. Well, one thing I can typically do is nail down the unknown type to a concrete value:

Object(Object) pipeObjectFun = pipe<Object>;

Or:

Float(Float,Float) addFloatsFun = add<Float>;

But that's a bit disappointing—we've lost the fact that add() was generic. Now, in object-oriented languages it's possible to define the generic function as a member of a class, and pass an instance of the class around the system. This is called the strategy pattern. But it's inconvenient to have to write a whole class just to encapsulate a function reference.

It would be nicer to be able to write:

TypeOfPipe pipeFun = pipe;

And:

TypeOfAdd addFun = add; 

Where TypeOfPipe and TypeOfAdd are the types of the generic functions. The problem is that there's no way to represent these types within the type system of most languages. Let's see how we can do that in Ceylon 1.2.

Introducing type functions

I promised to avoid jargon, and avoid jargon I will. The only bit of terminology we'll need is the idea of a type function. A type function, as its name implies, is a function that accepts zero or more types, and produces a type. Type functions might seem exotic and abstract at first, but there's one thing that will help you understand them:

You already know almost everything you need to know about type functions, because almost everything you know about ordinary (value) functions is also true of type functions.

If you stay grounded in the analogy to ordinary functions, you'll have no problems with the rest of this post, I promise.

So, we all know what an ordinary function looks like:

function addFloats(Float x, Float y) => x+y;

Let's break that down, we have:

  • the function name, and list of parameters, to the left of a fat arrow, and
  • an expression on the right of the fat arrow.

A type function doesn't look very different. It has a name and (type) parameters on the left of a fat arrow, and a (type) expression on the right. It looks like this:

alias Pair<Value> => [Value,Value];

Aha! This is something we've already seen! So a type function is nothing more than a generic type alias! This particular type function accepts a type, and produces a tuple type, a pair, whose elements are of the given type.

Actually not every type function is a type alias. A generic class or interface is a type function. For example:

interface List<Element> { ... }

This interface declaration accepts a type Element, and produces a type List<Element>, so it's a type function.

I can call a function by providing values as arguments:

pipe("hello")
add(1.0, 2.0)

These expressions produce the values "hello" and 3.0.

I can apply a type function by providing types as arguments:

Pair<Float>

This type expression produces the type [Float,Float], by applying the type function Pair to the type argument Float.

Similarly, I can apply the type function List:

List<String>

This type expression just literally produces the type List<String>, by applying the type function List to the type argument String.

On the other hand, I can take a reference to a value function by just writing its name, without any arguments, for example, pipe, or add. I can do the same with type functions, writing Pair or List.

Back in the quotidian world of ordinary values, I can write down an anonymous function:

(Float x, Float y) => x+y

In the platonic world of types, I can do that too:

<Value> => [Value,Value]

Finally, an ordinary value function can constrain its arguments using a type annotation like Float x. A type function can do the same thing, albeit with a more cumbersome syntax:

interface List<Element> 
        given Element satisfies Object 
{
    //define the type list
    ...
}

Even an anonymous type function may have constraints:

<Value> given Value satisfies Object 
        => [Value,Value]

Jargon watch: most people, including me, use the term type constructor instead of "type function".

Type functions are types

Now we're going to make a key conceptual leap. Recall that in modern languages, functions are treated as values. I can

  1. take a function, and assign it to a variable, and then
  2. call that variable within the body of the function.

For example:

void secondOrder(Float(Float,Float) fun) {
    print(fun(1.0, 2.0));
}

//apply to a function reference
secondOrder(addFloats);

//apply to an anonymous function
secondOrder((Float x, Float y) => x+y);

We call functions which accept functions higher order functions.

Similarly, we're going to declare that type functions are types. That is, I can:

  1. take a type function and assign it to a type variable, and then
  2. apply that type variable in the body of the declaration it parameterizes.

For example:

interface SecondOrder<Box> given Box<Value> {
    shared formal Box<Float> createBox(Float float);
}

//apply to a generic type alias
SecondOrder<Pair> something;

//apply to a generic interface
SecondOrder<List> somethingElse;

//apply to an anonymous type function
SecondOrder<<Value> => [Value,Value]> somethingScaryLookin;

The type constraint given Box<Value> indicates that the type variable Box accepts type functions with one type argument.

Now, there's one thing to take note of here. At this point, the notion that type functions are types is a purely formal statement. An axiom that defines what kinds of types I can write down and expect the typechecker of my programming language to be able to reason about. I have not—yet—said that there are any actual values of these types!

Jargon watch: the ability to treat a type function as a type is called higher order generics.

The type of a generic function is a type function

Let's come back to our motivating examples:

Any pipe<Any>(Any anything) => anything;

Number add<Number>(Number x, Number y)
            given Number satisfies Summable<Number>
        => x+y;

If you squint, you'll see that these are actually functions with two parameter lists. The first parameter lists are:

<Any>

And:

<Number> given Number satisfies Summable<Number>

Which both accept a type. The second parameter lists are:

(Any anything)

And:

(Number x, Number y)

Therefore, we can view each generic function as a function that accepts a type and produces an ordinary value function. The resulting functions are of type Any(Any) and Value(Value,Value) respectively.

Thus, we could write down the type of our first generic function pipe() like this:

<Any> => Any(Any)

And the type of add() is:

<Number> given Number satisfies Summable<Number>
        => Number(Number,Number)

Phew. That looks a bit scary. But mainly because of the type constraint. Because generic function types like this are pretty verbose, we can assign them aliases:

alias AdditionLikeOperation
        => <Number> given Number satisfies Summable<Number>
                => Number(Number,Number);

Or, equivalently, but more simply:

alias AdditionLikeOperation<Number> 
        given Number satisfies Summable<Number>
                => Number(Number,Number);

That was the hard part—we're almost done.

References to generic functions

Now we can use these types as the types of references to generic functions:

<Any> => Any(Any) pipeFun = pipe;

AdditionLikeOperation addFun = add;

And we can apply these function references by providing type arguments:

String(String) pipeString = pipeFun<String>;
Object(Object) pipeObject = pipeFun<Object>;

Float(Float,Float) addFloats = addFun<Float>;
Integer(Integer,Integer) addInts = addFun<Integer>;

Or, alternatively, we can just immediately apply the generic function references to value arguments, and let Ceylon infer the type arguments, just as it usually does when you call a function directly:

String hi = pipeFun("hello");
Integer zero = pipeFun(0);

Float three = addFun(1.0, 2.0);
String helloWorld = addFun("Hello", "World");

And now we've solved the problem posed at the beginning!

Now for the kicker: the types <Any> => Any(Any) and AdditionLikeOperation are both type functions. Indeed, the type of any generic function is a type function of this general form:

<TypeParameters> => ReturnType(ParameterTypes)

Similarly, every type function of this general form is the type of some generic function.

So we've now shown that some type functions are not only types, they're the types of values—the values are references to generic functions like pipe and add.

Jargon watch: the ability to treat a generic function as a value is called higher rank generics.

Abstraction over generic functions

Finally, we can use all this for something useful. Let's consider a scanner library that is abstracted away from all of:

  • the character type,
  • the token type,
  • the kind of container that tokens occur in.

Then we might have a scan() function with this sort of signature:

"Tokenize a stream of characters, producing
 a stream of tokens."
Stream<Token> scan<Char,Token,Stream>
        (grammar, characterStream, newToken, newStream)
            //Note: Stream is a reference to a type function!
            given Stream<Element> satisfies {Element*} {

    //parameters:

    "The token grammar."
    Grammar grammar;

    "The character stream to tokenize."
    Stream<Char> characterStream;

    "Constructor for tokens, accepting a
     substream of characters."
    Token newToken(Stream<Char> chars);

    "Generic function to construct a stream
     of characters or tokens."
     //Note: newStream is a reference to a generic function!
     Stream<Elem> newStream<Elem>({Elem*} elements);

    //implementation:

    Stream<Token> tokenStream;

    //do all the hard work
    ...

    return tokenStream;
}

Here:

  • Char is the unknown character type,
  • Token is the unknown token type,
  • Stream is a type function representing an unknown container type that can contain either characters or tokens,
  • newToken is a function that accepts a substream of characters and creates a Token,
  • characterStream is a stream of characters, and, most significantly,
  • newStream is a generic function that constructs streams of any element type, used internally to create streams of characters as well as tokens.

We could use this function like this:

//Let's use String as the token type, Character as the
//character type, and Iterable as the stream type.

//input a stream of characters
{Character*} input = ... ;

//and some token grammar
Grammar grammar = ... ;

//get back a stream of Strings
{String*} tokens =
        scan<Character, String, Iterable>
            (grammar, input, String,
                    //Note: a generic anonymous function! 
                    <Elem>({Elem*} elems) => elems);

Or like this:

//use LinkedList as the stream type
import ceylon.collection { LinkedList }

//we don't need Unicode support, so let's use
//Ceylon's 8-bit Byte as our character type
alias Char => Byte;

//define our own token type
class BasicToken(LinkedList<Char> charList) {
    string => String { for (b in charList) 
                       b.unsigned.character };
}

//input a linked list of characters
LinkedList<Char> input = ... ;

//and some token grammar
Grammar grammar = ... ;

//get back a linked list of BasicTokens
LinkedList<BasicToken> tokens =
        scan<Char, BasicToken, LinkedList>
            (grammar, input, BasicToken,
                    //Note: a generic function ref! 
                    LinkedList);

As you can see, our parsing algorithm is now almost completely abstracted away from the concrete types we want to use!

Compiling this code

In Ceylon 1.2, programming with type functions is an experimental feature that only works in conjunction with the JavaScript backend. So you can run code that uses type functions on a JavaScript virtual machine, but not on the JVM. This is something we're inviting you to play with to see if the whole community agrees it is useful. But right now it's not covered in the language specification, and it's not supported by the Java backend.

The typechecker itself supports extremely sophisticated reasoning about higher order and higher rank types. Type functions are fully integrated with Ceylon's powerful system of subtype polymorphism, including with union and intersection types, and with type inference and type argument inference. There's even limited support for type function inference! And there's no arbitrary upper limits here; not only rank-2 but arbitrary rank types are supported.

If there's enough interest, I'll cover that material in a future post.