Blog of Gavin King

Ceylon M5 progress report

Ceylon M4 was released back at the end of October, more than three months ago, so you might be wondering what the hell have we been up to and where in hell is that M5 release we promised! At the very least you deserve a progress report.

So what's going on is that M5 has gone quite a bit off the roadmap. This was kind of my last chance to revisit some design choices I made a couple of years ago before there was any kind of compiler for Ceylon or libraries written in Ceylon. Inevitably, there were some things that seemed like a good idea at the time, but:

  • I never really managed to sell to everyone else,
  • we ended up falling out of love with, or
  • just didn't quite work out in practice once we started writing real code.

This was really our last opportunity to fix things like that, so we've lost some time there.

Additionally, there was one thing, namely tuples, that I didn't think we were going to need, that we did wind up needing, and accommodating it elegantly into the language was actually somewhat disruptive. This also cost some time, and there is still some unfinished work there.

Finally, we decided to throw away almost all the human-compiled code in the language module and compile it with the Ceylon compiler instead. That was hard work, but it sure makes as feel a lot more secure and saves us a lot of pain when we add things to the language module.

So the bad news is that, given the above, some functionality is going to slip from M5, and therefore M5 will not be a feature-complete implementation of the language. In particular, it wont have support for:

  • annotations,
  • serialization, or
  • the metamodel (reflection).

It will have the following new features:

  • tuples
  • direct interop with native JavaScript APIs via the new dynamic block (more on this in a separate post)
  • the : operator
  • verbatim strings
  • fat arrow => abbreviation for defining single-expression functions
  • the late annotation
  • binary and hexadecimal numeric literals
  • defaulted type parameters
  • experimental support for reified generics
  • many miscellaneous minor improvements to the syntax and typing rules of the language
  • ceylon.time module
  • ceylon.net.httpd package (a HTTP server)
  • the compose() and curry() functions

(Please don't try to tell me that's not a pretty impressive list!)

I also finally found the time to give the language specification some love, after several months of neglect, so it's now up to date with all this new work. Ceylon is a better language now, I'm certain of it.

Well, I've a little more bad news: M5 is still not ready :-(

  • Stef's currently putting the finishing touches on his initial implementation of reified generics,
  • David's working on some performance issues with compilation inside the IDE that we would prefer to see fixed in M5,
  • Roland Tepp is finishing off work on the initial release of ceylon.time,
  • Matej Lazar is doing some API improvements to ceylon.net.httpd, and
  • there's still a bunch of backend bugs that need fixing.

Bear with us. It's frankly very painful for us to have not had a release in almost four months, but in a funny way that's more of a sign of how close we are now to delivering something you can really use to write real applications.

Things static type systems shouldn't have

If you've worked with any programming language, you have gripes about it. All design involves compromises, and all languages have warts. More interesting are the warts that are shared by many languages. Here's a short list of things that are extremely common in static type systems, that I think are almost always mistakes.

A populated bottom type

By definition, the bottom type is a type with no values. If null is an instance of the bottom type, that's a hole in the type system. Yes, yes, it's done for convenience. But if we were to go round adding holes into our type system every time we run into something slightly inconvenient, we may as well just give up and go use a dynamic language.

Non-denoteable types

A non-denoteable type is a type which can't be written down in the programming language itself. Many languages with sophisticated static type systems require the use of non-denoteable types to typecheck the code. This almost guarantees that the programmer can't always reproduce the compiler's reasoning, or understand the resulting error messages.

Aggressively replacing unions with intersections

For some reason, there's some kind of prejudice against union types. Your typical compiler, upon encountering something internally that is naturally a union type, immediately attempts to replace it with the intersection of its supertypes. This doesn't make a whole lot of sense to me, since in these languages both unions and intersections are usually equally non-denoteable, and this operation involves throwing away a whole lot of useful information about the type.

Assigning references a single immutable type

A value has an immutable type. A reference to a value doesn't need to. Indeed, what the compiler knows (or rather could know) about the type of a reference changes depending upon precisely where we are in the code. But there's a strong tradition that each named reference has a single declared or inferred type everywhere it is used. This tradition even infects Ceylon to some extent (though this will change in future versions).

Primitive types or primitively-defined compound types

I hesitated to put this one on the list. There is an approach to building a static type system where we start with certain primitive values, layer tuple support over those types, layer records over tuples, and eventually arrive at generic abstract data types. I'm not talking about that kind of type system. What I'm talking about is where we start with a totally general way of defining abstract data types, and then stick some extra primitive types into the system because we find our supposedly totally general thing to be inadequate.

Overloading and implicit type conversions

Yes, yes, I know they seem convenient. But in fact the benefits provided by overloading or implicit type conversions are almost entirely trivial. They just don't increase the expressiveness of your type system. And they really screw up type inference, which is why you don't see these things in academic or research languages. Dynamic languages don't have anything like overloading or implicit type conversions, and nobody misses them. Because you don't need 'em. They're an abuse of the static type information.

Abstracting over functions

Last week, I was finally able to write the functions compose() and curry() in Ceylon and compile them with the Ceylon compiler. These functions are general purpose higher order functions that operate on other functions at a very high level of abstraction. In most programming languages, it's very easy to define a compose() function that works for functions with just one parameter. For example, in Ceylon:

X compose<X,Y,Z>(X(Y) f, Y(Z) g)(Z z) => f(g(z));

In this function signature, f() and g() are functions with one parameter where the return type of g() is the same as the parameter type of f(). The resulting function is also a function with one parameter, and has the same return type as f() and the same parameter type as g().

String format(Float|Integer number) { .... }
value printf = compose(print,format);
printf(1.0); //prints 1.0

Now, things get trickier when g() has more than one parameter. Imagine the following definition of format():

String format(String pattern, Float|Integer number) { .... }

This function has the type String(String,Float|Integer), which can't be a Y(Z), no matter what argument types we choose or infer for Y and Z. To see this more clearly, and to see how we can come up with a more powerful definition of compose(), we're going to have to understand how Ceylon really represents function types. But first we're going to need to understand tuples.

Tuples

Ceylon recently grew tuples. I've heard all sorts of arguments for including tuples in the language, but what finally convinced me was realizing that they were a simpler way to represent function types.

So a major goal of adding tuples to the language was to add zero additional complexity to the type system. So tuples are instances of an ordinary class, Tuple, which you can see defined here.

A tuple is a linked list where each link in the list encodes the static type of the element. Without syntax sugar, we can write:

value point = Tuple(0.0, Tuple(0.0, Tuple("origin")));

I bet you're wondering what type point has. Well, if you really have to know, it's:

Tuple<Float|String,Float,Tuple<Float|String,Float,Tuple<String,String,Empty>>> 

Phew! Fortunately we never see this type, because Ceylon lets us abbreviate it to [Float,Float,String], and the IDE always shows us the abbreviated form. We can also use the square brackets to instantiate tuples, letting us write:

[Float,Float,String] point = [0.0, 0.0, "origin"];

(Yep, tuples are square in Ceylon. Don't worry, you'll quickly get used to that.)

A tuple is a sequence (Tuple is a subclass of Sequence), so the following is well-typed:

<Float|String>[] seq = point;
Null|Float|String first = seq[0];

Now, what makes a tuple special is that we can access its elements without losing their static type information. We can write:

Float x = point.first;
Float y = point.rest.first;
String label = point.rest.rest.first;

I should emphasize that we're not seeing any magic here. You could write your own MyTuple class and reproduce the exact same behavior!

I bet you're wondering what happens if I type point.rest.rest.rest.first. Well, take a closer look at the verbose type of point again. The chain of "links" is terminated by Empty, Ceylon's empty sequence type. And the type of first on Empty is Null. So we have:

Null zippo = point.rest.rest.rest.first;

That is, the expression is provably null. Provable within the type system, that is.

Of course, we don't want to make you write out suff like point.rest.rest.first whenever you need to get something out of a tuple. A tuple is a sequence, so you can access its elements using the index operator, for example:

Integer i = ... ;
Null|Float|String ith = point[i];

But, as a special convenience, when the index is a literal integer, the compiler will treat it as if it were a chain of calls to rest and first:

Float x = point[0];
Float y = point[1];
String label = point[2];
Null zippo = point[3];

A chain of Tuple instances doesn't need to be terminated by an Empty. It may, alternatively, be terminated by a nonempty sequence, any instance of Sequence. We can write the following:

String[] labels = ["origin", "center"];
[Float, Float, String*] point = [0.0, 0.0, *labels];

The type [Float, Float, String*] is an abbreviation for:

Tuple<Float|String,Float,Tuple<Float|String,Float,String[]>> 

It represents a sequence of two Floats, followed by an unknown number of Strings. So the following is well-typed:

[Float, Float, String*] point0 = [0.0, 0.0];
[Float, Float, String*] point1 = [0.0, 0.0, "origin"];
[Float, Float, String*] point2 = [0.0, 0.0, "origin", "center"];

We'll now see how all this is useful for abstracting over function parameter types.

Function types

An instance of the interface Callable is a function.

shared interface Callable<out Return, in Arguments> 
        given Arguments satisfies Anything[] {}

Going back to our function format() above, its type is:

Callable<String,[String,Float|Integer]>

You can see that we've represented the parameter types of the function using a tuple type. That is to say, Ceylon views a function as accepting a tuple of arguments, and producing a single value. Indeed, Ceylon even lets use write that explicitly:

[String,Float] args = ["%5.2f", 1.0];
print(format(*args));

Here we "spread" the elements of the tuple args across the parameters of the function format(), just like you can do in some dynamically typed languages!

Now consider a function with a variadic argument:

 String format(String pattern, Float|Integer* numbers) { .... }

This function accepts a String, followed by any number of Floats and Integers. We can represent its type as follows:

 Callable<String,[String,Float|Integer*]>

We usually abbreviate function types, writing String(String,Float|Integer) or String(String,Float|Integer*) for the function types above.

Ceylon also has defaulted parameters. The function type String(String,String=) means Callable<String,[String]|[String,String]> and represents a function whose second parameter has a default value.

Notice how, given the definitions we've just seen, assignability between function types works out correctly:

  • a String(Object) is an instance of Object(String),
  • a String(String=) is an instance of String() and of String(String), and
  • a String(String*) is an instance of String(), of String(String), and of String(String,String).

Function composition and currying

Finally we have the machinery we need to define compose() and curry().

The signature of compose() is:

shared Callable<X,Args> compose<X,Y,Args>(X(Y) x, Callable<Y,Args> y) 
        given Args satisfies Anything[]

Notice how the type parameter Args abstracts over all possible parameter lists, just as it does in the definition of Callable. To actually implement compose(), I'm going to need to resort to two native functions that I can't yet implement within the language. I can write down their signatures, however, so this isn't a limitation of the type system itself:

shared native Callable<Return,Args> flatten<Return,Args>
            (Return tupleFunction(Args tuple))
        given Args satisfies Anything[];

shared native Return unflatten<Return,Args>
            (Callable<Return,Args> flatFunction)(Args args)
        given Args satisfies Anything[];

If you're not extremely interested, you can skip over these declarations, and just look at an example of what they do:

String([String,Float|Integer]) unflatFormat = unflatten(format);
String(String,Float|Integer) flatFormat = flatten(unflatFormat);

That is, unflatten() takes any function with any parameter list, and returns a function that accepts a single tuple of the same length as the original parameter list. On the other hand, flatten() does the opposite. It takes a function that accepts a single tuple, and returns a function with a parameter list of the same length as the tuple.

OK, now we can implement compose():

shared Callable<X,Args> compose<X,Y,Args>(X(Y) f, Callable<Y,Args> g) 
        given Args satisfies Anything[]
               => flatten((Args args) => f(unflatten(g)(args)));

Perhaps I should unpack this slightly for readability:

shared Callable<X,Args> compose<X,Y,Args>(X(Y) f, Callable<Y,Args> g) 
        given Args satisfies Anything[] {
    X composed(Args args) {
        Y y = unflatten(g)(args);
        return f(y);
    }
    return flatten(composed);
}

The definition of curry() is similarly dense:

shared Callable<Return,Rest> curry<Return,Argument,First,Rest>
            (Callable<Return,Tuple<Argument,First,Rest>> f)
            (First first)
        given First satisfies Argument 
        given Rest satisfies Argument[] 
                => flatten((Rest args) => unflatten(f)(Tuple(first, args)));

This function accepts a function with at least one parameter and returns a function with two parameter lists, the first parameter list has the first parameter of the original function, and the second parameter list has the rest of the parameters.

Great! Now we can write:

value printf = compose(print,format);
printf("%5.2f", 1.0); //prints 1.00
value printFloat = curry(printf)("%5.2f");
printFloat(2.5); //prints 2.50

A final word

This post has been long and quite involved. I hope some of you made it this far. This isn't typical Ceylon code we're looking at here. We're starting to move into the field of metaprogramming, which is just becoming possible in Ceylon. What I'm really trying to demonstrate here is how Ceylon layers sophisticated constructs as pure syntax sugar over a relatively simple type system. I'm trying to show that you can have things like tuple and function types without defining them primitively as part of the type system, or resorting to inelegant hacks like F1, F2, F3, Tuple1, Tuple2, Tuple3, etc. And indeed it works out better this way, I believe.

Some new screenshots

If you're interested, here's some screenshots of some of the newer stuff in Ceylon IDE.

Here's what the brand new Repository Explorer view looks like:

repo-explorer

The repository explorer helps us find the module we're looking for, in the repositories configured for a project, all the way to Ceylon Herd. But you don't even usually need to use the Repository Explorer, because the IDE will propose module named and versions when editing the module descriptor.

module-completion

The actual repositories available to you can be configured in the New Ceylon Project wizard, or in the project properties:

module-repos

From the project properties page, you can also enable compilation to JavaScript:

compiler-settings.png

We've finally resolved the performance issues we were having in the IDE when writing code that depends upon the Java SDK. Of course, doc hover, autocompletion, and hyperlink navigation works even for Java declarations:

java-interop

Assertions in Ceylon

A distinguishing characteristic of Ceylon is that exceptions aren't used to represent programming errors. Well, that statement is a little vague or even over-broad, so let me make it more concrete with several examples of exceptions that I think always indicate a programming error in Java:

  • NullPointerException
  • ClassCastException
  • IndexOutOfBoundsException

The big problem with these exceptions is that they undermine the static type system.

Sure, all exceptions work around the type system—that's basically what we have exceptions for. And when you're talking about exceptions that represent transient conditions that can legitimately occur in a production system, and you're using the exception to transmit information from one part of the system (that can fail) to a completely different part of the system (that knows what to do in the case of failure) without forcing awareness of the failure onto every bit of intermediate code that occurs between these two bits of the system, then I think that's very reasonable.

But that's not the role of the exceptions above. These exceptions:

  • should never occur at runtime in a production system
  • represent problems that must be fixed by the programmer editing code
  • tend to hide the "corner" condition they represent from someone reading the code
  • are much too low-level to carry any useful information about the real problem

Instead, Ceylon tries to encode these "corner" conditions into the type system. The compiler won't let you write:

print(process.arguments[1].uppercased);

This code isn't well-typed because process.arguments[1] is of type String?, reflecting the fact that there might not be a second element in the list process.arguments.

Instead you're forced to at least take into account the possibility that there are less than two arguments:

if (exists arg = process.arguments[1]) {
    print(arg.uppercased);
}
else {
    throw Exception("missing second argument");
}

I know some of you reading this are itching to condemn Ceylon for making you write three lines of code instead of one. But, as I find myself repeating over and over, we spend much more time reading other people's code than we spend writing our own code, and the second code example makes it much clearer to the reader that there is another case that needs to be taken into account. Furthermore the exception carries much more information about what went wrong than an IndexOutOfBoundsException would.

But what if, ask my doubters, I already know that there is more than one argument? What if my code looks like this:

if (process.arguments.size>=3) {
    if (exists arg = process.arguments[1]) {
        print(arg.uppercased);
    }
    else {
        throw Exception("missing second argument");
    }
}

In this case, it's clear that the exception can never occur. I'm forced to write code that simply never executes at runtime, under any scenario!

Well, that's a great point, and it's exactly why we've introduced the assert statement in M4. An assertion failure, unlike an exception, always represents a programming error—in production code, an assertion should never fail.

Now, sure, assertion failures are represented by a kind of exception, but unlike NullPointerException or IndexOutOfBoundsException, this exception is much more likely to carry useful information about the cause of the failure. And the assertion helps document the assumptions made by the code, for the benefit of people coming along and reading it later.

We can rewrite the example above like this:

if (process.arguments.size>=3) {
    assert(exists arg = process.arguments[1]);
    print(arg.uppercased);
}

Or even like this:

value arg = process.arguments[1];
if (process.arguments.size>=3) {
    assert(exists arg);
    print(arg.uppercased);
}

You can assert an exists, nonempty, is, or boolean condition, all the same options you have with if or while.

Object person = ... ;
assert (is Person person);
print(person.name);

Note that this is a lot like a traditional Java-style typecast, but the syntax reflects a much more disciplined approach to the problem.

You're encouraged to add some extra information to document the assertion:

value arg = process.arguments[1];
if (process.arguments.size>=3) {
    doc "second argument must be provided"
    assert(exists arg);
    print(arg.uppercased);
}

The documentation appears in the assertion failure message.

This documentation becomes especially useful if start using Extract Function to refactor this code:

void printSecondArg(String? arg) {
    doc "second argument must be provided"
    assert(exists arg);
    print(arg);
}

value arg = process.arguments[1];
if (process.arguments.size>=3) {
    printSecondArg(arg);
}

In future, the Ceylon documentation tool will automatically include assertions like this about method parameter values in a list of method preconditions.

A single assert statement may assert multiple conditions, for example:

value first = process.arguments[0];
value second = process.arguments[1];
if (process.arguments.size>=3) {
    assert(exists first, exists second);
    print(first + ", " + second);
}

For the record, by popular demand, Ceylon M4 even lets us include multiple conditions in an if or while statement:

if (exists first = process.arguments[0],
    exists second = process.arguments[1]) {
    print(first + ", " + second);
}