Team blog

Mini-status update

It's now been exactly two months since the M1 release of the Ceylon command line distribution, and more than a month since the first release of Ceylon IDE. And we've just finished our second face-to-face team meeting, in Barcelona. You might be wondering what's really been keeping us busy, so I figured it's time for a little status update.

Ceylon for the JVM and Ceylon IDE

Of course, most of our effort has been focussed on the M2 release of the command line tools and IDE. The top-priority issue for M2 was Java interop, but this turned out to be a somewhat more challenging task than we had anticipated, so there has been some shuffling of the roadmap to accommodate that. Therefore, M2 will feature five major new features:

  • Java interoperability
  • enumerated (algebraic) types
  • enhancements to the language module
  • optimization of numeric operations
  • support for remote module reposities

The IDE gets:

  • the ability to compile against binary archives, including Java .jars
  • Create Subtype and Move to New Unit wizards
  • several new quickfixes

Of course, there's many other minor bugfixes and enhancements I'm glossing over here.

We're planning to release Ceylon M2 in March.

Ceylon for JavaScript and the Ceylon Web Runner

Meanwhile, the JavaScipt compiler for Ceylon has taken off and is progressing much faster than anticipated. You can play with it online using the "teaser" pre-release of Ceylon Web Runner here:

http://try-ceylon.rhcloud.com/

(Note that the Web Runner isn't finished yet, this is just what the guys whipped over the last two weeks in between working on other stuff and eating paella.)

Since we've taken the decision to dedicate serious time and effort to this part of the project, I think we're going to need to stop calling Ceylon "a JVM-based programming language", and emphasize that it is a programming language suitable for many different virtual machines. Ceylon's minimal language module - which is being carefully designed to not be dependent on anything JVM-specific - makes the language especially adaptable to alternative platforms.

How we test Ceylon

Writing a new language is a lot about the ideas that go into the language, and then a lot about implementation.

In the case of Ceylon, a lot of effort went into thinking, discussions, negotiations, explanations, and eventually trying and documentation, in the form of a specification. Then there's work to be done on the AntLR grammar, the Abstract Syntax Tree (AST) that the parser/lexer give us, and then the typechecking phase until finally we get to the backend (backends, in our case: Java bytecode and JavaScript).

But we're not stopping at the language, we're also developing a whole SDK, with an API, tools (ceylond, ceylonc, ceylon), a module system, and an IDE.

Naturally, while we're in the process of implementing all this stuff, we spend a fair amount of time discovering new problems, new solutions, and refactoring a lot of code to make it fitter. The number of interdependent pieces in all this — and the expectation of high-quality — is such that we have to have a proper test suite for these things and, rest assured: we do. If you've ever wondered how new languages are tested, read on to see how we test the Ceylon implementation.

Type-checker and parser tests

The first tests in the tool chain concern the very early phase of compiling Ceylon: checking that good code compiles, bad code does not compile, and that the type system reasons about types the way we expect it to. We currently have 1251 such checks, quite smartly done using compiler annotations such as @error to mark an AST node where we expect an error, and @type["Foo"] to make sure the AST node is of type Foo.

Here is a brief example where we make sure we can't return a value from a void method:

void cannotReturnValue() {
    if ("Hello"=="Goodbye") {
        @error return var;
    }
}

And another example where we make sure that the type inference is correct:

@type["Sequence<String>|Sequence<Integer>"] value ut = f({ "aaa" },{ 1 });

The tests will walk the AST and when they see one of these annotations, they check that an error is reported on the node, or that the node type is as we expected.

Java bytecode generation tests

In the Java bytecode compiler backend, we generate Java bytecode, but as previously described we are plugged in the javac compiler, and we feed it (pseudo-) Java AST. This turns out to be again very useful since it's much easier to compare Java source code than byte code (for humans). So we have tests that try to compile a small Ceylon file, typically one per feature, and then we compare the Java code generated with the Java code we expect. For example, to make sure we can create a class in Ceylon, we write this test:

class Klass() {}

And we see if it generates the following Java code:

package com.redhat.ceylon.compiler.java.test.structure.klass;

class Klass {
    Klass() {
    }
}

We currently have around 250 tests like these. This might seem a small number, but it's the number of files we're comparing, so for example when we test numeric operation optimisation, we have one test with 350 lines of Ceylon numeric operator tests to be tested. We have 100% coverage of the each feature promised on the roadmap.

Of course, occasionally we have bugs for things that are corner cases, so for each bug reported we have one such test as well.

Model loader tests

The compiler is not only generating bytecode, though. It's also loading bytecode. By the truck-load. Since it's an incremental compiler, it needs to be able to load compiled Ceylon bytecode and map it into a model that the type-checker can work with. The piece that loads bytecode (using three different reflection libaries: Javac, Java Reflection and Eclipse JDT) is called the model loader.

Those get their own tests. We write two files. The first includes code we will compile:

shared class Klass() {
}

And the second will reference declarations from the first file:

Klass klass = Klass();

But because we compile the second one on its own, it will load the model for the first file from the compiled bytecode (incremental compilation). The test will then compare the model representation for Klass that we got during the first compilation (when we were compiling Klass) with the model representation we loaded for Klass during the second compilation. The first model will come directly from the typechecker after parsing the AST, while the second will come from the model loader. By walking both recursively and checking that they are completely equivalent, we ensure that we produce the right model when loading it from bytecode.

We have currently 11 such tests, but once again, do not be fooled by the low number of tests here, there's only a certain number of things we can test here: class, interface, method, attribute declarations and their signatures. We're not testing things like statements or expressions, only signatures of declarations. And for each declaration we have recursive tests that check every property of the model object (of which there are many).

Incidentally, we don't just load Ceylon bytecode, we also load Java bytecode, for interoperability with Java classes, and since the ceylon.language module is currently hand-written in Java, none of the Ceylon code could be compiled if the model loader wasn't able to load a model from its bytecode. So it gets a lot of testing everywhere.

Interoperability tests

As I mentioned previously, Ceylon is fully interoperable with Java, so we have tests that make sure that we can import Java modules, packages, types, call their methods, read their fields, implement Java interfaces, etc…

We currently have 10 tests for this, each including all the variations on a theme (static methods, fields, constructors…).

Recovery tests

We have a few tests to make sure the compiler backend doesn't crash on unparseable input, or on Ceylon code that is improperly typed.

Module system

We have 34 tests that make sure that the compiler produces .car files, source archives and MD5 checksums in the right place, that it can save and find them back locally, or via HTTP, that we can load .jar modules, and that we they contain what is expected. We also test that we can resolve dependency graphs, cache HTTP files and check the MD5 checksums.

Runtime behaviour tests

However beautiful the strategy, you should occasionally look at the results — Sir Winston Churchill

We have about 20 tests that make sure that we can invoke the Ceylon compiler, on any number of files, including incrementally. We test that we can run Ceylon programs. We also test that the runtime behaviour of statements is correct (it's not enough to check that the for loop is compiled to a certain Java bytecode, we want to make sure it runs as we think it should).

Tool tests

We have a few tests that check that ceylond (our documentation generator) is able to produce documentation, and that it produces it correctly, while using the model loader (using Java Reflection, unlike the compiler which uses Javac Reflection).

We have a few manual tests (we need to automate that) to check that our ant tasks run as expected.

ceylon.language tests

Last but not least, we have 633 runtime tests written in Ceylon that check that the ceylon.language API behaves as expected, which is the ultimate test, since it effectively requires all the pieces previously described to work in order to do anything.

We even have one test which attemps to compile the Ceylon reference implementation of ceylon.language, which the typechecker handles, but the backend doesn't yet. Once that one passes, we'll be ready for Ceylon 1.0.

About speed

I've recently had a discussion about the speed of test execution with the Ceylon team, and was shocked to discover that while I was complaining that the Java backend compiler was taking 40 seconds to run, some of my colleagues had to wait more than 2 minutes for the same tests!

We're now looking at running some of those tests in parallel to speed things up on multi-core systems, but unfortunately it doesn't look like we'll be able to run them in parallel using Eclipse. We should be able to do it using Ant, though.

On breaking tests

We break tests all the time. Most of the time when we fix a bug or add a feature we break a given number of tests. Sometimes we fix a bigger bug and break most of the tests. This is great: we can find the cause of the breakage straight away thanks to all these tests. Sometimes though, it takes a little bit of work to fix the tests we broke :)

Conclusion

We've an awful lot of tests, and we test (almost) everything. Those that are missing will get automated as soon as possible so we can forget about them, because that's what tests are for: when we fix a bug or add a feature, we know it works, and we know we didn't break anything. This is both priceless and fundamental for a new language.

Note

Since this post was originally written:

  • ceylonc has become ceylon compile.
  • ceylond has become ceylon doc.

Generics and enumerated types

In the previous post I discussed Ceylon's support for enumerated types. Looking back over the post, I notice that I forgot to mention anything about generics! Let's remedy that glaring omission!

Parameterized enumerated types

An enumerated type may be parameterized:

interface Container<Item>
    of None<Item> | One<Item> | Many<Item> { ... }

Ceylon currently requires that each case of the enumerated type completely "covers" all possible arguments to the type parameter of the enumerated type itself. So we could not write:

//bad!
interface One<Item> 
    satisfies Container<Item>
    given Item satisfies Object { ... }

By adding a new constraint on Item that is not present in Container, we've created a situation where for some values of Item, Container<Item> is well-defined, but One<Item> is an illegal type. This is currently considered an error.

One thing we are allowed to write, however, is the following:

interface Container<out Item>
    of None | One<Item> | Many<Item> { ... }

interface None 
    satisfies Container<Bottom> { ... }

Since Container<Bottom> is a subtype of Container<Item> for any value of Item, this is considered well-typed.

Removing this restriction

As you've probably guessed, this restriction is an inconvenient one. There are some very useful enumerated types where the list of cases of the enumeration depend upon the value of the type argument. It's so useful that there is a term for this construct in the literature: a generalized algebraic type, or GADT.

In the language design FAQ, I already wrote up a mini-explanation of GADT support with an example, so I won't repeat myself here. A future version of Ceylon will likely support GADTs.

Type parameters with enumerated constraints

In the previous post, I started with an example where a union type is used to emulate overloading. Alert readers might have noticed that the only reason this worked was that the "overloaded" type only appeared once in the method signature. We had something like this:

void add(Integer|Float x) { ... }

What if the overloaded type appears more than once? Well, to handle this sort of thing, we would need to introduce a type parameter:

Num increment<Num>(Num x) 
        given Num of Integer|Float {
    ...
}

The syntax given Num of Integer|Float defines a type parameter with an enumerated type constraint. The unknown type must be one of a list of types.

This looks like a very nice feature, but the truth is that Ceylon is still missing one extra thing in order for it to be truly useful. The problem is that the following code is not well-typed:

Num increment<Num>(Num x) 
        given Num of Integer|Float {
    switch (x)
    case (is Integer) { return x+1; } //error: Integer not assignable to Num
    case (is Float) { return x+1.0; } //error: Float not assignable to Num
}

The if (is ...) and case (is ...) constructs let us narrow the type of a single value. What we need here is the ability to narrow a type parameter itself, "pinning" the value of the type parameter within the body of the method, letting us write:

Num increment<Num>(Num x) 
        given Num of Integer|Float {
    switch (Num)
    case (Integer) { return x+1; }
    case (Float) { return x+1.0; }
}

Unfortunately, this functionality depends upon having fully reified type parameters, so it will have to wait until M5, later this year.

I think this is ultimately quite a powerful feature. It looks like overloading when we look at simple examples like this parameterized method, but it's actually a bit more than that. Notice that a whole type could have a parameter with an enumerated type constraint! With upper bound type constraints, generics let us abstract a parameterized declaration over all types which share a common supertype. Enumerated type constraints help us abstract over types which don't have a common ancestor, and which weren't designed to be treated polymorphically.

Enumerated types in Ceylon

We're just wrapping up the implementation of Ceylon's enumerated types (algebraic types, if you prefer that term). This feature is turning out significantly nicer than what was originally defined in the spec.

Unions and switch

The simplest kind of enumerated type is simply a union of classes. For example, the type Integer|Float is a very simple enumerated type. Since Float and Integer are distinct classes and neither inherits from the other, the compiler reasons that the types are disjoint, meaning they have no common instances. Therefore, it lets us write the following:

void add(Integer|Float x) {
    switch (x)
    case (is Integer x) { integers.add(x); }
    case (is Float x) { floats.add(x); }
}

Notice that inside the block that follows a case, the type of the switched variable is narrowed to the case type. We know that x is an Integer in the first case, and a Float in the second, so the compiler lets us make use of this additional information.

There are two basic things to appreciate about Ceylon's switch construct:

  1. The cases must be disjoint. The behavior never depends upon the order in which the cases appear.
  2. If the cases don't completely cover all possible values of the switch expression type, we need to include an else clause.

A switch statement which covers all possible values of the switched type is said to have an exhaustive list of cases.

The purpose of the required else clause is to syntactically distinguish switch statements which don't need to be fixed and recompiled when a new type is added.

void add(Integer|Float|String x) {
    switch (x)
    case (is Integer x) { integers.add(x); }
    case (is Float x) { floats.add(x); }
    else {}
}

Enumerated classes

We can extend this idea for an interface or abstract class. Ceylon lets us declare that an abstract class or interface is equivalent to a union using the of clause. For example, the root class of the type hierarchy, Void, is declared like this:

shared abstract class Void() of Nothing | Object {}

This says that the class Void has exactly the same instances as the union type Nothing|Object. So when we encounter a value of type Void, we know it must either be an instance of Object, or an instance of Nothing. We can write:

void printVoid(Void v) {
    switch (v)
    case (is Nothing) { print("nothing"); }
    case (is Object) { print(v); }
}

The compiler won't let us define a third subclass of Void that doesn't extend either Nothing or Object. We call the types mentioned in the of clause the cases of an enumerated type.

Since an object declaration is also a type, we can even include the names of objects in the of clause. For example, this is how the language module expressed the fact that the only instance of Nothing is the null value:

shared abstract class Nothing() of null {}
shared object null extends Nothing() {}

More commonly, there will be multiple values, allowing us to recapture the semantics of a Java enum, without introducing a special language construct. For example, Ceylon's Boolean is an enumerated type with two values:

shared abstract class Boolean() of true | false {}

shared object true extends Boolean() {
    shared actual String string { return "true"; }
}

shared object false extends Boolean() {
    shared actual String string { return "false"; }
}

Therefore, we can switch on a Boolean value:

void printBoolean(Boolean bool) {
    switch (bool)
    case (true) { print("true"); }
    case (false) { print("false"); }
}

Note that the classic example of a Java enumerated type works out significantly more verbose in Ceylon:

shared abstract class Suit() 
        of hearts|diamonds|clubs|spades {}
shared object hearts extends Suit() {} 
shared object diamonds extends Suit() {} 
shared object clubs extends Suit() {} 
shared object spades extends Suit() {} 

I think that's OK. The tradeoff is that enumerated types in Ceylon are simply much more powerful and flexible.

Enumerated interfaces

Enumerated interfaces are a little more special. Consider:

interface Association 
    of OneToOne | OneToMany | ManyToOne | ManyToMany { ... }

Interfaces support multiple inheritance, so in general interface types are not disjoint. But when an interface declares an enumerated list of subinterfaces, the compiler enforces that those interfaces be disjoint by rejecting any attempt to define a concrete class or object that is a subtype of more than one of them. This code is not well-typed:

class BrokenAssociation() 
    satisfies OneToOne & OneToMany {} //error: inherits multiple cases of enumerated type

Note that the types named in the of clause don't need to be direct subtypes of the enumerated type. The following code is well-typed:

interface ToOne satisfies Association {}
class OneToOne() satisfies ToOne {}
class ManyToOne() satisfies ToOne {}

The interface ToOne is a subtype of the enumerated type Association but not of any of its cases. That's acceptable, because ToOne is an abstract type that can't be directly instantiated. Of course, every one of its concrete subtypes must be a subtype of one of the enumerated cases of Association.

Handling multiple cases at once

A case of a switch can handle more than one type at once. For example, ToOne could be a case type:

Association assoc = ... ;
switch (assoc)
case (is ToOne) { ... }
else { ... }

Even better, we can form unions of the cases of the enumerated type:

Association assoc = ... ;
switch (assoc)
case (is OneToOne|ManyToOne) { ... }
case (is OneToMany|ManyToMany) { ... }

For object cases, the syntax is slightly different:

switch (suit)
case (hearts, diamonds) { return red; }
case (clubs, spades) { return black; }

How the compiler reasons about enumerated types

Now let's see something cool. When the compiler encounters an enumerated type in a switch statement, it:

  1. first expands the type to a union of its cases. And it does this recursively to any of the cases which are themselves enumerated types, and then
  2. to determine if a switch statement has an exhaustive list of case branches, takes the union of each of the case types and determines if it is a supertype of the union type produced in step 1.

Thus, the following switch is well-typed:

void printVoid(Void v) {
    switch (v)
    case (nothing) { print("nothing"); }
    case (is Object) { print(v); }
}

Likewise, remembering that Boolean? means Nothing|Boolean, we can write:

void printBoolean(Boolean? bool) {
    switch (bool)
    case (true) { print("true"); }
    case (false) { print("false"); }
    case (nothing) { print("nothing"); }
}

But what about disjointness of the case types? How is that determined? Well, first of all, the compiler is able to reason that the intersection of two types is empty if they inherit from different cases of an enumerated type. So whenever the compiler encounters a type like Nothing&Boolean, it automatically simplifies the type to Bottom.

So the disjointness restriction on cases of a switch boils down to checking that for each pair of cases, the intersection of the case types is Bottom.

A sort of manifesto

The latest generation of statically typed languages combine subtyping, type parameterization, and varying levels of support for type inference. But their type systems have - reflecting their development over time - grown so complex and riddled with corner cases and unintuitive behavior that few programmers are able to understand the reasoning of the compiler all the time. Error messages are confusing, referring to concepts that only language theorists care about, or to types which can't be written down in the language itself. The type systems are so complex that they even include corner cases which are mathematically undecidable, resulting in the possibility of compile-time stack overflows or non-termination.

For example, Java features sophisticated support for covariance and contravariance via wildcard type arguments. But most Java developers avoid making use of this extremely powerful feature of the type system, viewing it as a source of confusing error messages and syntactic ugliness.

Other languages introduce even further complexity to this mix, for example, implicit type conversions, which result in unintuitive behavior, especially when combined with generic types and covariance, or nonlocal type inference, which can result in frustratingly slow compilation times.

But the desire to combine the best of the ML family of languages with the best of the C family of languages is not, in itself, a misplaced goal. A type system featuring subtyping, type parameterization, simplified covariance and contravariance, limited type inference, and algebraic types could be a powerful tool in the hands of developers, if it:

  • could be defined in terms that reflect how programmers intuitively reason about types,
  • was packaged up in a syntax that is easy to read and understand, and
  • constrained so as to eliminate performance problems, undecidability, and confusing corner cases.

A language like this would necessarily be simpler than some other recent statically typed languages, but ultimately more powerful, if its features were more accessible to developers.

Ceylon is a language with several goals, including readability, modularity, and excellent tool support. In terms of its type system, a core goal is to provide powerful mechanisms of abstraction without scaring away the people who will ultimately benefit from this power with confusing error messages and unintuitive collisions between intersecting language features.

In pursuit of that goal, we've had to start by taking some things away. When you start using Ceylon, you'll find there are some things you can do in Java that you can't do in Ceylon. Our hope is that, as you get further into the language, you'll find yourself naturally writing better code, making use of more general-purpose abstractions and stronger type safety, without even really needing to try. It will just feel like the most natural way to do it.

At least, that's the theory. ;-)