Blog of Gavin King

New syntax for attribute initialization

I think perhaps the least ergonomic syntactic feature of Ceylon has been the syntax for attribute initialization. I never ever managed to get comfortable writing the following:

//old syntax!
class Person(String name, Integer age=0) {
    shared String name = name;
    shared variable Integer age:=age;
}

Now, I hate the verbosity of C++/Java-style constructors, but one thing they definitely have going for them is that they give you distinct namespaces for class members and constructor parameters. Ceylon's more compact syntax doesn't give you that, so we had to come up with the ad hoc rule that parameter names hide attribute names, since we didn't want people to be forced to always name parameters like initialName, initialAge, etc.

Worse, this ad hoc rule results in some fairly pathological stuff. Consider:

//old syntax!
class Person(String name, Integer age=0) {
    shared String name = name;
    shared variable Integer age:=age;
    this.age:=1;
    print(age); //would print 0 
}

So we were forced to add some extra smelly rules to let the compiler detect and prevent things like this.

Well, this solution has finally worn out its welcome. We've decided to take a different approach in M3. (Indeed, I just finished updating the language spec and type checker.)

Now, the above can be written like this:

class Person(name, age=0) {
    shared String name;
    shared variable Integer age;
}

That is, a parameter declaration can now simply be a reference to an attribute, in which case its type is inferred from the type of the attribute. The parameter itself is never addressable in the body of the class, but it's argument is automatically used to initialize the attribute. That is, there's only one unambiguous thing called age:

class Person(name, age=0) {
    shared String name;
    shared variable Integer age;
    age:=1;
    print(age); //prints 1 
}

The really nice thing about this solution is that it gives us an extremely terse and well-organized syntax for "data holder" classes.

class Address(street, city, state, zip, country) {

    doc "the street, street number, 
         and unit number"
    shared String street;

    doc "the city or municipality"
    shared String city;

    doc "the state"
    shared String state;

    doc "the zip or post code"
    shared String zip;

    doc "the country code"
    shared String country;

}

I think it would be hard to come up with a more readable syntax than this!

UPDATE: It's interesting to compare this solution to the syntax used by Dart, which is along the same lines. I swear I didn't copy!

Mini-status update

It's now been exactly two months since the M1 release of the Ceylon command line distribution, and more than a month since the first release of Ceylon IDE. And we've just finished our second face-to-face team meeting, in Barcelona. You might be wondering what's really been keeping us busy, so I figured it's time for a little status update.

Ceylon for the JVM and Ceylon IDE

Of course, most of our effort has been focussed on the M2 release of the command line tools and IDE. The top-priority issue for M2 was Java interop, but this turned out to be a somewhat more challenging task than we had anticipated, so there has been some shuffling of the roadmap to accommodate that. Therefore, M2 will feature five major new features:

  • Java interoperability
  • enumerated (algebraic) types
  • enhancements to the language module
  • optimization of numeric operations
  • support for remote module reposities

The IDE gets:

  • the ability to compile against binary archives, including Java .jars
  • Create Subtype and Move to New Unit wizards
  • several new quickfixes

Of course, there's many other minor bugfixes and enhancements I'm glossing over here.

We're planning to release Ceylon M2 in March.

Ceylon for JavaScript and the Ceylon Web Runner

Meanwhile, the JavaScipt compiler for Ceylon has taken off and is progressing much faster than anticipated. You can play with it online using the "teaser" pre-release of Ceylon Web Runner here:

http://try-ceylon.rhcloud.com/

(Note that the Web Runner isn't finished yet, this is just what the guys whipped over the last two weeks in between working on other stuff and eating paella.)

Since we've taken the decision to dedicate serious time and effort to this part of the project, I think we're going to need to stop calling Ceylon "a JVM-based programming language", and emphasize that it is a programming language suitable for many different virtual machines. Ceylon's minimal language module - which is being carefully designed to not be dependent on anything JVM-specific - makes the language especially adaptable to alternative platforms.

Generics and enumerated types

In the previous post I discussed Ceylon's support for enumerated types. Looking back over the post, I notice that I forgot to mention anything about generics! Let's remedy that glaring omission!

Parameterized enumerated types

An enumerated type may be parameterized:

interface Container<Item>
    of None<Item> | One<Item> | Many<Item> { ... }

Ceylon currently requires that each case of the enumerated type completely "covers" all possible arguments to the type parameter of the enumerated type itself. So we could not write:

//bad!
interface One<Item> 
    satisfies Container<Item>
    given Item satisfies Object { ... }

By adding a new constraint on Item that is not present in Container, we've created a situation where for some values of Item, Container<Item> is well-defined, but One<Item> is an illegal type. This is currently considered an error.

One thing we are allowed to write, however, is the following:

interface Container<out Item>
    of None | One<Item> | Many<Item> { ... }

interface None 
    satisfies Container<Bottom> { ... }

Since Container<Bottom> is a subtype of Container<Item> for any value of Item, this is considered well-typed.

Removing this restriction

As you've probably guessed, this restriction is an inconvenient one. There are some very useful enumerated types where the list of cases of the enumeration depend upon the value of the type argument. It's so useful that there is a term for this construct in the literature: a generalized algebraic type, or GADT.

In the language design FAQ, I already wrote up a mini-explanation of GADT support with an example, so I won't repeat myself here. A future version of Ceylon will likely support GADTs.

Type parameters with enumerated constraints

In the previous post, I started with an example where a union type is used to emulate overloading. Alert readers might have noticed that the only reason this worked was that the "overloaded" type only appeared once in the method signature. We had something like this:

void add(Integer|Float x) { ... }

What if the overloaded type appears more than once? Well, to handle this sort of thing, we would need to introduce a type parameter:

Num increment<Num>(Num x) 
        given Num of Integer|Float {
    ...
}

The syntax given Num of Integer|Float defines a type parameter with an enumerated type constraint. The unknown type must be one of a list of types.

This looks like a very nice feature, but the truth is that Ceylon is still missing one extra thing in order for it to be truly useful. The problem is that the following code is not well-typed:

Num increment<Num>(Num x) 
        given Num of Integer|Float {
    switch (x)
    case (is Integer) { return x+1; } //error: Integer not assignable to Num
    case (is Float) { return x+1.0; } //error: Float not assignable to Num
}

The if (is ...) and case (is ...) constructs let us narrow the type of a single value. What we need here is the ability to narrow a type parameter itself, "pinning" the value of the type parameter within the body of the method, letting us write:

Num increment<Num>(Num x) 
        given Num of Integer|Float {
    switch (Num)
    case (Integer) { return x+1; }
    case (Float) { return x+1.0; }
}

Unfortunately, this functionality depends upon having fully reified type parameters, so it will have to wait until M5, later this year.

I think this is ultimately quite a powerful feature. It looks like overloading when we look at simple examples like this parameterized method, but it's actually a bit more than that. Notice that a whole type could have a parameter with an enumerated type constraint! With upper bound type constraints, generics let us abstract a parameterized declaration over all types which share a common supertype. Enumerated type constraints help us abstract over types which don't have a common ancestor, and which weren't designed to be treated polymorphically.

Enumerated types in Ceylon

We're just wrapping up the implementation of Ceylon's enumerated types (algebraic types, if you prefer that term). This feature is turning out significantly nicer than what was originally defined in the spec.

Unions and switch

The simplest kind of enumerated type is simply a union of classes. For example, the type Integer|Float is a very simple enumerated type. Since Float and Integer are distinct classes and neither inherits from the other, the compiler reasons that the types are disjoint, meaning they have no common instances. Therefore, it lets us write the following:

void add(Integer|Float x) {
    switch (x)
    case (is Integer x) { integers.add(x); }
    case (is Float x) { floats.add(x); }
}

Notice that inside the block that follows a case, the type of the switched variable is narrowed to the case type. We know that x is an Integer in the first case, and a Float in the second, so the compiler lets us make use of this additional information.

There are two basic things to appreciate about Ceylon's switch construct:

  1. The cases must be disjoint. The behavior never depends upon the order in which the cases appear.
  2. If the cases don't completely cover all possible values of the switch expression type, we need to include an else clause.

A switch statement which covers all possible values of the switched type is said to have an exhaustive list of cases.

The purpose of the required else clause is to syntactically distinguish switch statements which don't need to be fixed and recompiled when a new type is added.

void add(Integer|Float|String x) {
    switch (x)
    case (is Integer x) { integers.add(x); }
    case (is Float x) { floats.add(x); }
    else {}
}

Enumerated classes

We can extend this idea for an interface or abstract class. Ceylon lets us declare that an abstract class or interface is equivalent to a union using the of clause. For example, the root class of the type hierarchy, Void, is declared like this:

shared abstract class Void() of Nothing | Object {}

This says that the class Void has exactly the same instances as the union type Nothing|Object. So when we encounter a value of type Void, we know it must either be an instance of Object, or an instance of Nothing. We can write:

void printVoid(Void v) {
    switch (v)
    case (is Nothing) { print("nothing"); }
    case (is Object) { print(v); }
}

The compiler won't let us define a third subclass of Void that doesn't extend either Nothing or Object. We call the types mentioned in the of clause the cases of an enumerated type.

Since an object declaration is also a type, we can even include the names of objects in the of clause. For example, this is how the language module expressed the fact that the only instance of Nothing is the null value:

shared abstract class Nothing() of null {}
shared object null extends Nothing() {}

More commonly, there will be multiple values, allowing us to recapture the semantics of a Java enum, without introducing a special language construct. For example, Ceylon's Boolean is an enumerated type with two values:

shared abstract class Boolean() of true | false {}

shared object true extends Boolean() {
    shared actual String string { return "true"; }
}

shared object false extends Boolean() {
    shared actual String string { return "false"; }
}

Therefore, we can switch on a Boolean value:

void printBoolean(Boolean bool) {
    switch (bool)
    case (true) { print("true"); }
    case (false) { print("false"); }
}

Note that the classic example of a Java enumerated type works out significantly more verbose in Ceylon:

shared abstract class Suit() 
        of hearts|diamonds|clubs|spades {}
shared object hearts extends Suit() {} 
shared object diamonds extends Suit() {} 
shared object clubs extends Suit() {} 
shared object spades extends Suit() {} 

I think that's OK. The tradeoff is that enumerated types in Ceylon are simply much more powerful and flexible.

Enumerated interfaces

Enumerated interfaces are a little more special. Consider:

interface Association 
    of OneToOne | OneToMany | ManyToOne | ManyToMany { ... }

Interfaces support multiple inheritance, so in general interface types are not disjoint. But when an interface declares an enumerated list of subinterfaces, the compiler enforces that those interfaces be disjoint by rejecting any attempt to define a concrete class or object that is a subtype of more than one of them. This code is not well-typed:

class BrokenAssociation() 
    satisfies OneToOne & OneToMany {} //error: inherits multiple cases of enumerated type

Note that the types named in the of clause don't need to be direct subtypes of the enumerated type. The following code is well-typed:

interface ToOne satisfies Association {}
class OneToOne() satisfies ToOne {}
class ManyToOne() satisfies ToOne {}

The interface ToOne is a subtype of the enumerated type Association but not of any of its cases. That's acceptable, because ToOne is an abstract type that can't be directly instantiated. Of course, every one of its concrete subtypes must be a subtype of one of the enumerated cases of Association.

Handling multiple cases at once

A case of a switch can handle more than one type at once. For example, ToOne could be a case type:

Association assoc = ... ;
switch (assoc)
case (is ToOne) { ... }
else { ... }

Even better, we can form unions of the cases of the enumerated type:

Association assoc = ... ;
switch (assoc)
case (is OneToOne|ManyToOne) { ... }
case (is OneToMany|ManyToMany) { ... }

For object cases, the syntax is slightly different:

switch (suit)
case (hearts, diamonds) { return red; }
case (clubs, spades) { return black; }

How the compiler reasons about enumerated types

Now let's see something cool. When the compiler encounters an enumerated type in a switch statement, it:

  1. first expands the type to a union of its cases. And it does this recursively to any of the cases which are themselves enumerated types, and then
  2. to determine if a switch statement has an exhaustive list of case branches, takes the union of each of the case types and determines if it is a supertype of the union type produced in step 1.

Thus, the following switch is well-typed:

void printVoid(Void v) {
    switch (v)
    case (nothing) { print("nothing"); }
    case (is Object) { print(v); }
}

Likewise, remembering that Boolean? means Nothing|Boolean, we can write:

void printBoolean(Boolean? bool) {
    switch (bool)
    case (true) { print("true"); }
    case (false) { print("false"); }
    case (nothing) { print("nothing"); }
}

But what about disjointness of the case types? How is that determined? Well, first of all, the compiler is able to reason that the intersection of two types is empty if they inherit from different cases of an enumerated type. So whenever the compiler encounters a type like Nothing&Boolean, it automatically simplifies the type to Bottom.

So the disjointness restriction on cases of a switch boils down to checking that for each pair of cases, the intersection of the case types is Bottom.

A sort of manifesto

The latest generation of statically typed languages combine subtyping, type parameterization, and varying levels of support for type inference. But their type systems have - reflecting their development over time - grown so complex and riddled with corner cases and unintuitive behavior that few programmers are able to understand the reasoning of the compiler all the time. Error messages are confusing, referring to concepts that only language theorists care about, or to types which can't be written down in the language itself. The type systems are so complex that they even include corner cases which are mathematically undecidable, resulting in the possibility of compile-time stack overflows or non-termination.

For example, Java features sophisticated support for covariance and contravariance via wildcard type arguments. But most Java developers avoid making use of this extremely powerful feature of the type system, viewing it as a source of confusing error messages and syntactic ugliness.

Other languages introduce even further complexity to this mix, for example, implicit type conversions, which result in unintuitive behavior, especially when combined with generic types and covariance, or nonlocal type inference, which can result in frustratingly slow compilation times.

But the desire to combine the best of the ML family of languages with the best of the C family of languages is not, in itself, a misplaced goal. A type system featuring subtyping, type parameterization, simplified covariance and contravariance, limited type inference, and algebraic types could be a powerful tool in the hands of developers, if it:

  • could be defined in terms that reflect how programmers intuitively reason about types,
  • was packaged up in a syntax that is easy to read and understand, and
  • constrained so as to eliminate performance problems, undecidability, and confusing corner cases.

A language like this would necessarily be simpler than some other recent statically typed languages, but ultimately more powerful, if its features were more accessible to developers.

Ceylon is a language with several goals, including readability, modularity, and excellent tool support. In terms of its type system, a core goal is to provide powerful mechanisms of abstraction without scaring away the people who will ultimately benefit from this power with confusing error messages and unintuitive collisions between intersecting language features.

In pursuit of that goal, we've had to start by taking some things away. When you start using Ceylon, you'll find there are some things you can do in Java that you can't do in Ceylon. Our hope is that, as you get further into the language, you'll find yourself naturally writing better code, making use of more general-purpose abstractions and stronger type safety, without even really needing to try. It will just feel like the most natural way to do it.

At least, that's the theory. ;-)