Blog of Gavin King

Updated language design FAQ

I just added a bunch of new material to the language design FAQ. This document is necessarily opinionated, since by nature it represents our reasons for designing this language the way we did, but I've tried to keep it as objective as possible. Of course, aesthetic considerations are always also important in the design of any language, library, or framework.

I've added several new sections about the type system, including material on:

Please help me out here: what's missing from this document? What questions do you have about Ceylon that aren't answered here? Which responses don't make sense?

Principles that guide this project

I've been trying to figure out how to explain what it is we're trying to achieve in the Ceylon language, and where Ceylon has a somewhat different emphasis to other similar languages. The following aren't language features as such, rather, they're principles that will guide the design of the whole platform. They're what we think are important in a language meant for writing large programs in teams.

Readability

We spend much more time reading other people's code than writing our own, so readability is the most important thing in a language meant for teams. A programming language is partly for communication between humans. Verbosity can sometimes contribute to readability, and it can sometimes harm it. Neither verbosity nor brevity is a goal in and of itself. What matters is striking a balance that makes difficult code understandable. The value of readability increases as the size of a team increases.

Predictability

The programmer should be able to reproduce the reasoning of the compiler according to intuitive rules. Errors produced by the compiler should be understandable to the programmer, and they should ideally identify the true cause of the problem. Therefore, given the complexity of modern statically-typed languages, it's important to fight really hard to keep the number of special cases, corner cases, and interactions between intersecting features under control. Too many corner cases result in more bugs and, just as bad or worse, more "perceived" bugs, from the perspective of the user. Predictability sometimes means sacrificing potentially useful features that exhibit unintuitive behavior in the context of the rest of the language.

Toolability

Most programmers are much more productive with the help of an IDE, especially when working with libraries or frameworks. And so anything that makes code more difficult for tools to understand (dynamic typing, XML, template technologies that mix multiple languages in a single file, etc) makes us dramatically less productive. A statically typed language is a language designed for tools. (From this point of view, a command-line compiler is just an IDE with a very primitive user interface.) The value of tooling increases as the size of a program and its dependencies increases.

Modularity

If modularity isn't built into the language and its toolset, you wind up with a mess of competing module systems that don't work together properly and that naturally grow into overcomplex monsters because they can't depend upon a simple uniform model. Modularity is also a critical concern for client-side computing, where a bloated SDK has a direct negative impact on usability. The value of modularity increases as a software ecosystem grows.

Metaprogrammability

Metaprogramming is the ability to write code that operates upon other code. It's the foundation of sophisticated frameworks and libraries. Undisciplined metaprogramming can be harmful, so ideally a language would expose facilities for metaprogramming in a disciplined, typesafe way.

Obviously, these aren't the only criteria we take into account when we design functionality for Ceylon, but they are things we're trying to especially emphasize. Of course, there's a further overriding value that's more important than any of these: Ceylon should be a language that makes it easy to get real work done.

Cross-platform reified types for Ceylon

According to the Ceylon language specification, I'm allowed to write code like this in Ceylon:

Object obj = HashSet<String&Number>();
if (is Set<String|Integer> obj) {
    for (x in obj) {
        //x is a String|Integer here
        ...
    }
}

(If you're wondering, yes, the is condition is always satisfied in this code, because HashSet is a subtype of Set, which is covariant in its type parameter, and String&Number is a subtype of String which is a subtype of String|Integer.)

Now, this code doesn't currently pass the Ceylon typechecker, because neither of the existing backends (ceylon-compiler for the JVM, and ceylon-js for JavaScript) supports reification of type arguments. Indeed, since JavaScript doesn't really have types at all, even simpler things like if (is String obj) don't yet work.

So, in order to support the functionality defined by the language specification, two things are needed:

  1. the compilers need to automatically create and pass a metamodel object that completely reifies the type of an object to each instantiation, and
  2. the runtime needs to be able to reason about the assignability of generic and union/intersection types.

So we need some kind of runtime representation of the metamodel and assignability algorithm that the typechecker uses at compile time. Actually, this will eventually evolve into Ceylon's typesafe metamodel API.
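To give a feel for what the runtime half of this involves, here is a minimal sketch of an assignability check over class, union, and intersection types. It is not the typechecker's algorithm: the type representation (cls, union, intersection) and the tiny class hierarchy are invented for illustration, and the sketch ignores type arguments, variance, and canonicalization entirely.

```javascript
// Hypothetical runtime type representations, invented for this sketch.
function cls(name, supertypes) {
    return { kind: 'class', name: name, supertypes: supertypes || [] };
}
function union(cases) { return { kind: 'union', cases: cases }; }
function intersection(parts) { return { kind: 'intersection', parts: parts }; }

// Returns true if a value of type a may be assigned to type b.
// Simplified: no generics, no variance, no disjointness reasoning.
function isSubtype(a, b) {
    if (a.kind === 'union') {
        // A|B is assignable to T only if both A and B are.
        return a.cases.every(function (c) { return isSubtype(c, b); });
    }
    if (b.kind === 'intersection') {
        // T is assignable to A&B only if it is assignable to both.
        return b.parts.every(function (p) { return isSubtype(a, p); });
    }
    if (b.kind === 'union' &&
            b.cases.some(function (c) { return isSubtype(a, c); })) {
        // T is assignable to A|B if it is assignable to either case.
        return true;
    }
    if (a.kind === 'intersection') {
        // A&B is assignable to T if either part is.
        return a.parts.some(function (p) { return isSubtype(p, b); });
    }
    if (a.kind === 'class' && b.kind === 'class') {
        // Walk up the declared supertypes.
        return a.name === b.name ||
            a.supertypes.some(function (s) { return isSubtype(s, b); });
    }
    return false;
}

// An assumed, much simplified hierarchy, for illustration only.
var tObject = cls('Object');
var tString = cls('String', [tObject]);
var tNumber = cls('Number', [tObject]);
var tInteger = cls('Integer', [tNumber]);

// String&Number is assignable to String|Integer, as in the example above.
var narrow = intersection([tString, tNumber]);
var wide = union([tString, tInteger]);
var assignable = isSubtype(narrow, wide);
```

Even this toy version shows why the order of the cases matters: unions on the left and intersections on the right have to be decomposed before anything else is tried.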

Unfortunately we can't easily just reuse the code we already have in the typechecker, because:

  • it's deeply interdependent with a lot of other typechecker functionality, and
  • it's implemented in Java, and therefore can't run inside a JavaScript VM.

Obviously, we need to trim down this code, and rewrite it in Ceylon, where it can be easily compiled to either Java classes or JavaScript code, or to whatever other runtime Ceylon eventually supports.

I had originally imagined that this stuff would be part of the language module. But now I've started seriously taking into account the web browser as a platform for Ceylon, I realize that we need to keep the language module really tiny. So this will be a separate module.

That sounds to me like a super-fun and interesting project for someone to take on. Determining assignability in Ceylon is a very interesting problem, with things like covariance, contravariance, intersections and unions to take into account. Indeed, there are some very interesting identities involving variance, unions/intersections, and the single instantiation inheritance property. I already have an existing algorithm in the typechecker, of course, which works mainly via canonicalization of principal types, but Ross has suggested that he has a better algorithm that avoids the need for canonicalization, which sounds like it might work better at runtime.

Any takers?

Prototypes vs lexical scope in the Ceylon JavaScript compiler

In the previous post I introduced the Ceylon JavaScript compiler project. One of the things I mentioned was that there was an extremely natural mapping from Ceylon to JavaScript making use of JavaScript's lexical scope to create nicely encapsulated JavaScript objects. For example, given the following Ceylon code:

shared class Counter(Integer initialCount=0) {
    variable value currentCount:=initialCount;
    shared Integer count {
        return currentCount;
    }
    shared void inc() {
        currentCount:=currentCount+1; 
    }
}

We produce the following JavaScript:

var $$$cl15=require('ceylon/language/0.1/ceylon.language');

//class Counter at members.ceylon (1:0-9:0)
this.Counter=function Counter(initialCount){
    var $$counter=new CeylonObject;

    //value currentCount at members.ceylon (2:4-2:45)
    var $currentCount=initialCount;
    function getCurrentCount(){
        return $currentCount;
    }
    function setCurrentCount(currentCount){
        $currentCount=currentCount;
    }

    //value count at members.ceylon (3:4-5:4)
    function getCount(){
        return getCurrentCount();
    }
    $$counter.getCount=getCount;

    //function inc at members.ceylon (6:4-8:4)
    function inc(){
        setCurrentCount(getCurrentCount().plus($$$cl15.Integer(1)));
    }
    $$counter.inc=inc;

    return $$counter;
}

Notice that this code is really quite readable and really not very different to the original Ceylon.

Let's load this module up in the node REPL, and play with the Counter.

> Counter = require('./node_modules/default/members').Counter
[Function: Counter]
> Integer = require('./runtime/ceylon/language/0.1/ceylon.language').Integer
[Function: Integer]
> c = Counter(Integer(0))
{ getCount: [Function: getCount], inc: [Function: inc] }

The Counter instance presents a nice clean API with getCount() and inc() functions:

> c.getCount().value
0
> c.inc()
> c.inc()
> c.getCount().value
2

Notice that the actual value of $currentCount is completely hidden from the client JavaScript code. Another nice thing about this mapping is that it is completely free of JavaScript's well-known broken this. I can freely use the methods of c by reference:

> inc = c.inc
[Function: inc]
> count = c.getCount
[Function: getCount]
> inc()
> count().value
3

Now, an issue that was bugging me about this mapping - and bugging Ivo even more - is its performance cost compared to statically binding the methods of a class to its prototype. Ivo did some tests and found that, on V8, it's up to 100 times slower to instantiate an object that defines its methods in lexical scope than one that uses its prototype. Well, that's not really acceptable in production, so I've added a switch that generates code that makes use of prototypes. With this switch enabled, the compiler generates the following for the same Ceylon code:

var $$$cl15=require('ceylon/language/0.1/ceylon.language');

//ClassDefinition Counter at members.ceylon (1:0-12:0)
function $Counter(){}

//AttributeDeclaration currentCount at members.ceylon (2:4-2:45)
$Counter.prototype.getCurrentCount=function getCurrentCount(){
    return this.currentCount;
}
$Counter.prototype.setCurrentCount=function setCurrentCount(currentCount){
    this.currentCount=currentCount;
}

//AttributeGetterDefinition count at members.ceylon (3:4-5:4)
$Counter.prototype.getCount=function getCount(){
    return this.getCurrentCount();
}

//MethodDefinition inc at members.ceylon (6:4-8:4)
$Counter.prototype.inc=function inc(){
    this.setCurrentCount(this.getCurrentCount().plus($$$cl15.Integer(1)));
}

this.Counter=function Counter(initialCount){
    var $$counter=new $Counter;
    $$counter.initialCount=initialCount;        
    return $$counter;
}

Clearly this code is a bit harder to understand than what we started with. It's also a lot uglier in the REPL:

> c = Counter(Integer(0))
{ initialCount: { value: 0, ... } }

Notice that the internal state of the object is now exposed to clients. And all its operations - held on the prototype - are also accessible, even the non-shared operations. Finally, JavaScript's this bug is back:

> inc = c.inc
[Function: inc]
> inc()
TypeError: Object #<error> has no method 'getCurrentCount'
    at inc (/Users/gavin/ceylon-js/build/test/node_modules/default/members.js:21:31)
    ...

We have to use the following ugly workaround:

> inc = function(){c.inc.apply(c,arguments)}
> inc()
> c.getCount().value
1

(Of course, the compiler automatically inserts these wrapper functions when you write a function reference at the Ceylon level.)
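The difference between the two mappings can be reproduced without the Ceylon runtime. The following is a hand-written sketch, not compiler output: ClosureCounter, ProtoCounter, and callDetached are hypothetical names, plain JavaScript numbers stand in for Ceylon's Integer, and the prototype methods are declared strict so that a detached call fails fast instead of silently touching the global object. It also uses the standard Function.prototype.bind as a shorter alternative to the hand-written wrapper.

```javascript
// Lexical-scope mapping: state lives in a closure, methods never use "this".
function ClosureCounter(initialCount) {
    var currentCount = initialCount;
    return {
        getCount: function () { return currentCount; },
        inc: function () { currentCount = currentCount + 1; }
    };
}

// Prototype mapping: state lives on the object, methods depend on "this".
function ProtoCounter(initialCount) {
    this.currentCount = initialCount;
}
ProtoCounter.prototype.getCount = function () {
    'use strict';
    return this.currentCount;
};
ProtoCounter.prototype.inc = function () {
    'use strict';
    this.currentCount = this.currentCount + 1;
};

// Detach inc from its object, call it, and report what happened.
function callDetached(counter) {
    var inc = counter.inc;
    try {
        inc();
        return 'ok';
    } catch (e) {
        return e instanceof TypeError ? 'TypeError' : 'other error';
    }
}

var closureResult = callDetached(ClosureCounter(0));
var protoResult = callDetached(new ProtoCounter(0));

// bind fixes "this" once, so the reference can be passed around safely.
var b = new ProtoCounter(0);
var boundInc = b.inc.bind(b);
boundInc();
var boundCount = b.getCount();
```

The closure version pays for this convenience by creating fresh function objects for every instance, which is where the instantiation cost measured above comes from.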

Personally, I don't really see why the JavaScript interpreter in V8 could not in principle internally optimize our original code to something more like our "optimized" code. I think it would make JavaScript a much more pleasant language to deal with if there weren't such a big difference in performance there.

Anyway, if you're producing your JavaScript by writing Ceylon, this is now just a simple compiler switch :-)

Compiling Ceylon to JavaScript

We've always talked about Ceylon as a language for the JVM, since that's the platform we use every day and the one we think has the most to offer for server-side computing. But it's not the only VM out there, and there is very little in the definition of Ceylon that is JVM-specific. A couple of days ago I finally found time to work on a little project that's been nagging at me for a while: compiling Ceylon to JavaScript.

Why would anyone want to do that? Well, here are a few ideas:

  • to do client side development using a modern statically typed language,
  • to reuse server-side code on the client,
  • to run Ceylon code on node, or
  • for easy experimentation in a REPL.

I had anticipated that the language translation part of this would be a pretty easy task, and it turns out to be even easier than I had imagined. JavaScript isn't a very big language, so it took me two or three hours to re-learn it (or, more accurately, learn it properly for the first time), and I was ready to start generating code!

Some things that made this job especially easy:

  • Ceylon has well-defined semantics defined in a written specification. This is absolutely key for any kind of multi-platform-oriented language.
  • The Ceylon compiler has a layered architecture with a well-defined API between the parser/typechecker and the back end. Indeed, the two projects are developed by completely independent teams. Therefore, adding a new backend is an easy task.
  • JavaScript lacks a native type system, and all objects are essentially just associative arrays. This makes it an especially easy target for language translation. I had not fully appreciated just how much difference this makes!
  • Ceylon and JavaScript both view a "class" as just a special kind of function. Un-shared Ceylon declarations map naturally to JavaScript's lexical scope, and shared declarations map naturally to object members. JavaScript and Ceylon have similar treatment of closure and of first-class function references.
  • Neither Ceylon nor JavaScript has overloading. Indeed, the way you do overloading in Ceylon, using union types, is a totally natural match for how the problem is solved in dynamic languages!
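The last point is easy to picture from the JavaScript side. Here is a small hand-written sketch, not compiler output, of how a single function standing in for a Ceylon function with a String|Integer parameter would look in a dynamic language. The function name describe is hypothetical, and plain JavaScript strings and numbers stand in for the Ceylon types.

```javascript
// One function, one name: the body narrows the argument at runtime,
// much as a Ceylon "if (is String x)" would narrow a String|Integer.
function describe(x) {
    if (typeof x === 'string') {
        return 'string of length ' + x.length;
    }
    if (typeof x === 'number') {
        return 'number with value ' + x;
    }
    // In Ceylon the typechecker would rule this case out at compile time.
    throw new TypeError('expected a string or a number');
}

var fromString = describe('abc');
var fromNumber = describe(42);
```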

On the other hand, there is one area where JavaScript is extremely, embarrassingly, inadequate: modularity. After a bit of googling around, I decided we should map Ceylon packages to CommonJS modules. CommonJS is more of a server-side oriented solution, but apparently there are tools to repackage CommonJS modules for the browser. (The structure I've gone for is one script per package, grouped together in directories by module and module version.) We'll see how we go with this solution. It's certainly convenient for running Ceylon modules on node.

From the JavaScript side, you can load a Ceylon package named foo.bar.baz from version 1.0 of a module named foo.bar like this:

var foobarbaz=require('foo/bar/1.0/foo.bar.baz');
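The path follows mechanically from the layout described above: the module name becomes a directory path, followed by the version, followed by one script named after the package. A minimal sketch of that mapping, assuming nothing beyond what the examples in these posts show (packagePath is a hypothetical helper, not part of ceylon-js):

```javascript
// Builds the CommonJS path for a Ceylon package, given the name and
// version of the module that contains it. Hypothetical helper.
function packagePath(moduleName, moduleVersion, packageName) {
    // foo.bar -> foo/bar
    var moduleDir = moduleName.split('.').join('/');
    // One script per package, inside the module's version directory.
    return moduleDir + '/' + moduleVersion + '/' + packageName;
}

var bazPath = packagePath('foo.bar', '1.0', 'foo.bar.baz');
var languagePath = packagePath('ceylon.language', '0.1', 'ceylon.language');
```

The second call reproduces the path that the generated code uses to load the language module.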

Now it's easy to instantiate a class named Counter and call its methods and attributes:

var counter = foobarbaz.Counter();
counter.inc();
console.log(counter.getCount());

I've pushed what I have so far to the new ceylon-js repository on GitHub. I've been focusing on the "hard bits" like multiple inheritance, nested classes, modularity, named argument invocations, encapsulation, etc., so what's there today is missing some of the more basic stuff like operators and control structures. And we need to reimplement the language module in JavaScript. Still, it's likely that the JavaScript backend will soon overtake the JVM backend in terms of feature completeness. Of course, I'm looking for contributors!