Team blog

Compiling Ceylon to JavaScript

We've always talked about Ceylon as a language for the JVM, since that's the platform we use every day and the one we think has the most to offer for server-side computing. But it's not the only VM out there, and there is very little in the definition of Ceylon that is JVM-specific. A couple of days ago I finally found time to work on a little project that's been nagging at me for a while: compiling Ceylon to JavaScript.

Why would anyone want to do that? Well, here are a few ideas:

  • to do client-side development using a modern statically typed language,
  • to reuse server-side code on the client,
  • to run Ceylon code on node, or
  • for easy experimentation in a REPL.

I had anticipated that the language translation part of this would be a pretty easy task, and it turns out to be even easier than I had imagined. JavaScript isn't a very big language, so it took me two or three hours to re-learn it (or, more accurately, learn it properly for the first time), and I was ready to start generating code!

Some things that made this job especially easy:

  • Ceylon has well-defined semantics, set out in a written specification. This is absolutely key for any kind of multi-platform-oriented language.
  • The Ceylon compiler has a layered architecture with a well-defined API between the parser/typechecker and the backend. Indeed, the two projects are developed by completely independent teams. Therefore, adding a new backend is an easy task.
  • JavaScript lacks a native type system, and all objects are essentially just associative arrays. This makes it an especially easy target for language translation. I had not fully appreciated just how much difference this makes!
  • Ceylon and JavaScript both view a "class" as just a special kind of function. Un-shared Ceylon declarations map naturally to JavaScript's lexical scope, and shared declarations map naturally to object members. JavaScript and Ceylon have similar treatment of closure and of first-class function references.
  • Neither Ceylon nor JavaScript has overloading. Indeed, the way you do overloading in Ceylon, using union types, is a totally natural match for how the problem is solved in dynamic languages!

On the other hand, there is one area where JavaScript is extremely, embarrassingly, inadequate: modularity. After a bit of googling around, I decided we should map Ceylon packages to CommonJS modules. CommonJS is more of a server-side oriented solution, but apparently there are tools to repackage CommonJS modules for the browser. (The structure I've gone for is one script per package, grouped together in directories by module and module version.) We'll see how we go with this solution. It's certainly convenient for running Ceylon modules on node.

From the JavaScript side, you can load a Ceylon package named foo.bar.baz from version 1.0 of a module named foo.bar like this:

var foobarbaz=require('foo/bar/1.0/foo.bar.baz');

Now it's easy to instantiate a class named Counter and call its methods and attributes:

var counter = foobarbaz.Counter();
counter.inc();
console.log(counter.getCount());

I've pushed what I have so far to the new ceylon-js repository on GitHub. I've been focusing on the "hard bits" like multiple inheritance, nested classes, modularity, named argument invocations, encapsulation, etc., so what's there today is missing some of the more basic stuff like operators and control structures. And we need to reimplement the language module in JavaScript. Still, it's likely that the JavaScript backend will soon overtake the JVM backend in terms of feature completeness. Of course, I'm looking for contributors!

Last minute gifts - IDE screenshots and forums

Gavin has just created a very nice set of screenshots picturing what our IDE is capable of. Have a look: a picture is worth a thousand words, so imagine 13!

More info on the IDE

I also wanted to point out that with the release of Milestone 1, we have opened up the user-centered forum on Google Groups. Come join the fun and feel free to post any questions or comments: we want to know what you think and what we need to improve.

Happy holidays everyone :)

First official release of Ceylon

Today, we're proud to announce the release of Ceylon M1 "Newton". This is the first official release of the Ceylon command line compiler, documentation compiler, language module, and runtime, and a major step down the roadmap toward Ceylon 1.0.

You can get it here:

http://ceylon-lang.org/download

We plan a compatible M1 release of Ceylon IDE later this week.

Language features

In terms of the language itself, M1 has essentially all the features of Java except enumerated types, user-defined annotations, and reflection. It even incorporates a number of improvements over Java, including:

  • JVM-level primitive types are ordinary classes in Ceylon
  • type inference and type argument inference based on analysis of principal types
  • streamlined class definitions via elimination of getters, setters, and constructors
  • optional parameters with default values
  • named arguments and the "object builder" syntax
  • intersection types, union types, and the bottom type
  • static typing of the null value and empty sequences
  • declaration-site covariance and contravariance instead of wildcard types
  • more elegant syntax for type constraints
  • top-level function and value declarations instead of static members
  • nested functions
  • richer set of operators
  • more elegant syntax for annotations
  • immutability by default

Support for the following language features is not yet available:

  • first-class and higher-order functions
  • comprehensions
  • algebraic types, enumerated types, and switch/case
  • mixin inheritance
  • member class refinement
  • reified generics
  • user-defined annotations and the type safe metamodel

Furthermore, numeric operators are not currently optimized by the compiler, so numeric code is expected to perform poorly.

This page provides a quick introduction to the language. The draft language specification is the complete definition.

Modularity and runtime

Ceylon modules may be executed on any standard JVM. The toolset and runtime for Ceylon are based around .car module archives and module repositories. The runtime supports a modular, peer-to-peer class loading architecture, with full support for module versioning and multiple repositories.

This release of Ceylon includes support for local module repositories. Support for remote repositories and the shared community repository modules.ceylon-lang.org will be available in the next release.

Chapter 7 of the language specification contains much more information about the Ceylon module system and command line tools.

SDK

At this time, the only module available is the language module ceylon.language, included in the distribution.

Java interoperability

There are a number of issues that currently affect interoperability with Java. These issues are a top priority for the next release.

Source code

The source code for Ceylon, its specification, and its website is freely available from GitHub:

https://github.com/ceylon

Issues

Bugs and suggestions may be reported in GitHub's issue tracker.

Community

The Ceylon community site includes documentation, the current draft of the language specification, the roadmap and information about getting involved.

http://ceylon-lang.org

Acknowledgement

We're deeply indebted to the community volunteers who contributed a substantial part of the current Ceylon codebase, working in their own spare time. The following people have contributed to this release:

Stephane Epardaud, Tako Schotanus, Gary Benson, Emmanuel Bernard, Andrew Haley, Tom Bentley, Ales Justin, David Festal, Flavio Oliveri, Sergej Koshchejev, Max Rydahl Andersen, Mladen Turk, James Cobb, Ben Keating, Michael Brackx, Ross Tate, Ivo Kasiuk, Gertjan Assies, Nicolas Leroux, Julien Viet

Let it work

Hi, my name is Stéphane Épardaud and I'll be your technical writer today :)

I want to talk a bit about some of the challenges we faced in the Ceylon compiler, and the solutions we found. As described in the compiler architecture page, the backend of the Ceylon compiler extends OpenJDK's Javac compiler by translating Ceylon source code into Javac AST, which is then compiled into bytecode by Javac. Some of the reasons why we went this route of extending Javac rather than creating our own compiler from scratch are that:

  • We are guaranteed to generate valid bytecode, because it has to be valid Java code, since it's checked by Javac.
  • We can compile Java and Ceylon code at the same time, without needing to write a Java parser and compiler. (Well, this is not technically true in M1, but it will definitely be possible.)

But there are things we can't do properly in Java, and here I'm going to give you an example where we scratched our heads trying to find a proper mapping.

Attributes instead of fields

In Ceylon, we don't have Java fields, we have attributes, which are similar to JavaBean properties. This means that Ceylon attributes are translated to JavaBean getters and setters. And for interoperability we map JavaBean properties to Ceylon attributes. Now the biggest challenge with using JavaBean getter and setter methods in place of fields is that we want attributes to support the same operations you can do on Java fields, such as the ++ operation. How do we map this:

class Counter() {
    shared variable Natural n := 0;
}
Counter c = Counter();
Natural n = c.n++;

into working Java code, which looks like this (optimised for long because otherwise ++ is polymorphic):

class Counter{
    long n;
    long getN(){
        return n;
    }
    void setN(long n){
        this.n = n;
    }
}
Counter c = new Counter();
long n = c.getN()++;

Wait a minute: this is not valid!

So the problem is that there are a lot of operations you can do on l-values, that is, variables which can be assigned. To summarize the difference between l-values and r-values, the following mnemonic helps: an l-value is something which can be both assigned and read, so it can appear on the left side of an assignment, while an r-value is an expression that can only be read and not assigned. In our example, c.n is an l-value while 2 + 2 would be an r-value.

So we expect to be able to do every assignment operation on l-values, such as :=, += and ++. The problem we face is that in Java, c.n is an l-value but when using getters, c.getN() is not: it's an r-value, you can't assign to it, you can't do ++ on it. For that you need to use the setter. Now the thing is that setters in JavaBeans return void, so a call to one is not an l-value, and not even an expression that yields a value: it can only be used as a statement. And we can't put statements inside expressions. For instance we can't do:

Counter c = new Counter();
long n = c.setN(c.getN()+1);

We cannot do that because setN() returns void, so the call can only be used as a statement. Plus that would actually be an incorrect way to define ++, since we need to return the old value of n prior to the increment, so we'd need a temporary variable. The only way to have statements inside expressions in Java is to create an anonymous class:

Counter c = new Counter();
long n = new Object(){
    long postIncrement(Counter c){
        long previousValue = c.getN();
        c.setN(previousValue+1);
        return previousValue;
    }
}.postIncrement(c);

And the solutions to all the other assignment operations are similar: anonymous classes for things as trivial as ++. Surely this is crazy? If only there were some other way, short of generating bytecode ourselves (in which case we could do whatever we want without needing to make it translatable into Java).

Let it be…

So one day we're looking inside OpenJDK's Javac to try to find something, and we stumble upon a mention of a comma operator. For those who don't know C, the comma operator (,) allows you to evaluate several expressions and return the value of the right-most one.

We look at this and we think: “this can't be right, Java doesn't have the comma operator, we'd know”. So why is it there? Looking a bit more we discover that it's there to support ++ on boxed Integer values. Because this isn't a primitive operation, you need the same sort of workaround we have:

Integer i = new Integer(0);
Integer j = new Object(){
    Integer postIncrement(Integer previousValue){
        // assuming you could assign a captured variable:
        i = new Integer(previousValue.intValue() + 1);
        return previousValue;
    }
}.postIncrement(i);

So they use this operator in order to save a temporary value in an expression context, where you normally can't. And upon further examination it turns out that they (the OpenJDK Javac authors) implemented the comma operator using an even more generic expression: a let expression!

I'm very familiar with let expressions, as found in Scheme or in ML, but I'm sure many of you are not, so in short:

A let expression allows you to declare and bind new variables in a local scope, run statements and return an expression from this scope, all in the context of an expression.

So let's rewrite our previous example in pseudo-Java with let:

Integer i = new Integer(0);
Integer j = (let
              // store the previous value in a temporary variable
              Integer previousValue = i;
             in 
              // assign the new value
              i = new Integer(previousValue.intValue() + 1);
              // return the previous value
              return previousValue;);

Now, obviously this is not valid Java, because let expressions are not part of the Java language, but the OpenJDK Javac compiler uses this construct behind the scenes to rewrite parts of the Java AST into pseudo-code that can be translated into efficient bytecode in the end. All they needed was an AST node to represent it, and support for that node type in the bytecode generator.

And guess what: since we feed Java AST to Javac we can use this construct :)

In fact this is precisely how we solved most of our issues, such as the ++ operator:

Counter c = new Counter();
long n = (let
           long previousValue = c.getN();
          in
           c.setN(previousValue+1);
           return previousValue;);

This solution allows us to implement every assignment operator, such as :=, ++ or +=, on attributes that are mapped to JavaBean getter/setter methods, using efficient code.
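Because let only exists as a Javac AST node, it can't be written in Java source. As a rough illustration of what the translation computes, the sketch below mimics c.n++ and c.n += 5 with ordinary static helper methods. The LetSketch class and the helper names postIncrement and addAssign are made up for this example; this is not the code the compiler generates.

```java
// Illustrative only: plain-Java helpers that mimic the value and side effect
// of the let-based translation. All helper names here are hypothetical.
class Counter {
    private long n;
    long getN() { return n; }
    void setN(long n) { this.n = n; }
}

class LetSketch {
    // c.n++ : remember the old value, store old + 1, yield the old value.
    static long postIncrement(Counter c) {
        long previousValue = c.getN();
        c.setN(previousValue + 1);
        return previousValue;
    }

    // c.n += delta : compute the new value once, store it, yield the new value.
    static long addAssign(Counter c, long delta) {
        long newValue = c.getN() + delta;
        c.setN(newValue);
        return newValue;
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        long old = postIncrement(c);   // yields 0, the attribute is now 1
        long sum = addAssign(c, 5);    // yields 6, the attribute is now 6
        System.out.println(old + " " + sum + " " + c.getN());
    }
}
```

The difference from the real thing is that a let expression needs no extra method or class: the temporary lives directly inside the expression.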

All we needed to do was add a little support for let expressions inside Javac: since Javac never needed them this early in the AST, support was missing in one or two phases of the compiler. But that was peanuts, really.

Conclusion

When we set out to extend the Javac compiler we didn't really know what to expect, but over time we've found it has a really solid API and is very well done and documented. We were able to extend it in ways it was never imagined to be extended, and it followed along nicely. Not only that, but we found out that the OpenJDK developers, when faced with the issue of ++ on boxed Integers, didn't just hack together some quick and dirty fix: they went ahead and implemented a much more powerful and generic way to solve every similar issue, the let expression. Congratulations guys, you did good and it was worth it, because thanks to you we can implement really crazy stuff.

We're now using this let expression for implementing many operators and features, such as:

  • named parameter invocation, to keep source-file evaluation order before reordering the parameters for the callee,
  • the ?., ? and ?[] null-safe operators, to store the temporary variable before we test it for null.
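Both uses boil down to evaluating an expression exactly once into a temporary before doing something else with it. Here is a minimal plain-Java sketch of that idea, with helper methods standing in for the let expression. Every name in it (TempVarSketch, eval, join, safeLength, callWithNamedArgs) is hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TempVarSketch {
    // Records the order in which argument expressions are evaluated.
    static final List<String> log = new ArrayList<>();

    // Stand-in for an argument expression with an observable side effect.
    static String eval(String label, String value) {
        log.add(label);
        return value;
    }

    // Stand-in for a method declared with parameters (first, second).
    static String join(String first, String second) {
        return first + "-" + second;
    }

    // Null-safe access: keep the value in a temporary, then test it for null.
    static Integer safeLength(String s) {
        String tmp = s;
        return tmp == null ? null : Integer.valueOf(tmp.length());
    }

    // A named-argument call that lists `second` before `first` in the source:
    // arguments are evaluated in source order, then passed in declaration order.
    static String callWithNamedArgs() {
        String secondArg = eval("second", "b");
        String firstArg = eval("first", "a");
        return join(firstArg, secondArg);
    }

    public static void main(String[] args) {
        System.out.println(safeLength(null));     // null
        System.out.println(safeLength("abc"));    // 3
        System.out.println(callWithNamedArgs());  // a-b
        System.out.println(log);                  // [second, first]
    }
}
```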

So thanks, OpenJDK authors: thanks to you we'll have efficient compilation of Ceylon code :)

Implementing Method Invocation in ceylonc

My main involvement in the Ceylon project has been in the compiler, and within that, one of the things I've been involved with is method invocation. So I thought I'd blog about some of the details of the compiler, to show that working on it isn't that hard.

Syntactically, Ceylon has two different ways of invoking a method (and the metamodel will add a third). Positional invocation will be very familiar to a Java programmer. In Ceylon it's conceptually pretty much the same, including support for 'varargs'. In this post I'm going to go into some detail about how support for positional method invocation is implemented in the compiler. I might cover the other syntactic form, named argument invocation, at a later date.

Before we go much further I just need to define some terminology. A method is declared with zero or more parameters, the last of which may be a sequenced parameter ('varargs'):

void foo(Natural n, Integer i, String... strings) {
    // some logic
}

At a (positional) call site the method is supplied with values for each of the parameters in the declaration. These values are usually called the arguments of the method invocation.

How we generate code

In the following sections I'm going to be presenting bits of Ceylon code and the 'equivalent Java' code, but it's important to understand that the Ceylon compiler doesn't actually generate Java source code. It instead constructs an abstract syntax tree (AST) directly using the internal OpenJDK javac API. This AST is then subject to the same Java type checks as normal Java source code, before it gets converted to bytecode. You can read more about the architecture of the Ceylon compiler here.

The major benefit of piggy-backing on javac like this is that we don't have to get into the details of generating correct bytecode, such as worrying about which instruction to jump to. We can stick to higher level concepts that we're more familiar with, while we focus on actually getting something working. In the long term, it would be nice if ceylonc could be self-hosting.

Erasure

Because of the similarity with Java, supporting positional invocation in Ceylon isn't difficult: it boils down to generating a plain Java method invocation. But the two certainly aren't equivalent.

Although notionally in Ceylon 'everything is an object', the compiler is allowed (and does) optimise the numerical types (Natural, Integer, Float, Boolean) to the corresponding Java primitive type (long, long, double and boolean respectively). This means that when you write a Ceylon statement such as

Natural n = 1;

it is transformed into a Java statement like this

final long n = 1;

We call this 'erasure' (yes, I know erasure has another meaning in Java to do with the loss of generic type information, but it's the term we use).

Erasure in itself wouldn't cause a problem for method invocation because the method parameters are subject to erasure just as the method arguments are. However, sometimes we need to 'box' the primitive, just like Java does.

A good example of this is passing a Natural argument to a parameter declared Natural?. The Java method declaration must use a boxed type (Natural from the runtime), as opposed to the Java primitive (long) it would otherwise be erased to, in order to cope with the possibility of being passed a null. This means it is the compiler's responsibility to box the erased Natural (a Java long) at the call site.

Ceylon uses its own boxing classes in the runtime version of the language module. Each class implements the API of the relevant type. Because Ceylon doesn't use the same classes to box primitives as Java does we can't rely on javac's auto boxing/unboxing support. Performing this boxing correctly and exactly when and where it's needed is where method invocation starts to get a little more complex than simply being 'A Ceylon method invocation is the same as a Java method invocation'.
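To make the call-site boxing concrete, here is a small sketch in plain Java. BoxedNatural is a hypothetical, heavily simplified stand-in for the runtime's boxing class, and the method names plain and optional are invented for the example; the real runtime classes and the code the compiler emits may well look different.

```java
// Hypothetical stand-in for the runtime's boxing class; not the real API.
final class BoxedNatural {
    private final long value;

    private BoxedNatural(long value) { this.value = value; }

    static BoxedNatural instance(long value) { return new BoxedNatural(value); }

    long longValue() { return value; }
}

class BoxingSketch {
    // Parameter declared as Natural in Ceylon: erased to a Java long.
    static long plain(long n) {
        return n + 1;
    }

    // Parameter declared as Natural? in Ceylon: must be a reference type,
    // because the caller may pass null.
    static long optional(BoxedNatural n) {
        return n == null ? -1 : n.longValue() + 1;
    }

    public static void main(String[] args) {
        long n = 1;                                             // an erased Natural
        System.out.println(plain(n));                           // 2, no boxing needed
        System.out.println(optional(BoxedNatural.instance(n))); // 2, boxed at the call site
        System.out.println(optional(null));                     // -1, null is representable
    }
}
```

The explicit BoxedNatural.instance(n) at the call site is the part javac's auto-boxing can't supply, since it only knows about java.lang.Long and friends.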

Varargs

Varargs isn't implemented in terms of Java's varargs support. The reason in this case is that a Ceylon sequence is not the same thing as a Java array. So when someone declares a Ceylon method like this

void varargs<T>(T... ts) {

}

the equivalent Java looks something like this

<T> void varargs(Iterable<T> ts) {

}

Now consider the Ceylon call site

varargs<String>("foo", "bar", "baz");

When compiling this invocation we have to create a concrete instance of the Iterable<T> (using the arguments provided) to pass to the method. This is done using an ArraySequence (an implementation of a Ceylon Sequence in the runtime), so that the generated Java looks something like this

varargs(new ArraySequence("foo", "bar", "baz"));

Aside: The observant reader will realise that using varargs with erased types creates another boxing problem...
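As a runnable illustration of that transformation, the sketch below uses SimpleArraySequence, a hypothetical and much simplified stand-in for the runtime's ArraySequence, built on java.lang.Iterable rather than Ceylon's own Iterable. Its constructor happens to use Java varargs purely for convenience in the example; the point is that the method being called only ever sees a single sequence argument.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical, simplified stand-in for the runtime's ArraySequence.
final class SimpleArraySequence<T> implements Iterable<T> {
    private final T[] elements;

    @SafeVarargs
    SimpleArraySequence(T... elements) {
        this.elements = elements.clone();
    }

    @Override
    public Iterator<T> iterator() {
        return Arrays.asList(elements).iterator();
    }
}

class VarargsSketch {
    // Java shape of a Ceylon method with a sequenced parameter:
    // one parameter holding all the trailing arguments.
    static <T> String varargs(Iterable<T> ts) {
        StringBuilder sb = new StringBuilder();
        for (T t : ts) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The call site wraps the individual arguments in a sequence.
        System.out.println(varargs(new SimpleArraySequence<String>("foo", "bar", "baz")));
    }
}
```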

Conclusion

So, after all that explanation, hopefully the source code should make some kind of sense.

None of what I've discussed above should be that hard to understand for anyone who's done much Java programming. I will admit at this point that I deliberately chose something that would be familiar and where the transformations between Ceylon and Java are small. This has allowed me to focus on some of the annoying-but-necessary details that are important to understand if you're going to hack on the compiler.

The take-home message is that you really don't have to know a great deal about compilers, or even the JVM to be able to contribute something genuinely useful.

Note

Since this post was originally written:

  • the Natural type has been removed from ceylon.language.
  • ceylonc has become ceylon compile.