Blog tagged compiler

Ceylon on Java 9 + Jigsaw

Everyone is talking about modules these days. New languages try to incorporate them, and older languages try to retrofit them. Which is great news, because modules are essential. Java 9 is around the corner (it's supposed to come out next year), and its really big new feature is modularity, in the form of the Jigsaw project.

Ceylon is a language that featured modularity from the start, as part of the language and not as an afterthought requiring complex third-party tool integration. In fact, when we designed our JDK integration (back in the days of Java 7), we went as far as using the planned Jigsaw modularisation of the JDK (yes, Jigsaw got delayed a few times) from the start, requiring JDK users to import Jigsaw modules as they were planned at the time, rather than import the whole JDK in one go. So perhaps we were the first ones with a modular JDK, in some sense :)

Java 9’s Jigsaw

Jigsaw is a very large project, which includes the following changes:

  • Modularisation of the JDK into smaller units, such as java.base and java.xml, which Ceylon users of the JDK are already familiar with.
  • This modularisation means the removal of rt.jar, which contained every JDK class. It has been replaced by a bootmodules.jimage file, which is not a jar, but whose contents can be accessed through a virtual NIO FileSystem at jrt:/.
  • You can write your own modules. To turn your Java code into a Java 9 module, you simply add a module descriptor in a file called module-info.java (much like Ceylon module descriptors, or Java package descriptors) that describes your module. The Java 9 compiler and jar tools will then generate a jar with a module-info.class descriptor at its root.
  • That module descriptor allows you to specify the module name, the packages it exports, the names of the modules it imports, and a few other things. Unfortunately, it does not let you specify versions, which are currently "out of scope" for Java 9.
  • You can run your code from the classpath as before, or as modules from the module path. The module path is just a folder in which you place your modules, and the JRE will look them up for you based on module name alone.
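To make these points concrete, here is a minimal sketch against the final Java 9+ `java.lang.module` and `jrt:/` APIs, which may differ from the early-access builds discussed in this post. It reads a JDK class file through the `jrt:/` file system instead of rt.jar, looks up a system module purely by name, and builds in memory the descriptor that a small `module-info.java` would compile to. The module name `com.example.greeter` and its package are hypothetical.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class Java9ModulesDemo {

    // rt.jar is gone: JDK class files are reached through the jrt:/ NIO file system.
    static boolean jdkClassExists(String module, String classFile) {
        FileSystem jrt = FileSystems.getFileSystem(URI.create("jrt:/"));
        Path path = jrt.getPath("/modules", module, classFile);
        return Files.exists(path);
    }

    // Modules are found by name alone; no version takes part in the lookup.
    static boolean systemModuleExists(String name) {
        return ModuleFinder.ofSystem().find(name).isPresent();
    }

    // In-memory equivalent of this hypothetical module-info.java:
    //
    //   module com.example.greeter {
    //       requires java.xml;
    //       exports com.example.greeter;
    //   }
    static ModuleDescriptor greeter() {
        return ModuleDescriptor.newModule("com.example.greeter")
                .requires("java.xml")           // dependency by name, without a version
                .exports("com.example.greeter") // only exported packages are visible to other modules
                .build();                       // "requires java.base" is added implicitly
    }

    public static void main(String[] args) {
        System.out.println(jdkClassExists("java.base", "java/lang/Object.class"));
        System.out.println(systemModuleExists("java.xml"));
        ModuleDescriptor d = greeter();
        System.out.println(d.name() + " version: " + d.version());
    }
}
```

Note that the descriptor's `version()` is empty: nothing in the source descriptor declares one.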

Ceylon and Jigsaw

Java 9 has two early-access (EA) downloads that let users try the module system, but only one of them supports user modules. Make sure you use that one if you want to try out Ceylon running on Java 9.

Over the past weeks I've worked on getting Ceylon compiling and running on Java 9. This involved (among other details) the following things:

  • Generating module-info.class files from Ceylon module descriptors.
  • Generating module-info.class files for the Ceylon distribution modules which are not written in Ceylon (like the compilers or runtime system).
  • Making use of the Java 9 module descriptors for the shared-package information they contain (something supported by Ceylon since the beginning, but which was lacking for plain Java jars).
  • Backporting Java 9 code that deals with modules to the javac fork we use to compile Java files and generate bytecode.
  • Dealing with the removal of rt.jar and the boot classpath.
  • Creating a new tool ceylon jigsaw which allows for the creation of a Java 9 module path.
  • Making sure we can run Ceylon modules as Java 9 modules, as an alternative to the four existing JVM runtimes: JBoss Modules, classpath, OSGi and Java EE.
  • Making sure we can build and run on any of Java 7, 8 and 9. This means that by default we do not generate Java 9 module descriptors, because several tools have problems dealing with them at this time.
  • Splitting some things out of the ceylon.language module so that it no longer depends on the compilers and type-checker. This means a lighter minimal runtime, which will be improved even further in the coming weeks with more dependency removals :)

Just tell me how to try this!

I will spare you the many details of this work, but with help from the Java 9 team, this is how you can run your Ceylon modules on a Java 9 runtime:

  • Download the Java 9 EA with Jigsaw.
  • Get the Ceylon distribution code, and compile it with ant -Djigsaw=true clean dist to get the Java 9 module descriptors.
  • Write your Ceylon module normally, but compile it with .../ceylon/dist/dist/bin/ceylon compile --generate-module-info to generate the Java 9 module descriptors.
  • Create your Java 9 module path in an mlib folder with .../ceylon/dist/dist/bin/ceylon jigsaw create-mlib my.module/1.
  • Run your Ceylon module on Java 9 with .../jdk1.9.0-jigsaw/bin/java -mp mlib -m ceylon.language my.module/1. At the moment, the ceylon.language module acts as main module and does the required setting up of the Ceylon runtime before loading and invoking your Ceylon module.

That's all there is to it!

Caveats

Java 9 is not complete yet, and our support for Java 9 is also not complete. There will be issues and bugs, and in fact we already know of several limitations, such as the following:

  • You can import a pure Java 9 module from Ceylon, and we will respect its exported packages, but not its dependencies, because Java 9 module descriptors do not include dependency versions. In fact, even the module's own version is not stored in the source module descriptor, but added by an optional flag to the Java 9 jar tool. Ceylon requires module dependencies to specify a version, so we have to combine the Java 9 module descriptor with another descriptor, such as an OSGi descriptor or a Maven pom.xml descriptor. This merging of information is not currently done.
  • Java 9 does not currently support optional modules or module cycles, and unfortunately it is not clear at this time whether it will.
  • The ceylon import-jar tool may complain about module visibility artifacts. We intend to fix this in time, but for now you can use --force.
  • The JDK module list we used in Ceylon has changed slightly in Java 9. This is what we get for being the first to support Jigsaw ;) For example, the javax.xml module has been renamed to java.xml. We have set up aliases so that it "just" works, but some modules have been merged and some packages have moved to a different module, so it will not always work.
  • The Java 9 runtime has been tested, but not as thoroughly as the existing JBoss Modules, classpath, OSGi or Java EE runtimes. We expect a few issues in the Ceylon metamodel.

Let it work

Hi, my name is Stéphane Épardaud and I'll be your technical writer today :)

I want to talk a bit about some of the challenges we faced in the Ceylon compiler, and the solutions we found. As described on the compiler architecture page, the backend of the Ceylon compiler extends OpenJDK's Javac compiler by translating Ceylon source code into Javac AST, which Javac then compiles into bytecode. Some of the reasons why we chose to extend Javac rather than create our own compiler from scratch are that:

  • We are guaranteed to generate valid bytecode, because it has to be valid Java code, since it's checked by Javac.
  • We can compile Java and Ceylon code at the same time, without needing to write a Java parser and compiler. (Well, this is not technically true in M1, but it will definitely be possible.)

But there are things we can't do properly in Java, and here I'm going to give you an example where we scratched our heads trying to find a proper mapping.

Attributes instead of fields

In Ceylon, we don't have Java fields; we have attributes, which are similar to JavaBean properties. This means that Ceylon attributes are translated to JavaBean getters and setters, and, for interoperability, we map JavaBean properties to Ceylon attributes. Now, the biggest challenge with using JavaBean getter and setter methods in place of fields is that we want attributes to support the same operations you can do on Java fields, such as the ++ operation. How do we map this:

class Counter() {
    shared variable Natural n := 0;
}
Counter c = Counter();
Natural n = c.n++;

Into Java code that looks like this (optimised to use long, because otherwise ++ is polymorphic):

class Counter{
    long n;
    long getN(){
        return n;
    }
    void setN(long n){
        this.n = n;
    }
}
Counter c = new Counter();
long n = c.getN()++;

Wait a minute: this is not valid!

So the problem is that there are a lot of operations you can do on l-values, that is, variables which can be assigned. To summarize the difference between l-values and r-values, the following mnemonic helps: an l-value is something which can be both assigned and read, so it can appear on the left side of an assignment, while an r-value is an expression that can only be read, not assigned. In our example, c.n is an l-value, while 2 + 2 would be an r-value.

So we expect to be able to do every assignment operation on l-values, such as :=, += and ++. The problem we face is that in Java, c.n is an l-value, but when using getters, c.getN() is not: it's an r-value, so you can't assign to it or apply ++ to it. For that you need the setter. The thing is that JavaBean setters return void, so a setter call is neither a usable expression nor an l-value: it can only be used as a statement. And we can't put statements inside expressions. For instance, we can't do:

Counter c = new Counter();
long n = c.setN(c.getN()+1);

We cannot do that because setN() returns void, so calling it can only be a statement. Besides, that would actually be an incorrect way to define ++, since we need to return the old value of n prior to the increment, so we'd need a temporary variable. The only way to have statements inside expressions in Java is to create an anonymous class:

Counter c = new Counter();
long n = new Object(){
    long postIncrement(Counter c){
        long previousValue = c.getN();
        c.setN(previousValue+1);
        return previousValue;
    }
}.postIncrement(c);

The solutions for all the other assignment operations are similar: anonymous classes for things as trivial as ++. Surely this is crazy? If only there were some other way, short of generating bytecode ourselves (in which case we could do whatever we want, without needing to make it translatable into Java).
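For reference, here is a self-contained, compilable version of that workaround, also covering += to show that the other assignment operators follow the same pattern. The class and method names (`AnonymousPostIncrement`, `postIncrement`, `addAssign`) are illustrative, not what the Ceylon compiler emits.

```java
public class AnonymousPostIncrement {

    // JavaBean-style stand-in for the translated Ceylon Counter.
    static class Counter {
        private long n;
        long getN() { return n; }
        void setN(long n) { this.n = n; }
    }

    // c.n++ : the anonymous class gives us a scope for statements and a temporary,
    // and returns the old value after storing old + 1.
    static long postIncrement(Counter counter) {
        return new Object() {
            long apply(Counter c) {
                long previousValue = c.getN();
                c.setN(previousValue + 1);
                return previousValue;
            }
        }.apply(counter);
    }

    // c.n += delta : same trick, but the expression's value is the new value.
    static long addAssign(Counter counter, long delta) {
        return new Object() {
            long apply(Counter c, long d) {
                long newValue = c.getN() + d;
                c.setN(newValue);
                return newValue;
            }
        }.apply(counter, delta);
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        long n = postIncrement(c);
        System.out.println(n + " " + c.getN()); // prints "0 1"
        long m = addAssign(c, 5);
        System.out.println(m + " " + c.getN()); // prints "6 6"
    }
}
```

It works, but every single ++ pays for an extra class and an object allocation, which is exactly what makes this approach unattractive.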

Let it be…

So one day we're looking inside OpenJDK's Javac for something, and we stumble upon a mention of a comma operator. For those who don't know C, the comma operator (,) evaluates several expressions in order and returns the value of the right-most one.

We look at this and we think: “this can't be right, Java doesn't have the comma operator, we'd know”. So why is it there? Looking a bit more, we discover that it's there to support ++ on boxed Integer values. Because this isn't a primitive operation, you need the same sort of workaround we have:

Integer i = new Integer(0);
Integer j = new Object(){
    Integer postIncrement(Integer previousValue){
        // assuming you could assign a captured variable:
        i = new Integer(previousValue.intValue() + 1);
        return previousValue;
    }
}.postIncrement(i);

So they use this operator in order to save a temporary value in an expression context, where you normally can't. And upon further examination it turns out that they (the OpenJDK Javac authors) implemented the comma operator using an even more generic expression: a Let expression!

I'm very familiar with let expressions, as found in Scheme or ML, but I'm sure many of you are not, so in short:

A let expression allows you to declare and bind new variables in a local scope, run statements and return an expression from this scope, all in the context of an expression.

So let's rewrite our previous example in pseudo-Java with let:

Integer i = new Integer(0);
Integer j = (let
              // store the previous value in a temporary variable
              Integer previousValue = i;
             in 
              // assign the new value
              i = new Integer(previousValue.intValue() + 1);
              // return the previous value
              return previousValue;);

Now, obviously this is not valid Java, because let expressions are not part of the Java language, but the OpenJDK Javac compiler uses this construct behind the scenes to rewrite parts of the Java AST into pseudo-code that can be translated into efficient bytecode in the end. All they needed was an AST node to represent it, and support for that node type in the bytecode generator.

And guess what: since we feed Java AST to Javac we can use this construct :)

In fact this is precisely how we solved most of our issues, such as the ++ operator:

Counter c = new Counter();
long n = (let
           long previousValue = c.getN();
          in
           c.setN(previousValue+1);
           return previousValue;);

This lets us implement every assignment operator, such as :=, ++ or +=, on attributes that are mapped to JavaBean getter/setter methods, using efficient code.
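Source-level Java has no let expression, but its semantics can be approximated with a generic helper that binds a value and evaluates a lambda body with it, all as a single expression. This is only a sketch for experimenting, assuming Java 8+: the `let` helper and the `LetSketch` class are hypothetical, and unlike Javac's internal let node, which the bytecode generator handles directly, this version costs a method call and a lambda at runtime.

```java
import java.util.function.Function;

public class LetSketch {

    // Hypothetical helper approximating (let x = value in body): binds value, evaluates body.
    static <T, R> R let(T value, Function<T, R> body) {
        return body.apply(value);
    }

    // JavaBean-style stand-in for the translated Ceylon Counter.
    static class Counter {
        private long n;
        long getN() { return n; }
        void setN(long n) { this.n = n; }
    }

    // c.n++  ~  (let long previousValue = c.getN() in c.setN(previousValue + 1); previousValue)
    static long postIncrement(Counter c) {
        return let(c.getN(), previousValue -> {
            c.setN(previousValue + 1);
            return previousValue;
        });
    }

    // c.n += delta  ~  (let long newValue = c.getN() + delta in c.setN(newValue); newValue)
    static long addAssign(Counter c, long delta) {
        return let(c.getN() + delta, newValue -> {
            c.setN(newValue);
            return newValue;
        });
    }

    // c.n := value  ~  (let long v = value in c.setN(v); v)
    static long assign(Counter c, long value) {
        return let(value, v -> {
            c.setN(v);
            return v;
        });
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        long old = postIncrement(c);
        long sum = addAssign(c, 5);
        long set = assign(c, 42);
        System.out.println(old + " " + sum + " " + set + " " + c.getN()); // prints "0 6 42 42"
    }
}
```

Each operator becomes one expression: bind a temporary, call the setter, yield the right value.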

All we needed to do was add a bit of support for let expressions inside Javac: since Javac itself never needed to create them that early in the AST, one or two phases of the compiler lacked support for them. But that was peanuts, really.

Conclusion

When we set out to extend the Javac compiler we didn't really know what to expect, but over time we've found that it has a really solid API and is very well done and documented. We were able to extend it in ways it was never imagined to be extended, and it followed along nicely. Not only that, but we found out that the OpenJDK developers, when faced with the issue of ++ on boxed Integers, didn't just hack together some quick and dirty fix: they went ahead and implemented a much more powerful and generic solution to every similar issue, the let expression. Congratulations guys, you did good and it was worth it, because thanks to you we can implement really crazy stuff.

We're now using this let expression for implementing many operators and features, such as:

  • named parameter invocation, to keep source-file evaluation order before reordering the parameters for the callee,
  • the ?., ? and ?[] null-safe operators, to store the temporary variable before we test it for null.
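The same hypothetical `let` helper can illustrate both uses in this list, assuming Java 8+. For an illustrative named invocation `foo { i = g(); n = f(); }` against a declaration `foo(Natural n, Integer i)`, the arguments are bound in source order and then passed in declaration order; for a null-safe access on a possibly-null String, the receiver is bound to a temporary so it is evaluated only once before the null test. All names are hypothetical, and the sketch records evaluation order so it can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LetUsesSketch {

    // Same hypothetical let helper: binds value, then evaluates body with it.
    static <T, R> R let(T value, Function<T, R> body) {
        return body.apply(value);
    }

    // Records the order in which argument and receiver expressions are evaluated.
    static final List<String> evaluated = new ArrayList<>();

    static long f() { evaluated.add("f"); return 1; }
    static long g() { evaluated.add("g"); return 2; }

    // Declared with parameters in the order (n, i).
    static long foo(long n, long i) { return n * 10 + i; }

    // Named invocation  foo { i = g(); n = f(); }:
    // bind arguments in source order (g, then f), then pass them in declaration order.
    static long namedInvocation() {
        return let(g(), i -> let(f(), n -> foo(n, i)));
    }

    static String receiver(String s) { evaluated.add("receiver"); return s; }

    // Null-safe access: the receiver expression is evaluated exactly once
    // into a temporary, which is then tested for null.
    static Integer nullSafeLength(String s) {
        return let(receiver(s), tmp -> tmp == null ? null : tmp.length());
    }

    public static void main(String[] args) {
        System.out.println(namedInvocation() + " " + evaluated); // prints "12 [g, f]"
        System.out.println(nullSafeLength(null) + " " + nullSafeLength("abc")); // prints "null 3"
    }
}
```

Without the temporary, a naive `receiver(s) == null ? null : receiver(s).length()` would evaluate the receiver twice, which is wrong whenever it has side effects.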

So thanks, OpenJDK authors: thanks to you we'll have efficient compilation of Ceylon code :)

Implementing Method Invocation in ceylonc

My main involvement in the Ceylon project has been in the compiler, and within that, one of the things I've worked on is method invocation. So I thought I'd blog about some of the details of the compiler, to show that working on it isn't that hard.

Syntactically, Ceylon has two different ways of invoking a method (and the metamodel will add a third). Positional invocation will be very familiar to a Java programmer. In Ceylon it's conceptually pretty much the same, including support for 'varargs'. In this post I'm going to go into some detail about how support for positional method invocation is implemented in the compiler. I might cover the other syntactic form, named argument invocation, at a later date.

Before we go much further I just need to define some terminology. A method is declared with zero or more parameters, the last of which may be a sequenced parameter ('varargs'):

void foo(Natural n, Integer i, String... strings) {
    // some logic
}

At a (positional) call site the method is supplied with values for each of the parameters in the declaration. These values are usually called the arguments of the method invocation.

How we generate code

In the following sections I'm going to present bits of Ceylon code and the 'equivalent Java' code, but it's important to understand that the Ceylon compiler doesn't actually generate Java source code. Instead, it constructs an abstract syntax tree (AST) directly using the internal OpenJDK javac API. This AST is then subject to the same Java type checks as normal Java source code, before it gets converted to bytecode. You can read more about the architecture of the Ceylon compiler on the compiler architecture page.

The major benefit of piggy-backing on javac like this is that we don't have to get into the details of generating correct bytecode, such as worrying about which instruction to jump to. We can stick to higher level concepts that we're more familiar with, while we focus on actually getting something working. In the long term, it would be nice if ceylonc could be self-hosting.

Erasure

Because of the similarity with Java, supporting positional invocation in Ceylon isn't difficult: it boils down to generating a plain Java method invocation. But the two certainly aren't equivalent.

Although notionally in Ceylon 'everything is an object', the compiler is allowed to (and does) optimise the numeric types Natural, Integer and Float, as well as Boolean, to the corresponding Java primitive types (long, long, double and boolean respectively). This means that when you write a Ceylon statement such as

Natural n = 1;

it is transformed into a Java statement like this

final long n = 1;

We call this 'erasure' (yes, I know erasure has another meaning in Java to do with the loss of generic type information, but it's the term we use).

Erasure in itself wouldn't cause a problem for method invocation because the method parameters are subject to erasure just as the method arguments are. However, sometimes we need to 'box' the primitive, just like Java does.

A good example of this is passing a Natural argument to a parameter declared Natural?. To cope with the possibility of being passed null, the Java method declaration must use a boxed type (Natural from the runtime) rather than the Java primitive (long) it would otherwise be erased to. This means it is the compiler's responsibility to box the erased Natural (a Java long) at the call site.

Ceylon uses its own boxing classes in the runtime version of the language module. Each class implements the API of the relevant type. Because Ceylon doesn't use the same classes to box primitives as Java does, we can't rely on javac's autoboxing/unboxing support. Performing this boxing correctly, exactly when and where it's needed, is where method invocation starts to get a little more complex than simply 'a Ceylon method invocation is the same as a Java method invocation'.
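Here is a minimal sketch of that call-site boxing in plain Java. `BoxedNatural` is a hypothetical stand-in for the language module's boxing class, not its real API; the point is that javac's autoboxing only knows `java.lang.Long`, so it cannot turn a `long` into a `BoxedNatural`, and the compiler has to insert the boxing call itself.

```java
public class BoxingSketch {

    // Hypothetical stand-in for the runtime's boxed Natural class.
    static final class BoxedNatural {
        private final long value;
        private BoxedNatural(long value) { this.value = value; }
        static BoxedNatural box(long value) { return new BoxedNatural(value); }
        long unbox() { return value; }
    }

    // Ceylon: String describe(Natural? n)
    // The optional parameter cannot be erased to long, since it must accept null.
    static String describe(BoxedNatural n) {
        return n == null ? "nothing" : "natural " + n.unbox();
    }

    public static void main(String[] args) {
        // Ceylon: Natural x = 1;  is erased to a primitive long.
        final long x = 1;
        // describe(x) would not compile: javac cannot autobox a long to BoxedNatural,
        // so the explicit boxing call is inserted at the call site.
        System.out.println(describe(BoxedNatural.box(x))); // prints "natural 1"
        System.out.println(describe(null));                // prints "nothing"
    }
}
```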

Varargs

Varargs isn't implemented in terms of Java's varargs support. The reason in this case is that a Ceylon sequence is not the same thing as a Java array. So when someone declares a Ceylon method like this

void varargs<T>(T... ts) {

}

the equivalent Java looks something like this

<T> void varargs(Iterable<T> ts) {

}

Now consider the Ceylon call site

varargs<String>("foo", "bar", "baz");

When compiling this invocation we have to create a concrete instance of the Iterable<T> (using the arguments provided) to pass to the method. This is done using an ArraySequence (an implementation of a Ceylon Sequence in the runtime), so that the generated Java looks something like this

varargs(new ArraySequence("foo", "bar", "baz"));

Aside: The observant reader will realise that using varargs with erased types creates another boxing problem...
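To illustrate both the translation and the aside, here is a sketch in plain Java. `Arrays.asList` stands in for the runtime's `ArraySequence`, and `Long.valueOf` stands in for Ceylon's own boxing class; the `varargs` method returns what it iterated over (unlike the `void` original) purely so the result can be checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VarargsSketch {

    // Ceylon: void varargs<T>(T... ts)
    // The sequenced parameter becomes an Iterable, not a Java array.
    static <T> List<T> varargs(Iterable<T> ts) {
        List<T> seen = new ArrayList<>();
        for (T t : ts) {
            seen.add(t);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Ceylon: varargs<String>("foo", "bar", "baz");
        // The call site builds the sequence from the arguments.
        List<String> strings = varargs(Arrays.asList("foo", "bar", "baz"));
        System.out.println(strings); // prints "[foo, bar, baz]"

        // Ceylon: varargs<Natural>(1, 2, 3);
        // The arguments are erased longs, but a sequence holds objects,
        // so each one has to be boxed before the sequence is built.
        long a = 1, b = 2, c = 3;
        List<Long> naturals = varargs(Arrays.asList(Long.valueOf(a), Long.valueOf(b), Long.valueOf(c)));
        System.out.println(naturals); // prints "[1, 2, 3]"
    }
}
```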

Conclusion

So, after all that explanation, hopefully the source code should make some kind of sense.

None of what I've discussed above should be that hard to understand for anyone who's done much Java programming. I will admit at this point that I deliberately chose something that would be familiar and where the transformation between Ceylon and Java is small. This has allowed me to focus on some of the annoying-but-necessary details that are important to understand if you're going to hack on the compiler.

The take-home message is that you really don't have to know a great deal about compilers, or even the JVM to be able to contribute something genuinely useful.

Note

Since this post was originally written:

  • the Natural type has been removed from ceylon.language.
  • ceylonc has become ceylon compile.