Architecture of Ceylon

The Ceylon project comprises the following major subsystems:

  • typechecker,
  • compiler,
  • launcher and module runtime,
  • documentation compiler,
  • IDE, and
  • language module.

This architecture makes it possible to support alternate runtime platforms, for example, a JavaScript-based runtime, or any other kind of virtual machine.

Contrary to common belief, compilers aren't magical, nor are they as difficult as you probably imagine. You don't need a PhD to understand the Ceylon compiler.

Typechecker

What we call the typechecker, which is found in the typechecker directory of the repository, is actually responsible for much more than just typechecking. It includes:

  • an ANTLR-based lexer/parser,
  • a typesafe syntax tree for the Ceylon language,
  • a model of the types encountered in the code, and
  • a type analysis engine.

The lexer and parser are generated from the ANTLR grammar defined in the file Ceylon.g. The parser builds a syntax tree representing the input source code.

The syntax tree is currently generated from a specification defined in the file Ceylon.nodes (though this might change in the future). There is one Java class for each syntactic construct in the language, and an instance of the tree represents the code in a single compilation unit.

The model is an abstract representation of all the types that are available to the compiler, not just those in the compilation unit being compiled. Indeed, the compiler can even build a model for classes it encounters in precompiled module archives. Note, however, that the model contains much less information than the tree. For example, it contains no information about the procedural code inside a class, method, or attribute.
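The difference between tree and model can be illustrated with a minimal Python sketch. The classes ClassNode, MethodNode, ClassModel, and MethodModel and the function build_model are hypothetical stand-ins, not the typechecker's actual Java classes; the sketch only shows the model keeping declarations and types while dropping procedural code.

```python
from dataclasses import dataclass


# Hypothetical syntax tree nodes: the tree keeps everything, including bodies.
@dataclass
class MethodNode:
    name: str
    return_type: str
    body: list  # statements; procedural code lives only in the tree


@dataclass
class ClassNode:
    name: str
    methods: list


# Hypothetical model objects: declarations and their types, nothing more.
@dataclass(frozen=True)
class MethodModel:
    name: str
    return_type: str


@dataclass(frozen=True)
class ClassModel:
    name: str
    methods: tuple


def build_model(node: ClassNode) -> ClassModel:
    """Keep each declaration's signature and discard the method bodies."""
    return ClassModel(
        node.name,
        tuple(MethodModel(m.name, m.return_type) for m in node.methods),
    )
```

Because the model never needs a body, the same ClassModel shape could equally be filled in from a precompiled archive, where no source is available.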

The type analysis engine consists of several visitor classes that implement the rules defined in the language specification. They walk the syntax tree, validating the rules that correct Ceylon code must satisfy and attaching errors to any tree node that violates them. In addition, the type analysis visitors build up a model of the types they encounter in the tree, and link tree nodes to their associated model objects. Thus, typing information is available to the compiler when it transforms the syntax tree into Java.
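The visitor approach can be sketched in a few lines of Python. This is a hypothetical illustration, not the typechecker's code: the real visitors are Java classes over the generated tree, while here a getattr-based dispatch on the node's class name stands in for that, and ReferenceChecker is an invented rule. It shows both behaviours described above: errors are attached to the offending node, and valid nodes get a link to their model object.

```python
class Node:
    """Base class for hypothetical syntax tree nodes."""

    def __init__(self):
        self.errors = []   # errors attached by the visitors
        self.model = None  # link to the associated model object, if any


class ValueRef(Node):
    """A reference to a named value, such as x."""

    def __init__(self, name):
        super().__init__()
        self.name = name


class Block(Node):
    """A node that simply contains child nodes."""

    def __init__(self, children):
        super().__init__()
        self.children = children


class Visitor:
    """Call visit_<NodeClass> if defined, otherwise walk the children."""

    def visit(self, node):
        handler = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        handler(node)

    def generic_visit(self, node):
        for child in getattr(node, "children", []):
            self.visit(child)


class ReferenceChecker(Visitor):
    """Hypothetical rule: every referenced name must be declared in scope."""

    def __init__(self, scope):
        self.scope = scope  # name -> model object

    def visit_ValueRef(self, node):
        declaration = self.scope.get(node.name)
        if declaration is None:
            node.errors.append(f"undeclared name '{node.name}'")
        else:
            node.model = declaration  # tree-to-model link used by later stages
```

Since errors live on the nodes themselves, any later consumer (compiler backend or IDE) can find them by walking the same tree.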

The typechecker has no dependencies on anything JVM-specific, so it can be reused with other backends.

Type analysis

Type analysis takes place in three phases. The type system was designed to never require more than three passes over the syntax tree.

  1. DeclarationVisitor creates model objects for each named declaration and keeps track of the scope in which it occurs.
  2. TypeVisitor analyses import statements and explicit type declarations, and assigns types to the model objects for explicitly typed declarations.
  3. ExpressionVisitor analyses the types of expressions, resolves member references, reports typing errors, and infers types of declarations without explicit type declarations.
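A toy version of the three phases shows why the ordering matters: because every name is registered in phase 1 and explicit types are assigned in phase 2, phase 3 can resolve a reference to a declaration that appears later in the file. This Python sketch assumes a tiny invented language of declarations initialized by a literal or a reference; the function analyse and all class names are hypothetical, type checking is plain string equality (no subtyping), and chains of inferred types are simply reported as errors rather than modelling what Ceylon actually does.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Literal:
    type_name: str  # e.g. "Integer" for the literal 1


@dataclass
class Ref:
    name: str  # reference to another declaration


@dataclass
class Decl:
    name: str
    declared_type: Optional[str]  # None means the type must be inferred
    init: Union[Literal, Ref]


@dataclass
class DeclModel:
    name: str
    type: Optional[str] = None


def analyse(unit):
    scope, errors = {}, []
    # Phase 1 (cf. DeclarationVisitor): one model object per named declaration.
    for d in unit:
        scope[d.name] = DeclModel(d.name)
    # Phase 2 (cf. TypeVisitor): assign explicitly declared types.
    for d in unit:
        if d.declared_type is not None:
            scope[d.name].type = d.declared_type
    # Phase 3 (cf. ExpressionVisitor): type initializers, resolve references,
    # report mismatches, and infer types that were not declared.
    for d in unit:
        model = scope[d.name]
        if isinstance(d.init, Literal):
            actual = d.init.type_name
        elif d.init.name not in scope:
            errors.append(f"{d.name}: undeclared name '{d.init.name}'")
            continue
        else:
            actual = scope[d.init.name].type
            if actual is None:
                # Toy limitation: inferred types are not chased transitively.
                errors.append(f"{d.name}: type of '{d.init.name}' not yet known")
                continue
        if model.type is None:
            model.type = actual  # infer from the initializer
        elif model.type != actual:
            errors.append(f"{d.name}: expected {model.type}, got {actual}")
    return scope, errors
```

Here x refers forward to y, yet still gets the type Integer, because y's explicit type was recorded in phase 2 before any expression was examined.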

Compiler

The thing we call the compiler, which is found in the compiler-java directory of the repository, is actually just half of the compiler. This "compiler" calls the typechecker whenever it needs the syntax tree for a compilation unit.

The compiler has two main responsibilities:

  • to build a model from pre-compiled binary classes that are found in module archives, and
  • to transform the Ceylon syntax tree that is produced by the typechecker into a Java syntax tree that is understood by javac.

Finally, the compiler hands the Java syntax tree off to javac to produce bytecode. We're essentially using javac as the world's most sophisticated bytecode library.
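The transformation step is essentially a tree-to-tree mapping. The Python sketch below is hypothetical throughout: CeylonValueDecl, JavaVarDecl, transform, and the TYPE_MAP entry are invented for illustration, and the real compiler builds javac's own tree objects rather than these classes. It relies on one language fact: Ceylon values are immutable unless annotated variable, which this sketch maps to a Java final modifier. The render method exists only to make the resulting tree readable; javac receives a tree, not source text.

```python
from dataclasses import dataclass


@dataclass
class CeylonValueDecl:
    name: str
    type_name: str  # already resolved by the typechecker
    variable: bool  # Ceylon values are immutable unless annotated `variable`
    init: str       # initializer, kept as text to keep the sketch short


@dataclass
class JavaVarDecl:
    modifiers: tuple
    type_name: str
    name: str
    init: str

    def render(self) -> str:
        """Pretty-print the node as Java source, for readability only."""
        mods = " ".join(self.modifiers)
        prefix = mods + " " if mods else ""
        return f"{prefix}{self.type_name} {self.name} = {self.init};"


# Hypothetical type mapping, for illustration only.
TYPE_MAP = {"String": "java.lang.String"}


def transform(decl: CeylonValueDecl) -> JavaVarDecl:
    """Map one Ceylon declaration node to one Java declaration node."""
    modifiers = () if decl.variable else ("final",)
    java_type = TYPE_MAP.get(decl.type_name, decl.type_name)
    return JavaVarDecl(modifiers, java_type, decl.name, decl.init)
```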

Since javac already supports incremental compilation, so does the Ceylon compiler.

Launcher and module runtime

The Ceylon module runtime (in the runtime directory of the repository) is based on JBoss Modules. The Ceylon launcher simply starts java and invokes the module runtime. JBoss Modules bootstraps via a local repository, which must contain the following dependencies:

  • the Ceylon language module,
  • the Ceylon module resolver jar,
  • the Ceylon runtime jar, and
  • the JBoss Modules jar.

Finally, JBoss Modules is responsible for loading module archives as required according to the metadata contained in the module descriptors.
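Resolving modules from their descriptors amounts to following import declarations transitively. The Python sketch below illustrates only that idea, not how JBoss Modules is implemented (it uses its own module loaders and loads lazily). The repository format, a dictionary from (name, version) to the list of imported (name, version) pairs, and the function resolve are hypothetical, as are the example module names other than ceylon.language.

```python
def resolve(root, repository):
    """Return the modules needed by `root`, dependencies before dependents.

    `repository` maps (name, version) to the list of (name, version) imports
    declared in that module's descriptor (a hypothetical format).
    """
    order, seen = [], set()

    def visit(module):
        if module in seen:
            return  # already handled; also stops infinite recursion on cycles
        if module not in repository:
            raise LookupError(f"module {module[0]}/{module[1]} not found")
        seen.add(module)
        for dependency in repository[module]:
            visit(dependency)
        order.append(module)  # post-order: all imports precede the importer

    visit(root)
    return order
```

A missing module surfaces as a LookupError at resolution time, which mirrors why the bootstrap repository must already contain the jars listed above.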

Documentation compiler

The documentation compiler (in the compiler-java directory of the repository, like the compiler) takes as its input the model produced by the typechecker. Its job is to produce HTML documentation; there is currently no support for other output formats.

IDE

The Ceylon IDE is a plugin for Eclipse, and may be found in the ceylon-ide-eclipse repository. It is based on IMP, which provides much of the infrastructure that is common to programming language editors in Eclipse.

The Eclipse plugin is also built on top of the typechecker. It works directly with the syntax tree and model, which means that anything the typechecker knows about the source code, the IDE also knows. This includes types, members of types, errors, etc. And, of course, the IDE does not need to contain its own parser.

The IDE maintains a central model which it updates as part of the incremental compilation process. It also has a "forked" version of this model for each open Ceylon source editor. Each time a change is made in a source editor, a new "fork" of the model is produced. When the editor is saved, the central model is updated.
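One cheap way to picture a per-editor fork is an overlay: the edited compilation unit shadows its entry in the central model, while every other unit is read straight from the shared model without copying. The Python sketch below uses the standard library's collections.ChainMap for the overlay; CentralModel, fork, and save are hypothetical names, and this is an illustration of the idea rather than how the Eclipse plugin actually stores its models.

```python
from collections import ChainMap


class CentralModel:
    """Hypothetical central model: compilation unit -> declared names."""

    def __init__(self):
        self.units = {}

    def fork(self, unit, edited_declarations):
        """Build a per-editor view in which the edited unit overlays the model.

        Lookups for every other unit fall through to the central model, and
        the central model itself is left untouched.
        """
        return ChainMap({unit: edited_declarations}, self.units)

    def save(self, unit, declarations):
        """On save, the editor's version of the unit replaces the central one."""
        self.units[unit] = declarations
```

Each edit produces a fresh, cheap fork; only saving touches the shared state.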

Searching for declarations is extremely fast in the Ceylon IDE since it works against the central model, not against the text of the source files.
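Searching a model differs from grepping source text in that the names are already extracted and can be kept in a searchable structure. As a hypothetical illustration (not the IDE's actual data structure), the Python sketch below keeps (name, compilation unit) pairs sorted and answers prefix queries with a binary search from the standard bisect module, so it never scans file contents.

```python
import bisect


class DeclarationIndex:
    """Hypothetical index over model declarations, kept sorted by name."""

    def __init__(self, declarations):
        # declarations: iterable of (name, compilation_unit) pairs from the model
        self.entries = sorted(declarations)

    def find_prefix(self, prefix):
        """Return all declarations whose name starts with `prefix`."""
        # (prefix,) sorts before every (prefix..., unit) pair, so this finds
        # the first candidate; matches are then contiguous in sorted order.
        start = bisect.bisect_left(self.entries, (prefix,))
        result = []
        for name, unit in self.entries[start:]:
            if not name.startswith(prefix):
                break
            result.append((name, unit))
        return result
```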

The IDE does not directly use the compiler to perform its own work, but it does invoke the compiler as the last step of incremental compilation.

Language module

The language module is found in the language directory of the repository. The language module is special because it contains types that the compiler uses to compile other code. Therefore, the language module can't simply be compiled like other code: there is a chicken-and-egg problem, because you would need an already compiled language module before you could compile the language module.

Furthermore, in order to achieve acceptable performance, the language module needs to take advantage of hand-written Java code.

Therefore, counting the JavaScript backend, there are three implementations of the language module:

  • an incomplete implementation in Ceylon,
  • a complete implementation in JavaScript, and
  • a complete implementation in Java.

Keeping the three versions in sync is a rather painful process!

The language module for the JVM also contains several annotations that the Ceylon compiler uses at compile time to reverse-engineer the model from precompiled Ceylon code in a module archive.
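The annotations are needed because a JVM signature cannot express everything in a Ceylon type; for instance, an optional type such as String? has no direct JVM equivalent. The Python sketch below is hypothetical: CompiledMethod stands in for what a class-file reader would see, and the annotation key ceylon_type stands in for the real metadata annotations, whose names are not given here. Reverse-engineering prefers the recorded Ceylon type and falls back to the JVM type when no annotation is present.

```python
from dataclasses import dataclass, field


@dataclass
class CompiledMethod:
    """What a hypothetical class-file reader sees for one method."""
    name: str
    java_return_type: str                            # JVM-level type
    annotations: dict = field(default_factory=dict)  # annotation name -> value


@dataclass(frozen=True)
class MethodModel:
    name: str
    return_type: str


def reverse_engineer(method: CompiledMethod) -> MethodModel:
    """Rebuild a model object, preferring Ceylon type info from annotations.

    `ceylon_type` is a hypothetical key standing in for the real metadata
    annotations defined in the JVM language module.
    """
    ceylon_type = method.annotations.get("ceylon_type")
    if ceylon_type is not None:
        return MethodModel(method.name, ceylon_type)
    return MethodModel(method.name, method.java_return_type)
```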

JavaScript backend

There is another "half-compiler" (in the compiler-js directory of the repository) that uses the typechecker's syntax tree to generate JavaScript code. Together with a full implementation of the language module written in JavaScript, it can transform Ceylon source code into JavaScript that runs in node.js or in a browser. The compiler itself is written in Java; node.js is used for testing. There are two kinds of tests: one checks the correctness of the generated JavaScript code, and the other checks that the JavaScript implementation of the language module works as expected (it actually runs all the tests from the ceylon.language project).
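Because the backend only consumes the typechecked tree, generating another language is again a walk over the same nodes. The Python sketch below is a hypothetical illustration with an invented four-node expression tree and an invented to_js function; the real compiler-js output differs, for example by calling into the JavaScript language module rather than using raw JavaScript operators. String literals are emitted with json.dumps, whose output (with the default ASCII escaping) is also a valid JavaScript string literal.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class StringLit:
    value: str


@dataclass
class Ref:
    name: str


@dataclass
class Sum:
    left: "Expr"
    right: "Expr"


@dataclass
class Call:
    function: str
    args: list


Expr = Union[StringLit, Ref, Sum, Call]


def to_js(e: Expr) -> str:
    """Emit JavaScript source for one hypothetical expression node."""
    if isinstance(e, StringLit):
        return json.dumps(e.value)  # JSON string syntax is valid JavaScript
    if isinstance(e, Ref):
        return e.name
    if isinstance(e, Sum):
        return f"({to_js(e.left)}+{to_js(e.right)})"
    if isinstance(e, Call):
        return f"{e.function}({', '.join(to_js(a) for a in e.args)})"
    raise TypeError(f"unsupported node: {e!r}")
```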

This project is what makes the Ceylon Web Runner possible.