Implementing Method Invocation in ceylonc
My main involvement in the Ceylon project has been in the compiler, and within that, one of the things I've worked on is method invocation. So I thought I'd blog about some of the details of the compiler, to show that working on it isn't that hard.
Syntactically, Ceylon has two different ways of invoking a method (and the metamodel will add a third). Positional invocation will be very familiar to a Java programmer. In Ceylon it's conceptually pretty much the same, including support for 'varargs'. In this post I'm going to go into some detail about how support for positional method invocation is implemented in the compiler. I might cover the other syntactic form, named argument invocation, at a later date.
Before we go much further I just need to define some terminology. A method is declared with zero or more parameters, the last of which may be a sequenced parameter ('varargs'):
void foo(Natural n, Integer i, String... strings) {
    // some logic
}
At a (positional) call site the method is supplied with values for each of the parameters in the declaration. These values are usually called the arguments of the method invocation.
How we generate code
In the following sections I'm going to be presenting bits of Ceylon code and the 'equivalent Java' code, but it's important to understand that the Ceylon compiler doesn't actually generate Java source code. It instead constructs an abstract syntax tree (AST) directly using the internal OpenJDK javac API. This AST is then subject to the same Java type checks as normal Java source code, before it gets converted to bytecode. You can read more about the architecture of the Ceylon compiler here.
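To give a feel for what 'constructing a tree rather than source text' means, here is a toy sketch. The Expr, Literal and MethodCall classes are invented for this illustration and are not javac's real tree classes or its internal API; the render method exists only so that the tree can be printed.

```java
import java.util.Arrays;
import java.util.List;

// Toy tree nodes for illustration only, NOT javac's real classes.
interface Expr {
    String render();
}

final class Literal implements Expr {
    private final String text;

    Literal(String text) {
        this.text = text;
    }

    public String render() {
        return text;
    }
}

final class MethodCall implements Expr {
    private final String name;
    private final List<Expr> args;

    MethodCall(String name, Expr... args) {
        this.name = name;
        this.args = Arrays.asList(args);
    }

    // Walk the child nodes and print them as a Java-like call.
    public String render() {
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(args.get(i).render());
        }
        return sb.append(')').toString();
    }
}

class AstSketch {
    public static void main(String[] args) {
        // The invocation is assembled node by node; no source text is parsed.
        Expr call = new MethodCall("foo",
                new Literal("1"), new Literal("2"), new MethodCall("bar"));
        System.out.println(call.render());
    }
}
```

The point of working this way is that every argument is already a node, so wrapping one of them in another node (a boxing call, say) is a local change to the tree.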
The major benefit of piggy-backing on javac like this is that we don't have to get into the details of generating correct bytecode, such as worrying about which instruction to jump to. We can stick to higher level concepts that we're more familiar with, while we focus on actually getting something working. In the long term, it would be nice if ceylonc could be self-hosting.
Erasure
Because of the similarity with Java, supporting positional invocation in Ceylon isn't difficult: It boils down to generating a plain Java method invocation. But the two certainly aren't equivalent.
Although notionally in Ceylon 'everything is an object', the compiler is allowed to (and does) optimise the numerical types (Natural, Integer, Float, Boolean) to the corresponding Java primitive type (long, long, double and boolean respectively). This means that when you write a Ceylon statement such as
Natural n = 1;
it is transformed into a Java statement like this
final long n = 1;
We call this 'erasure' (yes, I know erasure has another meaning in Java to do with the loss of generic type information, but it's the term we use).
Erasure in itself wouldn't cause a problem for method invocation because the method parameters are subject to erasure just as the method arguments are. However, sometimes we need to 'box' the primitive, just like Java does.
A good example of this is a parameter declared Natural? being passed a Natural argument. The Java method declaration must use a boxed type (Natural from the runtime), as opposed to the Java primitive (long) it would otherwise be erased to, in order to cope with the possibility of being passed a null. This means it is the compiler's responsibility to box the erased Natural (a Java long) at the call site.
Ceylon uses its own boxing classes in the runtime version of the language module. Each class implements the API of the relevant type. Because Ceylon doesn't use the same classes to box primitives as Java does, we can't rely on javac's auto boxing/unboxing support. Performing this boxing correctly, and exactly when and where it's needed, is where method invocation starts to get a little more complex than simply being 'A Ceylon method invocation is the same as a Java method invocation'.
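Here is a rough Java sketch of the boxing described above. BoxedNatural is a hypothetical stand-in for the runtime's boxed Natural class (the real class and its factory method may well look different), and describe plays the part of a Ceylon method declared with a Natural? parameter.

```java
// Hypothetical stand-in for the runtime's boxed Natural class;
// the real class and its API are not shown in this post.
final class BoxedNatural {
    private final long value;

    private BoxedNatural(long value) {
        this.value = value;
    }

    // Box an erased Natural (a Java long).
    static BoxedNatural instance(long value) {
        return new BoxedNatural(value);
    }

    // Unbox back to the erased representation.
    long longValue() {
        return value;
    }
}

class BoxingSketch {
    // Roughly what 'String describe(Natural? n)' might compile to:
    // the parameter cannot be a long because it has to accept null.
    static String describe(BoxedNatural n) {
        return n == null ? "nothing" : "natural " + n.longValue();
    }

    public static void main(String[] args) {
        final long n = 1; // the erased form of 'Natural n = 1;'
        // javac only knows how to box a long into java.lang.Long, so the
        // boxing call has to be written into the call site explicitly.
        System.out.println(describe(BoxedNatural.instance(n)));
        System.out.println(describe(null));
    }
}
```

Passing the bare long n to describe would not compile, which is exactly why the call site needs the explicit BoxedNatural.instance(n).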
Varargs
Varargs isn't implemented in terms of Java's varargs support. The reason in this case is that a Ceylon sequence is not the same thing as a Java array. So when someone declares a Ceylon method like this
void varargs<T>(T... ts) {
}
the equivalent Java looks something like this
<T> void varargs(Iterable<T> ts) {
}
Now consider the Ceylon call site
varargs<String>("foo", "bar", "baz");
When compiling this invocation we have to create a concrete instance of the Iterable<T> (using the arguments provided) to pass to the method. This is done using an ArraySequence (an implementation of a Ceylon Sequence in the runtime), so that the generated Java looks something like this
varargs(new ArraySequence("foo", "bar", "baz"));
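A self-contained version of that transformation might look like the sketch below. SimpleArraySequence is a hypothetical, cut-down stand-in for the runtime's ArraySequence, and it implements java.lang.Iterable rather than Ceylon's own Iterable purely so that the example runs without the Ceylon runtime. The method also returns a String, which the original does not, so that its effect is visible.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the runtime's ArraySequence.
final class SimpleArraySequence<T> implements Iterable<T> {
    private final T[] elements;

    // Java varargs are used here only to collect the call-site
    // arguments; the compiled method itself never sees an array.
    @SafeVarargs
    SimpleArraySequence(T... elements) {
        this.elements = elements.clone();
    }

    @Override
    public Iterator<T> iterator() {
        return Arrays.asList(elements).iterator();
    }
}

class VarargsSketch {
    // Roughly the shape of a compiled 'varargs<T>(T... ts)':
    // a single Iterable parameter instead of a Java varargs array.
    static <T> String varargs(Iterable<T> ts) {
        StringBuilder sb = new StringBuilder();
        for (T t : ts) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Ceylon call site: varargs<String>("foo", "bar", "baz");
        System.out.println(
                varargs(new SimpleArraySequence<String>("foo", "bar", "baz")));
    }
}
```

Calling the method with no sequenced arguments just means wrapping an empty sequence, so the method body never needs a special case for it.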
Aside: The observant reader will realise that using varargs with erased types creates another boxing problem...
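One way to read that aside: the elements of a sequence are objects, so erased arguments have to be boxed one by one before they can go into it. The sketch below illustrates the idea under that assumption. NaturalBox is a hypothetical boxed Natural, java.util.ArrayList stands in for the runtime's sequence class, and sum is an invented example method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical boxed Natural, standing in for the runtime class.
final class NaturalBox {
    final long value;

    NaturalBox(long value) {
        this.value = value;
    }
}

class ErasedVarargsSketch {
    // Stand-in for a compiled 'Natural sum(Natural... ns)': the elements
    // arrive boxed, while the return value is erased to a long.
    static long sum(Iterable<NaturalBox> ns) {
        long total = 0;
        for (NaturalBox n : ns) {
            total += n.value;
        }
        return total;
    }

    // What a call site such as 'sum(a, b, c)' would have to do when
    // a, b and c are erased Naturals (Java longs).
    static long callSite(long a, long b, long c) {
        List<NaturalBox> sequence = new ArrayList<NaturalBox>();
        sequence.add(new NaturalBox(a)); // box each argument individually
        sequence.add(new NaturalBox(b));
        sequence.add(new NaturalBox(c));
        return sum(sequence);
    }

    public static void main(String[] args) {
        System.out.println(callSite(1, 2, 3));
    }
}
```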
Conclusion
So, after all that explanation, hopefully the source code should make some kind of sense.
None of what I've discussed above should be that hard to understand for anyone who's done much Java programming. I will admit at this point that I deliberately chose something that would be familiar and where the transformations between Ceylon and Java are small. This has allowed me to focus on some of the annoying-but-necessary details that are important to understand if you're going to hack on the compiler.
The take-home message is that you really don't have to know a great deal about compilers, or even the JVM, to be able to contribute something genuinely useful.
Note
Since this post was originally written:
- the Natural type has been removed from ceylon.language.
- ceylonc has become ceylon compile.