The language module

This is the fourteenth part of the Tour of Ceylon. The previous part introduced comprehensions. We're now going to learn about Ceylon's language module and some of the basic types it defines.

The language module is special, because it is referred to by the language specification, and some language-level constructs are defined in terms of the types it declares. Therefore, you can think of it as forming part of the language definition. In practice, the language module is implemented in a mix of Ceylon and native (Java and JavaScript) code. Of course, we've already met quite a few of the inhabitants of the language module, especially in this chapter.

An overview of the language module

The module ceylon.language contains classes and interfaces that are referred to in the language specification, other declarations they refer to, and a number of related useful functions and types. Let's meet the main characters.

Object and Null

Just like Java, Ceylon has a class named Object.

"The abstract supertype of all types representing 
 definite values..."
see (`class Basic`, `class Null`)
shared abstract class Object() 
        extends Anything() {

    "Determine if two values are equal..."
    shared formal Boolean equals(Object that);

    "The hash value of the value..."
    shared formal Integer hash;

    "A developer-friendly string representing the 
     instance..."
    shared default String string
            => className(this) + "@" + 
               Integer.format(hash, #10);

}

Ceylon's, Object isn't the root of the type system. An expression of type Object has a definite, well-defined, non-null value. As we've seen, Ceylon type system also has the class Null, which is the type of null.

"The type of the [[null]] value. Any union type of form 
 `Null|T` is considered an _optional_ type, whose values
 include `null`. Any type of this form may be written as
 `T?` for convenience..."
see (`value null`)
shared abstract class Null() 
        of null
        extends Anything() {}

The object null is the only instance of this class.

Therefore, Object and Null share a superclass, named Anything.

The root class

Anything is the root supertype of the whole type system. It's an enumerated class:

"The abstract supertype of all types. A value of type 
 `Anything` may be a definite value of type [[Object]], or 
 it may be the [[null]] value. A method declared `void` is 
 considered to have the return type `Anything`..."
shared abstract class Anything() 
        of Object | Null {}

All Ceylon types are assignable to Anything. Expressions of type Anything aren't useful for very much, since Anything has no members or operations. The one useful thing you can do with Anything is represent the signature of a method when you don't care about the return type, since a method declared void is considered to have return type Anything, as we saw in the part about functions.

All types that represent well-defined values extend Object, including:

  • user-written classes,
  • all interfaces, including,
  • function types, and even
  • the types that are considered primitive in Java, such as Boolean, Integer, Float, Byte, and
    Character.

The only things that aren't assignable to Object are the value null, along with untyped values produced by a dynamically typed expression when interoperating with JavaScript.

All Objects have a definition of value equality, expressed in their implementations of equals() and hash.

Equality and identity

On the other hand, since Object is a supertype of types like Float which are passed by value at the level of the virtual machine, you can't use the === operator to test the identity of two values of type Object. The following is not allowed:

Integer x = 1;
assert (x===1); //compile error: Integer is not Identifiable 

Instead, === is defined to act only on instances of the interface Identifiable. Integer, Float, Character, and String don't satisfy this interface, but most classes do.

"The abstract supertype of all types with a well-defined
 notion of identity. Values of type `Identifiable` may 
 be compared using the `===` operator to determine if 
 they are references to the same object instance..."
shared interface Identifiable {

    "Identity equality comparing the identity of the two 
     values..."
    shared default actual Boolean equals(Object that)
            => if (is Identifiable that)
                then this===that
                else false;

    "The system-defined identity hash value of the 
     instance..."
    see (`function identityHash`)
    shared default actual Integer hash => identityHash(this);

}

Identifiable implements the hash attribute and equals() method of Object, which are very similar to the equals() and hashCode() methods defined by java.lang.Object.

Just like in Java, you can refine this default implementation in your own classes. This is the normal way to get a customized behavior for the == operator, the only constraint being, that for subtypes of Identifiable, x===y should imply x==y— equality should be consistent with identity.

The default superclass

By default, a user-written class extends the class Basic, which extends Object and satisfies Identifiable. It's possible for a user-written class to directly extend Object, but most of the classes you write will be subclasses of Basic. All classes with variable attributes must extend Basic.

"The default superclass when no superclass is explicitly
 specified using `extends`..."
shared abstract class Basic() 
        extends Object() satisfies Identifiable {}

An interface never has a definition of identity equality unless it explicitly satisfies Identifiable.

Operator polymorphism

Ceylon discourages the creation of intriguing executable ASCII art. Therefore, true operator overloading is not supported by the language. Instead, almost every operator (every one except the primitive ., (), is, =, ===, and of operators) is considered a shortcut way of writing some more complex expression involving other operators and ordinary function calls.

For example, the < operator is defined in terms of the interface Comparable, which has a method named compare(). The operator expression

x<y

means, by definition,

x.compare(y) === smaller

The equality operator == is defined in terms of the class Object, which has a method named equals(). So

x==y

means, by definition,

x.equals(y)

Therefore, it's easy to customize operators like < and == with specific behavior for our own classes, just by implementing or refining methods like compare() and equals(). Thus, we say that operators are polymorphic in Ceylon.

Apart from Comparable and Object, which provide the underlying definition of comparison and equality operators, the following interfaces are also important in the definition of Ceylon's polymorphic operators:

  • Summable supports the infix + operator,
  • Invertible supports the prefix and infix - operators,
  • Ordinal supports the unary ++ and -- operators,
  • Numeric supports the infix * and / operators,
  • Exponentiable supports the power operator ^,
  • Scalable supports the scalar multiplication operator **,
  • Comparable supports the comparison operators <, >, <=, >=, and <=>,
  • Enumerable supports the range operators .. and :,
  • Correspondence and CorrespondenceMutator support the index operator,
  • Ranged supports the subrange operators,
  • Boolean is the basis of the logical operators &&, ||, !, and
  • Set is the basis of the set operators |, &, and, ~.

Comparison operators

In addition to the traditional <, >, <=, and >= operators, which evaluate to Boolean, there is a <=> operator, which produces an instance of the enumerated type Comparison.

switch(x<=>0)
case (smaller) {
    return sqrt(-x);
}
case (equal) {
    return 0;
}
case (larger) {
    return sqrt(x);
}

Two < or <= operators may be combined to determine if a value falls within a range:

assert(0<quantity<=100);

Set operators

The operators | and & represent set union and intersection when they occur in a value expression. But, as we've already seen, when they occur in a type expression they represent type union and intersection! Indeed, there's a relationship between the two kinds of union/intersection:

Set<Integer> integers = ... ;
Set<Float> floats = ... ;
Set<Float|Integer> numbers = integers | floats;

Set<Foo> foos = ... ;
Set<Bar> bars = ... ;
Set<Foo&Bar> foobars = foos & bars;

The binary ~ operator represents complement (set subtraction).

These operators may only be used with expressions of type Set.

Gotcha!

There's no operators representing bitwise operations like NOT, AND, OR, XOR, so we must write these operations as method calls.

Indexed operations

We can access an element of a Correspondence by using the index operator. Both Lists and Maps are instances of Correspondence:

"string must start with a \""
assert (exists ch = text[0], ch=='"');

Mutable lists and maps are instances of CorrespondenceMutator, which allows indexed assignment to element. One example of a mutable list is Array:

value array = Array.ofSize(5, 0);
for (i in 0:5) {
    array[i] = i^2;
}

All Lists are also instances of Ranged. We can produce a subrange of a Ranged object by providing two endpoints:

if (text[i..i]=="/") {
    [String,String] split = [text[...i-1], text[i+1...]];
    //...
}

We can also produce a subrange of a Ranged object by providing a starting point and a length.

String selectedText = text[selection.offset:selection.length];

Please take careful note the difference between .. and :, they have quite distinct purposes:

print("hello"[2..2]); //prints "l"
print("hello"[2:2]);  //prints "ll"

print("hello"[2..0]); //prints "leh"
print("hello"[2:0]);  //prints ""

Characters and character strings

We've already met the class String, way back in the first leg of the tour. Ceylon strings are made of Characters.

String hello = "hello \{SPARKLING HEART}";
for (ch in hello) {
    print("U+``formatInteger(ch.integer, #10).padLeading(4, '0')`` '``ch``'");
}

A character literal is written between single quotes.

Character[] latinLetters = concatenate('a'..'z', 'A'..'Z');
Character newline = '\n';
Character pi = '\{#0001D452}';

An instance of Character represents a 32-bit Unicode character, not a Java-style UTF-16 char.

A String is a List of Characters. And therefore a String is a List of 32-bit Unicode codepoints, not a list of chars. That's really nice, but it has one unusual consequence.

Gotcha!

Under the covers, Ceylon strings are implemented using a Java char[] array (in fact, they are implemented using a Java string).

Therefore, some operations on Ceylon strings are much slower than you might expect, since they must take four-byte characters into account. This includes size and get().

We think it's much better that these operations be slow, as in Ceylon, than that they sometimes give the wrong answer, like in Java. And remember, it's never correct to iterate a list using size and get() in Ceylon!

Tip: avoiding String.size()

To avoid the cost of calling size(), try to use the more efficient empty, longerThan() and shorterThan() when the string might be very long.

String long = ... ;
if (long.size<10) { ... } //slow!
if (long.shorterThan(10)) { ... } //faster

Numeric types

As we've mentioned several times before, Ceylon doesn't have anything like Java's primitive types. The types that represent numeric values are just ordinary classes. Ceylon has fewer built-in numeric types than other C-like languages:

  • Integer represents signed integers, and
  • Float represents floating point approximations of real numbers.

However, the compiler magically eliminates these classes, wherever possible, in order to take advantage of the high performance of the platform's native primitive types.

Therefore, the precision of these types depends on whether you're running your code on the JVM or on a JavaScript virtual machine.

  • When compiling for Java both types have 64-bit precision by default. You can specify that a value has 32-bit precision by annotating it small. (But note that this annotation is a hint that the compiler is permitted to ignore.)
  • When compiling for JavaScript, Floats have 64-bit precision and Integers have 53-bit precision.

Overflow (on the JVM), or loss of precision (in JavaScript) occurs silently.

Numeric literals

In their simplest form the literals for Integers, and literals for Floats look as you might expect from other languages:

Integer one = 1;
Float oneHundredth = 0.01;
Float oneMillion = 1.0E+6;

However they can be a bit more sophisticated. The digits of a numeric literal may be grouped using underscores. If the digits are grouped, then groups must contain exactly three digits.

Integer twoMillionAndOne = 2_000_001;
Float pi = 3.141_592_654;

A very large or small numeric literal may be qualified by one of the standard SI unit prefixes: m, u, n, p, f, k, M, G, T, P.

Float red = 390.0n;                  // n (nano) means E-9
Float galaxyDiameter = 900.0P;       // P (peta) means E+15
Float hydrogenRadius = 25.0p;        // p (pico) means E-12
Float usGovDebt = 14.33T;            // T (tera) means E+12
Float brainCellSize = 4.0u;          // u (micro) means E-6
Integer deathsUnderCommunism = 94M;  // M (mega) means E+6

A hexadecimal integer is written using a prefix #. Digits may be grouped into groups of two or four digits.

Integer white = #FF_FF_FF;

A binary integer is written with a prefix $. Digits may be grouped into groups of four digits.

Integer sixtyNine = $0100_0101;

Arbitrary precision numeric types

The platform modules ceylon.whole and ceylon.decimal define the types

  • Whole, which represents arbitrary precision integers, and
  • Decimal, which represents arbitrary precision decimals numbers.

Both classes are subtypes of Numeric, so we can use all the usual numeric operators with them:

Decimal num = ... ;
Decimal denom = ... ;
Decimal ratio = num / denom;

Note that ceylon.decimal is currently JVM-only.

Tip: abstracting over numeric types

Since all numeric types are subtypes of Numeric, it's possible to write generic code that treats numeric values polymorphically.

Value ratio<Value>(Value num, Value denom) 
        given Value satisfies Numeric<Value>
        => num/denom;

You can pass Floats, Integers, Wholes, Decimals or any other numeric type to ratio().

Gotcha!

Since polymorphic numeric functions can't be optimized to use VM-level primitive types, when executed on the JVM, the generic function above is likely to be much slower than a function which accepts two Floats or two Integers. Assignment of an Integer or Float to a generic type like Numeric or Summable necessarily involves boxing. That's invisible at the level of the Ceylon code, but it's significant at runtime.

(On a JavaScript VM, you can expect a much smaller performance penalty.)

Numeric widening

As mentioned earlier, Ceylon doesn't have implicit type conversions, not even built-in conversions for numeric types. Thus, assignment to the type Float does not automatically widen an expression of type Integer. Instead, we have to perform the type conversion explicitly:

Float zero = 0.float; // explicitly widen from Integer

You can use all the operators you're used to from other C-style languages with the numeric types. You can also use the ^ operator to raise a number to a power:

Float diagonal = (length^2.0+width^2.0)^0.5;

Of course, if you want to use the increment ++ operator, decrement -- operator, or one of the compound assignment operators such as +=, you'll have to declare the value variable.

Since it's quite noisy to explicitly perform numeric widening in numeric expressions, the numeric operators do automatically widen their operands, so we could write the expression above like this:

Float diagonal = (length^2+width^2)^(1.0/2);

Since ceylon.language only has two numeric types the only automatic widening conversion is from Integer to Float. This is the one and only thing approaching an implicit type conversion in the whole language.

Bytes

The class Byte is very different from bytes in Java, C#, or C. A Byte is considered to represent a congruence class of integers modulo 256. That is to say, a Byte doesn't represent just one integer value, but a whole infinite set of them!

Therefore:

  • the arithmetic operations on Byte are explicitly understood to be the operations of modular arithmetic, not of ordinary integer arithmetic,
  • there is no order for Bytes (they aren't Comparable), and
  • it doesn't even make sense to ask if a Byte is signed or unsigned!

However, Byte has two very useful attributes:

  • unsigned, which returns a positive Integer in the range 0..255, and
  • signed, which returns an Integer in the range -128..127.

You'll need to use either signed or unsigned if you want to treat a Byte value as an integer with integer arithmetic and integer ordering.

Byte is optimized by the compiler to a Java byte on the JVM, where possible.

Collections

The language module includes several interfaces that represent container types:

You might be disappointed to discover that there are no general-purpose implementations of these interfaces in the language module itself. In fact, they're only declared here so that String, Sequential, Array, and Tuple can be subtypes of List.

You might be even more disappointed when you look at these interfaces and discover that they're missing half the useful operations you're used to seeing on a collection: they have no operations at all for building or mutating the collection. Actually, there's a couple of good reasons for this:

  • It's usually best for an API to return an obviously read-only collection to clients, instead of leaving the client scratching his head wondering whether mutating this collection results in mutation of the internal data structures held by the API, and whether this is safe.
  • Making these interfaces read-only means they can be declared covariant in their type parameters.

The module ceylon.collection contains general-purpose implementations of these interfaces, along with APIs for building and mutating collections: MutableList, MutableMap, and MutableSet.

Tip: creating an immutable Map or Set

If you only need an immutable Map or Set, the language module functions map() and set() may be used to create one.

There's more...

The language module isn't by itself a platform for building applications. It's a minimal set of basic types that form part of the language definition itself. The Ceylon SDK provides a set of platform modules—basic building blocks for all sorts of programs—including ceylon.collection, ceylon.file, ceylon.process, ceylon.dbc, ceylon.json, ceylon.numeric, ceylon.whole, ceylon.decimal, ceylon.unicode, ceylon.uri, ceylon.http.client, ceylon.http.server, ceylon.buffer, ceylon.logging, ceylon.test, ceylon.time, ceylon.random, ceylon.regex, ceylon.transaction, ceylon.promise, and ceylon.locale.

Next we're going to come back to the subject of object initialization, and deal with a subtle problem affecting languages like Java and C#.