Streams, sequences, and tuples

This is the sixth leg of the Tour of Ceylon. In the previous leg we covered anonymous classes and member classes. Now we're going to look at streams, sequences, and tuples. These are examples of generic container objects. Don't worry, we'll come back to talk more about generics later.

Streams (Iterables)

An iterable object, or stream, is an object that produces a stream of values. Streams satisfy the interface Iterable.

Ceylon provides some syntax sugar for working with streams:

  • the type Iterable<X,Null> represents a stream that might not produce any values when it is iterated, and may be abbreviated {X*}, and
  • the type Iterable<X,Nothing> represents a stream that always produces at least one value when it is iterated, and is usually abbreviated {X+}.

We may construct an instance of Iterable using braces:

{String+} words = { "hello", "world" };
{String+} moreWords = { "hola", "mundo", *words };

The prefix * is called the spread operator. It "spreads" the values of a stream. So moreWords produces the values "hola", "mundo", "hello", "world" when iterated.
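As a small sketch of what the two stream types buy us: the attribute first has type String for a {String+}, but type String? for a {String*}. The helper names firstWord and firstOrDefault below are hypothetical, invented for this illustration.

```ceylon
// Hypothetical helper: a nonempty stream always has a first element.
String firstWord({String+} words) => words.first;

// Hypothetical helper: a possibly-empty stream may produce null.
String firstOrDefault({String*} words) => words.first else "<none>";

shared void streamTypes() {
    {String+} words = { "hello", "world" };
    {String+} moreWords = { "hola", "mundo", *words };
    print(firstWord(moreWords));   // hola
    print(firstOrDefault(words));  // hello ({String+} is assignable to {String*})
    print(firstOrDefault([]));     // <none>
}
```

Passing a {String*} to firstWord() would be rejected by the compiler, since such a stream might be empty.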

As we'll see later, the braces may even contain a comprehension, making them much more powerful than what you see here.

Iterable is a subtype of the interface Category, so we can use the in operator to test if a value is produced by the Iterable.

if (exists char = text[i],
    char in {',', '.', '!', '?', ';', ':'}) {
    //...
}

"index must be between 1 and 100"
assert (index in 1..100);

The in operator is just syntactic sugar for the method contains() of Category.
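To make the equivalence concrete, here is a minimal sketch in which each use of in is paired with the corresponding call to contains(). The function name categoryDemo and the value name punctuation are made up for this example.

```ceylon
shared void categoryDemo() {
    value punctuation = {',', '.', '!', '?', ';', ':'};
    // These two conditions are equivalent:
    assert ('!' in punctuation);
    assert (punctuation.contains('!'));
    // A range is a stream too, so it is also a Category:
    assert (50 in 1..100);
    assert ((1..100).contains(50));
    assert (!(0 in 1..100));
}
```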

Iterating a stream

To iterate an instance of Iterable, we can use a for loop:

for (word in moreWords) {
    print(word);
}

(Note that in this context, the in keyword isn't the operator we just met above; it's just part of the syntax of the for loop.)

If, for any reason, we need an index for each element produced by a stream, we can use the following idiom to iterate a stream of Entrys:

for (i -> word in moreWords.indexed) {
    print("``i``: ``word``");
}

This idiom makes use of destructuring, which we'll learn about at the end of this leg of the tour.

The indexed attribute returns a stream of entries containing the indexed elements of the original stream.

(Note: the arrow -> is syntax sugar for the class Entry. So we can write the type of the entry stream as {<Integer->String>*}.)
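The following sketch spells out the entry type explicitly and reads each entry through its key and item attributes instead of destructuring it. The function name indexedDemo is hypothetical.

```ceylon
shared void indexedDemo() {
    {String+} words = { "hello", "world" };
    // The entry stream, with its type written out in full:
    {<Integer->String>*} entries = words.indexed;
    for (entry in entries) {
        // key is the index, item is the element
        print("``entry.key``: ``entry.item``");
    }
    // The arrow also constructs an individual entry:
    Integer->String example = 0->"hello";
    assert (example.key == 0, example.item == "hello");
}
```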

It's often useful to be able to iterate two sequences at once. The zipEntries() function comes in handy here:

for (name -> place in zipEntries(names,places)) {
    print(name + " @ " + place);
}
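Here is a self-contained version of that loop, with sample data invented for the sketch. Note that the resulting stream is only as long as the shorter of the two given streams.

```ceylon
shared void zipDemo() {
    // Sample data, invented for this sketch:
    value names = ["Ada", "Alan", "Grace"];
    value places = ["London", "Manchester"];
    // Only two entries are produced; "Grace" has no partner.
    for (name -> place in zipEntries(names, places)) {
        print(name + " @ " + place);
    }
    assert (zipEntries(names, places).size == 2);
}
```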

Now there's one very important thing to know when you start mixing streams with mutable objects, variables, or impure functions. This is a common source of error for folks new to Ceylon.

Gotcha!

Streams created using the { ... } syntax are always lazy. That is:

  • their elements are not evaluated until the stream is iterated, and
  • each element is reevaluated each time the stream is iterated.

Consider this code:

variable value counter = 0;
value stream = { for (i in 0:5) counter++ }; //curly braces means LAZY!
print(stream); //evaluate elements
print(stream); //reevaluate elements

The code prints:

{ 0, 1, 2, 3, 4 }
{ 5, 6, 7, 8, 9 }

If this behavior is not what you're looking for, you'll need a different sort of stream! One option is to use a sequence instead.

variable value counter = 0;
value stream = [ for (i in 0:5) counter++ ]; //square brackets means EAGER!
print(stream); //elements already evaluated
print(stream); //elements already evaluated

This code prints:

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]

An even more confusing example arises when one attempts to form a stream by incrementally "cons"-ing elements at the head of the stream. A first, naive attempt might look like this:

variable value stream = { 0 };
stream = { 1, *stream };
stream = { 2, *stream };

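This code results in an infinite stream of 2s, instead of the stream {2,1,0}. That's because the spread reference to stream is evaluated lazily: by the time the stream is iterated, the variable stream already refers to the last stream assigned, which therefore spreads itself. Instead, we can use the follow() method of Iterable, which produces a stream with a given first element, followed by the elements of the receiving stream: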

variable value stream = { 0 };
stream = stream.follow(1);
stream = stream.follow(2);
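As a quick check of the follow() version above, this sketch converts the resulting stream to a sequence, for which value equality is well-defined, and compares it with the expected elements. The function name followDemo is hypothetical.

```ceylon
shared void followDemo() {
    variable {Integer+} stream = { 0 };
    stream = stream.follow(1);
    stream = stream.follow(2);
    // Convert to a sequence before comparing with ==
    assert (stream.sequence() == [2, 1, 0]);
}
```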

Alternatively, one could use a sequence:

variable [Integer+] sequence = [0];
sequence = [1, *sequence];
sequence = [2, *sequence];

Laziness is not the only stream-related gotcha.

Gotcha again!

Iterable is an extremely abstract type, and many different container types satisfy it. These various container types each have their own notions of what "equality" means. For example, in a List, the order of elements is important, whereas in a Set, it isn't. Furthermore, there are infinite streams for which equality is not computable.

Therefore, value equality—the == operator—is not considered a well-defined operation for streams, unless the streams have some known additional structure in common. The following assertion produces a warning at compilation time:

assert ({1, 2} == {1, 2}); //warning: equality not well-defined

We should rewrite this code to use some other data structure for which value equality is well-defined. For example, if we didn't care about order, we could compare Sets:

assert (set {1, 2} == set {2, 1});

Or, if we did want to take order into account, we could use sequences:

assert ([1, 2] == [1, 2]);

So now, naturally, it's time to learn about sequences.

Sequences

Some kind of array or list construct is a universal feature of all programming languages. The Ceylon language module defines support for sequence types via the interfaces Sequential, Sequence, and Empty.

Again, there is some syntax sugar associated with sequences:

  • the type Sequential<X> represents a sequence that may be empty, and may be abbreviated [X*] or X[],
  • the type Sequence<X> represents a nonempty sequence, and may be abbreviated [X+], and
  • the type Empty represents an empty sequence and is abbreviated [].

Some operations of the type Sequence aren't defined by Sequential, so you can't call them if all you have is X[]. Therefore, we need the if (nonempty ... ) construct to gain access to these operations.

void printBounds(String[] strings) {
    if (nonempty strings) {
        //strings is of type [String+] here
        print(strings.first + ".." + strings.last);
    }
    else {
        print("Empty");
    }
}

Notice how this is just a continuation of the pattern established for null value handling. In fact, both these constructs are just syntactic abbreviations for type narrowing:

  • if (nonempty strings) is an abbreviation for if (is [String+] strings), just like
  • if (exists name) is an abbreviation for if (is Object name).
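The two abbreviations can be written out longhand, as in the following sketch. The function describe is a hypothetical helper; inside each if block the compiler treats the value as having the narrowed type.

```ceylon
// Hypothetical helper showing the longhand form of both conditions.
String describe(String[] strings, String? name) {
    // Same as: if (nonempty strings)
    if (is [String+] strings) {
        return strings.first; // strings is a [String+] here
    }
    // Same as: if (exists name)
    if (is Object name) {
        return name; // name is a String here
    }
    return "nothing";
}
```

For example, describe(["a", "b"], null) evaluates to "a", describe([], "x") to "x", and describe([], null) to "nothing".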

Sequence syntax sugar

There's lots more syntactic sugar for sequences. We can use a bunch of familiar Java-like syntax:

String[] operators = [ "+", "-", "*", "/" ];
String? plus = operators[0];
String[] multiplicative = operators[2..3];

Oh, and the expression [] evaluates to an instance of Empty.

[] none = [];

However, unlike Java, all these syntactic constructs are pure abbreviations. The code above is exactly equivalent to the following de-sugared code:

Sequential<String> operators = ... ;
Null|String plus = operators.get(0);
Sequential<String> multiplicative = operators.span(2,3);

(We'll come back to what the list of values in brackets means in a minute!)
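The span operator has a few close relatives. The sketch below goes slightly beyond the text above: it also uses the measure operator (a start index and a length) and the open-ended forms of the span operator. The function name subrangeDemo is hypothetical.

```ceylon
shared void subrangeDemo() {
    String[] operators = [ "+", "-", "*", "/" ];
    assert (operators[2..3] == ["*", "/"]);      // span: from index 2 to index 3
    assert (operators[2:2] == ["*", "/"]);       // measure: 2 elements from index 2
    assert (operators[1...] == ["-", "*", "/"]); // from index 1 to the end
    assert (operators[...1] == ["+", "-"]);      // from the start to index 1
    // An index past the end yields null rather than an exception:
    assert ((operators[4] else "none") == "none");
}
```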

The Sequential interface extends Iterable, so we can iterate a Sequential using a for loop:

for (op in operators) {
    print(op);
}

Ranges

A Range is a kind of Sequence. The span function creates a Range. The following:

Character[] uppercaseLetters = 'A'..'Z';
Integer[] countDown = 10..0;

Is just sugar for:

Sequential<Character> uppercaseLetters = span('A','Z');
Sequential<Integer> countDown = span(10,0);
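A few properties of ranges, as a minimal sketch. It also uses the measure operator start:length, which we met in the comprehension 0:5 earlier; unlike a span, a measure may be empty. The function name rangeDemo is hypothetical.

```ceylon
shared void rangeDemo() {
    assert ((10..0).size == 11);     // a span includes both endpoints
    assert ((10..0).first == 10);    // and may count downwards
    assert (('A'..'Z').size == 26);
    assert ((0:5) == 0..4);          // a measure: start and length
    assert ((0:0).empty);            // a measure of length zero is empty
}
```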

In fact, this is just a sneak preview of the fact that almost all operators in Ceylon are just sugar for method calls upon a type. We'll come back to this later, when we talk about operator polymorphism.

Ceylon doesn't need C-style for loops. Instead, combine for with the range operator:

variable Integer fac=1;
for (n in 1..20) { //21! would overflow a 64-bit Integer
    fac*=n;
    print("Factorial ``n``! = ``fac``");
}

Sequence and its supertypes

It's probably a good time to see some more advanced Ceylon code. What better place to find some than in the language module itself?

You can find the API documentation and source code of Sequence online, or you can go to Navigate > Open Ceylon Declaration... to view the declaration of Sequential directly inside Ceylon IDE for Eclipse.

The most important operations of Sequential are inherited from Correspondence and Iterable.

  • Correspondence provides the capability to access elements of the sequence by index, and
  • Iterable provides the ability to iterate the elements of the sequence.

Now open the class Range in the IDE, to see a concrete implementation of the Sequence interface.

Empty sequences and the bottom type

Finally, check out the definition of Empty. Notice that Empty is declared to be a subtype of List<Nothing>. This special type Nothing, often called the bottom type, represents:

  • the empty set, or equivalently
  • the intersection of all types.

Since the empty set is a subset of all other sets, Nothing is assignable to all other types. Why is this useful here? Well, Correspondence<Integer,Element> and Iterable<Element> are both covariant in the type parameter Element. So Empty is assignable to Correspondence<Integer,T> and Iterable<T> for any type T. That's why Empty doesn't need a type parameter.

Since there are no actual instances of Nothing, if you ever see an attribute or method of type Nothing, you know for certain that it can't possibly ever return a value. There is only one possible way that such an operation can terminate: by throwing an exception.

Another cool thing to notice here is the return type of the first and item() operations of Empty. You might have been expecting to see Nothing? here, since they override supertype members of type T?. But as we saw in the first part of the Tour, Nothing? is just an abbreviation for Null|Nothing. And Nothing is the empty set, so the union Nothing|T of Nothing with any other type T is just T itself. In particular, Nothing? is just Null.

The Ceylon compiler is able to do all this reasoning automatically. So when it sees an Iterable<Nothing>, it knows that the operation first is of type Null, i.e. that its value is null.

Cool, huh?
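The following sketch shows both points in code: one and the same empty sequence is assignable to any sequential or stream type, and its first attribute is statically known to be null. The function name emptyDemo and the value names are made up for this example.

```ceylon
shared void emptyDemo() {
    [] none = [];
    // Empty is assignable to any sequential or stream type:
    String[] noStrings = none;
    Integer[] noIntegers = none;
    {Float*} noFloats = none;
    // The compiler knows that first can only be null:
    Null nothingThere = none.first;
    assert (noStrings.empty, noIntegers.empty, noFloats.empty);
    assert (!nothingThere exists);
}
```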

Sequence gotchas for Java developers

Superficially, a sequence type looks a lot like a Java array, but really it's very, very different! First, of course, a sequence type Sequential<String> is an immutable interface, not a mutable concrete type like an array. We can't set the value of an element:

String[] operators = ... ;
operators[0] = "^"; //compile error

Furthermore, the index operation operators[i] returns an optional type String?, which results in quite different code idioms. To begin with, we don't iterate sequences by index like in C or Java. The following code does not compile:

for (i in 0..operators.size-1) {
    String op = operators[i]; //compile error
    // ...
}

Here, operators[i] is a String?, which is not directly assignable to String.

Instead, if we need access to the index, we use the idiom we saw above:

for (i -> op in operators.indexed) {
    // ...
}

Likewise, we don't usually do an upfront check of an index against the sequence length:

if (i>operators.size-1) {
    throw IndexOutOfBoundException();
}
else {
    return operators[i]; //compile error
}

Instead, we do the check after accessing the sequence element:

if (exists op = operators[i]) {
    return op;
}
else {
    throw IndexOutOfBoundException();
}

Indeed, this is a common use for assert:

assert (exists op = operators[i]);
return op;

We especially don't ever need to write the following:

if (i>operators.size-1) {
    return "";
}
else {
    return operators[i]; //compile error
}

This is much cleaner:

return operators[i] else "";
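Wrapped in a complete function, the idiom might look like this. The name operatorAt is a hypothetical helper, not something defined by the language module.

```ceylon
// Hypothetical helper: the element at index i, or "" if there is none.
String operatorAt(String[] operators, Integer i)
        => operators[i] else "";
```

With this definition, operatorAt(["+", "-"], 0) evaluates to "+", while operatorAt(["+", "-"], 5) and operatorAt(["+", "-"], -1) both evaluate to "", with no bounds check in sight.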

All this may take a little getting used to. But what's nice is that all the exact same idioms also apply to other kinds of Correspondence, including Maps.

Tuples

A tuple is a linked list which captures the static type of each individual element in the list. For example:

[Float,Float,String] point = [0.0, 0.0, "origin"];

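This tuple contains two Floats followed by a String. That information is captured in its static type, [Float,Float,String].

Each link of the list is an instance of the class Tuple. The declaration above is syntax sugar for the following, far more verbose, code: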

Tuple<Float|String,Float,Tuple<Float|String,Float,Tuple<String,String>>>
        point = Tuple(0.0, Tuple(0.0, Tuple("origin", [])));

Therefore, we always use syntax sugar when working with tuples.

Tuple extends Sequence, so we can do all the usual kinds of sequency things to a tuple: iterate it, and so on. As with sequences, we can access a tuple element by index. But in the case of a tuple, Ceylon is able to determine the type of the element when the index is a literal integer:

Float x = point[0];
Float y = point[1];
String label = point[2];
Null zippo = point[3];
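The linked-list structure is also visible through the attributes first and rest, which are typed just as precisely as the indexed accesses. This is a minimal sketch; the function name tupleLinks and the intermediate value names are invented for illustration.

```ceylon
shared void tupleLinks() {
    [Float,Float,String] point = [0.0, 0.0, "origin"];
    Float x = point.first;
    [Float,String] afterX = point.rest; // the second link of the list
    [String] afterY = afterX.rest;      // the third link
    String label = afterY.first;
    [] end = afterY.rest;               // the list terminates in Empty
    assert (x == 0.0, label == "origin", end.empty);
}
```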

An unterminated tuple is a tuple where the last link in the list is a sequence, not an Empty. For example:

String[] labels = ... ;
[Float,Float,String*] point = [0.0, 0.0, *labels];

This tuple contains two Floats followed by an unknown number of Strings.

Now we can see that a sequence type like [String*] or [String+] can be viewed as a degenerate tuple type!

Destructuring

Individually accessing the elements of a tuple by numeric index can be a little verbose, so Ceylon supports a sophisticated sort of parallel assignment called destructuring. We can rewrite the code above like this:

value [x, y, label] = point;

This introduces three new values, x and y of inferred type Float, and label of inferred type String.

The syntax [x, y, label] is called a tuple pattern. Tuple patterns are used to destructure nonempty sequences. A tuple pattern is a list of value names, the last of which may be a tail value, indicated using the spread operator:

value labeled = ["one two three", 1.0, 2.0, 3.0];
value [label, *point] = labeled; //point is a tail with type Float[3]
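To see what the tail looks like, this sketch assigns it to a value whose type is written out in full, and then destructures it a second time. The function name tailDemo and the value name sameType are hypothetical.

```ceylon
shared void tailDemo() {
    value labeled = ["one two three", 1.0, 2.0, 3.0];
    value [label, *point] = labeled;
    // Float[3] is an abbreviation for [Float,Float,Float]
    [Float,Float,Float] sameType = point;
    value [x, y, z] = sameType;
    assert (label == "one two three", x + y + z == 6.0);
}
```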

A tuple pattern may include explicit element types:

value [Float x, Float y, String label] = point;

We can use destructuring in for loops:

for ([x, y, label] in points) {
    print("``label``: (``x``, ``y``)");
}

And in exists and nonempty conditions in if or while:

if (nonempty [name, *rest] = process.arguments) {
    print("Hello ``name``!");
}

And in cases of a switch statement or expression:

Float[2]|Float[3] coord = ... ;
switch (coord)
case ([Float x, Float y]) { 
    print((x^2+y^2)^0.5);
}
case ([Float x, Float y, Float z]) {
    print((x^2+y^2+z^2)^0.5);
}

And even in a let expression:

print(let ([x, y] = [1.0, 2.0]) "(``x``, ``y``)");

We can also destructure Entrys. We've already seen this used in a for loop:

for (i -> op in operators.indexed) {
    // ...
}

The syntax i -> op is called an entry pattern. An entry pattern may include explicit key and item types:

for (Integer i -> String op in operators.indexed) {
    // ...
}

More complex destructuring patterns may be formed by nesting tuple and entry patterns, for example:

for (i -> [en, es] in translations.indexed) {
    print("``i``: ``en`` ``es``");
}
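Here is a self-contained version of that loop. The sample value translations is invented for this sketch: a sequence of pairs, each holding an English word and its Spanish counterpart.

```ceylon
shared void nestedPatternDemo() {
    // Sample data, invented for this sketch:
    value translations = [["hello", "hola"], ["world", "mundo"]];
    // The entry pattern binds the index; the nested tuple pattern binds the pair.
    for (i -> [en, es] in translations.indexed) {
        print("``i``: ``en`` ``es``");
    }
}
```

This prints "0: hello hola" followed by "1: world mundo".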

Finally, destructuring may be used in the parameter list of an anonymous function. But we'll discuss that later in the tour.

There's more...

If you're interested, you can find a more in-depth discussion of tuples here.

You can read more about destructuring here.

Next up we'll explore some more details of the type system, starting with type aliases and type inference.