Streams, sequences, and tuples
This is the sixth leg of the Tour of Ceylon. In the previous leg we covered anonymous classes and member classes. Now we're going to look at streams, sequences, and tuples. These are examples of generic container objects. Don't worry, we'll come back to talk more about generics later.
Streams (Iterables)
An iterable object, or stream, is an object that produces a stream of values. Streams satisfy the interface Iterable.
Ceylon provides some syntax sugar for working with streams:
- the type Iterable<X,Null> represents a stream that might not produce any values when it is iterated, and may be abbreviated {X*}, and
- the type Iterable<X,Nothing> represents a stream that always produces at least one value when it is iterated, and is usually abbreviated {X+}.
We may construct an instance of Iterable
using braces:
{String+} words = { "hello", "world" };
{String+} moreWords = { "hola", "mundo", *words };
The prefix *
is called the spread operator. It "spreads" the values of a
stream. So moreWords
produces the values "hola", "mundo", "hello", "world"
when iterated.
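The difference between {X*} and {X+} shows up in the type of the first attribute: a nonempty stream always has a first element, a possibly-empty one might not. Here is a minimal sketch of that; the function firstOrDefault() is a hypothetical helper invented for this illustration, not part of the language module.

```ceylon
// Hypothetical helper: returns the first element of a possibly-empty
// stream, or the given fallback if the stream produces no values.
String firstOrDefault({String*} stream, String fallback)
        => stream.first else fallback;

shared void run() {
    {String+} words = { "hello", "world" };
    // A nonempty stream is assignable to a possibly-empty one.
    {String*} maybeWords = words;
    // first is a String for {String+}, but a String? for {String*}.
    String head = words.first;
    String? maybeHead = maybeWords.first;
    print(head);
    print(maybeHead else "<empty>");
    print(firstOrDefault([], "<empty>"));
}
```

The assignment to maybeWords works because Nothing is a subtype of Null, so Iterable<String,Nothing> is a subtype of Iterable<String,Null>.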
As we'll see later, the braces may even contain a comprehension, making them much more powerful than what you see here.
Iterable is a subtype of the interface Category, so we can use the in operator to test if a value is produced by the Iterable.
if (exists char = text[i],
char in {',', '.', '!', '?', ';', ':'}) {
//...
}
"index must be between 1 and 100"
assert (index in 1..100);
The in operator is just syntactic sugar for the method contains() of Category.
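To make the equivalence concrete, here is a small sketch that writes the same test both ways. The names punctuation, isPunctuation() and isPunctuationDesugared() are made up for this example.

```ceylon
// Hypothetical stream of punctuation characters.
{Character+} punctuation = {',', '.', '!', '?', ';', ':'};

// Operator form.
Boolean isPunctuation(Character char)
        => char in punctuation;

// The method call that the operator abbreviates.
Boolean isPunctuationDesugared(Character char)
        => punctuation.contains(char);

shared void run() {
    print(isPunctuation('!'));
    print(isPunctuationDesugared('!'));
    // A range is a Category too.
    print(42 in 1..100);
    print((1..100).contains(42));
}
```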
Iterating a stream
To iterate an instance of Iterable
, we can use a
for
loop:
for (word in moreWords) {
print(word);
}
(Note that in this context, the in
keyword isn't the operator we just met
above, it's just part of the syntax of the for
loop.)
If, for any reason, we need an index for each element produced by a stream, we can use the following idiom to iterate a stream of Entry objects:
for (i -> word in moreWords.indexed) {
print("``i``: ``word``");
}
This idiom makes use of destructuring, which we'll learn about at the end of this leg of the tour.
The indexed
attribute returns a stream of entries containing the indexed elements of
the original stream.
(Note: the arrow -> is syntax sugar for the class Entry. So we can write the type of the entry stream as {<Integer->String>*}.)
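Here is a short sketch that spells out the types involved, both with the arrow and with the class name Entry written in full. The function indexedWords() is a hypothetical wrapper added only so the types are visible.

```ceylon
// Hypothetical wrapper that just exposes the type of indexed.
{<Integer->String>*} indexedWords({String*} words)
        => words.indexed;

shared void run() {
    value entries = indexedWords { "hello", "world" };
    // The same type, written without the arrow.
    {Entry<Integer,String>*} sameEntries = entries;
    for (entry in sameEntries) {
        // Every Entry has a key and an item.
        print("``entry.key``: ``entry.item``");
    }
    // The arrow also works in expressions.
    Integer->String first = 0->"hello";
    print(first in entries);
}
```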
It's often useful to be able to iterate two sequences at once. The zipEntries() function comes in handy here:
for (name -> place in zipEntries(names,places)) {
print(name + " @ " + place);
}
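A self-contained version of the loop above might look like the following sketch. The contents of names and places are invented sample data. The second sequence is deliberately one element longer, since zipEntries() stops when the shorter stream runs out.

```ceylon
shared void run() {
    // Invented sample data; places has one extra element.
    value names = ["Ana", "Ben"];
    value places = ["Lisbon", "Oslo", "Lima"];
    // Each entry pairs the elements found at the same position.
    for (name -> place in zipEntries(names, places)) {
        print(name + " @ " + place);
    }
}
```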
Now there's one very important thing to know when you start mixing streams with mutable objects, variables, or impure functions. This is a common source of error for folks new to Ceylon.
Gotcha!
Streams created using the { ... }
syntax are always lazy. That is:
- their elements are not evaluated until the stream is iterated, and
- each element is reevaluated each time the stream is iterated.
Consider this code:
variable value counter = 0;
value stream = { for (i in 0:5) counter++ }; //curly braces means LAZY!
print(stream); //evaluate elements
print(stream); //reevaluate elements
The code prints:
{ 0, 1, 2, 3, 4 }
{ 5, 6, 7, 8, 9 }
If this behavior is not what you're looking for, you'll need a different sort of stream! One option is to use a sequence instead.
variable value counter = 0;
value stream = [ for (i in 0:5) counter++ ]; //square brackets means EAGER!
print(stream); //elements already evaluated
print(stream); //elements already evaluated
This code prints:
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
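If you've already been handed a lazy stream, one way to pin its elements down is to call sequence() on it. The sketch below assumes a release of the language module where sequence() is a method of Iterable (it was an attribute in very early versions).

```ceylon
shared void run() {
    variable value counter = 0;
    value lazy = { for (i in 0:5) counter++ }; //still lazy
    // sequence() iterates the stream once and stores the results.
    value snapshot = lazy.sequence();
    print(snapshot); //elements already evaluated
    print(snapshot); //same elements again
    print(counter);  //the stream was evaluated only once
}
```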
An even more confusing example arises when one attempts to form a stream by incrementally "cons"-ing elements at the head of the stream. A first, naive attempt might look like this:
variable value stream = { 0 };
stream = { 1, *stream };
stream = { 2, *stream };
This code results in an infinite stream of 2s, instead of the stream {2,1,0}. That's because the spread references to stream are evaluated lazily!
The recommended solution is to use the follow()
method, which forces the
references to stream
to be evaluated eagerly:
variable value stream = { 0 };
stream = stream.follow(1);
stream = stream.follow(2);
Alternatively, one could use a sequence:
variable [Integer+] sequence = [0];
sequence = [1, *sequence];
sequence = [2, *sequence];
Laziness is not the only stream-related gotcha.
Gotcha again!
Iterable
is an extremely abstract type, and many different container types
satisfy it. These various container types each have their own notions of what
"equality" means. For example, in a List
, the order of elements is important,
whereas in a Set
, it isn't. Furthermore, there are infinite streams for
which equality is not computable.
Therefore, value equality—the ==
operator—is not considered
a well-defined operation for streams, unless the streams have some known
additional structure in common. The following assertion produces a warning at
compilation time:
assert ({1, 2} == {1, 2}); //warning: equality not well-defined
We should rewrite this code to use some other data structure for which value equality is well-defined. For example, if we didn't care about order, we could compare Sets:
assert (set {1, 2} == set {2, 1});
Or, if we did want to take order into account, we could use sequences:
assert ([1, 2] == [1, 2]);
So now, naturally, it's time to learn about sequences.
Sequences
Some kind of array or list construct is a universal feature of all programming languages. The Ceylon language module defines support for sequence types via the interfaces Sequential, Sequence, and Empty.
Again, there is some syntax sugar associated with sequences:
- the type Sequential<X> represents a sequence that may be empty, and may be abbreviated [X*] or X[],
- the type Sequence<X> represents a nonempty sequence, and may be abbreviated [X+], and
- the type Empty represents an empty sequence and is abbreviated [].
Some operations of the type Sequence aren't defined by Sequential, so you can't call them if all you have is X[]. Therefore, we need the if (nonempty ... ) construct to gain access to these operations.
void printBounds(String[] strings) {
if (nonempty strings) {
//strings is of type [String+] here
print(strings.first + ".." + strings.last);
}
else {
print("Empty");
}
}
Notice how this is just a continuation of the pattern established for null value handling. In fact, both these constructs are just syntactic abbreviations for type narrowing:
- if (nonempty strings) is an abbreviation for if (is [String+] strings), just like
- if (exists name) is an abbreviation for if (is Object name).
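To see that these really are the same thing, here is a sketch that uses the longhand is conditions directly. The functions describe() and greet() are hypothetical and exist only for this illustration.

```ceylon
// Longhand for: if (nonempty strings)
String describe(String[] strings) {
    if (is [String+] strings) {
        // strings is of type [String+] here
        return strings.first + ".." + strings.last;
    }
    else {
        return "Empty";
    }
}

// Longhand for: if (exists name)
String greet(String? name) {
    if (is Object name) {
        // name is of type String here
        return "Hello, ``name``!";
    }
    else {
        return "Hello, World!";
    }
}

shared void run() {
    print(describe(["a", "b", "c"]));
    print(describe([]));
    print(greet("Ceylon"));
    print(greet(null));
}
```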
Sequence syntax sugar
There's lots more syntactic sugar for sequences. We can use a bunch of familiar Java-like syntax:
String[] operators = [ "+", "-", "*", "/" ];
String? plus = operators[0];
String[] multiplicative = operators[2..3];
Oh, and the expression [] evaluates to an instance of Empty.
[] none = [];
However, unlike Java, all these syntactic constructs are pure abbreviations. The code above is exactly equivalent to the following de-sugared code:
Sequential<String> operators = ... ;
Null|String plus = operators.get(0);
Sequential<String> multiplicative = operators.span(2,3);
(We'll come back to what the list of values in brackets means in a minute!)
The Sequential interface extends Iterable, so we can iterate a Sequential using a for loop:
for (op in operators) {
print(op);
}
Ranges
A Range
is a kind of Sequence
. The span
function creates a Range
. The following:
Character[] uppercaseLetters = 'A'..'Z';
Integer[] countDown = 10..0;
Is just sugar for:
Sequential<Character> uppercaseLetters = span('A','Z');
Sequential<Integer> countDown = span(10,0);
In fact, this is just a sneak preview of the fact that almost all operators in Ceylon are just sugar for method calls upon a type. We'll come back to this later, when we talk about operator polymorphism.
Ceylon doesn't need C-style for
loops. Instead, combine for
with the
range operator:
variable Integer fac=1;
for (n in 1..20) { //20! is the largest factorial that fits in a 64-bit Integer
fac*=n;
print("Factorial ``n``! = ``fac``");
}
Sequence and its supertypes
It's probably a good time to see some more advanced Ceylon code. What better place to find some than in the language module itself?
You can find the API documentation and source code of
Sequence
online, or you can go to Navigate > Open Ceylon Declaration...
to view the
declaration of Sequential
directly inside Ceylon IDE for Eclipse.
The most important operations of Sequential are inherited from Correspondence and Iterable.
- Correspondence provides the capability to access elements of the sequence by index, and
- Iterable provides the ability to iterate the elements of the sequence.
Now open the class Range
in the IDE, to see a concrete implementation of the Sequence
interface.
Empty sequences and the bottom type
Finally, check out the definition of Empty.
Notice that Empty
is declared to be a subtype of List<Nothing>
. This special
type Nothing
, often called the bottom type, represents:
- the empty set, or equivalently
- the intersection of all types.
Since the empty set is a subset of all other sets, Nothing
is assignable to
all other types. Why is this useful here? Well, Correspondence<Integer,Element>
and Iterable<Element>
are both covariant in the type parameter Element
. So
Empty
is assignable to Correspondence<Integer,T>
and Iterable<T>
for any
type T
. That's why Empty
doesn't need a type parameter.
Since there are no actual instances of Nothing
, if you ever see an attribute
or method of type Nothing
, you know for certain that it can't possibly ever
return a value. There is only one possible way that such an operation can
terminate: by throwing an exception.
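This is occasionally handy in ordinary code. In the sketch below, fail() is a hypothetical helper declared with return type Nothing. Because Nothing is assignable to every type, a call to it can sit in any expression position, and the type of the whole else expression stays String.

```ceylon
// Hypothetical helper: the return type Nothing tells the compiler
// that a call to this function never produces a value.
Nothing fail(String message) {
    throw Exception(message);
}

// String|Nothing is just String, so no further narrowing is needed.
String firstOperator(String[] operators)
        => operators[0] else fail("no operators");

shared void run() {
    print(firstOperator(["+", "-"]));
    try {
        firstOperator([]);
    }
    catch (Exception e) {
        print(e.message);
    }
}
```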
Another cool thing to notice here is the return type of the
first
and
item()
operations
of Empty
. You might have been expecting to see Nothing?
here, since they
override supertype members of type T?
. But as we saw in the
first part of the Tour, Nothing?
is just an abbreviation for
Null|Nothing
. And Nothing
is the empty set, so the union Nothing|T
of
Nothing
with any other type T
is just T
itself.
The Ceylon compiler is able to do all this reasoning automatically. So when
it sees an Iterable<Nothing>
, it knows that the operation first
is of type
Null
, i.e. that its value is null
.
Cool, huh?
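A quick sketch of what this buys us in practice, assuming the declarations of first and last in Empty are as described above:

```ceylon
shared void run() {
    [] none = [];
    // first and last of Empty are of type Null, not Nothing?.
    Null first = none.first;
    Null last = none.last;
    print(first);
    print(last);
    // No type parameter needed: the one empty sequence is
    // assignable to a stream or sequence of any element type.
    {String*} noStrings = none;
    Integer[] noIntegers = none;
    print(noStrings.size + noIntegers.size);
}
```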
Sequence gotchas for Java developers
Superficially, a sequence type looks a lot like a Java array, but really it's very, very different! First, of course, a sequence type Sequential<String> is an immutable interface, not a mutable concrete type like an array. We can't set the value of an element:
String[] operators = .... ;
operators[0] = "^"; //compile error
Furthermore, the index operation operators[i]
returns an optional type
String?
, which results in quite different code idioms. To begin with, we
don't iterate sequences by index like in C or Java. The following code does
not compile:
for (i in 0..operators.size-1) {
String op = operators[i]; //compile error
// ...
}
Here, operators[i]
is a String?
, which is not directly assignable to
String
.
Instead, if we need access to the index, we use the idiom we saw above:
for (i -> op in operators.indexed) {
// ...
}
Likewise, we don't usually do an upfront check of an index against the sequence length:
if (i>operators.size-1) {
throw IndexOutOfBoundException();
}
else {
return operators[i]; //compile error
}
Instead, we do the check after accessing the sequence element:
if (exists op = operators[i]) {
return op;
}
else {
throw IndexOutOfBoundException();
}
Indeed, this is a common use for assert
:
assert (exists op = operators[i]);
return op;
We especially don't ever need to write the following:
if (i>operators.size-1) {
return "";
}
else {
return operators[i]; //compile error
}
This is much cleaner:
return operators[i] else "";
All this may take a little getting used to. But what's nice is that all the exact same idioms also apply to other kinds of Correspondence, including Maps.
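For instance, here is a sketch of the same exists and else idioms applied to a Map. It assumes a version of the language module that provides the map() function (the companion of the set() function used earlier). The function describeAge() and the sample data are invented for this example.

```ceylon
// Hypothetical function: looking up a key yields an optional type,
// exactly like indexing a sequence.
String describeAge(Map<String,Integer> ages, String name) {
    if (exists age = ages[name]) {
        return "``name`` is ``age``";
    }
    else {
        return "``name`` is unknown";
    }
}

shared void run() {
    value ages = map { "ana"->31, "ben"->27 };
    print(describeAge(ages, "ana"));
    print(describeAge(ages, "cy"));
    // The else operator supplies a default, as with operators[i] else "".
    print(ages["cy"] else -1);
}
```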
Tuples
A tuple is a linked list which captures the static type of each individual element in the list. For example:
[Float,Float,String] point = [0.0, 0.0, "origin"];
This tuple contains two Floats followed by a String. That information is captured in its static type, [Float,Float,String].
Each link of the list is an instance of the class Tuple.
If you really must know, the code above is syntax sugar for the following:
Tuple<Float|String,Float,Tuple<Float|String,Float,Tuple<String,String>>>
point = Tuple(0.0, Tuple(0.0, Tuple("origin", [])));
Therefore, we always use syntax sugar when working with tuples.
Tuple
extends Sequence
, so we can do all the usual kinds of sequency
things to a tuple, iterate it, and so on. As with sequences, we can access
a tuple element by index. But in the case of a tuple, Ceylon is able to
determine the type of the element when the index is a literal integer:
Float x = point[0];
Float y = point[1];
String label = point[2];
Null zippo = point[3];
An unterminated tuple is a tuple where the last link in the list is a sequence, not an Empty. For example:
String[] labels = ... ;
[Float,Float,String*] point = [0.0, 0.0, *labels];
This tuple contains two Floats followed by an unknown number of Strings.
Now we can see that a sequence type like [String*]
or [String+]
can
be viewed as a degenerate tuple type!
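The following sketch shows how the static types behave at the boundary between the fixed part of an unterminated tuple and its sequence tail. The contents of labels are invented sample data.

```ceylon
shared void run() {
    String[] labels = ["origin", "start"];
    [Float,Float,String*] point = [0.0, 0.0, *labels];
    // Within the fixed prefix, the element types are known exactly.
    Float x = point[0];
    Float y = point[1];
    // Past the prefix, an element may or may not be there.
    String? firstLabel = point[2];
    // Following two links of the list leaves just the sequence tail.
    String[] tail = point.rest.rest;
    print(x + y);
    print(firstLabel else "unlabeled");
    print(tail);
}
```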
Destructuring
Individually accessing the elements of a tuple by numeric index can be a little verbose, so Ceylon supports a sophisticated sort of parallel assignment called destructuring. We can rewrite the code above like this:
value [x, y, label] = point;
This introduces three new values, x and y of inferred type Float, and label of inferred type String.
The syntax [x, y, label]
is called a tuple pattern. Tuple patterns
are used to destructure nonempty sequences. A tuple pattern is a list
of value names, the last of which may be a tail value, indicated using
the spread operator:
value labeled = ["one two three", 1.0, 2.0, 3.0];
value [label, *point] = labeled; //point is a tail with type Float[3]
A tuple pattern may include explicit element types:
value [Float x, Float y, String label] = point;
We can use destructuring in for
loops:
for ([x, y, label] in points) {
print("``label``: (``x``, ``y``)");
}
And in exists and nonempty conditions in if or while:
if (nonempty [name, *rest] = process.arguments) {
print("Hello ``name``!");
}
And in the cases of a switch statement or expression:
Float[2]|Float[3] coord = ... ;
switch (coord)
case ([Float x, Float y]) {
print((x^2+y^2)^0.5);
}
case ([Float x, Float y, Float z]) {
print((x^2+y^2+z^2)^0.5);
}
And even in a let
expression:
print(let ([x, y] = [1.0, 2.0]) "(``x``, ``y``)");
We can also destructure Entry objects. We've already seen this used in a for loop:
for (i -> op in operators.indexed) {
// ...
}
The syntax i -> op
is called an entry pattern. An entry pattern
may include explicit key and item types:
for (Integer i -> String op in operators.indexed) {
// ...
}
More complex destructuring patterns may be formed by nesting tuple and entry patterns, for example:
for (i -> [en, es] in translations.indexed) {
print("``i``: ``en`` ``es``");
}
Finally, destructuring may be used in the parameter list of an anonymous function. But we'll discuss that later in the tour.
There's more...
If you're interested, you can find a more in-depth discussion of tuples here.
You can read more about destructuring here.
Next up we'll explore some more details of the type system, starting with type aliases and type inference.