Comprehensions

This is the thirteenth stop in our Tour of Ceylon. In the previous section we looked at invoking functions using named arguments. We're now ready to learn about comprehensions.

Comprehensions

A comprehension is a convenient way to transform, filter, or combine a stream or streams of values before passing the result to a function. Comprehensions act upon, and produce, instances of Iterable. A comprehension may appear:

  • inside braces, producing a stream,
  • inside brackets, producing a sequence,
  • inside a positional argument list, as an argument to a variadic parameter, or
  • inside a named argument list, as an iterable argument.

Comprehensions in stream and sequence instantiation expressions

The brace syntax for instantiating a stream accepts a comprehension, so we can use a comprehension to transform any iterable:

{String*} names = { for (p in people) p.name };

Executing the above line of code doesn't actually do very much. In particular it doesn't actually iterate the collection people, or evaluate the name attribute. That's because elements of the resulting Iterable are evaluated lazily.

The bracket syntax for instantiating a sequence also accepts a comprehension, so we can use a comprehension to build a sequence:

String[] names = [ for (p in people) p.name ];

Since sequences are by nature immutable, executing the previous statement does iterate people and evaluate the name of each element. But it's best to think of that as the effect of the bracket syntax, not of the comprehension itself.
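To make the difference between the two syntaxes visible, here is a minimal sketch. The helper noisySquare() is hypothetical, introduced only so that each evaluation leaves a trace on the console:

```ceylon
// Hypothetical helper: logs each call so we can see when evaluation happens.
Integer noisySquare(Integer i) {
    print("squaring ``i``");
    return i * i;
}

shared void run() {
    // Brace syntax: the comprehension is not evaluated here.
    {Integer*} lazySquares = { for (i in 1..3) noisySquare(i) };
    print("stream created");

    // Bracket syntax: the comprehension is iterated immediately.
    Integer[] eagerSquares = [ for (i in 1..3) noisySquare(i) ];
    print("sequence created");

    // Only now are the elements of the lazy stream computed.
    for (s in lazySquares) {
        print(s);
    }
}
```

We would expect no "squaring" messages before "stream created", three of them before "sequence created", and three more interleaved with the printed squares in the final loop.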

Now, comprehensions aren't only useful for building iterables and sequences! They're a significantly more general purpose construct. The idea is that you can write a comprehension anywhere the language syntax accepts multiple values. That is to say, anywhere you could write a list of comma-separated expressions, or spread an iterable using *.

(Aside: actually, we sometimes prefer to think of the iterable instantiation syntax and sequence instantiation syntax as just a syntactic shorthand for an ordinary named argument instantiation expression. That's not precisely how the language specification defines these constructs, but it's a useful mental model to keep handy. So the idea is that anything we can write inside braces or brackets should also be syntactically legal inside a named argument list.)

Gotcha!

It's important to have the right mental model of where a comprehension starts and finishes and what precisely it means. The comprehension is the bit which starts with for, and ends in an expression. The braces or brackets are not included.

A comprehension produces multiple values, not a single value. Therefore a comprehension is not considered an expression and we can't directly assign a comprehension to a value reference! If we just need to store the iterable stream somewhere, without evaluating any of its elements, we can use an iterable instantiation expression, exactly like the one we've just seen.
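As a small illustration of this rule, consider the following sketch. The Person class and the sample data are assumptions made up for the example:

```ceylon
// Assumed minimal Person class, just for this example.
class Person(shared String name, shared Integer age) {}

shared void run() {
    value people = [Person("Ann", 34), Person("Bo", 16)];

    // Not legal: a bare comprehension is not an expression.
    // value names = for (p in people) p.name;

    // Legal: wrap the comprehension in an iterable instantiation expression.
    {String*} names = { for (p in people) p.name };

    for (n in names) {
        print(n);
    }
}
```

The commented-out line is the one the compiler would reject. The braces turn the comprehension into an ordinary expression of type {String*}, which can be stored like any other value.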

Comprehensions as variadic arguments

One place where the language "accepts multiple values" is in the positional argument list for a function with a variadic parameter.

void printNames(String* names) => printAll(names, " and ");

printNames(for (p in people) p.name);

Arguments to variadic parameters are packaged into a sequence, so the comprehension is iterated eagerly, before the result is passed to the receiving function. Therefore, we don't usually use variadic parameters for processing streams in Ceylon. That's OK, because we have an alternative option that is designed precisely with stream processing in mind.

Comprehensions in named argument lists

Now let's see what makes comprehensions really useful.

Suppose we had a class HashMap, with the following signature:

class HashMap<Key,Item>({Key->Item*} entries) { ... }

According to the previous chapter, we can pass multiple values to this parameter using a named argument list:

value numbersByName = HashMap { "one"->1, "two"->2, "three"->3 };

If multiple values are acceptable, so is a comprehension:

value numNames = ["one", "two", "three"];
value numbersByName = HashMap { for (i->w in numNames.indexed) w->i };

Going back to our previous example, we could construct a HashMap<String,Person> like this:

value peopleByName = HashMap { for (p in people) p.name->p };

As you've already guessed, the for clause of a comprehension works a bit like the for loop we met earlier. It takes each element of the Iterable stream in turn. But it does it lazily, when the receiving function actually iterates its argument!

This means that if the receiving function never actually needs to iterate the entire stream, the comprehension will never be fully evaluated. This is extremely useful for functions like every() and any():

if (every { for (p in people) p.age>=18 }) { ... }

The function every() in ceylon.language accepts a stream of Boolean values, and stops iterating the stream as soon as it encounters false.
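We can observe this short-circuiting with a minimal sketch. The function isAdult() and the list of ages are hypothetical, and the function prints a message each time it is called:

```ceylon
// Hypothetical predicate that logs every call.
Boolean isAdult(Integer age) {
    print("checking ``age``");
    return age >= 18;
}

shared void run() {
    value ages = [21, 15, 40, 12];

    // every() stops iterating at the first false,
    // so the comprehension is never evaluated for 40 and 12.
    value allAdults = every { for (a in ages) isAdult(a) };
    print(allAdults);
}
```

Only 21 and 15 should be checked before every() returns false. If we had written the comprehension inside brackets and passed the resulting sequence instead, all four ages would be checked first.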

Now let's see what the various bits of a comprehension do.

Comprehension clauses

A comprehension is a series of:

  • for clauses, and
  • if clauses,
  • followed by an expression.

You're allowed to freely mix if and for clauses. But there must be exactly one expression, and it must come last.

Transformation

The first thing we can do with a comprehension is transform the elements of the stream using an expression to produce a new value for each element. This expression appears at the end of a comprehension. It's the thing that the resulting Iterable actually iterates!

For example, this comprehension

for (p in people) p.name->p

results in an Iterable<String->Person>. For each element of people, a new Entry<String,Person> is constructed by the -> operator.

Filtering

The if clause of a comprehension allows us to skip certain elements of the stream. This comprehension produces a stream of the numbers from 0 to 100 which are divisible by 3:

for (i in 0..100) if (i%3==0) i

It's especially useful to filter using if (exists ...).

for (p in people) if (exists s = p.spouse) p->s

You can even use multiple if conditions:

for (p in people) 
    if (exists s = p.spouse, 
        nonempty inlaws = s.parents) 
                p->inlaws
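Here is a self-contained sketch of the same pattern, combining an exists condition with an ordinary Boolean condition. The sample stream is an assumption chosen for illustration:

```ceylon
shared void run() {
    // Assumed sample data: a stream with some missing (null) entries.
    {String?*} raw = { "ann", null, "bo", "carol" };

    // Keep only the non-null strings longer than two characters,
    // and transform the ones we keep.
    value longNames = [
        for (s in raw)
            if (exists n = s, n.size > 2)
                n.uppercased
    ];

    for (n in longNames) {
        print(n);
    }
}
```

The null entry fails the first condition and "bo" fails the second, so only the uppercased forms of "ann" and "carol" should remain.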

Products and joins

A comprehension may have more than one for clause. This allows us to combine two streams to obtain a stream of the values in their Cartesian product:

for (i in 0..100) for (j in 0..10) Node(i,j)

Even more usefully, it lets us obtain a stream of associated values, a lot like a join in SQL.

for (o in orgs) for (e in o.employees) e.name
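The following sketch shows both uses in runnable form. The team data is made up for the example, and tuples stand in for the Node and organization types used above:

```ceylon
shared void run() {
    // Product of two ranges, with an if clause to keep each unordered pair once.
    value pairs = [ for (i in 1..3) for (j in 1..3) if (i < j) [i, j] ];
    for (pair in pairs) {
        print(pair);
    }

    // Join-like traversal over assumed sample data: team name -> members.
    value teams = [ "core"->["ann", "bo"], "docs"->["carol"] ];
    value members = [
        for (team->names in teams)
            for (n in names)
                "``n`` (``team``)"
    ];
    for (m in members) {
        print(m);
    }
}
```

The first comprehension should yield the pairs (1,2), (1,3) and (2,3). The second walks each team and then each of its members, so the inner for clause can refer to the variable bound by the outer one.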

Comprehensions beginning in if

A comprehension may begin with an if clause. For example, this comprehension:

[ if (x>=0.0) x^0.5 ]

produces the singleton [2.0] if x==4.0, and the empty tuple [] if x<0.0. It's similar to the following if expression:

if (x>=0.0) then [x^0.5] else []

Likewise, the comprehension:

{ if (exists list) for (x in list) x.string }

produces an empty stream if list is null. It's similar to this if expression:

if (exists list) then { for (x in list) x.string } else {}

However, there's one important difference: conditions inside brace-delimited comprehensions are evaluated lazily.
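A minimal sketch of that difference follows. The function nonNegative() is a hypothetical helper that announces when the condition is actually evaluated:

```ceylon
// Hypothetical condition that logs each evaluation.
Boolean nonNegative(Float x) {
    print("condition evaluated");
    return x >= 0.0;
}

shared void run() {
    value x = 4.0;

    // Comprehension in braces: the condition is not evaluated yet.
    value lazyRoot = { if (nonNegative(x)) x^0.5 };
    print("stream created");

    // if expression: the condition is evaluated right here.
    value eagerRoot = if (nonNegative(x)) then [x^0.5] else [];
    print("if expression evaluated");

    // Iterating the stream evaluates the condition of the comprehension.
    for (r in lazyRoot) {
        print(r);
    }
}
```

We would expect "stream created" to appear before the first "condition evaluated" message. The second such message should appear only once the final loop starts iterating lazyRoot.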

There's more...

Next we're going to discuss some of the basic types from the language module, in particular numeric types, and introduce the idea of operator polymorphism.

You can read more about working with iterable objects in Ceylon in this blog post.