About the plus symbol

Operators in Ceylon can't be overloaded. That is to say, I can't redefine an operator like + to do something totally arbitrary, say, add a new element to a list. Instead, the semantics of each operator is defined by the language specification in terms of types defined in the language module. However, many of these types, and therefore the definitions of the operators associated with them, are quite abstract. For example, + is defined in terms of an interface called Summable. So if you want to define your own Complex number class, you just make it satisfy Summable, and you can use + to add complex numbers. We call this approach operator polymorphism.
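As a rough illustration of that last point, here is a minimal sketch of such a class. It assumes the Ceylon 1.x language module, where Summable declares a single formal method named plus. The Complex class and its re and im attributes are hypothetical names chosen for this example.

```ceylon
// Hypothetical Complex class; it is not part of the language module.
class Complex(shared Float re, shared Float im)
        satisfies Summable<Complex> {

    // The language defines `a + b` as `a.plus(b)`, so refining
    // plus() is all it takes to make + work for Complex.
    shared actual Complex plus(Complex other)
            => Complex(re + other.re, im + other.im);
}

shared void demoComplex() {
    value z = Complex(1.0, 2.0) + Complex(3.0, -1.0);
    // Componentwise sums: 1.0 + 3.0 and 2.0 + (-1.0).
    assert (z.re == 4.0, z.im == 1.0);
}
```

Note that nothing here redefines +. The class only supplies the operation that the language specification already says + means.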

One of the first things people notice about Ceylon is that, right after singing the praises of not having operator overloading, we go right ahead and use + for string concatenation! I've seen a number of folks object that string concatenation has nothing to do with numeric addition, and that this is therefore an example of us breaking our own rules.

Well, perhaps. I admit that the main motivation for using + for string concatenation is simply that this is what most other programming languages use, and that therefore this is what we find easiest on the eyes.

On the other hand, I don't think there's a strong reason to object to the use of + for concatenation. There's no single notion of "addition" in mathematics. Quite a few different operations are traditionally called "addition", and written with the + symbol, including addition of vectors and matrices.

Generalizing over all these operations is the job of abstract algebra. So I recently spent some time nailing down and documenting how the types and operators of Ceylon's language module relate to the structures of abstract algebra.

The following three famous algebraic structures are of most interest to us:

  • A semigroup is a set of values with an associative binary operation.
  • A monoid is a semigroup with an identity element.
  • A group is a monoid where each value has an inverse element.

If the binary operation is also commutative, we get, respectively, a commutative semigroup, a commutative monoid, or an abelian group.

Finally, a ring is a set of values with two binary operations, named addition and multiplication, where:

  • the ring is an abelian group with respect to addition,
  • the ring is a monoid with respect to multiplication, and
  • multiplication distributes over addition.

Strings with concatenation form a monoid, since string concatenation is associative, and the empty string is an identity element with respect to concatenation. They don't form a group, since there are no inverse strings. Also, string concatenation isn't commutative.
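These properties are easy to spot-check in Ceylon itself. The sketch below uses a hypothetical function name and arbitrary sample strings. A handful of assertions is a sanity check on particular values, not a proof of the laws.

```ceylon
shared void checkStringMonoid() {
    value a = "foo";
    value b = "bar";
    value c = "baz";

    // Associativity: grouping does not affect the result.
    assert ((a + b) + c == a + (b + c));

    // The empty string is an identity on both sides.
    assert ("" + a == a, a + "" == a);

    // Not commutative: swapping the operands changes the result.
    assert (a + b != b + a);
}
```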

On the other hand, integers with addition form an abelian group. Taken with both addition and multiplication, the integers form a ring.
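The same kind of spot-check works for the integers. Again, the function name and the sample values are hypothetical, and the assertions only exercise the laws for these particular values.

```ceylon
shared void checkIntegerRing() {
    value a = 6;
    value b = -4;
    value c = 9;

    // Abelian group under addition.
    assert ((a + b) + c == a + (b + c)); // associative
    assert (a + b == b + a);             // commutative
    assert (a + 0 == a);                 // 0 is the identity
    assert (a + (-a) == 0);              // every value has an inverse

    // Monoid under multiplication, with 1 as the identity.
    assert ((a * b) * c == a * (b * c));
    assert (a * 1 == a);

    // Multiplication distributes over addition.
    assert (a * (b + c) == a * b + a * c);
}
```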

We could have chosen to say that Ceylon's + operator applies only to abelian groups, or perhaps only to groups, or perhaps only to commutative monoids or only to commutative semigroups. But any of those choices would be as arbitrary as any other. Instead, we've decided to say that the interface Summable abstracts over all semigroups, thereby blessing the use of + with any mathematical semigroup. Thus, we can legitimately use it to denote string concatenation or any other pure associative binary operation.
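One practical consequence is that generic code can be written once against Summable and reused for any semigroup. The following is a minimal sketch, assuming the Ceylon 1.x language module. The function sum3 is a hypothetical helper, not something the language module provides.

```ceylon
// Hypothetical helper: combines three values of any Summable type.
// Because + is associative, the grouping of the two additions
// does not matter.
Value sum3<Value>(Value a, Value b, Value c)
        given Value satisfies Summable<Value>
        => a + b + c;

shared void demoSummable() {
    // Integer addition, Float addition, and String concatenation
    // all go through the same generic function.
    assert (sum3(1, 2, 3) == 6);
    assert (sum3(0.5, 0.25, 0.25) == 1.0);
    assert (sum3("ab", "cd", "ef") == "abcdef");
}
```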

Furthermore:

  • The interface Invertible abstracts over groups, allowing the use of the - operator with any mathematical group.
  • The interface Numeric abstracts over rings, allowing the use of the * and / operators with any mathematical ring.
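The same pattern extends up the hierarchy. The sketch below assumes Ceylon 1.1 or later, where binary - is defined in terms of Invertible and * in terms of Numeric. The functions difference and square are hypothetical helpers written for this example.

```ceylon
// Hypothetical helper: works for any type forming a group under +.
Value difference<Value>(Value a, Value b)
        given Value satisfies Invertible<Value>
        => a - b;

// Hypothetical helper: works for any type forming a ring.
Value square<Value>(Value x)
        given Value satisfies Numeric<Value>
        => x * x;

shared void demoGroupAndRing() {
    assert (difference(5, 3) == 2);
    assert (difference(1.5, 0.5) == 1.0);
    assert (square(7) == 49);
    assert (square(1.5) == 2.25);
}
```

String satisfies Summable but not Invertible, so a call like difference("ab", "b") should be rejected by the typechecker. That matches the observation that strings form a monoid but not a group.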

Of course, we could have called these interfaces Semigroup, Group, and Ring, and that would have made us feel smart, perhaps, but we're trying to communicate with programmers here, not mathematicians.