Blog tagged language design

On three-legged elephants

I've often argued that good design—of a language, library, or framework—isn't about packing in as many features as possible into your solution, rather it's about discovering a small set of interlocking features that act to reinforce each other. That is to say, we want to maximize the power/expressiveness of our solution, while simultaneously minimizing the surface area. I am, furthermore, quite often willing to sacrifice some flexibility for elegance. It's my view that elegant solutions are easier to learn, more enjoyable to use, and easier to abstract over.

With this in mind, I would like to consider a problem that language designers have been working on for at least two decades: how to combine subtype polymorphism with parametric polymorphism (generics). This is a central problem faced by any object-oriented language with static typing. Recent languages have come progressively closer to a satisfying solution, but I would like to submit, if it doesn't sound too self-serving, that Ceylon offers the most satisfying solution so far.

Our mascot is Trompon the elephant, because an elephant has four legs and would fall over if one of his legs were missing. Ceylon's type system is exactly like this! (Yes, this is a baxplanation.)

The four legs of the type system are:

  • declaration site variance
  • ad hoc union and intersection types
  • type inference based on principal typing
  • covariant refinement and principal instantiation inheritance

If we were to take away any one of those four characteristics, all of a sudden stuff that Just Works would simply not work anymore, or, even if we could make it work, it would turn out way more complex, and involve reasoning that is difficult for the programmer to reproduce.

Consider this really simple line of code:

value animals = ArrayList { Cat(), Dog(), Person() };

The inferred type of animals is ArrayList<Cat|Dog|Person>.

  • If we were to take away declaration site covariance, then animals would not be assignable to List<Animal>.
  • If we were to take away union and intersection types, then the process for inferring the type argument would be ambiguous and much more complex. (Ceylon's type argument inference algorithm is defined in two pages of pseudocode in the language specification, which sounds like a lot, until you realize how problematic and underspecified this algorithm is in other languages, and that the actual implementation of the algorithm is not much more longer.)
  • If we were to take away type inference, or principal typing, we would need to explicitly write down some uninteresting types in this line of code.

Minus any one of these characteristics, we're left with a three-legged elephant.

Principal instantiation inheritance is a kind-of hidden feature of Ceylon that we don't talk about much, even though we use it extensively throughout the design of our container types. It lets us say, for example, that a List<Element> is a Ranged<List<Element>>, and that a Sequence<Element> is a List<Element>&Ranged<Sequence<Element>>. Principal instantiation inheritance meshes really nicely with declaration-site covariance, and with ad hoc union/intersection types. Consider the following code:

List<String>|Integer[] ranged = ... ;
value span = ranged.span(0,max); 

Here, the inferred type of span is List<String>|Integer[]. Isn't that nice? The typechecker has reasoned that the principal supertype instantiation of Ranged for List<String>|Integer[] is the type Ranged<List<String>|Integer[]>, and thereby determined the perfect return type for span().

If we were to take away any one of declaration site covariance, principal instantiation inheritance, or union types, then this reasoning would no longer be sound. The elephant would fall on his ass.

Expressiveness

I just noticed a tweet by Paul Snively that I thought was interesting. I don't make it my practice to respond to tweets, simply because >99.9% of them are just idiotic. But I'll make an exception here, because I respect Paul, because he frames an interesting issue in a way I agree with, and because his tweet, which I ultimately disagree with, as I'll explain, is certainly at least partly true, at least from a certain perspective. Paul wrote:

One reason I prefer #scala to #kotlin or #ceylon is it adheres to the #lisp #smalltalk dictum of not confining expressive power to itself.

So, let's start by seeing what's right about this. Where in the language definition does Ceylon "reserve expressive power to itself"?

Well, the things that stand out to me are:

  • control structures,
  • operators,
  • type name abbreviations.

We "reserve expressive power" here, in the sense that we don't let you write your own MyTuple class, and then instantiate it using this syntax:

[String,String] myTuple = ["hello","world"];

If you want to use the brackets, you're stuck with our Tuple class, and if it doesn't suit your needs just right, you're going to have to write:

MyTuple<String,MyTuple<String>> myTuple = MyTuple("hello",MyTuple("world"));

Or whatever. Likewise, while you can certainly use the + operator with your own Complex class, if you want to use it to add a Period to a Datetime, you're out of luck. The Summable<Other> interface expresses that + must be a symmetric operation via the self-type constraint on Other. You're going to have to write:

value date = delay after now;

Or whatever.

Now, I could write a long post defending this choice, but here I'll simply note that:

  • Scala doesn't let you define type abbreviations or control structures either, but
  • it does let you define your own operators.

That is, Scala, like plenty of other languages, lets you take any arbitrary string of unicode characters and turn it into an operator.

And just look at what a mess that turned out to be. Sure, there are some nice uses of operator overloading in the Scala world (parser combinators), along with some abominations (SBT). On balance, I, and many others, believe that untrammeled operator overloading has been a net negative in Scala and C++ and other languages that have tried it. So Ceylon takes a middle road: operator polymorphism.

This means that you're stuck with the operators we think are important. But at least everyone in the Ceylon community is stuck with the same ones! Which makes Ceylon code much more immediately understandable to someone who knows the syntax of Ceylon, because the things which aren't predefined symbolic operators have meaningful names. So if by "expressive power", you take into account the ability to be understood by others, I think of this as a feature.

Fine, anyway, we can agree to disagree on this one, it's not the main point I want to make.

The real point I want to make is that I measure "expressive power" of a language, not mainly by what superficial syntax it supports (though syntax is certainly not unimportant), but rather by what can be expressed within the type system. And here's where it has been a central animating design principle that we weren't going to build in special little escape hatches or ad hoc features for things we think are important. That's why, for example, null is an instance of the ordinary class Null in Ceylon, unlike in Java or Scala where it's a primitively defined value of the bottom type. That's why Tuple is just an ordinary class, and Callable is just an ordinary interface, unlike in many languages where tuple types and function types are defined primitively. That's why the operators we do have are mostly defined to operate against generic interfaces instead of against an enumerated list of language module types.

Now, sure, of course Ceylon could never be compared to Lisp, or even to Smalltalk in this respect. But that's simply an unfair comparison. Of course a dynamically typed language is more expressive than a language with static typing. Duh. You can't reasonably compare the expressiveness of ML or Haskell to Lisp either. But statically typed languages have their own advantages, and that's why static typing is where the real action is right now.

So I think it's a misunderstanding of Ceylon to imagine that we're reserving much expressive power to oursevles. No, we don't let you define your own pope operator (-:|-+>, and I guess some people will find that limits their self-expression. But I believe Ceylon will foster other productive avenues for them to express their creativity.

UPDATE:

To take this argument even further, consider our rule against the use of non-denoteable types: we don't use non-denotable types, or operations upon types that can't be expressed within the language itself, even internally in the typechecker to reason about your code. Since we needed union and intersection types to reason about generics, we needed to make them a part of the language. And they turned out to be perhaps the most interesting and powerful feature of our typesystem.

Things static type systems shouldn't have

If you've worked with any programming language, you have gripes about it. All design involves compromises, and all languages have warts. More interesting are the warts that are shared by many languages. Here's a short list of things that are extremely common in static type systems, that I think are almost always mistakes.

A populated bottom type

By definition, the bottom type is a type with no values. If null is an instance of the bottom type, that's a hole in the type system. Yes, yes, it's done for convenience. But if we were to go round adding holes into our type system every time we run into something slightly inconvenient, we may as well just give up and go use a dynamic language.

Non-denoteable types

A non-denoteable type is a type which can't be written down in the programming language itself. Many languages with sophisticated static type systems require the use of non-denoteable types to typecheck the code. This almost guarantees that the programmer can't always reproduce the compiler's reasoning, or understand the resulting error messages.

Aggressively replacing unions with intersections

For some reason, there's some kind of prejudice against union types. Your typical compiler, upon encountering something internally that is naturally a union type, immediately attempts to replace it with the intersection of its supertypes. This doesn't make a whole lot of sense to me, since in these languages both unions and intersections are usually equally non-denoteable, and this operation involves throwing away a whole lot of useful information about the type.

Assigning references a single immutable type

A value has an immutable type. A reference to a value doesn't need to. Indeed, what the compiler knows (or rather could know) about the type of a reference changes depending upon precisely where we are in the code. But there's a strong tradition that each named reference has a single declared or inferred type everywhere it is used. This tradition even infects Ceylon to some extent (though this will change in future versions).

Primitive types or primitively-defined compound types

I hesitated to put this one on the list. There is an approach to building a static type system where we start with certain primitive values, layer tuple support over those types, layer records over tuples, and eventually arrive at generic abstract data types. I'm not talking about that kind of type system. What I'm talking about is where we start with a totally general way of defining abstract data types, and then stick some extra primitive types into the system because we find our supposedly totally general thing to be inadequate.

Overloading and implicit type conversions

Yes, yes, I know they seem convenient. But in fact the benefits provided by overloading or implicit type conversions are almost entirely trivial. They just don't increase the expressiveness of your type system. And they really screw up type inference, which is why you don't see these things in academic or research languages. Dynamic languages don't have anything like overloading or implicit type conversions, and nobody misses them. Because you don't need 'em. They're an abuse of the static type information.

A sort of manifesto

The latest generation of statically typed languages combine subtyping, type parameterization, and varying levels of support for type inference. But their type systems have - reflecting their development over time - grown so complex and riddled with corner cases and unintuitive behavior that few programmers are able to understand the reasoning of the compiler all the time. Error messages are confusing, referring to concepts that only language theorists care about, or to types which can't be written down in the language itself. The type systems are so complex that they even include corner cases which are mathematically undecidable, resulting in the possibility of compile-time stack overflows or non-termination.

For example, Java features sophisticated support for covariance and contravariance via wildcard type arguments. But most Java developers avoid making use of this extremely powerful feature of the type system, viewing it as a source of confusing error messages and syntactic ugliness.

Other languages introduce even further complexity to this mix, for example, implicit type conversions, which result in unintuitive behavior, especially when combined with generic types and covariance, or nonlocal type inference, which can result in frustratingly slow compilation times.

But the desire to combine the best of the ML family of languages with the best of the C family of languages is not, in itself, a misplaced goal. A type system featuring subtyping, type parameterization, simplified covariance and contravariance, limited type inference, and algebraic types could be a powerful tool in the hands of developers, if it:

  • could be defined in terms that reflect how programmers intuitively reason about types,
  • was packaged up in a syntax that is easy to read and understand, and
  • constrained so as to eliminate performance problems, undecidability, and confusing corner cases.

A language like this would necessarily be simpler than some other recent statically typed languages, but ultimately more powerful, if its features were more accessible to developers.

Ceylon is a language with several goals, including readability, modularity, and excellent tool support. In terms of its type system, a core goal is to provide powerful mechanisms of abstraction without scaring away the people who will ultimately benefit from this power with confusing error messages and unintuitive collisions between intersecting language features.

In pursuit of that goal, we've had to start by taking some things away. When you start using Ceylon, you'll find there are some things you can do in Java that you can't do in Ceylon. Our hope is that, as you get further into the language, you'll find yourself naturally writing better code, making use of more general-purpose abstractions and stronger type safety, without even really needing to try. It will just feel like the most natural way to do it.

At least, that's the theory. ;-)