My previous post about
type functions
generated some interesting discussion, here, and
on reddit.
Therefore, I think it's worth tying up several loose ends
from the earlier post. So here's a collection of further
observations about type functions.
Warning: this post addresses some very technical details of
how we've incorporated type functions into Ceylon's type
system. Don't even bother continuing any further until
you've read the earlier post.
The "why" of this
The most well-known application of higher-order generics is
for representing high-level abstractions of container types:
functors, monads, and friends. That's not what motivated me
to experiment with higher-order generics in Ceylon, and as a
practical matter I still have little interest in these
abstractions, though they're fun to play with.
No, what bothered me was not that Ceylon's type system
wasn't powerful enough to represent Functor
or Monad
,
but rather that Ceylon's type system wasn't powerful enough
to represent Ceylon. I'll show you what I mean by that in
just a second. But first I want to argue that type functions
can be seen as a regularization of the language.
From a purely syntactic point of view, it's always seemed
a little strange that every sort of type declaration in
Ceylon can have a list of type parameters, except for a
type parameter itself. Furthermore, it's noticeable that I
can take a reference, or meta reference to any program
element unless it has a list of type parameters. Now, such
restrictions might seem reasonable if a parameterized type
parameter or a reference to a generic declaration were not
meaningful notions at a fundamental level. But they clearly
are meaningful, and even at least somewhat useful.
Exactly how useful is a different question—the
jury's still out, at least in my mind. And so perhaps we'll
ultimately conclude that this stuff isn't worth its weight.
The weight being, substantially, the time it takes for
programmers new to Ceylon to understand this stuff.
In the original post, I showed how type functions were
necessary to represent the type of a reference to a generic
function. One place where this problem arises is with one of
Ceylon's more unique features: its typesafe metamodel.
A use case for generic function reference types
Usually, I can obtain a metamodel object that represents a
class or function and captures its type signature. For
example, the expression `String`
evaluates to a
metamodel object that captures the type and initializer
parameters of the class String
:
Class<String,[{Character*}]> stringClass = `String`;
For a generic declaration, I can do a similar thing, as long
as I'm prepared to nail down the type arguments. For example,
I can write `Singleton<String>`
to get a metamodel
object representing the class Singleton
after applying the
type argument String
:
Class<Singleton<String>,[String]> stringSingletonClass
= `Singleton<String>`
But in Ceylon as it exists today, I can't obtain a typed
metamodel object that represents just `Singleton`
,
because to represent the type of that metamodel object I
would necessarily need a type function.
Now, with the new experimental support for type functions,
the type of the expression `Singleton`
could be
<T> => Class<Singleton<T>,[T]>()
, allowing code like this:
value singletonGenericClass = `Singleton`;
...
Class<Singleton<String>,[String]> stringSingletonClass
= singletonGenericClass<String>();
That's just one example of how allowing references to
generic functions makes Ceylon feel more "complete".
Two use cases for anonymous type functions
I get the impression that the "scariest" bit of what I've
presented in the previous post is the notation for anonymous
type functions. That is, the following syntax:
<X> => X
<X> => [X,X,X]
<X,Y> => X|Y
<T> given T satisfies Object => Category<T>(T*)
But I'm convinced that this notation is not really that
hard to understand. The reason I assert this is because if
I give each of these type functions a name, then most of
you guys have no problem understanding them:
alias Identity<X> => X;
alias Triple<X> => [X,X,X];
alias Union<X,Y> => X|Y;
alias CategoryCreator<T> given T satisfies Object => Category<T>(T*);
But—one might reasonably enquire—why do we
even need them, if the named versions are easier to read?
Well, we need them:
- in order to be able to denote the type of a reference to
a generic function—remember, we don't have
undenotable types in Ceylon—and
- to make it easy to partially apply a named type
function like
Map
.
For example, we want to be able to write stuff like
<T> => Map<String,T>
when working with higher-order
generics, thus turning a type function of two type
parameters into a type function of one type parameter.
Are type functions "type types"?
One thing I should have made very clear, and forgot, is that
type functions don't represent an additional meta level. In
Ceylon's type system, type functions are types, in the very
same sense that function are values in Ceylon and other
modern languages.
There simply is no additional meta-type system for types in
Ceylon. The closest thing we have to a "type type" is a
generic type constraint, but that's an extremely
impoverished sort of type type, since Ceylon provides no
facilities at all to abstract over type constraints—I
can't even assign an alias to a type constraint and reuse it
by name.
Ceylon reasons about type constraints and assignability of
types to type variables using hardcoded rules written
primitively into the language spec and type checker, not by
abstraction over the types of types.
Type functions and subtyping
But if a type function is a type, what are its subtyping
relationships with other types and other type functions?
Well, first, recall that some type functions have instances:
generic function references. We didn't want to introduce
values into the language that aren't Object
s, so we've
declared that every type function is a subtype of Object
.
This preserves the useful property that our type system has
a single root type Anything
.
Next, recall that an ordinary function type is covariant
in its return type, and contravariant in its parameter
types. For example, the function type:
String(Object, Object)
is a subtype of:
Object(String, String)
Since if a function accepts two Object
s and returns a
String
, then it's clearly also a function that accepts two
String
s and returns an Object
.
Given two function types with one parameter:
F(P)
G(Q)
Then F(P)
is a subtype of G(Q)
iff P
is a supertype of
Q
and F
is a subtype of G
.
Similar rules apply to type functions. Consider two type
functions of one type parameter:
alias A<X> given X satisfies U => F<X>
alias B<Y> given Y satisfies V => G<Y>
Then A
is a subtype of B
iff:
- the upper bound
U
on X
is a supertype of the upper
bound V
on Y
, and
- for any type
T
, F<T>
is a subtype of G<T>
.
That is to say, if A<X>
accepts every type argument T
that B<Y>
accepts, and for each such T
, the applied type
A<T>
is a subtype of the applied type B<T>
, then we can
soundly replace B
with A
in well-typed code.
(Of course, these rules generalize to type functions with
multiple type parameters.)
Generic function types and subtyping
Now let's narrow our attention to consider only type
functions that represent the types of generic functions.
To make it easier, we'll consider generic functions of the
following form, with just one type parameter and just one
value parameter:
F<X> f<X>(P<X> p) given X satisfies U => ... ;
Here, F<X>
is the return type, a type expression involving
the type parameter X
, and P<X>
is the parameter type,
which also involves X
.
The type of this generic function—as we
saw
in the previous post— is the type function:
<X> given X satisfies U => F<X>(P<X>)
So let's consider two type functions of the general form
we're considering:
alias A<X> given X satisfies U => F<X>(P<X>)
alias B<Y> given Y satisfies V => G<Y>(Q<Y>)
Then we see quickly that A
is a subtype of B
iff:
- the upper bound
U
on X
is a supertype of the upper
bound V
on Y
, and
- for any type
T
, the return type F<T>
of A
is a
subtype of the return type G<T>
of B
, and the
parameter type P<T>
of A
is a supertype of the
parameter type Q<T>
of B
.
For example, this generic function type:
<X> => X&Object(X|Object)
is a subtype of this generic function type:
<X> given X satisfies Object => X(X)
Take a minute to convince yourself that this is correct
intuitively.
(Again, these rules generalize naturally to functions with
multiple type parameters and/or multiple value parameters.)
Type functions and type inference
When we call a first-order generic function in Ceylon, we
don't usually need to explicitly specify type arguments.
Instead, we can usually infer them from the value arguments
of the invocation expression. For example, if we have this
generic function:
List<Out> map<In,Out>(Out(In) fun, List<In> list) => ... ;
Then we can always safely infer In
and Out
, because
there's a unique most-precise choice of type arguments:
value list = map(Integer.string, ArrayList { 10, 20, 30 });
In this example, we can safely infer that In
is Integer
,
and Out
is String
, without any loss of precision.
Unfortunately, once higher-order generics come into play,
inference of type functions is a much more ambiguous problem.
Consider this second-order generic function, which abstracts
the map()
function away from the container type, by
introducing the type function variable Box
to represent
the unknown container type:
Box<Out> fmap<Box,In,Out>(Out(In) fun, Box<In> box)
given Box<Element> { ... }
And now consider the following invocation of this function:
fmap(Integer.string, ArrayList { 10, 20, 30 })
What type should we infer for Element
, and what type
function for Box
?
-
Integer
and List
?
-
Integer
and Iterable
?
-
Integer
and ArrayList
?
-
Integer
and MutableList
?
-
Integer
and ListMutator
?
-
Integer
and Collection
?
-
Object
and Category
?
In general, there might be several different reasonable
choices, and no really good criteria for choosing between
them. So in this case, we require that the type arguments be
specified explicitly:
fmap<List,Integer,String>(Integer.string, ArrayList { 10, 20, 30 })
However, there is a pattern we can use to make type function
inference possible. In this case, we could define the
following interface:
interface Functor<Box,Element> given Box<Value> { ... }
Now let's imagine that our ArrayList
class inherits
Functor
, so that any ArrayList<Element>
is a
Functor<List,Element>
.
And let's redefine fmap()
like this:
Box<Out> fmap<Box,In,Out>(Out(In) fun, Functor<Box,In> box)
given Box<Element> { ... }
Then, finally, for the same instantiation expression we had
before:
fmap(Integer.string, ArrayList { 10, 20, 30 })
we can now unambiguously infer that Box
is List
and
In
is Integer
, since those types are encoded as type
arguments to Functor
in the principal supertype
instantiation of Functor
for the expression
ArrayList { 10, 20, 30 }
.
Instances of type functions
In the original post, we noted that a type function that
returns a function type is the type of a generic function.
For example, the type function:
<X> given X satisfies Object => X(X)
Is the type of this generic function:
X f<X>(X x) given X satisfies Object => x;
But then it's natural to enquire: if some type functions are
the types of generic functions, what are the other type
functions the types of?
Well, if you reflect for a second on the relationship
between types and values, I think you'll see that they must
be the types of generic value declarations. That is, this
type function:
<X> => List<X>
would be the type of this value, written in pseudo-Ceylon:
List<X> list<X> => ... ;
That is, when presented with a type X
, list<X>
evaluates
to a List<X>
.
Of course there are no actual generic values in Ceylon, the
closest thing we have is a nullary generic function:
List<X> list<X>() => ... ;
whose type is actually:
<X> => List<X>()
There's no plan to ever introduce generic values into Ceylon,
so types like <X> => List<X>
have no instances. They're
useful only as type arguments to higher-order generic types.
Type functions and principal typing
Finally, let's address a rather technical point.
A very important property of Ceylon's type system is the
ability to form a principal instantiation of any union or
intersection of different instantiations of a generic type.
For, example, for the covariant type List<T>
:
-
List<X> | List<Y>
has the principal instantiation
List<X|Y>
-
List<X> & List<Y>
has the principal instantiation
List<X&Y>
For the contravariant type Comparable<T>
:
-
Comparable<X> | Comparable<Y>
has the principal
instantiation Comparable<X&Y>
-
Comparable<X> & Comparable<Y>
has the principal
instantiation Comparable<X|Y>
Naturally, it's important that we can do the same tricks for
intersections and unions of instantiations of higher-order
types. As at happens, this works out extremely naturally,
using the following identities:
-
<<X> => F<X>> | <<Y> => G<Y>>
is a subtype of
<T> => F<T> | G<T>
, and
-
<<X> => F<X>> & <<Y> => G<Y>>
is a subtype of
<T> => F<T> & G<T>
.
Thus, if we have the following covariant second-order type:
interface Functor<out Element, out Container>
given Container<E> { ... }
Then we obtain the following principal instantiations:
-
Functor<E,A> | Functor<F,B>
has the principal
instantiation Functor<E|F,<T> => A<T>|B<T>>
, and
-
Functor<E,A> & Functor<F,B>
has the principal
instantiation Functor<E&F, <T> => A<T>&B<T>>
.
You don't need to know these identities when you're writing
code in Ceylon, but it's nice to know that type functions
don't undermine the basic algebraic properties which are the
reason Ceylon's type system is so nice to work with.
Everything fits together here, without weird holes and
corner cases.
A word about "rank"
In the previous post I described our support for references
to generic functions as "arbitrary rank" polymorphism, which
prompted a short discussion about how to measure the rank of
a generic type. I now think that the term "rank" probably
isn't very meaningful in connection to Ceylon, since there's
nothing special about our function types: they're just
instantiations of the perfectly ordinary generic type
Callable
. As suggested by Kamatsu
on reddit,
it seems to me that a better word to use is probably
"impredicative".