A little more about type functions
My previous post about type functions generated some interesting discussion, here, and on reddit.
Therefore, I think it's worth tying up several loose ends from the earlier post. So here's a collection of further observations about type functions.
Warning: this post addresses some very technical details of how we've incorporated type functions into Ceylon's type system. Don't even bother continuing any further until you've read the earlier post.
The "why" of this
The most well-known application of higher-order generics is for representing high-level abstractions of container types: functors, monads, and friends. That's not what motivated me to experiment with higher-order generics in Ceylon, and as a practical matter I still have little interest in these abstractions, though they're fun to play with.
No, what bothered me was not that Ceylon's type system
wasn't powerful enough to represent Functor
or Monad
,
but rather that Ceylon's type system wasn't powerful enough
to represent Ceylon. I'll show you what I mean by that in
just a second. But first I want to argue that type functions
can be seen as a regularization of the language.
From a purely syntactic point of view, it's always seemed a little strange that every sort of type declaration in Ceylon can have a list of type parameters, except for a type parameter itself. Furthermore, it's noticeable that I can take a reference, or meta reference to any program element unless it has a list of type parameters. Now, such restrictions might seem reasonable if a parameterized type parameter or a reference to a generic declaration were not meaningful notions at a fundamental level. But they clearly are meaningful, and even at least somewhat useful.
Exactly how useful is a different question—the jury's still out, at least in my mind. And so perhaps we'll ultimately conclude that this stuff isn't worth its weight. The weight being, substantially, the time it takes for programmers new to Ceylon to understand this stuff.
In the original post, I showed how type functions were necessary to represent the type of a reference to a generic function. One place where this problem arises is with one of Ceylon's more unique features: its typesafe metamodel.
A use case for generic function reference types
Usually, I can obtain a metamodel object that represents a
class or function and captures its type signature. For
example, the expression `String`
evaluates to a
metamodel object that captures the type and initializer
parameters of the class String
:
Class<String,[{Character*}]> stringClass = `String`;
For a generic declaration, I can do a similar thing, as long
as I'm prepared to nail down the type arguments. For example,
I can write `Singleton<String>`
to get a metamodel
object representing the class Singleton
after applying the
type argument String
:
Class<Singleton<String>,[String]> stringSingletonClass
= `Singleton<String>`
But in Ceylon as it exists today, I can't obtain a typed
metamodel object that represents just `Singleton`
,
because to represent the type of that metamodel object I
would necessarily need a type function.
Now, with the new experimental support for type functions,
the type of the expression `Singleton`
could be
<T> => Class<Singleton<T>,[T]>()
, allowing code like this:
value singletonGenericClass = `Singleton`;
...
Class<Singleton<String>,[String]> stringSingletonClass
= singletonGenericClass<String>();
That's just one example of how allowing references to generic functions makes Ceylon feel more "complete".
Two use cases for anonymous type functions
I get the impression that the "scariest" bit of what I've presented in the previous post is the notation for anonymous type functions. That is, the following syntax:
<X> => X
<X> => [X,X,X]
<X,Y> => X|Y
<T> given T satisfies Object => Category<T>(T*)
But I'm convinced that this notation is not really that hard to understand. The reason I assert this is because if I give each of these type functions a name, then most of you guys have no problem understanding them:
alias Identity<X> => X;
alias Triple<X> => [X,X,X];
alias Union<X,Y> => X|Y;
alias CategoryCreator<T> given T satisfies Object => Category<T>(T*);
But—one might reasonably enquire—why do we even need them, if the named versions are easier to read?
Well, we need them:
- in order to be able to denote the type of a reference to a generic function—remember, we don't have undenotable types in Ceylon—and
- to make it easy to partially apply a named type
function like
Map
.
For example, we want to be able to write stuff like
<T> => Map<String,T>
when working with higher-order
generics, thus turning a type function of two type
parameters into a type function of one type parameter.
Are type functions "type types"?
One thing I should have made very clear, and forgot, is that type functions don't represent an additional meta level. In Ceylon's type system, type functions are types, in the very same sense that function are values in Ceylon and other modern languages.
There simply is no additional meta-type system for types in Ceylon. The closest thing we have to a "type type" is a generic type constraint, but that's an extremely impoverished sort of type type, since Ceylon provides no facilities at all to abstract over type constraints—I can't even assign an alias to a type constraint and reuse it by name.
Ceylon reasons about type constraints and assignability of types to type variables using hardcoded rules written primitively into the language spec and type checker, not by abstraction over the types of types.
Type functions and subtyping
But if a type function is a type, what are its subtyping relationships with other types and other type functions?
Well, first, recall that some type functions have instances:
generic function references. We didn't want to introduce
values into the language that aren't Object
s, so we've
declared that every type function is a subtype of Object
.
This preserves the useful property that our type system has
a single root type Anything
.
Next, recall that an ordinary function type is covariant in its return type, and contravariant in its parameter types. For example, the function type:
String(Object, Object)
is a subtype of:
Object(String, String)
Since if a function accepts two Object
s and returns a
String
, then it's clearly also a function that accepts two
String
s and returns an Object
.
Given two function types with one parameter:
F(P)
G(Q)
Then F(P)
is a subtype of G(Q)
iff P
is a supertype of
Q
and F
is a subtype of G
.
Similar rules apply to type functions. Consider two type functions of one type parameter:
alias A<X> given X satisfies U => F<X>
alias B<Y> given Y satisfies V => G<Y>
Then A
is a subtype of B
iff:
- the upper bound
U
onX
is a supertype of the upper boundV
onY
, and - for any type
T
,F<T>
is a subtype ofG<T>
.
That is to say, if A<X>
accepts every type argument T
that B<Y>
accepts, and for each such T
, the applied type
A<T>
is a subtype of the applied type B<T>
, then we can
soundly replace B
with A
in well-typed code.
(Of course, these rules generalize to type functions with multiple type parameters.)
Generic function types and subtyping
Now let's narrow our attention to consider only type functions that represent the types of generic functions. To make it easier, we'll consider generic functions of the following form, with just one type parameter and just one value parameter:
F<X> f<X>(P<X> p) given X satisfies U => ... ;
Here, F<X>
is the return type, a type expression involving
the type parameter X
, and P<X>
is the parameter type,
which also involves X
.
The type of this generic function—as we saw in the previous post— is the type function:
<X> given X satisfies U => F<X>(P<X>)
So let's consider two type functions of the general form we're considering:
alias A<X> given X satisfies U => F<X>(P<X>)
alias B<Y> given Y satisfies V => G<Y>(Q<Y>)
Then we see quickly that A
is a subtype of B
iff:
- the upper bound
U
onX
is a supertype of the upper boundV
onY
, and - for any type
T
, the return typeF<T>
ofA
is a subtype of the return typeG<T>
ofB
, and the parameter typeP<T>
ofA
is a supertype of the parameter typeQ<T>
ofB
.
For example, this generic function type:
<X> => X&Object(X|Object)
is a subtype of this generic function type:
<X> given X satisfies Object => X(X)
Take a minute to convince yourself that this is correct intuitively.
(Again, these rules generalize naturally to functions with multiple type parameters and/or multiple value parameters.)
Type functions and type inference
When we call a first-order generic function in Ceylon, we don't usually need to explicitly specify type arguments. Instead, we can usually infer them from the value arguments of the invocation expression. For example, if we have this generic function:
List<Out> map<In,Out>(Out(In) fun, List<In> list) => ... ;
Then we can always safely infer In
and Out
, because
there's a unique most-precise choice of type arguments:
value list = map(Integer.string, ArrayList { 10, 20, 30 });
In this example, we can safely infer that In
is Integer
,
and Out
is String
, without any loss of precision.
Unfortunately, once higher-order generics come into play,
inference of type functions is a much more ambiguous problem.
Consider this second-order generic function, which abstracts
the map()
function away from the container type, by
introducing the type function variable Box
to represent
the unknown container type:
Box<Out> fmap<Box,In,Out>(Out(In) fun, Box<In> box)
given Box<Element> { ... }
And now consider the following invocation of this function:
fmap(Integer.string, ArrayList { 10, 20, 30 })
What type should we infer for Element
, and what type
function for Box
?
-
Integer
andList
? -
Integer
andIterable
? -
Integer
andArrayList
? -
Integer
andMutableList
? -
Integer
andListMutator
? -
Integer
andCollection
? -
Object
andCategory
?
In general, there might be several different reasonable choices, and no really good criteria for choosing between them. So in this case, we require that the type arguments be specified explicitly:
fmap<List,Integer,String>(Integer.string, ArrayList { 10, 20, 30 })
However, there is a pattern we can use to make type function inference possible. In this case, we could define the following interface:
interface Functor<Box,Element> given Box<Value> { ... }
Now let's imagine that our ArrayList
class inherits
Functor
, so that any ArrayList<Element>
is a
Functor<List,Element>
.
And let's redefine fmap()
like this:
Box<Out> fmap<Box,In,Out>(Out(In) fun, Functor<Box,In> box)
given Box<Element> { ... }
Then, finally, for the same instantiation expression we had before:
fmap(Integer.string, ArrayList { 10, 20, 30 })
we can now unambiguously infer that Box
is List
and
In
is Integer
, since those types are encoded as type
arguments to Functor
in the principal supertype
instantiation of Functor
for the expression
ArrayList { 10, 20, 30 }
.
Instances of type functions
In the original post, we noted that a type function that returns a function type is the type of a generic function. For example, the type function:
<X> given X satisfies Object => X(X)
Is the type of this generic function:
X f<X>(X x) given X satisfies Object => x;
But then it's natural to enquire: if some type functions are the types of generic functions, what are the other type functions the types of?
Well, if you reflect for a second on the relationship between types and values, I think you'll see that they must be the types of generic value declarations. That is, this type function:
<X> => List<X>
would be the type of this value, written in pseudo-Ceylon:
List<X> list<X> => ... ;
That is, when presented with a type X
, list<X>
evaluates
to a List<X>
.
Of course there are no actual generic values in Ceylon, the closest thing we have is a nullary generic function:
List<X> list<X>() => ... ;
whose type is actually:
<X> => List<X>()
There's no plan to ever introduce generic values into Ceylon,
so types like <X> => List<X>
have no instances. They're
useful only as type arguments to higher-order generic types.
Type functions and principal typing
Finally, let's address a rather technical point.
A very important property of Ceylon's type system is the ability to form a principal instantiation of any union or intersection of different instantiations of a generic type.
For, example, for the covariant type List<T>
:
-
List<X> | List<Y>
has the principal instantiationList<X|Y>
-
List<X> & List<Y>
has the principal instantiationList<X&Y>
For the contravariant type Comparable<T>
:
-
Comparable<X> | Comparable<Y>
has the principal instantiationComparable<X&Y>
-
Comparable<X> & Comparable<Y>
has the principal instantiationComparable<X|Y>
Naturally, it's important that we can do the same tricks for intersections and unions of instantiations of higher-order types. As at happens, this works out extremely naturally, using the following identities:
-
<<X> => F<X>> | <<Y> => G<Y>>
is a subtype of<T> => F<T> | G<T>
, and -
<<X> => F<X>> & <<Y> => G<Y>>
is a subtype of<T> => F<T> & G<T>
.
Thus, if we have the following covariant second-order type:
interface Functor<out Element, out Container>
given Container<E> { ... }
Then we obtain the following principal instantiations:
-
Functor<E,A> | Functor<F,B>
has the principal instantiationFunctor<E|F,<T> => A<T>|B<T>>
, and -
Functor<E,A> & Functor<F,B>
has the principal instantiationFunctor<E&F, <T> => A<T>&B<T>>
.
You don't need to know these identities when you're writing code in Ceylon, but it's nice to know that type functions don't undermine the basic algebraic properties which are the reason Ceylon's type system is so nice to work with. Everything fits together here, without weird holes and corner cases.
A word about "rank"
In the previous post I described our support for references
to generic functions as "arbitrary rank" polymorphism, which
prompted a short discussion about how to measure the rank of
a generic type. I now think that the term "rank" probably
isn't very meaningful in connection to Ceylon, since there's
nothing special about our function types: they're just
instantiations of the perfectly ordinary generic type
Callable
. As suggested by Kamatsu
on reddit,
it seems to me that a better word to use is probably
"impredicative".