Destructuring considered harmful
A language is said to feature destructuring if it provides a syntax for quickly declaring multiple local variables and assigning their values from the attributes of some complex object. For example, in Ceylon, we let you write:
for (k->v in map) { ... }
This is a simple kind of destructuring where the key
and item
attributes of the map Entry
are assigned to the locals k
and
v
.
Let's see a couple more examples of destructuring, written in a hypothetical Ceylon-like language, before we get to the main point of this post.
A number of languages support a kind of parallel assignment syntax for destructuring tuples. In our hypothetical language, it might look like this:
String name, Value val = namedValues[i];
Some languages support a kind of destructuring that is so powerful
that it's referred to as pattern matching. In our language we might
support pattern matching in switch
statements, using a syntax
something like this:
Person|Org identity = getIdentityFromSomewhere();
switch (identity)
case (Person(name, age, ...)) {
print("Person");
print("Name: " + name);
print("Age: " + age);
}
case (Org(legalName, ...)) {
print("Organization");
print("Name: " + legalName);
}
Now, I've always had a bit of a soft spot for destructuring—it's a minor convenience, but there are certainly cases (like iterating the entries of a map) where I think it improves the code. A future version of Ceylon might feature a lot more support for destructuring, but there are several reasons why I'm not especially enthusiastic about the idea. I'm going to describe just one of them.
Let's start with the "pattern matching" example above. And let's stipulate that I—perhaps more than most developers—rely almost completely on my IDE to write my code for me. I use Extract Value, Extract Function, Assign To Local, Rename, ⌘1, etc, in Ceylon IDE like it's a nervous tic. So of course the first thing I want to do when I see code like the above is to run Extract Function on the two branches, resulting in:
Person|Org identity = getIdentityFromSomewhere();
switch (identity)
case (Person(name, age, ...)) {
printPerson(name, age);
}
case (Org(legalName, ...)) {
printOrg(legalName);
}
...
void printPerson(String name, Integer age) {
print("Person");
print("Name: " + name);
print("Age: " + age);
}
void printOrg(String legalName) {
print("Organization");
print("Name: " + legalName);
}
Ooops. Immediately we have a problem. The schema of Person
and
Org
is smeared out over the signatures of printPerson()
and
printOrg()
. This makes the code much more vulnerable to changes
to the schema of Person
or Org
, makes the code more vulnerable
to changes to the internal implementation of these methods (if we
want to also print the Person
's address, we need to add a
parameter), and it even makes the code less typesafe. The problem
gets worse and worse as I recursively run Extract Value and
Extract Function on the implementation of printPerson()
and
printOrg()
.
Now consider what we would get without the use of destructuring, as we would do in Ceylon today. We would have started with:
Person|Org identity = getIdentityFromSomewhere();
switch (identity)
case (is Person) {
print("Person");
print("Name: " + identity.name);
print("Age: " + identity.age);
}
case (is Org) {
print("Organization");
print("Name: " + identity.legalName);
}
Whether this is better or worse than the code using of pattern matching is somewhat in the eye of the beholder, but clearly it's not much worse and is arguably even a little cleaner. Now let's run Extract Function on it. We get:
Person|Org identity = getIdentityFromSomewhere();
switch (identity)
case (is Person) {
printPerson(identity);
}
case (is Org) {
printOrg(identity);
}
...
void printPerson(Person identity) {
print("Person");
print("Name: " + identity.name);
print("Age: " + identity.age);
}
void printOrg(Organization identity) {
print("Organization");
print("Name: " + identity.legalName);
}
I think it's very clear that this a much better end result. And I hope it's also clear that this is in no way a contrived example. The arguments I'm making here scale to most uses of pattern matching. The problem here is that introducing local variables too "early" screws things up for refactoring tools.
Essentially the same argument applies to tuples: a tuple seems
like a convenient thing to use when you "just" have a quick helper
function that returns two values. But after a few iterations of
Extract Function/Extract Value, you wind up with five functions
with the tuple type (String, Value)
smeared out all over the
place, resulting in code that is significantly more brittle
than it would have been with a NamedValue
class.
I've repeatedly heard the complain that "oh but sometimes it's just not worth writing a whole class to represent the return value of one function". I think this overlooks the effect of code growing and evolving and being refactored. And it also presupposes that writing a class is a pain, as it is in Java. But in Ceylon writing a class is easy—indeed, it looks just like a function! Instead of this:
(String, Value) getNamedValue(String name) {
return (name, findValueForName(name));
}
we can just write this:
class NamedValue(name) {
shared String name;
shared Value val = findValueForName(name);
}
No constructor, no getters/setters, and if this is a member of
another class, you can just annotate it shared default
, and it's
even polymorphic, meaning that there is not even a need to write
a factory method. And this solution comes with the huge advantage
that the schema of a NamedValue
is localized in just one place,
and won't start to "smear out" as your codebase grows and evolves.