29 Dec 2015

This is my take on what functional programming really is, in a way that will make sense to a jobbing programmer just trying to Get Stuff Done.

I put it to you that every function you write has two sets of inputs and two sets of outputs.

Two? Only one, surely?

No, two. Definitely two. Let's take a look at the first pair with this example:

public int square(int x) {
    return x * x;
}

// NOTE: The language doesn't matter, but I've picked one with
// explicit input & output types, for emphasis.

Here, the input you're used to thinking about is int x, and the output you're used to is also an int.

That's the first set of inputs & outputs. The traditional set, if you will. Now let's see an example of the second set of inputs and outputs:

public void processNext() {
    Message message = InboxQueue.popMessage();

    if (message != null) {
        process(message);
    }
}

According to the syntax, this function takes no inputs and returns no output, and yet it's obviously depending on something, and it's obviously doing something. The fact is, it has a hidden set of inputs and outputs. The hidden input is the state of the InboxQueue before the popMessage() call, and the hidden outputs are whatever process causes, plus the state of InboxQueue after we're done.

Make no mistake - the state of InboxQueue is a genuine input of this function. The behaviour of processNext cannot be known without knowing that value. And it's a genuine output too - the result of calling processNext cannot be fully understood without considering the new state of InboxQueue.

So the second piece of code has hidden inputs and outputs. It requires things, and causes things, but you could never guess what just by looking at the API.
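To make that concrete, here's a minimal, self-contained sketch of the same shape of function. Everything named here (HiddenStateDemo, INBOX, PROCESSED, and using plain Strings as messages) is a stand-in I've made up for illustration, since the real InboxQueue and process aren't shown. The point is only that two identical-looking calls can do different things, because the real input isn't in the argument list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the InboxQueue example; all names are assumptions.
class HiddenStateDemo {
    // Hidden input: shared mutable state that processNext() reads and changes.
    static final Deque<String> INBOX = new ArrayDeque<>();
    // Hidden output: a record of what "processing" did.
    static final List<String> PROCESSED = new ArrayList<>();

    static void processNext() {
        String message = INBOX.poll(); // poll() returns null when the queue is empty
        if (message != null) {
            PROCESSED.add(message);
        }
    }

    public static void main(String[] args) {
        processNext();                  // empty queue: nothing happens
        System.out.println(PROCESSED);  // prints []
        INBOX.add("hello");
        processNext();                  // same call, same (empty) argument list
        System.out.println(PROCESSED);  // prints [hello]
    }
}
```

Nothing in the signature of processNext() hints at either INBOX or PROCESSED, yet you can't predict or check its behaviour without both.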

These hidden inputs and outputs have an official name: "side-effects". There are many kinds of side-effects, but they all come together under the same concept: "when we call this function, what does it need that isn't in the argument list, and what does it do that isn't part of the return value?"

(Actually I think we need two terms: "side-effects" for the hidden outputs, and "side-causes" for the hidden inputs. For most of the rest of this post I'll use "side-effects" for brevity, but I'm definitely talking about side-causes too. I'm talking about all hidden inputs and outputs.)

Side-Effects are the Complexity Iceberg

When functions have side-effects (and side-causes), you can look at a function like this:

public boolean processMessage(Channel channel) {...}

…and think you've got an idea of what it's doing, and be totally wrong. There's no way to know what it requires or what it will do without looking inside. Does it take a message off the channel and process it? Probably. Does it close your channel if some condition is true? Maybe. Does it update a count in the database somewhere? Perhaps. Does it explode if it can't find the logging directory path it was expecting? It might.

Side-effects are the complexity iceberg. You look at the function signature, and the name, and think you've got a sense of what you're looking at. But hidden beneath the surface of the function signature could be absolutely anything. Any hidden requirement, any hidden change. Without looking at the implementation, you've no way of knowing what's really involved. Beneath the surface of the API is a potentially vast block of extra complexity. To grasp it, you'll only have three options: dive down into the function definition, bring the complexity to the surface, or ignore it and hope for the best. And in the end, ignoring it is usually a titanic mistake.

Isn't This What Encapsulation's About?

No.

Encapsulation is about hiding implementation details. About hiding the innards of the code so the caller doesn't need to worry about them. That remains a good principle, but it's not what we're talking about.

Side-effects aren't about "hiding implementation details" - they're about hiding the code's relationship with the outside world. A function with side-causes has undocumented assumptions about what external factors it's depending on. A function with side-effects has undocumented assumptions about what external factors it's going to change.

Are Side-Effects Bad?

When they work exactly as the original programmer expected, no, they're probably fine. But there's the rub: we have to trust that the hidden expectations of the original programmer were correct, and will remain correct as time marches on.

Have we set up the state of the world the way this function expected when it was written? Or did the world get changed somewhere? Perhaps because a seemingly-unconnected piece of code changed. Or because we're installing the software in a new environment. Hidden assumptions about the state of the world mean we have hidden hopes that it's similar enough to work.

Can we test this code? Not in isolation. Unlike a circuit board, we can't just plug into its inputs and check its outputs. We have to break open the code, figure out its hidden causes and effects, and simulate the world it's supposed to exist in. I've seen several TDD'ers spin in circles about whether they should do black box or white box testing. The answer is, you ought to do black box testing - you ought to be able to ignore the implementation details - but if you allow side-effects, you can't. Side-effects close the door to black box testing, because you can't get to the inputs & outputs without cracking the box open and learning what's inside.

This effect is amplified for debugging. If a function doesn't allow side-effects (or side-causes), you can understand whether it's correct just by giving it some inputs and checking the outputs. But a function with side-effects? There's no upper-limit to how many other parts of the system you'll have to consider. When it's allowed to depend on anything, and cause anything, then the bugs could be anywhere.

We Can Always Surface Side-Effects

Can we do anything about this complexity? Yes. It's actually pretty simple to get started: If a function has something as an input, just say so. If it returns something as an output, declare it. Simple as that.

Let's try an example. Here's a function with a hidden input. Bonus points if you spot it quickly:

public Program getCurrentProgram(TVGuide guide, int channel) {
  Schedule schedule = guide.getSchedule(channel);

  Program current = schedule.programAt(new Date());

  return current;
}

This function has a hidden input of the current time (new Date()). We can surface this complexity by just being honest about this extra input:

public Program getProgramAt(TVGuide guide, int channel, Date when) {
  Schedule schedule = guide.getSchedule(channel);

  Program program = schedule.programAt(when);

  return program;
}

This function now has no hidden inputs (or outputs).

Let's look at the pros and cons of this new version:

Cons

It looks more complex. It has three arguments instead of two.

Pros

It isn't more complex. Hiding a dependency didn't make it simpler, and being honest about it doesn't make it more complex.

It's vastly easier to test. Testing different times of day, clock changes, leap years, will all be straightforward, because we can pass in any time we like. I've seen code like the first version in production, with all sorts of clever tricks to spoof the current system clock for testing's sake. Imagine the effort, when we can just make it a parameter!

It's easier to reason about: This function now just describes a relationship between its inputs and its outputs. If you know the inputs, you know what the result should be, and you know everything about the result. This is a big deal. We can verify this code in isolation. As long as we've tested the relationship between inputs and outputs, we've tested the whole of the function.

(And as an aside, it's also more useful. We get, "what program starts in an hour?" code for free.)
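Here's a rough sketch of what "just pass in any time we like" looks like in practice. The Program, Schedule and TVGuide types below are minimal stand-ins I've invented so the example compiles on its own (the real ones aren't shown in this post), and toyGuide is a hypothetical helper that switches from "News" to "Film" at a chosen instant.

```java
import java.util.Date;

// Hypothetical minimal types; the real Program, Schedule and TVGuide are assumptions.
class GuideDemo {
    static final class Program {
        final String title;
        Program(String title) { this.title = title; }
    }

    interface Schedule { Program programAt(Date when); }
    interface TVGuide { Schedule getSchedule(int channel); }

    // The pure version: the time is an explicit argument.
    static Program getProgramAt(TVGuide guide, int channel, Date when) {
        Schedule schedule = guide.getSchedule(channel);
        return schedule.programAt(when);
    }

    // A toy guide for every channel: "News" before the cutover, "Film" from then on.
    static TVGuide toyGuide(Date cutover) {
        return channel -> when -> new Program(when.before(cutover) ? "News" : "Film");
    }

    public static void main(String[] args) {
        Date cutover = new Date(1_000_000L);
        TVGuide guide = toyGuide(cutover);

        // No clock spoofing needed: we choose the instants we want to check.
        System.out.println(getProgramAt(guide, 1, new Date(999_999L)).title); // prints News
        System.out.println(getProgramAt(guide, 1, cutover).title);            // prints Film
    }
}
```

Checking the boundary is just a matter of picking two Date values either side of it.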

What is a 'Pure Function'?

Drumroll please.

Now, finally, with an awareness of hidden inputs & outputs, we can give, "a jobbing programmer's definition of pure functions":

A function is called 'pure' if all its inputs are declared as inputs - none of them are hidden - and likewise all its outputs are declared as outputs.

In contrast, if it has hidden inputs or outputs, it's 'impure', and the contract we think the function offers is only half the story. The iceberg of complexity looms. We can never use impure code "in isolation". We can never test it in isolation. It always depends on other things which we have to keep track of whenever we want to test or debug.
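As a sketch of that definition applied to the earlier queue example, here is one way the hidden input and output of a processNext-style function could be declared instead. This is my illustration rather than a prescribed design: PureQueueDemo, Result and takeNext are hypothetical names, and messages are plain Strings to keep it short.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical pure counterpart to processNext(); all names are assumptions.
class PureQueueDemo {
    // The declared output: the message taken (if any) plus the queue that remains.
    static final class Result {
        final Optional<String> message;
        final List<String> remaining;
        Result(Optional<String> message, List<String> remaining) {
            this.message = message;
            this.remaining = remaining;
        }
    }

    // The queue is a declared input; nothing outside the arguments is read or changed.
    static Result takeNext(List<String> inbox) {
        if (inbox.isEmpty()) {
            return new Result(Optional.empty(), inbox);
        }
        List<String> rest = List.copyOf(inbox.subList(1, inbox.size()));
        return new Result(Optional.of(inbox.get(0)), rest);
    }

    public static void main(String[] args) {
        List<String> inbox = List.of("a", "b");
        Result result = takeNext(inbox);
        System.out.println(result.message.get()); // prints a
        System.out.println(result.remaining);     // prints [b]
        System.out.println(inbox);                // prints [a, b] (the input is untouched)
    }
}
```

Both the "before" and "after" states of the queue now appear in the signature, so the function can be checked with nothing but arguments and return values.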

What is 'Functional Programming'?

With an awareness of pure and impure functions, we can now give, "a jobbing programmer's definition of functional programming":

Functional programming is about writing pure functions, about removing hidden inputs and outputs as far as we can, so that as much of our code as possible just describes a relationship between inputs and outputs.

We accept that some side-effects are inevitable - most programs are run for what they do rather than what they return - but within our program we will exercise tight control. We will eliminate side-effects (and side-causes) wherever we can, and tightly control them whenever we can't.
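One common way to "tightly control" an unavoidable side-cause is to confine it to a thin outer function and keep the logic pure. The sketch below is only an illustration of that idea, with made-up names (GreetingDemo, greetingFor, greetingNow) and arbitrary hour boundaries.

```java
import java.time.LocalTime;

// Hypothetical example; the names and the hour boundaries are assumptions.
class GreetingDemo {
    // Pure: the hour is a declared input, the greeting is the declared output.
    static String greetingFor(int hour) {
        if (hour < 12) return "Good morning";
        if (hour < 18) return "Good afternoon";
        return "Good evening";
    }

    // Impure, but tiny: the only place that reads the system clock.
    static String greetingNow() {
        return greetingFor(LocalTime.now().getHour());
    }

    public static void main(String[] args) {
        System.out.println(greetingFor(9));  // prints Good morning
        System.out.println(greetingFor(20)); // prints Good evening
        System.out.println(greetingNow());   // depends on when you run it
    }
}
```

All the decisions live in greetingFor, which can be tested exhaustively; greetingNow is small enough that there's very little left to go wrong.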

Or put another way: Let's not hide what a piece of code needs, nor what results it will yield. If a piece of code needs something to run correctly, let it say so. If it does something useful, let it declare it as an output. When we do this, our code will be clearer. Complexity will come to the surface, where we can break it down and deal with it.

What is a 'Functional Programming Language'?

Every language supports pure functions - it's hard to make add(x, y) impure[1]. And in many cases converting an impure function to a pure one is just a case of lifting all its inputs and outputs into the function signature, so that the signature totally describes its behaviour. So are all programming languages 'functional'?

No. Because then the term would be meaningless.

So what can we give as a "jobbing programmer's definition of a functional programming language"?

A functional programming language is one that supports and encourages programming without side-effects.

Or more specifically: A functional language actively helps you eliminate side-effects wherever possible, and tightly control them wherever it's not.

Or more dramatically: A functional language is actively hostile to side-effects. Side-effects are complexity and complexity is bugs and bugs are the devil. A functional language will help you be hostile to side-effects too. Together you will beat them into submission.

Is That It?

Yes. There are a couple of subtleties - things you probably never thought of as a hidden input before - but that's the essence. Start building software with the perspective of "side-effects are the first enemy" and it will change everything you know about programming. Join me for part two, in which we take an awareness of side-effects, and functional programming, and fire a scattergun over the programming landscape.

Acknowledgments

This post comes out of a couple of discussions about the nature of functional programming. Particularly a chat with Sleepyfox discussing whether JavaScript could be considered a functional programming language, with the right libraries. My answer was an instinctive no, but thinking through why led me along a very fruitful chain of thought.

Hat tip to James Henderson, with whom I have bounced around many fruitful functional ideas this year.

And thanks to Malcolm Sparks, Ivan Uemlianin, Joel Clermont, Katy Moe and my homophonic-doppelganger Chris Jenkins for proofreading & suggestions.

Footnotes:

[1] Although Java tries really hard.
