Functional Mumbo Jumbo - ADTs

What are "ADTs"?

Algebraic Data Types.

Lovely. What the heck does that mean?

It means you can mix product and sum types together.

Okay. I have one more question, and please start answering in English: What are product and sum types?

Product types you'll be deeply familiar with - it's when you make a new type by combining others. For example, you could define a Person type as the combination of a String for their name, and an Int for their age. Like this:

type Person
    = Person String Int

...or like this:

type Person
    = Person
        { name : String
        , age : Int
        }

In C that'd be a struct. In Java you'd probably call it a POJO. In Clojure they say, "Just use a map."

Why's it called a 'product' type?

It might help to think of how many possible values there are for Person now.

It's all the possible values for String, multiplied by all the possible values for Int. It's the multiple, or product, of the two types.

Okay, product types combine types together. What's a sum type?

Well, product types are a bit like an AND operation, aren't they? They AND types together. If you think of it that way, you're instantly going to start to wonder what an OR would look like. For that, we can pull out another old chestnut from the Big Book of Type Examples. Chapter two, playing cards:

type Suit
  = Heart
  | Spade
  | Club
  | Diamond

A suit is a heart, or a spade, or a club, or a diamond. Any one of those four values, but only one. It's an exclusive-or, as a type.

So a sum type is just an enum?

Well...yeah I suppose it is. But that's not what makes this interesting. An ADT isn't just structs and enums, it's the free mixing of them to make sophisticated data models.

Give me a concrete example.

Glad you asked. I'm going to make a credit card payment. When the answer comes back from my payment provider, there are lots of things that could happen. Hopefully it goes well and I get a receipt ID back as a string. But maybe the card on file has expired, and I get a few details about the card to prompt the user with. Or it just blows up with different kinds of error. I can model all possible responses neatly with an ADT, like so:

type PaymentResponse
    = Paid String
    | CardExpired Int Date
    | HttpError Int String
    | AuthError

Here we have Paid, which is a wrapper around a String; CardExpired and HttpError which are product types; and AuthError which is a "traditional" enum value. All mixed together in a sum type to model all possible payment responses.

Most languages don't have support for this. They might have enums, but not ones that carry data. You can't have an enum of structs or an enum of POJOs1.

Clojure's an interesting case where it does have ADTs, but they're possibly too flexible. You can mix together maps, lists and primitives to make any shape of data you like. But you can suffer with the exclusion problem. While you can say all these things are valid responses:

{:paid "eefa9112"}

{:card-expired [8783 #date "03/19"]}

{:http-error [500 "Server just exploded"]}

:auth-error

...you have a harder time saying what isn't a valid response, like:

{:card-expired "we don't know why"}

That's where Plumatic schema and spec become essential.

So is that it? Is that all an ADT is?

No. You're at the entrance to a delightful rabbit hole. ADTs can have type variables, and can be further generalised, and all sorts of useful stuff. But if you have the core idea - mixing and matching ANDs and ORs to build up data models - you're on your way.

So why is it useful?

Because data-modelling is the most important thing we do, so tools for modelling data are fundamentally interesting. But let me give you a concrete example of something cool to go away with. Once we've defined our rich data type, a suitably-savvy compiler will be able to help us use it properly. Here's some pseudo-code to handle payment responses:

handle paymentResponse =
    case paymentResponse of
        Paid id ->
            insert db "receipts" db id

        CardExpired fourDigits date ->
            notifyUser ("Your card " ++ fourDigits ++ " expired on " ++ date ++ ".")

        HttpError code msg ->
            logError code msg

Did you spot the error in that code? You probably did, but the compiler is guaranteed to notice:

-- MISSING PATTERNS ---------------------------------------------- src/Main.elm

This `case` does not have branches for all possibilities.

...

You need to account for the following values:

    PaymentResponse.AuthError

Add a branch to cover this pattern!

The compiler knows which values PaymentResponse can have, and so can check our case statements for completeness. In this case, it's instantly noticed that we've ignored the authorization failure.

Sweet. Okay, last question. If there's AND and OR...where's NOT?

You mean a type that can be any type except something?

Yeah.

Interesting question. Yup, that'd still be part of an Algebra of Data Types, so it'd count. I've never seen it though. Let me know if you do...

Footnotes


  1. And nearly every language has a way to get around it. In Java you'd use reflection on subclasses, perhaps. But first-class support? No.