A quick guide to types in functional programming

Clearing up the confusion about union, tagged unions, and algebraic data types in functional programming.

Image by:

Opensource.com

Functional programming is taking off, but there appears to be some misunderstanding about the theoretical background of its types. Many programmers incorrectly name types and create confusion around simple ideas. To clarify, let's look at the differences between union, tagged unions, and algebraic data types.

Algebraic data types (ADTs) are defined in terms of product and sum types. Sum types (also called tagged unions) are a refinement of union types. Let's explore their differences:

Union types encode values under the same allocated memory. The most basic example I can think of is the union keyword in C. The example below allocates the biggest type contained in the union. This makes it possible that both values could be there—but not at the same time. If you need to know which value is available at any time, you need to use a field in a struct to keep track of the type at runtime.

union {
  int i; 
  void *p;
}

A tagged union saves some runtime information to determine which element from the union exists. Tagged union/sum types are a refinement of union types; the constructor adds some extra information in the tag so that you can easily pattern match on it. Haskell does this really elegantly:

data TrafficLight = Red | Yellow | Green

In this example, TrafficLight represents the type constructor, and only one of the three data values can exist at any given point. The symbol | expresses the idea of a logical or operator. The following code shows how to create and pattern match on the TrafficLight type:

msgTrafficLight colour =
  case colour of
     Red    -> "you should stop"
     Yellow -> "rush before it turns red"
     Green  -> "you can pass"

Before we get into algebraic data types, let's take a look at product types, which are necessary for understanding ADTs.

Product types are your normal pairs and tuples, e.g., (1, "Hello") in Haskell with type (Int, String), where both types, Int and String, are expected. Another way to define your product types is using record types, which basically are tuples with named labels. For instance, if we define the previous tuple as:

data Person = {age :: Int, name :: String}

creating a person is as simple as Person 31 "Kiko Fernandez" or using record syntax Person {name = "Kiko Fernandez", age = 31}.

If we generalize the idea of products inside tagged unions, then we can represent data values with a disjointed number of arguments and/or types in each branch of the sum type. The example below redefines TrafficLight with additional information, where two data values use the same type, a String, and the remaining data value uses the type constructor Options.

data Options = Rush | SlowDown
data TrafficLight = Red "Stop" | Green "Drive" | Yellow Options

If we further expand this generalization by adding type variables to type constructors, then we have defined the idea of algebraic data types! ADTs are important because they allow programmers to define composite types based on products and sum types.

One more final example: Our friend the Option type (known as Maybe type for Haskellers) can be encoded using plain ADTs:

data Maybe t = Nothing | Just t

Maybe t is a type constructor with a type variable, t, while Nothing and Just t are a tagged union where Just t uses the type variable from the type constructor.

I hope that this clarifies the definition and the usage of union, tagged unions, products, and algebraic data types. If you have questions, please post them in the comments.