Functional Programming for the Functionally Challenged (Like Me)

Part II - get_in While the get_in's Good

- James Norton - Functional Programming, Elixir

In the last post we looked at functional approaches to solving problems typically solved using loops in imperative languages. These problems centered around list-like data structures such as arrays or vectors. In this post we will look at more complicated nested data structures.

The Challenge

Functional programming languages generally focus on transforming data rather than working with object hierarchies. They tend to make extensive use of maps, lists, and other fundamental data structures in lieu of custom types like those used in object oriented languages. In FP it is common to use highly nested data composed of many types of structures (maps, lists, sets, etc.).

Working with these nested structures can be a challenge for programmers getting started with FP. Modifying nested structures can seem particularly difficult when working with immutable data. In this post, I will cover several common chores encountered when working with nested data structures and explore solutions using Elixir.

Background - Elixir Collection Types

Before diving into nested data structures, it might be helpful to introduce the collection types Elixir provides, as these will be used to build more complicated nested structures.

Perhaps the simplest collection type is the list. In Elixir we construct a list using square brackets.

list = [:ok, 1, "a"]
=> [:ok, 1, "a"]

Elixir, like most functional languages, is highly tailored for working with lists. The List module contains many functions that operate on or work with lists. The first function will (obviously) return the first element of a list, or nil if the list is empty

value = List.first(list)
=> :ok

One important thing to note is that in Elixir collections can store any type, including mixed types, whereas Java collections can only store one type of object per collection. This gives us a lot of flexibility, as we can create highly nested structures in Elixir composed of different types at each level.

Because lists are commonly processed using functions that recursively split a list into head and tail, Elixir provides a convenient shortcut that allows us to bind variables to the head (first element) and tail (all the rest) of a list using a pattern match.

[head | tail] = list
=> [:ok, 1, "a"]
head
=> :ok
tail
=> [1, "a"]

Suppose we want to access a random element in a list. In Java this is no different than accessing the first element. In Elixir things get a bit more complicated. Random access is not something one usually does with a list, so the List module has no function for this. However, because lists implement the Enumerable protocol, we can use the Enum modules at function to access a value at a random index.

value = Enum.at(list, 1)
=> 1

Warning: Elixir lists are not arrays. Random access to values is an order N operation as the list values must be read from the start up to the index. If you need array-like functionality, us a tuple. Tuples are constructed with brackets like this

tuple = {:ok, 200}
=> {:ok, 200}

Tuples in Elixir provide constant time random access. With a tuple we can access a random index using the Kernel module’s elem function. Kernel functions are loaded by default, so we don’t need to prefix them with the module name.

value = elem(tuple, 1)
=> 200

Tuples and lists are useful, but often we need to make use of map like structures to store and retrieve key value pairs. We can use a map for this. Maps are constructed using %{} notation like this.

map = %{"foo" => "bar", "fizz" => "buzz"}
=> %{"fizz" => "buzz", "foo" => "bar"}

Keys and values can be anything, not just strings. If the keys are atoms then we can use a shortcut notation like so:

map = %{foo: "bar", fizz: "buzz"}
=> %{fizz: "buzz", foo: "bar"}

We can use the Map module’s get function to access a map value:

value = Map.get(map, :foo)
=> "bar"

get returns nil if the map does not contain the key. get can accept a third parameter that specifies a default value if the map does not contain the key:

 value = Map.get(map, :baz, "default")
=> "default"

Because accessing map values is such a common operation, Elixir provides a shortcut syntax:

 value = map[:fizz]
=> "buzz"

Finally, if the keys to our map are atoms (like symbols in ruby or keywords in Clojure) then we can use a dot notation to retrieve the value for a given key.

 value = map.fizz
=> "buzz"

Retrieving a Value from a Nested Structure

So far so good, but we wanted to use nested data structures. Let’s say we have a list of maps:

data = [%{foo: "bar", fizz: "buzz"}, %{foo: "baz", fizz: "fez"}]

Now suppose we want to retrieve the value from first map for the key :foo. In Java we can compose our method calls like so to avoid using an intermediate variable:

String value = data.get(0).get("foo");

The one danger here is that if one of the gets in our chained method calls returns null then the subsequent get will throw a NullPointerException.

In Elixir we can get close to this.

 value = List.first(data)[:foo]
=> "bar"

Although this is concise and somewhat familiar looking, it doesn’t work well with more complicated nesting. It’s better to use pipe composition (see the previous blog post).

 value = data |> List.first |> Map.get(:foo)
=> "bar"

Here we pipe our data structure as the first argument to the List.first function and then pipe the result of that as the first argument to the Map.get function. In this way we can string together arbitrary paths into a nested structure.

This approach gets a bit messy as our data gets more complicated. For these cases we can rely on the get_in function.

get_in takes a nested data structure and a path of keys (as a list) and returns the value at that path.

nested_data = %{foo: %{bar: "baz"}}
=> %{foo: %{bar: "baz"}}
get_in(nested_data, [:foo, :bar])
=> "baz"

Unfortunately get_in does not support numerical indices as keys, so the following is not supported:

nested_data = [%{foo: "bar"}, %{foo: "baz"}]
=> [%{foo: "bar"}, %{foo: "baz"}]
val = get_in(nested_data, [0, foo:])
=> ** (SyntaxError) iex:19: keyword argument must be followed by space after: foo:

get_in does support passing functions as keys, which gives us a great deal of flexibility (at a cost of some complexity). These ‘selector’ functions are passed three arguments, the operation (always :get), the data to be accessed, and a function to be invoked next.

If we want to effectively use a numerical index for a key, we can implement it as a function like this

index1_fun = fn :get, data, next ->
               next.(Enum.at(data, 1))
             end
=> #Function<18.54118792/3 in :erl_eval.expr/5>

Our selector function selects the first element (using Enum.at) and then passes this to the next function, effectively passing it to the next selector in the chain.

We pass in this function as one of our keys like this

val = get_in(nested_data, [index1_fun, :foo])
=> "baz"

Because Elixir supports higher order functions, we can generalize our selector function to take an index as an argument and return the function for getting the item at that index.

index_fun = fn index ->
              fn :get, data, next ->
                next.(Enum.at(data, index))
              end
             end
=> #Function<6.54118792/1 in :erl_eval.expr/5>

get_in(nested_data, [index_fun.(0), :foo])
=> "bar"

get_in(nested_data, [index_fun.(1), :foo])
=> "baz"

Our select function can do a lot more than just pick a particular index or key; since we have access to all the data at the current selection level, we can apply whatever logic we want to it.

Say we have the following data structure to describe our company:

company = %{
  earnings: 10_000_000,
  employees: [%{name: "Jack", age: 25},
              %{name: "Jill", age: 22},
              %{name: "John", age: 20},
              %{name: "Joan", age: 27}]
}

Now suppose we want a list of names of all the employees that are over a certain age. We can accomplish this as follows:

age_filter = fn age ->
               fn :get, data, next ->
                 Enum.filter_map(data, fn employee -> employee.age > age end, &(next.(&1)))
               end
             end
=> #Function<6.54118792/1 in :erl_eval.expr/5>

get_in(company, [:employees, age_filter.(24), :name])
=> ["Jack", "Joan"]

This function takes an age and creates a selector function that will filter the current selection level to those employees that are over the given age. It uses the Enum.filter_map function that first filters a collection and then maps a function over the filtered results, in this case the next function passed to our selector.

Conclusion

We have seen how we can navigate through nested data structures using a variety of approaches, culminating in writing custom selector functions for get_in. Hopefully this has given you a better understanding of how we can use these techniques to work with complex data. In the next post we look at the other side of the coin, updating nested structures and dealing with immutability.