Functional Programming for the Functionally Challenged (Like Me)

Part III - Weekend update_in

- James Norton - Functional Programming, Elixir

In the previous installment of our introduction to functional programming we looked at reading values from nested data structures. In this final post we look at the flip side of working with nested data structures, updating them. If you have not read the previous post yet and are not familiar Elixir, you might want to read it now, as this post builds on that one.

The Challenge

If navigating through a nested data structure to retrieve a value seems challenging, updating a nested structure may appear hopeless, especially since Elixir and FP languages in general have immutable data. In Java and other languages that support mutable data structures we can change nested values in-place. This usually means changing a nested value is no harder than reading it. As long as we can navigate to the value we want to change we can do whatever we want with it.

With immutable data we can’t simply change one value in a data structure. If we want to update a value, we have to return a whole new data structure with the value changed. This is not as bad as it sounds. First of all, since we know the original data structure can’t be changed (immutable, remember?) our new structure can be composed of the original structure plus the part that has changed. Also, we don’t have to construct this ourselves; Elixir provides an API that makes this, well, not easy, but easier than it would be without it.

Let’s start with a fairly simple nested map. Java has no map literals, so we have to construct an empty map and use mutation right out of the gate.

Map<String, Map<String, Integer>> myMap = new HashMap<String, Map<String, Integer>>();
myMap.put("Joe", new HashMap<String, Integer>());
myMap.get("Joe").put("age", 30);
myMap.get("Joe").put("weight", 170);

myMap.put("Bob", new HashMap<String, Integer>());
myMap.get("Bob").put("age", 32);
myMap.get("Bob").put("weight", 172);
=> {Joe={weight=170, age=30}, Bob={weight=172, age=32}}

Elixir supports map literals, so initializing a map is simpler.

my_map = %{"Joe" => %{age: 30, weight: 170}, "Bob" => %{age: 32, weight: 172}} 
=> %{"Bob" => %{age: 32, weight: 172}, "Joe" => %{age: 30, weight: 170}}

As an aside, notice that we use atoms (like symbols in Ruby) for the keys that don’t vary (age, weight) and strings for the keys that do (name).

Let’s say Joe has gained a little weight and we want update our map to reflect this. In Java this is straight-forward. In fact, it’s exactly what we did to initialize our map.

myMap.get("Joe").put("weight", 175);
=> {Joe={weight=175, age=30}, Bob={weight=172, age=32}}

For maps, Elixir is almost as simple as Java. For this we have the Kernel.put_in function. It takes a list of keys and uses them to navigate a data structure, in the same way as get_in which we used in the previous post. get_in and put_in are part of a family of functions that share the same semantics for working with nested data structures.

For our nested map we have the following:

new_map = put_in(my_map, ["Joe", :weight], 175)
=> %{"Bob" => %{age: 32, weight: 172}, "Joe" => %{age: 30, weight: 175}}

We can use a short-cut syntax to get even closer to the Java style.

new_map = put_in my_map["Joe"][:weight], 175
=> %{"Bob" => %{age: 32, weight: 172}, "Joe" => %{age: 30, weight: 175}}

Now suppose we know that Joe’s weight increased by ten percent, but we don’t know the actual value. In Java we have

int joeWeight = (int) (myMap.get("Joe").get("weight") * 1.1);
myMap.get("Joe").put("weight", joeWeight);
=> {Joe={weight=187, age=30}, Bob={weight=172, age=32}}

In Elixir when we want to update to some function of the current value we use Kernel.update_in.

new_map = update_in my_map["Joe"][:weight], fn w -> w * 1.1 end
=> %{"Bob" => %{age: 32, weight: 172}, "Joe" => %{age: 30, weight: 187}}

The biggest difference in the two approaches is that with update_in we pass a function in (remember higher order functions from the first post?) instead of a value. The function takes the current value and returns the new one. The function definition is the fn w -> w * 1.1 end bit.

Because put_in and update_in follow the same semantics as get_in we have a lot of flexibility for updating nested structures. Suppose a year has passed and we need to update everyone’s age. In an imperative language we don’t have a lot of options. Using a non-functional approach in Java we would have to update every record separately, like this:

for (Map.Entry<String, Map<String, Integer>> entry : myMap.entrySet()) {
    entry.getValue().put("age") = entry.getValue().get("age") + 1;
}

In Elixir we can pass a function saying “choose everything” as the first key in our list of keys passed to update_in. We’ll call this function all, meaning we want to work on all the keys at the top level. This is just a style choice, we could call it foo and update_in would not care.

new_map = update_in my_map, [all, :age], fn age -> age + 1 end
=> %{"Bob" => %{age: 33, weight: 175}, "Joe" => %{age: 31, weight: 170}}

Before defining the all function it’s helpful to understand how keys work in update_in (or get_in, put_in, get_and_update_in). As we traverse our nested data structure we can think of each level in our traversal as a substructure. Each key gets applied to the substructure for the current point in the traversal. The key determines which portion of the substructure is used for the next key.

Given our original map, if we apply the key list ["Joe", :age] then we start with the map

%{"Bob" => %{age: 32, weight: 172}, "Joe" => %{age: 30, weight: 187}}

apply the key "Joe" to get the next substructure

%{age: 30, weight: 187}

then apply the key :age to get the final substructure, in this case a single value

30

With normal keys like strings or atoms this happens automatically. With function keys, it’s a bit more complicated as our function must handle some of the work of the traversal.

Every function used as a key to update_in must have the same signature,

fun(:get_and_update, data, next)

The first argument is an atom, :get_and_update, meaning our function only matches if someone calls it with this as the first argument. This is the action that will be passed by update_in to our function. The reason update_in passes the action is because we sometimes want to write multiple function clauses to handle different actions, such as :get for the get_in function. You may be wondering why the action is :get_and_update instead of just :update. The reason is that update_in actually defers to get_and_update_in which expects this action.

The second argument, data, is the substructure for the current point of the traversal. This is the only data our function has to care about - all the data higher up in the traversal has already been handled. The final argument, next, is a function that we must call on the elements of data that we want to traverse.

So all key functions for update_in need to do the same thing, take a substructure and transform it by identifying the elements of interest and calling next on them. Normally the key function would leave the other elements alone, but this doesn’t have to be the case – we will use this to our advantage later on.

The all function is given below:

1
2
3
4
5
6
7
8
all = fn :get_and_update, data, next ->
  data_list = Map.to_list(data)
  new_list = Enum.map(data_list, fn {key, value} ->
    {_, updated} = next.(value)
    {key, updated}
  end)
  {nil, Enum.into(new_list, %{})}
end

This function is a bit complicated at first glance, so let’s break it down.

On line 1 starting with fn we define the signature for an anonymous function that matches on three arguments. The first argument must be :get-and_update, which is the action that update_in will pass when this function is called. The other two arguments, data and next, are the current substructure and the function we need to call on the selected elements. We bind (assign) this to the variable all so we can pass it as one of the keys to update_in.

The substructure passed to this function will actually be our whole map, since all is the first key in our list of keys passed to update_in. On line 2 we convert the map to a list. We do this because we want to process every value in the map. Converting it to a list turns this

%{"Bob" => %{age: 32, weight: 172}, "Joe" => %{age: 30, weight: 170}}

into this

[{"Bob", %{age: 32, weight: 175}}, {"Joe", %{age: 30, weight: 170}}]

which is a list of two-element (key/value) tuples.

On lines 3-6 we transform this list into a new list by using map to call the next function on every value. This gives us a new list that has our original keys and transformed values.

[{"Bob", %{age: 33, weight: 175}}, {"Joe", %{age: 31, weight: 170}}]

On line 7 we construct a return value for our all function that consists of a two element tuple. The first element is nil. It could be anything because update_in is going to ignore it. If we passed our all function to get_and_update_in then we would want to return the original value, data, instead of nil. The second element of the tuple is a new map that we construct by calling Enum.into(new_list, %{}). Moving back and forth between maps and lists is routine in Elixir.

{nil, %{"Bob" => %{age: 33, weight: 172}, "Joe" => %{age: 31, weight: 170}}}

Again, our function key has one job, transform the current substructure by calling next on the elements we care about and leaving the other elements alone.

Let’s put everything we have learned in one final example. Suppose we have a company database represented by the following data structure:

company = %{total_revenue: 10_000_000,
            sales: [%{salesperson: "Joe", bonus: 0, accounts: ["XYZ Inc.", "ABC Co.", "McDoogles"]},
                    %{salesperson: "Bob", bonus: 0, accounts: ["ACME", "Tarjet"]},
                    %{salesperson: "Jane", bonus: 0, accounts: ["Homely Depot", "Element 84", "Pear Computers"]},
                    %{salesperson: "Jill", bonus: 0, accounts: ["Four Guys", "Gas 'N Sip'"]}]}

We want to update our the records for our sales people to reflect their quarterly bonuses. The formula for the bonus is given by

\begin{equation} bonus = $1000 \times number\ of\ accounts \end{equation}

We could calculate everyone’s bonus by calling get_in for each employee to get their accounts and using the count of accounts to compute their bonus. Then we could call update_in to set their bonus using the values we calculated. But that sounds like a lot of work. Besides, it means we would have to know the names of all our sales people. Who has time for that?

Instead, we can update all the bonuses in one call to update_in by exploiting what we know about how function keys work. Normally, a function key is just used as a selector. It chooses which path(s) to follow and eventually when we reach the final element selected by the final key, update_in applies the transformation function we gave it and things bubble back up. So normally we end up modifying the element selected by our last key and all the other element in our nested structure are left alone.

But remember the primary responsibility of the function key is to transform the substructure that is passed to it. Normally it does this by returning what ever bubbled up from subsequent keys, but it doesn’t have to do this. In fact, we are free to modify the data structure however we want. In this case, we can update the value for the :bonus key with whatever bubbled up from the subsequent keys.

Our call to update_in will look like this

update_in company, [:sales, update_bonus, :accounts], fn accounts ->
  1000 * Enum.count(accounts)
end 

The first key, :sales, selects the sales substructure. The next key, update_bonus, is a function key that will be responsible for returning a new substructure will all the bonuses filled in. The last key, :accounts, selects the list of accounts for each salesperson.

The last argument to update_in is the transformation function, which operates on the last element in the traversal, in this case the list of accounts for each salesperson. It simply computes a bonus by multiplying the number of accounts by 1000.

The atom keys and the transformation function should be easy enough to understand. The only tricky part is the update_bonus function key, which is given here

1
2
3
4
5
6
7
8
update_bonus = fn :get_and_update, data, next ->
  new_data = Enum.map data, fn salesperson ->
    {_, updated_salesperson} = next.(salesperson)
    bonus = updated_salesperson[:accounts]
    put_in(salesperson, [:bonus], bonus)
  end
  {nil, new_data}
end

Hopefully this is starting to make sense to you, but let’s break it down to be sure. Line 1 is the function signature that we have examined earlier and the assignment to a variable, update_bonus, so we can pass it to update_in. The data passed to this function will be list of maps of salespeople as selected by the :sales key. Line 2 is going to transform all the elements in the list by mapping over them using Enum.map.

The function applied to each of the elements takes a single argument, salesperson, which will be a map. The body of the function is given in lines 3-5. It calls the next function on the salesperson map and gets back an updated salesperson map, with the bonus computed by the transformation function. If we were to just bubble this value back up we would have a mess since we would be replacing our list of accounts with the bonus. Instead, on line 4 we extarct the computed bonus from the updated_map (it’s under :accounts) and on line 5 we set the current map’s :bonus key to the calculated bonus. This new map replaces our salesperson map.

Finally, on line 7, we return our updated list of maps in a two element tuple, using nil as the first value (remember, update_in will ignore this value).

Putting it all together, we start with the first key, :sales, which points this substructure, a list of maps:

[%{salesperson: "Joe", bonus: 0, accounts: ["XYZ Inc.", "ABC Co.", "McDoogles"]},
 %{salesperson: "Bob", bonus: 0, accounts: ["ACME", "Tarjet"]},
 %{salesperson: "Jane", bonus: 0, accounts: ["Homely Depot", "Element 84", "Pear Computers"]},
 %{salesperson: "Jill", bonus: 0, accounts: ["Four Guys", "Gas 'N Sip'"]}]

This list is the structure that is passed to our update_bonus function as the data parameter.

update_bonus then calls Enum.map on this list. The function passed to Enum.map gets a map (salesperson) as its argument. It calls next on this map, which moves on to the next key in our key list, :accounts. The substructure pointed to by :accounts is the list of accounts. So for our first salesperson map, it would be this

["XYZ Inc.", "ABC Co.", "McDoogles"]

Since :accounts is the last key in the list passed to update_in, update_in calls its tranformation function on this account list, which looks like this

1000 * Enum.count(["XYZ Inc.", "ABC Co.", "McDoogles"])
=> 3000

Now the computed value begins to bubble back up. So at this point we have

{_, updated_salesperson} = {nil, %{salesperson: "Joe", bonus: 0, accounts: 3000}}

on line 3 of our update_bonus function.

The computed bonus, 3000, is under the :accounts key, since this is the last key in our key list. We simply extract it on line 4 with

bonus = updated_salesperson[:accounts]
=> 3000

Now that we have the bonus, we just update the oringal salesperson map the was passed to us on line 5

put_in(salesperson, [:bonus], bonus)
=> %{salesperson: "Joe", bonus: 3000, accounts: ["XYZ Inc.", "ABC Co.", "McDoogles"]}

update_bonus does this for every salesperson map in the list it received, so finally, we have

update_in company, [:sales, update_bonus, :accounts], fn accounts ->
  1000 * Enum.count(accounts)
end
=> {sales: [%{accounts: ["XYZ Inc.", "ABC Co.", "McDoogles"], bonus: 3000, salesperson: "Joe"},
            %{accounts: ["ACME", "Tarjet"], bonus: 2000, salesperson: "Bob"},
            %{accounts: ["Homely Depot", "Element 84", "Pear Computers"], bonus: 3000, salesperson: "Jane"},
            %{accounts: ["Four Guys", "Gas 'N Sip'"], bonus: 2000, salesperson: "Jill"}],
    total_revenue: 10000000}

Conclusion

In this series we have seen that familiar tasks such as looping over an array or working with nested data structures can be challenging for programmers moving from an imperative language to a functional one like Elixir; challenging, but not impossible. Once you start to think in terms of transforming data this opens up a lot of possibilities. Hopefully the examples we have worked through here have begun to help you understand how these kinds of problems can be solved with functional approaches. And hopefully I have convinced you that functional approaches are not reserved for academic problems, but are applicable to the kinds of problems you encounter every day – and that you don’t have to be a genius to use them.