# F# Data Structures: Set

As I’ve been going through Okasaki’s book on functional data structures, it occurred to me that it’d also be a lot of fun to run through the F# implementation of it’s core data structures and compare them with the book Purely Functional Data Structures.

It’s a lot of fun seeing how fundamental and still powerful the data structures used in Purely Functional Data Structures are and has made reading through this book even more satisfying.

The first significant data structure in Okasaki is the Set, which is implemented using a binary search tree. It’s simple enough to find the F# Core implementation of set in the GitHub repository.

and a little further down in the source file we find this definition of Set:

So the Set type is definitely built on top of a binary search tree. Now the question is, what type of binary search tree and that can be determined by looking at the functions used on the SetTree type. There’s the expected add and remove and within those functions are calls to a rebalance function. Meaning that F# is using a self balancing BST for its Set implementation. Here’s the code for the rebalance function:

So F# is using what looks like a slightly modified AVL tree, which uses a definable tolerance for how unbalanced the tree can get before rebalancing happens. The strict AVL tree rebalances if the difference in height between the left and right branches is greater than 1.

There’s another interesting function nestled within the SetTree module: split. This function interested me because it wasn’t referenced in the balance or the add/remove functions.

This function has type SetTree, boolean, SetTree and takes a SetTree t and a pivot value pivot. The first SetTree in the return type is the set of all values in t which are less than pivot, the boolean is true if pivot is in t, and the right SetTree contains all the values in t which are greater than pivot.

The split function is used in the union function:

This function takes two trees t1 and t2 and merges them together and returns a single TreeSet with all the elements from t1 and t2. Here’s how split is used in this function:

1. Assuming t1 is taller than t2 (it’s basically the same if t2 is taller than t1)
2. split all the elements in t2 by if they are smaller than the root value of t1. Call the set of nodes smaller than the t1 root lo and the set of larger nodes hi.
3. Use union to merge the left subtree of t1 with lo and to merge the right subtree of t1 with hi.
4. Balance the to newly generated subtrees. This leaves a set of all the values smaller than the t1 value and a set of all the values larger than t1. These can then be used to create a new TreeSet representing the fully balanced merging of t1 and t2.