As I’ve been going through Okasaki’s book on functional data structures, it occurred to me that it’d also be a lot of fun to run through the F# implementation of it’s core data structures and compare them with the book Purely Functional Data Structures.
It’s a lot of fun seeing how fundamental and still powerful the data structures used in Purely Functional Data Structures are and has made reading through this book even more satisfying.
The first significant data structure in Okasaki is the Set, which is implemented using a binary search tree. It’s simple enough to find the F# Core implementation of set in the GitHub repository.
1 2 3 4 

and a little further down in the source file we find this definition of Set
:
1


So the Set
type is definitely built on top of a binary search tree. Now the question is,
what type of binary search tree and that can be determined by looking at the functions
used on the SetTree
type. There’s the expected add
and remove
and within those
functions are calls to a rebalance
function. Meaning that F# is using a self balancing
BST for its Set
implementation. Here’s the code for the rebalance
function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 

So F# is using what looks like a slightly modified AVL tree, which uses a definable tolerance for how unbalanced the tree can get before rebalancing happens. The strict AVL tree rebalances if the difference in height between the left and right branches is greater than 1.
There’s another interesting function nestled within the SetTree
module: split
.
This function interested me because it wasn’t referenced in the balance or the
add
/remove
functions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 

This function has type SetTree, boolean, SetTree
and takes a SetTree t
and a pivot
value pivot
. The first SetTree
in the return type is the set of all values in t
which are less than pivot
, the boolean is true if pivot
is in t
, and the right
SetTree
contains all the values in t
which are greater than pivot
.
The split
function is used in the union function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 

This function takes two trees t1
and t2
and merges them together and returns a
single TreeSet
with all the elements from t1
and t2
. Here’s how split is used
in this function:
 Assuming
t1
is taller thant2
(it’s basically the same ift2
is taller thant1
) split
all the elements int2
by if they are smaller than the root value oft1
. Call the set of nodes smaller than thet1
rootlo
and the set of larger nodeshi
. Use
union
to merge the left subtree oft1
withlo
and to merge the right subtree oft1
withhi
.  Balance the to newly generated subtrees. This leaves a set of all the values smaller
than the
t1
value and a set of all the values larger thant1
. These can then be used to create a newTreeSet
representing the fully balanced merging oft1
andt2
.