What are you playing with today?


A colleague of mine recently asked me how I do research, and where the ideas come from.
The conversation went something like this:

NN: yeah, but how do you learn all these things?
ME: It’s a side-effect, really. Let me explain: when in life do you learn the most?
NN: It must be childhood…
ME: Yes, and why is that?
NN: The brain is more flexible when you’re young…
ME: Yes, that helps considerably, but that’s not the full answer.
The thing that especially helps is that you were playing.
NN: So, you’re telling me I should play more?
ME: Yes, but in order to get away with it, you call it research.

Currently, I’m playing with doing research about persistent data structures.
Literature uses a number of different names (CoW, append-only, log-structured, persistent, …) where updates (at least conceptually)
never change anything which means you have access to all previous versions.
This is associated with a lot of buzz-words like instant snapshotting, cheap transactions, ssd optimized, hot copy, …

Append-only binary tree

To illustrate the concept, I’ll show you how to play with an append-only binary-ish tree (a B-tree would just be more work, without additional fun).

binary tree variation

Lets start off with some pictures of the data structure we want to implement.
The append-only representation will be shown in the next section.
Initially, the tree is empty, and there is nothing to show, but after adding the first key value pair (“f”,”F”), which just means that “f” is mapped to “F”,
the tree looks like this:

A key node (shown as an oval, and called a leaf) just points to a value node (shown as a box).

This is the tree after we insert a second mapping (“d” maps to “D”):

A branch node (shown as a trapezium) splits the key space in 2 parts and everything smaller than or equal to its separator (“d”) dangles left, everything else, dangles right.

After adding the mappings (“h”,”H”) (“a”,”A”) and (z”,”Z”) we have the following tree:

append-only representation

The name append-only stems (I think) from the file based view, where only write at the end of the file (append).
It is also called a log.
For the one element tree, this is simple, once you realize that since “f” points to “F” you need to write value “F” first before you write the leaf “f”:

The number at the left of an entry denotes its position in the log, so later nodes can refer to it (the 0 in the leaf “f” points to the value at position 0).
Adding stuff can only be done at the right hand side, and things can only point to things that have already been written.
Inserting (“d”,”D”) means first writing the value “D”, then writing the leaf “d”, and finally the branch node.
The result is shown below:

Ok, adding (“h”,”H”),(“a”,”A”) and (“z”,”Z”) yields the following: (click to enlarge).

You immediately see that the space overhead can be considerate.
At some point in time, you don’t care anymore about older snapshots and you will want to have the possibility to reclaim that space.

minimal implementation

We still don’t have anything to play with. Let’s do a minimal implementation.
First start with a log.

type v = string
type k = string
type pos = int

type entry = 
  | NIL 
  | V of v
  | L of k * pos
  | N of pos * k * pos 

type log = {es:entry array;  mutable next : int;}

Entries are either nil, values, leafs or nodes. Btw, this is ocaml.
A log is an array, combined with an attribute next that tells you where next to ‘write’.
Some helper functions are needed.

let make cap = {es = Array.make cap NIL; next = 0;}
let root_pos log = log.next -1
let get_entry log pos = if pos = -1 then NIL else log.es.(pos)
let get_length log = Array.length log.es
let free log = get_length log - log.next
let write log es =
  let do_one e = 
    if free log = 0 then failwith "full";
    let p = log.next in
    log.es.(p) <- e;
    log.next <- p + 1 
  List.iter do_one es

Retreiving a value is straight forward, just descend starting from the root.

let get log k = 
  let rec find_pos pos = 
    match get_entry log pos with
      | V v -> v
      | L (k0,p0) when k = k0 -> find_pos p0 
      | N (l,k0,r) -> let p' = if k <= k0 then l else r  in
		      find_pos p'			
      | _ -> failwith "Not_found"
  find_pos (root_pos log)

Adding a mapping isn’t that hard. The trick is to record the places you visited on the way down, these are exactly the entries that will need to be rewritten.

type dir = Hit | Left | Right
type trail = (dir * entry * pos) list

exception Todo of trail * string
let todo v msg = raise (Todo (v,msg))

After introducing the direction and the trail I added an exception for cases not handled.
The skeleton for the ‘set’ is rather straight forward:

let set log k v =
  let rec descend trail pos = 
    match get_entry log pos with
      | NIL -> trail
      | V v -> failwith "corrupt"
      | L (k0,p0) as e -> let dir = 
			    if k = k0 
			    then Hit else if k < k0 
			      then Left 
			      else Right 
			  (dir , e, pos) :: trail
      | N (l,k0,r) as e when k <= k0 -> descend ((Left,e, pos) :: trail) l
      | N (l,k0,r) as e              -> descend ((Right,e,pos) :: trail) r
  let trail = descend [] (root_pos log) in
  let update = build_set k v log.next trail in
  write log update

So first we go down the tree accumulating a trail, which we use to build the update that gets written to the log.
Let’s build the update:

let rec build_set k v start (visited:trail)  = V v :: do_start start k visited 
and do_start start k visited = match visited with
  | [] -> [L (k, start) ]
  | (dir , L(k0,p0),pe) :: rest ->
      let e = L(k,start) in
      match dir with
	| Hit   -> e :: do_rest (start + 1) rest
	| Left  -> e :: N(start + 1, k, pe) :: do_rest (start + 2) rest
	| Right -> e :: N(pe, k0, start +1) :: do_rest (start + 2) rest
  | _ -> todo visited (Printf.sprintf "do_start %i '%s'" start k)
and do_rest start visited = 
  match visited with
    | [] -> []
    | h :: t ->
      let e = 
	match h with
	  | (Left , N(pl,k0,pr),pe)  ->  N(start,k0,pr) 
	  | (Right, N(pl,k0,pr),pe)  ->  N(pl,k0,start)
	  | _ -> todo visited (Printf.sprintf "do_rest %i" start)
      e :: do_rest (start + 1) t

Creating the update is done in three steps, where do_rest is the most difficult.
The key insight is that if on the way down, you took the left branch, that part of the node needs to change, the other part can be copied.

interactivity, please

The piece of code below allows you to dump a log.

let dump log = 
  Array.iteri (fun i e->
    let () = Printf.printf "%2i: " i in
    match e with
      | NIL        -> print_newline()
      | V v        -> Printf.printf "V \"%s\"\n" v
      | L (k,p)    -> Printf.printf "L(\"%s\",%i)\n" k p 
      | N (l,k0,r) -> Printf.printf "N(%i,\"%s\",%i)\n" l k0 r
  ) log.es

an example dump is this:

 0: V "F"
 1: L("f",0)
 2: V "D"
 3: L("d",2)
 4: N(3,"d",1)
 5: V "H"
 6: L("h",5)
 7: N(1,"f",6)
 8: N(3,"d",7)
 9: V "A"
10: L("a",9)
11: N(10,"a",3)
12: N(11,"d",7)
13: V "Z"
14: L("z",13)
15: N(6,"h",14)
16: N(1,"f",15)
17: N(11,"d",16)

But examining this to see if all references are correct is rather tedious.
With a few lines of code you can generate output that can be processed by graphviz to generate images that allow visual inspection.

let dot_log ?(f = stdout) log = 
  Printf.fprintf f "digraph Log{\n";
  Printf.fprintf f "\trankdir=\"RL\";\n";
  Printf.fprintf f "\tnode [shape=record];\n";
  let () = Array.iteri (fun i e ->
    let () = match e with
      | NIL -> ()
      | V v     -> 
	Printf.fprintf f "\tnode%i [label = \"{%i|%s}\"];\n" i i v
      | L (k,p) -> 
	Printf.fprintf f 
	  "\tnode%i [label = \"{%i | { %s | <f1> %i} }\"];\n" 
	  i i k p;
	Printf.fprintf f "\tnode%i:<f1> -> node%i;\n" i p
      | N(l,k0,r)  -> Printf.fprintf f "\tnode%i [label = \"{%i| { <f1> %i | %s | <f2> %i}}\"];\n" i i l k0 r;
	Printf.fprintf f "\tnode%i:<f1> -> node%i;\n" i l;
	Printf.fprintf f "\tnode%i:<f2> -> node%i;\n" i r;
    if e <> NIL && i > 0 then Printf.fprintf f "\tnode%i -> node%i [style = invis];\n" i (i-1)
  ) log.es in
  Printf.fprintf f "}"

This needed a bit of experimentation, but the record feature helps a lot.
To assure things are rendered in the correct left-to-right order, every entry has an invisible arrow to its predecessor.
The last fragment glues graphviz and evince together to have the kind of interactive visualization you appreciate from mathlab or scipy.

let view ?(v=dot_log) log = 
  let root = "test" in
  let dot = Filename.temp_file root ".dot" in
  let png = Filename.temp_file root ".png" in
  let oc = open_out dot in
  let () = v ~f:oc log in
  close_out oc;
  let convert_cmd = Printf.sprintf "dot -Tpng -o %s %s" png dot in
  let _ = Sys.command convert_cmd in
  let cmd = Printf.sprintf "evince %s" png in
  Sys.command cmd

let view_log  log = view ~v:dot_log log

Ok, this calls for a screenshot:

closing remarks

The essence is that I build me a nifty toy to gain insight in persistent data structures in less than 150 lines of code,
which is a tribute to the power of ocaml.
All that may be true, but the code isn’t that minimal, and not very pretty:

  • build_set can be made tail recursive and shorter as well.
  • The mutual recursive functions are not needed neither if they are defined in the right order.
  • leaf nodes can be suppressed.
  • dot_log feels hackish.

Most of the time in writing the code was actually spent trying to seduce graphviz in doing what I wanted.

Have fun,


Post mortem

This was my first blog post ever.
I should probably have done something shorter as I’m still learning the wordpress web interface.


2 Comments on “What are you playing with today?”

  1. […] Baardskeerder project, as explained before, is an implementation of a B-tree-ish data structure, using an append-only file to persist the data […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s