H: One voter, 16,472 votes — a slight anomaly…?
E: Not really, Mr. Hanna. You see, Baldrick may look like a monkey who’s
been put in a suit and then strategically shaved, but he is a brillant
politician. The number of votes I cast is simply a reflection of how
firmly I believe in his policies.
Black Adder III, Episode 1
So, Arakoon 1.6.0 is out, a release so huge we just had to leap minor increments! This blog post will walk you through the various changes between the 1.4 series and the 1.6 series. We’ll explain why it’s faster, and give an indication by how much.
Harvesting the client queue
Arakoon is a multipaxos implementation. This means that arakoon nodes try to gain consensus on what the next transaction is that needs to be applied.
The thing they gain consensus on is called a paxos value. In all Arakoons (≤ 1.4) there was a 1:1 mapping between a client update and a paxos value. This all works fine in LAN, but when the latency between nodes becomes a lot higher, like in a WAN setup, performance becomes abysmally slow. But some of your users are using Arakoon in a multi-datacenter setup. The solution chosen to address this was to have a n:1 mapping between client updates and paxos values. What Arakoon 1.6 does when the time has come to decide on the next paxos value, is to take all (or up to a limit) of the queued updates, and stuff them in the next paxos value. For multiple clients and small updates this means an almost linear speedup in a WAN setting.
Batched Local store
Ops: We have seen a corrupt tokyo cabinet, and we can reproduce it!
Dev: Great! What’s the scenario?
Ops: It’s really easy. Just apply the following 218 million transactions to an empty store.
Dev: Err, euhm, thx ?!
When you want to fix a problem, having a 100% reproducible scenario is a blessing, but not if it takes forever. Fortunately, during the root cause analysis of such a problem, you have time to think, and thus the insight came: We can speed this up a lot! Having to maintain a transaction log is a mixed blessing, but one of the advantages is that you don’t have to apply all updates to the database immediately. So we added an in-memory caching layer that sits before Tokyo Cabinet. All updates happen there and from time to time, we push the updates through in a big batch. We can afford ourselves to do this as we already ensured durability via the transaction logs.
Since the early days, Arakoon has a crash logger. When an Arakoon node dies unexpectly, it dumps its last 1000 or so log statements in a separate file. This is a really useful feature, and allowed us to determine the underlying cause of death in many occasions. Unfortunately, it was also a very expensive feature: We were always generating debug level log statements, even if the configuration stated ‘info’. Logging is expensive, even if you throw the log statements away. There is a big cost in the generation of all these strings via fancy Printf templates. So we decided to use a syntax extension that allows us to create these strings more lazily.
So now we have the best of both worlds: when all goes well, we don’t need debug statements and we never pay the cost of constructing the string, but when things turn sour,
we can still go back and create our crash log.
We’re not going to show full fledged benchmarks, just a simple microbenchmark to show a trend. The experiment is to let a client (or more clients) do a million sets to an Arakoon cluster and see how long it takes. All experiments were performed on a simple Intel Core i7 with a HDD.
|1e6 sets/client||Arakoon 1.4.3||Arakoon 1.6.0|
|4 // clients||900s||625s|
|8 // clients||2180s||1150s|
How to read the table? 1 client does 1e6 sets on Arakoon 1.6.0 and finishes after 196s. 8 clients do 1e6 sets in parallel on Arakoon 1.6.0 and the last client finishes after 1150s. That looks like a rather nice improvement.
The next table is for a cluster of 3 nodes, all running on the same machine and thus sharing the same HDD
|1e6 sets/c||Arakoon 1.4.3||Arakoon 1.6.0|
|4 // clients||4005s||2470s|
|8 // clients||8734s||4700s|
A last table shows the effect of latency. We add 50ms latency on everything that goes over the loopback interface (via netem). Again, this is a setup with 3 nodes, all on the same host, sharing a hdd (although this will not matter too much).
|1e5 sets/c||Arakoon 1.4.3||Arakoon 1.6.0|
|8 // clients||86400||20600|
Note that this benchmark only goes to 1e5 (otherwise, it would take forever) You can immediately see the dramatic effect of latency on a distributed system. Arakoon 1.6.0 does a better job at fighting latency than Arakoon 1.4.3 (the difference in performance will increase with the number of clients).
Other notable improvements
Tokyo Cabinet patched
The whole data corruption annecdote eventually lead to a 32 bit overflow issue in tokyo cabinet. With the help of some friends, we fixed it and made our changes available here in this tokyo cabinet git repo. We really don’t want to expose anyone to database corruption and decided to backport this to the 1.4 series too.
We started to use log sections, which allows you to configure a node such a way that for example the messaging layer is in debug, while everything else stays on info level. Which allows you to put the focus on something you’re examining while not having to wade through zillions of log messages. Handy for ops and devs alike.
Client API additions
We added a 2 small but useful methods to the client API. First there’s AssertExists of key that allows you to shortcut a sequence if a key is not there. Second, there’s multi_get_option: key list -> value option list Lwt.t which allows you to fetch a series of values, even if you’re not sure all of them exist in the database.
Some of our users have fancy SSDs for the tlogs and wanted to be able to fsync after every write of a tlog entry. We accomodated them.
You can find the full list of improvements for Arakoon 1.6 in our bug tracker
It’s stable, way faster than it used to be, and it’s free.
You can get it from our githup repo
What are you waiting for?