One of my goals with Dynomite is to make it a high throughput system. It should be able to scale horizontally as hardware is added. But it should also be able to scale vertically and fully utilize hardware with varying degrees of capability. I wanted to get a baseline throughput reading for what could be called a single large instance.
So I setup a test to see what a single node could handle in terms of throughput. The test environment contains 2 identical nodes. Each node is a full host (not virtualized) with 8 cores, 16GB ram and a number of disks put together in RAID5. One node runs an instance of Dynomite, the other runs the benchmark.escript test harness.
The first thing I found is that the thrift bindings for Erlang are absolutely terrible as far as performance goes. When loading the system to saturation with thrift it’s entirely CPU bound. Total throughput for a node regardless of configuration turned out to be right around 200 req/s. One bottleneck down.
After identifying thrift as a likely cause I changed the benchmarking script to generate load using Erlang’s native messaging. The results were much better. Dstat showed that Dynomite was no longer CPU bound. Fortunately there was more throughput than with thrift. Writes averaged about 366 req/s and reads were 1238 req/s. This was with the default configuration having 64 partitions. After playing with the configuration a bit I noticed that throughput went up when the number of partitions went down. In particular, writes went to 753 req/s and reads went to 1456 req/s.
So it bears looking into what I mean by partition. The Dynamo paper describes a system where each key is consistently hashed to a partition and then that partition is consistently hashed to a node. For the purposes of expedience I decided to make every partition on a node refer to a unique instance of the underlying storage engine. So for instance for the dets engine this means that in a single node cluster there are 64 dets files that are written to and read from when there are 64 partitions. This allows things like partition transfer and synchronization very easy to implement and I thought that it would improve throughput since there would be more processes to handle requests. Unfortunately, until I saw these throughput numbers the full implications of this design decision were not apparent. Having data scattered across 64 different files means that seek time is being something of a bottleneck.
So I tried using the ets engine, which is entirely contained in memory. Even then reads and writes were only topping out to about 2000 req/s. That’s pitiful for something which does not even hit disk. Indeed, when benchmarked on its own independent of Dynomite ets tops out at about 23,000 req/s. So I basically stripped Dynomite down until I could point to whatever was being slow. I wrote a custom profiling server and broke down everything on the critical path to doing an operation. What I found was that everything was pretty fast. On the order of microseconds. Everything except for the call to the storage server. That was taking 20 milliseconds on average. But when I looked at the benchmarks from inside the storage server, it was still taking on the order of microseconds to do its work. So somehow the context switch to the storage server was taking much longer than the actually work getting done. This points to an issue with the VM.
Luckily, Erlang R13A just came out. And while it’s not yet production ready, it is complete enough to test against. And the results with R13A were very impressive. For a single node the ets throughput went to 18,000 req/s. Dets throughput was 7000 req/s for writes and 6000 req/s for reads with the single partition configuration. Not bad at all.

So I spent a lot of time working through issues which were essentially problems in the VM. But I would not call it time wasted. I learned a good deal about the performance profile of Dynomite and gained some new insight into how certain patterns in Erlang will translate into runtime performance. I also gained some insight into what design changes need to happen in the future for Dynomite. I will have future write ups about the design changes I have in mind, but for the time being I would say that the current version is ready for release.