The Game of Distributed Systems Programming. Which Level Are You?


When programming distributed systems becomes part of your life, you go through a learning curve. This article tries to describe my current level of understanding of the field, and hopefully points out enough mistakes for you to be able to follow the optimal path to enlightenment: learning from the mistakes of others.
For the record: I entered Level 1 in 1995, and I’m currently Level 3. Where do you see yourself?

Level 0: Clueless

Every programmer starts here. I will not comment too much, as there isn't a lot to say. Instead, I quote some conversations I had, and offer some words of advice to developers who have never battled distributed systems.

NN1: “Replication in distributed systems is easy; you just let all the machines store the item at the same time.”

Another conversation (from the back of my memory):

NN: “For our first person shooter, we’re going to write our own networking engine”
ME: “Why?”
NN: “There are good commercial engines, but license costs are expensive and we don’t want to pay these.”
ME: “Do you have any experience in distributed systems?”
NN: “Yes, I’ve written a socket server before.”
ME: “How long do you think you will take to write it?”
NN: “I think 2 weeks. Just to be really safe we planned 4.”

Sometimes it’s better to remain silent.

Level 1: RPC

RMI is a very powerful technique for building large systems. The fact that the technique can be described, along with a working example, in just a few pages, speaks volumes of Java. RMI is tremendously exciting and it’s simple to use. You can call to any server you can bind to, and you can build networks of distributed objects. RMI opens the door to software systems that were formerly too complex to build.

Peter van der Linden, Just Java (4th edition, Sun Microsystems)

Let me start by saying I’m not dissing this book. I remember distinctly that it was fun to read (especially the anecdotes between the chapters), and I used it for the Java lessons I used to give (in a different universe, a long time ago). In general, I think well of it. His attitude towards RMI, however, is typical of Level 1 distributed application design. People who reside here share the vision of unified objects. In fact, Waldo et al. describe it in detail in their landmark paper “A Note on Distributed Computing” (1994), but I will summarize it here:
The advocated strategy to writing distributed applications is a three phase approach. The first phase is to write the application without worrying about where objects are located and how their communication is implemented. The second phase is to tune performance by “concretizing” object locations and communication methods. The final phase is to test with “real bullets” (partitioned networks, machines going down, …).

The idea is that whether a call is local or remote has no impact on the correctness of a program.

The same paper then dissects this idea further and shows its problems. It has thus been known for almost 20 years that this concept is wrong. Anyway, if Java RMI achieved one thing, it’s this: even if you remove transport protocol, naming and binding, and serialization from the equation, it still doesn’t work. People old enough to remember the hell called CORBA will also remember it didn’t work, but they have an excuse: they were still battling all kinds of lower-level problems. Java RMI took all of these away and made the remaining issues stick out. There are two of them. The first is a mere annoyance:

Network Transparency isn’t

Let’s take a look at a simple Java RMI example (taken from the same ‘Just Java’)

public interface WeatherIntf extends java.rmi.Remote {
  public String getWeather() throws java.rmi.RemoteException;
}

A client that wants to use the weather service needs to do something like this:

     try {
        Remote robj = Naming.lookup("//localhost/WeatherServer");
        WeatherIntf weatherserver = (WeatherIntf) robj;
        String forecast = weatherserver.getWeather();
        System.out.println("The weather will be " + forecast);
     } catch (Exception e) {
        System.out.println(e.getMessage());
     }

The client code needs to take RemoteExceptions into account.
If you want to see what kinds of remote failure you can encounter, take a look at the more than 20 subclasses. Ok, so your code will be a tad less pretty. We can live with that.

Partial Failure

The real problem with RMI is that the call can fail partially: it can fail before the action on the other tier is invoked, or the invocation might succeed but the return value might not make it back afterwards, for whatever reason. These failure modes are in fact the very defining property of distributed systems, or, otherwise stated:

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”
(Leslie Lamport)

If the method is just the retrieval of a weather forecast, you can simply retry, but if you were trying to increment a counter, retrying can have results ranging from 0 to 2 updates. The solution is supposed to come from idempotent actions, but building those isn’t always possible. Moreover, since you decided on a semantic change of your method call, you basically admit that RMI is different from a local invocation. This is an admission that RMI’s transparency is a fallacy.
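
To make the retry hazard concrete, here is a small Python sketch (my own illustration, not from any RMI library) of a call whose reply can be lost after the server has already applied the update. Blindly retrying a non-idempotent increment over-counts, while an idempotent set survives the retry:

```python
class Server:
    """Toy server whose reply can be lost AFTER the update was applied."""
    def __init__(self):
        self.counter = 0

    def increment(self, drop_reply):
        self.counter += 1                     # the side effect always happens
        if drop_reply:
            raise TimeoutError("reply lost")  # caller cannot tell what happened

    def set_value(self, value, drop_reply):
        self.counter = value                  # idempotent: applying twice is harmless
        if drop_reply:
            raise TimeoutError("reply lost")

def call_with_retry(action, *args):
    """Naive client policy: on a lost reply, just try again."""
    for reply_lost in (True, False):          # first reply lost, retry succeeds
        try:
            action(*args, drop_reply=reply_lost)
            return
        except TimeoutError:
            continue

s = Server()
call_with_retry(s.increment)        # the client meant +1 ...
print(s.counter)                    # ... but the counter is now 2

s2 = Server()
call_with_retry(s2.set_value, 5)    # the idempotent version survives the retry
print(s2.counter)                   # 5
```

The client cannot distinguish “request lost” from “reply lost”, so after a timeout it has no way of knowing whether the increment happened. That ambiguity is exactly the partial failure described above.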

In any case, the paradigm is a failure, as both network transparency and architectural abstraction from distribution just never materialise. It also turns out that some software methodologies are more affected than others. Some variations of Scrum tend to prototype; prototypes concentrate on the happy path, and the happy path is not the problem. It basically means you will never escape Level 1. (Sorry, that was a low blow. I know.)

People who do escape Level 1 understand they need to address the problem with the respect it deserves. They abandon the idea of network transparency, and attack the handling of partial failure strategically.

Level 2: Distributed Algorithms + Asynchronous messaging + Language support

<sarcasm>”Just What We Need: Another RPC Package” </sarcasm>
(Steve Vinoski)

Ok, you’ve learned the fallacies of distributed computing. You decided to bite the bullet, and model the message passing explicitly to get a control of failure.
You split your application into 2 layers, the bottom being responsible for networking and message transport, while the upper layer deals with the arrival of messages, and what needs to be done when they do.
The upper layer implements a distributed state machine, and if you ask the designers what it does, they will tell you something like : “It’s a multi-paxos implementation on top of TCP”.
Development-wise, the strategy boils down to this: programmers first develop the application centrally, using threads to simulate the different processes. Each thread runs a part of the distributed state machine and is basically responsible for running a message handling loop. Once the application is locally complete and correct, the threads are taken away to become real processes on remote computers. At this stage, in the absence of network problems, the distributed application is already working correctly. In a second phase, fault tolerance can be straightforwardly achieved by configuring each of the distributed entities to react correctly to failures. (I liberally quoted from “A Fault Tolerant Abstraction for Transparent Distributed Programming”.)

Partial failure is handled by design, because of the distributed state machine. With regard to threads, there are a lot of options, but you prefer coroutines (called fibers, lightweight threads, microthreads, protothreads or just threads in various programming languages, causing a Babylonian confusion) as they allow for fine-grained concurrency control.

Combined with the insight that “C ain’t gonna make my network any faster”, you move to programming languages that support this kind of fine grained concurrency.
Popular choices are (in arbitrary order)

(Note how they tend to be functional in nature)

As an example, let’s see what such code looks like in Erlang (taken from Erlang concurrent programming)


-module(tut15).

-export([start/0, ping/2, pong/0]).

ping(0, Pong_PID) ->
    Pong_PID ! finished,
    io:format("ping finished~n", []);

ping(N, Pong_PID) ->
    Pong_PID ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_PID).

pong() ->
    receive
        finished ->
            io:format("Pong finished~n", []);
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start() ->
    Pong_PID = spawn(tut15, pong, []),
    spawn(tut15, ping, [3, Pong_PID]).

This definitely looks like a major improvement over plain old RPC. You can start reasoning about what happens if a message doesn’t arrive.
Erlang gets bonus points for having timeout messages and a built-in after Timeout construct that lets you model and react to timeouts in an elegant manner.
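
The timeout-as-a-message idea can be approximated in most languages with a blocking queue; here is a minimal Python analogue (my own sketch, not Erlang semantics) where a timeout falls through as just another message for the handling loop:

```python
import queue
import threading

inbox = queue.Queue()

def receive(timeout):
    """Wait for a message; if none arrives in time, deliver a timeout message
    instead, so the state machine handles timeouts like any other event."""
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        return ("timeout",)

# A peer that answers late: the first receive times out, the second succeeds.
threading.Timer(0.2, lambda: inbox.put(("pong",))).start()

print(receive(0.05))   # ('timeout',)
print(receive(1.0))    # ('pong',)
```

Treating the timeout as an ordinary message keeps the handling loop uniform, which is exactly what makes Erlang’s after construct pleasant to reason about.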

So, you picked your strategy, your distributed algorithm, your programming language and start the work. You’re confident you will slay this monster once and for all, as you ain’t no Level 1 wuss anymore.

Alas, somewhere down the road, some time after your first releases, you enter troubled waters. People tell you your distributed application has issues. The reports are all variations on a theme. They start with a frequency indicator like “sometimes” or “once”, and then describe a situation where the system is stuck in an undesirable state. If you’re lucky, you had adequate logging in place and start inspecting the logs. A little later, you discover an unfortunate sequence of events that produced the reported situation. Indeed, it was a new case. You never took this into consideration, and it never appeared during the extensive testing and simulation you did. So you change the code to take this case into account too.

Since you try to think ahead, you decide to build a monkey that pseudo randomly lets your distributed system do silly things. The monkey rattles its cage and quickly you discover a multitude of scenarios that all lead to undesirable situations like being stuck (never reaching consensus) or even worse: reaching an inconsistent state that should never occur.

Having a monkey was a great idea, and it certainly reduces the chance of encountering something you’ve never seen before in the field. Since you believe that a bugfix goes hand in hand with a test case that first reproduces the bug and then proves its demise, you set out to build just that test. The problem, however, is that reproducing the failure scenario is difficult, if not impossible. You listen to the gods as they hint: when in doubt, use brute force. So you produce a test that runs a zillion times to compensate for the small probability of the failure. This makes your bug-fixing process slow and your test suites bulky. You compensate again by doing divide-and-conquer on your volume of test sets. Anyway, after a heavy investment of effort and time, you somehow manage to get a rather stable system, and a ditto process.
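
One mitigation worth noting, sketched here in Python with hypothetical names, is to drive the monkey from a seeded pseudo-random generator, so that at least the monkey's own choices can be replayed from a recorded seed:

```python
import random

def monkey_run(seed, steps=100):
    """Drive the system under test with a seeded stream of pseudo-random
    fault events; recording the seed makes the monkey's choices replayable."""
    rng = random.Random(seed)
    return [rng.choice(["deliver", "drop", "delay", "crash", "restart"])
            for _ in range(steps)]

# A failure first seen with seed 42 can be re-driven with identical choices.
assert monkey_run(42) == monkey_run(42)
```

The caveat is the one the Level 3 section turns into a principle: the seed pins down the monkey, but as long as thread scheduling stays nondeterministic, the run as a whole may still refuse to reproduce.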

You’re maxed out on Level 2. Without new insights, you’ll be stuck here forever.

Level 3: Distributed Algorithms + Asynchronous messaging + Purity

It takes a while to realise that a combination of long-running monkeys to discover evil scenarios and brute force to reproduce them ain’t making it. Using brute force just demonstrates ignorance. One of the key insights you need is that if you could only remove indeterminism from the equation, you would have perfect reproducibility of every scenario. A major side effect of Level 2 distributed programming is that your concurrency model tends to go viral on your codebase. You desired fine-grained concurrency control… well, you got it. It’s everywhere. Concurrency causes indeterminism, and indeterminism causes trouble. So concurrency must go. You can’t abandon it: you need it. You just have to ban it from mingling with your distributed state machine. In other words, your distributed state machine has to become a pure function. No IO, no concurrency, no nothing. Your state machine signature will look something like this:

module type SM = sig
  type state
  type action
  type msg
  val step: msg -> state -> action * state
end

You pass in a message and a state, and you get an action and a resulting state. An action is basically anything that tries to change the outside world, needs time to do so, and might fail while trying. Typical actions are

  • send a message
  • schedule a timeout
  • store something in persistent storage

The important thing to realise here is that you can only get to a new state via a new message; nothing else. The benefits of such a strict regime are legion: perfect control, perfect reproducibility and perfect traceability. The costs are there too: you’re forced to reify all your actions, which basically adds an extra level of indirection to reduce your complexity, and you also have to model every change of the outside world that needs your attention as a message.
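
As a toy rendering of such a signature, here is a pure step function in Python (my own sketch; the message and action shapes are made up) for a replicated counter, together with a replay check showing the perfect reproducibility such a regime buys you:

```python
def step(msg, state):
    """Pure transition: (msg, state) -> (actions, new_state). No IO happens
    here; actions are mere descriptions of IO for an outer shell to execute."""
    kind = msg[0]
    if kind == "incr":
        client = msg[1]
        new_state = state + 1
        return ([("send", client, ("ack", new_state)),
                 ("persist", new_state)], new_state)
    if kind == "timeout":
        return ([("schedule_timeout", 1.0)], state)  # re-arm, state unchanged
    return ([], state)

def replay(log, state=0):
    """Feed a message log through the machine, collecting the action trace."""
    trace = []
    for msg in log:
        actions, state = step(msg, state)
        trace.append(actions)
    return trace, state

log = [("incr", "client-a"), ("timeout",), ("incr", "client-b")]
# Replaying the same log twice gives identical traces and identical states.
assert replay(log) == replay(log)
print(replay(log)[1])   # 2
```

Because the shell alone performs IO and the machine advances only on messages, any misbehaving run can be reproduced exactly by replaying its message log.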

Another change from Level 2 is the change in control flow. At Level 2, a client will try to force an update and set the machinery in motion. Here, the distributed state machine assumes full control, and will only consider a client’s request when it is ready and able to do something useful with it. So these must be detached.

If you explain this to a Level 2 architect, (s)he will more or less accept this as an alternative. It, however, takes a sufficient amount of pain (let’s call it experience or XP) to realize it’s the only feasible alternative.

Level 4: Solid domination of distributed systems: happiness, peace of mind and a good night’s rest

To be honest, as I’m a mere Level 3 myself, I don’t know what’s up here. I am convinced that both functional programming and asynchronous message passing are parts of the puzzle, but they are not enough.
Allow me to reiterate what I’m struggling against. First, I want my distributed algorithm implementation to fully cover all possible cases.
This is a big deal to me, as I’ve lost lots of sleep being called in on issues in deployed systems (most of these turn out to be PEBKAC, but some were genuine and caused frustration). It would be great to know your implementation is robust. Should I try theorem provers? Should I do exhaustive testing? I don’t know.
As an aside: for an append-only btree-ish library called baardskeerder, we know we covered all cases by exhaustively generating insert/delete permutations and asserting their correctness. Here, it’s not that simple, and I’m a bit hesitant to Coqify the codebase.
Second, for reasons of clarity and simplicity, I decided not to touch other, orthogonal requirements like service discovery, authentication, authorization, privacy and performance.
With regard to performance, we might be lucky, as asynchronous message passing at least doesn’t seem to contradict performance considerations.
Security, however, is a real bitch, as it crosscuts almost everything else you do. Some people think security is a sauce you can pour over your application to make it secure.
Alas, I never succeeded in this, and currently think it also needs to be addressed strategically during the very first stages of design.

Closing words

Developing robust distributed systems is a difficult problem that is practically unsolved, or at least not solved to my satisfaction.
I’m sure its importance will increase significantly as the latency between processors and everything else keeps growing, resulting in an ever-expanding field of application for this type of development.

As far as Level 4 goes, maybe I should ask Peter Van Roy. Over the years, I’ve read a lot of his papers, and they offered me a lot of insight into my own mistakes. The downside of insight is that you see others repeating your mistakes, and most of the time I fail to convince people they should do it differently.
Probably this is because I cannot offer the panacea they want. They want RPC and they want it to work. It’s perverse … almost religious.


70 Comments on “The Game of Distributed Systems Programming. Which Level Are You?”

  1. Ben Schaeffer says:

    That was an awesome article! Thanks for taking the time to write it. I am a self-taught distributed systems programmer and recognize my own progression / learning process over the last decade in your article. If there really is a kind of universal experience in developing this kind of app… Well, maybe there is “science” in computer science 😉

  2. johnicholas says:

    I’d be interested to hear what you think about E.

    Furthermore, do you think there is a spectrum of distributed algorithms, from internet-spanning, to data-center-spanning, down to single machine-spanning?

    What about algorithms intended to run on cellular automata or FPGAs? There’s parallelism there, but the problems seem to be different.

    What I’m trying to say is: can you work from the scope of your system to some reliability / failure / performance model that will give you some hints on how to argue that you’ve addressed what needs to be addressed?

    • Ellie K says:

      E is a programming language with no = “EQUAL” operator!

      Sorry (sort of): I realize that you didn’t ask for my opinion. And I understand the rationale for not having an “equal to” operation, given that E is intended to be a security and audit focused programming language. It is an interesting and reasonable idea to have a security oriented programming language, when you consider how many trivial purposes (relatively speaking) some of these newer languages seem to be designed for. Well, let’s just say “highly localized”. (No, I don’t want to name any names). Given how important security is, it should get more attention from compiler and programming language designers!

      • rslootma says:

        you’re right: it should get more attention. It’s not as easy as encrypting the messages. The pain starts when you ask yourself the question under which conditions A is allowed to send a message to B. A second wave of pain arrives when you consider the effect of a revocation. Authorization depends on so many parameters it crosscuts everything. I’m traumatized.

    • rslootma says:

      So many and difficult questions…
      First about E: I don’t know the language, but from the look of it it doesn’t seem too different from what I used in the 90s ()
      So I’m pretty sure you can use this to build distributed applications. So yes, language support does seem to be sufficient here.

      About the spectrum: Yes, I do think there are differences, but I don’t think it’s a spectrum. I cannot place them on a single axis. There are different axes like security needs, latency, bandwidth, … that have impact on your (or the best, or the simplest) strategy.

      About the FPGAs and cellular automata: I’m not an expert, but I did some GPGPU programming in the past, and the fact you have a single clock that drives everything makes your world considerably simpler.
      About the road from scope to model: Yes, I think it’s possible to derive some kind of decision flow model that will force you to answer a decent set of questions before you start.

      You made me think. thx for that.

  3. svenningsson says:

    As you mentioned, one problem with distributed software is being able to test them. Most approaches that I know of are dead ends. But there is some promising work that can increase the reliability of your system quite substantially without having to formalize it or try to apply model checking. Check out the following paper which extend QuickCheck to do randomized property based testing of concurrent Erlang programs.

  4. Moi says:

    My humble 2 cents about security: I’ve made it a doctrine that security tokens/information/whatever be exchanged at any modular transition of my application (e.g. client/server, application/db). This isn’t to be confused with libraries (like some sort of image manipulation tool or whatever).

    This has allowed me to sometimes ignore the security checks in lower level areas, but still have the signatures and data model in place.

    The only thing that this entails is that you need to decide what your security data model is going to be. But aside from that, it doesn’t much affect the application. With CL, where you can have dynamic bindings, it’s actually quite easy and almost transparent to pass this data around within your context.

    And finally, in my production systems, upper/lower transitions are always considered as unsafe. So, for example, my DB always assumes my application is unsafe etc.

    I feel good about this, but I recognize it’s almost a religious kind of feel-good. I have practically no metric to judge it by.

  5. what about designing languages and hardware simultaneously to support this? What level should that be at?

  6. Very nice discussion about levels! But your level 3 still makes some “level 2 mistakes”. It is possible (and it’s a good idea) to separate concurrency and nondeterminism. The problem is nondeterminism, not concurrency. You write your program using a deterministic concurrency model, and introduce nondeterminism exactly where you need it (and only there). Usually, that means only in a very small number of places. For example, in a client/server there is only one point where you really need nondeterminism, and that’s where the server accepts client requests. All the rest can be deterministic. One big problem of most languages (except a select few: Erlang, E, and Oz) is that they mix state and concurrency. That’s a recipe for disaster, since it introduces nondeterminism everywhere, and you’re dead even before you start running.

    • rslootma says:

      thx for dropping by. Indeed, I understand that non-determinism is a beast that needs to be kept in a straightjacket. I must confess I witnessed an Oz presentation of you in the late 90s that was completely lost on me as I was still stuck in the world of concurrent object oriented languages and meta object protocols. It took a while to escape that. Like I said in the blog post, currently I think functional programming is part of the solution, but I’m confronted with more and more people straight from university that think FP doesn’t matter at all. Isn’t it a task for the Academia to expose people to more than one programming paradigm so they don’t try to hit every problem with the same hammer?

      • You’re right, functional programming is part of the solution! Industry is recognizing this in Scala and Erlang, two languages with strong functional subsets that are both becoming popular. Scala is especially interesting because it runs on the JVM: the idea is to jettison the language (Java) but keep the technology (JVM). This is an appealing idea to many industrial developers.

  7. michaelrush says:

    Great article – thanks.

    On level 4.. *peace* of mind seems more likely what you meant 😉

  8. angOS says:

    man your mentioned mozart and stackless and didn’t mention for f# when f# is really nice for concurrent and distributed programming (support actors, parallel programming etc)…add it to the list 😀

    • rslootma says:

      True, but I mentioned its daddy: OCaml. If I were on Windows, I’d probably do F#. Currently I’m not.

      • COCO says:

        I’m using F# on a Mac and it runs really smoothly; actually, it runs better than Java on my laptop. It’s obviously slower, but it consumes less memory and the GUIs are more responsive. I don’t know how the performance is on other platforms….

  9. dmbarbour says:

    Up at level 4, you should look into the CALM conjecture and Bloom.

    Also, Lightweight Time Warp protocols.

    Of course, there are more issues than partial failure and network transparency – security, for example, becomes very important as we go distributed. As does upgrade, maintenance, administration, and job control of code that has already been distributed.

  10. How do you think OTP satisfies Level 3 requirements?

    • Nicolas T. says:

      Whilst I truly like OTP, I’m not sure it’s really up to the third level, although I certainly don’t want to say it’s incompatible with Level 3: whilst Level 3 requires purity and separation between the (pure) algorithm implementation and ‘engineering issues’ like messaging and timeouts, OTP doesn’t really enforce this separation. Yet using OTP might be a way to reduce the burden of these ‘engineering issues’ significantly.

  11. Thanks for the excellent article. There is only thing I wish were different in it: no talk about language support at Level 2. Right now you can read into it that if you don’t have language support for asynchronous messaging you better stick with RPC. I’m not sure if this was your intent, perhaps not, but it can be interpreted this way.

    The language features are nice, I understand why you like them, but most of us need to build distributed systems in mainstream languages like Java. I believe you can build distributed systems exposing the properties described in Level 2 in almost any language. It is perhaps harder, but it can be done.

    I rate myself at level two, so cannot comment on three and four. Maybe language support belongs there, maybe it is only a nice-to-have at all levels.

    • rslootma says:

      You’re right about the “It is perhaps harder, but it can be done” part. You can always build the features you need in the language you’re using. For example, you have several coroutine implementations available for Java (fe here ). It’s just a lot harder. Compare it to trying to shave yourself while warding off a wasp.
      You can do both, but trying to do them at the same time generally does not end well.

      As C developers will tell you, you don’t really need threads as a language concept, or exceptions, or OO, or lambda expressions, but are you willing to give them up once you’ve tasted them?
      I’m pretty sure moving from Level 1 to Level 2 requires an insight more than anything else, but it’s the very same insight that will push you towards other programming languages.
      A word of warning though: Edgar Allan Poe (at least I think it was him) once wrote: “understanding is the strongest poison; there is no antidote”

  12. On level 4 resides invariant-based reasoning and self-stabilizing fault-tolerance.

    • dmbarbour says:

      Agreed. Self-stabilizing distributed structure is very important for scalable distributed systems.

      And there are many useful invariants we can pursue – for partial failure, redundancy, consistency, graceful degradation, clean failover, resilience, timeliness of data, elimination of starvation or priority inversion, security, controlling space overheads, persistence, runtime upgrade…

      Seems difficult to fit them all in one abstraction, though. But I find that abandoning message-passing control-flow models in favor of dataflow is a very good start.

    • rslootma says:

      Please enlighten me. How does this differ from the Level 2 invariant reasoning and self stabilization that Paxos variations bring us ?

  13. Nice article, thanks!

    Minor bug report: in the sentence beginning with “The upper layer implements a distributed state machine”, you are missing the URL on the link for “distributed state machine”. What did you want to link to?

    • rslootma says:

      Thx for pointing this out. I was thinking of either linking to Lamport’s original paxos paper or to the paxos entry on wikipedia. In the end, it seems I did neither. Will correct this.

  14. […] The Game of Distributed Systems Programming. Which Level Are You? (via Kent Beck) – we start with a naive approach to distributed systems, treating them as just a little different local systems, then (painfully) come to understand the fallacies of distributed programming and start to program explicitely for the distributed environment leveraging asynchronous messaging and (often functional) languages with good support for concurrency and distribution. We suffer by random, subtle, non-deterministic defects and try to separate and restrict non-determinism by becoming purely functional … . Much recommended to anybody dealing with distributed systems (i.e. everybody, nowadays). The discussion is worth reading as well. […]

  15. Belief that functional purity means “no problem” is exactly what is described in “Level 0: Clueless”… a hidden problem is not the same as “no problem”.

  16. drtune says:

    Level 1 = not understanding problem
    Level 2 = threaded coding where i/o blocks. The code is clean, easy to read but ignore thread concurrency issues at your peril.. as in any language.
    Level 3 = Similar to a CPU interrupt handler. No way is it remotely acceptable to block in an IRQ for any length of time, so you end up doing exactly [3] – strict state machines, no blocking whatsoever – as people have since the 1950’s when writing practically all low level drivers.
    Level 4 is “waffling about how hard level 3 is”.

    Don’t cast too many stones at threaded programming. Any decent coder can do both 2 and 3, but for very good reason most prefer 2 in everyday work.
    State machines are complicated to write, hard to read and hard to debug, and hence are error-prone. This is a major issue in practice with human programmers.

    Use threads sensibly, and don’t pretend concurrent accesses (e.g. to db) won’t happen.


  17. Aleksandr says:

    I would describe myself as level 0 at this point, so gaining some extra knowledge is always great. Are you saying we shouldn’t be using RPC/RMI at all? After reading your description of Level 3 I got the sense that the best approach to distributed systems is to simply send messages, which if received would be processed and lead to the new state. All this business of “local” function invocation which happens to be executed on a remote server is a really BAD idea. I may be reiterating in simpler terms what your article is all about, but I felt like I needed some validation.

    • rslootma says:

      No, there are cases you can do RPC. If you’re doing client-server programming, and every remote updates correspond to a transaction (like you have when you work against a database), or you’re just retrieving information that never changes
      (for example package managers like apt or rpm, or things like google maps) then RPC is feasible. In the case of databases, it looks like one statement, but there is a lot going on under the hood to guarantee the semantics you appreciate so much. Keep in mind that not every system is a distributed system, and a smart man once said “things should be as simple as possible but no simpler”. The problem here is that people tend to simplify their problems too much. This blog post was meant to be a reminder that a lot of the simplifications are merely wishful thinking. Hope I didn’t add to the confusion.

  18. jtienhaara says:

    Stimulating reading!

    Level 3 sounds like the Actor Model (forgive the cross-linking but some fantastic articles on the topic are linked from my dilapidated blog: ).

    There are alternative theoretical frameworks that will help an architect tame distribution and concurrency and related issues from the outset: Data Flow or Flow-Based Programming, the Petri Model, the Pi Calculus, and so on. These are all comparable / contrastable with the Actor Model, and therefore, I think, all level 3 in your analysis.

    Level 4 IMHO is not a pure, elegant, simple theoretical framework.

    Level 4 does not at all remove indeterminacy. Nor does it make every detail transparent and shove complexity in the programmer’s face.

    Level 4 acknowledges its own bloat and the impossibility of keeping such a large foundation perfect. It points out all the ways it can block or fail through simple, narrow, well-documented APIs.

    At its best it only ever promises to try to send or receive *data*, it knows nothing of functions or semantics (let alone whether the programmer intended a call to a remote object to be idempotent).

    It introduces ridiculous levels of redundancy and an explicitly fault-tolerant core which manages all the issues relevant to a distributed system: messaging, security, scheduling, memory, persistent storage, and so on.

    It fails all the time, yet most of the time it fails in ways that are acceptable and un-noticeable. Most of the rest of the time it provides tools to manually un-fail whatever went wrong.

    In many ways we distributed system developers keep trying to crawl our way back to the 1960s design table.

    Level 4 is an operating system!

    Cheers and thanks for the thought-provoking article,


    • rslootma says:

      This comment states so many things, I had to reflect on my answer a lot. First, I think the theoretical frameworks, as you call them (Petri Nets, Pi Calculus, …), and the Actor model are quite orthogonal to this classification.
      You can, for example, implement a Petri Net using different strategies (using threads, for instance). And I have seen people combine a GUI based on the actor model with RMI, which classifies as Level 1. As for your Level 4 remarks,
      you made me think a bit more about what Level 4 should be. Maybe Level 4 isn’t really a system anymore, but maybe … an organism. Indeed thought provoking.

      have fun,


      • jtienhaara says:

        Fair points, and perhaps I interpreted level 3 differently than what you had in mind.

        Nevertheless even while the “frameworks” can be used for other purposes (and I would like to see how that actor GUI in particular turned out! 🙂 ), your level 3 distributed system’s key points are all addressed by those frameworks.

        For example: “You pass in a message and a state, and you get an action and a resulting state”; and “Here, the distributed state machine assumes full control, and will only consider a client’s request when it is ready and able to do something useful with it”. The Actor Model, Data Flow, etc. model exactly that, and have been used by many to simplify building distributed systems.

        To me level 3 is not the end goal because level 3 does not consider all of the aspects of a functioning system. It tries to focus on a few traits, such as networking, scheduling and concurrency, while ignoring others, such as security, memory, and so on. A robust, complete system must take into account *all* of the aspects of software, IMHO. But of course this is a daunting task.

        Still, perhaps I have misunderstood or completely overlooked a significant part of what you had in mind for level 3 distributed systems.

        For what it’s worth, I was not being facetious about level 4 being an operating system. Each OS is a single (though often huge) system, so I would not depict it as something ethereal or an “organism”. The concepts built into UNIX and certain micro-kernel operating systems have been tried, tested and thoroughly documented for decades. How exactly one would translate those concepts into a modern distributed application system is highly debatable. But the fact that the debate has not even started remains to me one of the saddest signs of the ahistorical nature of software development today.

        Consider a narrow, perhaps even trivial, example. Distributed application systems frequently perform swapping: loading read-only data from some persistent store (or stores, in the general case) into memory. For example, downloading an updated product list, gathering the results of a multi-site search, what have you. I’m sure you have seen this load-remote-data-into-memory done a gazillion different ways, all with numerous failure points. I certainly have. Network delays and outages are often the trivial failure points, since the architect *should* at least try to design for them. On the other hand, in my experience at least, it is often the specific requirements of a project that lead to the biggest generalized headaches in the long term. What about when the product list grows bigger than the architect designed for? We have already reinvented swapping; now we need to reinvent virtual memory / paging as well. And we haven’t even touched on security, or on batching up and sorting the block requests sent to the persistent store in order to maximize efficiency. All of these aspects are mapped out in operating systems, and much, much more.

        I truly believe that many of the solutions to the distributed systems problems we continue to wring our hands about are well documented, even in university textbooks. Many smart architects of distributed systems have deliberately imitated certain operating system behaviour, but without going the full length and building a whole functioning higher level operating system layer. BitTorrent is a good example of swapping & paging in a distributed environment, but without all the scheduling, security, etc.

        So why not start debating what a 21st century “higher level operating system” would look like? I think this debate is long overdue…

        With a bit of foresight and a whole lot of hindsight, maybe, just maybe, distributed application systems — and software development in general — could also then move up to a higher, more useful level than the buggy crap we suffer through today. 🙂

        Cheers Romain,


  19. Rina Noronha says:


    I’m the web editor at iMasters, one of the largest developer communities in Brazil. I’d like to talk to you about republishing your article at our site. Can you contact me at

    Rina Noronha
    Journalist – web editor
    +55 27 3327-0320 / +55 27 9973-0700

  20. Awesome article! I’m also a self-taught distributed systems hacker, this definitely was my own progression to Level 3 as well!

    What’s your take on a multi-paradigm language such as The D Programming Language? It’s built with distributed systems in mind: it has a ‘pure’ function qualifier, lazy evaluation, metaprogramming and the std.functional module for functional programming as well as a ‘shared’ data qualifier to explicitly mark data as having multiple processes access it simultaneously.

    It offers everything your article mentions in a practical C++ like language. In fact, the vibe.d web development framework even uses fibers for transparently performing evented I/O to what seems like blocking code to the user.

    • rslootma says:

      There’s nothing wrong with multi-paradigm programming. Most problems become easier when attacked with the correct paradigm. I took a look at D a few years ago, and it indeed looks like a step up from C++, but I must confess I support Bertrand Meyer’s view of C++.

  21. Scott Cruzen says:

    It seems odd that big, proven (?) distributed systems based on protocols like NNTP, SMTP and DNS are designed without RPC or Distributed Algorithms or Purity (they do allow async messages, though). Upon reflection, I suppose email is the biggest and most successful distributed system ever, and it lives on top of a huge stack of decentralized protocols (IP, TCP, UDP, BGP, DNS, and finally SMTP).

    I think that distributed systems are easier to build when there’s a well designed protocol behind it. If I was interested in researching language support for this sort of thing, I think I’d look into applying some of the advances from recent PL research in order to build arbitrary new protocols that are correct. The goal would be to generate machine checkable proofs on each end point that automatically validate that an incoming message is acceptable given the current state of the conversation.

    There’s probably a lot of “convenience” things that’d be nice enhancements as well, things like automatically ensuring that a message is well formed and therefore can’t overflow a buffer or similar. I suspect that better language support could have prevented many ASN.1 vulnerabilities.
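    The idea of mechanically validating messages against the conversation state can be sketched very simply: if the protocol is expressed as data (a transition table) rather than code, each endpoint can reject anything not acceptable in its current state. A toy, SMTP-flavoured sketch (the protocol and all names here are hypothetical simplifications, not a real SMTP implementation):

```python
# Toy protocol: which commands are acceptable in which state.
PROTOCOL = {
    "start":     {"HELO"},
    "greeted":   {"MAIL", "QUIT"},
    "mail_from": {"RCPT"},
    "rcpt_to":   {"RCPT", "DATA"},
}

# Where each accepted command takes the conversation next.
TRANSITIONS = {
    ("start", "HELO"):     "greeted",
    ("greeted", "MAIL"):   "mail_from",
    ("mail_from", "RCPT"): "rcpt_to",
    ("rcpt_to", "RCPT"):   "rcpt_to",
    ("rcpt_to", "DATA"):   "greeted",
    ("greeted", "QUIT"):   "closed",
}

def accept(state, command):
    """Reject any message the protocol does not allow in this state."""
    if command not in PROTOCOL.get(state, set()):
        raise ValueError(f"{command!r} not acceptable in state {state!r}")
    return TRANSITIONS[(state, command)]
```

    Because the table is plain data, the same description can drive both endpoints, a fuzzer, or a proof tool, which is roughly what machine-checked protocol validation amounts to.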

    • Olav Frengstad says:

      I think you’re absolutely right that we should define protocols that gracefully handle partial failures and netsplits, provide network transparency, and some of the other traits described in the article. These protocols would, as you say, be difficult to create in a generic way, as there are too many application-specific states that need to be factored in.

      This is just another level of abstraction where we are federating different systems that are fully isolated and autonomous (like email, jabber, banking etc). In essence we start delegating the responsibility of data instead of dictating how other systems – that we have no control over – should work.

      This does remind me of an article by Joe Armstrong stating that we should spend less time on code and more time on protocols. Unfortunately I can’t seem to find it now.

  22. Hi! Thanks, this parallels my experience in many ways. I have a lot of code stuck at level 2, with irreproducible rare glitches.

    Level 3 is the obvious next step, but actually writing code that does that seems impossible. Take a webserver for example. It has to serve multiple requests at the same time, so what is the state machine? Is it 1 state machine for every request, or a single large state machine that responds to all requests? If you have a bug like “the server crashes when there’s more than 1000 simultaneous open connections” then you can’t find it if your server is multiple state machines.

    But if you have only one monolithic state machine for a whole webserver, and the whole thing runs single-threaded, that would kill performance, no matter how much of the actual work is offloaded to worker threads.

    It is an interesting idea to pursue. Thank you for the post.

    • rslootma says:

      The state of a typical web server is the data structure that holds the connections. Each of the connections runs some kind of protocol, and each protocol has a state.
      A state machine takes a message (or event) in a certain state, and then maybe changes state and produces an action. The important thing NOT TO DO is to execute that action at this point. That would mix IO into your state handling and make your system unpredictable. You need to reify the action and detach its execution. That way, your state handling remains pure and verifiable. Don’t worry: this reification and detaching step has negligible performance impact.
      Also, what is called a state machine does not need to be coded as a GoF state machine pattern. The point is that you separate the input, the state change, the result and its side effects.
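      The separation described above can be sketched in a few lines (the names and the toy PING/PONG protocol are mine, not from the article): the transition function is pure and returns *descriptions* of actions; a separate, impure driver executes them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Send:
    """A reified action: 'write these bytes to this connection'."""
    conn_id: int
    data: bytes

def step(state, conn_id, message):
    """Pure transition: (state, event) -> (new_state, [actions]).
    No IO happens here; actions are only described, never executed."""
    if message == b"PING":
        new_state = {**state, conn_id: "awaiting_next"}
        return new_state, [Send(conn_id, b"PONG")]
    return state, []

def run(actions, sockets):
    """The impure driver: executes reified actions against real sockets."""
    for a in actions:
        if isinstance(a, Send):
            sockets[a.conn_id].sendall(a.data)
```

      Because `step` touches no sockets, it can be tested exhaustively with plain values, which is the point of keeping the state handling pure.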

    • olavfrengstad says:

      Wouldn’t the webserver example be addressed at level 2 using language primitives like lightweight threads, fibers, co-processes?

      For instance you would have one state machine handling all incoming requests, then spawning a child process that handles the actual request. The return data would not be written by the child process itself but rather sent as an asynchronous message that might be written to the socket when available.

      In that perspective I’m not sure how you move to level 3; I guess it would depend on what it serves. I’m probably being naive though.

      • rslootma says:

        I’m not sure what example you’re referring to, but a webserver is typically client/server, where the server is the only party that has any authority on the state. In that sense, it’s not really a distributed system.

  23. webreac says:

    Interesting classification. My best advice is to avoid building a distributed system if a properly coded socket server is enough for your needs (clearly not the case for a first person shooter). I have read the code of one of the oldest distributed FPSs, Sun’s Mazewar. It is almost a tutorial on how to develop distributed systems using RPC. Reading this code is the best way to understand the complexity of RPC and the benefits of the higher levels.

  24. Rafal says:

    I adore the DCI conceptual model (Data, Context, Interaction). Your levels 1-3 handle only data and interaction; I assume any level 4+ should include context. That’s what gives the ability to address security, roles and privacy. In your ‘type SM’ one would add a member of type context. How to implement this? Maybe a dynamic state machine inside the state machine? Maybe queues of dynamic state machines (an SM meta-generator), or a state machine of dynamic queues (some kind of Petri Net)? In either case we need to test not the possible states of one SM, but the generator, which can produce a range of acceptable state machines.

  25. Bob Dole says:

    This kind of thinking excludes a certain class of programs. Think of a data-browser-like user interface: a lot of stuff happening in real time, really cool, but therefore also consuming a bunch of resources. Now the user modifies some data: the system stalls because the SM is busy, and the UI that used to be cool now… 😉

    Maybe my thinking is wrong (I hope so), but otherwise I think this SM-stuff is kinda cool but not real-world enough.

    • rslootma says:

      Two remarks: first, it’s about distributed systems programming, not about desktop gui programming. Second, in your example, why would the SM be busy? (since there are no side effects I see no reason for delays)

  26. Robert M. says:

    IMHO — you hit on the key to level four in a way. “Allow me to reiterate what I’m struggling against. First, I want my distributed algorithm implementation to fully cover all possible cases.”

    Don’t! I am absolutely serious: don’t! To lift a phrase from Fight Club: “You have to realize that someday you will die. Until you know that, you are useless.” Exhaustively testing the exponential complexity of complex linked services is impossible and a vast waste of time.

    That is IMHO some of the secret sauce that makes Erlang so amazing, it acknowledges that weird stuff you never could imagine or test for WILL ABSOLUTELY HAPPEN — and the system ticks merrily on through the failures…

    Failures are not oddities, they are a part of your system, your daily operation. You will have (at any complex scale) tens of thousands of things happening a day you could never account for… you just accept the Chaos Monkey and move on.

    • To “handle all possible cases” doesn’t require precision handling. One can cover a lot of failure cases broadly as “let’s treat this the same as network partitioning” (or as node failure). And then you can have recovery mechanisms in place to handle those broad conditions. This is essentially what Erlang does. When something weird happens, it gets handled by the generic “something weird happened” handlers – supervisors, watchdogs.

      But we can profitably address most common failure patterns. One goal is to ensure we fail in a consistent way – i.e. that network partitioning or node failure is `atomic` or can safely be treated as such. Another goal is to ensure we recover in a consistent, predictable way so that we can design our programs to leverage it. Up at level 4 you have time warp protocols, temporal logic, temporal databases, synchrony hypothesis, monotonicity, eventual consistency, meta-stability, stateless stability, blackboard metaphors (temporal and logically monotonic, of course), automated content-driven service brokering, @unhosted client-controlled state (so we can fallback between services, or upgrade services without breaking clients).

      Don’t just accept the chaos monkey. Build a robust jungle gym for it, and a range for safely throwing wrenches and spanners. You can accommodate a lot more than you might expect.
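      The “coarsen failures into a few broad classes” approach above can be sketched like this (a hypothetical classifier and made-up recovery names, not any particular library): concrete errors are mapped onto a handful of categories that generic recovery code, in supervisor/watchdog style, already knows how to handle.

```python
import socket

def classify(exc):
    """Map a concrete failure onto a broad, recoverable category."""
    # Check the more specific network errors before the general OSError,
    # since both are OSError subclasses in Python's exception hierarchy.
    if isinstance(exc, (socket.timeout, ConnectionResetError)):
        return "partition"        # treat like a network partition
    if isinstance(exc, OSError):
        return "node_failure"     # treat like a crashed node
    return "unknown"              # still handled, just generically

# Generic "something weird happened" handlers, one per broad class.
HANDLERS = {
    "partition":    lambda: "retry_with_backoff",
    "node_failure": lambda: "failover_to_replica",
    "unknown":      lambda: "restart_component",   # supervisor-style
}

def recover(exc):
    return HANDLERS[classify(exc)]()
```

      The point is that nothing falls through: a failure nobody anticipated still lands in a category with a tested recovery path, instead of in undefined behaviour.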

      • dmbarbour says:

        I should also mention: weighted or probabilistic logics can robustly accommodate common failures even in data models and configurations. And paraconsistent logics can encapsulate and modularize inconsistency, controlling how far it propagates. These are valuable tools for another common class of predictable chaos in an open, modular or incremental system.

      • Johann Tienhaara says:

        The internet itself was designed using probabilistic simulations but very simple, deterministic behaviour.

        I’d be curious what all that level 4 mumbo jumbo you mention is all about (time warp protocols, meta-stability, etc). Sounds vaguely reminiscent of people talking knowledgeably about JINI in the days when it was the next great distributed computing solve-all. Is there more to it than fancy sounding jargon? I’d genuinely like to know.

      • dmbarbour says:

        Time warp protocols are from 1985, recently revised (as lightweight time warp). Time warp is suitable for distributed programming (designed for parallel and distributed simulations), but not for “open” or “interactive” distributed programming. LTW is marginally suitable for interactive programming, but is still vulnerable to denial of service attacks.

        Meta-stability is a design principle, not a technique. It has been used in many distributed algorithms, such as maintaining DHT connectivity.

        Rather than speaking of “mumbo jumbo”, satisfy some of that curiosity of yours by your own research efforts. It isn’t even difficult. At the moment, however, I feel insulted by your words and I’m not inclined to help you further.

      • Johann Tienhaara says:

        Thanks for taking the time to reply David. I was in a snarky mood and meant to come across as devil’s advocate but came across as snotty instead. My apologies! I am glad to see your response, it’s great food for thought.

        I stumbled upon some older articles on “linearizability” today — a concept new to me, though not at all a new one. So now I’m particularly curious about time warp protocols. Time to do some reading.

        For whatever it’s worth, I suspect we have different backgrounds but a not-entirely-dissimilar bent on distributed computing: i.e. that there are tools to choose from, but no silver bullet.

        I’d be happy to hear more of your thoughts if you’re willing to elaborate on some of the approaches you find helpful in building distributed systems.



  27. jtienhaara says:

    I’ve said it already in a very different way, but I believe the networking and distribution aspects of a distributed system cannot be considered in isolation when architecting a system at “level 4”.

    Jargon and cute language constructs aside, a distributed system is a hydra that must be attacked from all directions at once, security included! Contemporary architecture methods embrace this and guide the architect in planning all aspects at once in a manageable way.

    Perspectives feel clunky to me; they are difficult to put into an architecture document in practice without actually adding to the reader’s confusion.

    But for all the programming language-specific jargon I don’t think there’s anything else out there today that even begins to help a distributed software developer reach “level 4”.

    In fact I would go so far as to say that the jargon actually distracts the developer from simple solutions to hard problems. Paul Baran’s designs included recovery from nuclear warfare, and yet a non-programmer can comprehend the solutions he came up with to distributed communications problems.

    Often the worst way to build any software system (distributed or otherwise) is to fill it with all kinds of cute neologisms that nobody will understand 5 years down the road. The Java community is full of such roadkill.


  28. java blog says:

    Never knew so many things about distributed systems programming. Good stuff.

  29. […] in functional programming, partly in response to the problems of writing parallel software and distributed systems. Although FP has been receiving attention from the academic computer science community for years, […]

  30. “Level 3: Distributed Algorithms + Asynchronous messaging + Purity”

    Level 3 seems like just a nice separation from level 2 so I can test it more easily? It does not seem like a level in itself, but more like common-sense code design.

    I am looking more into the details of level 4. That would tackle the real problem here, which is designing correct distributed algorithms.

    • rslootma says:

      At the risk of sounding pedantic, the article predicts your reaction.

      If you explain this to a Level 2 architect, (s)he will more or less accept this as an alternative. It, however, takes a sufficient amount of pain (let’s call it experience or XP) to realize it’s the only feasible alternative.

      It’s an illusion you can read a moderately complex distributed algorithms paper, implement it using a level 2 strategy, and close the gap between theory and practice through testing.
      I write about the difficulties of sealing the implementation in another post; there are some nice comments there about tool support for this kind of activity.
