How Complex Systems Fail

Complacency and the Clear/Chaotic boundary

We live in A Complex World. We are surrounded by complex systems. In this post I will take a moment to highlight how complex systems fail.

My colleague Jason Koppe shared a great summary in a tweet anticipating this post, linking to Richard Cook’s how.complexsystems.fail.

The website above is a great reference and goes into a lot… A LOT… more detail than I will here. I want to focus on one aspect of how complex systems fail that I find particularly useful in the context of Onboarding to a Software Team.

Clear/Chaotic Boundary

I have highlighted various aspects of the Cynefin framework in this series so far. I want to return to it once again and discuss the boundary between the Clear and Chaotic domains. Notice, in the figure below, that this boundary is drawn with a squiggle at the bottom. This is intentional: it is a visual representation of a cliff between Clear and Chaotic. The transition from Clear to Chaotic is a one-way trip with no direct return path. Why would Clear become Chaotic?

Figure source: https://www.cognitive-edge.com/cynefin-st-davids-day-2020-1-of-n/

In the framing of Economy of Thought, I pointed out that while the world is complex, it is expensive to treat everything as such and I can choose to summarize some of the complexity as Clear.

It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.¹

The problem occurs when I treat something as Clear and it turns out not to be the case.

Rasmussen’s Model

So, what happens that shifts a Clear system into the Chaotic domain? It’s not like it’s waiting for me to turn around and then beelines straight for the cliff into Chaos screaming “YOLOOOOOO!…” on the way down.

Photo by Jakob Owens on Unsplash

Jens Rasmussen came up with a model that I think captures the intuition behind what is happening.

Rasmussen, Jens (1997). “Risk management in a dynamic society: A modelling problem”. Safety Science, 27(2–3), 183–213. https://doi.org/10.1016/S0925-7535(97)00052-0. Accessed on 21 Mar 2021.

This is a busy figure. Take note of the boundaries on the right-hand side. There is a Boundary to Economic Failure; if the system moves beyond it, you go bankrupt. There is a Boundary to Unacceptable Work Load; if the system moves beyond it, everyone is spent and work stops. Because of these boundaries, there is a pressure "from management" toward efficiency, which applies a force to the left, away from the Boundary to Economic Failure. There is also a gradient toward ease of work, which applies a force to the left, away from the Boundary to Unacceptable Work Load. Together, these forces move the system toward the third boundary, on the left-hand side: the Boundary of functionally acceptable performance, i.e. failure.
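To build intuition for these forces, here is a toy one-dimensional sketch of the model. All names, magnitudes, and the random seed are my own illustrative assumptions, not Rasmussen's:

```python
import random

# Toy one-dimensional sketch of Rasmussen's drift model (all numbers invented).
# The operating point sits between boundaries; two pressures push it leftward.
ECONOMIC_FAILURE = 1.0   # right boundary: beyond it, bankruptcy
FAILURE_BOUNDARY = 0.0   # left boundary: functionally acceptable performance

def step(x, rng):
    """One unit of time: two leftward pressures plus everyday operational noise."""
    efficiency_pressure = -0.02    # management pressure, away from economic failure
    ease_of_work_gradient = -0.01  # least-effort gradient, away from overload
    noise = rng.gauss(0, 0.05)     # normal variability of a complex system
    # Management reacts long before bankruptcy, so clamp at the right boundary.
    return min(x + efficiency_pressure + ease_of_work_gradient + noise,
               ECONOMIC_FAILURE)

rng = random.Random(42)
x, t = 0.9, 0  # start comfortably close to the economic side
while x > FAILURE_BOUNDARY:
    x = step(x, rng)
    t += 1
print(f"operating point drifted past the failure boundary after {t} steps")
```

Even though each individual pressure is small, the combined drift makes crossing the left-hand boundary a question of when, not if, which is exactly the point of the model.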

So it seems rather straightforward: we should simply not let the system drift beyond the failure boundary. Well, a few constraints make this difficult.

Normalization of Deviance²

While Rasmussen’s model is valuable, here is a depiction of what it looks like for complex systems in practice.

Note the loss of information on the left-hand side. We still know that the Boundary of functionally acceptable performance is somewhere on the left, but we don't know exactly where it is. All we have is our perceived boundary of acceptable performance. This disparity causes an interesting dynamic. Let's zoom in on the perceived boundary of acceptable performance and consider what happens to a system near such a boundary.

  1. The system, through normal operation, crosses our perceived failure boundary.
  2. We notice that the system is in a configuration where we expect it to fail. It isn't failing yet, so we quickly apply remediation and shift the system back within acceptable performance. This cycle of crossing the boundary, seeing no actual failure, and quickly remediating continues until we reach 3.
  3. We discover that our perceived failure boundary is not the actual failure boundary. We acclimatize. We move our perceived failure boundary to match.
  4. The pattern of crossing into failure, the system not failing, and remediating back into (now shifted) acceptable performance continues. We drift further.
  5. We finally locate the actual failure boundary, the system fails, and we have inadvertently entered Chaos.
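The five steps above can be sketched as a toy simulation. The boundary positions, drift, remediation size, and acclimatization step are all invented for illustration:

```python
import random

ACTUAL_FAILURE = 0.0      # unknown to the operators until the system actually fails
perceived_failure = 0.5   # where we *believe* failure begins (conservatively far right)

rng = random.Random(7)
x = 0.8                   # current operating point
drift = -0.03             # combined efficiency pressure and ease-of-work gradient
acclimatizations = 0      # how many times we moved our perceived boundary

while True:
    x += drift + rng.gauss(0, 0.04)  # step 1: normal operation drifts leftward
    if x <= ACTUAL_FAILURE:
        break                        # step 5: we found the real boundary the hard way
    if x <= perceived_failure:
        # Step 2: we expected failure here, but nothing broke, so remediate quickly.
        x = perceived_failure + 0.05
        # Steps 3-4: acclimatize; move the perceived boundary left and keep drifting.
        perceived_failure -= 0.02
        acclimatizations += 1

print(f"actual failure reached after moving the perceived boundary "
      f"{acclimatizations} times")
```

Each remediation looks like a success in the moment; only the accumulated shifts of the perceived boundary reveal the drift toward Chaos.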

The actual failure boundary is often unknown until our complex system fails, and this invisible boundary keeps shifting (in the Complex domain, constraints change due to actions and the environment). On top of that, do not forget the forces pressuring the system from right to left: the pressure of efficiency and the gradient toward ease of work. Constant pressure toward a boundary whose location we do not really know means our systems will continuously flirt with failure.

What Do You Think?

How do you think about system failures? Let me know in the comments.

Next Up

Next, I will discuss how the words we use shape our thinking: Metaphors We Live By.

¹ Mark Twain … maybe.

² https://en.wikipedia.org/wiki/Normalization_of_deviance

