2020-03-15 | Subject | System Crisis Management
We live within a system experiencing change, the earth, a system that we rely on for life. From a human perspective, and the perspective of many other lifeforms that we share the planet with, the overall change is negative. As a species, we have contributed to this negative change, and can expect disruption to our way of life. While it is true that we should attempt to minimize our contribution to the change, it is also true that much is in motion now that cannot be stopped. What remains is a series of emergencies, Dunkirks, and each one will need to be managed in a resilient way.
Resilience. This is a key concept. Usually resilience is associated with psychological resilience in the face of a disturbing change, but as Dr. Desirée Daniel points out, "...'the ability of a system to cope with a disturbance' is a disposition that is realized through processes since resilience cannot exist without its bearer i.e. a system and can only be discerned over a period of time when a potential disturbance is identified." This was published as one of the many papers for the 2014 Formal Ontology in Information Systems conference.
From a broad perspective, the earth is a self-correcting system. There is enough DNA, and there are enough creatures and life, smeared around this fabulous blue ball that it is unlikely that a 200-year bump in greenhouse gases, or even nuclear meltdowns from widespread infrastructure failure, will permanently rid the planet of life. Who knows what will happen long-term. As Chernobyl shows us, life adapts. For humans, if the planet warms as predicted, it seems unlikely that we can move agriculture fast enough to support even a fraction of our current population, particularly with the broad array of negative externalities that are in motion (bees, loss of fertile land, flooding, extreme weather events). Our civilization is also intimately entwined with oil at many levels, and this adds another complicated dimension as we respond to climate change. In order to be resilient as a species, we will be required to act quickly at each crisis. Using Daniel's definition, we cannot react until the particular crisis is upon us, and that brings us back to the focus of this blog.
System planning, design, engineering, and operation are different efforts that are also important. These efforts are very familiar to people who work with any kind of system. A new system is planned, the design architected, a solution engineered, put into production, and, finally, operated. This is generally how we deal with systems, and it works well. The problem with this is that we can't predict a crisis. We can guess at which pieces of the system are more likely to fail. We can shore up those parts of the system with fail-safes, but, as many have experienced, sometimes the fail-safes fail, and sometimes, ironically, the entire system fails as we test our fail-safes. How many times have you read an explanation of a system outage where the root cause was a generator test for a datacenter? While all of these things are important, they are not the focus of this blog. The focus of this blog is on resilience, which happens at the time of crisis.
Usually in a time of crisis we have not been able to plan for it. If we did plan for it well, we don't consider it a crisis, as the crisis is averted. As I mentioned earlier, this is the nature of crisis, and why this blog focuses on resilience. At the time of crisis we gather together and come up with a common understanding of the situation. We discuss what our target is, where we would like to be in the future, and what goals we have for the action we take. Finally, we evaluate various proposals to reach our target and decide on action. This works well for systems with relatively few dimensions and relations. For instance, if the water supply from a reservoir is provided to a city via a canal, and a landslide takes out part of the canal, the situation, target and proposals are fairly easy to arrive at. This is also something that can be planned around to avert a crisis. The socio-economic-environmental systems that are stressed, right now, have many dimensions and relations, many more than can be predicted or managed. Further, much of our society relies on computer systems which are increasingly complicated. In order to be resilient, we need to be able to quickly understand many dimensions and relations at the time of crisis. This is not a contradiction. It is a goal, the only approach available at the time of crisis.
Cognitively, humans don't do particularly well at considering complex dynamic systems. While we are better than other apes, we still have limitations. We generally think of complex issues as having two teams (us and them, using at most two methods). We use models to break out of this. Language is a model. Archetypes in our unconscious mind serve as models. We render these models in our bio-ware, and every human renders differently. This makes collaboration difficult, so we often fall back to our two-team model and outsource the serious work to third-party analytics and playbooks. Often there is a strong leader who understands the situation better than most and can guide the response, but this has limits.
Artificial intelligence technology has changed this, and in our favor. I'm not talking about science fiction rogue AI; I'm talking about what AI is based on: knowledge expressed as relations, knowledge graphs, or ontologies. This is the world that Dr. Desirée Daniel is working from, the expression of knowledge in a way that machines can use. Translating models into a common ontology that renders the same for everybody gives us a way to visualize, collaborate, investigate, and build upon models of any dimension, and with the same meaning. No longer are we hobbled by different bio-rendering. We can decide where we are, where we want to go, and how to get there, and map it. This does not mean that we avoid different approaches. We need all human perspectives to work through our problems. It does mean that we can agree on a model that behaves the same no matter who runs it. It also means that anybody can own these tools and work together to tackle our large problems.
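To make "knowledge expressed as relations" a little more concrete, here is a minimal sketch in Python of a tiny knowledge graph stored as subject-relation-object triples. It borrows the reservoir and canal example from earlier. The class name, the relation names ("supplies", "damages"), and the hospital node are all my own assumptions for illustration; this is not a real ontology language or any particular tool, just the bare idea that the same model gives the same answers no matter who runs it.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A toy knowledge graph: a set of (subject, relation, object) triples."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)  # index for faster subject lookups

    def add(self, subject, relation, obj):
        """Record one fact, e.g. ("canal", "supplies", "city")."""
        triple = (subject, relation, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def query(self, subject=None, relation=None, obj=None):
        """Return all triples matching the given parts; None means 'anything'."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )

    def downstream(self, start, relation):
        """Everything reachable from `start` by following `relation` edges."""
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for _, _, nxt in self.query(subject=node, relation=relation):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)


# Hypothetical model of the canal example (names are illustrative only).
graph = KnowledgeGraph()
graph.add("reservoir", "supplies", "canal")
graph.add("canal", "supplies", "city")
graph.add("city", "supplies", "hospital")
graph.add("landslide", "damages", "canal")

# What did the landslide damage, and what depends on it?
damaged = [obj for _, _, obj in graph.query(subject="landslide", relation="damages")]
affected = graph.downstream(damaged[0], "supplies")
print(damaged, affected)
```

The point of the sketch is that the situation (what is damaged) and its consequences (what depends on it) fall out of the relations themselves, so two people looking at the same graph are working from the same model.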
The tools and models matter, but we also need to be able to do this quickly for a crisis. This means that we don't have time to set up complex analytics or enter into a business relationship with a third party. We need to be able to model quickly and form decisions based on the models. Anybody involved in the crisis management should be able to create the models, and doing so shouldn't require technical skills. The nature of crisis is that we don't know what the situation will be. Who is available to respond? What area of knowledge or expertise will be needed to build the models and decide on an action? These answers will not be known until the time of crisis, so it is important to make the tools easy to use.
Owning these tools is an important goal for individuals, businesses, organizations, countries, and the world. Unfortunately, over time we have ceded our control of these types of tools to large cloud providers and software companies. Much of this is because of the complexity of the systems, but another part of this is the nature of business. By the time you get to the point of AI and complex modeling and analytics, there is a significant amount of complexity involved. It is usually a good business move to rely on companies that specialize in providing this. From a resilience perspective, though, there are risks with this because of system dependencies, both physical and technical. The most obvious risk is loss of internet connectivity. A less visible risk is the current way that most people develop, deploy, and manage new systems that provide the tools I'm describing. Infrastructure is increasingly deployed in a way that hides the complexity of the system and cedes ownership.