Antifragile
Complete Developer Podcast - A podcast by BJ Burns and Will Gant - Thursdays
“If carpenters built houses the way programmers write code, the first woodpecker to come along would destroy civilization.” – Gerald Weinberg

We all have to deal with errors in our code, or at least we’re supposed to. However, you’ve probably experienced a situation where a small problem caused another small problem, which caused a big problem, which ended up taking an entire system down. If you’ve experienced this, you are probably also painfully aware that these failures can be very expensive and hard to recover from. Second-order effects are usually not that hard to predict, if you are thinking about them. You’ll also find that neither you nor anyone else does a particularly good job of thinking through the second-, third-, and fourth-order effects of decisions.

Massive failures, in particular, are often easily predicted. For instance, the Japanese attack on Pearl Harbor was predictable based on rising tensions between the governments of Japan and the US, the layout of islands in the Pacific, and advancing aviation technology. In fact, the attacks were predicted in 1910 by Billy Mitchell, who analyzed the situation. However, in spite of these predictions, the US fleet was still caught unprepared. Not only did this cause a catastrophic loss of lives and equipment, but it also led to a larger war, a war that might not have happened (or might not have occurred in the same way) had things been prepared differently. At the end of the day, the people in positions of power didn’t believe that such a chain of events could happen, nor were they able to shift resources adequately to deal with a rising threat. Such a threat couldn’t be moderated by academic theorizing; it required being in a position to get regular feedback and iterate quickly (which is a little harder to do with battleships).

Antifragile design is more complex than simply handling errors and keeping them from causing problems. Rather, it is an approach that embraces and uses the natural chaos of the world to improve your response to small problems, so that a chain of them doesn’t result in a massive failure. As explained by Nassim Taleb, “antifragility is a property of systems in which they increase in capability to thrive as a result of stressors, shocks, volatility, noise, mistakes, faults, attacks, or failures.” It is fundamentally different from the concept of resiliency (the ability to recover from failure). Such systems exist in nature, in biological systems as diverse as muscles and immune systems.

There are general principles of system (and life) design that need to be applied if you don’t want to be constantly dealing with problems. The principles of antifragility are especially useful to observe when you start noticing catastrophic failures that manifest as a chain of smaller failures increasing in intensity until something breaks in a major way. Following these principles, you should be able to identify and mitigate many of them.

Episode Breakdown

Understanding Antifragility

The continuum of fragility (a rough code sketch appears below)

Fragile – Breaks when pressure is applied.
Robust – Handles pressure, but does not become stronger under pressure.
Antifragile – Uses pressure to drive improvements that make the system stronger over time.

Why antifragile in systems?

People expect systems to stay up, but as systems become more complex, the number of things that can fall apart increases exponentially.
As a system scales in complexity, the cost of downtime increases, as does the difficulty of bringing things back online, unless you specifically design the system to improve recovery options. A partially functional system that still allows most operational goals to proceed has less “cost” in terms of time, money, and management/customer irritation than a system...
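To make the continuum above and the idea of partial functionality a little more concrete, here is a minimal Python sketch (not from the episode). The names and failure behavior (fetch_recommendations, RecommendationClient, the simulated timeout) are hypothetical, and the circuit-breaker-with-backoff in the last variant is only a small, common step toward antifragility: the failure changes future behavior and leaves data behind, rather than just being absorbed.

```python
import random
import time

# Hypothetical stand-in for a flaky downstream dependency (the name and
# failure rate are made up purely for illustration).
def fetch_recommendations():
    if random.random() < 0.4:
        raise TimeoutError("recommendation service timed out")
    return ["item-1", "item-2", "item-3"]

# Fragile: any failure in the dependency takes the whole page request down.
def render_page_fragile():
    return {"page": "home", "recommendations": fetch_recommendations()}

# Robust: the failure is contained and the page degrades to a partial result,
# but the system behaves exactly the same way on the next request.
def render_page_robust():
    try:
        recs = fetch_recommendations()
    except TimeoutError:
        recs = []
    return {"page": "home", "recommendations": recs}

# A step toward antifragile: failures feed back into behavior. This crude
# circuit breaker backs off the flaky dependency more aggressively the more
# it fails, and keeps a failure log that can drive later improvements.
class RecommendationClient:
    def __init__(self):
        self.consecutive_failures = 0
        self.open_until = 0.0
        self.failure_log = []

    def get(self):
        if time.monotonic() < self.open_until:
            return []  # circuit open: skip the call and serve the degraded page
        try:
            recs = fetch_recommendations()
            self.consecutive_failures = 0
            return recs
        except TimeoutError as err:
            self.consecutive_failures += 1
            self.failure_log.append((time.monotonic(), str(err)))
            # Back off exponentially: the stressor changes future behavior.
            self.open_until = time.monotonic() + 2 ** self.consecutive_failures
            return []

if __name__ == "__main__":
    client = RecommendationClient()
    for _ in range(10):
        print(client.get())
```

The point of the sketch is the difference in posture, not the specific mechanism: the fragile version propagates the failure, the robust version absorbs it and returns a partial result, and the last version additionally records the failure and adjusts its own behavior because of it.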