When monoculture leads to monofailure

On 19th July, the world woke up to the “blue screen of death”, the unmistakable sign that a Windows computer has crashed and needs to be restarted.

Airlines discovered that essential systems, including the one Delta uses to track the assignments of pilots and cabin crew, were unavailable. Thousands of flights were cancelled. Broadcasters including Sky News and NBC were forced to scrap morning programming as computers controlling teleprompters and graphics failed. Everything from banks to hospitals encountered serious problems, though most worrisome might have been emergency phone systems used to call for police or medical help, which went down in several US states. Estimates of losses and damage are still coming in, but insurers expect that costs could be in the tens of billions of dollars.

Overall, 8.5m computer systems were affected in what proved not to be a cyberattack but a “bad patch”, a botched routine update by the cybersecurity company CrowdStrike to its own software. CrowdStrike’s software is designed to defend systems from external attack, and must be updated periodically so that it can detect newly emerging threats. Because security software needs to monitor every aspect of a computer’s system, it must have access to even the most sensitive parts of a computer’s operations. When a normal application has an error, it might crash itself but is unlikely to take down the whole machine. Security software can cause much more damage and even leave machines unable to restart, which is precisely what happened on 19th July. The SolarWinds attack in 2020, in which Russian hackers broke into tens of thousands of sensitive computers, took advantage of these same dynamics; in that case, a fake security update compromised a security system that had access to data across a computer’s systems.

CrowdStrike will likely face a wave of lawsuits seeking to hold it responsible for shutting down the world’s computers. It’s easy to blame the company. But there’s a bigger culprit: monoculture. Roughly 73 per cent of desktop computers worldwide run a Microsoft Windows operating system. Because Windows systems are so common, they are a popular target for hackers, who can exploit more computers with a single attack than if they targeted macOS or Linux systems. As a result, there’s a massive market for software like CrowdStrike’s. Almost 60 per cent of Fortune 500 companies use CrowdStrike, which helps explain why impacts on commercial systems were so profound, even though 8.5m machines is a tiny fraction of the number of computers used worldwide. The reliance on a single operating system and a single threat checker for critical commercial IT systems makes global outages possible. Consequently, computing is profoundly more vulnerable than it needs to be.

To understand how we got here, it helps to look at an apparently unrelated industry: agriculture.

Think of a farm. You likely envision a vast field planted with endless rows of the same crop: barley, maize, wheat. These monoculture fields, filled with thousands of the same plant growing side by side, are a relatively modern innovation, designed for the mechanised and efficient planting and harvesting of crops.

Before farming became an industrial exercise, farmers planted a wide variety of crops in the same area, harvesting food for their family to eat and selling the occasional surplus. To farm successfully, a farmer needed to know different techniques and maintain specialised tools to plant and harvest a diversity of crops. Much as factories transformed craft—which required a shoemaker, for instance, to have the tools and knowledge to participate in every step of shoemaking—into discrete steps that could be carried out by non-expert workers, industrial agriculture too sought efficiencies of scale, planting fields with a single crop rather than dozens of different species.

The vulnerabilities of monoculture farming were understood long ago. Seeking to feed a growing population, Ireland planted millions of so-called Lumper potatoes in the early 19th century. When a rot struck the potato crop in the 1840s, it spread across the island rapidly, and ultimately one in eight Irish people died of starvation in just a three-year period. The lessons of the potato famine (don’t rely on a single crop, and plant multiple varieties of the same crop) are well understood, but not always honoured. In the 1970s, a blight killed more than 15 per cent of the US maize crop, in part because 70 per cent of farms were growing the same high-yield variety.

Deb Chachra, in a brilliant book titled How Infrastructure Works, explains, “Making systems resilient is fundamentally at odds with optimization.” An Irish potato crop that was more resilient in the face of rot wouldn’t have yielded as many calories, and the millions of hectares of maize planted in the US couldn’t be harvested by massive machines if those fields held tomatoes and cucumbers as well. Chachra writes that we often invest in infrastructures that trade away robustness, redundancy and resilience in favour of efficiency, because the latter is fundamentally in tension with the rest.

Often, we optimise ourselves into single points of failure. Protecting computer systems from threats around the world is an extremely complex task, and delegating the task to a single company seems like an efficient solution to the problem. Yet we might want to do the exact opposite. When systems are critical to the functioning of society, we might insist that they are redundant and independent of one another, so that if one fails, others might carry on.

Polyculture has recently come back as a hot topic, with farmers rediscovering a range of benefits to planting multiple species in the same field. Different crops use up different nutrients, so polyculture planting depletes soil to a smaller degree, and requires less fertiliser. Coffee plants growing between banana trees, for instance, perform better than bushes planted in full sun. Green beans produce chemicals which repel beetles that eat potatoes, and potatoes likewise protect beans from Mexican bean beetles.

At the risk of overextending an analogy, it’s possible that similar synergies could come from building polycultural technical ecosystems. As Prospect’s own editor, Alan Rusbridger, argues, X, which remains deeply important for political news even as it contracts, may have been ideologically captured by its billionaire owner, Elon Musk. A more diverse social media habitat could lessen Musk’s power. It also might surface distinct strategies for building resilient social networks, as different spaces can experiment with new approaches to content moderation, the use of algorithms (to surface interesting content) and user self-governance. X is so influential in part because Meta, which controls three major social media platforms, has been experimenting with “downranking” political and news content. The power of these platforms and their dangerous interactions seem like a symptom of rot that could threaten something as significant as Ireland’s potato crop: democratic discourse itself.

The CrowdStrike catastrophe left us with an indelible visual image: rows of computer monitors in airports and train stations showing identical blue screens. Perhaps the best lesson we can take from them is that any monoculture, agricultural or technological, is brittle.