How the Stanford prison experiment gave us the wrong idea about evil

Inhumanity is as old as humanity itself, yet—curiously—we can trace our modern understanding of it to a single year: 1961. Until then, to the extent that they were ready to grapple with the horrors of the gulag and the gas chamber at all, thinkers sought an answer to the “how could they do that?” question with the idea of a distinct individual pathology. The old presumption was that those who do monstrous things must themselves be monsters. But, in the middle of 1961, two events—many thousands of miles away from each other—coincided to change our understanding entirely. They started to paint the grim picture of the brute within us all—a picture which appeared to be confirmed by the notorious Stanford prison experiment 10 years later. But it is a picture which new evidence is fast unravelling.

Back in 1961, however, the first of those two defining events was the trial of Adolf Eichmann in Jerusalem. One of those sitting in the courtroom was the political philosopher Hannah Arendt. Like those around her, she expected the man responsible for millions of deaths to be a terrifying figure. What she saw instead was a balding, rather hunched and altogether insignificant individual. In Eichmann in Jerusalem, Arendt concluded that it was that very insignificance which ultimately made him more terrifying than she had imagined, for it suggested that anyone could become a genocidal perpetrator. The monster was not “out there” in others; it was inside all of us. And she summed up this idea in a phrase that haunts us still: “the fearsome, word-and-thought-defying banality of evil.”

At the same time as Eichmann was on trial in Jerusalem, far across the Mediterranean and the Atlantic, Stanley Milgram was in his Yale laboratory conducting the “obedience experiments” which would in time be famous. Participants were led to believe that they were taking part in a study about the effects of punishment on learning, in the role of a teacher. The real subject of the study was just how punishing they were prepared to be.

The experimenter instructed the “teachers” to give an escalating series of electric shocks each time a learner made an error on a memory task. In fact, the “learner” was acting—an accomplice of Milgram’s—and no real shocks were being given. But just how far would the teachers go? To Milgram’s surprise, in the best-known version of his study (where the learner is in a different room to the teacher and experimenter, but his supposed cries of pain can be heard over an intercom) two-thirds of participants went all the way to the maximum level of 450v—a level where, had the shocks been real, they would have been fatal.

The terrifying implication seemed to be that it was not a few, but a majority of people, who were prepared to follow authority to destructive limits. Worse, the immediate environment was day-to-day and orderly; this wasn’t a time of war or a battle zone in which everybody was hardened by life-or-death decisions. Nor had the participants been prepared for brutality by a long campaign of dehumanising and demonising their victims.

In seeking to explain these findings, Milgram explicitly referenced Arendt and the banality of evil. Moreover, like Arendt, he attributed the ability of ordinary people to perpetrate atrocities to “thoughtlessness,” uncritically obeying authority and paying no heed to the dire consequences of their actions. Science, it seemed, intertwined with history, to produce a compelling narrative: something in the human condition makes ordinary people predisposed to obey orders, no matter how toxic the commands.

An off-the-shelf monster

So there it was. Within a few months, Arendt had supplied the eloquent words, and Milgram the evidence. Our picture of the monster was now complete, and so was our understanding of its habitat: the inside of each of us. A decade later, in 1971, a young Stanford professor Philip Zimbardo and his colleagues painted the monster within even more vividly. In Zimbardo’s study, young men were randomly divided into “Prisoners” and “Guards” and put in a simulated prison environment. As the standard account goes, the Guards became so brutal and the Prisoners so disturbed so quickly that the study, scheduled to last two weeks, had to be terminated after six days. It seemed to show that you don’t even need an external authority to make good folk turn into brutes. People are disposed to conform to roles and to role expectations even in the absence of explicit instruction. Or, to use the words of Zimbardo and his fellow experimenters, Guard brutality was “emitted simply as a ‘natural’ consequence of being in the uniform of a ‘Guard’ and asserting the power inherent in that role.”

This grim narrative, as we shall see, ignored the crucial way in which the Guards had been primed. Nonetheless, it has become entrenched over the half century since. Even as historical studies by the likes of David Cesarani and Bettina Stangneth have exploded the myth that Eichmann was simply one more thoughtless bureaucrat (a mere cog in the Nazi machine), the notion of the “banality of evil” has kept its grip on our collective consciousness, buttressing the idea that people are inherently disposed to conform to malevolent expectations.

Here is an off-the-shelf heuristic that is all too easy to reach for every time we are confronted with another act of conspicuous brutality, immorality or public wrongdoing. It has been called the “Nuremberg” defence because of the number of times that Nazi perpetrators claimed that they were “only following orders.” And yet it has also been used many times since where other forms of toxic behaviour have been at stake, including the behaviour of tobacco companies, ethnic cleansing, phone hacking by News Corp, and the actions of military police at Abu Ghraib.

But for anyone who pays attention to the evidence about evil—as opposed to ready-made stories—recent years have witnessed the crumbling of all the old certainties. This process started with the re-examination of Milgram’s work. Fortunately, he had left his raw materials in an archive at Yale University. Using these, researchers pointed out that there were over 30 variants of the obedience studies. Most attention has focussed on those where a clear majority obey the experimenter, but when you look right across all of his so-called “Obedience” studies, over half of participants (58 per cent to be precise) actually disobeyed orders and refused to inflict the maximum level of electric shock.

Moreover, those “teachers” who did continue with the shocks did so not because they were thoughtlessly obeying orders, but because they were persuaded by the experimenter to invest in a scientific enterprise presented as important and progressive, appealing to a side of their identity which wasn’t inherently evil at all. More specifically, they were told that their efforts would help expand human knowledge about how to help people learn more effectively. As we describe it in our professional work, the important thing here is this “identity leadership” on the part of the experimenter, through which the “teachers” are coaxed to become “engaged followers.” In other words, the callous cruelty that has so often been reported as having been discovered by the experiment was in fact invented: called into being by persuasion that struck a noble note.

As well as using this new analysis of the raw material to make sense of what had originally happened in Milgram’s laboratories back in 1961, we have conducted a series of our own experiments. These use a variety of techniques, including virtual reality, professional acting, and online analogues of the original. And they have confirmed that the roots of malign obedience lie in such proactive appeals to (on the face of it) positive aspects of the followers’ sense of identity. Those who impose greater harm are those who identify more with the scientific project and believe that they are helping advance a progressive cause. This constitutes a “greater good” which justifies whatever harm is imposed.

So much for Milgram. But what about the even more chilling Stanford prison experiment? Ever since we conducted our own prison study in 2002 for the BBC, we have suspected that this, too, might be explained by something that happened to the Guards in and around the experiment, rather than by who they fundamentally were. In our study we did not coach the Guards and, without such guidance, they proved either unwilling or else unable to exert their authority, to such an extent that, ultimately, the Prisoners gained the upper hand.

This inspired us to dig deeper into the original Stanford experiment. There were snippets of information in the public domain (such as video of Zimbardo briefing his Guards) which hinted that they might perhaps, and to a greater extent than has been acknowledged, have been led to behave as they did. But that information was limited and fragmentary. There was not enough to seriously challenge the “conformity account” of Zimbardo’s study, which continues to dominate psychology textbooks.

All that changed after 2011 when Zimbardo himself, following in Milgram’s footsteps, put his own materials in an archive, and particularly over the last year when a number of scholars such as Thibault Le Texier, with his 2018 book Histoire d’un Mensonge, began to publicise what was there. What emerged from the vaults was the reality of how far the experimenters had intervened to shape the behaviour of the Guards. The Guards were not only told to be tough in general terms; they were also given detailed suggestions and trained in techniques that would humiliate the Prisoners (one of the Guards, Andre Cerovina, has recalled being fed “very good sado-creative ideas”). If they complied, they were praised. If they didn’t, they were taken aside and pressured to improve.

Real brutality

It may now seem tempting to dismiss the study, almost to erase it from memory. But that would be a mistake. For one thing, there is still something serious that needs to be explained. The new information about what happened during the experiment doesn’t change the fact that the disturbing behaviour did occur. Some of the Guards really did oppress and humiliate the Prisoners, and none of the other Guards did anything to stop such actions. Individual brutality may not have been universal, but a brutal culture quickly developed and then went unchallenged. This is exactly how corrupt and abusive environments work, and we really do need to understand them.

What is more, by revisiting the Stanford study and archives, we can begin to fashion a new and better understanding of why evil deeds happen in general. Remarkably, nearly a half century after the original experiment, we are finally in a position to have a fully informed debate about how brutality emerged within it and, most particularly, about how people were led to it.

At last, we have hard evidence that Zimbardo and his colleagues sought to persuade and train the Guards to be “tough.” But it is important to note exactly how they did so. And then how—in some cases—they succeeded in making ordinary young men act as brutes.

There is one specific encounter which is particularly illuminating. An archived audio recording reveals that one of the experimenters who played the role of the Warden (David Jaffe) tried to convince one of the reluctant Guards (John Mark) to act more harshly towards the Prisoners. Jaffe seeks to engage Mark in a common cause: exposing the effects of brutal prison conditions in real life in order to generate support for prison reform. In order to achieve this shared goal, Mark is told, the Guards must create brutal conditions within the study. By acting tough, Mark would be helping to advance this worthy cause. To use Jaffe’s own words, the aim of the study was: “to be able to go to the world with what we’ve done and say ‘Now look, this is what happens when you have Guards who behave this way…’ But in order to say that we have to have Guards who behave that way.”

This psychology is remarkably similar to the techniques used by Milgram to persuade his participants to shock the “learner” in his studies, by appealing to their progressive side. Far from being instructed to serve a noxious cause, in both studies they are invited to collaborate in a worthy cause (indeed, in his experimental notebooks, Milgram himself ponders whether his studies might be better understood as about co-operation rather than obedience). Where obedient participants in these studies are normally characterised as doing harm to powerless victims, from their own perspective they are contributing to important research designed to help others.

The infernal triad

Drawing on all the information now available about both Milgram’s studies and the Stanford prison experiment, we believe we can conclude that brutality is brought forward by a distinctly proactive pattern of suggestion or leadership, which is effective because of the way it appeals to the follower.

This conclusion has important implications for understanding the human capacity for inhumanity. In place of the old idea of harm-doing as an inherent human characteristic, the focus switches to the capacity to lead—or be led—towards evil. The old thinking diagnoses a “natural” tendency in all of us to be prejudiced against other groups, a tendency that encourages an “ancient hatreds” reading of all sorts of conflict, and assumes that people helplessly assume toxic roles in these. Such analysis, though, ignores the role of leaders in derogating others, invoking old grievances or enforcing roles. In reality, collective brutality is something that has to be mobilised. That suggests a first and obvious practical step towards tackling it—identifying “the mobilisers,” and then holding them to account for their acts.

A second implication is that to understand the capacity for otherwise ordinary people to harm others in extraordinary ways, it is necessary to look at things from the internal perspective of the actor. One of the surest ways to lead people to evil is to convince them that they are doing the opposite. Many behaviours that appear at a distance as vice are possible only to the extent that those carrying them out regard them as virtue. This is what Claudia Koonz argues in her far-sighted 2003 book The Nazi Conscience. Obscene though it might seem, Hitler’s success derived from his ability to portray his policies as a moral project: reinstating German purity in the face of a Jewish enemy. The same logic of defending a virtuous ingroup against its imagined enemies was used to justify Stalin’s terror. It continues to be used today by Islamic State as well as domestic terrorists in justifying violence, as was clearly articulated by Robert Bowers in explaining his recent murderous attack on a Pittsburgh synagogue.

Once “we” are defined as uniquely virtuous and “they” are defined as endangering us, then the destruction of the other can be promoted and justified as the preservation of virtue. Milgram’s and Zimbardo’s findings often used to be characterised as showing that people can commit great harm because they act like zombies, thoughtlessly and mindlessly carrying out the instructions deriving either from authority or from their roles. This old analysis misses a critical feature of human psychology. In many instances, perpetrators are fully aware and they commit great harm, but do so only to the extent that they believe they are serving a higher purpose. Indeed, particularly shrill calls to throw everything at some great cause, whether that be a nation at war or the advancement of science, should arguably themselves be interpreted as a warning sign that trouble is on the way.

In sum, it appears that our intellectual armoury against resurgent barbarity may be deployed in the wrong place. We believe that brutality does not derive from any lack of awareness of the consequences of one’s actions. Rather, leadership that proactively appeals to identity, the rationalisation of bad behaviour as serving a higher cause, and glorification of the ingroup are the infernal triad that takes us down the path to perdition.