Epic fails! What can we learn from recently debunked psychology studies?
By Dylan Evans
The Stanford Prison Experiment is one of the most famous experiments in the history of psychology. In August 1971, a young psychology professor called Philip Zimbardo created a simulated prison in the basement of the psychology department at Stanford University. Twenty-four male volunteers agreed to live there, with half acting as prisoners and the other half as guards. Very soon, the guards began treating the prisoners in harsh and abusive ways, and the experiment had to be abandoned after only six days.
The story has been told countless times in books and documentaries and figures prominently in many psychology textbooks. Recently, however, researchers have thrown much of the story into doubt. It turns out that the cruelty exhibited by the guards was not spontaneous. On the contrary: the guards were coached beforehand by Zimbardo. And some of the prisoners have admitted that the distress they exhibited was faked.
The Stanford Prison Experiment is not the only study to be debunked. Several other classic experiments have also been exposed as deeply flawed. Stanley Milgram’s famous obedience experiments and David Rosenhan’s psychiatric hospital experiment have also come under scrutiny.
Take Milgram’s experiment first. In the 1960s, he recruited forty male participants to take part in a study on obedience. Participants were instructed by a man in a white coat to give electric shocks to another person whenever that person answered a question incorrectly, with the level of shock being increased after each mistake. Remarkably, 65% of participants continued to give shocks up to the maximum level of 450 volts.
New analysis, however, suggests that most participants realised the shocks were not really dangerous. 72% of obedient participants made this kind of claim at least once. For example, one participant stated: “If it was that serious you woulda stopped me”. The sample was self-selected and composed entirely of men, significantly limiting the extent to which the findings can be generalised to the wider population. Finally, later variations of the study found that far fewer people were willing to follow the experimenters’ orders than in Milgram’s version, suggesting that design elements of the original study made participants more likely to reach the maximum shock voltage.
Or take Rosenhan’s famous study of psychiatric diagnosis. In a landmark paper published in 1973, Rosenhan described how he and several other volunteers had fooled doctors into admitting them to psychiatric hospitals in the United States. After admission, they acted normally and told staff that they felt fine, but they were kept in hospital for an average of nineteen days. The study was widely perceived as undermining the objectivity of psychiatric diagnosis. When the journalist Susannah Cahalan tried to track down the volunteers in Rosenhan’s study, however, she could find only two – Rosenhan himself, and one volunteer who was excluded from the published paper. There is a strong suspicion that Rosenhan may simply have made up much of the data.
If foundational studies like those of Zimbardo, Milgram and Rosenhan are suspect, what does that say about the state of psychological research in general? What lessons can be learned for the future?
One important lesson is that we need bigger and more diverse sample sizes. All the studies described here used very small samples. For decades, experiments would often include fewer than fifty participants, and in many cases these were all students. Today, psychologists aim for sample sizes in the hundreds, and from a much more diverse pool. With online tools, samples can be even larger, including thousands of volunteers from a variety of different countries. Mindlab recently carried out a study involving 9,000 participants in nine countries.
Another lesson concerns biasing. Zimbardo stacked the deck in his favour by coaching the participants in his experiment to behave in a certain way. Given this, it’s hardly surprising he got the results he did. Psychologists today take great pains to avoid biasing experiments in this way. One of the great discoveries of social psychology is that it’s much harder to do this than we thought. Participants can pick up on very subtle cues that even the experimenters aren’t aware of. Small differences in the way that questions are phrased can nudge answers in one direction or another. At Mindlab, we are very careful to frame our questions in neutral terms. And we minimise priming effects where necessary by making sure the participants only see one version of the design we are testing.
The exposure of flaws in some of the major studies in psychology has challenged researchers to improve their methods and recruit bigger samples. This is how scientific research progresses – by learning from the failures of the past.
So here are a few questions to keep in mind when carrying out consumer research:
- Is your sample size large enough? Small sample sizes mean that atypical respondents can skew the results. If you want statistically significant results, you need larger sample sizes.
- Is your sample diverse enough? If you are only recruiting from a narrow range of people, your results won’t generalise to the population.
- Are your questions neutral? Subtle differences in wording can bias your findings far more than you might expect.
- Are you relying on outdated findings? Opinions and perceptions change. To be sure, test more than once and keep your research up-to-date.
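The sample-size point is easy to see with a quick simulation. The sketch below is purely illustrative (the population, ratings scale, and proportion of atypical respondents are invented for the example, not taken from any of the studies above): it shows how much more an estimated average bounces around when each survey includes only 20 respondents rather than 1,000.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: most people rate a product around 5 on a
# 0-10 scale, but ~5% are enthusiasts who rate it near 10.
def draw_rating():
    if random.random() < 0.05:
        return random.uniform(9.5, 10.0)   # atypical respondent
    return random.gauss(5.0, 1.0)          # typical respondent

def sample_mean(n):
    """Run one simulated survey of n respondents and return its mean."""
    return statistics.mean(draw_rating() for _ in range(n))

# Repeat many surveys at each sample size and measure how much the
# estimated mean varies from survey to survey.
small = [sample_mean(20) for _ in range(500)]
large = [sample_mean(1000) for _ in range(500)]

print(f"n=20:   spread of estimates (stdev) = {statistics.stdev(small):.3f}")
print(f"n=1000: spread of estimates (stdev) = {statistics.stdev(large):.3f}")
```

With only 20 respondents, a survey that happens to include two or three enthusiasts gives a noticeably inflated average; with 1,000 respondents, the atypical voices are diluted and the estimates cluster tightly around the true population mean.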
Read more about failure at the Museum of Failed Products.