Randomised Statistical Trials Are Not Always The Best Option

[Cover illustration: a balance weighing "Human Ethics" on one scale against "Scientific Rigour" on the other.]

Randomised statistical trials are the standard when it comes to testing multiple candidate solutions to figure out which one is best. In the statistical world, this standardised approach has proven its worth over time: it provides reliable results with minimal risk of complications.

Yet, in this essay, I argue that randomised statistical trials are not always the best option on the table. In fact, under certain circumstances, they could even be the worst option available. To understand why, let us start with the philosophy behind randomised statistical trials.


The Philosophy of Randomised Statistical Trials

The Use-Case

Let us say that you are designing a website or a web-based digital product. You have come up with two designs (design A and design B) for a feature and are unsure which one is better. You have a hunch that design A is slightly better, but you don’t want your personal feelings to interfere with the objective truth. So, you decide to run a randomised statistical trial to settle the question.

Following the standard protocol for randomised statistical trials, you randomly split your test traffic such that one half gets to see design A and the other half gets to see design B. You use ‘clicks’ and ‘time spent’ as the primary metrics to infer which design is better.

After running the test for a pre-determined amount of time, you analyse the results. Your analysis reveals that design B performed better than design A (with statistical significance, of course). Therefore, you decide to lock in design B for your final website/digital product.
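As an aside, that final analysis step typically boils down to a simple hypothesis test. Here is a minimal sketch in Python, assuming hypothetical click counts; the two-proportion z-test here stands in for whatever significance machinery a real experimentation platform would use.

```python
from statistics import NormalDist

def ab_test_p_value(clicks_a, visitors_a, clicks_b, visitors_b):
    """Two-sided two-proportion z-test on click-through rates."""
    p_a = clicks_a / visitors_a
    p_b = clicks_b / visitors_b
    # Pooled click rate under the null hypothesis of "no difference".
    p_pool = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical traffic numbers: design B clicks better, and the p-value
# tells us how plausibly that gap could be mere noise.
p = ab_test_p_value(clicks_a=120, visitors_a=5000, clicks_b=165, visitors_b=5000)
print(f"p-value: {p:.4f}")  # below 0.05 -> lock in design B
```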

The ‘A/B’ Test

What we just covered is known in the biz as an “A/B” test. Our hypothetical example covered a simple case, but in real-life applications, there are usually numerous options and more complex algorithms behind such tests. However, the fundamental idea remains the same: the best solution wins.

This procedure is so ubiquitous that it is no exaggeration to say that most of today’s internet revenue is at least partially a product of such A/B tests. Some of the best talents of our time are employed in figuring out how to make people click on advertisements.

But alas! That is not the primary concern of this essay. Here, we are concerned with an even bigger issue. To understand what this issue is, let us move from the realm of digital technology to the realm of human health.


The Shift from Web Design to Clinical Trials

Let us say that you are interested in treating a disease and have a handful of treatment options. The standard protocol for randomised statistical trials says that you split your test group so that each treatment option is administered to an equal number of patients. In this sense, it is very similar to an ‘A/B’ test.

Based on the analysis of the results, you establish which treatment was superior and which treatments were inferior. So far, so good. However, there is a catch. Web design and clinical trials are very different fields.

You see, although you find out which treatment is best at the end of the test, during the course of the test, by design, a significant fraction of the subjects receive inferior treatment. That doesn’t sound right, does it?

“Well, it WAS a test, and the test subjects KNEW what they were signing up for! So, what is your problem?”

If you are thinking along these lines, I fully empathise with you. I acknowledge that, in the context of clinical trials, some sacrifice has to be made by test subjects in order to arrive at the greater good.

But what if neither the test subjects nor the trial conductors know what they are signing up for? If you think that such a scenario is impossible, I’m here to show you otherwise. To understand further, let us jump from the realm of hypothetical examples to the realm of real-world historical examples.

Extracorporeal Membrane Oxygenation (ECMO)

In the 1970s, in order to treat respiratory failure in infants, Robert Bartlett of the University of Michigan developed a disruptive method called extracorporeal membrane oxygenation (ECMO). This approach takes blood that is en route to the lungs out of the body, outsources oxygenation to an external device, and then routes the oxygenated blood back to the heart.

Naturally, such a disruptive method involves risks such as blood clots and air bubbles entering the circulation. Yet even considering the risks, the method proved effective: in 1975, when a baby girl was not getting enough oxygen from a ventilator, ECMO came to the rescue.

However, one successful case does not mean that the method should be made mainstream. For that purpose, clinical trials would be necessary.

Zelen’s Algorithm as an Alternative to Randomised Statistical Trials

As you can imagine, with lives on the line, randomised trials can be very tricky. To counter this situation, a biostatistician named Marvin Zelen came up with the notion of “adaptive” trials in 1969. Similar to the “Win-Stay, Lose-Shift” algorithm that I covered in my essay on how to choose between favourite experiences and trying new ones, Zelen’s algorithm optimised for a “play-the-winner” strategy.

Imagine that each viable treatment option gets a distinctly coloured ball, which is placed inside a hat. Then, a ball is drawn out of the hat at random, and its colour decides which treatment is administered. If the treatment is successful, another ball of the same colour is added, and both balls are returned to the hat. This way, the more successful a treatment is, the better its chances of being drawn at random in the future.
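To make the mechanics concrete, here is a minimal Python simulation of the urn scheme just described. The success rates are hypothetical, and the sketch follows the simple variant in the text (the drawn ball always goes back; a success adds one more of the same colour), not any particular published implementation.

```python
import random

def play_the_winner(success_rates, n_patients, seed=0):
    """Simulate the urn scheme: draw a ball, treat, add a ball on success."""
    rng = random.Random(seed)
    urn = list(range(len(success_rates)))   # one ball per treatment to start
    assignments = [0] * len(success_rates)
    for _ in range(n_patients):
        treatment = rng.choice(urn)         # random draw from the hat
        assignments[treatment] += 1
        if rng.random() < success_rates[treatment]:
            urn.append(treatment)           # success: add a same-coloured ball
        # The drawn ball itself is always returned, so the urn only grows.
    return assignments

# Hypothetical success rates: treatment 1 is clearly better.
print(play_the_winner([0.4, 0.8], n_patients=100))
# Most patients end up receiving the better treatment as evidence accrues.
```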


The ECMO Clinical Trial Disaster

More than a decade after Zelen came up with the “adaptive trials” algorithm, it was put to use to study the effectiveness of ECMO in saving infant lives as compared to the more conservative “standard” approach.

Between 1982 and 1984, Bartlett and his colleagues, disturbed that the (potentially) more effective method was not yet in wide use, conducted a clinical trial aimed at establishing which method was better. The study resulted in one infant dying under “conventional” treatment, while eleven consecutive infants were saved using ECMO. An extended study resulted in two more infants dying under conventional treatment, whereas eight out of eight infants were saved via ECMO.

When the results came out, the study stirred numerous controversies, as the scientific establishment was not used to considering a single death under one treatment arm as statistically significant evidence. Jim Ware, a Harvard professor of biostatistics, took the challenge head-on and set out to conduct a study using the standard procedure for randomised statistical trials.

The research team decided that the treatments would be randomly assigned UNTIL a pre-determined number of deaths was reached. As the trial unfolded, four out of ten infants who received the conventional treatment died. However, nine out of nine infants who received ECMO survived.

At this point, one really has to ask the question:

Is it really justified to put infant lives in danger for the sake of statistical significance?

Well, you would think the scientific community would have had enough dead infants on its list, but the story doesn’t end there! In the 1990s, one more ECMO-versus-conventional-treatment study was conducted in the United Kingdom. This study involved traditional randomised statistical trials and 200 infants.

The difference in results from this study was not as drastic as in the previous studies. However, ECMO proved to be the better treatment, backed by statistical significance. The price? Twenty-four more infants died under conventional treatment than under ECMO.

Randomised Statistical Trials are Not Always the Best Option

This story remains a chilling example of why randomised statistical trials are not always the best option. It is not always ethically justifiable to put subjects in danger during clinical trials just to establish which treatment is better. So, how do we solve this?

The answer, thankfully, presents itself in the same story, in the form of adaptive trials. For special cases like these, it makes sense to design trials that steer subjects towards the better treatments even while the trials are in progress.

The notion of adaptive trials belongs to a special category of mathematical/logical problems known as the “multi-armed bandit” problem. This topic is so insight-rich that I will cover it in more detail in a future essay of its own.
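To give a flavour of that framing ahead of time, here is a minimal sketch of epsilon-greedy, one of the simplest multi-armed bandit strategies (a different strategy from Zelen’s urn, and again with hypothetical payout rates): mostly exploit the arm that looks best so far, and occasionally explore the rest.

```python
import random

def epsilon_greedy(payout_rates, n_rounds, epsilon=0.1, seed=0):
    """Pull the best-looking arm, but explore a random arm with prob. epsilon."""
    rng = random.Random(seed)
    pulls = [0] * len(payout_rates)
    wins = [0] * len(payout_rates)
    for _ in range(n_rounds):
        if rng.random() < epsilon or not any(pulls):
            arm = rng.randrange(len(payout_rates))   # explore
        else:
            arm = max(range(len(payout_rates)),      # exploit the best win rate
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
        pulls[arm] += 1
        wins[arm] += rng.random() < payout_rates[arm]
    return pulls, wins

pulls, wins = epsilon_greedy([0.4, 0.8], n_rounds=1000)
print(pulls)  # the better arm ends up with the lion's share of the pulls
```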

For now, it suffices to say that scientific rigour, however important it may be, must come second to ethical responsibility. And the notion of randomised statistical trials is no exception to this fundamental human realisation!


References and credit: Brian Christian, Tom Griffiths, Robert Bartlett et al. (scientific article), and Colin B. Begg (Comment).

If you’d like to get notified when interesting content gets published here, consider subscribing.

Further reading that might interest you: How To Really Understand Statistical Significance? and How To Really Avoid P-Value Hacking In Statistics?

If you would like to support me as an author, consider contributing on Patreon.

