Regression to the mean is one of those statistical concepts that is extremely deceptive. And when I write “extremely deceptive”, I am in no way exaggerating. This point will become abundantly clear as you read along.
In this essay, I will be covering the challenges and perils one faces as one deals (knowingly or otherwise) with regression to the mean. To this end, I will be employing an easily comprehensible story and illustrated examples.
If you have anything to do with statistics, machine learning, data science, etc., you would directly benefit from reading this essay. Apart from that, I will also be touching upon day-to-day phenomena relevant to the common man as well. Without any further ado, let us begin.
The Genius Statistician
There once lived a genius statistician (let us call him Mr. X) who was highly respected by his peers. He was a professor of statistics and had authored a popular statistics textbook for technical students and business professionals.
As part of one of his ambitious research projects, he had been meticulously collecting statistical business data from various firms in various fields. Just collecting the data and curating it had taken him several years.
This data involved tabulated information on expenses, sales, wages, rents, etc. Mr. X’s goal with this project was to try and establish patterns (if any) that explain why some businesses succeed and others fail.
The Book that Shocked the Business World
After years of scrupulous work, Mr. X was ready to publish his thesis. And publish, he did! He titled his grandiose 468-page work “The Triumph of Mediocrity in Business” and revealed information that shocked the business fraternity at the time.
His book featured structured analyses involving detailed tables and graphs. The serious reader of the time understood straight away that this book was no ordinary work.
Prior to this book, it was common belief that good business practices can lead a firm to sustained success over a prolonged duration of time. However, the data and the analysis from this book convincingly discredited this “myth”.
Success Fades with Time
Mr. X’s work showed that successful organisations fade with time. In fact, he showed that this was not just the case with successful organisations. Poorly performing organisations tended to improve with time. In essence, he noted sharply that both extremes moved towards “mediocrity” as time passed.
To drive home this point, he wrote:
“…neither superiority nor inferiority will tend to persist. Rather, mediocrity tends to become the rule. The average level of the intelligence of those conducting business holds sway, and the practices common to such trade mentality become the rule.”
So, how exactly did our genius statistician come to this conclusion? Let us find out.
The Method Behind The Claim
Mr. X first classified all the firms as per their sectors. Next, he carefully ranked them based on various performance data. Then, he split the ranked lists into groups containing an equal number of firms.
For instance, after ranking 120 textile stores based on their sales-to-expense ratios, he divided them into 6 groups (sextiles) of 20 stores each. Mr. X expected the top sextile to dominate over time. But to his surprise, over just six years, the stores from the top sextile had lost almost all of their advantage over the average store.
He then repeated the procedure with various firms from various fields. He even ranked the firms based on other performance criteria such as wage-to-sales ratio, rent-to-sales ratio, etc. It did not seem to matter. His analysis clearly pointed to both ‘good’ firms and ‘bad’ firms darting towards mediocrity over time.
The Revelation — Regression to the Mean
Based on all of this work, Mr. X concluded that in any market environment that features fair competition and free trade-entry (basically a healthy capitalistic environment), this “mediocrity” effect would appear.
Good organisations lose their edge and their incompetent rivals improve over time. His thesis was that this had to do with human behaviour; human beings seem to be limited by negative qualities such as corruption, greed, etc.
Mr. X likened this to a classroom where bright and righteous students are negatively influenced by the poor qualities of dull and naughty students. Note that this is a very radical position for an academic to take.
People usually focus on the effect of successful organisations on the less successful ones. But Mr. X here goes on to say that the “mediocrity” effect was due to the effect of below-average organisations on above-average ones.
Consequently, he proposed that Governments should intervene and protect the top performers from the effects of the low-performers. With strong data and rigorous analyses backing his suggestions, what did fellow researchers think of this?
The Genius Mathematician
We shift our story now to a genius mathematician and statistician (let us call him Mr. Y) who lived during the same time as Mr. X. He was a professor of mathematics and pioneered several influential works in the field of statistics as well. Needless to say, he was highly respected by his peers.
Back to Mr. X’s book: the academic world started reviewing it. The reaction was largely respectful, and reviewers were impressed. Mr. X had added to his reputation with his latest ground-breaking book, “The Triumph of Mediocrity in Business”.
Even Mr. Y responded in a journal; he expressed deep respect for the amount of effort that Mr. X had put into accumulating and treating all of this data first-hand. However, he politely revealed something else.
Mr. Y said that whenever one studies a variable that is affected by stable factors AND is under the influence of random chance, this effect naturally shows up. He went on to say that there is nothing magical about this effect in the context of business and that Mr. X need not have struggled with all of this data to establish this “obvious” fact.
This response, naturally, did not sit well with Mr. X.
The Fierce Exchange
Mr. X published his response in the same journal. He politely corrected Mr. Y on some minor misunderstandings. He claimed that the “mediocrity” effect in business was not just a statistical or mathematical generality. He then referred back to his data that clearly suggested this to be the case.
In his response (again, in the same journal), Mr. Y stopped being polite and went about with clinical efficiency. He thwarted Mr. X’s argument using a single thread of clearly-defined logic.
In essence, Mr. Y showed that the ENTIRE thesis of Mr. X’s book was trivial. He compared it to proving that multiplication works by arranging elephants in rows and columns and then repeating the entire ordeal using other animals over and over again.
According to Mr. Y, while this analysis looked spectacular, it was merely stating the obvious. After this, Mr. X’s reputation took a permanent hit and never recovered!
Note that Mr. X was no ordinary academic. He was an established professor of statistics. His work was meticulous and widely respected. Yet, what is it that Mr. Y showed that had caused this? Let us find out.
Regression to the Mean: Why Success Fades with Time
You see, the key difference between Mr. X and Mr. Y is that Mr. X was a statistician, whereas Mr. Y was a statistician AND a mathematician. As a mathematician, what was obvious to Mr. Y was invisible to Mr. X.
Let us choose one of Mr. X’s top-performing firms. Mr. Y’s point was that such a firm would have had inherently superior wisdom, management skills, etc. But apart from that, such a top-performing firm would also very likely have been lucky.
So, with the passage of time, one could expect its luck to wander back to the average luck in the market (regardless of its superior wisdom, management, etc.). Mr. X claimed that a top-performing organisation faded because of corrosion experienced from competition over time.
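To see that skill plus luck alone is enough to produce this fading, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration and none of it is Mr. X’s data: each hypothetical firm gets a fixed “skill” and fresh, independent “luck” every year, both drawn from a standard normal distribution, and the function names are my own.

```python
import random
import statistics


def simulate_firms(n_firms=120, seed=0):
    """Return (year1, year2) performance for hypothetical firms.

    Assumption: performance = fixed skill + independent yearly luck.
    Skill never changes, so nothing 'corrodes' a firm between years.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_firms)]
    year1 = [s + rng.gauss(0, 1) for s in skill]
    year2 = [s + rng.gauss(0, 1) for s in skill]
    return year1, year2


def top_group_means(rank_by, other, group_size=20):
    """Pick the top group using `rank_by`; return its mean in both years."""
    order = sorted(range(len(rank_by)), key=lambda i: rank_by[i], reverse=True)
    top = order[:group_size]
    return (
        statistics.mean(rank_by[i] for i in top),
        statistics.mean(other[i] for i in top),
    )


# 120 firms split into sextiles of 20, mirroring the textile-store example.
year1, year2 = simulate_firms()
then, later = top_group_means(year1, year2)
print(f"top sextile, ranking year: {then:.2f}")
print(f"same firms, later year:    {later:.2f}")
```

In a run, the top sextile’s average in the later year should sit clearly below its average in the ranking year, yet still above the market average of roughly zero, even though no firm’s skill changed at all.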
However, Mr. Y challenged Mr. X to pick one of his top-performers, but not from the beginning of his data’s time frame. He instead asked Mr. X to focus on the end of his data’s time frame.
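Then, he suggested Mr. X go back in time and observe the past numbers for the latest top-performer. In other words, if corrosive effects of competition really pulled successful firms down over time, the latest top-performers should have looked even stronger in earlier years, before competition had worn them down. Instead, they too had been closer to the average in the past, so the drift towards the mean showed up whether one read the data forward or backward in time.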
It turned out that no such effect existed. All that was there was a regression to the mean. With all this said, how can you and I benefit from this story?
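Before moving on, Mr. Y’s backward-in-time check can be tried on simulated data as well. This is only a sketch under the same invented assumptions (fixed skill, independent yearly luck, made-up group sizes and helper names), not a reconstruction of the original analysis.

```python
import random
import statistics

rng = random.Random(1)
n_firms, group_size = 6000, 1000  # hypothetical sizes; top sixth = one sextile

# Assumption: performance = fixed skill + independent luck in each year.
skill = [rng.gauss(0, 1) for _ in range(n_firms)]
first_year = [s + rng.gauss(0, 1) for s in skill]
last_year = [s + rng.gauss(0, 1) for s in skill]


def top_indices(values, k):
    """Indices of the k largest values."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def mean_of(values, indices):
    return statistics.mean(values[i] for i in indices)


early_top = top_indices(first_year, group_size)  # Mr. X's choice
late_top = top_indices(last_year, group_size)    # Mr. Y's suggestion

# Forward in time: top firms of the first year, followed to the last year.
forward = (mean_of(first_year, early_top), mean_of(last_year, early_top))
# Backward in time: top firms of the last year, traced back to the first year.
backward = (mean_of(last_year, late_top), mean_of(first_year, late_top))

print(f"forward : {forward[0]:.2f} -> {forward[1]:.2f}")
print(f"backward: {backward[0]:.2f} -> {backward[1]:.2f}")
```

Both lines should show the second number closer to zero than the first. A story about competition wearing firms down can account for the forward line, but not for the backward one; selection on a noisy measurement accounts for both.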
The Perils of Regression to the Mean
The story we just saw, although revealing, is nothing special. This kind of a misunderstanding/misinterpretation happens all the time.
At any point in time, there are numerous clinical studies and research papers in social science that claim something has a certain effect, while the researcher(s) simply misinterpret(s) a reversion to the mean.
Assume that a study picked very sick people and gave them a certain medication. After seeing reasonable improvement, the study might claim that the medication has a positive effect on very sick people.
However, very sick people are likely to experience a reversion towards average health. Don’t get me wrong. Very sick people are likely to remain sicker than the average over time. BUT they are also more likely than less sick people to move in the direction of the average over time.
This phenomenon is so ubiquitous that we need not even go so far as academia and research. A responsible father comes across his daughter’s unusually poor grade report one fine evening. He has a strict talk with her that evening. Later, he sees that her grades have improved and attributes it to his strict talk.
In reality, while his strict talk would definitely have had some sort of an effect, his daughter would have still been likely to improve on her own, especially considering the fact that her grades had slipped ‘unusually’.
What Can We Do About Regression to the Mean?
We, as human beings, think linearly and wish to narratively explain phenomena we observe. Sometimes, we just get the narrative wrong; sometimes, the phenomenon is non-linear; other times, it’s both.
This characteristic of ours to ‘explain everything’ is not at fault either; it has arguably helped us survive and thrive as a species throughout our evolutionary history.
However, it is very helpful for us to “assume” that some sort of regression to the mean exists before we assign narratives to phenomena that we don’t fully understand. Take the medical study for instance.
One solution is to not just pick ‘very sick’ people to test the medication on. Instead, one could take a pool of comparable sick people and split it at random into two groups. Then, one could administer the medication to one group and a placebo to the other group.
This method would enable a fairer comparison, as the effect of regression to the mean is likely to affect both groups and thereby wash itself out in the final analysis. Stuff like this is common research practice in this day and age. But slip-ups do still happen!
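The washing-out can be illustrated with a small simulation. The sketch below rests entirely on invented assumptions: health scores are a stable baseline plus independent measurement noise, the “medication” has a true effect of exactly zero, and the study deliberately enrols only the sickest tenth at screening to make regression to the mean as strong as possible.

```python
import random
import statistics

rng = random.Random(2)
TRUE_EFFECT = 0.0  # assumption: the hypothetical medication does nothing

# Assumption: observed health = stable baseline + independent noise per visit.
baseline = [rng.gauss(0, 1) for _ in range(20000)]
screening = [b + rng.gauss(0, 1) for b in baseline]

# Enrol only the 'very sick': the lowest 10% of screening scores.
cutoff = sorted(screening)[len(screening) // 10]
enrolled = [i for i, score in enumerate(screening) if score < cutoff]

# Randomly split the enrolled patients into treatment and placebo groups.
rng.shuffle(enrolled)
half = len(enrolled) // 2
treated, placebo = enrolled[:half], enrolled[half:]


def mean_change(group, effect):
    """Average follow-up score minus screening score for a group."""
    return statistics.mean(
        baseline[i] + rng.gauss(0, 1) + effect - screening[i] for i in group
    )


treated_change = mean_change(treated, TRUE_EFFECT)
placebo_change = mean_change(placebo, 0.0)
estimated_effect = treated_change - placebo_change

print(f"treated improved by: {treated_change:.2f}")
print(f"placebo improved by: {placebo_change:.2f}")
print(f"estimated effect:    {estimated_effect:.2f}")
```

Looking at the treated group alone, the useless medication appears to work, because patients picked for being unusually sick drift back towards their own baseline. The placebo group drifts by about the same amount, so the difference between the two groups lands near the true effect of zero.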
When it comes to daily life, it helps to be unassuming and sceptical. Be it fad diets or cold showers, regression to the mean silently tricks us into thinking that we understand something when we really don’t. Add to it placebo effects, and it gets even more challenging to see the forest for the trees.
If all this sounds overwhelming, just know that even ‘experts’ in regression to the mean take years to begin understanding the concept. It sounds simple. It IS simple. Yet it is tremendously deceptive!
Epilogue
In case you had not worked it out, the story that I narrated in this essay is no work of fiction. It is a historical account of real events that actually happened.
Our genius statistician, Mr. X, is Horace Secrist. And our genius mathematician, Mr. Y, is Harold Hotelling. My respect and empathy go out to Secrist for all the effort he put into his research only for it to be proved ‘trivial’.
Secrist’s top performers were NOT randomly chosen, and that made all the difference. But alas! He was not the first ‘expert’ to fall for the trap of regression, and he will not be the last one either.
Many an expert has made the blunder of assigning a narrative to a phenomenon that is merely regressing. You might think that you are an exception; that you could have seen it coming.
But every one of us is susceptible to the trap of missing the regression to the mean. For it is one of the subtlest among statistical phenomena!
Reference and credit: Jordan Ellenberg.