How To Really Understand Statistical Significance?

[Illustration: a bell curve with the 95% confidence interval highlighted in pink. Three values are marked on the horizontal axis: -1.96 at the left boundary of the confidence interval, 0 at its centre, and +1.96 at its right boundary.]


Statistical significance is one of those “scientific” concepts/tools that have steadily been misused over the years (especially in the social sciences). In this essay, I aim to clarify certain fundamental misunderstandings related to this concept and its application.

We will start with an introduction to our intuitive understanding/application of statistical significance. Then, we will dive one level deeper and cover the scientific business of statistical significance. Finally, we will discuss the core issues with our current usage of statistical significance and explore potential remedies as well. Without any further ado, let us begin.


Intuitive Statistical Significance

Human beings use intuitive statistical models all the time. We observe real-world events, build hypotheses over time, and then establish them as truths using our intuitive statistical models. Of course, we don’t use words like “statistical” and “significance” for such purposes. Here is a hypothetical example:

You are driving downtown and stop in front of a signal near the town square. You let out a big sigh of frustration. You do this because you know that if you get a red on this signal, then you will get a line of reds at the next four traffic signals as well. You wish that you had gotten a green instead.

What is going on here is that you have built up a mental database of your past experiences and constructed an “If-Then” hypothesis. Then, based on further experiences (reds and greens), you have established your initial hypothesis as the truth. You won’t need to question this “truth” unless there is a sufficient frequency of events that discredit your original “If-Then” hypothesis.

In such an intuitive statistics model, any event (data) that confirms your hypothesis leads to statistical significance. This means that your intuitive statistical model gains significance or becomes more important.

Needless to say, there are a few major issues here. I have covered one of the major philosophical issues in the essay: How to really understand the raven paradox? Apart from that, there is also the fundamental issue that the scientific interpretation of statistical significance is starkly different from our intuitive interpretation.


Scientific Statistical Significance

Intuition usually goes out the window when we are dealing with scientific methods. As far as scientific statistical methods are concerned, one of the pioneers of the modern method was Ronald Aylmer Fisher. The modern statistical method can be summarized by the following action-steps:

1. Frame the null-hypothesis.

2. Run an experiment and record observations/data.

3. Calculate the probability of obtaining results at least as extreme as the ones you observed, assuming that the null-hypothesis is true. This probability value is known in the biz as the p-value.

4. If the p-value is less than a threshold value (5% was Fisher’s favourite), it means that your experimental results reject the null-hypothesis, and consequently, are statistically significant. If the p-value is above the threshold value, then it means that the null-hypothesis has not been rejected.
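To make the four steps concrete, here is a minimal sketch in Python. The coin-flip scenario (60 heads in 100 flips), the function name `binomial_p_value`, and the choice of an exact two-sided binomial test are my own assumptions for illustration; they are not part of Fisher's original formulation.

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Two-sided exact p-value for a coin-flip experiment.

    Adds up, under the null-hypothesis, the probability of every outcome
    that is no more likely than the one actually observed.
    """
    # Probability of each possible head count if the null-hypothesis is true.
    pmf = [comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
           for k in range(flips + 1)]
    observed = pmf[heads]
    # The small tolerance guards against floating-point ties.
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, total)

# Step 1: null-hypothesis -> "the coin is fair" (p_null = 0.5).
# Step 2: hypothetical observation -> 60 heads in 100 flips.
# Step 3: compute the p-value.
p_value = binomial_p_value(heads=60, flips=100)

# Step 4: compare against the threshold (5% was Fisher's favourite).
ALPHA = 0.05
verdict = "rejected" if p_value < ALPHA else "not rejected"
print(f"p-value = {p_value:.4f}; the null-hypothesis is {verdict}")
```

Note that the verdict depends entirely on the threshold we picked beforehand; the p-value itself is just a probability computed under the assumption that the null-hypothesis is true.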

The first step in the scientific method is to frame the null-hypothesis. Right here, it is important to note a fundamental difference between intuitive statistical significance and scientific statistical significance. With our intuitive mental models, when events confirm our hypothesis, we mark them as statistically significant.

With the scientific method, it is not sufficient that the data is consistent with our hypothesis. Instead, it is necessary that the data is sufficiently inconsistent with the negation of our hypothesis (null-hypothesis) for it to be considered statistically significant.

If all this is too abstract for you, let me help you out with an example. Let’s say that I claim that I have the power to make the sun rise in the east. With your intuitive model, if you keep testing the hypothesis by observing sunrises, the data would appear to have (intuitive) statistical significance.

Instead, if you follow the scientific method, the null-hypothesis would be: I do not have the power to make the sun rise in the east (negation of the original statement). Once you have the null-hypothesis, assuming that it is true, you could test it by asking me not to make the sun rise in the east. If I cannot pull this off, the null-hypothesis is not ruled out.

A More Practical Example

Say that you are developing a cream to shield people from radio waves and wish to test it for statistical significance. You are operating under the hypothesis that the cream is going to shield people from radio waves in a statistically significant way.

The corresponding null hypothesis would be: the cream has no effect on human beings. In order to put your null hypothesis to the test, you conduct an experiment with 100 people. 50 of these people are given your cream and 50 of them are given a placebo cream. You then measure radio wave-shielding numbers for all 100.

It is not sufficient that more cream-people show better shield-numbers than placebo-people. It is also necessary that fewer than a threshold number of placebo-people show better or comparable shield-numbers than the cream-people (backed by the p-value). If this is the case, then we could call the results of the experiment statistically significant (that is, the null hypothesis is rejected).

I have cut corners with this example by not going into probabilities of natural shielding, assumed distributions, etc. But for the purposes of this essay, this example suffices.
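For readers who want to see what such a test could look like in practice, here is a rough sketch using a Monte Carlo permutation test. This is one of several possible techniques and not necessarily the one a real study would use. The shield-numbers below are made up with a random number generator, and the function name `permutation_p_value` is hypothetical.

```python
import random

def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """One-sided Monte Carlo permutation test on the difference of means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle them repeatedly and count how often a random relabelling shows
    a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_treated, n_control = len(treated), len(control)
    observed = sum(treated) / n_treated - sum(control) / n_control
    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_treated]) / n_treated
                - sum(pooled[n_treated:]) / n_control)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Made-up shield-numbers for 50 cream-people and 50 placebo-people.
data_rng = random.Random(42)
cream = [data_rng.gauss(52.0, 10.0) for _ in range(50)]
placebo = [data_rng.gauss(50.0, 10.0) for _ in range(50)]

p_value = permutation_p_value(cream, placebo)
print(f"p-value = {p_value:.4f}")
```

The appeal of this approach is that it needs no assumed distribution: the only assumption is that, if the cream does nothing, it should not matter who got which jar.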

Now that we have covered the scientific method of null-hypothesis significance testing pioneered by R.A. Fisher, let us jump into the issues with this approach.


Issues with Statistical Significance

The majority of the issues that we face occur at what I call interface points (more on that in a bit). A typical statistical workflow looks as follows:

[Flowchart: Null hypothesis formulation (interpreted in subjective language) → Experimentation and statistical treatment (more or less objective protocols) → Result-reporting via statistical significance (interpreted in subjective language).]
Statistical Workflow — Flowchart created by the author

As you can see, the experimentation and statistical treatment of the problem at hand are pretty much objective protocols. However, there are issues that creep in when we formulate our null-hypothesis and when we interpret results from statistical significance.

The First Interface Point

Because we interpret the null hypothesis via human language, it does not occur to us that the null-hypothesis is almost always false in practice. There is no practical way that the cream has exactly zero effect on human beings. It either improves radio wave-shielding or makes it worse, even if only by a tiny amount. In this sense, almost every application has a binary outcome: either it helps or it hurts. The zero-effect application might very well not exist.

It’s just that when the effects are very small, we practically consider them as zero. The formulation of the null-hypothesis is what I call the first interface point. This is partly where our issues emerge from: poor formulation leads to poor results.
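One consequence is worth sketching in code: if the true effect is tiny but not zero, a large enough experiment will eventually flag it as statistically significant. The snippet below assumes a two-sample z-test with a known standard deviation and, as a simplification, assumes that the observed difference equals the true effect. The effect size of 0.01 and the sample sizes are invented for illustration.

```python
from math import erfc, sqrt

def two_sided_p_value(effect: float, sd: float, n_per_group: int) -> float:
    """p-value of a two-sample z-test, assuming a known standard deviation
    and an observed mean difference exactly equal to `effect`."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = effect / standard_error
    # P(|Z| >= |z|) for a standard normal variable Z.
    return erfc(abs(z) / sqrt(2))

TINY_EFFECT = 0.01  # hypothetical: one hundredth of a standard deviation
for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(TINY_EFFECT, sd=1.0, n_per_group=n)
    flag = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}: p = {p:.3g} ({flag})")
```

The effect never changes in this sketch; only the sample size does. That is why a rejected null-hypothesis, on its own, says so little about how large the effect is.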

The Second Interface Point

The second interface point occurs when we interpret the statistical results. You see, all that statistical significance (in the scientific sense) tells us is that there is an effect from the cream on human beings. Contrary to our intuition, it does not tell us anything about “significance” or “importance” of the results.

It is up to us to draw conclusions from the “statistically significant” results. We usually do this via subjective language as well. If our subjective conclusions are misleading, the scientific concept of “statistical significance” becomes arbitrarily misleading as well.

Imagine a situation where I state that the radio wave-shielding cream improves the shielding as much as 10 times compared to no cream. That sounds very convincing.

But what if I revealed that the shielding number without the cream was 0.0000000000001, with 100 being the highest possible value and 0 the lowest, when health factors are all held equal? This would mean that the number with the cream was 0.0000000000010. That is not so convincing!
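The arithmetic behind this example can be spelled out in a few lines. The numbers are the ones from the example above; the variable names are my own.

```python
baseline = 1e-13             # shielding number without the cream
with_cream = 10 * baseline   # "10 times" better with the cream
scale_max = 100.0            # highest possible shielding number

relative_improvement = with_cream / baseline  # the headline figure
absolute_gain = with_cream - baseline         # what actually changed
share_of_scale = absolute_gain / scale_max    # gain as a fraction of the scale

print(f"relative improvement: {relative_improvement:.0f}x")
print(f"absolute gain:        {absolute_gain:.1e}")
print(f"share of full scale:  {share_of_scale:.1e}")
```

The relative figure and the absolute figure describe the same result, yet only one of them sounds impressive.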


How to Treat the Issues with Statistical Significance

In order to start addressing the issues we have discussed so far, we need to focus on the interface points first. There is no silver bullet solution for these issues.

Whenever we are presented with “statistically significant” results, we need to critically analyse the formulation of the null hypothesis and the linguistic interpretation of “statistical significance” first.

Don’t get me wrong. There could be issues with the statistical methods as well. But even before we get to that part, the interface points present a more annoying problem: they play with human emotions. For instance, a scary-sounding conclusion/interpretation could provoke an unwarranted response from people (ten times a very, very small number is still a very small number).

To conclude, the term “statistical significance” says nothing about the quantitative significance of the results. All it says is that we are detecting something worth our attention.


Reference and credit: Jordan Ellenberg.
