How To Really Understand The Philosophy Of Inferential Statistics?
Published on May 23, 2022 by Hemanth
--
The philosophy of inferential statistics is not an everyday topic of discussion. But it lies at the core of every statistically significant study or experiment in the scientific world, past and present. So, it is well worth diving into this topic and understanding it well. That is precisely what we are going to do in this essay.
To begin, we can split statistics into descriptive statistics and inferential statistics. Descriptive statistics aims to describe the properties of the data at hand. This is where fancy terms like mean, median, variance, kurtosis (for the more advanced reader), etc., get thrown about.
Inferential statistics, on the other hand, aims to “infer” meaningful conclusions that go beyond the data at hand, typically about a larger population from which a sample was drawn. As you can imagine, inferential statistics is where terms like null hypothesis, statistical significance testing, ANOVA (for the more advanced reader), etc., get thrown about.
So, what is it about inferential statistics that interests us in this essay? Well, it is the “inference” part. Before we dive into the core philosophy, there is a fundamental mathematical method we will need to cover first.
The Mathematical Hammer — Proof by Contradiction
Proof by contradiction is a very old, tried-and-trusted method in the mathematical world. It uses the following logic:
1. You formulate a hypothesis (H) that you wish to discredit/reject.
2. You assume that this hypothesis (H) is true.
3. According to the hypothesis, evidence (E) should not be observed in the real world.
4. But evidence (E) IS observed in the real world.
5. Therefore, the only logical conclusion is that the hypothesis (H) is false.
When deconstructed like this, the notion of proof by contradiction appears simple and trivial. However, we would be wrong to underestimate its effectiveness. To prove this point, let me illustrate how proof by contradiction tackles a relatively complex problem very effectively.
Let us say that we are interested in proving that √2 is irrational. So, the hypothesis that we would like to reject is:
H: √2 is rational.
We can express a rational number as a fraction of whole numbers ‘m’ and ‘n’ (with ‘n’ ≠ 0) in lowest terms, that is, such that ‘m’ and ‘n’ have no common factors other than 1 (for example, 22/7). So, if H is true, √2 = m/n for some such pair.
Let us now assume that hypothesis (H) is true. Because ‘m’ and ‘n’ have no common factors, they cannot both be even whole numbers. Based on this, we could say that if √2 is rational (H), we should never observe both ‘m’ and ‘n’ being even (call this evidence E).
Let us now do some algebraic manipulation with the initial conditions that we just came up with:
$$\sqrt{2} = \frac{m}{n} \;\Rightarrow\; 2 = \frac{m^2}{n^2} \;\Rightarrow\; m^2 = 2n^2$$
If m² = 2n², then m² is an even number, so ‘m’ must also be even (the square of an odd number is always odd). An even number is one that can be expressed as two times another whole number. Based on this, we could write m = 2k for some whole number ‘k’. Substituting this into m² = 2n² gives the following:
$$m = 2k \;\Rightarrow\; (2k)^2 = 2n^2 \;\Rightarrow\; 4k^2 = 2n^2 \;\Rightarrow\; n^2 = 2k^2$$
If n² = 2k², then n² is even, so ‘n’ is an even number as well. If ‘m’ is even and ‘n’ is even, then evidence E is observed. However, according to hypothesis H that √2 is rational, this cannot occur. So, the only logical conclusion is that √2 is irrational; that is, √2 cannot be expressed as a fraction of whole numbers ‘m’ and ‘n’. Therefore, hypothesis H is rejected.
This beautiful proof illustrates why proof by contradiction is so powerful. We start by formulating a hypothesis that we would like to reject and see whether it leads to a logical contradiction. If it does, then the hypothesis cannot be correct and is rejected. If it does not, the hypothesis survives for now, but it has not been proven true; we have simply failed to reject it.
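For readers who like to poke at such claims numerically, here is a small Python sketch (the function name `search_m_over_n` and the search limits are illustrative choices, not part of the proof). It searches a finite range for whole numbers with m² = 2n² and checks the parity fact the proof leans on. A finite search can never replace the proof: it only fails to find a counterexample within its bounds.

```python
from math import isqrt


def search_m_over_n(limit: int):
    """Look for whole numbers m, n with 1 <= n <= limit and m**2 == 2 * n**2.

    A finite sanity check only: the proof covers all whole numbers,
    while this loop can only inspect a bounded range.
    """
    for n in range(1, limit + 1):
        target = 2 * n * n
        m = isqrt(target)  # the only whole-number candidate for m
        if m * m == target:
            return (m, n)
    return None


# The parity step in the proof: the square of an odd number is always odd,
# so if m**2 is even, m itself must be even.
assert all((k * k) % 2 == 1 for k in range(1, 10_000, 2))

print(search_m_over_n(100_000))  # no pair is found in this range
```

Note how the search merely "fails to find" a pair, which is a weaker statement than the proof's "no pair can exist".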
All this is great! But what does this have to do with inferential statistics? Well, let’s get to that.
The Philosophy of Inferential Statistics
It is very subtle, but inferential statistics ‘takes inspiration’ from proof by contradiction. The ‘takes inspiration’ part is a polite way of saying ‘cheap rip-off’. Why is it a cheap rip-off? Well, let’s look at the logical steps followed in inferential statistics first, and things will get clearer:
1. Formulate a null hypothesis (H0) that is the opposite of your hypothesis (H1).
2. Assume that H0 is true.
3. If H0 is true, then the probability of observing evidence at least as extreme as E is very low (for example, below the 0.05 significance threshold suggested by R.A. Fisher).
4. Evidence E is observed in your experiment.
5. Consequently, H0 is rejected as an improbable explanation of the evidence, in favour of H1.
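To make the five steps concrete, here is a minimal Python sketch of one such test under assumed numbers: a hypothetical coin is flipped 100 times and lands heads 60 times. H0 says the coin is fair; H1 says it favours heads. The function name `binomial_p_value` is illustrative, and the one-sided exact binomial test is just one of many tests that follow this pattern.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null).

    This is the probability, computed assuming H0 is true, of seeing
    evidence at least as extreme as what was observed.
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical experiment: H0 = fair coin (p = 0.5), H1 = coin favours heads.
alpha = 0.05  # the conventional threshold associated with Fisher
p = binomial_p_value(heads=60, flips=100)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p = {p:.4f} -> {decision}")
```

The number `p` is the probability of evidence at least this extreme if H0 were true; it is not the probability that H0 itself is true. A small `p` makes H0 an uncomfortable explanation of the data, which is the "confidence, not proof" distinction this essay is drawing.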
Do you see the commonality between this method and the one that we just covered for proof by contradiction? Well, beyond the commonalities, there are also some very subtle, yet consequential differences. In proof by contradiction, we were talking about “proofs” and “logical certainties”. In inferential statistics, we are talking about “probabilities”.
Therefore, you could say that inferential statistics deals with confidence-boosting by contradiction rather than proof by contradiction. “Confidence” is what the philosophy of inferential statistics revolves around. If we are confident enough about a hypothesis, we “infer” a decision.
The Limits of Inferential Statistics
My roots in inferential statistics lie in financial modelling. In the financial world, there is a rule of thumb that any (semi-decent) practitioner knows:
“Never bet the house!!”
Regardless of the level of statistical significance you observe, never go all in on a very highly likely event. As I mentioned in my essay on statistical significance, improbable is not the same as impossible. In other words, if something can go horribly wrong, it will; it is just a matter of time.
The trouble is that the method of inferential statistics is seldom transparent. If you are the one “inferring”, you would know the risks. But if you are the end-user or consumer, you are operating on blind trust.
In Fisher’s own words:
“The force with which such a conclusion is supported is logically that of a simple disjunction. Either an exceptionally rare chance has occurred, or the theory of random distribution is not true.”
– R.A. Fisher
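Fisher's disjunction can be watched in action with a small simulation. The Python sketch below uses a hypothetical fair-coin setup and illustrative function names (`upper_tail`, `false_alarm_rate`). It runs many experiments in which H0 is true by construction, applies the same one-sided test at the 0.05 threshold, and counts how often the "exceptionally rare chance" still produces a significant result. The simulated rate should land close to 5%; the test's true long-run rate here is at most 5%, because the binomial distribution is discrete.

```python
import random
from math import comb


def upper_tail(heads: int, flips: int) -> float:
    """P(X >= heads) for X ~ Binomial(flips, 0.5), i.e. a fair coin (H0)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips


def false_alarm_rate(trials: int = 2_000, flips: int = 100,
                     alpha: float = 0.05, seed: int = 1) -> float:
    """Simulate experiments in which H0 (a fair coin) is TRUE and return
    the fraction of them that the test still declares significant."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if upper_tail(heads, flips) < alpha:
            significant += 1  # an "exceptionally rare chance" occurred
    return significant / trials


print(false_alarm_rate())  # fraction of "significant" results while H0 is true
```

Every one of those significant results is a case where H0 was true and the rare chance simply happened, which is exactly the first branch of Fisher's disjunction.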
Fisher knew what he was talking about. Let us say that a machine learning model designs a vaccine whose safety is supported by a very high level of statistical confidence. The regulatory authorities are apparently happy with the statistics.
The only catch is that roughly 1 in 1 million people who take the vaccine would die from it. You and I are very, very unlikely to die from such a vaccine. But as the vaccinated population grows, it becomes practically certain that at least someone will eventually die from taking it.
This example covers only the known risk. The unknown unknowns remain, and they are far more dangerous. Since they are also far more complex, I’ll save that topic for another day.
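The arithmetic behind "eventually, someone will die" is short. Assuming each vaccinated person independently carries the same 1-in-a-million risk (a simplifying assumption for illustration, not a claim about any real vaccine), the probability that at least one of N people is affected is 1 - (1 - p)^N. A minimal Python sketch, with the illustrative helper name `prob_at_least_one`:

```python
from math import expm1, log1p


def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one event among n independent people, each with risk p).

    Computed as 1 - (1 - p)**n, written with log1p/expm1 so that a tiny
    p does not lose precision to floating-point rounding.
    """
    return -expm1(n * log1p(-p))


risk = 1e-6  # the essay's hypothetical 1-in-a-million risk
for population in (1_000, 1_000_000, 10_000_000, 100_000_000):
    print(f"{population:>11,} vaccinated -> "
          f"P(at least one death) = {prob_at_least_one(risk, population):.4f}")
```

At one million vaccinated people the probability is already about 63% (close to 1 - 1/e), and by one hundred million it is indistinguishable from 1. Improbable for any individual, near-certain for the population.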
To conclude, the philosophy of inferential statistics revolves around “confidence”. Most of the time, the confidence prevails with certainty. Now and then, the confidence is shattered (also) with certainty!