How To Really Avoid P-Value Hacking In Statistics?

P-value hacking is a phenomenon that is currently ensuring that most published statistical research findings are false. In my essay on statistical significance, I briefly covered what a P-value is.

In this essay, we will first do a quick recap of the statistical method and what the P-value means. Following this, we will proceed to see how P-value hacking occurs and what its direct consequences are. Finally, we will explore solutions to the problem at hand. In our ever-so-increasingly data-driven world, everyone would benefit from awareness and knowledge of this topic.

The Statistical Method — A Recap

The modern statistical method can be summarized in the following steps:

1. Formulate the null-hypothesis.

2. Conduct an experiment and record observations/data.

3. Calculate the probability of obtaining the observed results as exceptions (extremes) to the null-hypothesis, assuming that the null-hypothesis is true. This probability value is known in the biz as the p-value.

4. If the p-value is less than a threshold value, it means that your experimental results reject the null-hypothesis, and consequently, are statistically significant. If the p-value is above the threshold value, then it means that the null-hypothesis has not been rejected.

As a reminder, the null hypothesis is framed in such a way that it describes the hypothesis that you would like to reject/discredit. R.A. Fisher was one of the foremost pioneers of the modern statistical method. He preferred a threshold p-value of 0.05, and the convention stuck.

If all this information is too abstract for you, let us go through a quick example. Let us say that you are conducting a statistical experiment to see if a drug has the intended effect.

The corresponding null hypothesis is that the drug has no effect. Once you have the experimental results, you calculate the probability that your observed results would be the case if the drug were to indeed have no effect (due to luck). This probability value is the p-value.

A very low p-value (say, below Fisher’s 0.05) suggests that the probability that the observed results occurred purely by chance is very low. So, it boosts your confidence that the drug indeed has the intended effect. So far, so good.

Before, we move on to the issue of p-value hacking, it is beneficial to cover the issue of confidence hacking first.

This essay is supported by Generatebg

The Philosophy of Statistics — The Issue of Confidence Hacking

Say that you are conducting a statistical analysis that correlates 10,0000 different organic ingredients with positive effects on human beings. Let us say that in reality, 20 of these ingredients truly have an effect (though we don’t know this at the time of the study).

After applying Fisher’s recommended p-value filter of 0.05, we end up with the following results table:

Out of the 20 ingredients that truly have a positive effect, 10 don’t make it past the 0.05 threshold (top-left corner). This is not good; but it is to be expected. The remaining 10 out of 20 ingredients that truly have an effect are correctly categorized as true positives (top-right corner).

Clearly, most of the ingredients that don’t have an effect are rejected by the analysis. This is a good thing, and is also to be expected (bottom-left corner). Finally, we have on the bottom-right corner, 500 ingredients that have no effect falsely classified as ingredients that have a positive effect. This is very bad. Still, this is ALSO to be expected. Even though the number is 500, it is within what the p-value threshold promises to reject/allow.

In short, we have ended up in a situation where we have statistically significant results that give us very little confidence.

The principle of confidence hacking works as follows:

If “p < 0.05”, then the null hypothesis must be false. If “p > 0.05”, then the null hypothesis must be true.

Basically, the issue of confidence hacking treats the p-value filter as a hard truth-criterion (which it is not). If you wish to read an in-depth discussion on this topic, check out my essay on the philosophy of inferential statistics. But for now, let us move on to the main topic of this essay.

What is P-value Hacking?

P-value hacking is a special case of data manipulation when the p-value is right on the limit of being statistically significant (for example, p = 0.06). The biggest offenders in this domain are modern scientific researchers.

Say that a scientific researcher has been researching on a hypothesis for the past 3 years. Finally, after gathering the funds to conduct a proper experiment, she sets off to work on the numbers. The statistics reveal that the p-value is 0.059 (for statistical significance p needs be lesser than 0.05 in her case).

After three years of hard work, the researcher is facing a situation where it might all end-up meaningless. She goes through all of the data sets.

“Did I over-weight the pressure factor? Did I “by mistake” exclude any outliers in the data? Did I “by mistake” include any outliers in the data?

Okay, what happens if I remove these five data points? Wow, the p-value is now down to 0.041. Right, that settles that.”

Would you blame the researcher? I would not. We all know of the teacher who gives away a couple of mercy points to her students who are on the verge of failing. If two points are going to pass them, what’s wrong with giving the points away? Scientific Researchers are human beings, just as teachers are.

Blame the system; don’t blame the players.

Even though I am empathetic to the researchers, small manipulative mistakes do add up. In 2005, John Ioannidis published a paper titled “Why Most Published Research Findings Are False” (linked in the references section at the end of the essay).

In this paper, he critically showed how such minor rule-bending adds up to catastrophic systemic consequences. Science involving statistical methods (especially social science) has become less trustworthy over the years.

How to Really Avoid P-Value Hacking in Statistics?

The issue we are facing here is systemic. So, it makes sense that the solution(s) be systemic as well. Firstly, there seems to be an asymmetry as to how statistical results are being handled. Currently, most science publications are interested in publishing “statistically significant” studies/research.

Research with statistically insignificant results goes in the back-office drawer, only to be never seen again. This leads to two issues:

Firstly, the rest of the world does not know about the “failed” experiment/study. So, other researchers are bound to repeat the same experiment/study until the writing is on the wall for the entire community. This leads to the wastage of precious resources.

Secondly, researchers are incentivized to “cheat”. If a human being is punished for not p-value hacking and rewarded for p-value hacking, can you really blame the human being for his/her actions?

Publications need to start incentivizing statistically insignificant research results as well. This is not a silver bullet, but it is a start!

Fisher once wrote:

“A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”
– R.A. Fisher.

In short, even after a statistically significant result is published once, it could be purely by luck. So, publications also need to promote and publish follow-on research work that aims to reproduce results from first-time research. You’d be surprised to know how many studies are not reproducible or were purely due to luck.

Final Thoughts

To conclude, p-value hacking is rampant, and the situation is dire. However, the statistics community is already aware of the issue and is slowly taking action to resolve it.

As an end-user of statistical products in our data-driven world, it is immensely beneficial for you to be aware of the p-value hack. This enables you to ask the right questions before you make critical decisions!

References and credit: John Ioannidis and Jordon Ellenberg.

If you’d like to get notified when interesting content gets published here, consider subscribing.

Further reading that might interest you: Why Is The Hot Hand Fallacy Really A Fallacy? and How To Really Benefit From Curves Of Constant Width?

If you would like to support me as an author, consider contributing on Patreon.

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
CookieLawInfoConsent	1 year	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__gads	1 year 24 days	The __gads cookie, set by Google, is stored under DoubleClick domain and tracks the number of times users see an advert, measures the success of the campaign and calculates its revenue. This cookie can only be read from the domain they are set on and will not track any data while browsing through other sites.
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_R5WSNS3HKS	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_131795354_1	1 minute	Set by Google to distinguish users.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.

Cookie	Duration	Description
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.

Cookie	Duration	Description
_learn_press_session_a7b7f6513d11f58744fec86fbc57b116	2 days	No description
_wordpress_lp_guest	1 hour	No description
GoogleAdServingTest	session	No description

How To Really Avoid P-Value Hacking In Statistics?

The Statistical Method — A Recap

The Philosophy of Statistics — The Issue of Confidence Hacking

What is P-value Hacking?

How to Really Avoid P-Value Hacking in Statistics?

Final Thoughts

Explore humanity's most curious questions!

Sign up to receive more of our awesome content in your inbox!

Comments

Leave a Reply Cancel reply