How To Use Science To Detect Fraud?

The distribution of first digits according to Benford's law. Each bar represents the percentage of numbers that start with that digit. The digit 1 occurs most often (30.1%), whereas the digit 9 occurs least often (4.6%). The bars decrease logarithmically from 1 through 9.

Using science to detect fraud is nothing fancy. What deserves more awareness is the topic of this article: the law of anomalous numbers, also known as Benford’s law or the Newcomb-Benford law. It is based on the following observation:

In many real-life sets of numerical data, the number ‘1’ appears as the leading significant digit about 30% of the time, while the number ‘9’ appears as the leading significant digit less than 5% of the time.

You would intuitively think that leading digits would be uniformly distributed. If they were, each digit would have a relative frequency of about 11%. This is not the case. In this article, I dive into why there is such a deviation from intuitive expectation. But before we dive into the analysis, let’s have a look at the history of this law.

This essay is supported by Generatebg

The History of the Law of Anomalous Numbers

Simon Newcomb (1835–1909). Image from Wikimedia Commons.

The roots of this law lead back to the use of logarithms. Back in 1881, a Canadian-American astronomer named Simon Newcomb noticed that the first pages of logarithm tables were much more worn than the later pages. This led him to investigate the phenomenon.

He eventually worked out that ‘1’ occurs as the leading digit relatively more often than the other digits. As he studied the phenomenon further, he discovered a logarithmic relationship between the digits 1 through 9 and their frequencies of occurrence as the leading digit.

Newcomb eventually proposed a law stating that the probability of a single digit ‘D’ being the first digit of a number equals log₁₀(D + 1) − log₁₀(D).

Frank Benford (1883–1948). Image from Wikimedia Commons.

Fast forward to 1937: physicist Frank Benford discovered the same phenomenon and ended up testing it more rigorously. He tested the law with data sets that included the surface areas of 335 rivers, the sizes of 3,259 US populations, 104 physical constants, and 1,800 molecular weights, to name a few. In total, he collected 20,229 observations.

He then published his results in a paper titled “The Law of Anomalous Numbers” in 1938 (linked in the references at the end of the article). The law was later named after Benford, making it Benford’s law.

Where Does Benford’s Law Apply?

If you thought this law sounds too good to be true, you are partly right: it cannot be applied everywhere. The trick to understanding where Benford’s law applies is to look at its mathematical formulation.

P(D) = log₁₀(D + 1) − log₁₀(D) = log₁₀[(D + 1)/D]
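To make the formula concrete, here is a minimal Python sketch that evaluates it for each leading digit, taking the logarithm to base 10. The function name `benford_probability` is just an illustrative choice.

```python
import math


def benford_probability(digit: int) -> float:
    """Predicted probability that `digit` (1 to 9) is the leading digit."""
    # log10(D + 1) - log10(D) is the same as log10(1 + 1/D).
    return math.log10(1 + 1 / digit)


# Probability for every possible leading digit.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

The printed values start at 30.1% for the digit 1 and fall to 4.6% for the digit 9, and the nine probabilities add up to 1.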

You will notice that the law uses logarithms. That is our main clue. We use logarithms primarily to study phenomena that are multiplicative in nature. For more details, check out my article on the history of logarithms.

It turns out that Benford’s law works well when it is applied to data that span several orders of magnitude. One order of magnitude corresponds to a factor of 10. So, Benford’s law would work better if the data ranged from, say, 10⁰ to 10⁶ than if they ranged from, say, 100 to 999 (within one order of magnitude). There is no precise cut-off in orders of magnitude that predicts whether Benford’s law applies. Researchers have developed rigorous statistical methods over the years to address this challenge, but these are beyond the scope of this article.

Benford’s law applied to physical constants. Benford’s law is tested graphically against a distribution of physical constants. The frequency of occurrence is plotted on the y-axis, and the quantity of the constant is plotted on the x-axis. The prediction from Benford’s law seems to fit the data well. Image from Wikimedia Commons.

Instead, what we shall do is take a look at a couple of examples where Benford’s law works well, and where it doesn’t.

Data distributions that are likely to obey Benford’s law:

1. Numbers that follow multiplicative distributions (such as power-law) — for example, stock prices, sales numbers, population numbers of countries, etc.

2. Distributions where the mean is greater than the median and the skew is positive.

Data distributions that are NOT likely to obey Benford’s law:

1. When numbers are sequentially assigned — for example, cheque numbers, invoice numbers, etc.

2. When numbers are influenced by human psychology — for example, item prices at $1.99, etc.

3. Data that follow a normal distribution, for example, human height, weight, etc.
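The contrast between the two lists can be illustrated with synthetic data. The sketch below is only an illustration under stated assumptions: `wide` is a made-up data set spread evenly on a logarithmic scale across six orders of magnitude, and `narrow` stands in for sequentially assigned three-digit numbers (hypothetical invoice numbers from 100 to 999). Neither comes from real records.

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def first_digit_shares(values):
    """Share of values that start with each digit from 1 to 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Assumption: synthetic values spaced evenly on a log scale from 10^0 up to just under 10^6.
wide = [10 ** (6 * i / 6000) for i in range(6000)]

# Assumption: sequentially assigned numbers within a single order of magnitude.
narrow = list(range(100, 1000))

wide_shares = first_digit_shares(wide)
narrow_shares = first_digit_shares(narrow)

for d in range(1, 10):
    print(f"{d}: wide {wide_shares[d]:.1%}   narrow {narrow_shares[d]:.1%}")
```

The wide data set leans heavily towards small leading digits, while the sequential numbers give every digit the same share of about 11%.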

Why Does Benford’s Law Work?

It is challenging to get into a deeply technical explanation without making things boring. So, I’ll save you the misery and simplify the explanation as follows:

Benford’s law works because numerous real-world phenomena grow geometrically (that is, multiplicatively), whereas human perception is linear in nature.

Too many confusing words? Let us look at an example. Assume that the population of an arbitrary country grows geometrically. Its growth curve would then look like this:

A graph with population in millions on the y-axis and year on the x-axis. The population growth is exponential. Image created by the author.

Due to the exponential nature of the curve, the number of years spent with ‘1’ as the leading digit (marked by the green ‘a’) would be significantly higher than the number of years spent with ‘9’ as the leading digit (marked by the red ‘b’). To get from 1 million to 2 million, the population has to double; to get from 9 million to 10 million, it only has to grow by about 11%. This is precisely what Benford’s law enables us to calculate and predict.

Due to the exponential growth of the population, the digit 1 remains the leading digit for many more years than the digit 9. The two spans are marked with green and red lines respectively. Image created by the author.

This also explains why Benford’s law is a bad fit for phenomena that follow normal distributions. In general, a phenomenon that does not grow or spread multiplicatively is likely to be a bad fit for Benford’s law.
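The population example can be checked with a small simulation. This is a rough sketch, not a demographic model: the starting population, the 5% yearly growth rate, and the 2000-year horizon are all assumptions picked only so that the population crosses many powers of ten.

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


START_POPULATION = 1_000_000  # assumed starting population
GROWTH_RATE = 0.05            # assumed growth of 5% per year
YEARS = 2000                  # long horizon so that many powers of ten are crossed

years_per_digit = Counter()
population = float(START_POPULATION)
for _ in range(YEARS):
    # Record which leading digit the population has this year, then grow it.
    years_per_digit[leading_digit(population)] += 1
    population *= 1 + GROWTH_RATE

for d in range(1, 10):
    share = years_per_digit[d] / YEARS
    print(f"{d}: {years_per_digit[d]} years ({share:.1%})")
```

At 5% growth, getting from 1 million to 2 million takes about 14 years, while getting from 9 million to 10 million takes about 2 years. Repeated over many powers of ten, this is why the share of years with a leading 1 ends up close to 30%.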

With this over-simplified explanation, I am probably not doing the technical work behind the law much justice. But hey, consider this an introduction to the topic.

How To Detect Fraud?

Benford’s law has been historically employed to detect fraud in taxation and financial transactions. Whenever numbers are cooked by people, there is likely to be a deviation from Benford’s law’s prediction. Particularly useful to detect fraud are digits that occur in the middle of the distribution (like 2, 3, 4, 5, etc.) rather than the ones that occur at the extremes (like 1 and 9).

The reason for this expected deviation is that human thinking processes are likely to contain arbitrary, hidden bounds. Do you remember our pricing example from earlier? While a human being may have a psychological attachment to the price of $1.99, Benford’s law and nature don’t.

A useful property of Benford’s law in this regard is that it is scale-invariant. Regardless of what scale or measuring unit we use, Benford’s law will remain applicable. The law also carries over to number systems with other bases: in base b, the logarithm in the formula is simply taken to base b.
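Scale invariance can be illustrated with a quick experiment. The sketch below assumes a stand-in data set (the first 600 powers of 3, since powers of a number tend to follow the law) and multiplies it by a few unit-conversion factors. The data and the choice of conversions are hypothetical and serve only to show that rescaling barely moves the first-digit shares.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def max_deviation_from_benford(values) -> float:
    """Largest gap between an observed first-digit share and Benford's prediction."""
    counts = Counter(leading_digit(v) for v in values)
    total = len(values)
    return max(
        abs(counts.get(d, 0) / total - math.log10(1 + 1 / d))
        for d in range(1, 10)
    )


# Assumption: powers of 3 serve as a stand-in for multiplicatively generated data.
base_values = [3.0 ** k for k in range(1, 601)]

# Hypothetical unit conversions applied to the same data.
factors = {
    "original units": 1.0,
    "inches to centimetres": 2.54,
    "feet to metres": 0.3048,
    "miles to metres": 1609.344,
}

deviations = {
    name: max_deviation_from_benford([v * factor for v in base_values])
    for name, factor in factors.items()
}

for name, deviation in deviations.items():
    print(f"{name}: largest deviation {deviation:.2%}")
```

For every conversion, the largest deviation from the predicted shares stays small, which is what scale invariance suggests.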

Having said this, one has to be very cautious when employing Benford’s law in general. Rigorous statistical methods (beyond this article’s scope) have to be employed to ascertain both the applicability as well as fit to the data under scrutiny. There have been reported cases of false conclusions drawn after poor statistical analysis of election data using Benford’s law as the basis. So, you have been warned.
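As a first, deliberately simple check of fit, one could compare observed first-digit counts with Benford’s prediction using Pearson’s chi-square statistic. The sketch below is not one of the rigorous methods mentioned above, and it is not a fraud verdict. Both data sets are synthetic stand-ins: `organic` is the first 1000 powers of 2, and `suspicious` is a block of evenly spread three-digit integers. The threshold 15.507 is the chi-square critical value for 8 degrees of freedom at the 5% level.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% significance level.
CHI2_CRITICAL_DF8_5PCT = 15.507


def chi_square_vs_benford(values) -> float:
    """Pearson chi-square statistic of first-digit counts against Benford's law.

    Assumes `values` are positive integers, so the first character of the
    decimal string is the leading digit.
    """
    counts = Counter(int(str(v)[0]) for v in values)
    total = len(values)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts.get(d, 0) - expected) ** 2 / expected
    return statistic


# Assumption: powers of 2 stand in for multiplicatively generated figures.
organic = [2 ** k for k in range(1, 1001)]

# Assumption: evenly spread three-digit integers stand in for invented figures.
suspicious = list(range(100, 1000))

organic_stat = chi_square_vs_benford(organic)
suspicious_stat = chi_square_vs_benford(suspicious)

print(f"organic:    {organic_stat:.2f}")
print(f"suspicious: {suspicious_stat:.2f}")
print(f"threshold:  {CHI2_CRITICAL_DF8_5PCT}")
```

A statistic above the threshold only says that the first digits differ from Benford’s prediction. It does not say why, and the chi-square test becomes very sensitive on large samples, so a real analysis would first establish that the law is expected to apply to the data at all.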

Final Thoughts

Detecting fraud aside, Benford’s law finds a wide range of applications. This is because many natural phenomena are geometric in nature. As time flows, things grow and decay all the time.

In the field of mathematics, the Fibonacci numbers, the factorials, and the powers of almost any number are known to obey Benford’s law.
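The Fibonacci claim is easy to probe numerically. The following sketch counts the leading digits of the first 1000 Fibonacci numbers and compares them with the predicted shares; the cut-off of 1000 is an arbitrary choice.

```python
import math
from collections import Counter


def fibonacci(n: int) -> list:
    """First n Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    numbers = []
    a, b = 1, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers


fib = fibonacci(1000)

# Python integers are exact, so the first character of the string is the leading digit.
counts = Counter(int(str(f)[0]) for f in fib)
fib_shares = {d: counts.get(d, 0) / len(fib) for d in range(1, 10)}

for d in range(1, 10):
    predicted = math.log10(1 + 1 / d)
    print(f"{d}: observed {fib_shares[d]:.1%}   predicted {predicted:.1%}")
```

The observed shares track the predicted ones closely for every digit.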

All of this makes Benford’s law both a fascinating phenomenon as well as a useful tool to work with. With a global pandemic on the loose and a data-driven world coming of age, Benford’s law is becoming all the more relevant by the day!

I hope you found this article interesting and useful. If you’d like to get notified when interesting content gets published here, consider subscribing.

Further reading that might interest you: Is Zero Really Even Or Odd? and How Much String Would You Need To Wrap The Earth?

