How to detect fraud using science?

The distribution of first digits according to Benford's law. Each bar represents the percentage of numbers that start with that digit. The digit 1 occurs most often (30.1%), whereas the digit 9 occurs least often (4.6%). The bars decline logarithmically from 1 through 9.

Using science to detect fraud is nothing fancy. What is fancy is the awareness about the topic that I will be covering in this article: The law of anomalous numbers. This law is also known as Benford’s law or the Newcomb-Benford law. It is based on the following observation:

In many real-life sets of numerical data, the number ‘1’ appears as the leading significant digit about 30% of the time, while the number ‘9’ appears as the leading significant digit less than 5% of the time.

You would intuitively think that leading digits would be uniformly distributed. If they were, each digit would have a relative frequency of about 11%. This is not the case. In this article, I dive into why there is such a deviation from intuitive expectation. But before we dive into the analysis, let’s have a look at the history of this law.


The History of the Law of Anomalous Numbers

Simon Newcomb (1835–1909) — Image from Wikimedia Commons

The roots of this law lead back to the use of logarithms. Back in 1881, a Canadian-American astronomer named Simon Newcomb noticed that the front pages of logarithm tables were much more worn than the other pages. This led him to investigate the phenomenon.

He eventually concluded that ‘1’ occurs as the leading digit more often than any other digit. As he studied the phenomenon further, he discovered a logarithmic relationship between the digits 1 through 9 and their frequencies of occurrence as the leading digit.

Newcomb eventually proposed that the probability of a single digit ‘D’ being the first digit of a number equals log(D + 1) − log(D), with the logarithms taken to base 10.

Frank Benford (1883–1948) — Image from Wikimedia Commons

Fast forward to 1937, physicist Frank Benford discovered the same phenomenon and ended up testing it more rigorously. He tested the law with data sets that included the surface areas of 335 rivers, 3,259 US population figures, 104 physical constants, and 1,800 molecular weights, to name a few. In total, his data comprised 20,229 observations.

He then published his results in a paper titled “The Law of Anomalous Numbers” in 1938 (linked in the references at the end of the article). The law was later named after Benford, making it Benford’s law.

Where Does Benford’s Law Apply?

If you think this law sounds too good to be true, you are partly right: it cannot be applied everywhere. The trick to understanding where it applies is to look at its mathematical formulation.

P(D) = log₁₀(D + 1) − log₁₀(D) = log₁₀[(D + 1)/D]

You will notice that the law uses logarithms. That is our main clue. We use logarithms primarily to study phenomena that are multiplicative in nature. For more details, check out my article on the history of logarithms.
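To make the formula concrete, here is a minimal Python sketch that evaluates it for each digit from 1 through 9. The function name benford_probability is my own choice for illustration; the only assumption is that the logarithm is taken to base 10.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that d (1 through 9) is the leading digit under Benford's law."""
    return math.log10(d + 1) - math.log10(d)


# Expected share of each leading digit.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")

# The nine probabilities telescope to log10(10) - log10(1) = 1.
print("total:", round(sum(benford.values()), 6))
```

The digit 1 comes out at roughly 30.1% and the digit 9 at roughly 4.6%, which matches the bar chart at the top of the article.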

It turns out that Benford’s law works well when it is applied to data that span several orders of magnitude. An order of magnitude is roughly a factor of 10 (that is, 10¹). So, Benford’s law would work better if the data ranged from, say, 10⁰ to 10⁶ than if they ranged from, say, 100 to 999 (within one order of magnitude). There is no precise cut-off for how many orders of magnitude the data must span for Benford’s law to apply. In fact, researchers have developed rigorous statistical methods over the years to address this challenge. These methods are beyond the scope of this article.

Benford’s law tested graphically against a distribution of physical constants. The frequency of occurrence is plotted on the y axis, and the quantity of the constant is plotted on the x axis. The prediction from Benford’s law seems to fit the data well.
Benford’s law applied to physical constants — Image from Wikimedia Commons

Instead, what we shall do is take a look at a couple of examples where Benford’s law works well, and where it doesn’t.

Data distributions that are likely to obey Benford’s law:

1. Numbers that follow multiplicative distributions (such as power-law) — for example, stock prices, sales numbers, population numbers of countries, etc.

2. Distributions where the mean is greater than the median and the skew is positive.

Data distributions that are NOT likely to obey Benford’s law:

1. When numbers are sequentially assigned — for example, cheque numbers, invoice numbers, etc.

2. When numbers are influenced by human psychology — for example, item prices at $1.99, etc.

3. Data that are normally distributed, for example, human height, weight, etc.
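The contrast between these two lists can be sketched with synthetic data. The three data sets below are made-up stand-ins, not real measurements: values spread evenly on a logarithmic scale across six orders of magnitude, normally distributed "heights" (mean 170 cm and standard deviation 10 cm are my assumptions), and sequential invoice numbers from 100 to 999. The helper names are hypothetical.

```python
import random
from collections import Counter


def leading_digit(x):
    """First significant digit of a positive number."""
    # str() gives a short exact representation; stripping leading zeros
    # and the decimal point leaves the first significant digit in front.
    return int(str(x).lstrip("0.")[0])


def digit_shares(values):
    """Share of values starting with each digit 1 through 9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


rng = random.Random(42)  # fixed seed so the run is repeatable

datasets = {
    "wide": [10 ** rng.uniform(0, 6) for _ in range(50_000)],  # 10^0 to 10^6
    "heights": [rng.gauss(170, 10) for _ in range(50_000)],    # roughly normal
    "invoices": list(range(100, 1000)),                        # sequential
}

shares = {name: digit_shares(values) for name, values in datasets.items()}

for name, s in shares.items():
    row = "  ".join(f"{d}:{s[d]:.1%}" for d in range(1, 10))
    print(f"{name:>9}  {row}")
```

Under these assumptions, only the wide-ranging data should land near the Benford shares. The heights almost all start with 1 simply because they cluster between 100 and 199 cm, and the invoice numbers are spread evenly across all nine digits.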


Why Does Benford’s Law Work?

It is challenging to get into a deeply technical explanation without making things boring. So, I’ll save you the misery and simplify the explanation as follows:

Benford’s law works because numerous real-world phenomena grow geometrically (that is, multiplicatively), whereas human perception is linear in nature.

Too many confusing words? Let us look at an example. Assume that the population of an arbitrary country grows geometrically. Its growth curve would then look like this:

A graph with population in millions on the y axis and year on the x axis. The population growth is exponential.
Image created by the author

Due to the exponential nature of the curve, the number of years spent on the number ‘1’ as the leading digit (marked by the green ‘a’) would be significantly higher than the years spent on the number ‘9’ as the leading digit (marked by the red ‘b’). This is precisely what Benford’s law enables us to calculate and predict.

Due to the exponential growth of the population, the digit 1 is the leading digit for many more years than the digit 9. This is marked using green and red lines respectively.
Image created by the author
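The population example can be checked with a short simulation. This is only a sketch under made-up assumptions: a starting population of one million, a constant growth of 3% per year, and a deliberately unrealistic horizon of 5,000 years so that the population passes through many orders of magnitude.

```python
from collections import Counter


def leading_digit(x):
    """First significant digit of a positive number."""
    return int(str(x).lstrip("0.")[0])


population = 1_000_000.0  # hypothetical starting population
growth_factor = 1.03      # hypothetical growth of 3% per year
years = 5_000             # long horizon, chosen only to cover many decades

years_with_digit = Counter()
for _ in range(years):
    # Record which digit leads the population figure this year.
    years_with_digit[leading_digit(population)] += 1
    population *= growth_factor

share = {d: years_with_digit[d] / years for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: {share[d]:.1%} of the years")
```

Going from 1 million to 2 million requires doubling, whereas going from 9 million to 10 million requires growing by only about 11%. At a constant growth rate, the population therefore spends far more years with a leading 1 than with a leading 9, and the shares end up close to Benford's prediction.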

This also explains why Benford’s law is a bad fit for phenomena that follow normal distributions. In general, phenomena that do not grow multiplicatively or span several orders of magnitude tend to fit Benford’s law poorly.

With this over-simplified explanation, I am probably not doing the technical work behind the law much justice. But hey, consider this an introduction to the topic.


How To Detect Fraud?

Benford’s law has been historically employed to detect fraud in taxation and financial transactions. Whenever numbers are cooked by people, there is likely to be a deviation from Benford’s law’s prediction. Particularly useful to detect fraud are digits that occur in the middle of the distribution (like 2, 3, 4, 5, etc.) rather than the ones that occur at the extremes (like 1 and 9).

The reason for this expected deviation is that human thinking tends to contain arbitrary hidden bounds. Do you remember our pricing example from earlier? While a human being may have a psychological attachment to the price of $1.99, Benford’s law and nature don’t.

A useful property of Benford’s law in this regard is that it is scale-invariant: regardless of the scale or unit of measurement we use, Benford’s law remains applicable. The law also carries over to number systems with bases other than 10; the logarithm is then simply taken in that base.
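Scale invariance can be illustrated by rescaling the same synthetic data and comparing the leading-digit shares before and after. Everything here is a stand-in: the amounts are randomly generated across six orders of magnitude, and the scale factors (including the 0.92 "exchange rate") are hypothetical values picked for illustration.

```python
import math
import random
from collections import Counter


def leading_digit(x):
    """First significant digit of a positive number."""
    return int(str(x).lstrip("0.")[0])


def digit_shares(values):
    """Share of values starting with each digit 1 through 9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

rng = random.Random(7)  # fixed seed so the run is repeatable
amounts = [10 ** rng.uniform(0, 6) for _ in range(50_000)]

# Hypothetical unit changes: none, a currency conversion, inches to centimetres.
scale_factors = {"original": 1.0, "currency": 0.92, "inch_to_cm": 2.54}

max_gap = {}
for label, factor in scale_factors.items():
    shares = digit_shares([a * factor for a in amounts])
    # Largest absolute difference from the Benford share over all nine digits.
    max_gap[label] = max(abs(shares[d] - benford[d]) for d in range(1, 10))
    print(f"{label:>10}: largest gap from Benford = {max_gap[label]:.4f}")
```

If the data follow Benford's law in one unit, the gaps should stay small in every other unit as well, which is what makes the law usable on financial figures regardless of currency.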

Having said this, one has to be very cautious when employing Benford’s law in general. Rigorous statistical methods (beyond this article’s scope) have to be employed to ascertain both the applicability of the law and its fit to the data under scrutiny. There have been reported cases of false conclusions drawn after poor statistical analysis of election data using Benford’s law as the basis. So, you have been warned.
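As a rough first screen, and not a substitute for the rigorous methods mentioned above, one common approach is a chi-square goodness-of-fit statistic on the leading digits. The sketch below is my own illustration: the "genuine" figures are a geometric sequence standing in for Benford-like data, the "invented" figures are random three-digit amounts, and 15.507 is the standard chi-square critical value for 8 degrees of freedom at the 5% level.

```python
import math
import random
from collections import Counter

CHI2_CRITICAL_DF8_5PCT = 15.507  # chi-square critical value, 8 d.o.f., 5% level


def leading_digit(x):
    """First significant digit of a positive number."""
    return int(str(x).lstrip("0.")[0])


def benford_chi_square(values):
    """Chi-square statistic of the leading digits against Benford's law."""
    observed = Counter(leading_digit(v) for v in values)
    n = len(values)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat


# Stand-in for genuine figures: steady 7% multiplicative growth.
genuine = [100 * 1.07 ** k for k in range(2_000)]

# Stand-in for invented figures: amounts picked evenly between 100 and 999.
rng = random.Random(1)
invented = [rng.randint(100, 999) for _ in range(2_000)]

stat_genuine = benford_chi_square(genuine)
stat_invented = benford_chi_square(invented)

for label, stat in [("genuine", stat_genuine), ("invented", stat_invented)]:
    verdict = "deviates" if stat > CHI2_CRITICAL_DF8_5PCT else "is consistent"
    print(f"{label}: chi-square = {stat:.1f}, {verdict} with Benford's law")
```

A large statistic only says that the digits deviate from Benford's prediction. It does not prove fraud, and it means little if the data were never expected to follow the law in the first place.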

Final Thoughts

Detecting fraud aside, Benford’s law finds a wide range of applications. This is because many natural phenomena are geometric in nature. As time flows, things grow and decay all the time.

In the field of mathematics, the Fibonacci numbers, the factorials, and the powers of almost any number are known to obey Benford’s law.
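This is easy to check for yourself. The sketch below counts the leading digits of the first 1,000 Fibonacci numbers and the first 1,000 powers of 2; the sample size of 1,000 is an arbitrary choice of mine.

```python
import math
from collections import Counter


def fibonacci_numbers(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    numbers = []
    a, b = 1, 1
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers


def digit_shares(integers):
    """Share of positive integers starting with each digit 1 through 9."""
    counts = Counter(int(str(n)[0]) for n in integers)
    return {d: counts[d] / len(integers) for d in range(1, 10)}


fib_shares = digit_shares(fibonacci_numbers(1_000))
pow_shares = digit_shares([2 ** k for k in range(1, 1_001)])

print("digit  Benford  Fibonacci  powers of 2")
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d:>5}  {expected:7.1%}  {fib_shares[d]:9.1%}  {pow_shares[d]:11.1%}")
```

Both sequences grow multiplicatively, so their leading digits should track the Benford column closely.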

All of this makes Benford’s law both a fascinating phenomenon and a useful tool to work with. With a global pandemic on the loose and a data-driven world coming of age, Benford’s law is becoming all the more relevant by the day!


