Using science to detect fraud is nothing new. What is less common is awareness of the topic I will cover in this article: the law of anomalous numbers, also known as Benford's law or the Newcomb-Benford law. It is based on the following observation:
In many real-life sets of numerical data, the digit "1" appears as the leading significant digit about 30% of the time, while the digit "9" appears as the leading significant digit less than 5% of the time.
Intuitively, you might expect leading digits to be uniformly distributed. If they were, each of the nine possible leading digits (1 through 9) would appear about 11% of the time. This is not the case. In this article, I dive into why reality deviates from this intuitive expectation. But before we get to the analysis, let's look at the history of this law.
The History of the Law of Anomalous Numbers
The roots of this law go back to the use of logarithm tables. In 1881, a Canadian-American astronomer named Simon Newcomb noticed that the early pages of logarithm tables (those covering numbers that start with 1) were much more worn out than the later pages. This led him to investigate the phenomenon.
He concluded that numbers with "1" as the leading digit occur relatively more often than numbers starting with other digits. As he studied the phenomenon further, he discovered a logarithmic relationship between the digits 1 through 9 and their frequencies of occurrence as the leading digit.
Newcomb eventually proposed a law stating that the probability of a single digit $D$ being the first digit of a number equals $\log_{10}(D + 1) - \log_{10}(D)$.
Fast forward to 1937: physicist Frank Benford rediscovered the same phenomenon and tested it more rigorously. His data sets included, among others, the surface areas of 335 rivers, 3,259 US population figures, 104 physical constants, and 1,800 molecular weights. In total, he analyzed 20,229 observations.
He then published his results in a paper titled "The Law of Anomalous Numbers" in 1938 (linked in the references at the end of the article). The law was later named after Benford, making it Benford's law.
Where Does Benford's Law Apply?
If this law sounds too good to be true, that's because, in a sense, it is: it cannot be applied everywhere. The trick to understanding Benford's law is to look at its mathematical formulation:
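$$P(D) = \log_{10}(D + 1) - \log_{10}(D) = \log_{10}\left(\frac{D + 1}{D}\right) = \log_{10}\left(1 + \frac{1}{D}\right), \qquad D \in \{1, 2, \dots, 9\}$$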
You will notice that the law uses logarithms. That is our main clue. We use logarithms primarily to study phenomena that are multiplicative in nature. For more details, check out my article on the history of logarithms.
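To make the formula concrete, here is a minimal Python sketch (my own illustration, not code from Newcomb or Benford) that evaluates P(D) for every possible leading digit. Because the terms telescope, log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10), the nine probabilities add up to exactly 1.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit is d under Benford's law (base 10)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
# Prints 30.1% for digit 1, falling steadily to 4.6% for digit 9.

# The sum telescopes to log10(10) = 1
print(f"total: {sum(expected.values()):.6f}")
```

Note how quickly the probabilities fall off: "1" is more than six times as likely as "9".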
It turns out that Benford's law works well when it is applied to data that span several orders of magnitude. An order of magnitude is a factor of 10 (10¹). So Benford's law would fit better if the data ranged from, say, 10⁰ to 10⁶ than if they ranged from, say, 100 to 999 (within a single order of magnitude). There is no precise cut-off for how many orders of magnitude a data set must span for Benford's law to apply. In fact, researchers have developed rigorous statistical methods over the years to address this challenge. These methods are beyond the scope of this article.
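The effect of spanning several orders of magnitude can be illustrated with two synthetic data sets. The setup is my own assumption, not real data: the "wide" set draws its base-10 exponent uniformly between 0 and 6, so its values run from 10⁰ to 10⁶ (a construction that follows Benford's law by design), while the "narrow" set is uniform between 100 and 999. The seed and sample size are arbitrary.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    # Scientific notation starts with the first significant digit: 3.14e+02 -> 3
    return int(f"{x:e}"[0])

def digit_frequencies(values):
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(42)
N = 100_000

# Spans six orders of magnitude: the exponent is uniform on [0, 6]
wide = [10 ** rng.uniform(0, 6) for _ in range(N)]
# Stays within one order of magnitude: uniform on [100, 999]
narrow = [rng.uniform(100, 999) for _ in range(N)]

wide_freq = digit_frequencies(wide)
narrow_freq = digit_frequencies(narrow)

for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}: Benford {benford:.3f}  wide {wide_freq[d]:.3f}  narrow {narrow_freq[d]:.3f}")
```

The wide set tracks Benford's prediction closely, whereas every digit in the narrow set lands near 1/9, roughly 0.111.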
Let's now look at a couple of examples where Benford's law works well, and where it doesn't.
Data distributions that are likely to obey Benford's law:
1. Numbers generated by multiplicative processes (such as power-law distributions): for example, stock prices, sales figures, populations of countries, etc.
2. Distributions where the mean is greater than the median and the skew is positive.
Data distributions that are NOT likely to obey Benford's law:
1. Numbers that are assigned sequentially: for example, cheque numbers, invoice numbers, etc.
2. Numbers influenced by human psychology: for example, item prices set at $1.99, etc.
3. Data that are normally distributed: for example, human height, weight, etc.
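Two of these counterexamples are easy to check. Both data sets in this sketch are hypothetical: invoice numbers 10000 to 12999 issued one after another, and adult heights drawn from a normal distribution with an assumed mean of 170 cm and standard deviation of 10 cm (the seed is arbitrary).

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    # Scientific notation starts with the first significant digit: 1.72e+02 -> 1
    return int(f"{float(x):e}"[0])

def digit_frequencies(values):
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Hypothetical invoice numbers, assigned sequentially
invoices = list(range(10_000, 13_000))

# Hypothetical adult heights in cm: normal with mean 170 and standard deviation 10
rng = random.Random(7)
heights = [rng.gauss(170, 10) for _ in range(10_000)]

invoice_freq = digit_frequencies(invoices)
height_freq = digit_frequencies(heights)

for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}: Benford {benford:.3f}  invoices {invoice_freq[d]:.3f}  heights {height_freq[d]:.3f}")
```

Every invoice number and almost every height starts with "1", because neither data set spreads across orders of magnitude.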
Why Does Benford's Law Work?
It is hard to give a deeply technical explanation without making things boring. So I'll spare you the misery and simplify the explanation as follows:
Benford's law works because numerous real-world phenomena grow or vary geometrically (that is, multiplicatively), whereas human perception is linear in nature.
Too many confusing words? Let us look at an example. Assume that the population of an arbitrary country grows geometrically. Consequently, it would follow a growth curve as follows:
Due to the exponential nature of the curve, the number of years spent with "1" as the leading digit (marked by the green "a") would be significantly higher than the number of years spent with "9" as the leading digit (marked by the red "b"). This is precisely what Benford's law enables us to calculate and predict.
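The population example can be simulated in a few lines. All numbers here are assumptions chosen for illustration: a hypothetical country starting with 1,000,000 people and growing at a constant 3% per year for 780 years, which carries the population across roughly ten orders of magnitude. The sketch counts how many years the population spends with each leading digit.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    # Scientific notation starts with the first significant digit
    return int(f"{x:e}"[0])

population = 1_000_000.0   # hypothetical starting population
growth_rate = 0.03         # hypothetical constant growth per year
years = 780                # long enough to cross about ten orders of magnitude

years_per_digit = Counter()
for _ in range(years):
    years_per_digit[leading_digit(population)] += 1
    population *= 1 + growth_rate

for d in range(1, 10):
    share = years_per_digit[d] / years
    print(f"{d}: {years_per_digit[d]:3d} years ({share:.1%}), Benford {math.log10(1 + 1 / d):.1%}")
```

Going from 1 to 2 means doubling, while going from 9 to 10 needs only about 11% growth, so the population lingers far longer on "1" than on "9".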
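This also explains why Benford's law is a poor fit for phenomena that follow normal distributions: their values cluster around the mean within a narrow range instead of spreading over several orders of magnitude. In general, any phenomenon that does not follow a geometric (multiplicative) pattern is likely to be a poor fit for Benford's law.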
With this oversimplified explanation, I am probably not doing the technical work behind the law much justice. But hey, consider this an introduction to the topic.
How To Detect Fraud?
Benford's law has historically been employed to detect fraud in taxation and financial transactions. Whenever people cook the numbers, the data are likely to deviate from what Benford's law predicts. Particularly useful for detecting fraud are the digits in the middle of the distribution (like 2, 3, 4, 5, etc.) rather than those at the extremes (like 1 and 9).
The reason for this expected deviation is that human thinking is likely to contain arbitrary, hidden bounds. Do you remember our pricing example from earlier? While a human being may have a psychological attachment to a price of $1.99, Benford's law and nature don't.
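As a first, rough screen, one can compare observed leading-digit counts with Benford's expectation using Pearson's chi-square goodness-of-fit statistic. This sketch is my own illustration, not a forensic procedure: the "invented" amounts are hypothetical figures drawn uniformly between 100 and 999 as a crude stand-in for made-up numbers, the powers of 2 serve as a Benford-conforming baseline, and 15.507 is the standard table value of the chi-square distribution with 8 degrees of freedom at the 5% significance level. A large statistic only flags data for closer inspection; it does not prove fraud.

```python
import math
import random
from collections import Counter

# Chi-square critical value: 8 degrees of freedom (9 digits - 1), 5% significance level
CRITICAL_VALUE = 15.507

def leading_digit(x) -> int:
    # Exact for Python ints of any size; scientific notation for floats
    if isinstance(x, int):
        return int(str(x)[0])
    return int(f"{x:e}"[0])

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of leading-digit counts against Benford's law."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    n = sum(counts.values())
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Baseline: powers of 2, which follow Benford's law closely
powers_of_two = [2 ** k for k in range(1000)]

# Hypothetical "cooked" figures: a crude stand-in for made-up amounts
rng = random.Random(1)
invented = [rng.randint(100, 999) for _ in range(1000)]

for name, data in [("powers of 2", powers_of_two), ("invented", invented)]:
    stat = benford_chi_square(data)
    verdict = "deviates from" if stat > CRITICAL_VALUE else "is consistent with"
    print(f"{name}: chi-square = {stat:.1f}, {verdict} Benford's law at the 5% level")
```

This test looks at all nine digits at once; focusing on the middle digits, as suggested above, would need a digit-by-digit comparison instead.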
A useful property of Benford's law in this regard is that it is scale-invariant: regardless of the scale or unit of measurement we use, Benford's law remains applicable. The law also generalizes to number bases other than 10: in base $b$, the predicted probability of leading digit $D$ is $\log_b(1 + 1/D)$.
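Scale invariance can be illustrated numerically. In this sketch, the "distances in miles" are a hypothetical, synthetic data set (log-uniform over six orders of magnitude, an assumption chosen because such data follow Benford's law); converting them to kilometres and to feet rescales every value, yet the leading-digit frequencies barely move.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    # Scientific notation starts with the first significant digit
    return int(f"{x:e}"[0])

def digit_frequencies(values):
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(2024)
miles = [10 ** rng.uniform(0, 6) for _ in range(100_000)]  # hypothetical distances
datasets = {
    "miles": miles,
    "kilometres": [m * 1.609344 for m in miles],  # 1 mile = 1.609344 km
    "feet": [m * 5280 for m in miles],            # 1 mile = 5280 ft
}

for unit, values in datasets.items():
    freq = digit_frequencies(values)
    worst = max(abs(freq[d] - math.log10(1 + 1 / d)) for d in range(1, 10))
    print(f"{unit:>10}: P(1) = {freq[1]:.3f}, largest gap to Benford = {worst:.3f}")
```

Multiplying by a constant only shifts every logarithm by the same amount, which leaves the spread of leading digits intact for data like these.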
Having said this, one has to be very cautious when applying Benford's law in general. Rigorous statistical methods (beyond this article's scope) are needed to establish both whether the law applies and how well it fits the data under scrutiny. There have been reported cases of false conclusions drawn from poor statistical analysis of election data based on Benford's law. So, you have been warned.
Final Thoughts
Fraud detection aside, Benford's law has a wide range of applications. This is because many natural phenomena are geometric in nature: as time flows, things grow and decay all the time.
In mathematics, the Fibonacci numbers, the factorials, and the powers of almost any number are known to obey Benford's law.
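The claim about Fibonacci numbers and factorials is easy to check with Python's arbitrary-precision integers. This sketch looks at the first 1,000 of each, a cut-off I chose arbitrarily; reading the leading digit from the decimal string avoids any floating-point rounding.

```python
import math
from collections import Counter

def first_digit_shares(numbers):
    """Share of each leading digit 1..9 among positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

def fibonacci(count):
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b

def factorials(count):
    value = 1
    for n in range(1, count + 1):
        value *= n
        yield value

fib_shares = first_digit_shares(fibonacci(1000))
fact_shares = first_digit_shares(factorials(1000))

print("digit  Benford  Fibonacci  factorials")
for d in range(1, 10):
    print(f"{d:>5}  {math.log10(1 + 1 / d):7.3f}  {fib_shares[d]:9.3f}  {fact_shares[d]:10.3f}")
```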
All of this makes Benford's law both a fascinating phenomenon and a useful tool. With a global pandemic on the loose and a data-driven world coming of age, Benford's law is becoming more relevant by the day!
I hope you found this article interesting and useful.