Using science to detect fraud is nothing fancy. What is less common is awareness of the topic I will cover in this article: the law of anomalous numbers, also known as Benford's law or the Newcomb-Benford law. It is based on the following observation:
In many real-life sets of numerical data, the number "1" appears as the leading significant digit about 30% of the time, while the number "9" appears as the leading significant digit less than 5% of the time.
You might intuitively expect leading digits to be uniformly distributed. If they were, each of the nine possible digits would appear about 11% of the time (1/9). This is not the case. In this article, I dive into why reality deviates so much from this intuitive expectation. But first, let's have a look at the history of this law.
The roots of this law trace back to the use of logarithms. In 1881, the Canadian-American astronomer Simon Newcomb noticed that the first pages of logarithm tables were much more worn out than the later ones. This led him to investigate the phenomenon.
He concluded that numbers with "1" as the leading digit were being looked up more often than others. As he studied the phenomenon further, he discovered a logarithmic relationship between the digits 1 through 9 and how often each occurs as the leading digit.
Newcomb proposed a law stating that the probability of a single digit D being the first digit of a number is $\log_{10}(D + 1) - \log_{10}(D)$.
Fast forward to 1937: physicist Frank Benford rediscovered the same phenomenon and tested it more rigorously. His data sets included, to name a few, the surface areas of 335 rivers, 3,259 US population figures, 104 physical constants, and 1,800 molecular weights. In total, he collected 20,229 observations.
He published his results in 1938 in a paper titled "The Law of Anomalous Numbers" (linked in the references at the end of the article). The law was later named after him: Benford's law.
Where Does Benford's Law Apply?
If this law sounds too good to be true, you're partly right: it cannot be applied everywhere. The trick to understanding where Benford's law applies is to look at its mathematical formulation:
$$P(D) = \log_{10}(D + 1) - \log_{10}(D) = \log_{10}\left(\frac{D + 1}{D}\right)$$
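To make the formula concrete, here is a minimal Python sketch that evaluates P(D) for each possible leading digit. The function name `benford_probability` is just an illustrative choice, not an established API; the logarithm is base 10, as in the formula.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that d (1-9) is the leading digit under Benford's law (base 10)."""
    if not 1 <= d <= 9:
        raise ValueError("the leading digit must be between 1 and 9")
    # log10(d + 1) - log10(d) simplifies to log10(1 + 1/d)
    return math.log10(1 + 1 / d)


# Expected leading-digit frequencies for D = 1..9
BENFORD = {d: benford_probability(d) for d in range(1, 10)}

for d, p in BENFORD.items():
    print(f"{d}: {p:.1%}")
```

Running it prints 30.1% for digit 1, falling steadily to 4.6% for digit 9, which matches the "about 30%" and "less than 5%" figures quoted at the start. The nine probabilities add up to 1 because the differences telescope to log10(10) - log10(1).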
You will notice that the law uses logarithms. That is our main clue. We use logarithms primarily to study phenomena that are multiplicative in nature. For more details, check out my article on the history of logarithms.
It turns out that Benford's law works well when applied to data that span several orders of magnitude. An order of magnitude can be roughly thought of as a factor of 10 (10¹). So, Benford's law would work better if the data ranged from, say, 10⁰ to 10⁶ (1 to 1,000,000) than if they ranged from, say, 100 to 999 (within a single order of magnitude). There is no precise cut-off for how many orders of magnitude the data must span for Benford's law to apply. Researchers have developed rigorous statistical methods over the years to address this challenge, but they are beyond the scope of this article.
Benford's law applied to physical constants (image from Wikimedia Commons)
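A quick simulation illustrates why the span matters. The sketch below, assuming purely synthetic data, draws numbers whose base-10 logarithms are spread uniformly over six orders of magnitude (10⁰ to 10⁶), the idealized case that reproduces Benford's law, and compares them with whole numbers drawn uniformly from 100 to 999. The helper names, sample size, and random seed are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter
from decimal import Decimal


def leading_digit(x) -> int:
    """Return the first significant digit (1-9) of a finite, nonzero number."""
    if x == 0 or (isinstance(x, float) and not math.isfinite(x)):
        raise ValueError("need a finite, nonzero number")
    # str() gives the shortest decimal form of a float (0.3 -> '0.3'), and the
    # coefficient digits of a nonzero Decimal never start with 0.
    return Decimal(str(x)).as_tuple().digits[0]


def leading_digit_freqs(values) -> dict:
    """Relative frequency of each leading digit 1-9 in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


rng = random.Random(42)  # fixed seed so the run is repeatable
N = 100_000

# Spans six orders of magnitude: log10(x) is uniform on [0, 6)
wide = [10 ** rng.uniform(0, 6) for _ in range(N)]
# Stays within one order of magnitude: integers 100..999
narrow = [rng.randint(100, 999) for _ in range(N)]

wide_freqs = leading_digit_freqs(wide)
narrow_freqs = leading_digit_freqs(narrow)

print("digit  wide   narrow  Benford")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d:>5}  {wide_freqs[d]:.3f}  {narrow_freqs[d]:.3f}   {benford:.3f}")
```

The wide sample should land close to the Benford column (about 0.301 for digit 1), while the narrow sample hovers around 0.111 for every digit.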
Rather than diving into those methods, let's look at a few examples of data where Benford's law tends to work well, and where it doesn't.
Data distributions that are likely to obey Benford's law:
1. Numbers that follow multiplicative distributions (such as power laws): for example, stock prices, sales figures, country populations, etc.
2. Distributions where the mean is greater than the median, i.e., with a positive skew.
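The two criteria above can be turned into a rough screening step. The sketch below is a hypothetical helper, `benford_screen`, not an established test: it reports how many orders of magnitude the positive values span and whether their mean exceeds their median. Since there is no precise cut-off for either quantity, it returns the numbers rather than a verdict. The sales amounts are made up for illustration.

```python
import math
import statistics


def benford_screen(values) -> dict:
    """Rough, hypothetical pre-check of whether data look Benford-friendly.

    Reports the span in orders of magnitude and whether mean > median
    (a hint of positive skew). It is a heuristic, not a statistical test.
    """
    positive = [v for v in values if v > 0]
    if len(positive) < 2:
        raise ValueError("need at least two positive values")
    return {
        # log10(max / min): how many factors of 10 the data cover
        "orders_of_magnitude": math.log10(max(positive) / min(positive)),
        "mean_exceeds_median": statistics.fmean(positive) > statistics.median(positive),
    }


# Made-up sales amounts, for illustration only
sales = [12.5, 80, 150, 420, 1_300, 2_900, 15_000, 64_000]
print(benford_screen(sales))

# By contrast, the integers 100..999 stay within one order of magnitude
print(benford_screen(range(100, 1000)))
```

The made-up sales figures span about 3.7 orders of magnitude with the mean well above the median, while the integers 100 to 999 span less than one order of magnitude and have equal mean and median.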
Data distributions that are NOT likely to obey Benford's law:
1. When numbers are assigned sequentially: for example, cheque numbers, invoice numbers, etc.
2. When numbers are influenced by human psychology: for example, item prices set at $1.99, etc.
3. When data are normally distributed: for example, human heights, weights, etc.
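To see two of these cases in action, here is a small simulation with synthetic data: heights drawn from a normal distribution (mean 170 cm, standard deviation 10 cm, values assumed purely for illustration) and a run of sequential invoice numbers from 1001 to 3000. The seed, sample size, and parameters are arbitrary.

```python
import math
import random
from collections import Counter
from decimal import Decimal


def leading_digit(x) -> int:
    """Return the first significant digit (1-9) of a finite, nonzero number."""
    if x == 0 or (isinstance(x, float) and not math.isfinite(x)):
        raise ValueError("need a finite, nonzero number")
    return Decimal(str(x)).as_tuple().digits[0]


def leading_digit_freqs(values) -> dict:
    """Relative frequency of each leading digit 1-9 in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


rng = random.Random(7)
# Synthetic, normally distributed heights in cm (hypothetical parameters)
heights = [rng.gauss(170, 10) for _ in range(10_000)]
# Sequentially assigned invoice numbers
invoices = range(1001, 3001)

height_freqs = leading_digit_freqs(heights)
invoice_freqs = leading_digit_freqs(invoices)

for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}: heights {height_freqs[d]:.3f}  invoices {invoice_freqs[d]:.3f}  Benford {benford:.3f}")
```

Nearly all of the heights start with 1, and the invoice numbers pile up on 1 and 2 with almost nothing elsewhere; neither comes close to the Benford column.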
Why Does Benford's Law Work?
It is challenging to give a deeply technical explanation without making things boring. So, I'll spare you the misery and simplify the explanation as follows:
Benford's law works because numerous real-world phenomena follow variations of geometric (exponential) growth or decay, whereas human perception is linear in nature.
Too many confusing words? Let us look at an example. Assume that the population of an arbitrary country grows geometrically. Consequently, it would follow a growth curve as follows:
Image created by the author
Due to the exponential nature of the curve, the population would spend significantly more years with "1" as its leading digit (marked by the green "a") than with "9" as its leading digit (marked by the red "b"). That is because moving from a leading 1 to a leading 2 requires the population to double, whereas moving from a leading 9 to the next power of 10 requires growth of only about 11%. This is precisely what Benford's law enables us to calculate and predict.
Image created by the author
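The same effect can be counted directly. This sketch assumes a hypothetical population of 1,000,000 growing at a steady 2% per year for 500 years (parameters chosen only for illustration) and tallies how many years it spends with each leading digit.

```python
from collections import Counter


def years_per_leading_digit(start: float, rate: float, years: int) -> Counter:
    """Count the years a geometrically growing quantity spends on each leading digit."""
    if start < 1 or rate <= 0:
        raise ValueError("expects start >= 1 and a positive growth rate")
    counts = Counter()
    value = float(start)
    for _ in range(years):
        # The value never drops below 1 here, so the first character of
        # str() (e.g. '1234567.8' or '1.2e+17') is its leading digit.
        counts[int(str(value)[0])] += 1
        value *= 1 + rate
    return counts


counts = years_per_leading_digit(start=1_000_000, rate=0.02, years=500)
for d in range(1, 10):
    print(d, counts[d])
```

With 2% annual growth, climbing from a leading 1 to a leading 2 takes about 35 years, while passing through a leading 9 takes only about 5, so the tally for 1 ends up several times larger than the tally for 9.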
This also explains why Benford's law is a bad fit for phenomena that follow normal distributions: their values typically cluster around the mean, often within a single order of magnitude, instead of growing geometrically. In general, phenomena that do not grow or decay geometrically are likely to be a bad fit for Benford's law.
With this oversimplified explanation, I am probably not doing the technical work behind the law much justice. But hey, consider this an introduction to the topic.
How Does Benford's Law Help Detect Fraud?
Benford's law has historically been employed to detect fraud in taxation and financial transactions. When people fabricate or manipulate numbers, the results are likely to deviate from what Benford's law predicts. Digits in the middle of the distribution (like 2, 3, 4, 5, etc.) are particularly useful for detecting fraud, more so than the ones at the extremes (like 1 and 9).
The reason for this expected deviation is that human thinking tends to impose arbitrary, often hidden, bounds and preferences on the numbers we produce. Remember our pricing example from earlier? While a human being may have a psychological attachment to a price of $1.99, Benford's law and nature don't.
A useful property of Benford's law in this regard is that it is scale-invariant: regardless of the scale or unit of measurement we use (dollars or euros, feet or meters), data that follow Benford's law continue to follow it. The law also generalizes to other number bases: for numbers written in base b, the probability of leading digit D becomes $\log_b(1 + 1/D)$.
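Scale invariance is easy to check empirically. The sketch below draws an idealized log-uniform sample (synthetic data spanning six orders of magnitude) and a uniform sample between 100 and 1000, then converts both to a different unit by multiplying every value by 2.54, the inches-to-centimeters factor, chosen here only as an example. The helper names and seed are illustrative assumptions.

```python
import math
import random
from collections import Counter
from decimal import Decimal


def leading_digit(x) -> int:
    """Return the first significant digit (1-9) of a finite, nonzero number."""
    if x == 0 or (isinstance(x, float) and not math.isfinite(x)):
        raise ValueError("need a finite, nonzero number")
    return Decimal(str(x)).as_tuple().digits[0]


def leading_digit_freqs(values) -> dict:
    """Relative frequency of each leading digit 1-9 in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def max_deviation_from_benford(freqs: dict) -> float:
    """Largest absolute gap between observed and Benford frequencies."""
    return max(abs(freqs[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


rng = random.Random(1)
N = 100_000
UNIT_FACTOR = 2.54  # e.g. inches -> centimeters

benford_like = [10 ** rng.uniform(0, 6) for _ in range(N)]
uniform_3digit = [rng.uniform(100, 1000) for _ in range(N)]

results = {}
for name, data in [("log-uniform", benford_like), ("uniform 100-1000", uniform_3digit)]:
    before = max_deviation_from_benford(leading_digit_freqs(data))
    after = max_deviation_from_benford(leading_digit_freqs([v * UNIT_FACTOR for v in data]))
    results[name] = (before, after)
    print(f"{name}: max gap before {before:.3f}, after rescaling {after:.3f}")
```

Only the log-uniform data keep their Benford profile after the change of units; the uniform data shift their digit profile, because their leading digits depend on exactly where the range starts and ends.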
That said, one has to be very cautious when employing Benford's law in general. Rigorous statistical methods (beyond this article's scope) are needed to establish both whether the law applies to the data under scrutiny and how well the data fit it. There have been reported cases of false conclusions drawn from poor statistical analyses of election data that used Benford's law as their basis. So, you have been warned.
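For a taste of what a statistical check looks like, here is a minimal sketch of one common goodness-of-fit test, Pearson's chi-square test, applied to first digits. The article does not prescribe a particular test; chi-square is swapped in here only as an example and is no substitute for a careful analysis. The 15.507 threshold is the standard chi-square critical value for 8 degrees of freedom at the 5% significance level. The "invented" amounts are synthetic random three-digit numbers standing in for made-up figures, and a flagged result is a reason to look closer, not proof of fraud.

```python
import math
import random
from collections import Counter
from decimal import Decimal

# Chi-square critical value: 9 digits -> 8 degrees of freedom, 5% significance level
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(x) -> int:
    """Return the first significant digit (1-9) of a finite, nonzero number."""
    if x == 0 or (isinstance(x, float) and not math.isfinite(x)):
        raise ValueError("need a finite, nonzero number")
    return Decimal(str(x)).as_tuple().digits[0]


def benford_chi_square(values) -> tuple:
    """Pearson chi-square of first digits vs Benford, and whether it exceeds the 5% critical value."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    # Rule of thumb: expected counts should be at least about 5 per digit
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat, stat > CHI2_CRITICAL_8DF_5PCT


# Powers of 2 follow Benford's law closely
reference = [2 ** n for n in range(1, 1001)]
# Synthetic "invented" amounts: uniformly random three-digit numbers
rng = random.Random(0)
invented = [rng.randint(100, 999) for _ in range(1000)]

ref_stat, ref_flagged = benford_chi_square(reference)
inv_stat, inv_flagged = benford_chi_square(invented)
print(f"powers of 2: chi2 = {ref_stat:.2f}, flagged = {ref_flagged}")
print(f"invented:    chi2 = {inv_stat:.2f}, flagged = {inv_flagged}")
```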
Final Thoughts
Fraud detection aside, Benford's law has a wide range of applications. This is because many natural phenomena are geometric in nature: over time, things grow and decay all the time.
In mathematics, the Fibonacci numbers, the factorials, and the powers of almost any number are known to obey Benford's law.
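These sequences are easy to check with Python's arbitrary-precision integers, where `str(n)[0]` is always the exact leading digit. The sketch below tallies the first 1,000 Fibonacci numbers, the factorials 1! through 500!, and the powers 2^1 through 2^1000; the cut-offs are arbitrary choices for illustration.

```python
import math
from collections import Counter


def first_digit_freqs(numbers) -> dict:
    """Relative frequency of leading digits 1-9 for positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def fibonacci(count: int):
    """Yield the first `count` Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


sequences = {
    "Fibonacci": first_digit_freqs(fibonacci(1000)),
    "factorials": first_digit_freqs(math.factorial(n) for n in range(1, 501)),
    "powers of 2": first_digit_freqs(2 ** n for n in range(1, 1001)),
}

print("digit  Benford  " + "  ".join(sequences))
for d in range(1, 10):
    row = "  ".join(f"{freqs[d]:.3f}" for freqs in sequences.values())
    print(f"{d:>5}  {math.log10(1 + 1 / d):.3f}    {row}")
```

The Fibonacci numbers and powers of 2 track the Benford column closely; the factorials follow the same overall pattern, though with a noisier fit over this short range.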
All of this makes Benford's law both a fascinating phenomenon and a useful tool to work with. With a global pandemic on the loose and a data-driven world coming of age, Benford's law is becoming more relevant by the day!
I hope you found this article interesting and useful. If you'd like to get notified when interesting content gets published here, consider subscribing.