Benford’s Law – An explanation


Via Wolfram MathWorld

A phenomenological law also called the first digit law, first digit phenomenon, or leading digit phenomenon. Benford’s law states that in listings, tables of statistics, etc., the leading digit is 1 with probability ∼30%, much greater than the 11.1% expected if all nine possible leading digits were equally likely (i.e., one digit out of 9). Benford’s law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881). While Benford’s law unquestionably applies to many situations in the real world, a satisfactory explanation has been given only recently through the work of Hill (1998).

Benford’s law was used by the character Charlie Eppes as an analogy to help solve a series of high burglaries in the Season 2 “The Running Man” episode (2006) of the television crime drama NUMB3RS.

Benford’s law applies to data that are not dimensionless, so the numerical values of the data depend on the units. If there exists a universal probability distribution P(x) over such numbers, then it must be invariant under a change of scale, so

$$P(kx) = f(k)\,P(x). \tag{1}$$

If $\int P(x)\,dx = 1$, then $\int P(kx)\,dx = 1/k$, and normalization implies $f(k) = 1/k$. Differentiating with respect to $k$ and setting $k = 1$ gives

$$x\,P'(x) = -P(x), \tag{2}$$

having solution P(x)=1/x. Although this is not a proper probability distribution (since it diverges), both the laws of physics and human convention impose cutoffs. For example, randomly selected street addresses obey something close to Benford’s law.
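The role of the cutoffs and of scale invariance can be checked numerically. The following is a minimal Python sketch, not part of the original entry: it assumes cutoffs at 1 and 10^6 (an arbitrary choice spanning six powers of 10), draws values with density proportional to 1/x between them, and compares first-digit frequencies before and after a change of units. The helper names `first_digit`, `sample_reciprocal`, and `digit_frequencies` are illustrative.

```python
import math
import random
from collections import Counter


def first_digit(x):
    """Return the leading decimal digit of a positive number."""
    # Scientific notation puts the leading digit first; 17 decimals are
    # enough that rounding cannot change the leading digit of a float.
    return int(f"{x:.17e}"[0])


def sample_reciprocal(lo, hi, rng):
    """Draw one value with density proportional to 1/x on [lo, hi]."""
    # A 1/x density in x is a uniform density in log(x).
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def digit_frequencies(values):
    """Map each digit 1..9 to its share of leading digits in values."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_reciprocal(1.0, 1e6, rng) for _ in range(100_000)]

original = digit_frequencies(samples)
# Change of scale, e.g. converting inches to centimetres.
rescaled = digit_frequencies([2.54 * v for v in samples])

for d in range(1, 10):
    print(d, round(original[d], 3), round(rescaled[d], 3))
```

Under these assumptions both columns should agree to within sampling noise, which is the numerical counterpart of the invariance requirement in equation (1).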

[Figure: Benford’s law, probability of each first digit]

If many powers of 10 lie between the cutoffs, then the probability that the first (decimal) digit is D is given by a logarithmic distribution

$$P_D = \frac{\int_D^{D+1} P(x)\,dx}{\int_1^{10} P(x)\,dx} = \log_{10}\left(1 + \frac{1}{D}\right) \tag{3}$$

for D=1, …, 9, illustrated above and tabulated below.

D   P_D          D   P_D
1   0.30103      6   0.0669468
2   0.176091     7   0.0579919
3   0.124939     8   0.0511525
4   0.09691      9   0.0457575
5   0.0791812
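The tabulated values follow directly from the logarithmic formula: with P(x) = 1/x, the numerator integrates to ln((D+1)/D) and the denominator to ln 10, so their ratio is log10(1 + 1/D). As a quick check, a short Python sketch (the function name `benford_probability` is illustrative, not from the source) reproduces the table:

```python
import math


def benford_probability(d):
    """Probability that the first decimal digit is d, per the formula above."""
    if d not in range(1, 10):
        raise ValueError("first digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)


probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}  {p:.6f}")

# The nine terms telescope to log10(10), so the total should be 1
# up to floating-point rounding.
print(sum(probabilities.values()))
```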

However, Benford’s law applies not only to scale-invariant data, but also to numbers chosen from a variety of different sources. Explaining this fact requires a more rigorous investigation of central limit-like theorems for the mantissas of random variables under multiplication. As the number of variables increases, the density function approaches that of the above logarithmic distribution. Hill (1998) rigorously demonstrated that the “distribution of distributions” given by random samples taken from a variety of different distributions is, in fact, Benford’s law (Matthews).
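The convergence under multiplication can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hill’s construction: the factors are taken to be independent uniform draws on (0, 1], a choice made here purely for illustration, and the names `leading_digit_of_product` and `digit_frequencies` are hypothetical helpers.

```python
import math
import random


def leading_digit_of_product(n_factors, rng):
    """Leading digit of a product of n_factors uniform (0, 1] draws."""
    # Sum base-10 logs instead of multiplying, so long products cannot
    # underflow to zero. 1 - random() lies in (0, 1], so the log is defined.
    log_sum = sum(math.log10(1.0 - rng.random()) for _ in range(n_factors))
    mantissa = 10 ** (log_sum % 1.0)  # value in [1, 10]
    return min(int(mantissa), 9)  # guard against rounding up to 10.0


def digit_frequencies(n_factors, n_samples, rng):
    """Observed frequencies of leading digits 1..9, as a list."""
    counts = [0] * 10
    for _ in range(n_samples):
        counts[leading_digit_of_product(n_factors, rng)] += 1
    return [counts[d] / n_samples for d in range(1, 10)]


rng = random.Random(1)  # fixed seed so the run is repeatable
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

for n_factors in (1, 2, 5, 20):
    freqs = digit_frequencies(n_factors, 50_000, rng)
    worst = max(abs(f - b) for f, b in zip(freqs, benford))
    print(n_factors, round(worst, 4))
```

With a single factor the leading digits are roughly equally likely, so the largest deviation from the logarithmic distribution is large. It should shrink toward the sampling-noise level as the number of factors grows.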

One striking example of Benford’s law is given by the 54 million real constants in Plouffe’s “Inverse Symbolic Calculator” database, 30% of which begin with the digit 1. Taking data from several disparate sources, the table below shows the distribution of first digits as compiled by Benford (1938) in his original paper.

Col. Title                 1     2     3     4     5     6     7     8     9  Samples
A    Rivers, Area       31.0  16.4  10.7  11.3   7.2   8.6   5.5   4.2   5.1      335
B    Population         33.9  20.4  14.2   8.1   7.2   6.2   4.1   3.7   2.2     3259
C    Constants          41.3  14.4   4.8   8.6  10.6   5.8   1.0   2.9  10.6      104
D    Newspapers         30.0  18.0  12.0  10.0   8.0   6.0   6.0   5.0   5.0      100
E    Specific Heat      24.0  18.4  16.2  14.6  10.6   4.1   3.2   4.8   4.1     1389
F    Pressure           29.6  18.3  12.8   9.8   8.3   6.4   5.7   4.4   4.7      703
G    H.P. Lost          30.0  18.4  11.9  10.8   8.1   7.0   5.1   5.1   3.6      690
H    Mol. Wgt.          26.7  25.2  15.4  10.8   6.7   5.1   4.1   2.8   3.2     1800
I    Drainage           27.1  23.9  13.8  12.6   8.2   5.0   5.0   2.5   1.9      159
J    Atomic Wgt.        47.2  18.7   5.5   4.4   6.6   4.4   3.3   4.4   5.5       91
K    n^(-1), sqrt(n)    25.7  20.3   9.7   6.8   6.6   6.8   7.2   8.0   8.9     5000
L    Design             26.8  14.8  14.3   7.5   8.3   8.4   7.0   7.3   5.6      560
M    Reader’s Digest    33.4  18.5  12.4   7.5   7.1   6.5   5.5   4.9   4.2      308
N    Cost Data          32.4  18.8  10.1  10.1   9.8   5.5   4.7   5.5   3.1      741
O    X-Ray Volts        27.9  17.5  14.4   9.0   8.1   7.4   5.1   5.8   4.8      707
P    Am. League         32.7  17.6  12.6   9.8   7.4   6.4   4.9   5.6   3.0     1458
Q    Blackbody          31.0  17.3  14.1   8.7   6.6   7.0   5.2   4.7   5.4     1165
R    Addresses          28.9  19.2  12.6   8.8   8.5   6.4   5.6   5.0   5.0      342
S    n^1, n^2...n!      25.3  16.0  12.0  10.0   8.5   8.8   6.8   7.1   5.5      900
T    Death Rate         27.0  18.6  15.7   9.4   6.7   6.5   7.2   4.8   4.1      418
Average                 30.6  18.5  12.4   9.4   8.0   6.4   5.1   4.9   4.7     1011
Probable Error ±0.8 ±0.4 ±0.4 ±0.3 ±0.2 ±0.2 ±0.2 ±0.3
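To see how closely Benford’s compiled averages track the logarithmic prediction, the “Average” row above can be compared digit by digit with 100·log10(1 + 1/D). The sketch below uses only the percentages from that row. The variable names are illustrative.

```python
import math

# "Average" row of the table above, in percent, for first digits 1..9.
observed_percent = [30.6, 18.5, 12.4, 9.4, 8.0, 6.4, 5.1, 4.9, 4.7]

# Prediction of the logarithmic distribution, also in percent.
predicted_percent = [100 * math.log10(1 + 1 / d) for d in range(1, 10)]

differences = [
    obs - pred for obs, pred in zip(observed_percent, predicted_percent)
]

for d, (obs, pred, diff) in enumerate(
    zip(observed_percent, predicted_percent, differences), start=1
):
    print(f"{d}  observed {obs:5.1f}  predicted {pred:5.1f}  difference {diff:+.1f}")

largest_gap = max(abs(diff) for diff in differences)
print(f"largest absolute difference: {largest_gap:.2f} percentage points")
```

Every digit in the averaged row lands within one percentage point of the prediction, with the widest gap at the digit 2.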

The following table gives the distribution of the first digit of the mantissa following Benford’s Law using a number of different methods.

Method                           OEIS     Sequence
Sainte-Laguë                     A055439  1, 2, 3, 1, 4, 5, 6, 1, 2, 7, 8, 9, …
d’Hondt                          A055440  1, 2, 1, 3, 1, 4, 2, 5, 1, 6, 3, 1, …
largest remainder, Hare quotas   A055441  1, 2, 3, 4, 1, 5, 6, 7, 1, 2, 8, 1, …
largest remainder, Droop quotas  A055442  1, 2, 3, 1, 4, 5, 6, 1, 2, 7, 8, 1, …


