Pareto's 80/20 Rule and the Gaussian Distribution
Katsuaki Tanabe*
Department of Chemical Engineering, Kyoto University, Nishikyo, Kyoto 615-8510, Japan *E-mail: [email protected]
Abstract
The statistical state for the empirical Pareto's 80/20 rule has been found to correspond to a normal or Gaussian distribution with a standard deviation that is twice the mean. This finding reflects the large characteristic variations in our society and in nature. In this distribution, the rule can also be referred to as, for example, the 25/5, 45/10, 60/15, or 90/25 rule. In addition, our result suggests the existence of implicit negative contributors.
Keywords
Pareto; Statistics; Gaussian distribution
Introduction
Pareto's 80/20 rule states that roughly 80% of all effects stem from 20% of all causes for many events, which conceptually contrasts the contribution of the vital few with that of the trivial many.
This rule has been applied in a variety of fields, such as economics, biology, ethology, and civil engineering, where its validity and usefulness have been demonstrated. Mathematically, the 80/20 rule is often interpreted as an instance of the Pareto distribution.
However, the power law of the Pareto distribution is originally a model intended to represent the probability of a variable exceeding a certain threshold value, and is also known as the survival or tail function. Meanwhile, many distributions pertaining to events in our society and nature, to which the 80/20 rule is often applied, more commonly follow the normal or Gaussian distribution. In other words, it may be more intuitive to assume a distribution with a peak around the average or mean in order to discuss the 80/20 rule, rather than the monotonic Pareto distribution. Therefore, in this short note we present an analysis of Pareto's 80/20 rule based on the Gaussian distribution.
Theory and Calculation Methods
The normal or Gaussian probability distribution $f(x)$, based on the central limit theorem, is described as

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \tag{1}$$

where $\sigma$ and $\mu$ are the standard deviation and the mean, respectively. To analyze and discuss Pareto's 80/20 rule, we define the integrated fractions of cause and effect, $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$, respectively, as

$$I_{\mathrm{cause}} \equiv \frac{\int_{\mu+X}^{\infty} f(x)\,dx}{\int_{-\infty}^{\infty} f(x)\,dx} = \int_{\mu+X}^{\infty} f(x)\,dx \tag{2}$$

$$I_{\mathrm{effect}} \equiv \frac{\int_{\mu+X}^{\infty} x f(x)\,dx}{\int_{-\infty}^{\infty} x f(x)\,dx} = \frac{1}{\mu}\int_{\mu+X}^{\infty} x f(x)\,dx \tag{3}$$

In popular folklore, the 80/20 rule includes such claims as that 20% of a population own 80% of the wealth, that 20% of the books in a library account for 80% of the circulation, that 20% of a business's customers bring in 80% of its revenue, that 20% of all software features account for 80% of all software use, and so on. The quantity $I_{\mathrm{cause}}$ is a proportion of the population; the quantity $I_{\mathrm{effect}}$ is the proportion of the total income that this part of the population receives. Likewise, $I_{\mathrm{cause}}$ can be a proportion of the books in a library and $I_{\mathrm{effect}}$ the corresponding proportion of all circulation, and so on.

Here, we note that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \tag{4}$$

$$\int_{-\infty}^{\infty} x f(x)\,dx = \mu \tag{5}$$

from the definitions of $f(x)$ and $\mu$; these give the second equalities in Eqs. 2 and 3.

Figure 1 depicts an example of the Gaussian distribution, along with its corresponding $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$, for the case that $\sigma = 1$ and $\mu = 2$. $X$ denotes the threshold deviation from $\mu$ used to define $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$. As a common characteristic of the Gaussian distribution, it is well known, for example, that $I_{\mathrm{cause}}$ = 0.16, 0.023, and 0.0014 for $X = \sigma$, $2\sigma$, and $3\sigma$, respectively. In Fig. 1(b), note that $I_{\mathrm{effect}}$ is not exactly the integrated area in the graph, but rather that area divided by $\mu$, as shown in Eq. 3.

Because distributions with the same $\sigma/\mu$ are similar in shape, they yield the same $I_{\mathrm{effect}}$-$I_{\mathrm{cause}}$ relation regardless of the individual absolute values of $\sigma$ and $\mu$. In other words, the $I_{\mathrm{effect}}$-$I_{\mathrm{cause}}$ relation is a function only of the ratio $\sigma/\mu$, so we can conduct the investigation based on this ratio alone. We calculate $I_{\mathrm{effect}}$ and $I_{\mathrm{cause}}$ for various values of $\sigma/\mu$ and $X$.
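For a Gaussian, the integrals in Eqs. 2 and 3 have closed forms, which makes the dependence on the ratio σ/μ explicit. The Python sketch below is an illustration under stated assumptions, not the author's code: the function names `gaussian_fractions` and `gaussian_fractions_numeric` are hypothetical, and the closed form I_effect = Q(t) + (σ/μ)φ(t), with t = X/σ, Q the standard normal tail probability, and φ the standard normal density, is derived here from Eq. 3 rather than quoted from the text. A midpoint-rule integration is included as an independent check.

```python
import math

def gaussian_fractions(X, sigma, mu):
    """Closed-form (I_cause, I_effect) for the threshold mu + X (Eqs. 2 and 3)."""
    t = X / sigma                                            # threshold in units of sigma
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))               # Q(t) = P(x > mu + X)
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)  # standard normal density at t
    i_cause = tail
    i_effect = tail + (sigma / mu) * phi                     # depends on sigma/mu only
    return i_cause, i_effect

def gaussian_fractions_numeric(X, sigma, mu, n=100_000, span=12.0):
    """Midpoint-rule check of Eqs. 2 and 3 on [mu + X, mu + span * sigma]."""
    a, b = mu + X, mu + span * sigma
    h = (b - a) / n
    norm = math.sqrt(2.0 * math.pi) * sigma
    s_cause = s_effect = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        fx = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm
        s_cause += fx
        s_effect += x * fx
    return s_cause * h, s_effect * h / mu

# Example parameters sigma = 1, mu = 2 (assumed here for illustration).
for m in (1, 2, 3):
    i_cause, i_effect = gaussian_fractions(m * 1.0, 1.0, 2.0)
    print(f"X = {m} sigma: I_cause = {i_cause:.4f}, I_effect = {i_effect:.4f}")
```

With these parameters the sketch should reproduce the familiar tail fractions of about 0.16, 0.023, and 0.0014 for X = σ, 2σ, and 3σ. Calling `gaussian_fractions` with (σ, μ) = (2, 1) and (4, 2) at the same X/σ returns identical values, which is the scale invariance used in the text.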
Results and Discussion
Figure 2 shows the calculated relationship between $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ for varied values of $X$ and $\sigma/\mu$. We see that the point ($I_{\mathrm{cause}}$, $I_{\mathrm{effect}}$) = (0.2, 0.8) lies roughly on the curve for $\sigma/\mu = 2$. Importantly, this curve ($\sigma/\mu \approx 2$) generalizes the 80/20 rule. This result implies that the statistical state for the 80/20 rule (i.e., the state where the rule holds) corresponds to a distribution with a standard deviation that is twice the mean. It also indicates that human society lies in such a state: deducing inversely from the empirical 80/20 rule, we find that our society and nature are highly dispersive.

Incidentally, the region above 100% in the effect for the corresponding curve in Fig. 2 is not a mathematical artifact but appears in reality. Data show, for example, that the vital component of the customers can often provide over 100% of the total profit of a company, in conjunction with the negative contributors discussed in the following.

Figure 3 shows the Gaussian distribution for $\sigma/\mu = 2$, representing the state of Pareto's 80/20 rule. As seen in Fig. 3(b), our result might also suggest the existence of implicit negative factors (i.e., some causes can provide a less-than-zero contribution). This phenomenon is actually observed in society, for example, when some customers bring financial losses to a company. In other words, the result presented in Fig. 3 provides quantitative reasoning for the existence of such negative contributors.

Figure 4 plots $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ against $X$ for the Gaussian distribution with $\sigma/\mu = 2$, which represents the state for Pareto's 80/20 rule. From these curves we can derive, in the same way as the 80/20 rule, a 25/5 rule: 25% of the effects are caused by 5% of the causes in this Gaussian distribution.
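The reading of such rules from the Gaussian curve with a standard deviation of twice the mean can be sketched in a few lines of Python. This is a hedged illustration rather than the author's procedure: the helper `effect_for_cause` is a hypothetical name, it relies on `statistics.NormalDist` from the Python standard library (3.8 or later), and it uses the closed form I_effect = I_cause + (σ/μ)φ(t), where t = X/σ is the standard normal quantile for a given I_cause and φ is the standard normal density.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def effect_for_cause(i_cause, ratio=2.0):
    """Return (X / sigma, I_effect) for a given I_cause and ratio = sigma / mu."""
    t = STD_NORMAL.inv_cdf(1.0 - i_cause)           # threshold with upper tail = I_cause
    i_effect = i_cause + ratio * STD_NORMAL.pdf(t)  # closed form of Eq. 3
    return t, i_effect

for pct in (5, 10, 15, 20, 25):
    t, i_effect = effect_for_cause(pct / 100)
    print(f"I_cause = {pct:2d}%  X = {t:.2f} sigma  I_effect = {100 * i_effect:.1f}%")
```

Under these assumptions the sketch gives an I_effect of about 76% at I_cause = 20%, consistent with the point (0.2, 0.8) lying only roughly on the curve, and about 26% at I_cause = 5%, which corresponds to the 25/5 reading.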
Further rules can be created in the same manner. We thus recognize from the plot that the 80/20 rule can also be read as the "25/5 rule" ($X = 1.7\sigma$), "45/10 rule" ($1.3\sigma$), "60/15 rule" ($1.1\sigma$), "90/25 rule" ($0.67\sigma$), and so forth.

As mentioned in the Introduction, the Pareto distribution (Type I) $f_P(x)$ is described as

$$f_P(x) = \frac{\alpha\, x_{\min}^{\alpha}}{x^{\alpha+1}} \quad (x \ge x_{\min}) \tag{6}$$

where $\alpha$ (> 0) is a shape parameter called the Pareto index, and $x_{\min}$ (> 0) is the minimum possible value of $x$. Incidentally, this Pareto distribution is plainly not realistic in many fields, e.g., it implies that nobody has an income less than $x_{\min}$. Note that

$$\int_{x_{\min}}^{\infty} f_P(x)\,dx = 1 \tag{7}$$

Then $I_{\mathrm{cause}}$, or the probability that $x$ is greater than some value $A$ (> $x_{\min}$), for $f_P(x)$ is

$$I_{\mathrm{cause}} = \frac{\int_{A}^{\infty} f_P(x)\,dx}{\int_{x_{\min}}^{\infty} f_P(x)\,dx} = \int_{A}^{\infty} f_P(x)\,dx = \left(\frac{x_{\min}}{A}\right)^{\alpha} \tag{8}$$

and $I_{\mathrm{effect}}$ is

$$I_{\mathrm{effect}} = \frac{\int_{A}^{\infty} x f_P(x)\,dx}{\int_{x_{\min}}^{\infty} x f_P(x)\,dx} = \left(\frac{x_{\min}}{A}\right)^{\alpha-1} \tag{9}$$

We assumed $\alpha > 1$. Otherwise, $I_{\mathrm{effect}} = 1$. Solving Eqs. 8 and 9 for $\alpha$ when $I_{\mathrm{cause}} = 0.2$ and $I_{\mathrm{effect}} = 0.8$ (dividing Eq. 8 by Eq. 9 gives $x_{\min}/A = 0.25$), we obtain $\alpha = \log 5 / \log 4 = \log_4 5$ based on the Pareto distribution.

However, in the case of this power-law logic, an iterated 80/20 rule necessarily comes along as follows. From Eqs. 8 and 9,

$$\frac{\log I_{\mathrm{effect}}}{\log I_{\mathrm{cause}}} = \frac{\alpha - 1}{\alpha} = \frac{\log_4 5 - 1}{\log_4 5} = \frac{\log 0.8}{\log 0.2} \tag{10}$$

Therefore, when $I_{\mathrm{cause}} = 0.2^n$, $I_{\mathrm{effect}} = 0.8^n$. The number $n$ of iterations need not be an integer. The "64/4 rule" ($n = 2$), "51.2/0.8 rule" ($n = 3$), "40.96/0.16 rule" ($n = 4$), and so forth thus necessarily follow. For comparison, Fig. 5 shows the relationship between $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ for this iterated 80/20 rule based on the Pareto distribution, plotted along with the curve for the Gaussian distribution for $\sigma/\mu = 2$. The $I_{\mathrm{effect}}$-$I_{\mathrm{cause}}$ relation based on the Pareto distribution, particularly in the low-fraction region, seems too drastic for the real world; e.g., half of the total wealth may not be held by fewer than 1% of the people. In contrast, our series of $I_{\mathrm{effect}}$-to-$I_{\mathrm{cause}}$ ratios (25/5, 45/10, 60/15, etc.) based on the Gaussian distribution may sound more realistic in many cases.
Conclusions
In this short note, we have examined the empirical Pareto's 80/20 rule from the perspective of the normal or Gaussian probability distribution. We found that the 80/20 rule represents the case when the standard deviation is twice the mean in the Gaussian distribution. This result implies a high diversity of characteristics in society and nature. Our result might also suggest the existence of implicit negative factors.
Acknowledgements
We thank an anonymous reviewer for her/his essential insight on the power law.
References
W. J. Reed, "The Pareto, Zipf and other power laws," Econ. Lett., 15–19 (2001).
W. J. Reed, "The Pareto law of incomes – An explanation and an extension," Physica A, 469–486 (2003).
L. Wilkinson, "Statistical computing and graphics: Revisiting the Pareto chart," Am. Stat., 332–334 (2006).
A. Dragulescu and V. M. Yakovenko, "Statistical mechanics of money," Eur. Phys. J. B, 723–729 (2000).
A. Dragulescu and V. M. Yakovenko, "Exponential and power-law probability distributions of wealth and income in the United Kingdom and the United States," Physica A, 213–221 (2001).
A. Chatterjee, B. K. Chakrabarti, and S. S. Manna, "Pareto law in a kinetic model of market with random saving propensity," Physica A, 155–163 (2004).
E. Brynjolfsson, Y. Hu, and D. Simester, "Goodbye Pareto principle, hello long tail: The effect of search costs on the concentration of product sales," Manag. Sci., 1373–1386 (2011).
B. Schwanhäusser, D. Busse, N. Li, G. Dittmar, J. Schuchhardt, J. Wolf, W. Chen, and M. Selbach, "Global quantification of mammalian gene expression control," Nature, 337–342 (2011).
M. Herrero and D. C. Stuckey, "Bioaugmentation and its application in wastewater treatment: A review," Chemosphere, 119–128 (2015).
R. C. Craft and C. Leake, "The Pareto principle in organizational decision making," Manag. Dec., 729–733 (2002).
A. Grosfeld-Nir, B. Ronen, and N. Kozlovsky, "The Pareto managerial principle: When does it apply?," Int. J. Prod. Res., 2317–2325 (2007).
K. T. Rosen and M. Resnick, "The size distribution of cities: An examination of the Pareto law and primacy," J. Urban Econ., 165–186 (1980).
P. Barber, A. Graves, M. Hall, D. Sheath, and C. Tomkins, "Quality failure costs in civil engineering projects," Int. J. Qual. Reliab. Manag., 479–492 (2000).
M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law," Contemp. Phys., 323–351 (2005).
S. Lipovetsky, "Pareto 80/20 law: Derivation via random partitioning," Int. J. Math. Edu. Sci. Technol., 271–277 (2009).
M. Hardy, "Pareto's law," Math. Intellig., 38–43 (2010).
R. Koch, The 80/20 manager: The secret to working less and achieving more, Little, Brown and Company, New York (2013).
Figure Captions
Fig. 1. Example of a Gaussian distribution: (a) $f(x)$ and (b) $x f(x)$, along with the corresponding $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ regions, for $\sigma = 1$, $\mu = 2$.
Fig. 2. Relationship between $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ under various $X$ and $\sigma/\mu$ values. The curve for $\sigma/\mu = 2$ contains the point ($I_{\mathrm{cause}}$ = 0.2, $I_{\mathrm{effect}}$ = 0.8), which represents the state for Pareto's 80/20 rule.
Fig. 3. Gaussian distribution with $\sigma/\mu = 2$, representing the state for Pareto's 80/20 rule: (a) $f(x)$ and (b) $x f(x)$. (For this figure we set $\sigma = 2$, $\mu = 1$, but the individual values do not matter, for the shape is the same as long as $\sigma/\mu$ is identical.)
Fig. 4. $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ against $X$ for the Gaussian distribution with $\sigma/\mu = 2$, which represents the state for Pareto's 80/20 rule.
Fig. 5. Relationship between $I_{\mathrm{cause}}$ and $I_{\mathrm{effect}}$ for the Gaussian distribution ($\sigma/\mu = 2$) and the Pareto distribution.
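The comparison described for Fig. 5 can be reproduced approximately with a short Python sketch. It is offered as an illustration under stated assumptions, not as the author's plotting code: `pareto_effect` and `gaussian_effect` are hypothetical helper names, the Pareto curve uses the power-law relation of Eq. 10 with a Pareto index of log 5 / log 4, and the Gaussian curve uses the closed form I_effect = I_cause + (σ/μ)φ(t) with σ/μ = 2, where t is the standard normal quantile for the given I_cause.

```python
import math
from statistics import NormalDist

ALPHA = math.log(5) / math.log(4)  # Pareto index fixed by the 80/20 point
STD_NORMAL = NormalDist()          # standard normal distribution

def pareto_effect(i_cause, alpha=ALPHA):
    """Power-law relation I_effect = I_cause ** ((alpha - 1) / alpha), alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for the mean to be finite")
    return i_cause ** ((alpha - 1.0) / alpha)

def gaussian_effect(i_cause, ratio=2.0):
    """Gaussian counterpart for ratio = sigma / mu."""
    t = STD_NORMAL.inv_cdf(1.0 - i_cause)
    return i_cause + ratio * STD_NORMAL.pdf(t)

# Iterated rule: I_cause = 0.2 ** n, compared with the Gaussian at the same I_cause.
for n in (1, 2, 3, 4):
    i_cause = 0.2 ** n
    print(f"n = {n}: I_cause = {100 * i_cause:.2f}%  "
          f"Pareto I_effect = {100 * pareto_effect(i_cause):.2f}%  "
          f"Gaussian I_effect = {100 * gaussian_effect(i_cause):.2f}%")
```

For n = 3 (I_cause = 0.8%) the Pareto relation returns about 51%, whereas the Gaussian form returns only a few percent, which illustrates the contrast in the low-fraction region.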