What Does the “Mean” Really Mean?

by Nozer D. Singpurwalla, The George Washington University, Washington, D.C., USA
and Boya Lai, The City University of Hong Kong, Hong Kong

March 5, 2020
Abstract
The arithmetic average of a collection of observed values of a homogeneous collection of quantities is often taken to be the most representative observation. There are several arguments supporting this choice, the moment of inertia being the most familiar. But what does this mean? In this note, we bring forth the Kolmogorov-Nagumo point of view that the arithmetic average is a special case of a sequence of functions of a special kind, the quadratic and the geometric means being some of the other cases. The median fails to belong to this class of functions. The Kolmogorov-Nagumo interpretation is the most defensible and the most definitive one for the arithmetic average, but its essence boils down to the fact that this average is merely an abstraction which has meaning only within its mathematical set-up.
Keywords:
Chisini’s Equation, Kolmogorov-Nagumo Functions, Weighted Means

Background
The December 2017 issue of “Significance”, an ASA co-sponsored magazine, published an engaging article by Simon Raper titled, “The Shock of the Mean”. A title like this may come as a surprise to today’s statisticians, because most are not shocked when they encounter a mean, taken here to be an arithmetic average. According to Raper, the 18th-century shock had to do with how the mean was used, and what it meant. It had little to do with the mathematical underpinnings of the mean, because these became transparent only during the 1930’s. The purpose of this article is to articulate these underpinnings, which go beyond the usual explanations, like the mean is a moment of inertia. The mean continues to be an abstraction with an interpretation only within its mathematical framework; it may therefore continue to shock many a modern statistician who has wholeheartedly embraced it. But first, some words about the merits of Raper’s article.

Fundamentally, Raper’s article is of expository value. It gives a fascinating discourse on the notion of an arithmetic mean by tracing its historic roots, providing anecdotal stories connected with its appearance and its acceptance, and its evolution as a commonly used methodological device in the economic, the engineering, the medical, and the social sciences. Of note on p. 15 is a timeline of the mean starting from 426 BC until 1810, when Laplace published his central limit theorem. The material to be given in this entry has a timeline subsequent to 1810, and its focus is more on the analytics of the mean.

At about the same time as the appearance of the Raper article, the authors of this entry were looking at Shannon’s formulas for entropy and information. A central feature of these formulas is a weighted average of the “information gained” in every realization of a random variable. In the course of appreciating the essence of this operation, the authors encountered two papers, one by George Barnard (1951) and the other by Alfred Renyi (1961).
Both papers questioned Shannon’s rationale for choosing the weighted average, something which seemed like a natural thing to do. In this context, Renyi also mentioned (without any reference) the Kolmogorov-Nagumo class of functions, of which the sample average turns out to be a special case. Indeed, Renyi used this class of functions (involving improper random variables) to propose his measure of information. All of this seemed intriguing, and on pursuing the matter further, it became clear that outside the community of functional analysts, little has been said about this class of functions, a special case of which is a statistician’s most basic tool. Even Stigler’s (2016) masterpiece The Seven Pillars of Statistical Wisdom does not seem to make note of this foundation on which one of his pillars rests. In what follows, we highlight the mathematical essence of the mean, which has a history dating back to the times of Cauchy.

Antecedents to the Kolmogorov-Nagumo Functions
The earliest reference to the mathematical notion of a mean is that it is a class of functions, say $M$, of $n$ measurements $x_1, \cdots, x_n$, on a homogeneous collection of $n$ quantities satisfying a certain condition. It is due to Cauchy (1821). All that Cauchy required is that $M(x_1, \cdots, x_n)$ be bounded by the smallest and the largest values of $x_1, \cdots, x_n$, as:

$$\min\{x_1, \cdots, x_n\} \leq M(x_1, \cdots, x_n) \leq \max\{x_1, \cdots, x_n\}.$$

Whereas Cauchy does not give an interpretive meaning to the function $M$, he initiated a pathway for much that followed, leading up to the definitive works of Kolmogorov (1930) and of Nagumo (1930). However, the notion that the value taken by certain members of the class of functions $M$ may be seen as a representative measurement of the measurements $x_1, \cdots, x_n$ is ascribed to Chisini (1929); see Marichal (2000). Chisini was a distinguished Italian geometer, who was de Finetti’s teacher at the University of Milan.

Subsequent to Cauchy (1821), but prior to Chisini (1929), is an exhaustive paper, with discussion, by John Venn (1891) titled “On the Nature and Uses of Averages”, that he read before the Royal Statistical Society. Whereas Cauchy’s perspective is analytical, Venn’s has more to do with applications of the arithmetic average. Specifically, Venn raises several questions related to the average. He asks: “Why resort to averages at all?” “What do we gain and lose, respectively, by doing so?” “What different kinds of averages are there, and how and why does one such kind become more appropriate than another?” Venn, via a footnote, also states that a mathematical justification of almost every kind of average can be found in Edgeworth’s paper in the Cambridge Philosophical Transactions.
Whereas we have not been successful in accessing Edgeworth’s paper, it appears that Chisini, if not Bonferroni (1927), may have come close to answering many of the questions raised by Venn.

Per Chisini, a representative value of $x_1, \cdots, x_n$, with respect to the function $M$, is a number $\mu$ such that if each of the $x_i$’s is replaced by $\mu$, the value of the function $M$ is unchanged. That is:

$$M(\mu, \cdots, \mu) = M(x_1, \cdots, x_n);$$

this is known as Chisini’s Equation.

When the function $M$ is the sum of its arguments, the solution to the Chisini Equation is the arithmetic mean, known to statisticians as the sample mean. Similarly, when $M$ is the product (the sum of squares) [the sum of inverses] the sum of exponentials, then the solution to this equation is the geometric mean (the quadratic mean) [the harmonic mean] the exponential mean.

As an illustration, suppose that $M$ is additive, so that $M(\mu, \cdots, \mu) = n\mu = \sum_i x_i$; then $\mu = \sum_i x_i / n$, the arithmetic mean. Similarly, if the function $M$ connotes a product, so that $M(\mu, \cdots, \mu) = \mu^n = \prod_{i=1}^n x_i$, then $\mu = \left(\prod_{i=1}^n x_i\right)^{1/n}$ is the geometric mean. With $M$ as the sum of squares, $\mu = \sqrt{\frac{1}{n}\sum_i x_i^2}$ is the quadratic mean, whereas with $M$ as the sum of inverses, $\mu = n \big/ \sum_i \frac{1}{x_i}$ is the harmonic mean.

To summarize, the commonly used measures of representative values, referred to as measures of central tendency, are effectively solutions to Chisini’s equation. Preceding Chisini is the work of Bonferroni (1924), who after Cauchy may have set the stage for that which is to follow [cf. Muliere and Parmigiani (1993)].

The story would end here with a statement about the solution to Chisini’s equation, except for a caveat. This has to do with the fact that a solution to the equation, assuming it exists, may not satisfy Cauchy’s inequality; this fact has been pointed out by de Finetti (1931). Indeed, de Finetti’s motivation in writing this paper was more ambitious.
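These solutions can be checked numerically against Chisini’s equation: replacing every observation by the candidate value $\mu$ must leave the aggregating function $M$ unchanged. The following is a minimal sketch of that check (the function names are ours, chosen for illustration, not Chisini’s):

```python
import math

# Chisini solutions for four choices of the aggregating function M.
def arithmetic(xs):
    return sum(xs) / len(xs)                              # M = sum

def geometric(xs):
    return math.prod(xs) ** (1 / len(xs))                 # M = product

def quadratic(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))    # M = sum of squares

def harmonic(xs):
    return len(xs) / sum(1 / x for x in xs)               # M = sum of inverses

xs = [2.0, 3.0, 6.0]
aggregators = [
    (sum, arithmetic),
    (math.prod, geometric),
    (lambda v: sum(x * x for x in v), quadratic),
    (lambda v: sum(1 / x for x in v), harmonic),
]
for M, mean in aggregators:
    mu = mean(xs)
    # Chisini's equation: M(mu, ..., mu) = M(x1, ..., xn)
    assert math.isclose(M([mu] * len(xs)), M(xs))
```

Whether such a $\mu$ also satisfies Cauchy’s inequality is, per de Finetti (1931), a separate question.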
De Finetti wanted to extend Chisini’s definition of the mean of a collection of measurements to that of the mean of a collection of functions, particularly, probability distribution functions [see Cifarelli and Regazzini (1996)]. More important, de Finetti was endeavouring to connect the notion of the mean with the notion of a certainty equivalent in decision and utility theory [see Muliere and Parmigiani (1993)].

Recognizing that the notion of a mean should be more than a function $M$ which merely satisfies Cauchy’s condition, or which is a solution to Chisini’s equation, Kolmogorov and Nagumo, independently and simultaneously, proved a fundamental theorem about mean values.

The Kolmogorov-Nagumo Theorem on Means
Kolmogorov (1930) and Nagumo (1930), henceforth K-N, propose a definition of a mean in terms of a sequence of functions, and provide a theorem to operationalize them. Specifically, the mean is an infinite sequence of functions $M_1(x_1), M_2(x_1, x_2), M_3(x_1, x_2, x_3), \cdots, M_n(x_1, x_2, \cdots, x_n)$, each $M_n$ being continuous, increasing, and symmetric, and with the property that $M_n(x, x, \cdots, x) = x$, for all $x$ and all $n$; a reflexive law. Furthermore, the terms of this sequence are related by an associative law of the following nature:

$$M_k(x_1, x_2, \cdots, x_k) = x \;\Rightarrow\; M_n(x_1, \cdots, x_k, x_{k+1}, \cdots, x_n) = M_n(x, \cdots, x, x_{k+1}, \cdots, x_n),$$

for every integer $k \leq n$.

The striking theorem of K-N [cf. Aczel (1948)] is that under the above necessary and sufficient conditions on the above sequence of functions (also known as the Kolmogorov-Nagumo functions), there exists a continuous and strictly increasing function $f$ by which the mean value $M_n(x_1, \cdots, x_n)$ can be written as:

$$M_n(x_1, x_2, \cdots, x_n) = f^{-1}\Big[\frac{1}{n}\sum_{i=1}^n f(x_i)\Big],$$

where $f^{-1}(x)$ is the inverse of $f(x)$.

Different choices for $f(x)$ yield different functional forms for the mean $M_n$. For example, if $f(x) = x$, then $M_n(x_1, x_2, \cdots, x_n) = \frac{1}{n}\sum_i x_i$, the arithmetic mean. Similarly, if $f(x) = x^2$, then the mean is the quadratic mean. The table below gives a summary of some choices for $f$.

| $f(x)$ | Mean $M_n(x_1, \cdots, x_n)$ | Qualifier of Mean |
|---|---|---|
| $x$ | $\frac{1}{n}\sum_{i=1}^n x_i$ | Arithmetic |
| $x^2$ | $\sqrt{\frac{1}{n}\sum_{i=1}^n x_i^2}$ | Quadratic |
| $\log x$ | $\sqrt[n]{\prod_i x_i}$ | Geometric |
| $\frac{1}{x}$ | $n \big/ \sum_i \frac{1}{x_i}$ | Harmonic |
| $x^\alpha$ | $\big(\frac{1}{n}\sum_i x_i^\alpha\big)^{1/\alpha}$ | Power |

The fact that the median of $n$ measurements $x_1, \cdots, x_n$ does not belong as an entry in the table above was remarked by de Finetti (1931). This is because the median does not obey the associative law. Thus, per the K-N criteria, the median cannot be seen as a representative measurement of the $n$ measurements.
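The K-N representation lends itself to a direct computational sketch (the routine name `kn_mean` is ours): a single quasi-arithmetic mean, parameterized by $f$ and its inverse, reproduces each row of the table, while a quick check illustrates how the median violates the associative law.

```python
import math
import statistics

def kn_mean(xs, f, f_inv):
    """Kolmogorov-Nagumo (quasi-arithmetic) mean: f^{-1}( (1/n) * sum f(x_i) )."""
    return f_inv(sum(f(x) for x in xs) / len(xs))

xs = [1.0, 4.0, 16.0]

arithmetic = kn_mean(xs, lambda x: x, lambda y: y)       # f(x) = x
quadratic  = kn_mean(xs, lambda x: x * x, math.sqrt)     # f(x) = x^2
geometric  = kn_mean(xs, math.log, math.exp)             # f(x) = log x
harmonic   = kn_mean(xs, lambda x: 1 / x, lambda y: 1 / y)  # f(x) = 1/x

assert math.isclose(arithmetic, 7.0)
assert math.isclose(geometric, 4.0)      # (1 * 4 * 16)^(1/3) = 4
assert math.isclose(harmonic, 16 / 7)

# The median fails the associative law: med(1, 4) = 2.5, yet replacing
# (1, 4) by (2.5, 2.5) changes the three-term median.
assert statistics.median([1, 4]) == 2.5
assert statistics.median([1, 4, 16]) != statistics.median([2.5, 2.5, 16])
```

Note that `kn_mean` trusts the caller to supply a genuine inverse pair $(f, f^{-1})$ on the range of the data; the theorem guarantees such an $f$ exists, but says nothing about which one to choose.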
In the same 1931 paper, de Finetti, and also Kitagawa (1934), generalized the K-N result to the case of weighted observations. If with each observation $x_i$ there is associated a weight $q_i$, with $\sum_i q_i = 1$, then de Finetti and Kitagawa gave conditions for writing

$$M_n(x_1, \cdots, x_n; q_1, \cdots, q_n) = f^{-1}\Big[\sum_i q_i f(x_i)\Big],$$

for $n = 1, 2, \cdots$. Weighted means are germane in contexts like Bayesian decision making, wherein taking expected utilities is a necessary step and each $x_i$ is associated with a utility. Indeed, the only justification for taking expected values that we know of lies in decision theory, which involves choosing that decision which maximizes an expected utility.

Concluding Remarks
The answer to the question “What Does the ‘Mean’ Really Mean?” posed in the title has gone from the very verbal and descriptive, like “representative measurement”, to the physical, like “first moment”, to the mathematical and abstract, like “the Kolmogorov-Nagumo sequence of functions”. The Kolmogorov-Nagumo focused answer seems most definitive and final, though it suffers from the fact that neither Kolmogorov nor Nagumo say much, if anything, as to what the function $f$ should be. Rather, theirs is a statement about the existence of $f$, and about an exclusion, like the median. Precursors to the Kolmogorov-Nagumo work see the mean as merely a function per Cauchy, or the solution to an equation per Chisini.

Acknowledgements
Fabrizio Ruggeri made the authors aware of the exhaustive and thorough paper of Muliere and Parmigiani. Thanks, Fabrizio.

References

[1] Aczél, J. (1948). “On mean values.” Bulletin of the American Mathematical Society, 54(4), 392-400.

[2] Barnard, G. A. (1951). “The theory of information.” Journal of the Royal Statistical Society, Series B (Methodological), 13(1), 46-64.

[3] Bonferroni, C. E. (1924). “La media esponenziale in matematica finanziaria.” Annuario del Regio Istituto Superiore di Scienze Economiche e Commerciali di Bari, AA 23-24, 1-14.

[4] Cauchy, A. L. B. (1821). Cours d’analyse de l’École Royale Polytechnique. Debure.

[5] Chisini, O. (1929). “Sul concetto di media.” Periodico di Matematiche, 4(9), 106-116.

[6] Cifarelli, D. M., & Regazzini, E. (1996). “De Finetti’s contribution to probability and statistics.” Statistical Science, 253-282.

[7] de Finetti, B. (1931). Sul concetto di media. Istituto Italiano degli Attuari.

[8] Kitagawa, T. (1934). “On some class of weighted means.” Proceedings of the Physico-Mathematical Society of Japan, 3rd Series, 16, 117-126.

[9] Kolmogorov, A. N., & Castelnuovo, G. (1930). Sur la notion de la moyenne. G. Bardi, tip. della R. Accad. dei Lincei.

[10] Marichal, J. L. (2000). “On an axiomatization of the quasi-arithmetic mean values without the symmetry axiom.” Aequationes Mathematicae, 59(1-2), 74-83.

[11] Nagumo, M. (1930). “Über eine Klasse der Mittelwerte.” Japanese Journal of Mathematics: Transactions and Abstracts, Vol. 7, pp. 71-79.

[12] Muliere, P., & Parmigiani, G. (1993). “Utility and means in the 1930’s.” Statistical Science, Vol. 8, No. 4, pp. 421-432.

[13] Raper, S. (2017). “The shock of the mean.” Significance, 14(6), 12-17.

[14] Rényi, A. (1961). “On measures of entropy and information.” In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. The Regents of the University of California.

[15] Stigler, S. M. (2016). The Seven Pillars of Statistical Wisdom. Harvard University Press.

[16] Venn, J. (1891). “On the nature and uses of averages.” Journal of the Royal Statistical Society, 54(3), 429-456.