Perceptive Statistical Variability Indicators

Kalman Ziha
University of Zagreb, Faculty of Mechanical Engineering and Naval Architecture, Department of Naval Architecture and Ocean Engineering, Ivana Lucica 5, 10000 Zagreb, Croatia
E-mail: [email protected], Tel: +385 1 6168132, Fax: +385 1 61656940

Abstract:
The concepts of variability and uncertainty, both epistemic and aleatory, come from experience and coexist with different connotations. This article therefore attempts to express their relation by analytic means, setting sights first on their differences and then on their common characteristics. Inspired by the alternative expression of uncertainty defined as the average number of equally probable events based on the entropy concept in probability theory, the article introduces two related perceptive statistical measures which indicate the same variability and invariability as the basic probability distribution. The first is the equivalent number of outcomes of a hypothetical distribution with one sure and all other outcomes impossible, which indicates variability. The second is the appropriate equivalent number of outcomes of a hypothetically uniform distribution with all probabilities equal, which indicates invariability. The article interprets the common properties of variability and uncertainty on theoretical distributions and on ocean-wide wind wave directional properties compiled in the Global Wave Statistics.
Key words: variability; uncertainty; predictability; equivalent numbers; average number
The variability assessment of $N$ discrete numbers $y_i$, $i = 1, 2, \ldots, N$, is a problem of lasting interest in statistics and elsewhere (e.g. Cramer, 1945; Kenney and Keeping, 1951; Anderson and Bancroft, 1952; Hogg and Craig, 1965). A set of $N$ normalized numbers $p_i$, $i = 1, 2, \ldots, N$, can represent a discrete distribution of probabilities (1) in the range $p_{Min}$ to $p_{Max}$ as shown:

$$\mathbf{P}_N = (p_1, p_2, \ldots, p_N) \qquad (1)$$

The disjoint random events $E_j$ with probabilities of events $p_j = p(E_j)$, $j = 1, 2, \ldots, N$, configure a system $\mathbf{S}_N$ that can be written down after Khinchin (1957) in the form of an $N$-element finite scheme:

$$\mathbf{S}_N = \begin{pmatrix} E_1 & E_2 & \cdots & E_N \\ p(E_1) & p(E_2) & \cdots & p(E_N) \end{pmatrix} \qquad (2)$$

The probability of a distribution of $N$ probabilities $\mathbf{P}_N$ (1) or of a system of $N$ events $\mathbf{S}_N$ (2) is then in general $p(\mathbf{P}_N)$ or $p(\mathbf{S}_N) = \sum_{i=1}^{N} p_i \le 1$. For a complete distribution, $p(\mathbf{P}_N)$ or $p(\mathbf{S}_N) = 1$. For generally partial distributions $\mathbf{P}_N$ (1) the mean value of the $N$ probabilities is:

$$\bar{p}(\mathbf{P}_N) = (1/N) \cdot p(\mathbf{P}_N) \qquad (3)$$

The average variance $V(\mathbf{P}_N)$ of the $N$ probabilities of a distribution $\mathbf{P}_N$ (1) reads:

$$V(\mathbf{P}_N) = \sigma^2(\mathbf{P}_N) = \frac{1}{N}\sum_{i=1}^{N}\bigl(p_i - \bar{p}(\mathbf{P}_N)\bigr)^2 = \frac{1}{N}\sum_{i=1}^{N} p_i^2 - \bar{p}^2(\mathbf{P}_N) \qquad (4)$$

The article next considers a proposal for a reference value of the variance (4) in order to describe a probability distribution in which one probability is dominant, $p_j \to p(\mathbf{P}_N)$, and all the others, $p_{i \ne j} \to 0$ for $i = 1, 2, \ldots, N-1$, are vanishing. The reference value of the variance can then be calculated by definition as:

$$V_{Ref}(\mathbf{P}_N) = \lim_{\substack{p_j \to p(\mathbf{P}_N) \\ p_{i \ne j} \to 0}} V(\mathbf{P}_N) = \bar{p}^2(\mathbf{P}_N) \cdot (N-1) \qquad (5)$$

Appendix A presents the proof that the reference variance (5) is the maximally attainable value. The coefficient of variation of the $N$ probabilities of a distribution of probabilities $\mathbf{P}_N$ (1) reads:
$$CV(\mathbf{P}_N) = \sigma(\mathbf{P}_N) / \bar{p}(\mathbf{P}_N) \qquad (6)$$

The coefficient of variation of $N$ probabilities (6) has continuity in its arguments, monotonic increase with the number of outcomes up to the limiting value $CV_{Ref}(\mathbf{P}_N) = \sqrt{N-1}$, and a composition rule based on the additivity of variances. Impossible occurrences with zero probability do not affect the variability (6), but the incompleteness of distributions, $p(\mathbf{P}_N) < 1$, affects the variability (4). The coefficient of variation (6) of a distribution of probabilities $\mathbf{P}_N$ (1) can also be appropriately presented by its relative value $cv(\mathbf{P}_N) = CV(\mathbf{P}_N) / \sqrt{N-1}$. For a range of numbers of discrete random variables $n = 1, 2, 5, 10$ and $50$ and for a range of probabilities $p$ from 0 to 1, the variability of binomial distributions is presented by the coefficient of variation (6) (Fig. 1).

The concept of entropy was introduced earlier in information theory for the assessment of the amount of information pertinent to a system of events $\mathbf{S}_N$ (2) (Hartley, 1928; Shannon and Weaver, 1949). The entropy has lately been applied in probability theory to define the uncertainties of systems of events $\mathbf{S}_N$ (2) (Khinchin, 1957; Renyi, 1970; Aczel and Daroczy, 1975). Entropy defines the uncertainty of the observable properties that turns into information when observations become available. The entropy of a single event is defined as the logarithm of the equivalent number of events $1/p(E_i)$ with equal probability, $\mu(E_i) = \log\left[1/p(E_i)\right]$, and can be interpreted according to Wiener (1948) either as a measure of the information yielded by the event $E_i$ or of how unexpected the event was. The uncertainty of a complete system $\mathbf{S}_N$ (2) of $N$ events can be expressed as the weighted sum of the unexpectednesses of all events by Shannon's entropy (Shannon and Weaver, 1949):

$$H(\mathbf{S}_N) = \sum_{j=1}^{N} p_j \mu_j = -\sum_{j=1}^{N} p_j \log p_j \qquad (7)$$

According to Cover and Thomas (2006), Shannon's information entropy $H$ has a number of natural properties for notions of uncertainty: continuity in its arguments, monotonic increase with the number of equiprobable outcomes, and a composition rule. The uncertainty of an incomplete system of $N$ events $\mathbf{S}_N$ (2) can be defined as the limiting case of Renyi's entropy (1970) of order 1 using (3) and (7) as:
$$H_1^R(\mathbf{S}_N) = -\frac{1}{p(\mathbf{S}_N)}\sum_{j=1}^{N} p_j \log p_j \qquad (8)$$

Shannon's axiomatic derivation of entropy (1949) explains why it is the intuitive measure of uncertainty. In addition, the uniqueness theorem by Khinchin (1957) proves that the entropy is the only function that measures the probabilistic uncertainty of systems of events in agreement with the experience of uncertainty. The theorem of mixture of distributions (Khinchin, 1957; Renyi, 1970) provides the conditional (average) entropy of a system $\mathbf{S}$ with respect to subsystems of events. The uncertainty of systems of events $\mathbf{S}_N$ (2) for binomial distributions, for a range of probabilities $p$ from 0 to 1 and for numbers of variables $n = 1, 2, 5, 10$ and $50$, is presented by the entropy (7) (Fig. 1).

Aczel and Daroczy mentioned earlier (1975) the average number of equally probable events that follows from the condition of maximal uncertainty $H(\mathbf{S}_N) = \log F(\mathbf{S}_N)$ as an uncertainty indicator:

$$F(\mathbf{S}_N) = 2^{H(\mathbf{S}_N)} \qquad (9)$$

The average number of events $F(\mathbf{S}_N)$ (9) in the range $1 \le F(\mathbf{S}_N) \le N$ is no longer dependent on the base of the applied logarithm. It defines a hypothetically uniform probability distribution of $F(\mathbf{S}_N)$ equal probabilities amounting to $1/F(\mathbf{S}_N)$ with the same uncertainty as the entropy $H(\mathbf{S}_N)$ of the system $\mathbf{S}_N$ (Fig. 2). Relative measures such as $h(\mathbf{S}_N) = H(\mathbf{S}_N)/\log N$ (7, 8) or $f(\mathbf{S}_N) = F(\mathbf{S}_N)/N$ (9) can be appropriate.

The concept of average numbers of events based on entropy (9) has inspired the investigation of equivalent numbers of outcomes based on the statistical variability of probability distributions (3-6). The article first concentrates on the variability of probability distributions. It therefore defines the equivalent number $G_N(\mathbf{P})$ of a hypothetical probability distribution with one sure and the remaining $G_N(\mathbf{P})-1$ impossible outcomes which provides the same variability as the basic distribution $\mathbf{P}$ of $N$ probabilities (Fig. 2). The equivalent number $G_N(\mathbf{P})$ follows directly from the condition that the coefficient of variation $CV(\mathbf{P}_N)$ (6) is equal to its maximal value $\sqrt{G_N(\mathbf{P}) - 1}$, as shown below:
N N f F N = S S can be appropriate. The concept of average numbers of events based on entropy (9) has inspired the investigation of equivalent numbers of outcomes based on statistical variability of probability distributions (3-6). The article firstly concentrates on variability of probability distributions. Therefore it defines the equivalent number G N ( P ) of a hypothetical probability distribution with one sure and remaining G N ( P )-1 impossible outcomes which provides the same variability as the basic distribution P of N probabilities (Fig. 2). The equivalent number G N ( P ) follows directly from the condition that the coefficient of variation CV N ( P ) (6) is equal to its maximal value ( ) 1 N G − P , as it is shown below: ( ) ( ) 1 N N
G CV = +
P P (10) In the context of this research the square ( ) N CV P of the coefficient of variation (6) of any probability distribution P (1) or a system of events S (2) in (10) represents the number of impossible outcomes or events in addition to a single sure outcome or certain event that provide the same variability or certainty as the original probability distribution of N outcomes or the system of N events. The article next focuses on invariability of probability distributions. The equivalent number D N ( P ) of equal probabilities in amount of 1/ D N ( P ) of a hypothetically complete probability distribution expresses the invariability when the variance (4) of a distribution P (1) of N probabilities is equal to zero [ ] NN i Ni
$$D_N(\mathbf{P}) = \Bigl[\sum_{i=1}^{N} p_i^2\Bigr]^{-1} = N \Big/ \Bigl[p^2(\mathbf{P}_N) \cdot \bigl(CV^2(\mathbf{P}_N) + 1\bigr)\Bigr] \qquad (11)$$

In the context of this research the sum of squares of probabilities $\sum_{i=1}^{N} p_i^2$ in (4) of any probability distribution $\mathbf{P}$ (1) or system of events $\mathbf{S}$ (2) in (11) represents the mean probability of a hypothetically uniform probability distribution or system of events of $D_N(\mathbf{P})$ equally probable outcomes or events. The following relation links the two equivalent numbers for generally incomplete distributions with known probability $p(\mathbf{P}_N)$ of $N$ possible outcomes:

$$D_N(\mathbf{P}) \cdot G_N(\mathbf{P}) = N / p^2(\mathbf{P}_N) \qquad (12)$$

The relation (12) in logarithmic form expresses uncertainties as shown:
$$\log D_N(\mathbf{P}) + \log G_N(\mathbf{P}) = \log N - 2 \log p(\mathbf{P}_N) \qquad (13)$$

The terms (9), (10) and (11) imply the generalization of numbers of events to values other than integers. The increasing number of impossible outcomes in the range $0 \le G_N(\mathbf{P}) - 1 \le N - 1$ (10) with respect to one sure outcome, based on $CV(\mathbf{P}_N)$ (6), indicates increasing variability and a rise of predictability. On the other hand, the increasing value of the equivalent number of equally probable outcomes (11) in the range $1 \le D_N(\mathbf{P}) \le N$ indicates lessening variability and a drop of predictability (Table 1). Simultaneously, the average number of equally probable events (9) in the range $1 \le F(\mathbf{S}_N) \le N$, based on the entropy (7, 8), represents a rise in uncertainty and as such indicates a drop of predictability (Table 1). Thus, the equivalent numbers of outcomes $D_N(\mathbf{P})$ (11) and the average numbers of events $F(\mathbf{S}_N)$ (9) go well together in expressing invariability-uncertainty-unpredictability (Table 1). The increasing equivalent probability in the range $1/N \le 1/D_N(\mathbf{P}) \le 1$, based on statistical invariability (11) and expressing the growth of invariability, and the average probability in the range $1/N \le 1/F(\mathbf{S}_N) \le 1$, based on probabilistic entropy (7, 8) and representing the drop of uncertainty, indicate a rise of predictability. Hence, the equivalent numbers of events $G_N(\mathbf{P})$ (10) go well together with the equivalent $1/D_N(\mathbf{P})$ and average $1/F(\mathbf{S}_N)$ probabilities in expressing variability-certainty-predictability (Table 1).

The difference between the analytical definitions of variability (10) and uncertainty (9) lies in the perception of impossible or certain events. The uncertainty (9) vanishes whenever there is at least one certain event, regardless of the number of impossible events. However, when any one event is sure, the variability perception (10) depends on the number of remaining impossible events, since the mean value of the distribution of probabilities changes with the overall number of outcomes.
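As a perceptive check, the indicator chain (3)-(12) can be collected in a short numerical sketch. This is illustrative only: the function name `indicators` and the use of base-2 logarithms for $H$ and $F$ are assumptions of this sketch, not part of the original derivation.

```python
import math

def indicators(p):
    """Statistical and entropy-based indicators of a (possibly incomplete)
    discrete distribution p = (p_1, ..., p_N), following Eqs. (3), (4),
    (6), (8), (9), (10) and (11)."""
    N = len(p)
    total = sum(p)                                 # p(P_N) <= 1
    mean = total / N                               # Eq. (3)
    var = sum(x * x for x in p) / N - mean ** 2    # Eq. (4)
    CV = math.sqrt(max(var, 0.0)) / mean           # Eq. (6)
    H = -sum(x * math.log2(x) for x in p if x > 0) / total   # Eq. (8), bits
    F = 2.0 ** H                                   # Eq. (9), average number
    G = CV ** 2 + 1.0                              # Eq. (10), variability
    D = 1.0 / sum(x * x for x in p)                # Eq. (11), invariability
    return {"p": total, "CV": CV, "H": H, "F": F, "G": G, "D": D}

# A complete distribution with one dominant outcome:
r = indicators([0.7, 0.1, 0.1, 0.1])
# Relation (12): D * G = N / p(P_N)^2 holds identically.
assert abs(r["D"] * r["G"] - 4 / r["p"] ** 2) < 1e-9
```

Applied to the uniform distribution $[1/4, 1/4, 1/4, 1/4]$ the sketch returns $CV = 0$, $G = 1$ and $D = F = 4$, matching the limiting cases of Table 1.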
Statistical properties (3-6) of distributions of probabilities $\mathbf{P}_N$ (1), the entropy (7, 8) of systems of events $\mathbf{S}_N$ (2), and the average and equivalent numbers (9, 10, 11) do not depend on the sequences of probabilities. Variability and uncertainty are commonly considered objective properties since they depend on nothing but the probability distributions. Therefore the intrinsic predictability based on statistical variability can be considered an objective property too. However, probabilistic forecasts can be performed with conditional distributions employing posterior distributions. The common posterior verification method for probabilistic forecasts is the Brier score (1950), proposed as the average deviation between predicted probabilities for a set of events and their outcomes. Relative measures of predictability can also be important, given that a Bayesian viewpoint of prediction is a useful one. In that context the relative entropy is useful and worth considering, e.g. Kleeman (2002), Roulston and Smith (2002) and Bröcker (2009). This of course involves both prior and posterior distributions.

Visual observations of wind speeds (Beaufort scale) and directions and of wave heights have been reported since 1854; observations from commercial ships have been archived since 1861; wave heights, periods and directions have been reported from ships in normal service all over the world since 1949. The observations are systematically collected following the non-instrumental methodology prescribed since 1961 by the resolution of the World Meteorological Organization (WMO) in order to assure that the data are globally homogeneous in quality and cover most sea areas of practical interest for shipping, navigation, towing and offshore activities. The compilation of these observations for each of the $N_A = 104$ Marsden squares (Fig. 4) is the Global Wave Statistics (GWS) prepared by Hogben, Dacunha and Olliver (1986), which uses past experience to eliminate biases.
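The Brier (1950) verification idea mentioned above, the average squared deviation between predicted probabilities and binary outcomes, can be sketched as follows; the forecast and outcome arrays are purely illustrative, not GWS data.

```python
def brier_score(forecasts, outcomes):
    """Brier (1950) score: mean squared deviation between the predicted
    probabilities and the binary outcomes (1 = event occurred, 0 = not)."""
    assert len(forecasts) == len(outcomes)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Illustrative forecasts of an event on four occasions:
score = brier_score([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])   # 0.0375
# A perfect forecaster scores 0; the worst possible score is 1.
```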
The study in the sequel investigates the variability of wave directions based on the annual wind wave climate observations reported in the GWS (1986) by probability distributions in 104 ocean areas $A$ (Figs. 4 and 6) as $\mathbf{P}_N(A) = (d_N, d_{NE}, d_E, d_{SE}, d_S, d_{SW}, d_W, d_{NW})$ of $N = 8$ principal wave directions. For wind wave climate directional observations circular statistics is appropriate, e.g. Fisher (1993). However, the variability of probabilities of wind wave directions is not necessarily of circular character, and linear descriptive statistics can be applied.

The article first graphically presents the equivalent numbers of directions $D[\mathbf{P}(A)]$ (11) and the average numbers of events $F[\mathbf{P}(A)]$ (9) with respect to the nominal number of directions $N = 8$ for all ocean areas (Fig. 3). The two numbers indicate the same ordering based on variability and uncertainty considerations. The same graph also presents the relative uncertainty $h[\mathbf{P}(A)]$ (8) and the relative statistical variability of probabilities of wave directions $cv[\mathbf{P}(A)]$ (6), which indicate the opposite ordering (Fig. 3). The article next presents the chart of the relative probabilistic statistical variability $cv[\mathbf{P}(A)]$ (6) (Fig. 4) and the chart of equivalent numbers of events $D[\mathbf{P}(A)]$ (11) (Fig. 5) of wind wave directions. A few comments follow.

The wave directions are highly predictable in some areas of the eastern Pacific Ocean, such as A64, given by the distribution $\mathbf{P}(A64) = (0.0042, 0.0098, 0.1151, 0.6081, 0.2110, 0.0234, 0.0049, 0.0033)$, where $cv(A64) \approx 60\%$ (Fig. 4). Three directions (east, south-east and south) prevail with about 90% (Fig. 6). The appropriate equivalent number of wave directions $D(A64) = 2.3$ (Table 2 and Fig. 5) indicates invariability of directions equivalent to probability distributions with numbers of equally probable outcomes between two, $(1/2, 1/2)$, and three, $(1/3, 1/3, 1/3)$, closer to two than to three.
The average number of wave directions $F(A64) = 3$ (Table 2) indicates uncertainty appropriate to a probability distribution with three equally probable outcomes. The appropriate equivalent number of wave directions $G(A64) = 3.6$ (Table 2 and Fig. 6) indicates variability equivalent to probability distributions with numbers of outcomes of which one is sure and two, $(1, 0, 0)$, or three, $(1, 0, 0, 0)$, are impossible. Similar is the situation in some areas of the western Atlantic Ocean, such as A66, A67 and A68.

In some ocean areas the wave directions are unpredictable since the directions are almost uniformly distributed. For example, in the South Pacific area A86 the probability distribution of wave directions is $\mathbf{P}(A86) = (0.1192, 0.0941, 0.1157, 0.1125, 0.1299, 0.1370, 0.1489, 0.1152)$ and the relative coefficient of variation is only $cv(A86) = 4.75\%$. There, the equivalent number of wave directions $D(A86) = 8.3$ and the average number of wave directions $F(A86) = 8.2$ (Table 2 and Fig. 7) even exceed the nominal value of $N = 8$ due to the incompleteness of the observations. This indicates that the equivalent probability distribution is $(1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8)$. The appropriate equivalent number of wave directions $G(A86) = 1.01$ (Table 2 and Fig. 7) indicates almost minimal variability: the equivalent hypothetical distribution has one sure outcome and only $G(A86) - 1 = 0.01$ impossible outcomes, approaching a single sure outcome with the second outcome close to non-existent. Similar is the situation in the North Atlantic area A1, where $cv(A1) = 4.75\%$.

Variability and uncertainty are recognizable as two opposite properties which naturally motivate conscious observers of random phenomena to make predictions. The article therefore advocated two particular functionals using statistical variability defined on probability sets to bring these two properties closer to the common experience of randomness.
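The A64 and A86 indicator values quoted above can be reproduced from the printed directional distributions. A self-contained sketch follows; base-2 logarithms are assumed for $H$ and $F$, and the helper name `perceptive_numbers` is illustrative.

```python
import math

def perceptive_numbers(p):
    """H (8) in bits, F (9), G (10) and D (11) for a possibly
    incomplete discrete distribution p."""
    N, total = len(p), sum(p)
    mean = total / N
    var = sum(x * x for x in p) / N - mean ** 2
    H = -sum(x * math.log2(x) for x in p if x > 0) / total
    return H, 2.0 ** H, var / mean ** 2 + 1.0, 1.0 / sum(x * x for x in p)

# GWS annual wave direction distributions printed in the text:
P_A64 = [0.0042, 0.0098, 0.1151, 0.6081, 0.2110, 0.0234, 0.0049, 0.0033]
P_A86 = [0.1192, 0.0941, 0.1157, 0.1125, 0.1299, 0.1370, 0.1489, 0.1152]

H64, F64, G64, D64 = perceptive_numbers(P_A64)   # about 1.6, 3.0, 3.6, 2.3
H86, F86, G86, D86 = perceptive_numbers(P_A86)   # about 3.0, 8.2, 1.0, 8.3
```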
The two types of equivalent numbers of outcomes or events were introduced with the aim of providing common indicators of invariability and certainty, as well as of variability and uncertainty, other than just the statistical variance and the probabilistic entropy. The properties of the proposed indicators were designed to match human comprehension of random phenomena, close to everyone's gambling perception of games of chance. For example, it is intuitively perceptive that the flipping of a balanced coin is as predictable as 2 events with probabilities 1/2, and the tossing of an unbiased die as 6 events with probabilities 1/6. The equivalent and average numbers imply the analytical generalization of the numbers of outcomes of probability distributions, or of the numbers of events of systems of events, to values other than only integers, for the perceptive presentation of variability and uncertainty.
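The coin and die perception above is easy to verify numerically; a minimal sketch assuming complete distributions and base-2 logarithms:

```python
import math

def D_F_G(p):
    """Equivalent number D (11), average number F (9) and equivalent
    number G (10) of a complete discrete distribution p."""
    N, mean = len(p), sum(p) / len(p)
    var = sum(x * x for x in p) / N - mean ** 2
    D = 1.0 / sum(x * x for x in p)
    F = 2.0 ** (-sum(x * math.log2(x) for x in p if x > 0))
    G = var / mean ** 2 + 1.0
    return D, F, G

coin = [0.5, 0.5]        # flipping a balanced coin
dice = [1 / 6] * 6       # tossing an unbiased die
# D and F both recover the perceived numbers of equally probable
# outcomes (2 and 6), while G = 1 signals the absence of variability.
```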
References
Aczel, J. and Daroczy, Z. 1975: On Measures of Information and Their Characterization, Academic Press, NY.
Anderson, R.L. and Bancroft, T.A. 1952: Statistical Theory in Research, McGraw-Hill, NY.
Brier, G.W. 1950: Verification of forecasts expressed in terms of probability, Monthly Weather Review, 78, 1-3.
Bröcker, J. 2009: Reliability, sufficiency and the decomposition of proper scores, Q. J. R. Meteorol. Soc., 135.
Cover, T.M. and Thomas, J.A. 2006: Elements of Information Theory, 2nd edition, Wiley-Interscience, Hoboken.
Cramer, H. 1945: Mathematical Methods of Statistics, Princeton University Press.
Fisher, N.I. 1993: Statistical Analysis of Circular Data, Cambridge University Press.
Hartley, R.V.L. 1928: Transmission of Information, Bell System Tech. J., 7.
Hogben, N., Dacunha, N.M.C. and Olliver, G.F. 1986: Global Wave Statistics, British Maritime Technology, Feltham.
Hogg, R.V. and Craig, A.T. 1965: Introduction to Mathematical Statistics, The Macmillan Company, NY.
Kenney, J.F. and Keeping, E.S. 1951: The Mathematics of Statistics I & II, D. Van Nostrand Inc., London.
Khinchin, A.I. 1957: Mathematical Foundations of Information Theory, Dover Publications, NY.
Kleeman, R. 2002: Measuring dynamical prediction utility using relative entropy, J. Atmos. Sci., 59(13), 2057-2072.
Renyi, A. 1970: Probability Theory, North-Holland, Amsterdam.
Roulston, M.S. and Smith, L.A. 2002: Evaluating probabilistic forecasts using information theory, Mon. Weather Rev., 130(6), 1653-1660.
Shannon, C.E. and Weaver, W. 1949: The Mathematical Theory of Communication, University of Illinois Press, Urbana.
Wiener, N. 1948: Cybernetics: or Control and Communication in the Animal and the Machine, John Wiley & Sons, NY.

Appendix A
Let us consider a complete probability distribution of $N$ probabilities, $\sum_{i=1}^{N} p_i = 1$. From the Jensen inequality directly follows the lower limit of the sum of squares of a probability distribution, $\sum_{i=1}^{N} p_i^2 \ge 1/N$. According to common reasoning, for any $p_i < 1$ it holds that $p_i^2 < p_i$, and therefore the upper limit is $\sum_{i=1}^{N} p_i^2 \le 1$. Since only unity has the property $1^2 = 1$, the sum of squares attains its maximal value $\sum_{i=1}^{N} p_i^2 = 1$ only if one of the probabilities is equal to unity, $p_i = 1$, and all the other $(N-1)$ probabilities are zero, $p_{j \ne i} = 0$.

Table 1.
Comparative properties of (in)variability and (un)certainty related to predictability

Variability (Fig. 1): coefficient of variation of probabilities (6)
$CV(\mathbf{P}_N) = \left[N \sum_{i=1}^{N} p_i^2 / p^2(\mathbf{P}_N) - 1\right]^{1/2}$
Min: 0 - invariable (all $N$ outcomes equally probable)
Max: $\sqrt{N-1}$ - maximal variability (one sure outcome, all $N-1$ others impossible)
Unit: 1 - one sure outcome, one impossible ($N = 2$)

Uncertainty (Fig. 1): entropy of the system of events (8)
$H(\mathbf{S}_N) = -\left[1/p(\mathbf{S}_N)\right] \sum_{i=1}^{N} p_i \log p_i$
Min: 0 - certain (one certain event, $N-1$ others impossible)
Max: $\log N$ - full uncertainty (all $N$ events equally probable)
Unit: 1 bit - two equally probable events ($N = 2$)

Invariability (Fig. 2): equivalent number of outcomes (11)
$D_N(\mathbf{P}) = \left[\sum_{i=1}^{N} p_i^2\right]^{-1} = N / \left[p^2(\mathbf{P}_N) \cdot G_N(\mathbf{P})\right]$, where $G_N(\mathbf{P}) = CV^2(\mathbf{P}_N) + 1$
Range: $1 \le D_N(\mathbf{P}) \le N$
Min: 1 - fully predictable/certain (one sure outcome, $N-1$ impossible, $CV = \sqrt{N-1}$; for $N = 2$: one sure outcome, one impossible, $CV = 1$)
Max: $N$ - unpredictable/uncertain, $CV = 0$ (all $N$ outcomes equally probable)
Ref: 2 - two equally probable outcomes ($N = 2$)

Certainty (Fig. 2): average number of events (9)
$F(\mathbf{S}_N) = 2^{H(\mathbf{S}_N)} = 2^{-\sum_{i=1}^{N} p_i \log_2 p_i}$
Range: $1 \le F(\mathbf{S}_N) \le N$
Min: 1 - certain/fully predictable (one certain event, $N-1$ impossible, $H = 0$)
Max: $N$ - uncertain/unpredictable, $H = \log N$ (all $N$ events equally probable)
Ref: 2 - two equally probable events ($N = 2$)
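The limiting values collected in Table 1 can be checked numerically for the two extreme complete distributions; a sketch for $N = 8$ (the helper name is illustrative, base-2 logarithms assumed):

```python
import math

def cv_and_entropy(p):
    """Coefficient of variation (6) and entropy (7) in bits of a
    complete distribution p of N probabilities."""
    N, mean = len(p), sum(p) / len(p)
    var = sum(x * x for x in p) / N - mean ** 2
    CV = math.sqrt(max(var, 0.0)) / mean
    H = -sum(x * math.log2(x) for x in p if x > 0)
    return CV, H

N = 8
uniform = [1 / N] * N                   # all N outcomes equally probable
degenerate = [1.0] + [0.0] * (N - 1)    # one sure outcome, N-1 impossible

CV_u, H_u = cv_and_entropy(uniform)     # CV = 0 (min), H = log2 N = 3 bits (max)
CV_d, H_d = cv_and_entropy(degenerate)  # CV = sqrt(N-1) (max), H = 0 (min)
```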
Table 2. Variability and uncertainty of wave directions in GWS areas A86 and A64*

| P     | p(P)   | p̄ (3)     | H(P) (7, 8) | F(P) (9) | CV(P) (6) | cv(P) | D(P) (11) | G(P) (10) |
| Range | ≤ 1    | ≤ 0.1250  | 0-3         | 1-8      | 0-√7      | 0-1   | 1-8       | 1-8       |
| A86   | 0.97   | 0.122     | 3.03        | 8.16     | 0.13      | 4.8%  | 8.3       | 1.01      |
| A64   | 0.98   | 0.122     | 1.6         | 3.0      | 1.6       | 60%   | 2.3       | 3.6       |

*The considered wave direction observations represent incomplete distributions.

Figure 1. Coefficient of variation $CV$ (6) and entropy $H$ (7) of probabilities of the binomial distribution $B(n, p)$ for $n = 1, 2, 5, 10$ and $50$ and $p$ from 0 to 1.

Figure 2. Equivalent numbers $G$ (10) and $D$ (11) and average number $F$ (9) for the binomial distribution $B(n, p)$.

Figure 3. (In)variability and (un)certainty of wind wave directions in the GWS areas (northern A1-A30, equatorial A31-A80, southern): equivalent number $D$ (maximum 8.3 in A86, South Pacific; minimum in A64, East Pacific), average number $F$ (maximum 8.2 in A86; minimum 3.0 in A64), entropy $H$ (maximum 3.0, minimum 1.6 bits) and equivalent number $G$, with respect to the maximum number of wave directions $N = 8$.

Figure 4. Chart of the relative statistical probabilistic variability $cv$ (6) of wind wave directions, in %, presenting the inherent predictability based on the probability distributions of prior observations compiled in the GWS (Hogben, Dacunha and Olliver, 1986); the areas range from below 10% to above 50%.

Figure 5. Chart of the equivalent numbers $D(\mathbf{P})$ (11) of equally probable wave directions observed in the GWS on an annual basis with respect to the eight principal directions; the areas range from below 4 to above 7.

Figure 6. Circular distribution of wave directions in ocean area A64 in the East Pacific Ocean ($H = 1.6$, $F = 3.0$, $CV = 1.6$, $cv = 60\%$, $D = 2.3$, circular variance = 0.167).

Figure 7. Circular distribution of wave directions in ocean area A86 in the South Pacific Ocean ($H = 3.03$, $F = 8.16$, $CV = 0.13$, $cv = 4.8\%$, $D = 8.3$).