A Mathematically Sensible Explanation of the Concept of Statistical Population
arXiv preprint [stat.OT]
Yiping Cheng
Beijing Jiaotong University, China
[email protected]
Abstract
In statistics education, the concept of population is widely felt to be hard to grasp, as a result of vague explanations in textbooks. Some textbook authors have therefore chosen not to mention it. This paper offers a new explanation by proposing a new theoretical framework of population and sampling, which aims to achieve high mathematical sensibleness. In the explanation, the term population is given a clear definition, and the relationship between simple random sampling and iid random variables is examined mathematically.
Keywords: statistical education; statistical population; random sampling; random samples
The theory of statistics is based on the theory of probability, and the first mathematical concept of statistics beyond pure probability is that of random samples. A closely related semi-mathematical concept, namely that of statistical population, is also popular among statistical practitioners. The term population is usually used instead of statistical population where no confusion would arise. However, despite its popularity in the statistical community, the treatment of the concept of population in textbooks is quite varied, which reflects the varied views of different authors on that concept. As a result of our study of 9 undergraduate-level statistics textbooks [1, 2, 3, 4, 5, 6, 7, 8, 9], we found that they can be classified into two groups according to the way they introduce the concept of random samples. Let us give the details below.
The first group consists of, in chronological order, [1, 2, 3, 4, 5]. One can check [1, page 145], [3, page 246], and [5, page 204] for evidence of our classification. Book [2] does not explicitly give a formal definition of random samples although it does use the term, and book [4] uses the term random samples very early, with its definition deferred until page 214. In [5, page 204], a definition of random samples is given as follows:
Definition:
If the random variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid), then these random variables constitute a random sample of size $n$ from the common distribution.

Definitions of this concept in the other textbooks in this group, though with different wordings, are mathematically equivalent, so they are omitted here. By defining random samples directly from a distribution without mentioning population, these authors obviated the difficult task of explaining the population concept.
The second group consists of, in chronological order, [6, 7, 8, 9]. Of them, we found in [6, 9] detailed explanations of the concepts of population and random samples before the formal definitions are given. We now quote the following excerpt of the explanation in [6, pages 195–197], which is similar to the explanation in [9, pages 225–227].
A population consists of the totality of the observations with which we are concerned. In any particular problem, the population may be small, large but finite, or infinite. The number of observations in the population is called the size of the population. For example, the number of underfilled bottles produced on one day by a soft-drink company is a population of finite size. The observations obtained by measuring the carbon monoxide level every day is a population of infinite size. We often use a probability distribution as a model for a population. For example, a structural engineer might consider the population of tensile strengths of a chassis structural element to be normally distributed with mean $\mu$ and variance $\sigma^2$. We could refer to this as a normal population or a normally distributed population.

A sample is a subset of observations selected from a population.

To define a random sample, let $X$ be a random variable that represents the result of one selection of an observation from the population. Let $f(x)$ denote the probability density function of $X$. Suppose that each observation in the sample is obtained independently, under unchanging conditions. That is, the observations for the sample are obtained by observing $X$ independently under unchanging conditions, say, $n$ times. Let $X_i$ denote the random variable that represents the $i$-th replicate. Then, $X_1, X_2, \ldots, X_n$ is a random sample and the numerical values obtained are denoted as $x_1, x_2, \ldots, x_n$. The random variables in a random sample are independent with the same probability distribution $f(x)$ because of the identical conditions under which each observation is obtained.

In [7, page 259] there is a short reference to the concept of population.
The appropriate representation of $\hat{\theta}$ is $\hat{\Theta} = h(X_1, X_2, \ldots, X_n)$ where $X_1, X_2, \ldots, X_n$ are random variables representing a sample from random variable $X$, which is referred to in this context as the population.

And in [8, page 78] there is a concise characterization of random sampling.
Let $N$ and $n$ represent the numbers of elements in the population and sample, respectively. If the sampling is conducted in such a way that each of the $\binom{N}{n}$ samples has an equal probability of being selected, the sampling is said to be random, and the result is said to be a random sample.

The concept of population is widely felt by students to be hard to grasp, which is perhaps the primary reason for it not being included in many textbooks. We are not attempting to argue whether we should or should not include it in textbooks. Instead, we try to propose an explanation of the population concept that makes perfect sense in a mathematical manner (which we call mathematically sensible in the sequel). Our attempt was motivated by our perception that the existing explanations of the concept (the foregoing explanation of [6] being a good representative) are not adequately mathematically sensible. Our reasons are as follows:

1. One would expect that in a mathematically sensible explanation, the population fits into the probability theory framework in that it corresponds to a clear-cut mathematical concept. The existing explanations are chaotic in this respect. Let us examine each of the three possibilities:

(a) population = sample space. This possibility accords with the first half of the second paragraph of the foregoing quote of [6, pages 195–197]. However, how can a sample space have a distribution? Only random variables can.

(b) population = range space of the random variable. But this possibility directly contradicts the second example given in the first half of the second paragraph of the above-mentioned quote. If the carbon monoxide level can have only 100 values due to quantization, then should we say the size of the population is now 100?

(c) population = the random variable. This possibility is the most often seen. But then what is the corresponding mathematical concept for observation? How can we view a population as a set of observations, if the population is now a random variable?
The first sentence of the last paragraph of the above-mentioned quote suggests that one random variable corresponds to one observation, which, under this possibility, contradicts the first sentence of the quote.

2. The existing explanations do not mathematically show why random sampling in its primitive sense (characterized by the foregoing quote of [8, page 78]) leads to independent and identically distributed random variables.

The question now becomes: does a mathematically sensible explanation exist? Our answer is yes, and we will provide such an explanation in this paper.
Actually, our new explanation of the population concept is part of a new probability framework for population and sampling. Before giving the details, let us look at its salient features:

1. It involves two probability spaces: a population probability space and an experiment probability space. However, they are closely related to each other.

2. The term population is no longer mathematically overloaded as in the existing explanations; it now corresponds to the clear-cut mathematical concept of the sample space of the population probability space.

3. The random variable $X$, which was regarded as the population random variable in the existing explanations, lives in the population probability space, whereas the sample $X_1, X_2, \ldots, X_n$ lives in the experiment probability space. So mathematically speaking, they are NOT defined over the same probability space.

4. The physical operation of sampling is described by the mathematical operation of mapping from the experiment space (i.e. the sample space of the experiment probability space) to the population, and the simple randomness of the sampling is mathematically described by the to-be-defined simpleness of the mapping.

Recall that a probability space is a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set called the sample space, $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$ (see [5, page 10] for a description), and $P : \mathcal{F} \mapsto [0,$
$1]$ is a probability function as defined in [5, page 11].

Mathematically speaking, any probability space can be a population probability space. It is only that the sample space of a population probability space is called the population, which, understood physically, consists of individuals with various observable quantities. For example, if we are carrying out a health survey in a state with one million people, then we can define a population probability space as follows.

$$\Pi = \{1, 2, \ldots, 1000000\}, \quad \mathcal{F}_\Pi = \{\text{All subsets of } \Pi\}, \quad P_\Pi(A) = \frac{\text{Number of elements in } A}{1000000}, \text{ for any } A \subseteq \Pi.$$

Obviously, $(\Pi, \mathcal{F}_\Pi, P_\Pi)$ constitutes a probability space. The elements in $\Pi$ are ordinal ids for the people in the state. We can then define random variables $X$, $Y$, $Z$, ... to represent the height, weight, blood type, etc. of the people. For example, it can be so defined that $X(785)$ is the height in meters of the individual whose ordinal id is 785 in this state. Note also that if the sample space is finite and all its elements are assigned equal probability, then that probability space is called "classical" in the literature. So this example is classical. However, our theory applies also to non-classical and infinite population cases.

The population probability space can be understood either stochastically or non-stochastically. It can be understood non-stochastically, since in the above description, no physical randomness is involved. It can also be understood stochastically if we view $P_\Pi(A)$ as the "(physical) probability of any member of $A$ being selected for survey".

An experiment probability space is one where the sample space is the set of experiment outcomes. We denote it by the triple $(E, \mathcal{F}_E, P_E)$.

In our framework, size-$n$ sampling is the selection of an $n$-tuple from the population. What $n$-tuple is selected is determined by the experiment outcome. Therefore, sampling can be mathematically described by a sampler mapping $S : E \to \Pi^n$.
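As a minimal sketch of the health-survey example (the Gaussian height model and the helper names `P_Pi` and `X` are hypothetical illustrations, not part of the framework), the classical population probability space and a population random variable can be written in a few lines of Python:

```python
import random

# Classical population probability space for the health-survey example:
# Pi = {1, ..., N} with each individual equally likely.
N = 1_000_000                      # one million people; ordinal ids 1..N
Pi = range(1, N + 1)               # the population (the sample space)

def P_Pi(A):
    """Classical probability measure: |A| / N for any subset A of Pi."""
    return len(A) / N

# A population random variable X: X(pi) = height in meters of individual pi.
# Here a hypothetical N(1.70, 0.10) model stands in for real survey data.
random.seed(0)
height = {pi: random.gauss(1.70, 0.10) for pi in Pi}
def X(pi):
    return height[pi]

# P_Pi of the event {pi : X(pi) > 1.90} -- about 0.023 under this toy model.
tall = [pi for pi in Pi if X(pi) > 1.90]
print(round(P_Pi(tall), 3))
```

Note that both $P_\Pi$ and $X$ live entirely on $\Pi$; no experiment (and hence no physical randomness) is involved yet, which is exactly the non-stochastic reading above.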
Practically, it is usually the case that the size of the population is massively larger than $n$, and thus one element of the tuple can rarely equal another element. In addition, it is also practically usual that the order of the elements does not matter in later statistical processing. Therefore, in practice we often say sampling is the selection of a subset of size $n$ from the population. The $n$-tuple formulation is adopted here because it is considered more convenient.

Now we give the mathematical property that captures the simple randomness described in the quote of [8, page 78] in Section 1.

Definition 1.
Let $(\Pi, \mathcal{F}_\Pi, P_\Pi)$ be a population probability space, and $(E, \mathcal{F}_E, P_E)$ be an experiment probability space. A sampler mapping $S : E \to \Pi^n$ is said to be simple if, for all $B_1 \in \mathcal{F}_\Pi, B_2 \in \mathcal{F}_\Pi, \ldots, B_n \in \mathcal{F}_\Pi$,

$$P_E(\{e \mid S_1(e) \in B_1, S_2(e) \in B_2, \ldots, S_n(e) \in B_n\}) = P_\Pi(B_1) P_\Pi(B_2) \cdots P_\Pi(B_n). \quad (1)$$

For a fixed experiment probability space, the existence of a simple sampler mapping is not guaranteed. In fact, it is more a requirement on the experiment probability space than one on the sampler mapping.

The quote of [8, page 78] carries an implicit assumption which, put in our terminology, is that the population probability space is classical. Then, a subset of $\Pi$ of size $n$ corresponds to $n!$ tuples with different orders, and if the mapping is simple, each order has probability $N^{-n}$ of being selected; thus the probability of the subset being selected is $n!/N^n$, irrespective of what elements constitute the subset. This is just the simple randomness of sampling.

Theoretically, there always exists a trivial construction of an experiment probability space and a simple sampler mapping, namely the product measure space $(\Pi, \mathcal{F}_\Pi, P_\Pi)^n$ together with the identity mapping. But this construction is only of theoretical value and is not expected to yield real benefits.

If we want to construct an experiment probability space and a simple sampler mapping that faithfully describe a practically used sampling procedure, then the task is formidable, because in practice, ad-hoc methods are usually used, which are only approximately simple random. In such cases we can just assume the sampling to be simple random and assume the existence of an experiment probability space and a simple sampler mapping, without giving the particular construction.

Perhaps the only scenario where we want to do the explicit mathematical construction is when we need the construction to design a simple random sampling algorithm.
In that scenario, the experiment probability space, viewed as a model for a random generator, should already have an implementation available. The following random number generators are considered to have reliable (albeit only closely approximate) algorithm implementations:

Discrete:
Given a length $L$, randomly generate $\xi$ from $\{0, 1, \ldots, L-1\}$ with each of $0, 1, \ldots, L-1$ equally likely.

Continuous:
Uniformly randomly generate $\xi$ from the real interval $[0, 1)$.

3.1 Construction Based on the Discrete Generator

This method is applicable only when the population probability space is classical. So assumed, let $N$ be the size of the population, and let

$$E = \{0, 1, \ldots, N^n - 1\}, \quad (2)$$
$$\mathcal{F}_E = \{\text{All subsets of } E\}, \quad (3)$$
$$P_E(A) = \frac{\text{Number of elements in } A}{N^n}, \text{ for any } A \subseteq E. \quad (4)$$

Eqs. (2)–(4) constitute a model for the length-$N^n$ discrete generator. Now we define the sampler mapping:

$$S(e) = (a_1, a_2, \ldots, a_n) \quad (5)$$

where

$$e = a_1 N^{n-1} + a_2 N^{n-2} + \cdots + a_n \quad (6)$$

with

$$0 \le a_1 \le N-1, \quad 0 \le a_2 \le N-1, \quad \ldots, \quad 0 \le a_n \le N-1. \quad (7)$$

It is easy to see that $S$ is a simple sampler mapping.

3.2 Construction Based on the Continuous Generator

This method is applicable when the population probability space is finite, both classical and non-classical. However, we only describe the method for the classical case, as the non-classical case is much more complicated to deal with. Let $N$ be the size of the population, and let

$$E = [0, 1), \quad (8)$$
$$\mathcal{F}_E = \{\text{All Borel subsets of } E\}, \quad (9)$$
$$P_E(A) = \text{Measure of } A, \text{ for any } A \in \mathcal{F}_E. \quad (10)$$

Eqs. (8)–(10) constitute a model for the continuous generator. Now we define the sampler mapping:

$$S(e) = (a_1, a_2, \ldots, a_n) \quad (11)$$

where

$$e = a_1 N^{-1} + a_2 N^{-2} + \cdots + a_n N^{-n} + \cdots \quad (12)$$

with

$$0 \le a_1 \le N-1, \quad 0 \le a_2 \le N-1, \quad \ldots, \quad 0 \le a_n \le N-1. \quad (13)$$

We omit the proof that $S$ is a simple sampler mapping under the classical assumption.

In Section 2.2 we showed that in the classical case, simpleness of the sampler mapping over the experiment probability space implies simple randomness of sampling. However, the reverse implication may not hold, and we may need a stronger version. Thus we revise the quote of [8, page 78] a bit by replacing "subsets" with "tuples":
Let $N$ and $n$ represent the numbers of elements in the population and sample, respectively. If the sampling is conducted in such a way that each of the $N^n$ ordered samples has an equal probability of being selected, the sampling is said to be simple random, and the result is said to be a random sample.

This tuple version is still completely intuitively reasonable and is readily seen to be equivalent, in the classical case, to the simpleness of the sampler mapping over the experiment probability space. So this version of simple randomness will now be adopted. Furthermore, because of their equivalence in the classical case, and because the simpleness concept is more general, we decide to use simpleness as defined in Definition 1 to carry out our derivation of the following proposition about random variables.

Proposition 1.
Let $X$ be a random variable over a population probability space $(\Pi, \mathcal{F}_\Pi, P_\Pi)$. Let $(E, \mathcal{F}_E, P_E)$ be an experiment probability space and $S : E \to \Pi^n$ be a simple sampler mapping. Then

1. For $i = 1, \ldots, n$, $X_i$ as defined by $X_i(e) := X(S_i(e))$ is a random variable over $(E, \mathcal{F}_E, P_E)$ and has the same distribution function as $X$.

2. $X_1, X_2, \ldots, X_n$ are independent.

Proof.
1. Fix an arbitrary $u \in \mathbb{R}$. Let $B = \{\pi \mid X(\pi) < u\}$; then since $X$ is a random variable over $(\Pi, \mathcal{F}_\Pi, P_\Pi)$, we have $B \in \mathcal{F}_\Pi$. Then

$$\{e \mid X_i(e) < u\} = \{e \mid X(S_i(e)) < u\} = \{e \mid S_i(e) \in B, \ S_j(e) \in \Pi \text{ for } j \ne i\}. \quad (14)$$

The right side of (14) is in $\mathcal{F}_E$, because it has its probability defined by Definition 1. Since $u$ is arbitrary, $X_i$ is a random variable over $(E, \mathcal{F}_E, P_E)$. Furthermore, also by Definition 1,

$$P_E(\{e \mid X_i(e) < u\}) = P_E(\{e \mid S_i(e) \in B, \ S_j(e) \in \Pi \text{ for } j \ne i\}) = P_\Pi(B) = P_\Pi(\{\pi \mid X(\pi) < u\}).$$

Since $u$ is arbitrary, this means $X_i$ has the same distribution function as $X$.

2. Fix an arbitrary set of real values $u_1, u_2, \ldots, u_n$. Then

$$\begin{aligned}
P_E(\{e \mid X_1(e) < u_1, X_2(e) < u_2, \ldots, X_n(e) < u_n\})
&= P_E(\{e \mid X(S_1(e)) < u_1, X(S_2(e)) < u_2, \ldots, X(S_n(e)) < u_n\}) \\
&= P_E(\{e \mid S_1(e) \in \{\pi \mid X(\pi) < u_1\}, S_2(e) \in \{\pi \mid X(\pi) < u_2\}, \ldots, S_n(e) \in \{\pi \mid X(\pi) < u_n\}\}) \\
&= P_\Pi(\{\pi \mid X(\pi) < u_1\}) P_\Pi(\{\pi \mid X(\pi) < u_2\}) \cdots P_\Pi(\{\pi \mid X(\pi) < u_n\}) \\
&= P_E(\{e \mid X_1(e) < u_1\}) P_E(\{e \mid X_2(e) < u_2\}) \cdots P_E(\{e \mid X_n(e) < u_n\}).
\end{aligned}$$

This means $X_1, X_2, \ldots, X_n$ are independent.

Section 2 presented a fully developed theoretical framework for population and sampling. A large portion of the previous explanations in verbal language has now been replaced by results in mathematical language. This makes our explanation much less vague than previous ones. However, some of the results may require a bit too much mathematical maturity of the reader, and therefore in courses or textbooks, it may not be the most suitable to present the framework in its full version. However, we believe the following points should be stressed on educational occasions:

1. The term "population" as a noun should refer to the sample space, not to the random variable as is the case in many textbooks.

2.
The term "population" can be used as an attributive in "population random variable", "population distribution", and "population density", which refer to a random variable in the population probability space, its distribution, and its density, respectively.

3. The population random variable $X$ and the sample random variables $X_1, X_2, \ldots, X_n$ do not live in the same probability space. Failure to notice this is the cause of many difficulties in the existing explanations.

4. The term "sample" may lead students to believe that the random sample $X_1, X_2, \ldots, X_n$ contains less information than the population random variable $X$. Actually, mathematically speaking, $X_1, X_2, \ldots, X_n$, and even each of its members $X_i$, contain no less information than $X$. It is only that in practice, only one experiment is done, and we hold only $n$ real values $x_1, x_2, \ldots, x_n$ as observed values of the sample. Usually $n \ll N$.

Finally, it is hoped that this new framework and explanation, with its unique feature of mathematical sensibleness, will be helpful to a large number of statistics students.

References

[1] M.H. DeGroot. Probability and Statistics. Addison-Wesley, 2nd edition, 1989.

[2] N. Mukhopadhyay.
Probability and Statistical Inference. Marcel Dekker, 2000.

[3] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, and L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.

[4] J.L. Devore. Probability and Statistics. Brooks/Cole, Cengage Learning, 8th edition, 2012.

[5] R.V. Hogg, J.W. McKean, and A.T. Craig. Introduction to Mathematical Statistics. Pearson, 7th edition, 2013.

[6] D.C. Montgomery and G.C. Runger. Applied Statistics and Probability for Engineers. Wiley, 3rd edition, 2003.

[7] T.T. Soong. Fundamentals of Probability and Statistics for Engineers. Wiley, 2004.

[8] D.D. Wackerly, W. Mendenhall, and R.L. Scheaffer. Mathematical Statistics with Applications. Brooks/Cole, Cengage Learning, 7th edition, 2008.

[9] R.E. Walpole, R.H. Myers, S.L. Myers, and K. Ye. Probability and Statistics for Engineers and Scientists. Pearson.