A Constructive Algebraic Proof of Student's Theorem
Yiping Cheng
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China. [email protected]
ABSTRACT. Student's theorem is an important result in statistics which states that, for a normal population, the sample variance is independent of the sample mean and, suitably scaled, has a chi-square distribution. The existing proofs of this theorem either rely heavily on advanced tools such as moment generating functions, or fail to explicitly construct an orthogonal matrix used in the proof. This paper provides an elegant explicit construction of that matrix, making the algebraic proof complete. The constructive algebraic proof proposed here is thus very suitable for inclusion in textbooks.
Keywords: sample variance; chi-square distribution; t-distribution; statistical education
In mathematical statistics, there is a well-known theorem about the sample variance of a random sample from a normal distribution. This theorem is directly related to the discovery of the t-distribution by the statistician William Sealy Gosset (1876-1937), known as "Student", a pseudonym he used when he published his paper. Therefore, this theorem is often referred to as Student's theorem. Let $N(\mu, \sigma^2)$ denote the normal distribution with mean $\mu$ and variance $\sigma^2$. Then the theorem reads as follows.

Theorem 1 (Student's Theorem). Let $X_1, \ldots, X_n$ be a random sample from the distribution $N(\mu, \sigma^2)$, i.e., they all have that distribution and are mutually independent. Define the random variables
$$\bar{X} = \frac{\sum_{i=1}^n X_i}{n}, \tag{1}$$
$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}. \tag{2}$$
Then
1. $\bar{X}$ has distribution $N(\mu, \sigma^2/n)$.
2. $\bar{X}$ and $S^2$ are independent.
3. $(n-1)S^2/\sigma^2$ has distribution $\chi^2(n-1)$.

This theorem is equivalent to the following version, where the general normal distribution is replaced by the standard normal distribution.
Theorem 2 (Student's Theorem, Standardized Version). Let $Z_1, \ldots, Z_n$ all have distribution $N(0,1)$ and be mutually independent. Define the random variables
$$\bar{Z} = \frac{\sum_{i=1}^n Z_i}{n}, \tag{3}$$
$$W = \sum_{i=1}^n (Z_i - \bar{Z})^2. \tag{4}$$
Then
1. $\sqrt{n}\, \bar{Z}$ has distribution $N(0,1)$.
2. $\bar{Z}$ and $W$ are independent.
3. $W$ has distribution $\chi^2(n-1)$.

Since these two versions are equivalent, and it is easier to formulate a proof of the standardized version, the standardized version will be used in the rest of the paper when we give our proof.
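Before turning to proofs, the three claims of Theorem 2 can be probed numerically. The following is an illustrative sketch (not part of the proof; it assumes NumPy is available, and the variable names are ours): it checks that $\sqrt{n}\,\bar{Z}$ has unit variance, that $W$ matches the first two moments of a $\chi^2(n-1)$ variable, and that the sample correlation between $\bar{Z}$ and $W$ is near zero, which is consistent with (though of course does not prove) independence.

```python
# Monte Carlo sanity check of Theorem 2 (illustrative sketch, assuming NumPy).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000

Z = rng.standard_normal((trials, n))          # each row: one sample Z_1..Z_n
Zbar = Z.mean(axis=1)                         # sample mean of each row
W = ((Z - Zbar[:, None]) ** 2).sum(axis=1)    # W = sum of (Z_i - Zbar)^2

print(np.sqrt(n) * Zbar.std())        # close to 1: sqrt(n)*Zbar has unit variance
print(W.mean(), W.var())              # close to n-1 and 2(n-1): chi-square(n-1) moments
print(np.corrcoef(Zbar, W)[0, 1])     # close to 0: consistent with independence
```

A chi-square variable with $k$ degrees of freedom has mean $k$ and variance $2k$, which is what the second line of output probes with $k = n-1$.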
LITERATURE PROOFS OF STUDENT’S THEOREM
To the author's best knowledge, the original paper of Gosset is not currently available to the general public, so we do not know whether it contained a proof of the above theorem. However, it is believed that even if such a "proof" did exist, it could hardly be regarded as a proof by today's standards, because the mathematically rigorous theory of probability only began to emerge in the 1930s. We therefore should look into the modern literature, mainly textbooks, for proofs of Student's theorem. In one way or another, all the proofs rely on two important theorems on the multivariate normal distribution, whose proofs require a very deep mathematical tool: moment-generating functions (m.g.f. in the sequel), or alternatively, characteristic functions. These two theorems are familiar to the majority of statistics students. They are given here as lemmas.
Lemma 3.
Let random variables $X_1, \ldots, X_n$ have the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Let $Y = [Y_1, \ldots, Y_m]^T = AX + b$, where $A$ is an $m \times n$ full row-rank constant matrix, $X = [X_1, \ldots, X_n]^T$, and $b = [b_1, \ldots, b_m]^T$ is a constant column vector. Then $Y_1, \ldots, Y_m$ have the multivariate normal distribution with mean $A\mu + b$ and covariance matrix $A \Sigma A^T$.

Lemma 4.
Let random variables $X_1, \ldots, X_n$ have the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Define the random vectors $X$, $X_{(1)}$, and $X_{(2)}$ by
$$X^T = [\underbrace{X_1, \ldots, X_r}_{X_{(1)}^T},\ \underbrace{X_{r+1}, \ldots, X_n}_{X_{(2)}^T}].$$
Partition $\Sigma$ conformably as
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{bmatrix},$$
where $\Sigma_{11}$ is $r \times r$, $\Sigma_{12}$ is $r \times (n-r)$, and $\Sigma_{22}$ is $(n-r) \times (n-r)$. Then $X_{(1)}$ and $X_{(2)}$ are independent if and only if $\Sigma_{12} = 0$.

A consequence of Lemma 4 is the following proposition, which we will use later.
Proposition 5.
Let random variables $X_1, \ldots, X_n$ have a multivariate normal distribution with covariance matrix $\Sigma$. Then $X_1, \ldots, X_n$ are mutually independent if and only if $\Sigma$ is a diagonal matrix.

After looking into a number of renowned modern statistics textbooks, which are supposed to have incorporated the latest developments in the literature on this subject, we found two typical proofs. They are commented on below.
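Proposition 5 can be illustrated numerically. The sketch below (our own illustration, assuming NumPy) samples a multivariate normal vector with a diagonal covariance matrix and checks a product rule for one pair of sign events, $P(X_1 > 0 \text{ and } X_2 > 0) = P(X_1 > 0)\,P(X_2 > 0)$; this probes only one consequence of independence, not independence in full.

```python
# Numerical illustration of Proposition 5 (a sketch, assuming NumPy):
# jointly normal variables with a diagonal covariance are independent,
# so a product rule must hold for events in disjoint coordinates.
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.diag([1.0, 4.0, 0.25])     # diagonal covariance matrix
X = rng.multivariate_normal(np.zeros(3), Sigma, size=500_000)

p_joint = np.mean((X[:, 0] > 0) & (X[:, 1] > 0))   # P(X1>0 and X2>0)
p_prod = np.mean(X[:, 0] > 0) * np.mean(X[:, 1] > 0)
print(p_joint, p_prod)   # both close to 0.25, as independence requires
```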
Proof in [1, Section 3.6.3].
This proof first shows the independence of $\bar{X}$ and $S^2$ using Lemma 4, then it shows that $(n-1)S^2/\sigma^2$ has distribution $\chi^2(n-1)$ using an argument that invokes m.g.f. a further time. This, we believe, is a drawback, because the typical reader, who is usually only a sophomore, is not expected to have the skill of dealing directly with m.g.f. There is a similar proof in [2, Section 8.5], which we consider somewhat less rigorous than the one in [1].
Proof in [3, Section 7.3] and [4, Section 8.3].
This proof shows the independence and the $\chi^2(n-1)$ distribution simultaneously by means of a transformation $Y = OZ$, where $O$ is an orthogonal matrix whose first row is $[\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}]$, so that $Y_1 = \sqrt{n}\, \bar{Z}$ and the sum of squares of the other entries of $Y$ is $W$. This proof is algebraic, without using advanced tools, and hence is much simpler and easier to understand than the proof in [1]. However, there is still a little drawback of this proof: it is nonconstructive, in that it only states the existence of the orthogonal matrix $O$, without giving it specifically. While not affecting the rigor of the proof, this drawback does hurt its pedagogical value.

We consider the proof in [3, 4] nearly perfect, and we seek to make it fully perfect by fixing the drawback just mentioned, i.e. by explicitly constructing the matrix $O$. In fact, in [4, page 478] the Gram-Schmidt orthogonalization method is suggested for constructing $O$, but no hint is given about the choice of the starting matrix. We tried that method with the starting matrix being the matrix obtained by replacing the first row of the identity matrix by $[\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}]$, and we found that the resulting orthogonal matrix is very ugly and prohibitively difficult to describe. Therefore we tend to believe that Gram-Schmidt orthogonalization is not an elegant method of construction for our purpose here. However, we finally succeeded in finding an elegant construction (in which the row with all entries $\frac{1}{\sqrt{n}}$ appears as the last row rather than the first; this difference is immaterial). Let us now illustrate it by a few base examples.
$$O_2 = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}. \tag{5}$$
$$O_3 = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix}. \tag{6}$$
$$O_4 = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & -\frac{3}{\sqrt{12}} \\ \frac{1}{\sqrt{4}} & \frac{1}{\sqrt{4}} & \frac{1}{\sqrt{4}} & \frac{1}{\sqrt{4}} \end{bmatrix}. \tag{7}$$
$$O_5 = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & 0 & 0 \\ \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & -\frac{3}{\sqrt{12}} & 0 \\ \frac{1}{\sqrt{20}} & \frac{1}{\sqrt{20}} & \frac{1}{\sqrt{20}} & \frac{1}{\sqrt{20}} & -\frac{4}{\sqrt{20}} \\ \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}. \tag{8}$$
The general method of construction is contained in the following key lemma.

Lemma 6.
For every integer $n \ge 2$, define the matrix $O_n = [o_{ij}]_{n \times n}$ by: for each $i$ with $1 \le i \le n-1$,
$$o_{ij} = \begin{cases} \dfrac{1}{\sqrt{i(i+1)}}, & \text{for } j \le i, \\[4pt] \dfrac{-i}{\sqrt{i(i+1)}}, & \text{for } j = i+1, \\[4pt] 0, & \text{otherwise}; \end{cases} \tag{9}$$
$$o_{nj} = \frac{1}{\sqrt{n}} \quad \text{for all } 1 \le j \le n. \tag{10}$$
Then $O_n$ is orthogonal, i.e. $O_n O_n^T = I$.

Proof. Let $P = O_n O_n^T = [p_{ij}]_{n \times n}$.

1) If $1 \le i \le n-1$, then $p_{ii} = \sum_{k=1}^n o_{ik}^2 = \sum_{k=1}^i \frac{1}{i(i+1)} + \frac{i^2}{i(i+1)} = 1$.

2) $p_{nn} = \sum_{k=1}^n o_{nk}^2 = \sum_{k=1}^n \frac{1}{n} = 1$.

3) If $1 \le i \le n-1$, then $p_{in} = p_{ni} = \sum_{k=1}^n o_{ik} o_{nk} = \sum_{k=1}^i \frac{1}{\sqrt{i(i+1)}} \frac{1}{\sqrt{n}} + \frac{-i}{\sqrt{i(i+1)}} \frac{1}{\sqrt{n}} = 0$.

4) If $1 \le i \ne j \le n-1$, without loss of generality let us assume $i < j$. Then, since $o_{jk} = \frac{1}{\sqrt{j(j+1)}}$ for all $k \le i+1 \le j$,
$$p_{ij} = p_{ji} = \sum_{k=1}^n o_{ik} o_{jk} = \sum_{k=1}^{i+1} o_{ik} o_{jk} = \frac{1}{\sqrt{j(j+1)}} \sum_{k=1}^{i+1} o_{ik} = 0,$$
because the first $i+1$ entries of row $i$ sum to $i \cdot \frac{1}{\sqrt{i(i+1)}} - \frac{i}{\sqrt{i(i+1)}} = 0$.

Thus we have shown $O_n O_n^T = I$.

Now, for the sake of self-completeness of this paper, we give a proof of Theorem 2. It uses the same idea as the proof in [4, page 478], except for our explicit construction of $O$ and a few minor details.

Proof of Theorem 2. Denote $Z = [Z_1, \ldots, Z_n]^T$. Define the random vector $Y = [Y_1, \ldots, Y_n]^T$ by
$$Y = O_n Z \tag{11}$$
where $O_n$ is defined by (9) and (10). From (10) we know that
$$Y_n = \sum_{i=1}^n \frac{Z_i}{\sqrt{n}} = \sqrt{n}\, \bar{Z}. \tag{12}$$
It is obvious that $Y_n$ has the distribution $N(0,1)$. Moreover,
$$\sum_{i=1}^n Y_i^2 = Y^T Y = Z^T O_n^T O_n Z = Z^T Z = \sum_{i=1}^n Z_i^2.$$
Therefore,
$$\sum_{i=1}^{n-1} Y_i^2 = \sum_{i=1}^n Y_i^2 - Y_n^2 = \sum_{i=1}^n Z_i^2 - n \bar{Z}^2 = \sum_{i=1}^n (Z_i - \bar{Z})^2.$$
We have thus obtained the relation
$$W = \sum_{i=1}^n (Z_i - \bar{Z})^2 = \sum_{i=1}^{n-1} Y_i^2. \tag{13}$$
By Lemma 3, $Y_1, \ldots, Y_n$ have the multivariate normal distribution with covariance matrix $O_n I O_n^T = I$, which is diagonal. Therefore, by Proposition 5,
$$Y_1, \ldots, Y_n \text{ all have the } N(0,1) \text{ distribution and are mutually independent}. \tag{14}$$
Since $W$ is entirely determined by $Y_1, \ldots, Y_{n-1}$, and $\bar{Z} = Y_n / \sqrt{n}$, (14) implies that $W$ and $\bar{Z}$ are independent. Finally, (13) and (14) together imply that $W$ has distribution $\chi^2(n-1)$, by the definition of the chi-square distribution.

The proof of Student's theorem proposed here is algebraic and fully constructive. To our best knowledge, such a construction has not appeared in the literature before. A constructive proof is expected to make the reader more comfortable and consequently to enhance their understanding of this important result. We believe this paper to be of significant pedagogical value in statistical education, and we hope that the construction proposed here will be included in future textbooks.
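As a quick sanity check on the construction in Lemma 6 and on the identities used in the proof, one may run the following sketch (our own illustration, assuming NumPy; the function name `helmert_like` is ours). It builds $O_n$ directly from (9) and (10), verifies $O_n O_n^T = I$, and verifies the two algebraic identities $Y_n = \sqrt{n}\,\bar{Z}$ and $\sum_{i=1}^{n-1} Y_i^2 = \sum_{i=1}^n (Z_i - \bar{Z})^2$ on a random vector.

```python
# A sketch (assuming NumPy) that builds O_n from (9)-(10) and checks both
# the orthogonality claimed in Lemma 6 and the identities used in the proof.
import numpy as np

def helmert_like(n: int) -> np.ndarray:
    """Matrix O_n: rows 1..n-1 from (9), last row all 1/sqrt(n) from (10)."""
    O = np.zeros((n, n))
    for i in range(1, n):                       # 1-based row index i = 1..n-1
        O[i - 1, :i] = 1.0 / np.sqrt(i * (i + 1))
        O[i - 1, i] = -i / np.sqrt(i * (i + 1))
    O[n - 1, :] = 1.0 / np.sqrt(n)
    return O

n = 7
O = helmert_like(n)
assert np.allclose(O @ O.T, np.eye(n))          # Lemma 6: O_n is orthogonal

Z = np.random.default_rng(2).standard_normal(n)
Y = O @ Z
assert np.isclose(Y[-1], np.sqrt(n) * Z.mean())                       # identity (12)
assert np.isclose((Y[:-1] ** 2).sum(), ((Z - Z.mean()) ** 2).sum())   # identity (13)
print("all checks passed")
```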
REFERENCES

[1] R.V. Hogg, J.W. McKean, and A.T. Craig. Introduction to Mathematical Statistics. Pearson, 7th edition, 2013.
[2] R.E. Walpole, R.H. Myers, S.L. Myers, and K. Ye. Probability and Statistics for Engineers and Scientists. Prentice Hall, 9th edition, 2012.
[3] M.H. DeGroot. Probability and Statistics. Addison-Wesley, 2nd edition, 1989.
[4] M.H. DeGroot and M.J. Schervish. Probability and Statistics. Addison-Wesley.