An elementary approach for minimax estimation of Bernoulli proportion in the restricted parameter space
Heejune Sheen* and Yajun Mei†
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
September 25, 2020
Abstract
We present an elementary mathematical method to find the minimax estimator of the Bernoulli proportion θ under the squared error loss when θ belongs to the restricted parameter space of the form Ω = [0, η] for some pre-specified constant 0 ≤ η ≤ 1. This problem is inspired by the problem of estimating the rate of positive COVID-19 tests. The presented results and applications would be useful materials for both instructors and students when teaching point estimation in statistics or machine learning courses.
Keywords:
Minimax estimation, Bernoulli distribution, squared error loss, restricted parameter space, convex function.

* [email protected]
† [email protected]
1. INTRODUCTION

Point estimation of model parameters has been an important topic in statistics, machine learning, and data science. Besides the maximum likelihood estimator (MLE), the method of moments (MOM), and the Bayesian method, another important approach is the minimax estimator, which minimizes the maximum risk. Applications of minimax estimators can be found in many fields, such as statistics (Malinovsky and Albert 2015; Yaacoub, Moustakides and Mei 2018; Zinodiny, Strawderman and Parsian 2011), machine learning (Ben-Haim and Eldar 2007), physics (Ng and Englert 2012; Ng, Phuah and Englert 2012), and finance (Chamberlain 2000).

One important question on the minimax estimator is to estimate the proportion θ of the Bernoulli or binomial distributions. For instance, when X_1, ..., X_n are independent and identically distributed (i.i.d.) Bernoulli(θ), i.e., P_θ(X_i = 1) = 1 − P_θ(X_i = 0) = θ, the MLE is X̄_n = (X_1 + ··· + X_n)/n, whereas the minimax estimator of θ under the squared error loss over θ ∈ Ω = [0, 1] is
δ_M = (√n / (√n + 1)) · X̄_n + (1 / (√n + 1)) · (1/2).    (1)

The proof of the minimax property of the estimator in (1) is based on the well-known result that a Bayes procedure with constant risk is minimax; see Lehmann and Casella (2006). There are also other approaches to proving minimax properties, such as information inequalities (Tsybakov 2008) or invariance methods (Kiefer 1957). However, in general it is challenging to derive minimax estimators, and standard statistical textbooks provide very limited examples of minimax procedures.

When we teach the minimax estimator, one example we use is estimating the positive rate θ of COVID-19 surveillance tests. As an illustrative example, suppose that an organization tests n = 100 random employees/staff for COVID-19 surveillance during a given period (say daily, weekly, or monthly), and observes 5 positive cases. In this scenario, the standard sample mean estimate of the positive rate is X̄ = 5/100 = 0.05. Meanwhile, the minimax estimator in (1) yields the estimate
δ_M = (√100 / (√100 + 1)) · 0.05 + (1 / (√100 + 1)) · (1/2) = 1/11 ≈ 0.0909, which is much larger than the sample mean of 5%.
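To make the computation in (1) concrete for the classroom, it can be reproduced in a few lines of code. The following Python sketch is our own illustration (the function name minimax_estimate is an arbitrary choice, not part of the original derivation):

```python
import math

def minimax_estimate(successes, n):
    """Minimax estimator (1) of a Bernoulli proportion over [0, 1] under
    squared error loss: a weighted average of the sample mean and the
    constant guess 1/2, with weights sqrt(n)/(sqrt(n)+1) and 1/(sqrt(n)+1)."""
    x_bar = successes / n
    w = math.sqrt(n) / (math.sqrt(n) + 1)
    return w * x_bar + (1 - w) * 0.5

# COVID-19 surveillance example: 5 positives out of n = 100 tests.
print(minimax_estimate(5, 100))  # ~0.0909, versus the sample mean 0.05
```

As n grows, the weight on the sample mean tends to one, so the minimax estimate and the sample mean agree asymptotically; the shrinkage toward 1/2 matters most for small n.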
Meanwhile, the COVID-19 example also brings a new challenge. For instance, suppose that we are pretty sure that the COVID-19 positive rate satisfies θ ∈ [0, η], say η = 0.2, instead of θ ∈ [0, 1]. While the estimator in (1) is minimax when the parameter space is Ω = {0 ≤ θ ≤ 1}, it is interesting to ask whether it is still minimax when the parameter space is Ω = {0 ≤ θ ≤ η} for some 0 ≤ η ≤ 1.

The purpose of this note is to provide a complete solution of the minimax estimation problem for the Bernoulli distribution when n = 1 and the parameter space is Ω = {0 ≤ θ ≤ η}. We will show that the estimator in (1) is still minimax when η ≥ 3/4, but a new minimax estimator arises when 0 ≤ η < 3/4. Our proof is based on elementary mathematical tools, as we focus on the n = 1 case. We hope that our note provides more examples of minimax procedures that are accessible to high school or undergraduate students, thereby enriching the teaching materials on the minimax estimator and enhancing students' understanding and interest.

We should mention that our paper essentially deals with minimax estimators in restricted parameter spaces, and there is some related existing research for the Bernoulli or binomial distribution in the literature. For instance, Moors (1985) and Berry (1989) derived the minimax estimator when the parameter space Ω is a symmetric interval around 1/2, i.e., Ω = {1/2 − η ≤ θ ≤ 1/2 + η}. Marchand and MacGibbon (2000) considered the parameter space Ω = {0 ≤ θ ≤ η} as in our note, but derived the minimax estimator for the general-n case under the assumption that η is small. Here we allow all 0 ≤ η ≤ 1 values when n = 1.

The remainder of this note is as follows. In Section 2, we present the minimax estimation problem of the Bernoulli proportion in the restricted parameter space Ω = {0 ≤ θ ≤ η}. In Section 3, we state our main result on the minimax estimator and present a rigorous elementary mathematical proof. Section 4 contains some concluding remarks.

2. PROBLEM FORMULATION

Suppose that X is a Bernoulli random variable with P_θ(X = 1) = 1 − P_θ(X = 0) = θ, and we want to estimate θ on the basis of X under the squared error loss function L(θ, d) = (θ − d)². In this case, we have n = 1 observation, and the MLE estimates θ as θ̂_MLE = 0 if X = 0 and = 1 if X = 1. Meanwhile, the minimax estimator δ_M in (1) estimates θ as 1/4 if X = 0 and 3/4 if X = 1.

Now suppose θ belongs to a restricted space Ω = {0 ≤ θ ≤ η} for some pre-specified constant η, and we want to find the minimax estimator of θ under the squared error loss function. That is, we want to find a procedure δ(X) that minimizes the maximum risk

sup_{0 ≤ θ ≤ η} E_θ(θ − δ(X))².    (2)

Since there is only n = 1 observation, the estimator δ(X) is completely determined by δ(X = 0) = a and δ(X = 1) = b. Indeed, using the notation of a and b, we have

E_θ(θ − δ(X))² = (θ − a)² P_θ(X = 0) + (θ − b)² P_θ(X = 1)
             = (θ − a)²(1 − θ) + (θ − b)² θ
             = (2a − 2b + 1)θ² + (b² − a² − 2a)θ + a².    (3)

Thus, the minimax estimation problem in (2) can be written in the following elementary mathematical form: find two real numbers, a and b, that minimize

sup_{0 ≤ θ ≤ η} [(2a − 2b + 1)θ² + (b² − a² − 2a)θ + a²]    (4)

for some 0 ≤ η ≤ 1.
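Before stating the main result, it is instructive to probe (4) numerically. The short Python sketch below is our own illustrative addition (the function name and grid resolution are arbitrary choices); it performs a brute-force grid search over (a, b) and over θ ∈ [0, η]. Restricting the search to [0, η]² anticipates the first step of the proof in Section 3, which shows the optimum must lie there.

```python
import numpy as np

def minimax_grid_search(eta, grid=201):
    """Brute-force approximation to problem (4): for each candidate (a, b)
    in [0, eta]^2, compute the worst-case risk over theta in [0, eta],
    then return the (a, b) with the smallest worst-case risk."""
    candidates = np.linspace(0.0, eta, grid)
    thetas = np.linspace(0.0, eta, grid)
    best = (None, None, np.inf)
    for a in candidates:
        for b in candidates:
            # Risk polynomial from (3)/(4), evaluated on the theta grid.
            risk = (2*a - 2*b + 1)*thetas**2 + (b**2 - a**2 - 2*a)*thetas + a**2
            worst = risk.max()
            if worst < best[2]:
                best = (a, b, worst)
    return best

print(minimax_grid_search(0.2))  # approx (0.0944, 0.2, 0.0089), matching Theorem 1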
3. OUR MAIN RESULT

Our main results are summarized in the following theorem, whose proof will be presented a little later:
Theorem 1.
When Ω = {0 ≤ θ ≤ η}, the minimax estimator of θ under (2), or equivalently the optimal solution in (4), is given by

δ(X = 0) = a* = √(1 − η) − (1 − η)  if 0 ≤ η ≤ 3/4,  and  a* = 1/4  if 3/4 < η ≤ 1,

and

δ(X = 1) = b* = min{η, 3/4} = η  if η ≤ 3/4,  and  b* = 3/4  if η > 3/4.
In particular, Theorem 1 indicates that the estimator δ_M in (1) is still minimax for the restricted parameter space Ω = {0 ≤ θ ≤ η} when 3/4 ≤ η ≤ 1, but we have a new form of minimax estimator when 0 ≤ η < 3/4. For instance, assume that we are pretty sure that the COVID-19 positive rate of an organization is θ ∈ [0, η], say η = 0.2, and assume that we randomly test one subject. Then the minimax estimate of the positive rate θ is √(1 − 0.2) − (1 − 0.2) ≈ 9.44% if the subject's test is negative, and is 20% if the subject's test is positive.
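The formulas in Theorem 1 are straightforward to evaluate. The following Python sketch is our own illustration (the function name restricted_minimax is an arbitrary choice); it computes (a*, b*) for a given η and reproduces the example above.

```python
import math

def restricted_minimax(eta):
    """Minimax estimator of Theorem 1 for the restricted space [0, eta]:
    returns (a_star, b_star) = (estimate when X = 0, estimate when X = 1)."""
    if eta <= 0.75:
        a_star = math.sqrt(1 - eta) - (1 - eta)
    else:
        a_star = 0.25
    b_star = min(eta, 0.75)
    return a_star, b_star

print(restricted_minimax(0.2))  # (~0.0944, 0.2): 9.44% if negative, 20% if positive
print(restricted_minimax(1.0))  # (0.25, 0.75): recovers the unrestricted estimator (1)
```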
Let us now provide a rigorous proof of Theorem 1.
Proof of Theorem 1: It suffices to consider 0 ≤ a ≤ η and 0 ≤ b ≤ η when solving the optimization problem in (4) for 0 ≤ η ≤ 1. This is because it is evident from (3) that for all 0 ≤ θ ≤ η, we have (θ − a)² ≥ (θ − η)² if a > η and (θ − a)² ≥ (θ − 0)² if a < 0. Thus the optimal solution of (4) must lie in the interval [0, η], since otherwise we could improve the objective function by replacing a or b with the endpoint 0 or η.

Moreover, note that the function

f(a, b, θ) = (2a − 2b + 1)θ² + (b² − a² − 2a)θ + a²    (5)

is a quadratic function with respect to θ, so the investigation of its maximum value depends on the sign of the leading coefficient. We therefore split the region of (a, b) into two sub-regions:

(A) When 2a − 2b + 1 ≥ 0, f(a, b, θ) is convex in θ, and its maximum over [0, η] is attained at one of the two endpoints, θ = 0 or θ = η.

(B) When 2a − 2b + 1 < 0, f(a, b, θ) is a concave quadratic function of θ.

Our main idea is to minimize the maximum value within each sub-region, which will then yield the global minimax solution.

Let us begin with case (A), i.e., when a ≥ b − 1/2, where it suffices to compare the two endpoints. In this case, f(a, b, 0) = a² and f(a, b, η) = (2a − 2b + 1)η² + (b² − a² − 2a)η + a². It is evident that f(a, b, 0) ≥ f(a, b, η) if and only if (2a − 2b + 1)η + (b² − a² − 2a) ≤ 0, or equivalently, a² + 2(1 − η)a ≥ b² − 2ηb + η. Completing the square on both sides further simplifies this to

(a + (1 − η))² − (b − η)² ≥ 1 − η,    (6)

or a ≥ √((η − b)² + (1 − η)) − (1 − η), since we are only interested in the case a > 0.

A key observation is that the boundary of (6) defines a hyperbola whose vertex (a, b) = (√(1 − η) − (1 − η), η) attains the smallest positive a value. Moreover, in case (A) we have a ≥ b − 1/2, and the boundary line a = b − 1/2 intersects the hyperbolic boundary of (6) at the unique point (a, b) = (1/4, 3/4). This leads to two subcases, depending on whether η ≤ 3/4 or not; this determines whether the unique intersection point lies below or above the vertex of the hyperbolic curve, or equivalently, whether the line a = b − 1/2 intersects the upper or the lower part of the hyperbolic curve.

Meanwhile, in case (A), we need to solve two sub-problems:

Problem (A1):  min_{0 ≤ a, b ≤ η}  a²    (7)
    s.t.  a ≥ √((η − b)² + (1 − η)) − (1 − η),
          a ≥ b − 1/2.

Problem (A2):  min_{0 ≤ a, b ≤ η}  (2a − 2b + 1)η² + (b² − a² − 2a)η + a²    (8)
    s.t.  a ≤ √((η − b)² + (1 − η)) − (1 − η),
          a ≥ b − 1/2.

In other words, in case (A) we need to investigate the two subproblems (A1) and (A2) under two subcases: one for 0 ≤ η ≤ 3/4 and the other for 3/4 ≤ η ≤ 1.

First, we claim that when 0 ≤ η ≤ 3/4, both problems (A1) and (A2) have the same optimal solution:

a* = √(1 − η) − (1 − η)  and  b* = η.    (9)

To see this, since 0 ≤ η ≤ 3/4 and the line a = b − 1/2 intersects the upper part of the hyperbolic curve, the objective function a² in problem (A1) of (7) attains its minimum value (√(1 − η) − (1 − η))² at the vertex. The proof for problem (A2) in (8) is a little more complicated but follows similar ideas.

To be more specific, in problem (A2), set the objective value (2a − 2b + 1)η² + (b² − a² − 2a)η + a² = γ. Completing the square in a and b shows that this is equivalent to the ellipse

(a − η)² / (√(γ/(1 − η)))² + (b − η)² / (√(γ/η))² = 1,    (10)

that is, (1 − η)(a − η)² + η(b − η)² = γ, with center (η, η). Since the semi-major and semi-minor axes of the ellipse are proportional to √γ, minimizing γ is equivalent to finding the smallest ellipse that intersects the feasible region of (a, b) in (A2):

{(a, b) : 0 ≤ a ≤ η, 0 ≤ b ≤ η, a ≤ √((η − b)² + (1 − η)) − (1 − η), a ≥ b − 1/2}.

The smallest such ellipse is obtained when it touches the curve a = √((η − b)² + (1 − η)) − (1 − η) at exactly one point. For any 0 < η ≤ 1, letting γ = (√(1 − η) − (1 − η))² gives an ellipse that intersects the curve a = √((η − b)² + (1 − η)) − (1 − η) at the vertex (√(1 − η) − (1 − η), η). Hence, when 0 ≤ η ≤ 3/4, the optimal solution is (√(1 − η) − (1 − η), η), proving the claim in (9).
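The geometry above can be double-checked numerically. The following sketch is our own verification aid, not part of the original argument; it confirms that at the hyperbola vertex the two endpoint risks f(a, b, 0) and f(a, b, η) coincide and equal γ = (√(1 − η) − (1 − η))² for several values of η ≤ 3/4.

```python
import math

def endpoint_risks(a, b, eta):
    """Endpoint risks f(a, b, 0) and f(a, b, eta) from (5)."""
    f0 = a**2
    f_eta = (2*a - 2*b + 1)*eta**2 + (b**2 - a**2 - 2*a)*eta + a**2
    return f0, f_eta

for eta in [0.1, 0.2, 0.5, 0.75]:
    a = math.sqrt(1 - eta) - (1 - eta)   # vertex of the hyperbola (6)
    b = eta
    f0, f_eta = endpoint_risks(a, b, eta)
    gamma = (math.sqrt(1 - eta) - (1 - eta))**2
    print(eta, round(f0, 6), round(f_eta, 6), round(gamma, 6))  # all three agree
```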
Second, we claim that when 3/4 ≤ η ≤ 1, both problems (A1) and (A2) have the same optimal solution:

a* = 1/4  and  b* = 3/4.    (11)

To prove this, note that when 3/4 ≤ η ≤ 1, the line a = b − 1/2 intersects the lower part of the hyperbolic curve, i.e., the vertex in (9) does not satisfy the constraint a ≥ b − 1/2 of case (A). In particular, for problem (A1), the feasible point with the smallest a value is (a, b) = (1/4, 3/4), not the vertex, and thus (11) is the optimal solution to problem (A1). Likewise, when 3/4 ≤ η ≤ 1, the smallest ellipse of the form (10) touches the feasible region of problem (A2) at the same intersection point (1/4, 3/4), and thus (11) is also the optimal solution to problem (A2).

Figure 1: The gray area represents the feasible region of problem (A1). The left panel shows that the optimal solution is obtained at the vertex (√(1 − η) − (1 − η), η) when 0 < η ≤ 3/4; the right panel shows that the optimal solution is (1/4, 3/4) when 3/4 ≤ η ≤ 1.

Figure 2: The ellipse is centered at (η, η) and the gray area represents the feasible region of problem (A2). The left panel is for the case 0 < η ≤ 3/4 and the right panel for the case 3/4 ≤ η ≤ 1.
Next, let us investigate case (B), when 2a − 2b + 1 < 0, i.e., when a + 1/2 < b. Our main conclusion is that the global minimax solution cannot be obtained in this case. Recall that in the feasible region we have 0 ≤ a ≤ η and 0 ≤ b ≤ η, so the relation a + 1/2 < b cannot hold if η ≤ 1/2. Thus, it suffices to investigate case (B) when 1/2 < η ≤ 1. We claim that the maximum value in case (B) is always larger than those in case (A) when 1/2 < η ≤ 1. To prove this, take θ = 1/2 ∈ [0, η]. Then we have the following inequalities:

f(a, b, 1/2) = (1/4)(2a − 2b + 1) + (1/2)(b² − a² − 2a) + a²
             = (1/2)[(a − 1/2)² + (b − 1/2)²]
             > (1/2)[(a − 1/2)² + a²]
             = (a − 1/4)² + 1/16
             ≥ 1/16,

where the first inequality follows from the case (B) assumption a + 1/2 < b, which implies b − 1/2 > a ≥ 0 and hence (b − 1/2)² > a². On the other hand, the maximum values of case (A) attain the minimum values (√(1 − η) − (1 − η))² and 1/16, respectively, depending on whether η < 3/4 or not, and both of these values are less than or equal to 1/16. This implies that the best estimator in case (B) cannot be a minimax estimator for problem (4). As a result, we exclude case (B) and conclude that the minimax estimator found in case (A) is the minimax estimator of problem (4). This proves the theorem.
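As a quick sanity check on the case (B) argument, one can sample pairs (a, b) with b > a + 1/2 and confirm that the risk at θ = 1/2 alone, which lower-bounds the worst-case risk, already exceeds 1/16. The sketch below is our illustrative addition; the value η = 0.9 is an arbitrary choice satisfying η > 1/2.

```python
import random

def f(a, b, theta):
    """Risk polynomial (5)."""
    return (2*a - 2*b + 1)*theta**2 + (b**2 - a**2 - 2*a)*theta + a**2

random.seed(0)
eta = 0.9  # case (B) is only possible when eta > 1/2
for _ in range(10000):
    b = random.uniform(0.5, eta)
    a = random.uniform(0.0, b - 0.5)  # enforce a + 1/2 < b
    # The risk at theta = 1/2 already beats the case (A) optimum of at most 1/16.
    assert f(a, b, 0.5) > 1/16
print("all case (B) samples exceed 1/16, as claimed")
```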
4. CONCLUSION

In a statistical decision theory course, the restricted minimax problem (4) would serve as a useful advanced exercise for illustrating minimax estimation. Indeed, this problem has been discussed in our advanced-undergraduate and first-year-graduate statistics courses. To find the minimax estimators for problem (4), we used the geometric interpretation of the hyperbola, the ellipse, and convex functions. Since our method is simple and different from the standard Bayesian approach, it can be presented to students with different backgrounds. Moreover, our methodology motivates students to tackle statistical problems from different and diverse perspectives.

REFERENCES
Ben-Haim, Z., and Eldar, Y. C. (2007), "Blind minimax estimation," IEEE Transactions on Information Theory, 53(9), 3145–3157.

Berry, C. J. (1989), "Bayes minimax estimation of a Bernoulli p in a restricted parameter space," Communications in Statistics - Theory and Methods, 18(12), 4607–4616.

Chamberlain, G. (2000), "Econometric applications of maxmin expected utility," Journal of Applied Econometrics, 15(6), 625–644.

Kiefer, J. (1957), "Invariance, minimax sequential estimation, and continuous time processes," The Annals of Mathematical Statistics, 28(3), 573–601.

Lehmann, E. L., and Casella, G. (2006), Theory of Point Estimation, New York, NY: Springer Science & Business Media.

Malinovsky, Y., and Albert, P. S. (2015), "A note on the minimax solution for the two-stage group testing problem," The American Statistician, 69(1), 45–52.

Marchand, É., and MacGibbon, B. (2000), "Minimax estimation of a constrained binomial proportion," Statistics & Risk Modeling, 18(2), 129–168.

Moors, J. (1985), Estimation in Truncated Parameter Spaces, PhD thesis, Tilburg University.

Ng, H. K., and Englert, B.-G. (2012), "A simple minimax estimator for quantum states," International Journal of Quantum Information, 10(04), 1250038.

Ng, H. K., Phuah, K. T. B., and Englert, B.-G. (2012), "Minimax mean estimator for the trine," New Journal of Physics, 14(8), 085007.

Tsybakov, A. B. (2008), Introduction to Nonparametric Estimation, 1st edn, New York, NY: Springer Publishing Company, Incorporated.

Yaacoub, T., Moustakides, G. V., and Mei, Y. (2018), "Optimal Stopping for Interval Estimation in Bernoulli Trials," IEEE Transactions on Information Theory, 65(5), 3022–3033.

Zinodiny, S., Strawderman, W. E., and Parsian, A. (2011), "Bayes minimax estimation of the multivariate normal mean vector for the case of common unknown variance,"