Sequential Experiment Design for Hypothesis Verification
Dhruva Kartik, Ashutosh Nayyar and Urbashi Mitra
Abstract—Hypothesis testing is an important problem with applications in target localization, clinical trials, etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, experiments are selected so that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between the hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that the heuristic performs better in some scenarios compared to existing methods in the literature.
I. INTRODUCTION
Hypothesis testing is a classical problem and has been addressed in various settings. The problem can be described qualitatively as follows. An agent is interested in a phenomenon, and wants to test if the phenomenon conforms to any one of the hypotheses from a known class. The agent can perform various experiments and, based on the observations from these experiments, it needs to infer the true hypothesis. As opposed to the one-shot hypothesis testing problem, an active agent can choose which experiment to perform based on the observations made in the past. The agent seeks to select experiments such that all false hypotheses are eliminated as quickly as possible.

Many active hypothesis testing strategies [1], [2] operate in two phases. The first phase is an exploration phase in which the experiment design is such that a moderate level of confidence is achieved on the true hypothesis. In most cases, this phase terminates in finite time almost surely [3]. The second is a verification phase in which the agent has a moderate level of confidence on some hypothesis and experiments are selected such that confidence on this hypothesis is improved to the desired level. When the desired confidence level is very
(D. Kartik, A. Nayyar and U. Mitra are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 (e-mail: [email protected]; [email protected]; [email protected]). This research was supported, in part, by National Science Foundation Grants NSF CNS-1213128, CCF-1410009, CPS-1446901, Grant ONR N00014-15-1-2550, and Grant AFOSR FA9550-12-1-0215.)

high, the verification cost dominates the performance. In this paper, we make the notions of exploration and verification more formal and focus on analyzing the verification phase.

Active hypothesis testing finds applications in many areas such as sensor selection for target detection and localization, state tracking, design of clinical trials, and learning unknown functions from queries [4]. Consequently, the verification phase plays an important role in all these applications.

We consider a slightly different mathematical formulation for hypothesis testing than previously explored [1], [2]. Using the posterior belief on the set of hypotheses, we define a confidence level called the Bayesian log-likelihood ratio. The objective is to design an experiment selection strategy that maximizes the expected rate of increase in the confidence level. Our contributions in this paper can be summarized as follows:

1) We formulate the verification problem as an infinite-horizon average-reward Markov Decision Process (MDP) problem.
2) We characterize the optimal rate using infinite-horizon Dynamic Programming (DP).
3) We identify a set of critical experiments. We then show that any strategy that selects these experiments while satisfying a stability criterion is asymptotically optimal.
4) We design a new heuristic experiment selection strategy and numerically show that it achieves better performance compared to existing methods in some scenarios.

The rest of the paper is organized as follows. In Section I-A, we discuss the relation between our problem and those in prior works. Section II formulates the problem.
Section III relates the problem to the MDP framework and defines critical experiments. In Section IV, we solve the DP and in Section V, we describe an adaptive strategy and numerically compare it with existing policies. We conclude the paper in Section VI.
A. Prior Work
The simplest active hypothesis testing problem was first formulated by Chernoff in [3], inspired by Wald's analysis of the sequential probability ratio test [5]. Thereafter, it has been generalized in different ways depending on the target application [1], [2]. A major difference between our formulation and the formulation in these works is the reward structure. Prior works consider a combination of expected stopping time and Bayesian error probability. Fixed-horizon problems have also been considered, where the aim is to minimize the Bayesian error probability or the maximal error probability [1]. We define a notion of confidence and maximize the expected rate of increase in confidence over long horizons. In prior formulations, if the agent makes an error in guessing the true hypothesis, it incurs a cost of 1 (or some constant $c$) irrespective of its confidence level. In our formulation, by contrast, we reward the agent for generating observations that result in a high confidence level on the true hypothesis. We believe that our formulation is related to the stopping time formulation because of the strong similarity in the results. In [6], [3], [1], [2], the authors obtain asymptotically tight performance bounds and design policies that are asymptotically optimal. When the policies in these works are adapted to the verification problem defined herein, they turn out to be open-loop and randomized. A closed-loop policy was designed in [7], but this may not always be asymptotically optimal. In this paper, we design a strategy for verification that is more adaptive and conjecture that it is asymptotically optimal.

B. Notation
Random variables/vectors are denoted by upper case boldface letters and their realizations by the corresponding lower case letters. We use calligraphic fonts to denote sets (e.g., $\mathcal{U}$) and $\Delta\mathcal{U}$ is the probability simplex over a finite set $\mathcal{U}$. In general, subscripts are used as the time index. There are two exceptions ($\rho_j(n)$, $X_j(n)$) to this convention, where the subscript denotes the hypothesis and $n$ denotes time. For time indices $n_1 \le n_2$, $Y_{n_1:n_2}$ is short-hand notation for the variables $(Y_{n_1}, Y_{n_1+1}, \ldots, Y_{n_2})$. For a strategy $g$, we use $\mathbb{P}^g[\cdot]$ and $\mathbb{E}^g[\cdot]$ to indicate that the probability and expectation depend on the choice of $g$. The Shannon entropy of a discrete distribution $p$ over a finite space $\mathcal{Y}$ is given by

$$H(p) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y), \quad (1)$$

and the Kullback-Leibler divergence between distributions $p$ and $q$ is given by

$$D(p \,\|\, q) = \sum_{y \in \mathcal{Y}} p(y) \log \frac{p(y)}{q(y)}. \quad (2)$$

II. PROBLEM FORMULATION
Let $\mathcal{H} \subset \mathbb{N}$ be a finite set of hypotheses and let $H$ be the true hypothesis. At each time $n \in \mathbb{N}$, the agent can perform an experiment $U_n \in \mathcal{U}$ and obtain an observation $Y_n \in \mathcal{Y}$. For simplicity, let us also assume that the sets $\mathcal{U}$ and $\mathcal{Y}$ are finite. When an experiment $u \in \mathcal{U}$ is performed for the $k$-th time, the observation $Y$ obtained is given by

$$Y = \xi(H, u, W_{uk}), \quad (3)$$

where $\{W_{uk} : u \in \mathcal{U}, k \in \mathbb{N}\}$ is a collection of mutually independent and identically distributed primitive random variables. The observation $Y_n$ at time $n$ can be expressed as

$$Y_n = \xi(H, U_n, W_n). \quad (4)$$

The probability of observing $y$ after performing an experiment $u$ under hypothesis $h$ is denoted by $p_h^u(y)$. The information available at time $n$, denoted by $I_n$, is the collection of all experiments performed and the corresponding observations up to time $n-1$, i.e.

$$I_n = \{U_{1:n-1}, Y_{1:n-1}\}. \quad (5)$$
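The observation model in (3)-(4) can be sketched in a few lines of Python. The hypothesis names, experiment names, and the numeric distributions below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# A minimal sketch of the observation model (3)-(4): each experiment u,
# under each hypothesis h, induces a conditional distribution p_h^u(y) over
# a finite observation alphabet. All names and numbers are illustrative.
rng = np.random.default_rng(0)

# p[u][h] is the distribution p_h^u(.) over the observations {0, 1}.
p = {
    "u1": {"h1": [0.9, 0.1], "h2": [0.1, 0.9], "h3": [0.9, 0.1]},
    "u2": {"h1": [0.9, 0.1], "h2": [0.9, 0.1], "h3": [0.1, 0.9]},
}

def observe(true_h, u):
    """Sample Y = xi(H, u, W) for experiment u when true_h is in force."""
    return rng.choice(len(p[u][true_h]), p=p[u][true_h])

# Under h1, experiment u1 produces y = 0 with probability 0.9.
samples = [observe("h1", "u1") for _ in range(1000)]
```

The i.i.d. primitive randomness $W_{uk}$ is hidden inside the generator; only the induced distributions $p_h^u(\cdot)$ matter for the analysis that follows.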
Fig. 1: Agent's choices and subsequent observations represented as a tree. Every instance of the probability space can be uniquely represented by a path in this tree.

Actions of the agent at time $n$ can be functions of $I_n$. Let the policy used for selecting the experiment be $g_n$, i.e.

$$U_n = g_n(I_n). \quad (6)$$

The sequence of all the policies $\{g_n\}$ is denoted by $g$, which is referred to as a strategy. Let the collection of all such strategies be $\mathcal{G}$.

Using the available information, the agent forms a posterior belief $\rho(n)$ on $\mathcal{H}$ at time $n$, which is given by

$$\rho_h(n) = \mathbb{P}[H = h \mid Y_{1:n-1}, U_{1:n-1}]. \quad (7)$$

Definition II.1 (Bayesian Log-Likelihood Ratio). The Bayesian log-likelihood ratio $C_h(\rho)$ associated with a hypothesis $h \in \mathcal{H}$ is defined as

$$C_h(\rho) := \log \frac{\rho_h}{1 - \rho_h}. \quad (8)$$

The Bayesian log-likelihood ratio (BLLR) is the logarithm of the ratio of the probability that hypothesis $h$ is true versus the probability that hypothesis $h$ is not true. BLLR is obtained by applying the logit function (also referred to as log-odds in statistics [8]) to the posterior belief $\rho_h$. The logit function amplifies increments in $\rho_h$ when $\rho_h$ is close to 0 or 1. We can interpret BLLR as a measure of confidence on hypothesis $h$ and thus, we refer to it as the confidence level.

The objective is to design an experiment selection strategy $g$ such that the confidence level $C_H$ on the true hypothesis $H$ increases as quickly as possible. In other words, the total reward after acquiring $N$ observations is the average rate of increase in the confidence level on the true hypothesis $H$ and is given by

$$\frac{C_H(\rho(N+1)) - C_H(\rho(1))}{N}. \quad (9)$$

More explicitly, we seek to design a strategy $g$ that maximizes the asymptotic expected reward $K(g)$, which is defined as

$$K(g) := \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}^g[C_H(\rho(N+1)) - C_H(\rho(1))].$$

Fig. 2: The logit function $\log \frac{p}{1-p}$ is the inverse of the logistic sigmoid function $1/(1+e^{-x})$.
It is widely used in statistics and machine learning to quantify the confidence level [8].

Henceforth, we refer to this problem as the Expected Confidence Maximization (ECM) problem for hypothesis testing. For a hypothesis $h$ and a strategy $g \in \mathcal{G}$, define $J(g, h)$ as

$$J(g, h) := \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}^g[C_H(\rho(N+1)) - C_H(\rho(1)) \mid H = h].$$

The value $J(g, h)$ represents the performance of a strategy $g$ conditioned on the hypothesis $h$. Let

$$J^*(h) = \sup_{g \in \mathcal{G}} J(g, h). \quad (10)$$

For a given hypothesis $h$, we refer to the problem of maximizing $J(g, h)$ as the hypothesis verification problem. Let $g^*(h)$ be an optimal verification strategy, i.e. it achieves the supremum in equation (10). We will later show that the existence of an optimal strategy $g^*(h)$ is guaranteed under a mild assumption.

A. Hypothesis Testing vs Hypothesis Verification
The optimal verification rate $J^*(h)$ can be used to obtain an upper bound on the expected reward $K(g)$ in the hypothesis testing problem.

Lemma II.1.
For any experiment selection strategy $g \in \mathcal{G}$, we have

$$K(g) \le \sum_{h \in \mathcal{H}} \rho_h(1) J^*(h). \quad (11)$$

Proof.
For any strategy $g \in \mathcal{G}$, we have

$$K(g) = \sum_{h \in \mathcal{H}} \rho_h(1) J(g, h) \le \sum_{h \in \mathcal{H}} \rho_h(1) J^*(h). \quad (12)$$

The last inequality follows from the definition of $J^*(h)$.

It is clear from the proof of Lemma II.1 that this upper bound is achieved by employing the strategy $g^*(h)$ when hypothesis $h$ is true. However, the agent cannot use different strategies under different hypotheses because it does not know the true hypothesis $H$. Therefore, we propose an experiment selection strategy of the following form. Similar strategies have also been used in [2].

$$\bar{g}(\rho) = \begin{cases} g^*(h)(\rho) & \text{if } \rho_h > \bar{\rho} \text{ for some } h \\ g^e(\rho) & \text{otherwise}, \end{cases} \quad (13)$$

where $0.5 < \bar{\rho} < 1$ is a constant and $g^e$ is an exploration strategy. The interpretation of the strategy $\bar{g}$ is that when the agent has a moderate level of confidence on some hypothesis $h$, it employs the corresponding verification strategy $g^*(h)$. This is to verify whether hypothesis $h$ is indeed true by further improving its confidence level. When the agent is not very confident about any particular hypothesis, the agent employs an exploration strategy $g^e$. The primary purpose of the exploration strategy is to ensure that $\rho_H$ eventually crosses the threshold $\bar{\rho}$. A naive exploration strategy is to select every experiment uniformly at random. Better exploration strategies do exist [2], [7]. It remains to show that a strategy like $\bar{g}$ can indeed achieve the upper bound in Lemma II.1. In this paper, we focus on the hypothesis verification problem. We derive sufficient conditions for an experiment selection strategy to be an optimal verification strategy.

III. MARKOV DECISION PROCESS FORMULATION
In this section, we show that the verification problem can be formulated as an infinite-horizon average-reward MDP problem. All of the following analysis is for $h = 1$ and, with a slight abuse of notation, we henceforth refer to $g^*(1)$ and $J(g, 1)$ as $g^*$ and $J(g)$, respectively. The same analysis can be repeated for any other $h$ to obtain similar results.

The state of the MDP is the posterior belief $\rho(n)$. The posterior belief is updated using Bayes' rule. Thus, if $U_n = u$ and $Y_n = y$, we have

$$\rho_h(n+1) = \frac{\rho_h(n)\, p_h^u(y)}{\sum_{h'} \rho_{h'}(n)\, p_{h'}^u(y)}. \quad (14)$$

For convenience, we denote the Bayes update in (14) by

$$\rho(n+1) = F(\rho(n), U_n, Y_n). \quad (15)$$

Since $H = 1$, we have $Y_n = \xi(1, U_n, W_n)$. Clearly, the dynamics of this system are Markovian. The expectation of the average confidence rate under a strategy $g$ is given by

$$J_N(g) := \frac{1}{N} \mathbb{E}^g[C_1(\rho(N+1)) - C_1(\rho(1))] \quad (16)$$
$$= \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} [C_1(\rho(n+1)) - C_1(\rho(n))] \quad (17)$$
$$= \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} \mathbb{E}[C_1(\rho(n+1)) - C_1(\rho(n)) \mid I_n, U_n]$$
$$= \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} \mathbb{E}[C_1(\rho(n+1)) - C_1(\rho(n)) \mid \rho(n), U_n]$$
$$=: \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} r(\rho(n), U_n). \quad (18)$$

The instantaneous reward for this MDP is the expected instantaneous increase in the confidence level and is given by

$$r(\rho, u) = \sum_{y \in \mathcal{Y}} p_1^u(y) \log \frac{\rho_1\, p_1^u(y)}{\sum_{j \neq 1} \rho_j\, p_j^u(y)} - \log \frac{\rho_1}{1 - \rho_1} = \sum_{y \in \mathcal{Y}} p_1^u(y) \log \frac{p_1^u(y)}{\sum_{j \neq 1} \tilde{\rho}_j\, p_j^u(y)}, \quad (19)$$

where $\tilde{\rho}_j = \rho_j/(1 - \rho_1)$. Note that $\tilde{\rho}$ is a probability distribution over the set of alternate hypotheses $\tilde{\mathcal{H}} = \mathcal{H} \setminus \{1\}$. Also, notice that $r(\rho, u)$ is a KL-divergence between two distributions and hence is always non-negative. The objective is to find a strategy $g^*$ that maximizes the following average reward

$$J(g) := \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}^g[r(\rho(n), U_n)]. \quad (20)$$

We use Dynamic Programming (DP) to characterize optimal solutions for this infinite-horizon problem.
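As a concreteness check, the Bayes update (14), the confidence level (8), and the reward (19) can be sketched numerically. The conditional distributions below are illustrative placeholders, not taken from the paper; the sketch verifies that $r(\rho, u)$ coincides with the expected one-step BLLR increase:

```python
import numpy as np

# Illustrative conditional distributions p_h^u(y): rows = hypotheses
# (row 0 is the verified hypothesis h = 1), columns = observations.
p = {
    "u1": np.array([[0.9, 0.1],    # p_1^{u1}(.)
                    [0.1, 0.9],    # p_2^{u1}(.)
                    [0.9, 0.1]]),  # p_3^{u1}(.)
    "u2": np.array([[0.9, 0.1],
                    [0.9, 0.1],
                    [0.1, 0.9]]),
}

def bllr(rho):
    """Confidence level C_1(rho) = log(rho_1 / (1 - rho_1)), eq. (8)."""
    return np.log(rho[0] / (1.0 - rho[0]))

def bayes_update(rho, u, y):
    """Posterior update rho(n+1) = F(rho(n), u, y), eq. (14)."""
    post = rho * p[u][:, y]
    return post / post.sum()

def reward(rho, u):
    """One-step expected increase in C_1, eq. (19): the KL divergence
    between p_1^u and the rho~-mixture of the alternate p_j^u."""
    mix = (rho[1:] / (1.0 - rho[0])) @ p[u][1:]
    return float(np.sum(p[u][0] * np.log(p[u][0] / mix)))

# Sanity check: reward(rho, u) equals the expected BLLR increase under h = 1.
rho = np.array([0.6, 0.2, 0.2])
expected_inc = sum(p["u1"][0, y] * (bllr(bayes_update(rho, "u1", y)) - bllr(rho))
                   for y in range(2))
```

Because the second form in (19) is a KL divergence, `reward` is non-negative for every belief and experiment, which is what makes the average-reward objective (20) well behaved.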
In this framework, it can be shown that the randomized strategies used in [3], [1], [2] asymptotically achieve the optimal rate $J^*$. Additionally, we identify a class of strategies that also achieve the optimal rate and possibly converge faster to it than the policies used in prior works.

Consider the following fixed point equation for the infinite-horizon MDP

$$J' + w(\rho) = \max_u \Big\{ r(\rho, u) + \sum_y p_1^u(y)\, w(F(\rho, u, y)) \Big\}, \quad (21)$$

where $J' \in \mathbb{R}$ is some constant and $w : \Delta\mathcal{H} \to \mathbb{R}$ is some mapping. If such $J'$ and $w$ exist, then with some algebra (see [9] for details), we can conclude the following for any experiment selection strategy $g$ (possibly non-stationary):

$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}^g[r(\rho(n), U_n)] \quad (22)$$
$$\le \limsup_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^g w(\rho(1)) - \mathbb{E}^g w(\rho(N+1)) \right) + J'. \quad (23)$$

If we can show that

$$\limsup_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^g w(\rho(1)) - \mathbb{E}^g w(\rho(N+1)) \right) \le 0 \quad (24)$$

for every strategy $g$, then clearly the optimal rate $J^* \le J'$. Additionally, if for some strategy $g^*$,

$$\liminf_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^{g^*} w(\rho(1)) - \mathbb{E}^{g^*} w(\rho(N+1)) \right) = 0 \quad (25)$$

is satisfied and the experiment selected by $g^*$ is a maximizer in the fixed point equation (21), then $g^*$ is indeed an optimal strategy and $J^* = J'$ [9]. Our objective now is to find $J'$ and a function $w$ that satisfy these conditions. We make the following assumption on the conditional distributions $p_h^u(y)$.

Assumption 1.
There exists a constant $B > 0$ such that $|\lambda_{ij}(u, y)| < B$ for every experiment $u$, observation $y$ and hypotheses $i, j \in \mathcal{H}$, where

$$\lambda_{ij}(u, y) := \log \frac{p_i^u(y)}{p_j^u(y)}.$$

We use the following quantities throughout our proofs. Let

$$\alpha^* := \arg\max_{\alpha \in \Delta\mathcal{U}} \min_{j \neq 1} \sum_u \alpha_u D(p_1^u \,\|\, p_j^u) \quad (26)$$

$$\beta^* := \arg\min_{\beta \in \Delta\tilde{\mathcal{H}}} \max_{u \in \mathcal{U}} \sum_{j \neq 1} \beta_j D(p_1^u \,\|\, p_j^u). \quad (27)$$

Since the sets $\mathcal{U}$ and $\mathcal{H}$ are finite, the existence of $\alpha^*$ and $\beta^*$ is guaranteed and, by the minimax theorem [10],

$$\max_{\alpha \in \Delta\mathcal{U}} \min_{j \neq 1} \sum_u \alpha_u D(p_1^u \,\|\, p_j^u) = \min_{\beta \in \Delta\tilde{\mathcal{H}}} \max_{u \in \mathcal{U}} \sum_{j \neq 1} \beta_j D(p_1^u \,\|\, p_j^u) =: R^*. \quad (28)$$

We refer to the elements in the support of $\beta^*$ as critical hypotheses and those in the support of $\alpha^*$ as critical experiments. In particular, we show that the optimal rate $J^* = R^*$.

IV. DYNAMIC PROGRAMMING SOLUTION
In this section, we solve the MDP formulated in Section III. Lemma IV.1 identifies a solution for the fixed point equation (21) and the subsequent Corollary IV.1 is used to obtain an upper bound on $J^*$. We then show that this upper bound can indeed be achieved.

Lemma IV.1.
The fixed point equation (21) is satisfied with $J' = R^*$ and

$$w(\rho) = -\sum_{j \neq 1} \beta_j^* \log \frac{\rho_j}{1 - \rho_1} = -\sum_{j \neq 1} \beta_j^* \log \tilde{\rho}_j. \quad (29)$$

Also, any critical experiment is a maximizer in the fixed point equation (21).

Proof.
Define $v(\rho) := w(\rho) + C_1(\rho)$, that is,

$$v(\rho) := \sum_{j \neq 1} \beta_j^* \log \frac{\rho_1}{\rho_j}.$$

Therefore, we have for every $u$

$$\sum_y p_1^u(y)\, w(F(\rho, u, y)) - w(\rho) \quad (30)$$
$$= \sum_y p_1^u(y)\, v(F(\rho, u, y)) - v(\rho) - r(\rho, u). \quad (31)$$

This is because $r(\rho, u)$ is equal to the expected increase in the confidence level $C_1(\rho)$ after performing the experiment $u$. Hence,

$$\max_u \Big\{ r(\rho, u) + \sum_y p_1^u(y)\, w(F(\rho, u, y)) \Big\} - w(\rho) \quad (32)$$
$$= \max_u \sum_y p_1^u(y)\, v(F(\rho, u, y)) - v(\rho) \quad (33)$$
$$= \max_u \sum_y p_1^u(y) \sum_{j \neq 1} \beta_j^* \log \frac{\rho_1\, p_1^u(y)}{\rho_j\, p_j^u(y)} - v(\rho) \quad (34)$$
$$= \max_u \sum_y p_1^u(y) \sum_{j \neq 1} \beta_j^* \Big( \log \frac{\rho_1}{\rho_j} + \log \frac{p_1^u(y)}{p_j^u(y)} \Big) - v(\rho) \quad (35)$$
$$= \max_u \sum_y p_1^u(y) \sum_{j \neq 1} \beta_j^* \log \frac{p_1^u(y)}{p_j^u(y)} + v(\rho) - v(\rho) \quad (36)$$
$$= \max_u \sum_{j \neq 1} \beta_j^* D(p_1^u \,\|\, p_j^u) = R^* = J'. \quad (37)$$

The last equality follows from the fact that $\beta^*$ is a solution of the minimax problem and the minimax value is equal to $R^*$. Therefore, $J'$ and $w$ satisfy the fixed point equation (21). Note that any critical experiment $u$ is a maximizer in (37).

Corollary IV.1.
For any strategy $g$, we have

$$\limsup_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^g w(\rho(1)) - \mathbb{E}^g w(\rho(N+1)) \right) \quad (38)$$
$$= \limsup_{N \to \infty} \frac{1}{N} \sum_{j \neq 1} \beta_j^* \mathbb{E}^g \log \tilde{\rho}_j(N+1) \le 0. \quad (39)$$

Proof.
This is simply because $\tilde{\rho}_j(N+1) \le 1$.

Theorem IV.1.
The optimal average rate satisfies $J^* \le R^*$.

Proof.
This directly follows from the fact that $w$ defined in Lemma IV.1 satisfies inequality (24) and, with $J' = R^*$, the fixed point equation (21) is satisfied.

Theorem IV.2.
The optimal average rate is $J^* = R^*$.

Proof. It is sufficient to show that there exists a strategy $g^*$ that satisfies

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{j \neq 1} \beta_j^* \mathbb{E}^{g^*} \log \tilde{\rho}_j(N+1) = 0, \quad (40)$$

and that selects only critical experiments. Let

$$X_j(n+1) = X_j(n) + \lambda_j(U_n, Y_n), \quad (41)$$

where $X_j(1) = \log \rho_j(1)$ and $\lambda_j(u, y) := \lambda_{j1}(u, y) = \log\frac{p_j^u(y)}{p_1^u(y)}$. If $X_j(N+1) = x_j$ and $\tilde{\rho}_j(N+1) = \tilde{\rho}_j$, we have

$$\log \tilde{\rho}_j = x_j - \log \sum_{k \neq 1} e^{x_k}. \quad (42)$$

Consider an open-loop randomized strategy where, at each time, the experiment is selected independently using the distribution $\alpha^*$. Clearly, this strategy selects only critical experiments. Under this open-loop strategy, we have for any $j \neq 1$

$$\mathbb{E}[\lambda_j(U, Y)] = \sum_u \alpha_u^* \sum_y p_1^u(y) \log \frac{p_j^u(y)}{p_1^u(y)} \quad (43)$$
$$= -\sum_u \alpha_u^* D(p_1^u \,\|\, p_j^u) =: -R_j. \quad (44)$$

Notice that for every critical hypothesis $j$, $R_j = R^*$ and for every non-critical alternate hypothesis, $R_j > R^*$. This follows from the definition of $\alpha^*$. Further, we have

$$\frac{1}{N} \mathbb{E} X_j(N+1) = \frac{1}{N} X_j(1) - R_j. \quad (45)$$

As $N \to \infty$, the term $X_j(1)/N \to 0$ and we can ignore it. Thus, for every critical hypothesis $j$,

$$\frac{1}{N} \mathbb{E} \log \tilde{\rho}_j(N+1) = \frac{1}{N} \mathbb{E} \Big[ X_j(N+1) - \log \sum_{k \neq 1} e^{X_k(N+1)} \Big] = -R^* - \frac{1}{N} \mathbb{E} \log \sum_{k \neq 1} e^{X_k(N+1)}.$$

We can ignore the non-critical hypotheses because $\beta_j^* = 0$ for non-critical hypotheses. If we can show that the second term approaches $-R^*$ as $N \to \infty$, then clearly the condition (40) is satisfied with equality. Using the Strong Law of Large Numbers (SLLN) [11], we can conclude that for every alternate hypothesis $j$,

$$\frac{1}{N} X_j(N+1) \to -R_j \quad (46)$$

with probability 1. We can use the SLLN because of Assumption 1. Therefore,

$$\max_{j \neq 1} \Big\{ \frac{1}{N} X_j(N+1) \Big\} \to \max_{j \neq 1} \{-R_j\} = -R^*. \quad (47)$$

Further, because of Assumption 1, $X_j(N+1)/N$ is uniformly bounded by $B$ for every alternate hypothesis $j$. Thus, using the bounded convergence theorem [11], we have

$$\mathbb{E} \max_{j \neq 1} \Big\{ \frac{1}{N} X_j(N+1) \Big\} \to -R^*. \quad (48)$$
For the log-sum-exponential function, we have the following bounds:

$$\max_{j \neq 1} \{X_j(N+1)\} \le \log \sum_{k \neq 1} e^{X_k(N+1)} \quad (49)$$
$$\le \max_{j \neq 1} \{X_j(N+1)\} + \log(|\mathcal{H}| - 1).$$

Therefore,

$$\frac{1}{N} \mathbb{E} \log \sum_{k \neq 1} e^{X_k(N+1)} \to -R^*. \quad (50)$$

Thus, the open-loop randomized policy $\alpha^*$ is asymptotically optimal and $J^* = R^*$.

To summarize, the following conditions are sufficient for a stationary verification strategy $g$ to be asymptotically optimal:

1) The strategy $g$ only selects critical experiments, i.e. experiments from the support of $\alpha^*$.
2) The stability criterion in (40) is satisfied, i.e.

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{j \neq 1} \beta_j^* \mathbb{E}^g \log \tilde{\rho}_j(N+1) = 0. \quad (51)$$

These conditions suggest that there could be many strategies other than the open-loop randomized strategy used in Theorem IV.2 that achieve asymptotic optimality.

V. NUMERICAL RESULTS
In this section, we propose a new heuristic based on a Kullback-Leibler divergence zero-sum game and demonstrate numerically that this heuristic's performance is close to the maximum achievable confidence rate $R^*$. We first briefly describe all the strategies used in our experiments.
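The quantities $\alpha^*$ and $R^*$ from (26)-(28) can be computed as the solution of a zero-sum matrix game via a small linear program. The distributions below are our own illustrative choice (two queries, each of which leaves one alternate hypothesis indistinguishable from $h_1$, in the spirit of the setup described below); the names `kl`, `M`, `alpha_star`, and `R_star` are not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def kl(p, q):
    """KL divergence D(p || q) between two finite distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    "u1": np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "u2": np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),
}
experiments = list(p)
# Payoff M[u, j] = D(p_1^u || p_j^u): row player picks u, column player j.
M = np.array([[kl(p[u][0], p[u][j]) for j in (1, 2)] for u in experiments])

# Solve max_{alpha in simplex} min_j (M^T alpha)_j as an LP:
# maximize t subject to (M^T alpha)_j >= t, 1^T alpha = 1, alpha >= 0.
n_u, n_j = M.shape
c = np.zeros(n_u + 1); c[-1] = -1.0           # minimize -t
A_ub = np.hstack([-M.T, np.ones((n_j, 1))])   # t - (M^T alpha)_j <= 0
b_ub = np.zeros(n_j)
A_eq = np.array([[1.0] * n_u + [0.0]]); b_eq = [1.0]
bounds = [(0, 1)] * n_u + [(None, None)]
res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
alpha_star, R_star = res.x[:n_u], res.x[-1]
```

For this symmetric example the payoff matrix is diagonal, so the LP returns $\alpha^* = (1/2, 1/2)$ and game value $R^* = 0.4 \log 9$; the same LP (or its dual, for $\beta^*$) applies to any finite $\mathcal{U}$, $\mathcal{H}$.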
1) Extrinsic Jensen-Shannon (EJS) Divergence:
Extrinsic Jensen-Shannon divergence as a notion of information was first introduced in [7]. Using our notation, the EJS for a query $u$ at some belief state $\rho$ is given by

$$EJS(\rho, u) = \mathbb{E}[C(F(\rho, u, Y)) - C(\rho)], \quad (52)$$

where

$$C(\rho) = \sum_{i \in \mathcal{H}} \rho_i \log \frac{\rho_i}{1 - \rho_i} = \sum_{i \in \mathcal{H}} \rho_i C_i(\rho). \quad (53)$$

Notice that the only random variable in the expression above is $Y$, and the expectation is with respect to the distribution $\sum_{h \in \mathcal{H}} \rho_h p_h^u(y)$ on $Y$. The EJS heuristic selects the experiment $u$ that maximizes $EJS(\rho, u)$ for a given state $\rho$.
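The EJS computation in (52)-(53) can be sketched as follows; the conditional distributions are illustrative placeholders, not the paper's:

```python
import numpy as np

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    "u1": np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "u2": np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),
}

def C(rho):
    """C(rho) = sum_i rho_i log(rho_i / (1 - rho_i)), eq. (53)."""
    return float(np.sum(rho * np.log(rho / (1.0 - rho))))

def ejs(rho, u):
    """EJS(rho, u) = E[C(F(rho, u, Y))] - C(rho), eq. (52); the
    expectation is over Y drawn from the mixture sum_h rho_h p_h^u."""
    p_y = rho @ p[u]                       # marginal of Y under belief rho
    val = 0.0
    for y in range(p[u].shape[1]):
        post = rho * p[u][:, y] / p_y[y]   # Bayes update F(rho, u, y)
        val += p_y[y] * C(post)
    return val - C(rho)

def ejs_select(rho):
    """EJS heuristic: pick the query maximizing EJS at belief rho."""
    return max(p, key=lambda u: ejs(rho, u))

rho = np.array([0.8, 0.1, 0.1])
```

With the belief split equally over the two alternates, the two queries are symmetric and yield identical EJS values, so the rule is indifferent between them; the greedy nature of this one-step criterion is what the second experimental setup below exploits.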
2) Open Loop Verification (OPE):
As discussed earlier, the strategies in [2], [1], [3], when specialized to verification, are open-loop and randomized. According to this strategy, the queries are selected randomly and independently in an open-loop manner from the distribution $\alpha^*$. Recall that this strategy is asymptotically optimal, as shown in Theorem IV.2.
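A Monte-Carlo sketch of OPE: queries are drawn i.i.d. from $\alpha^*$ and the belief is updated by Bayes' rule, carried in the log domain so the alternate-hypothesis weights do not underflow as the confidence grows. The distributions and $\alpha^* = (1/2, 1/2)$ are our own illustrative choice, for which the game value works out to $R^* = 0.4 \log 9$; the realized confidence rate should approach it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    0: np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),  # query u1
    1: np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),  # query u2
}
alpha_star = np.array([0.5, 0.5])
R_star = 0.4 * np.log(9.0)  # game value for this example

def confidence(ell):
    """C_1 = ell_1 - log(sum_{k != 1} exp(ell_k)), from unnormalized
    log-posterior weights ell (a log-sum-exp, computed stably)."""
    alt = ell[1:]
    m = alt.max()
    return ell[0] - (m + np.log(np.sum(np.exp(alt - m))))

def run_ope(N, true_h=0):
    ell = np.log(np.full(3, 1.0 / 3.0))  # log of the uniform prior
    c0 = confidence(ell)
    for _ in range(N):
        u = rng.choice(2, p=alpha_star)    # open-loop draw from alpha*
        y = rng.choice(2, p=p[u][true_h])  # observation under h_1
        ell = ell + np.log(p[u][:, y])     # Bayes update, log domain
    return (confidence(ell) - c0) / N      # realized confidence rate

rate = np.mean([run_ope(2000) for _ in range(20)])
```

The realized rate sits slightly below $R^*$ at finite $N$, which is the slow convergence of OPE that the plots below highlight.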
3) KL-divergence Zero-sum Game (KLZ):
We design the following heuristic. Consider a zero-sum game [10] in which the first player (maximizing) selects an experiment $u \in \mathcal{U}$ and the second player (minimizing) selects an alternate hypothesis $j \in \tilde{\mathcal{H}}$. The payoff for this zero-sum game is the KL-divergence $D(p_1^u \,\|\, p_j^u)$. The agent picks an experiment $u$ that maximizes

$$P(\rho, u) := \sum_{j \neq 1} \tilde{\rho}_j D(p_1^u \,\|\, p_j^u).$$

This strategy can be interpreted as the first player's best response when the second player uses the mixed strategy $\tilde{\rho}$ to select an alternate hypothesis. Note that the mixed strategy $\alpha^*$ used in OPE is an equilibrium strategy for the maximizing player.

A. Simulation Setup
To simulate these heuristics, we first consider a simple setup with three hypotheses and two queries. The conditional distributions $p_i^u(y)$ for each of these queries are illustrated in Figure 3.

[Fig. 3: Conditional distributions $p_i^u(y)$ for queries $u_1$ (a) and $u_2$ (b), over observations $y \in \{0, 1\}$.]

The queries are designed such that when $H = h_1$, the agent is forced to make both queries $u_1$ and $u_2$. This is because one pair of hypotheses is indistinguishable under query $u_1$ and, similarly, a different pair is indistinguishable under query $u_2$. We illustrate the evolution of the expected confidence rate $J_N$ under hypothesis $h_1$ in Figure 4. The heuristics EJS and KLZ come very close to the maximum achievable rate. OPE eventually achieves the maximal rate, but very slowly.

[Fig. 5: Conditional distributions $p_i^u(y)$ for the additional queries $u_3$ (a) and $u_4$ (b); $\delta$ is a small positive constant.]

In the second experimental setup, we include two additional queries $u_3$ and $u_4$, characterized by the distributions in Figure 5. When $H = h_1$, the queries $u_3$ and $u_4$ together can eliminate the false hypotheses
Fig. 4: Evolution of the expected confidence rate $J_N$ under hypothesis $h_1$ in the first setup with queries $u_1$ and $u_2$. Note the subpar performance of OPE in this setup.
Fig. 6: Evolution of the expected confidence rate $J_N$ under hypothesis $h_1$ in the second setup with the additional queries $u_3$ and $u_4$. Note the subpar performance of OPE and EJS in this setup.

at a much faster rate than $u_1$ and $u_2$. Intuitively, this is because when the agent performs $u_3$ and observes $y = 1$, the belief on one of the alternate hypotheses decreases drastically, because $y = 1$ is extremely unlikely under that hypothesis. Similarly, $u_4$ is very effective in eliminating the other alternate hypothesis. The evolution of the expected confidence rate under hypothesis $h_1$ with the additional experiments $u_3$ and $u_4$ is shown in Figure 6. The heuristics KLZ and OPE select the queries $u_3$ and $u_4$ under hypothesis $h_1$. But the greedy heuristic EJS usually selects only $u_1$ and $u_2$ and fails to realize that queries $u_3$ and $u_4$ are more effective under hypothesis $h_1$. The greedy EJS approach fails because queries $u_3$ and $u_4$ are constructed in such a way that they are optimal over longer horizons but are sub-optimal over shorter horizons. Thus, the assumption required for asymptotic optimality of EJS in [7] does not hold in this setup.
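The KLZ selection rule described above reduces to a one-line best response; a sketch, with illustrative placeholder distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two finite distributions."""
    return float(np.sum(p * np.log(p / q)))

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    "u1": np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "u2": np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),
}

def klz_select(rho):
    """KLZ: best response against the mixed strategy rho~ over the
    alternate hypotheses, i.e. argmax_u sum_j rho~_j D(p_1^u || p_j^u)."""
    rho_t = rho[1:] / (1.0 - rho[0])  # rho~_j for j = 2, ..., |H|
    def payoff(u):
        return sum(rho_t[j] * kl(p[u][0], p[u][j + 1])
                   for j in range(len(rho_t)))
    return max(p, key=payoff)
```

Unlike OPE, the rule adapts to the residual belief: when most of the alternate mass sits on the hypothesis that `u1` discriminates from $h_1$, it plays `u1`, and it switches to `u2` when the mass moves to the other alternate.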
Fig. 7: Expected stopping time (divided by $\log L$) under hypothesis $h_1$ in the first setup with queries $u_1$ and $u_2$. Note the subpar performance of OPE in this setup.
Fig. 8: Expected stopping time (divided by $\log L$) under hypothesis $h_1$ in the second setup with the additional queries $u_3$ and $u_4$. Note the subpar performance of OPE and EJS in this setup.

B. Stopping Time Formulation
In [3], [1], [12], a stopping time formulation for hypothesis testing is considered. The sampling process stops when the belief on some hypothesis exceeds a threshold or, equivalently, when the confidence $C_h(\rho) > \log L$, where $L$ is a parameter. Let this stopping time be $N$. Under this stopping criterion, we numerically study the expected stopping time for all the strategies discussed. The plots in Figures 7 and 8 depict the quantity $\mathbb{E}[N]/\log L$ as a function of the parameter $L$. The numerical results suggest that our heuristic performs better even in the stopping time formulation.

VI. CONCLUSION
In this paper, we formulated the problem of quickly verifying a given hypothesis using observations from experiments as an infinite-horizon average-reward MDP. We characterized the optimal rate of this MDP using infinite-horizon dynamic programming. A stability criterion arises out of the DP equations. We showed that any strategy that satisfies this stability criterion while selecting experiments from a critical set is asymptotically optimal. We proposed a heuristic adaptive strategy and numerically demonstrated that it performs better than open-loop policies in the non-asymptotic regime. For future work, we intend to use this stability criterion, perhaps with additional penalty terms, to design strategies with better non-asymptotic performance.
REFERENCES

[1] Sirin Nitinawarat, George K. Atia, and Venugopal V. Veeravalli, "Controlled sensing for multihypothesis testing," IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451-2464, 2013.
[2] Mohammad Naghshvar, Tara Javidi, et al., "Active sequential hypothesis testing," The Annals of Statistics, vol. 41, no. 6, pp. 2703-2738, 2013.
[3] Herman Chernoff, "Sequential design of experiments," The Annals of Mathematical Statistics, vol. 30, no. 3, pp. 755-770, 1959.
[4] Mohammad Naghshvar, Tara Javidi, and Kamalika Chaudhuri, "Bayesian active learning with non-persistent noise," IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 4080-4098, 2015.
[5] Abraham Wald, Sequential Analysis, Courier Corporation, 1973.
[6] Stuart Alan Bessler, Theory and Applications of the Sequential Design of Experiments, k-Actions and Infinitely Many Experiments, Department of Statistics, Stanford University, 1960.
[7] Mohammad Naghshvar and Tara Javidi, "Extrinsic Jensen-Shannon divergence with application in active hypothesis testing," in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, IEEE, 2012, pp. 2191-2195.
[8] David W. Hosmer Jr., Stanley Lemeshow, and Rodney X. Sturdivant, Applied Logistic Regression, vol. 398, John Wiley & Sons, 2013.
[9] Panganamala Ramana Kumar and Pravin Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, vol. 75, SIAM, 2015.
[10] Martin J. Osborne and Ariel Rubinstein, A Course in Game Theory, MIT Press, 1994.
[11] Rick Durrett, Probability: Theory and Examples, Cambridge University Press, 2010.
[12] Mohammad Naghshvar and Tara Javidi, "Sequentiality and adaptivity gains in active hypothesis testing,"