Sequential Experiment Design for Hypothesis Verification
Dhruva Kartik, Ashutosh Nayyar and Urbashi Mitra
Abstract—Hypothesis testing is an important problem with applications in target localization, clinical trials, etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, experiments are selected so that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between the hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that the heuristic performs better in some scenarios compared to existing methods in the literature.
I. INTRODUCTION
Hypothesis testing is a classical problem and has been addressed in various settings. The problem can be described qualitatively as follows. An agent is interested in a phenomenon, and wants to test if the phenomenon conforms to any one of the hypotheses from a known class. The agent can perform various experiments and, based on the observations from these experiments, it needs to infer the true hypothesis. As opposed to the one-shot hypothesis testing problem, an active agent can choose which experiment to perform based on the observations made in the past. The agent seeks to select experiments such that all false hypotheses are eliminated as quickly as possible.

Many active hypothesis testing strategies [1], [2] operate in two phases. The first phase is an exploration phase in which the experiment design is such that a moderate level of confidence is achieved on the true hypothesis. In most cases, this phase terminates in finite time almost surely [3]. The second is a verification phase in which the agent has a moderate level of confidence on some hypothesis and experiments are selected such that confidence on this hypothesis is improved to the desired level. When the desired confidence level is very
(D. Kartik, A. Nayyar and U. Mitra are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 (e-mail: [email protected]; [email protected]; [email protected]). This research was supported, in part, by National Science Foundation Grants NSF CNS-1213128, CCF-1410009, CPS-1446901, Grant ONR N00014-15-1-2550, and Grant AFOSR FA9550-12-1-0215.)

high, the verification cost dominates the performance. In this paper, we make the notions of exploration and verification more formal and focus on analyzing the verification phase.

Active hypothesis testing finds applications in many areas such as sensor selection for target detection and localization, state tracking, design of clinical trials, and learning unknown functions from queries [4]. Consequently, the verification phase plays an important role in all these applications.

We consider a slightly different mathematical formulation for hypothesis testing than previously explored [1], [2]. Using the posterior belief on the set of hypotheses, we define a confidence level called the Bayesian log-likelihood ratio. The objective is to design an experiment selection strategy that maximizes the expected rate of increase in the confidence level. Our contributions in this paper can be summarized as follows:

1) We formulate the verification problem as an infinite-horizon average-reward Markov Decision Process (MDP) problem.
2) We characterize the optimal rate using infinite-horizon Dynamic Programming (DP).
3) We identify a set of critical experiments. We then show that any strategy that selects these experiments while satisfying a stability criterion is asymptotically optimal.
4) We design a new heuristic experiment selection strategy and numerically show that it achieves better performance compared to existing methods in some scenarios.

The rest of the paper is organized as follows. In Section I-A, we discuss the relation between our problem and those in prior works. Section II formulates the problem.
Section III relates the problem to the MDP framework and defines critical experiments. In Section IV, we solve the DP and in Section V, we describe an adaptive strategy and numerically compare it with existing policies. We conclude the paper in Section VI.
A. Prior Work
The simplest active hypothesis testing problem was first formulated by Chernoff in [3], inspired by Wald's analysis of the sequential probability ratio test [5]. Thereafter, it has been generalized in different ways depending on the target application [1], [2]. A major difference between our formulation and the formulation in these works is the reward structure. Prior works consider a combination of expected stopping time and Bayesian error probability. Fixed-horizon problems have also been considered, where the aim is to minimize the Bayesian error probability or the maximal error probability [1]. We define a notion of confidence and maximize the expected rate of increase in confidence over long horizons. In prior formulations, if the agent makes an error in guessing the true hypothesis, it incurs a cost of 1 (or some constant $c$) irrespective of its confidence level. In our formulation, by contrast, we reward the agent for generating observations that result in a high confidence level on the true hypothesis. We believe that our formulation is related to the stopping time formulation because of the strong similarity in the results. In [6], [3], [1], [2], the authors obtain asymptotically tight performance bounds and design policies that are asymptotically optimal. When the policies in these works are adapted to the verification problem defined herein, they turn out to be open-loop and randomized. A closed-loop policy was designed in [7], but this may not always be asymptotically optimal. In this paper, we design a strategy for verification that is more adaptive and conjecture that it is asymptotically optimal.

B. Notation
Random variables/vectors are denoted by upper case boldface letters and their realizations by the corresponding lower case letters. We use calligraphic fonts to denote sets (e.g., $\mathcal{U}$) and $\Delta\mathcal{U}$ is the probability simplex over a finite set $\mathcal{U}$. In general, subscripts are used as the time index. There are two exceptions ($\rho_j(n)$, $X_j(n)$) to this convention, where the subscript denotes the hypothesis and $n$ denotes time. For time indices $n_1 \le n_2$, $Y_{n_1:n_2}$ is short-hand notation for the variables $(Y_{n_1}, Y_{n_1+1}, \ldots, Y_{n_2})$. For a strategy $g$, we use $\mathbb{P}^g[\cdot]$ and $\mathbb{E}^g[\cdot]$ to indicate that the probability and expectation depend on the choice of $g$. The Shannon entropy of a discrete distribution $p$ over a finite space $\mathcal{Y}$ is given by

$$H(p) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y), \quad (1)$$

and the Kullback-Leibler divergence between distributions $p$ and $q$ is given by

$$D(p \,\|\, q) = \sum_{y \in \mathcal{Y}} p(y) \log \frac{p(y)}{q(y)}. \quad (2)$$

II. PROBLEM FORMULATION
Let $\mathcal{H} \subset \mathbb{N}$ be a finite set of hypotheses and let $H$ be the true hypothesis. At each time $n \in \mathbb{N}$, the agent can perform an experiment $U_n \in \mathcal{U}$ and obtain an observation $Y_n \in \mathcal{Y}$. For simplicity, let us also assume that the sets $\mathcal{U}$ and $\mathcal{Y}$ are finite. When an experiment $u \in \mathcal{U}$ is performed for the $k$-th time, the observation $Y$ obtained is given by

$$Y = \xi(H, u, W_{uk}), \quad (3)$$

where $\{W_{uk} : u \in \mathcal{U}, k \in \mathbb{N}\}$ is a collection of mutually independent and identically distributed primitive random variables. The observation $Y_n$ at time $n$ can be expressed as

$$Y_n = \xi(H, U_n, W_n). \quad (4)$$

The probability of observing $y$ after performing an experiment $u$ under hypothesis $h$ is denoted by $p_h^u(y)$. The information available at time $n$, denoted by $I_n$, is the collection of all experiments performed and the corresponding observations up to time $n-1$, i.e.

$$I_n = \{U_{1:n-1}, Y_{1:n-1}\}. \quad (5)$$
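The observation model in (3)-(4) can be sketched in a few lines of Python. The hypothesis names, experiment names, and the numeric distributions below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# A minimal sketch of the observation model (3)-(4): each experiment u,
# under each hypothesis h, induces a conditional distribution p_h^u(y) over
# a finite observation alphabet. All names and numbers are illustrative.
rng = np.random.default_rng(0)

# p[u][h] is the distribution p_h^u(.) over the observations {0, 1}.
p = {
    "u1": {"h1": [0.9, 0.1], "h2": [0.1, 0.9], "h3": [0.9, 0.1]},
    "u2": {"h1": [0.9, 0.1], "h2": [0.9, 0.1], "h3": [0.1, 0.9]},
}

def observe(true_h, u):
    """Sample Y = xi(H, u, W) for experiment u when true_h is in force."""
    return rng.choice(len(p[u][true_h]), p=p[u][true_h])

# Under h1, experiment u1 produces y = 0 with probability 0.9.
samples = [observe("h1", "u1") for _ in range(1000)]
```

The i.i.d. primitive randomness $W_{uk}$ is hidden inside the generator; only the induced distributions $p_h^u(\cdot)$ matter for the analysis that follows.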
Fig. 1: Agent's choices and subsequent observations represented as a tree. Every instance of the probability space can be uniquely represented by a path in this tree.

Actions of the agent at time $n$ can be functions of $I_n$. Let the policy used for selecting the experiment be $g_n$, i.e.

$$U_n = g_n(I_n). \quad (6)$$

The sequence of all the policies $\{g_n\}$ is denoted by $g$, which is referred to as a strategy. Let the collection of all such strategies be $\mathcal{G}$.

Using the available information, the agent forms a posterior belief $\rho(n)$ on $\mathcal{H}$ at time $n$, which is given by

$$\rho_h(n) = \mathbb{P}[H = h \mid Y_{1:n-1}, U_{1:n-1}]. \quad (7)$$

Definition II.1 (Bayesian Log-Likelihood Ratio). The Bayesian log-likelihood ratio $C_h(\rho)$ associated with a hypothesis $h \in \mathcal{H}$ is defined as

$$C_h(\rho) := \log \frac{\rho_h}{1 - \rho_h}. \quad (8)$$

The Bayesian log-likelihood ratio (BLLR) is the logarithm of the ratio of the probability that hypothesis $h$ is true versus the probability that hypothesis $h$ is not true. BLLR is obtained by applying the logit function (also referred to as log-odds in statistics [8]) to the posterior belief $\rho_h$. The logit function amplifies increments in $\rho_h$ when $\rho_h$ is close to 0 or 1. We can interpret BLLR as a measure of confidence on hypothesis $h$ and thus, we refer to it as the confidence level.

The objective is to design an experiment selection strategy $g$ such that the confidence level $C_H$ on the true hypothesis $H$ increases as quickly as possible. In other words, the total reward after acquiring $N$ observations is the average rate of increase in the confidence level on the true hypothesis $H$ and is given by

$$\frac{C_H(\rho(N+1)) - C_H(\rho(1))}{N}. \quad (9)$$

More explicitly, we seek to design a strategy $g$ that maximizes the asymptotic expected reward $K(g)$, which is defined as

$$K(g) := \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}^g[C_H(\rho(N+1)) - C_H(\rho(1))].$$

Fig. 2: The logit function $\log \frac{p}{1-p}$ is the inverse of the logistic sigmoid function $1/(1+e^{-x})$.
It is widely used in statistics and machine learning to quantify the confidence level [8].

Henceforth, we refer to this problem as the Expected Confidence Maximization (ECM) problem for hypothesis testing. For a hypothesis $h$ and a strategy $g \in \mathcal{G}$, define $J(g, h)$ as

$$J(g, h) := \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}^g[C_H(\rho(N+1)) - C_H(\rho(1)) \mid H = h].$$

The value $J(g, h)$ represents the performance of a strategy $g$ conditioned on the hypothesis $h$. Let

$$J^*(h) = \sup_{g \in \mathcal{G}} J(g, h). \quad (10)$$

For a given hypothesis $h$, we refer to the problem of maximizing $J(g, h)$ as the hypothesis verification problem. Let $g^*(h)$ be an optimal verification strategy, i.e. it achieves the supremum in equation (10). We will later show that the existence of an optimal strategy $g^*(h)$ is guaranteed under a mild assumption.

A. Hypothesis Testing vs Hypothesis Verification
The optimal verification rate $J^*(h)$ can be used to obtain an upper bound on the expected reward $K(g)$ in the hypothesis testing problem.

Lemma II.1.
For any experiment selection strategy $g \in \mathcal{G}$, we have

$$K(g) \le \sum_{h \in \mathcal{H}} \rho_h(1) J^*(h). \quad (11)$$

Proof.
For any strategy $g \in \mathcal{G}$, we have

$$K(g) = \sum_{h \in \mathcal{H}} \rho_h(1) J(g, h) \le \sum_{h \in \mathcal{H}} \rho_h(1) J^*(h). \quad (12)$$

The last inequality follows from the definition of $J^*(h)$.

It is clear from the proof of Lemma II.1 that this upper bound is achieved by employing the strategy $g^*(h)$ when hypothesis $h$ is true. However, the agent cannot use different strategies under different hypotheses because it does not know the true hypothesis $H$. Therefore, we propose an experiment selection strategy of the following form. Similar strategies have also been used in [2].

$$\bar{g}(\rho) = \begin{cases} g^*(h)(\rho) & \text{if } \rho_h > \bar{\rho} \text{ for some } h \\ g^e(\rho) & \text{otherwise}, \end{cases} \quad (13)$$

where $0.5 < \bar{\rho} < 1$ is a constant and $g^e$ is an exploration strategy. The interpretation of the strategy $\bar{g}$ is that when the agent has a moderate level of confidence on some hypothesis $h$, it employs the corresponding verification strategy $g^*(h)$. This is to verify whether hypothesis $h$ is indeed true by further improving its confidence level. When the agent is not very confident about any particular hypothesis, the agent employs an exploration strategy $g^e$. The primary purpose of the exploration strategy is to ensure that $\rho_H$ eventually crosses the threshold $\bar{\rho}$. A naive exploration strategy is to select every experiment uniformly at random. Better exploration strategies do exist [2], [7]. It remains to show that a strategy like $\bar{g}$ can indeed achieve the upper bound in Lemma II.1. In this paper, we focus on the hypothesis verification problem. We derive sufficient conditions for an experiment selection strategy to be an optimal verification strategy.

III. MARKOV DECISION PROCESS FORMULATION
In this section, we show that the verification problem can be formulated as an infinite-horizon average-reward MDP problem. All of the following analysis is for $h = 1$ and, with a slight abuse of notation, we henceforth refer to $g^*(1)$ and $J(g, 1)$ as $g^*$ and $J(g)$, respectively. The same analysis can be repeated for any other $h$ to obtain similar results.

The state of the MDP is the posterior belief $\rho(n)$. The posterior belief is updated using Bayes' rule. Thus, if $U_n = u$ and $Y_n = y$, we have

$$\rho_h(n+1) = \frac{\rho_h(n)\, p_h^u(y)}{\sum_{h'} \rho_{h'}(n)\, p_{h'}^u(y)}. \quad (14)$$

For convenience, we denote the Bayes update in (14) by

$$\rho(n+1) = F(\rho(n), U_n, Y_n). \quad (15)$$

Since $H = 1$, we have $Y_n = \xi(1, U_n, W_n)$. Clearly, the dynamics of this system are Markovian. The expectation of the average confidence rate under a strategy $g$ is given by

$$J_N(g) := \frac{1}{N} \mathbb{E}^g[C_1(\rho(N+1)) - C_1(\rho(1))] \quad (16)$$
$$= \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} [C_1(\rho(n+1)) - C_1(\rho(n))] \quad (17)$$
$$= \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} \mathbb{E}[C_1(\rho(n+1)) - C_1(\rho(n)) \mid I_n, U_n]$$
$$= \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} \mathbb{E}[C_1(\rho(n+1)) - C_1(\rho(n)) \mid \rho(n), U_n]$$
$$=: \frac{1}{N} \mathbb{E}^g \sum_{n=1}^{N} r(\rho(n), U_n). \quad (18)$$

The instantaneous reward for this MDP is the expected instantaneous increase in the confidence level and is given by

$$r(\rho, u) = \sum_{y \in \mathcal{Y}} p_1^u(y) \log \frac{\rho_1\, p_1^u(y)}{\sum_{j \neq 1} \rho_j\, p_j^u(y)} - \log \frac{\rho_1}{1 - \rho_1} = \sum_{y \in \mathcal{Y}} p_1^u(y) \log \frac{p_1^u(y)}{\sum_{j \neq 1} \tilde{\rho}_j\, p_j^u(y)}, \quad (19)$$

where $\tilde{\rho}_j = \rho_j/(1 - \rho_1)$. Note that $\tilde{\rho}$ is a probability distribution over the set of alternate hypotheses $\tilde{\mathcal{H}} = \mathcal{H} \setminus \{1\}$. Also, notice that $r(\rho, u)$ is a KL-divergence between two distributions and hence is always non-negative. The objective is to find a strategy $g^*$ that maximizes the following average reward

$$J(g) := \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}^g[r(\rho(n), U_n)]. \quad (20)$$

We use Dynamic Programming (DP) to characterize optimal solutions for this infinite-horizon problem.
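As a concreteness check, the Bayes update (14), the confidence level (8), and the reward (19) can be sketched numerically. The conditional distributions below are illustrative placeholders, not taken from the paper; the sketch verifies that $r(\rho, u)$ coincides with the expected one-step BLLR increase:

```python
import numpy as np

# Illustrative conditional distributions p_h^u(y): rows = hypotheses
# (row 0 is the verified hypothesis h = 1), columns = observations.
p = {
    "u1": np.array([[0.9, 0.1],    # p_1^{u1}(.)
                    [0.1, 0.9],    # p_2^{u1}(.)
                    [0.9, 0.1]]),  # p_3^{u1}(.)
    "u2": np.array([[0.9, 0.1],
                    [0.9, 0.1],
                    [0.1, 0.9]]),
}

def bllr(rho):
    """Confidence level C_1(rho) = log(rho_1 / (1 - rho_1)), eq. (8)."""
    return np.log(rho[0] / (1.0 - rho[0]))

def bayes_update(rho, u, y):
    """Posterior update rho(n+1) = F(rho(n), u, y), eq. (14)."""
    post = rho * p[u][:, y]
    return post / post.sum()

def reward(rho, u):
    """One-step expected increase in C_1, eq. (19): the KL divergence
    between p_1^u and the rho~-mixture of the alternate p_j^u."""
    mix = (rho[1:] / (1.0 - rho[0])) @ p[u][1:]
    return float(np.sum(p[u][0] * np.log(p[u][0] / mix)))

# Sanity check: reward(rho, u) equals the expected BLLR increase under h = 1.
rho = np.array([0.6, 0.2, 0.2])
expected_inc = sum(p["u1"][0, y] * (bllr(bayes_update(rho, "u1", y)) - bllr(rho))
                   for y in range(2))
```

Because the second form in (19) is a KL divergence, `reward` is non-negative for every belief and experiment, which is what makes the average-reward objective (20) well behaved.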
In this framework, it can be shown that the randomized strategies used in [3], [1], [2] asymptotically achieve the optimal rate $J^*$. Additionally, we identify a class of strategies that also achieve the optimal rate and possibly converge faster to it than the policies used in prior works.

Consider the following fixed point equation for the infinite-horizon MDP

$$J' + w(\rho) = \max_u \Big\{ r(\rho, u) + \sum_y p_1^u(y)\, w(F(\rho, u, y)) \Big\}, \quad (21)$$

where $J' \in \mathbb{R}$ is some constant and $w : \Delta\mathcal{H} \to \mathbb{R}$ is some mapping. If such $J'$ and $w$ exist, then with some algebra (see [9] for details), we can conclude the following for any experiment selection strategy $g$ (possibly non-stationary):

$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}^g[r(\rho(n), U_n)] \quad (22)$$
$$\le \limsup_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^g w(\rho(1)) - \mathbb{E}^g w(\rho(N+1)) \right) + J'. \quad (23)$$

If we can show that

$$\limsup_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^g w(\rho(1)) - \mathbb{E}^g w(\rho(N+1)) \right) \le 0 \quad (24)$$

for every strategy $g$, then clearly the optimal rate $J^* \le J'$. Additionally, if for some strategy $g^*$,

$$\liminf_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^{g^*} w(\rho(1)) - \mathbb{E}^{g^*} w(\rho(N+1)) \right) = 0 \quad (25)$$

is satisfied and the experiment selected by $g^*$ is a maximizer in the fixed point equation (21), then $g^*$ is indeed an optimal strategy and $J^* = J'$ [9]. Our objective now is to find $J'$ and a function $w$ that satisfy these conditions. We make the following assumption on the conditional distributions $p_h^u(y)$.

Assumption 1.
There exists a constant $B > 0$ such that $|\lambda_{ij}(u, y)| < B$ for every experiment $u$, observation $y$ and hypotheses $i, j \in \mathcal{H}$, where

$$\lambda_{ij}(u, y) := \log \frac{p_i^u(y)}{p_j^u(y)}.$$

We use the following quantities throughout our proofs. Let

$$\alpha^* := \arg\max_{\alpha \in \Delta\mathcal{U}} \min_{j \neq 1} \sum_u \alpha_u D(p_1^u \,\|\, p_j^u) \quad (26)$$

$$\beta^* := \arg\min_{\beta \in \Delta\tilde{\mathcal{H}}} \max_{u \in \mathcal{U}} \sum_{j \neq 1} \beta_j D(p_1^u \,\|\, p_j^u). \quad (27)$$

Since the sets $\mathcal{U}$ and $\mathcal{H}$ are finite, the existence of $\alpha^*$ and $\beta^*$ is guaranteed and, by the minimax theorem [10],

$$\max_{\alpha \in \Delta\mathcal{U}} \min_{j \neq 1} \sum_u \alpha_u D(p_1^u \,\|\, p_j^u) = \min_{\beta \in \Delta\tilde{\mathcal{H}}} \max_{u \in \mathcal{U}} \sum_{j \neq 1} \beta_j D(p_1^u \,\|\, p_j^u) =: R^*. \quad (28)$$

We refer to the elements in the support of $\beta^*$ as critical hypotheses and those in the support of $\alpha^*$ as critical experiments. In particular, we show that the optimal rate $J^* = R^*$.

IV. DYNAMIC PROGRAMMING SOLUTION
In this section, we solve the MDP formulated in Section III. Lemma IV.1 identifies a solution for the fixed point equation (21) and the subsequent Corollary IV.1 is used to obtain an upper bound on $J^*$. We then show that this upper bound can indeed be achieved.

Lemma IV.1.
The fixed point equation (21) is satisfied with $J' = R^*$ and

$$w(\rho) = -\sum_{j \neq 1} \beta_j^* \log \frac{\rho_j}{1 - \rho_1} = -\sum_{j \neq 1} \beta_j^* \log \tilde{\rho}_j. \quad (29)$$

Also, any critical experiment is a maximizer in the fixed point equation (21).

Proof.
Define $v(\rho) := w(\rho) + C_1(\rho)$, that is,

$$v(\rho) := \sum_{j \neq 1} \beta_j^* \log \frac{\rho_1}{\rho_j}.$$

Therefore, we have for every $u$

$$\sum_y p_1^u(y)\, w(F(\rho, u, y)) - w(\rho) \quad (30)$$
$$= \sum_y p_1^u(y)\, v(F(\rho, u, y)) - v(\rho) - r(\rho, u). \quad (31)$$

This is because $r(\rho, u)$ is equal to the expected increase in the confidence level $C_1(\rho)$ after performing the experiment $u$. Hence,

$$\max_u \Big\{ r(\rho, u) + \sum_y p_1^u(y)\, w(F(\rho, u, y)) \Big\} - w(\rho) \quad (32)$$
$$= \max_u \sum_y p_1^u(y)\, v(F(\rho, u, y)) - v(\rho) \quad (33)$$
$$= \max_u \sum_y p_1^u(y) \sum_{j \neq 1} \beta_j^* \log \frac{\rho_1\, p_1^u(y)}{\rho_j\, p_j^u(y)} - v(\rho) \quad (34)$$
$$= \max_u \sum_y p_1^u(y) \sum_{j \neq 1} \beta_j^* \Big( \log \frac{\rho_1}{\rho_j} + \log \frac{p_1^u(y)}{p_j^u(y)} \Big) - v(\rho) \quad (35)$$
$$= \max_u \sum_y p_1^u(y) \sum_{j \neq 1} \beta_j^* \log \frac{p_1^u(y)}{p_j^u(y)} + v(\rho) - v(\rho) \quad (36)$$
$$= \max_u \sum_{j \neq 1} \beta_j^* D(p_1^u \,\|\, p_j^u) = R^* = J'. \quad (37)$$

The last equality follows from the fact that $\beta^*$ is a solution of the minimax problem and the minimax value is equal to $R^*$. Therefore, $J'$ and $w$ satisfy the fixed point equation (21). Note that any critical experiment $u$ is a maximizer in (37).

Corollary IV.1.
For any strategy $g$, we have

$$\limsup_{N \to \infty} \frac{1}{N} \left( \mathbb{E}^g w(\rho(1)) - \mathbb{E}^g w(\rho(N+1)) \right) \quad (38)$$
$$= \limsup_{N \to \infty} \frac{1}{N} \sum_{j \neq 1} \beta_j^* \mathbb{E}^g \log \tilde{\rho}_j(N+1) \le 0. \quad (39)$$

Proof.
This is simply because $\tilde{\rho}_j(N+1) \le 1$.

Theorem IV.1.
The optimal average rate satisfies $J^* \le R^*$.

Proof.
This directly follows from the fact that $w$ defined in Lemma IV.1 satisfies inequality (24) and, with $J' = R^*$, the fixed point equation (21) is satisfied.

Theorem IV.2.
The optimal average rate is $J^* = R^*$.

Proof. It is sufficient to show that there exists a strategy $g^*$ that satisfies

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{j \neq 1} \beta_j^* \mathbb{E}^{g^*} \log \tilde{\rho}_j(N+1) = 0, \quad (40)$$

and that selects only critical experiments. Let

$$X_j(n+1) = X_j(n) + \lambda_j(U_n, Y_n), \quad (41)$$

where $X_j(1) = \log \rho_j(1)$ and $\lambda_j(u, y) := \lambda_{j1}(u, y) = \log\frac{p_j^u(y)}{p_1^u(y)}$. If $X_j(N+1) = x_j$ and $\tilde{\rho}_j(N+1) = \tilde{\rho}_j$, we have

$$\log \tilde{\rho}_j = x_j - \log \sum_{k \neq 1} e^{x_k}. \quad (42)$$

Consider an open-loop randomized strategy where, at each time, the experiment is selected independently using the distribution $\alpha^*$. Clearly, this strategy selects only critical experiments. Under this open-loop strategy, we have for any $j \neq 1$

$$\mathbb{E}[\lambda_j(U, Y)] = \sum_u \alpha_u^* \sum_y p_1^u(y) \log \frac{p_j^u(y)}{p_1^u(y)} \quad (43)$$
$$= -\sum_u \alpha_u^* D(p_1^u \,\|\, p_j^u) =: -R_j. \quad (44)$$

Notice that for every critical hypothesis $j$, $R_j = R^*$ and for every non-critical alternate hypothesis, $R_j > R^*$. This follows from the definition of $\alpha^*$. Further, we have

$$\frac{1}{N} \mathbb{E} X_j(N+1) = \frac{1}{N} X_j(1) - R_j. \quad (45)$$

As $N \to \infty$, the term $X_j(1)/N \to 0$ and we can ignore it. Thus, for every critical hypothesis $j$,

$$\frac{1}{N} \mathbb{E} \log \tilde{\rho}_j(N+1) = \frac{1}{N} \mathbb{E} \Big[ X_j(N+1) - \log \sum_{k \neq 1} e^{X_k(N+1)} \Big] = -R^* - \frac{1}{N} \mathbb{E} \log \sum_{k \neq 1} e^{X_k(N+1)}.$$

We can ignore the non-critical hypotheses because $\beta_j^* = 0$ for non-critical hypotheses. If we can show that the second term approaches $-R^*$ as $N \to \infty$, then clearly the condition (40) is satisfied with equality. Using the Strong Law of Large Numbers (SLLN) [11], we can conclude that for every alternate hypothesis $j$,

$$\frac{1}{N} X_j(N+1) \to -R_j \quad (46)$$

with probability 1. We can use the SLLN because of Assumption 1. Therefore,

$$\max_{j \neq 1} \Big\{ \frac{1}{N} X_j(N+1) \Big\} \to \max_{j \neq 1} \{-R_j\} = -R^*. \quad (47)$$

Further, because of Assumption 1, $X_j(N+1)/N$ is uniformly bounded by $B$ for every alternate hypothesis $j$. Thus, using the bounded convergence theorem [11], we have

$$\mathbb{E} \max_{j \neq 1} \Big\{ \frac{1}{N} X_j(N+1) \Big\} \to -R^*. \quad (48)$$
For the log-sum-exponential function, we have the following bounds:

$$\max_{j \neq 1} \{X_j(N+1)\} \le \log \sum_{k \neq 1} e^{X_k(N+1)} \quad (49)$$
$$\le \max_{j \neq 1} \{X_j(N+1)\} + \log(|\mathcal{H}| - 1).$$

Therefore,

$$\frac{1}{N} \mathbb{E} \log \sum_{k \neq 1} e^{X_k(N+1)} \to -R^*. \quad (50)$$

Thus, the open-loop randomized policy $\alpha^*$ is asymptotically optimal and $J^* = R^*$.

To summarize, the following conditions are sufficient for a stationary verification strategy $g$ to be asymptotically optimal:

1) The strategy $g$ only selects critical experiments, i.e. experiments from the support of $\alpha^*$.
2) The stability criterion in (40) is satisfied, i.e.

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{j \neq 1} \beta_j^* \mathbb{E}^g \log \tilde{\rho}_j(N+1) = 0. \quad (51)$$

These conditions suggest that there could be many strategies other than the open-loop randomized strategy used in Theorem IV.2 that achieve asymptotic optimality.

V. NUMERICAL RESULTS
In this section, we propose a new heuristic based on a Kullback-Leibler divergence zero-sum game and demonstrate numerically that this heuristic's performance is close to the maximum achievable confidence rate $R^*$. We first briefly describe all the strategies used in our experiments.
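The quantities $\alpha^*$ and $R^*$ from (26)-(28) can be computed as the solution of a zero-sum matrix game via a small linear program. The distributions below are our own illustrative choice (two queries, each of which leaves one alternate hypothesis indistinguishable from $h_1$, in the spirit of the setup described below); the names `kl`, `M`, `alpha_star`, and `R_star` are not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def kl(p, q):
    """KL divergence D(p || q) between two finite distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    "u1": np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "u2": np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),
}
experiments = list(p)
# Payoff M[u, j] = D(p_1^u || p_j^u): row player picks u, column player j.
M = np.array([[kl(p[u][0], p[u][j]) for j in (1, 2)] for u in experiments])

# Solve max_{alpha in simplex} min_j (M^T alpha)_j as an LP:
# maximize t subject to (M^T alpha)_j >= t, 1^T alpha = 1, alpha >= 0.
n_u, n_j = M.shape
c = np.zeros(n_u + 1); c[-1] = -1.0           # minimize -t
A_ub = np.hstack([-M.T, np.ones((n_j, 1))])   # t - (M^T alpha)_j <= 0
b_ub = np.zeros(n_j)
A_eq = np.array([[1.0] * n_u + [0.0]]); b_eq = [1.0]
bounds = [(0, 1)] * n_u + [(None, None)]
res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
alpha_star, R_star = res.x[:n_u], res.x[-1]
```

For this symmetric example the payoff matrix is diagonal, so the LP returns $\alpha^* = (1/2, 1/2)$ and game value $R^* = 0.4 \log 9$; the same LP (or its dual, for $\beta^*$) applies to any finite $\mathcal{U}$, $\mathcal{H}$.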
1) Extrinsic Jensen-Shannon (EJS) Divergence:
Extrinsic Jensen-Shannon divergence as a notion of information was first introduced in [7]. Using our notation, the EJS for a query $u$ at some belief state $\rho$ is given by

$$EJS(\rho, u) = \mathbb{E}[C(F(\rho, u, Y)) - C(\rho)], \quad (52)$$

where

$$C(\rho) = \sum_{i \in \mathcal{H}} \rho_i \log \frac{\rho_i}{1 - \rho_i} = \sum_{i \in \mathcal{H}} \rho_i C_i(\rho). \quad (53)$$

Notice that the only random variable in the expression above is $Y$, and the expectation is with respect to the distribution $\sum_{h \in \mathcal{H}} \rho_h p_h^u(y)$ on $Y$. The EJS heuristic selects the experiment $u$ that maximizes $EJS(\rho, u)$ for a given state $\rho$.
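The EJS computation in (52)-(53) can be sketched as follows; the conditional distributions are illustrative placeholders, not the paper's:

```python
import numpy as np

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    "u1": np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "u2": np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),
}

def C(rho):
    """C(rho) = sum_i rho_i log(rho_i / (1 - rho_i)), eq. (53)."""
    return float(np.sum(rho * np.log(rho / (1.0 - rho))))

def ejs(rho, u):
    """EJS(rho, u) = E[C(F(rho, u, Y))] - C(rho), eq. (52); the
    expectation is over Y drawn from the mixture sum_h rho_h p_h^u."""
    p_y = rho @ p[u]                       # marginal of Y under belief rho
    val = 0.0
    for y in range(p[u].shape[1]):
        post = rho * p[u][:, y] / p_y[y]   # Bayes update F(rho, u, y)
        val += p_y[y] * C(post)
    return val - C(rho)

def ejs_select(rho):
    """EJS heuristic: pick the query maximizing EJS at belief rho."""
    return max(p, key=lambda u: ejs(rho, u))

rho = np.array([0.8, 0.1, 0.1])
```

With the belief split equally over the two alternates, the two queries are symmetric and yield identical EJS values, so the rule is indifferent between them; the greedy nature of this one-step criterion is what the second experimental setup below exploits.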
2) Open Loop Verification (OPE):
As discussed earlier, the strategies in [2], [1], [3], when specialized to verification, are open-loop and randomized. According to this strategy, the queries are selected randomly and independently in an open-loop manner from the distribution $\alpha^*$. Recall that this strategy is asymptotically optimal, as shown in Theorem IV.2.
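A Monte-Carlo sketch of OPE: queries are drawn i.i.d. from $\alpha^*$ and the belief is updated by Bayes' rule, carried in the log domain so the alternate-hypothesis weights do not underflow as the confidence grows. The distributions and $\alpha^* = (1/2, 1/2)$ are our own illustrative choice, for which the game value works out to $R^* = 0.4 \log 9$; the realized confidence rate should approach it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    0: np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),  # query u1
    1: np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),  # query u2
}
alpha_star = np.array([0.5, 0.5])
R_star = 0.4 * np.log(9.0)  # game value for this example

def confidence(ell):
    """C_1 = ell_1 - log(sum_{k != 1} exp(ell_k)), from unnormalized
    log-posterior weights ell (a log-sum-exp, computed stably)."""
    alt = ell[1:]
    m = alt.max()
    return ell[0] - (m + np.log(np.sum(np.exp(alt - m))))

def run_ope(N, true_h=0):
    ell = np.log(np.full(3, 1.0 / 3.0))  # log of the uniform prior
    c0 = confidence(ell)
    for _ in range(N):
        u = rng.choice(2, p=alpha_star)    # open-loop draw from alpha*
        y = rng.choice(2, p=p[u][true_h])  # observation under h_1
        ell = ell + np.log(p[u][:, y])     # Bayes update, log domain
    return (confidence(ell) - c0) / N      # realized confidence rate

rate = np.mean([run_ope(2000) for _ in range(20)])
```

The realized rate sits slightly below $R^*$ at finite $N$, which is the slow convergence of OPE that the plots below highlight.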
3) KL-divergence Zero-sum Game (KLZ):
We design the following heuristic. Consider a zero-sum game [10] in which the first player (maximizing) selects an experiment $u \in \mathcal{U}$ and the second player (minimizing) selects an alternate hypothesis $j \in \tilde{\mathcal{H}}$. The payoff for this zero-sum game is the KL-divergence $D(p_1^u \,\|\, p_j^u)$. The agent picks an experiment $u$ that maximizes

$$P(\rho, u) := \sum_{j \neq 1} \tilde{\rho}_j D(p_1^u \,\|\, p_j^u).$$

This strategy can be interpreted as the first player's best response when the second player uses the mixed strategy $\tilde{\rho}$ to select an alternate hypothesis. Note that the mixed strategy $\alpha^*$ used in OPE is an equilibrium strategy for the maximizing player.

A. Simulation Setup
To simulate these heuristics, we first consider a simple setup with three hypotheses and two queries. The conditional distributions $p_i^u(y)$ for each of these queries are illustrated in Figure 3.

[Fig. 3: Conditional distributions $p_i^u(y)$ for queries $u_1$ (a) and $u_2$ (b), over observations $y \in \{0, 1\}$.]

The queries are designed such that when $H = h_1$, the agent is forced to make both queries $u_1$ and $u_2$. This is because one pair of hypotheses is indistinguishable under query $u_1$ and, similarly, a different pair is indistinguishable under query $u_2$. We illustrate the evolution of the expected confidence rate $J_N$ under hypothesis $h_1$ in Figure 4. The heuristics EJS and KLZ come very close to the maximum achievable rate. OPE eventually achieves the maximal rate, but very slowly.

[Fig. 5: Conditional distributions $p_i^u(y)$ for the additional queries $u_3$ (a) and $u_4$ (b); $\delta$ is a small positive constant.]

In the second experimental setup, we include two additional queries $u_3$ and $u_4$, characterized by the distributions in Figure 5. When $H = h_1$, the queries $u_3$ and $u_4$ together can eliminate the false hypotheses
Fig. 4: Evolution of the expected confidence rate $J_N$ under hypothesis $h_1$ in the first setup with queries $u_1$ and $u_2$. Note the subpar performance of OPE in this setup.
Fig. 6: Evolution of the expected confidence rate $J_N$ under hypothesis $h_1$ in the second setup with the additional queries $u_3$ and $u_4$. Note the subpar performance of OPE and EJS in this setup.

at a much faster rate than $u_1$ and $u_2$. Intuitively, this is because when the agent performs $u_3$ and observes $y = 1$, the belief on one of the alternate hypotheses decreases drastically, because $y = 1$ is extremely unlikely under that hypothesis. Similarly, $u_4$ is very effective in eliminating the other alternate hypothesis. The evolution of the expected confidence rate under hypothesis $h_1$ with the additional experiments $u_3$ and $u_4$ is shown in Figure 6. The heuristics KLZ and OPE select the queries $u_3$ and $u_4$ under hypothesis $h_1$. But the greedy heuristic EJS usually selects only $u_1$ and $u_2$ and fails to realize that queries $u_3$ and $u_4$ are more effective under hypothesis $h_1$. The greedy EJS approach fails because queries $u_3$ and $u_4$ are constructed in such a way that they are optimal over longer horizons but are sub-optimal over shorter horizons. Thus, the assumption required for asymptotic optimality of EJS in [7] does not hold in this setup.
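The KLZ selection rule described above reduces to a one-line best response; a sketch, with illustrative placeholder distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two finite distributions."""
    return float(np.sum(p * np.log(p / q)))

# Illustrative p[u]: rows are p_1^u, p_2^u, p_3^u over observations {0, 1}.
p = {
    "u1": np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "u2": np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]),
}

def klz_select(rho):
    """KLZ: best response against the mixed strategy rho~ over the
    alternate hypotheses, i.e. argmax_u sum_j rho~_j D(p_1^u || p_j^u)."""
    rho_t = rho[1:] / (1.0 - rho[0])  # rho~_j for j = 2, ..., |H|
    def payoff(u):
        return sum(rho_t[j] * kl(p[u][0], p[u][j + 1])
                   for j in range(len(rho_t)))
    return max(p, key=payoff)
```

Unlike OPE, the rule adapts to the residual belief: when most of the alternate mass sits on the hypothesis that `u1` discriminates from $h_1$, it plays `u1`, and it switches to `u2` when the mass moves to the other alternate.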
Fig. 7: Expected stopping time (divided by $\log L$) under hypothesis $h_1$ in the first setup with queries $u_1$ and $u_2$. Note the subpar performance of OPE in this setup.
Fig. 8: Expected stopping time (divided by $\log L$) under hypothesis $h_1$ in the second setup with the additional queries $u_3$ and $u_4$. Note the subpar performance of OPE and EJS in this setup.

B. Stopping Time Formulation
In [3], [1], [12], a stopping time formulation for hypothesis testing is considered. The sampling process stops when the belief on some hypothesis exceeds a threshold or, equivalently, when the confidence $C_h(\rho) > \log L$, where $L$ is a parameter. Let this stopping time be $N$. Under this stopping criterion, we numerically study the expected stopping time for all the strategies discussed. The plots in Figures 7 and 8 depict the quantity $\mathbb{E}[N]/\log L$ as a function of the parameter $L$. The numerical results suggest that our heuristic performs better even in the stopping time formulation.

VI. CONCLUSION
In this paper, we formulated the problem of quickly verifying a given hypothesis using observations from experiments as an infinite-horizon average-reward MDP. We characterized the optimal rate of this MDP using infinite-horizon dynamic programming. A stability criterion arises out of the DP equations. We showed that any strategy that satisfies this stability criterion while selecting experiments from a critical set is asymptotically optimal. We proposed a heuristic adaptive strategy and numerically demonstrated that it performs better than open-loop policies in the non-asymptotic regime. For future work, we intend to use this stability criterion, perhaps with additional penalty terms, to design strategies with better non-asymptotic performance.
REFERENCES

[1] Sirin Nitinawarat, George K. Atia, and Venugopal V. Veeravalli, "Controlled sensing for multihypothesis testing," IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451-2464, 2013.
[2] Mohammad Naghshvar, Tara Javidi, et al., "Active sequential hypothesis testing," The Annals of Statistics, vol. 41, no. 6, pp. 2703-2738, 2013.
[3] Herman Chernoff, "Sequential design of experiments," The Annals of Mathematical Statistics, vol. 30, no. 3, pp. 755-770, 1959.
[4] Mohammad Naghshvar, Tara Javidi, and Kamalika Chaudhuri, "Bayesian active learning with non-persistent noise," IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 4080-4098, 2015.
[5] Abraham Wald, Sequential Analysis, Courier Corporation, 1973.
[6] Stuart Alan Bessler, Theory and Applications of the Sequential Design of Experiments, k-Actions and Infinitely Many Experiments, Department of Statistics, Stanford University, 1960.
[7] Mohammad Naghshvar and Tara Javidi, "Extrinsic Jensen-Shannon divergence with application in active hypothesis testing," in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, IEEE, 2012, pp. 2191-2195.
[8] David W. Hosmer Jr., Stanley Lemeshow, and Rodney X. Sturdivant, Applied Logistic Regression, vol. 398, John Wiley & Sons, 2013.
[9] Panganamala Ramana Kumar and Pravin Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, vol. 75, SIAM, 2015.
[10] Martin J. Osborne and Ariel Rubinstein, A Course in Game Theory, MIT Press, 1994.
[11] Rick Durrett, Probability: Theory and Examples, Cambridge University Press, 2010.
[12] Mohammad Naghshvar and Tara Javidi, "Sequentiality and adaptivity gains in active hypothesis testing,"