Distributed Hypothesis Testing over a Noisy Channel: Error-exponents Trade-off
Sreejith Sreekumar and Deniz Gündüz

This work is supported in part by the European Research Council (ERC) through Starting Grant BEACON.
Abstract
A two-terminal distributed binary hypothesis testing (HT) problem over a noisy channel is studied. The two terminals, called the observer and the decision maker, each has access to $n$ independent and identically distributed samples, denoted by $\mathbf{U}$ and $\mathbf{V}$, respectively. The observer communicates to the decision maker over a discrete memoryless channel (DMC), and the decision maker performs a binary hypothesis test on the joint probability distribution of $(\mathbf{U},\mathbf{V})$ based on $\mathbf{V}$ and the noisy information received from the observer. The trade-off between the exponents of the type I and type II error probabilities in HT is investigated. Two inner bounds are obtained, one using a separation-based scheme that involves type-based compression and unequal error-protection channel coding, and the other using a joint scheme that incorporates type-based hybrid coding. The separation-based scheme is shown to recover the inner bound obtained by Han and Kobayashi for the special case of a rate-limited noiseless channel, and also the one obtained by the authors previously for a corner point of the trade-off. An exact single-letter characterization of the optimal trade-off is established for the special case of testing for the marginal distribution of $\mathbf{U}$, when $\mathbf{V}$ is unavailable. Our results imply that a separation holds in this case, in the sense that the optimal trade-off is achieved by a scheme that performs independent HT and channel coding. Finally, we show via an example that the joint scheme achieves a strictly tighter bound than the separation-based scheme for some points of the error-exponent trade-off.

Index Terms
Distributed hypothesis testing, noisy channel, discrete memoryless channel, error-exponents, separate hypothesis testing and channel coding, joint source-channel coding, hybrid coding.
I. INTRODUCTION
Hypothesis testing (HT), which refers to the problem of choosing among two or more alternatives based on available data, plays a central role in statistics and information theory. Distributed HT (DHT) problems arise in situations where the test data is scattered across multiple terminals and needs to be communicated to a central terminal, called the decision maker, which performs the hypothesis test. The need to jointly optimize the communication scheme and the hypothesis test makes DHT problems much more challenging than their centralized counterparts. Indeed, while an efficient characterization of the optimal hypothesis test and its asymptotic performance is well known in the centralized setting, thanks to [1]–[5], the same problem in even the simplest distributed setting remains open except for some special cases [6]–[10].

In this work, we consider a DHT problem in which the observer communicates to the decision maker over a noisy channel. The observer and the decision maker each has access to independent and identically distributed (i.i.d.) samples, denoted by $\mathbf{U}$ and $\mathbf{V}$, respectively. Based on the information received from the observer over the noisy channel and its own observations $\mathbf{V}$, the decision maker performs a binary hypothesis test on the joint distribution of $(\mathbf{U},\mathbf{V})$. Our goal is to characterize the trade-off between the best achievable exponents of the type I and type II error probabilities. We will refer to this problem as DHT over a noisy channel, and to its special instance with the noisy channel replaced by a rate-limited noiseless channel as DHT over a noiseless channel.

A. Background
The information-theoretic study of DHT problems under communication constraints was initiated by Berger in [11]. The first rigorous analysis was performed by Ahlswede and Csiszár in [6] for DHT over a noiseless channel. Therein, the objective is to characterize the maximum asymptotic value of the exponent of the type II error probability, known as the type II error-exponent (T2EE), subject to a fixed constraint on the type I error probability. The authors establish several fundamental results, including a lower bound on the optimal T2EE, and a strong converse which shows that the optimal T2EE is independent of the type I error probability constraint. Furthermore, a single-letter characterization of the optimal T2EE is obtained for a special case of HT known as testing against independence (TAI), in which the joint distribution factors as a product of the marginal distributions under the alternate hypothesis. Improved lower bounds on the T2EE were later obtained by Han [7] and Shimokawa et al. [8], and the strong converse was extended to zero-rate settings by Shalaby and Papamarcou [12]. While all the above-mentioned works focus on the characterization of the T2EE, the trade-off between the exponents of both the type I and type II error probabilities in the same setting was first explored by Han and Kobayashi [13].

In recent decades, there has been a renewed interest in DHT problems, and several interesting extensions of [6] have been studied. These include DHT under a successive refinement model [14], multi-terminal settings [9], [15]–[18], DHT under security or privacy constraints [19]–[22], DHT with lossy compression [23], and interactive settings [24]–[26], to name a few. There has also been progress on the characterization of error-exponents for DHT over a noiseless channel. In this regard, improved converse bounds on the optimal T2EE for testing the correlation of two bivariate standard normal distributions over a noiseless channel are established in [9] and [27]. New inner bounds on the type I and type II error-exponents trade-off are established in [28] and [29], where [28] focuses on the special case of testing of a doubly-binary symmetric source, while [29] considers the general case. While the above works focus on the asymptotic performance in DHT, a Neyman-Pearson (NP) like test for zero-rate multiterminal HT is proposed in [30], which, in addition to achieving the optimal type II error-exponent, also achieves the optimal second-order term in the exponent among the class of all type-based testing schemes.

When the communication channel between the observer and the decision maker is noisy, then besides the type I and type II errors arising due to a compression requirement in rate-limited settings, additional errors may occur due to the channel noise. Since the reliability of the transmitted message depends on the communication rate employed [31], there is a trade-off between transmitting less information more reliably and transmitting more information less reliably to the decision maker. In [32], we proved a single-letter characterization of the optimal T2EE for TAI over a noisy channel. An interesting aspect of this characterization is that the optimal T2EE depends on the communication channel only through its capacity. Extensions of this problem to general HT were investigated in [10] and [33].
In [10], we obtained lower bounds on the optimal T2EE using a separate HT and channel coding scheme, and a joint scheme that uses hybrid coding [34] for communication between the observer and the decision maker. In contrast to TAI, these lower bounds depend more intricately on the channel transition kernel $P_{Y|X}$ than only through its capacity. We also showed via an example in [10] that our joint scheme strictly outperforms our separation-based scheme for some instances of HT.

B. Contributions
In this work, our objective is to study the type I and type II error-exponents trade-off for DHT over a noisy channel. Our goal can be considered as a generalization of [13] from noiseless rate-limited channels to noisy channels, and also of [10] from a type I error probability constraint to a positive type I error-exponent constraint. Our main contributions can be summarized as follows:

(i) Firstly, for the degenerate case of testing for the marginal distribution of $U$ when $V$ is unavailable at the decision maker, we establish a single-letter characterization of the optimal trade-off between the error-exponents (Theorem 3). We will refer to this setting as the remote HT (RHT) problem.

(ii) For the general case of DHT over a noisy channel, we obtain an inner bound (Theorem 4) on the error-exponents trade-off by using a separate HT and channel coding scheme (SHTCC) that combines a type-based quantize-bin strategy with the unequal error-protection scheme of [35]. This result is shown to recover the bounds established in [13] and [10]. Furthermore, we evaluate Theorem 4 for two important instances of DHT, namely TAI and its opposite, i.e., testing against dependence (TAD), in which the joint distribution under the null hypothesis factors as a product of marginal distributions.

(iii) We also obtain a second inner bound (Theorem 5) on the error-exponents trade-off by using a joint HT and channel coding scheme (JHTCC) based on hybrid coding [34]. Subsequently, we show via an example that the JHTCC scheme strictly outperforms the SHTCC scheme for some points on the error-exponent trade-off.

The DHT problem considered here has been recently investigated in [36], where an inner bound ([36, Theorem 2]) on the error-exponents trade-off is obtained using a combination of a type-based quantization scheme and the unequal error-protection scheme of [37] with two special messages. While [36, Theorem 2] is quite general, it is hard to compute, as the acceptance regions for each hypothesis and certain other parameters which characterize the bound are left as open variables for optimization. In comparison, Theorem 4 and Theorem 5 are relatively simpler to evaluate, as the acceptance regions are explicitly specified and there are fewer parameters to optimize. A detailed comparison between Theorem 5 and [36, Theorem 2] also reveals that neither of these bounds subsumes the other, for the following reason. On the one hand, the JHTCC scheme achieving the inner bound in Theorem 5 uses a stronger decoding rule compared to the scheme in [36, Theorem 2], thus improving the factor in the error-exponent due to decoding errors: in [36], the metric used at the decoder factors as the sum of two metrics, one which depends only on the source statistics and the other which depends only on the channel statistics, whereas in the JHTCC scheme the decoding metric depends jointly on the source-channel statistics. On the other hand, the scheme in [36] is more flexible, since certain parameters, such as the acceptance regions for each hypothesis, are left as open parameters to be optimized. A direct computational comparison appears prohibitive, as optimizing over all the open parameters in [36] is a formidable task.

C. Organization
The remainder of the paper is organized as follows. Section II provides the detailed problem formulation along with the required definitions. The main results are presented in Section III. The proofs are furnished in Section IV. Finally, concluding remarks are given in Section V.

II. PRELIMINARIES AND PROBLEM FORMULATION
A. Notation
We use the following notation. All logarithms are with respect to the natural base $e$. $\mathbb{N}$, $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, and $\bar{\mathbb{R}}$ denote the sets of natural, real, non-negative real, and extended real numbers, respectively. For $a, b \in \mathbb{R}_{\geq 0}$, $[a:b] := \{n \in \mathbb{N} : a \leq n \leq b\}$ and $[b] := [1:b]$. Calligraphic letters, e.g., $\mathcal{X}$, denote sets, while $\mathcal{X}^c$ and $|\mathcal{X}|$ stand for its complement and cardinality, respectively. For $n \in \mathbb{N}$, $\mathcal{X}^n$ denotes the $n$-fold Cartesian product of $\mathcal{X}$, and $x^n = (x_1, \cdots, x_n)$ denotes an element of $\mathcal{X}^n$. Whenever the dimension $n$ is clear from the context, bold-face letters denote vectors or sequences, e.g., $\mathbf{x}$ for $x^n$. For $i, j \in \mathbb{N}$ such that $i \leq j$, $x_i^j := (x_i, x_{i+1}, \cdots, x_j)$; the subscript is omitted when $i = 1$.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, where $\Omega$, $\mathcal{F}$ and $\mathbb{P}$ are the sample space, $\sigma$-algebra and probability measure, respectively. Random variables (r.v.'s) over $(\Omega, \mathcal{F}, \mathbb{P})$ and their realizations are denoted by uppercase and lowercase letters, respectively, e.g., $X$ and $x$. Similar conventions apply for random vectors and their realizations. We use $\mathbb{1}_{\mathcal{A}}$ for the indicator function of $\mathcal{A} \in \mathcal{F}$. The set of all probability mass functions (PMFs) on a finite set $\mathcal{X}$ (always endowed with the power set $\sigma$-algebra) is denoted by $\mathcal{P}(\mathcal{X})$.

The joint PMF of two discrete r.v.'s $X$ and $Y$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is denoted by $P_{XY}$; the corresponding marginals are $P_X$ and $P_Y$. The conditional PMF of $X$ given $Y$ is represented by $P_{X|Y}$. Expressions such as $P_{XY} = P_X P_{Y|X}$ are to be understood as pointwise equality, i.e., $P_{XY}(x,y) = P_X(x)P_{Y|X}(y|x)$ for all $(x,y) \in \mathcal{X}\times\mathcal{Y}$. When the joint distribution of a triple $(X,Y,Z)$ factors as $P_{XYZ} = P_{XY}P_{Z|Y}$, these variables form a Markov chain $X - Y - Z$. When $X$ and $Y$ are statistically independent, we write $X \perp\!\!\!\perp Y$. If the entries of $X^n$ are drawn in an independent and identically distributed (i.i.d.) manner, i.e., if $P_{X^n}(\mathbf{x}) = \prod_{i=1}^n P_X(x_i)$, $\forall \mathbf{x}\in\mathcal{X}^n$, then the PMF $P_{X^n}$ is denoted by $P_X^{\otimes n}$. Similarly, if $P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n P_{Y|X}(y_i|x_i)$ for all $(\mathbf{x},\mathbf{y})\in\mathcal{X}^n\times\mathcal{Y}^n$, then we write $P_{Y|X}^{\otimes n}$ for $P_{Y^n|X^n}$. The conditional product PMF given a fixed $\mathbf{x}\in\mathcal{X}^n$ is designated by $P_{Y|X}^{\otimes n}(\cdot|\mathbf{x})$.

For a discrete measurable space $(\mathcal{X},\mathcal{F})$, the probability measure induced by a PMF $P\in\mathcal{P}(\mathcal{X})$ is denoted by $\mathbb{P}_P$; namely, $\mathbb{P}_P(\mathcal{A}) = \sum_{x\in\mathcal{A}} P(x)$ for all $\mathcal{A}\in\mathcal{F}$. The corresponding expectation is designated by $\mathbb{E}_P$. Similarly, mutual information and entropy with an underlying PMF $P$ are denoted as $I_P$ and $H_P$, respectively. When the PMF is clear from the context, the subscript is omitted. The type or empirical PMF of a sequence $\mathbf{x}\in\mathcal{X}^n$ is denoted by $P_{\mathbf{x}}$, i.e., $P_{\mathbf{x}}(x) := \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{x_i = x\}$. The set of $n$-length sequences $\mathbf{x}\in\mathcal{X}^n$ of type $P_X$ is represented by $T_n(P_X,\mathcal{X}^n)$:
$$T_n(P_X,\mathcal{X}^n) := \{\mathbf{x}\in\mathcal{X}^n : P_{\mathbf{x}} = P_X\}.$$
Whenever the underlying alphabet $\mathcal{X}^n$ is clear from the context, $T_n(P_X,\mathcal{X}^n)$ is simplified to $T_n(P_X)$. The set of all possible types of $n$-length sequences $\mathbf{x}\in\mathcal{X}^n$ is denoted by $T(\mathcal{X}^n)$:
$$T(\mathcal{X}^n) := \left\{P_X\in\mathcal{P}(\mathcal{X}) : \left|T_n(P_X,\mathcal{X}^n)\right| \geq 1\right\}.$$
Similar notations are used for larger combinations, e.g., $P_{\mathbf{x}\mathbf{y}}$, $T_n(P_{XY},\mathcal{X}^n\times\mathcal{Y}^n)$ and $T(\mathcal{X}^n\times\mathcal{Y}^n)$. For a given $\mathbf{x}\in T_n(P_X,\mathcal{X}^n)$ and a conditional PMF $P_{Y|X}$,
$$T_n(P_{Y|X},\mathbf{x}) := \{\mathbf{y}\in\mathcal{Y}^n : (\mathbf{x},\mathbf{y})\in T_n(P_{XY},\mathcal{X}^n\times\mathcal{Y}^n)\}$$
stands for the $P_{Y|X}$-conditional type class of $\mathbf{x}$.

For a countable sample space $\mathcal{X}$ and PMFs $P,Q\in\mathcal{P}(\mathcal{X})$, the Kullback-Leibler (KL) divergence between $P$ and $Q$ is
$$D(P\|Q) := \sum_{x\in\mathcal{X}} P(x)\log\left(\frac{P(x)}{Q(x)}\right).$$
The conditional KL divergence between two conditional PMFs $P_{Y|X}$ and $Q_{Y|X}$ (defined on the same alphabets) given a PMF $P_X\in\mathcal{P}(\mathcal{X})$ is
$$D\left(P_{Y|X}\|Q_{Y|X}\,\middle|\,P_X\right) := \sum_{x\in\mathcal{X}} P_X(x)\,D\left(P_{Y|X}(\cdot|x)\|Q_{Y|X}(\cdot|x)\right).$$
For $(\mathbf{x},\mathbf{y})\in\mathcal{X}^n\times\mathcal{Y}^n$, the empirical conditional entropy of $\mathbf{y}$ given $\mathbf{x}$ is $H_e(\mathbf{y}|\mathbf{x}) := H_P(\tilde{Y}|\tilde{X})$, where $P_{\tilde{X}\tilde{Y}} = P_{\mathbf{x}\mathbf{y}}$. The set of $r$-divergent $n$-length sequences from $P_X$ is denoted by $J_n(r,P_X)$, i.e.,
$$J_n(r,P_X) := \{\mathbf{x}\in\mathcal{X}^n : D(P_{\mathbf{x}}\|P_X) \leq r\}. \quad (1)$$
For a real sequence $\{a_n\}_{n\in\mathbb{N}}$, $a_n \xrightarrow{(n)} b$ stands for $\lim_{n\to\infty} a_n = b$, while $a_n \gtrsim b$ denotes $\lim_{n\to\infty} a_n \geq b$. Similar notations apply for other inequalities. Finally, $O(\cdot)$, $\Omega(\cdot)$ and $o(\cdot)$ represent the standard asymptotic notations of Big-O, Big-Omega and Little-o, respectively.
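As a quick numerical illustration of the type and divergence notation above (an editorial sketch, not part of the original development; the helper names are our own), the following Python snippet computes the empirical PMF $P_{\mathbf{x}}$ of a sequence and its KL divergence from a reference PMF, i.e., the quantity defining the $r$-divergent set $J_n(r,P_X)$ in (1).

```python
import numpy as np

def empirical_type(x, alphabet_size):
    """Empirical PMF P_x of a sequence x over {0, ..., alphabet_size - 1}."""
    counts = np.bincount(x, minlength=alphabet_size)
    return counts / len(x)

def kl_divergence(p, q):
    """D(P || Q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
p_x = np.array([0.6, 0.4])            # reference PMF P_X (hypothetical)
x = rng.choice(2, size=1000, p=p_x)   # i.i.d. samples from P_X
p_emp = empirical_type(x, 2)
# For large n, D(P_x || P_X) concentrates near 0, so x typically lies in
# the r-divergent set J_n(r, P_X) for any fixed r > 0.
print(p_emp, kl_divergence(p_emp, p_x))
```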
B. Problem formulation

Let $\mathcal{U}$, $\mathcal{V}$, $\mathcal{X}$ and $\mathcal{Y}$ be finite sets, and let $n\in\mathbb{N}$. The DHT over a noisy channel setting is depicted in Fig. 1. Herein, the observer and the decision maker observe $n$ i.i.d. samples, denoted by $\mathbf{u}$ and $\mathbf{v}$, respectively.

Fig. 1: DHT over a noisy channel. The observer observes an $n$-length i.i.d. sequence $\mathbf{U}$ and transmits $\mathbf{X}$ over the DMC $P_{Y|X}^{\otimes n}$. Based on the channel output $\mathbf{Y}$ and the $n$-length i.i.d. sequence $\mathbf{V}$, the decision maker performs a binary HT to determine whether $(\mathbf{U},\mathbf{V})\sim P_{UV}^{\otimes n}$ or $(\mathbf{U},\mathbf{V})\sim Q_{UV}^{\otimes n}$.

Based on its observations $\mathbf{u}$, the observer outputs a sequence $\mathbf{x}\in\mathcal{X}^n$ as the channel input sequence. The discrete memoryless channel (DMC) with transition kernel $P_{Y|X}$ produces a sequence $\mathbf{y}\in\mathcal{Y}^n$ according to the probability law $P_{Y|X}^{\otimes n}(\cdot|\mathbf{x})$ as its output. Based on its observations $\mathbf{y}$ and $\mathbf{v}$, the decision maker performs a binary HT on the joint probability distribution of $(\mathbf{U},\mathbf{V})$, with the null ($H_0$) and alternate ($H_1$) hypotheses given by
$$H_0 : (\mathbf{U},\mathbf{V}) \sim P_{UV}^{\otimes n}, \quad (2a)$$
$$H_1 : (\mathbf{U},\mathbf{V}) \sim Q_{UV}^{\otimes n}. \quad (2b)$$
The decision maker outputs $\hat{h}\in\hat{\mathcal{H}} := \{0,1\}$ as the decision of the hypothesis test, where $0$ and $1$ denote $H_0$ and $H_1$, respectively.

In our problem formulation, we assume that the ratio of the number of channel uses to the number of data samples, termed the bandwidth ratio, is 1. However, the results easily generalize to arbitrary bandwidth ratios.

Definition 1 (Code). A length-$n$ DHT code $c_n$ is a pair of functions $(f_n, g_n)$, where
1) $f_n : \mathcal{U}^n \to \mathcal{P}(\mathcal{X}^n)$ denotes the encoding function, and
2) $g_n : \mathcal{V}^n\times\mathcal{Y}^n \to \hat{\mathcal{H}}$ denotes a deterministic decision function specified by an acceptance region (for the null hypothesis $H_0$) $\mathcal{A}_n \subseteq \mathcal{V}^n\times\mathcal{Y}^n$ as
$$g_n(\mathbf{v},\mathbf{y}) = 1 - \mathbb{1}\{(\mathbf{v},\mathbf{y})\in\mathcal{A}_n\}, \quad \forall (\mathbf{v},\mathbf{y})\in\mathcal{V}^n\times\mathcal{Y}^n. \quad (3)$$

There is no loss of generality in restricting our attention to a deterministic decision function for the objective of characterizing the error-exponents trade-off in HT (for example, see [21, Lemma 3]).

A code $c_n = (f_n,g_n)$ induces the joint PMFs $P^{(c_n)}_{\mathbf{UVXY}\hat{H}}, Q^{(c_n)}_{\mathbf{UVXY}\hat{H}} \in \mathcal{P}(\mathcal{U}^n\times\mathcal{V}^n\times\mathcal{X}^n\times\mathcal{Y}^n\times\hat{\mathcal{H}})$ under the null and alternate hypotheses, respectively, where
$$P^{(c_n)}_{\mathbf{UVXY}\hat{H}}(\mathbf{u},\mathbf{v},\mathbf{x},\mathbf{y},\hat{h}) := P_{UV}^{\otimes n}(\mathbf{u},\mathbf{v})\,f_n(\mathbf{x}|\mathbf{u})\,P_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\,\mathbb{1}\{g_n(\mathbf{v},\mathbf{y})=\hat{h}\}, \quad (4)$$
and
$$Q^{(c_n)}_{\mathbf{UVXY}\hat{H}}(\mathbf{u},\mathbf{v},\mathbf{x},\mathbf{y},\hat{h}) := Q_{UV}^{\otimes n}(\mathbf{u},\mathbf{v})\,f_n(\mathbf{x}|\mathbf{u})\,P_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\,\mathbb{1}\{g_n(\mathbf{v},\mathbf{y})=\hat{h}\}, \quad (5)$$
respectively.

Definition 2 (Type I and Type II error probabilities). For a given code $c_n$, the type I and type II error probabilities are given by
$$\alpha_n(c_n) := \mathbb{P}_{P^{(c_n)}}(\hat{H}=1), \quad \text{and} \quad \beta_n(c_n) := \mathbb{P}_{Q^{(c_n)}}(\hat{H}=0),$$
respectively.
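To make Definitions 1 and 2 concrete, the following Monte Carlo sketch estimates $\alpha_n$ and $\beta_n$ for a toy code over a BSC. This is purely an illustrative addition: the source distributions, the uncoded encoder $f_n(\mathbf{u}) = \mathbf{u}$, and the threshold test are hypothetical choices of ours, not the schemes analyzed in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, p = 200, 2000, 0.1          # block length, MC trials, BSC crossover

def sample_uv(null, size):
    """(U, V) correlated under H0 (P(V != U) = 0.2), independent under H1."""
    u = rng.integers(0, 2, size)
    flip = rng.random(size) < (0.2 if null else 0.5)
    return u, u ^ flip

def decide(v, y, thresh=0.65):
    """Toy decision g_n: accept H0 when y and v agree often enough."""
    return 0 if np.mean(v == y) >= thresh else 1

errs0 = errs1 = 0
for _ in range(trials):
    for null in (True, False):
        u, v = sample_uv(null, n)
        y = u ^ (rng.random(n) < p)     # uncoded transmission over BSC(p)
        h = decide(v, y)
        if null and h == 1: errs0 += 1
        if not null and h == 0: errs1 += 1
print("alpha_n ~", errs0 / trials, " beta_n ~", errs1 / trials)
```

Both estimated error probabilities decay exponentially in $n$ for this toy test, which is exactly the behaviour the exponent pair $(\kappa_\alpha,\kappa_\beta)$ defined next quantifies.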
The following definition formally states the error-exponents trade-off we aim to characterize.

Definition 3 (Achievability). An error-exponent pair $(\kappa_\alpha,\kappa_\beta)\in\mathbb{R}_{\geq 0}^2$ is said to be achievable if there exist a sequence of codes $\{c_n\}_{n\in\mathbb{N}}$ and $n_0\in\mathbb{N}$ such that
$$\alpha_n(c_n) \leq e^{-n\kappa_\alpha}, \ \forall n \geq n_0, \quad (6a)$$
$$\liminf_{n\to\infty} -\frac{1}{n}\log\beta_n(c_n) \geq \kappa_\beta. \quad (6b)$$

Definition 4 (Error-exponent region and its positive boundary). The error-exponent region $\bar{\mathcal{R}}$ is the closure of the set of all achievable error-exponent pairs $(\kappa_\alpha,\kappa_\beta)$. Let $\kappa(\kappa_\alpha) := \sup\{\kappa_\beta : (\kappa_\alpha,\kappa_\beta)\in\bar{\mathcal{R}}\}$. The positive boundary $\mathcal{R}$ of the error-exponent region $\bar{\mathcal{R}}$ is
$$\mathcal{R} := \left\{\left(\kappa_\alpha, \kappa(\kappa_\alpha)\right) : \kappa_\alpha\in(0,\kappa^*_\alpha)\right\}, \quad \text{where } \kappa^*_\alpha = \inf\{\kappa_\alpha : \kappa(\kappa_\alpha) = 0\}.$$

We will assume the following:

Assumption 1. $P_{Y|X}(\cdot|\tilde{x}) \ll P_{Y|X}(\cdot|x')$, $\forall (\tilde{x},x')\in\mathcal{X}\times\mathcal{X}$, where for PMFs $P$ and $Q$ defined on the same support, $P \ll Q$ denotes that $P$ is absolutely continuous with respect to $Q$. This technical condition ensures that, for the functions $f$ and distributions $P$ that we consider below, $\psi_{P,f}(\lambda) < \infty$ for all $\lambda\in\mathbb{R}$.

C. Some basic concepts and supporting results

In order to state our results, we require some concepts related to the log moment generating function (log-MGF) of a r.v., which we briefly review next. For a given function $f : \mathcal{Z}\to\mathbb{R}$ and a r.v. $Z\sim P_Z$, the log-MGF of $Z$ with respect to (w.r.t.) $f$, denoted by $\psi_{P_Z,f}(\lambda)$, is
$$\psi_{P_Z,f}(\lambda) := \log\left(\mathbb{E}_{P_Z}\left[e^{\lambda f(Z)}\right]\right),$$
whenever the expectation exists. Let
$$\psi^*_{P_Z,f}(\theta) := \sup_{\lambda\in\mathbb{R}} \ \theta\lambda - \psi_{P_Z,f}(\lambda). \quad (7)$$
The following properties of log-MGFs will be used in the proofs of our results.

Lemma 1. [38, Theorem 13.2, Theorem 13.3]
(i) $\psi_{P_Z,f}(0) = 0$ and $\psi'_{P_Z,f}(0) = \mathbb{E}_{P_Z}[f(Z)]$, where $\psi'_{P_Z,f}(\lambda)$ denotes the derivative of $\psi_{P_Z,f}(\lambda)$ w.r.t. $\lambda$.
(ii) $\psi_{P_Z,f}(\lambda)$ is a strictly convex function of $\lambda$.
(iii) $\psi^*_{P_Z,f}(\theta)$ is strictly convex and strictly positive in $\theta$, except that $\psi^*_{P_Z,f}(\mathbb{E}_{P_Z}[f(Z)]) = 0$.

Next, consider the scenario in which the channel $P_{Y|X}$ is a deterministic injective map (i.e., noiseless) and its capacity $C(P_{Y|X})$ is greater than $\log|\mathcal{U}|$. Since $\mathbf{U}$ can be communicated error-free to the decision maker in this case, the pair $(\mathbf{U},\mathbf{V})$ may be identified as $\mathbf{Z}$ that is available at the decision maker, and consequently the hypothesis test in (2) becomes
$$H_0 : \mathbf{Z} \sim P_Z^{\otimes n}, \quad (8a)$$
$$H_1 : \mathbf{Z} \sim Q_Z^{\otimes n}. \quad (8b)$$
We refer to this setting as direct HT, and denote $\mathcal{R}$ by $\mathcal{R}_D$.

For PMFs $P_Z, Q_Z\in\mathcal{P}(\mathcal{Z})$, let $\Pi_{P_Z,Q_Z} : \mathcal{Z}\to\bar{\mathbb{R}}$ denote the log-likelihood ratio defined as
$$\Pi_{P_Z,Q_Z}(z) := \log\left(\frac{Q_Z(z)}{P_Z(z)}\right), \ \forall z\in\mathcal{Z}. \quad (9)$$
Define its $n$-fold extension $\Pi^{(n)}_{P_Z,Q_Z} : \mathcal{Z}^n\to\bar{\mathbb{R}}$ as
$$\Pi^{(n)}_{P_Z,Q_Z}(\mathbf{z}) := \sum_{i=1}^n \Pi_{P_Z,Q_Z}(z_i), \ \forall \mathbf{z}\in\mathcal{Z}^n.$$
Also, for $\theta\in\mathbb{R}$, let $\chi^{(n)}_{P_Z,Q_Z,\theta} : \mathcal{Z}^n\to\{0,1\}$ denote the NP test [1]
$$\chi^{(n)}_{P_Z,Q_Z,\theta}(\mathbf{z}) := \mathbb{1}\left\{\Pi^{(n)}_{P_Z,Q_Z}(\mathbf{z}) \geq n\theta\right\}. \quad (10)$$
The following theorem provides a single-letter characterization of $\mathcal{R}_D$ in terms of $\psi^*_{P_Z,\Pi_{P_Z,Q_Z}}$.

Theorem 1. [38, Theorem 15.1] Assume $P_Z \ll Q_Z$ and $Q_Z \ll P_Z$. Then,
$$\mathcal{R}_D = \left\{\left(\psi^*_{P_Z,\Pi_{P_Z,Q_Z}}(\theta), \ \psi^*_{P_Z,\Pi_{P_Z,Q_Z}}(\theta) - \theta\right) : \theta\in\mathcal{I}(P_Z,Q_Z)\right\}, \quad (11)$$
where the interval $\mathcal{I}(P_Z,Q_Z)$ is defined as
$$\mathcal{I}(P_Z,Q_Z) := \left(-D(P_Z\|Q_Z), \ D(Q_Z\|P_Z)\right). \quad (12)$$
Moreover, the exponent pair $\left(\psi^*_{P_Z,\Pi_{P_Z,Q_Z}}(\theta), \psi^*_{P_Z,\Pi_{P_Z,Q_Z}}(\theta)-\theta\right)$ is achieved by a decision function employing the NP test, i.e., $g_n = \chi^{(n)}_{P_Z,Q_Z,\theta}$.
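Theorem 1 reduces the computation of $\mathcal{R}_D$ to a one-dimensional family of convex-conjugate evaluations. The sketch below (illustrative only; the binary pair $(P_Z, Q_Z)$ and the helper names are our own assumptions) computes $\psi^*_{P_Z,\Pi_{P_Z,Q_Z}}(\theta)$ by a bounded scalar search and sweeps $\theta$ over $\mathcal{I}(P_Z,Q_Z)$ to trace boundary points of $\mathcal{R}_D$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

P = np.array([0.7, 0.3])   # P_Z (hypothetical binary example)
Q = np.array([0.4, 0.6])   # Q_Z
LLR = np.log(Q / P)        # Pi_{P_Z,Q_Z}(z), eq. (9)

def psi(lam):
    """Log-MGF of the LLR under P_Z: log E_P[exp(lam * Pi(Z))]."""
    return np.log(np.sum(P * np.exp(lam * LLR)))

def psi_star(theta):
    """Convex conjugate psi*(theta) = sup_lam theta*lam - psi(lam), eq. (7)."""
    res = minimize_scalar(lambda lam: psi(lam) - theta * lam,
                          bounds=(-50, 50), method="bounded")
    return -res.fun

D_PQ = np.sum(P * np.log(P / Q))   # D(P||Q)
D_QP = np.sum(Q * np.log(Q / P))   # D(Q||P)
# Theorem 1: sweep theta over (-D(P||Q), D(Q||P)) to trace R_D.
for theta in np.linspace(-D_PQ + 1e-3, D_QP - 1e-3, 5):
    ka = psi_star(theta)           # type I exponent
    kb = ka - theta                # type II exponent
    print(f"theta={theta:+.3f}  kappa_alpha={ka:.4f}  kappa_beta={kb:.4f}")
```

At the endpoints, the sweep recovers the familiar Stein exponents: $\kappa_\beta \to D(P_Z\|Q_Z)$ as $\theta \to -D(P_Z\|Q_Z)$, and $\kappa_\alpha \to D(Q_Z\|P_Z)$ as $\theta \to D(Q_Z\|P_Z)$.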
To prove our results, a converse bound that follows from [38, Theorem 12.5] will be useful, which we state next. Let $\alpha'_n(g_n)$ and $\beta'_n(g_n)$ denote the type I and type II error probabilities achieved by a decision function $g_n : \mathcal{Z}^n\to\{0,1\}$ for the direct HT scenario in (8). The following theorem provides a lower bound on a weighted sum of the type I and type II error probabilities.

Theorem 2. [38, Theorem 12.5] For the hypothesis test in (8) and any decision function $g_n$,
$$\alpha'_n(g_n) + \gamma\,\beta'_n(g_n) \geq \mathbb{P}_{P_Z^{\otimes n}}\left(\log\left(\frac{P_Z^{\otimes n}(\mathbf{Z})}{Q_Z^{\otimes n}(\mathbf{Z})}\right) \leq \log\gamma\right), \ \forall \gamma > 0.$$

Finally, consider a generalization of the direct HT scenario in which the samples observed at the decision maker are generated according to a product of non-identical distributions, i.e., the samples are independent but not necessarily identically distributed. Let $P_{\tilde{X}X'}\in\mathcal{P}(\mathcal{X}\times\mathcal{X})$ be an arbitrary joint PMF, and let $\{(\tilde{\mathbf{x}},\mathbf{x}')\in\mathcal{X}^n\times\mathcal{X}^n\}_{n\in\mathbb{N}}$ denote a sequence of pairs of $n$-length sequences such that
$$P_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x') \xrightarrow{(n)} P_{\tilde{X}X'}(\tilde{x},x'), \ \forall (\tilde{x},x')\in\mathcal{X}\times\mathcal{X}. \quad (13)$$
Consider the following HT:
$$H_0 : \mathbf{Y} \sim P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}}), \quad (14a)$$
$$H_1 : \mathbf{Y} \sim P_{Y|X}^{\otimes n}(\cdot|\mathbf{x}'). \quad (14b)$$
Here, the decision function $\bar{g}_n : \mathcal{Y}^n\to\{0,1\}$ is specified by an acceptance region $\bar{\mathcal{A}}_n$ for $H_0$ as $\bar{g}_n(\mathbf{y}) = 1 - \mathbb{1}\{\mathbf{y}\in\bar{\mathcal{A}}_n\}$, and the corresponding type I and type II error probabilities are
$$\bar{\alpha}_n(\bar{g}_n) := 1 - \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}(\bar{\mathcal{A}}_n), \quad \text{and} \quad \bar{\beta}_n(\bar{g}_n) := \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\mathbf{x}')}(\bar{\mathcal{A}}_n),$$
respectively. The definition of achievability of an error-exponent pair $(\kappa_\alpha,\kappa_\beta)\in\mathbb{R}_{\geq 0}^2$ is similar to that in Definition 3, with $\alpha_n$ and $\beta_n$ replaced by $\bar{\alpha}_n$ and $\bar{\beta}_n$, respectively. Let the error-exponent region and its positive boundary be given by
$$\bar{\kappa}(\kappa_\alpha, P_{\tilde{X}X'}) := \sup\{\kappa_\beta : (\kappa_\alpha,\kappa_\beta) \text{ is achievable for the HT in (14)}\},$$
and
$$\mathcal{R}_N(P_{\tilde{X}X'}) := \left\{\left(\kappa_\alpha, \bar{\kappa}(\kappa_\alpha,P_{\tilde{X}X'})\right) : \kappa_\alpha\in(0,\bar{\kappa}^*_\alpha)\right\},$$
respectively, where $\bar{\kappa}^*_\alpha = \inf\{\kappa_\alpha : \bar{\kappa}(\kappa_\alpha,P_{\tilde{X}X'}) = 0\}$. The following proposition provides a single-letter characterization of $\mathcal{R}_N(P_{\tilde{X}X'})$. As will become evident later, the error-exponent region for the HT in (14) depends on $(\tilde{\mathbf{x}},\mathbf{x}')$ only through its limiting joint type $P_{\tilde{X}X'}$.

Proposition 1.
$$\mathcal{R}_N(P_{\tilde{X}X'}) = \left\{\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X',P_{Y|X}}}(\theta)\right], \ \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X',P_{Y|X}}}(\theta)\right] - \theta\right) : \theta\in\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})\right\},$$
where, for each $(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}$, $\bar{\Pi}_{\tilde{x},x',P_{Y|X}} : \mathcal{Y}\to\bar{\mathbb{R}}$ is given by
$$\bar{\Pi}_{\tilde{x},x',P_{Y|X}}(y) := \log\left(\frac{P_{Y|X}(y|x')}{P_{Y|X}(y|\tilde{x})}\right), \quad (15)$$
and
$$\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X}) := \left(-d_{\min}(P_{\tilde{X}X'},P_{Y|X}), \ d_{\max}(P_{\tilde{X}X'},P_{Y|X})\right), \quad (16)$$
$$d_{\min}(P_{\tilde{X}X'},P_{Y|X}) := \mathbb{E}_{P_{\tilde{X}X'}}\left[D\left(P_{Y|X}(\cdot|\tilde{X}) \,\|\, P_{Y|X}(\cdot|X')\right)\right], \quad (17)$$
$$d_{\max}(P_{\tilde{X}X'},P_{Y|X}) := \mathbb{E}_{P_{\tilde{X}X'}}\left[D\left(P_{Y|X}(\cdot|X') \,\|\, P_{Y|X}(\cdot|\tilde{X})\right)\right]. \quad (18)$$
A decision rule that achieves the exponent pair
$$\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X',P_{Y|X}}}(\theta)\right], \ \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X',P_{Y|X}}}(\theta)\right] - \theta\right) \quad (19)$$
is the NP test given by
$$\bar{g}_n(\mathbf{y}) = \bar{\chi}^{(n)}_{P_{Y|X},\tilde{\mathbf{x}},\mathbf{x}',\theta}(\mathbf{y}) := \mathbb{1}\left\{\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}',P_{Y|X}}(\mathbf{y}) \geq n\theta\right\}, \quad (20)$$
where
$$\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}',P_{Y|X}}(\mathbf{y}) := \sum_{i=1}^n \bar{\Pi}_{\tilde{x}_i,x'_i,P_{Y|X}}(y_i). \quad (21)$$

The proof of Proposition 1 is given in Section IV-A. The achievability of the error-exponent pair in (19) is shown by analyzing the type I and type II error probabilities achieved by $\bar{g}_n$, while the converse proof uses Theorem 2 as an ingredient. Proposition 1 will be used below to obtain a single-letter characterization of $\mathcal{R}$ for the RHT problem (Theorem 3), and also to establish an inner bound on $\mathcal{R}$ for the DHT problem (Theorem 4).
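Proposition 1 is straightforward to evaluate numerically once the limiting joint type is fixed. The following sketch (our own illustration, with a hypothetical two-input channel $W$ and a limiting type $P_{\tilde{X}X'}$ supported off the diagonal) averages the per-letter conjugates over $P_{\tilde{X}X'}$ and sweeps $\theta$ over $\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical binary-input channel P_{Y|X} (rows: inputs 0 and 1) and a
# limiting joint type placing all mass on pairs with x~ != x'.
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])
P_joint = np.array([[0.0, 0.6],
                    [0.4, 0.0]])

def psi_star_pair(xt, xp, theta):
    """psi*_{P_{Y|X}(.|xt), Pi_bar_{xt,xp}}(theta) for one letter pair."""
    llr = np.log(W[xp] / W[xt])               # Pi_bar of eq. (15)
    psi = lambda lam: np.log(np.sum(W[xt] * np.exp(lam * llr)))
    res = minimize_scalar(lambda lam: psi(lam) - theta * lam,
                          bounds=(-50, 50), method="bounded")
    return -res.fun

def exponent_pair(theta):
    """Averaged exponent pair of Proposition 1 under P_{X~ X'}."""
    ka = sum(P_joint[a, b] * psi_star_pair(a, b, theta)
             for a in range(2) for b in range(2) if P_joint[a, b] > 0)
    return ka, ka - theta

d_min = sum(P_joint[a, b] * np.sum(W[a] * np.log(W[a] / W[b]))
            for a in range(2) for b in range(2) if P_joint[a, b] > 0)
d_max = sum(P_joint[a, b] * np.sum(W[b] * np.log(W[b] / W[a]))
            for a in range(2) for b in range(2) if P_joint[a, b] > 0)
for theta in np.linspace(-d_min + 1e-3, d_max - 1e-3, 3):
    print(f"theta={theta:+.3f}  exponents={exponent_pair(theta)}")
```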
III. MAIN RESULTS

In the remainder of the paper, our goal is to provide a computable characterization of $\mathcal{R}$. We will first obtain a single-letter characterization of $\mathcal{R}$ for the RHT problem. This characterization will also turn out to be useful for obtaining an inner bound on $\mathcal{R}$ for the DHT problem.

A. RHT problem
Recall that for the RHT problem, $\mathbf{V}$ is not available at the decision maker, and the hypothesis test in (2) specializes to the following test:
$$H_0 : \mathbf{U} \sim P_U^{\otimes n}, \quad (22a)$$
$$H_1 : \mathbf{U} \sim Q_U^{\otimes n}. \quad (22b)$$
The next theorem provides a single-letter characterization of $\kappa(\kappa_\alpha)$, and thereby of $\mathcal{R}$.

Theorem 3.
Assume that $P_U \ll Q_U$ and $Q_U \ll P_U$. Then, $\kappa(\kappa_\alpha) = \sup\{\kappa_\beta : (\kappa_\alpha,\kappa_\beta)\in\mathcal{R}^*\}$, where
$$\mathcal{R}^* := \bigcup_{P_{\tilde{X}X'}\in\mathcal{P}(\mathcal{X}\times\mathcal{X})} \ \bigcup_{(\theta_0,\theta_1)\in\mathcal{I}(P_U,Q_U)\times\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})} \left(\zeta_0(\theta_0,\theta_1,P_{\tilde{X}X'}), \ \zeta_1(\theta_0,\theta_1,P_{\tilde{X}X'})\right), \quad (23)$$
$$\zeta_0(\theta_0,\theta_1,P_{\tilde{X}X'}) := \min\left\{\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0), \ \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X',P_{Y|X}}}(\theta_1)\right]\right\},$$
$$\zeta_1(\theta_0,\theta_1,P_{\tilde{X}X'}) := \min\left\{\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0) - \theta_0, \ \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X',P_{Y|X}}}(\theta_1)\right] - \theta_1\right\},$$
and $\Pi_{P_U,Q_U}$, $\mathcal{I}(P_U,Q_U)$, $\bar{\Pi}_{\tilde{x},x',P_{Y|X}}$, $(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}$, and $\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})$ are given in (9), (12), (15) and (16), respectively.

The proof of Theorem 3 is provided in Section IV-B. The error-exponent pair $(\zeta_0(\cdot),\zeta_1(\cdot))$ is achieved by an encoder that first performs a local NP test based on $\mathbf{U}$, and communicates its decision to the decision maker using a channel code with two messages. Thus, separation holds in the sense of separate source and channel coding. The converse part of the proof uses the converses of Theorem 1 and Proposition 1 to arrive at the desired conclusion.

Let $\kappa(P_U,Q_U,P_{Y|X}) := \min\left\{D(P_U\|Q_U), \ E_c(P_{Y|X})\right\}$, where
$$E_c(P_{Y|X}) := D\left(P_{Y|X}(\cdot|a) \,\|\, P_{Y|X}(\cdot|b)\right), \quad (24)$$
$$(a,b) := \arg\max_{(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}} D\left(P_{Y|X}(\cdot|\tilde{x}) \,\|\, P_{Y|X}(\cdot|x')\right). \quad (25)$$
The optimal T2EE under a fixed constraint on the type I error probability for the RHT problem, established in [39, Theorem 2], can be recovered by taking the limits $\theta_0 \to -D(P_U\|Q_U)$ and $\theta_1 \to -d_{\min}(P_{\tilde{X}X'},P_{Y|X})$, and maximizing w.r.t. $P_{\tilde{X}X'}\in\mathcal{P}(\mathcal{X}\times\mathcal{X})$ in Theorem 3. This can be seen by noting that
$$\zeta_0\left(-D(P_U\|Q_U), -d_{\min}(P_{\tilde{X}X'},P_{Y|X}), P_{\tilde{X}X'}\right) = 0,$$
$$\zeta_1\left(-D(P_U\|Q_U), -d_{\min}(P_{\tilde{X}X'},P_{Y|X}), P_{\tilde{X}X'}\right) = \min\left\{D(P_U\|Q_U), \ d_{\min}(P_{\tilde{X}X'},P_{Y|X})\right\},$$
and
$$\max_{P_{\tilde{X}X'}\in\mathcal{P}(\mathcal{X}\times\mathcal{X})} d_{\min}(P_{\tilde{X}X'},P_{Y|X}) = E_c(P_{Y|X}).$$
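Since Theorem 3 is a minimum of a source term and a channel term of the same conjugate form, it can be evaluated by reusing a single generic $\psi^*$ routine. The sketch below is illustrative only: $P_U$, $Q_U$, the channel $W$, and the point-mass choice of $P_{\tilde{X}X'}$ on the input pair $(0,1)$ are all hypothetical choices of ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def psi_star(base, llr, theta):
    """Generic conjugate of the log-MGF of llr(Z) under the PMF `base`."""
    psi = lambda lam: np.log(np.sum(base * np.exp(lam * llr)))
    res = minimize_scalar(lambda lam: psi(lam) - theta * lam,
                          bounds=(-50, 50), method="bounded")
    return -res.fun

P_U, Q_U = np.array([0.8, 0.2]), np.array([0.5, 0.5])   # hypothetical source
W = np.array([[0.9, 0.1], [0.2, 0.8]])                  # hypothetical channel
src_llr = np.log(Q_U / P_U)                             # Pi_{P_U,Q_U}
ch_llr = np.log(W[1] / W[0])                            # x~ = 0, x' = 1

def zeta(theta0, theta1):
    """(zeta_0, zeta_1) for P_{X~ X'} = point mass on the pair (0, 1)."""
    a0 = psi_star(P_U, src_llr, theta0)   # local NP-test exponent (source)
    a1 = psi_star(W[0], ch_llr, theta1)   # two-message channel exponent
    return min(a0, a1), min(a0 - theta0, a1 - theta1)

# Thresholds chosen inside I(P_U,Q_U) and I_bar(P_{X~ X'}, P_{Y|X}).
for t0, t1 in [(-0.15, -0.15), (0.0, 0.0), (0.15, 0.15)]:
    print((t0, t1), zeta(t0, t1))
```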
B. DHT problem

We next obtain two inner bounds on $\mathcal{R}$ for the DHT problem depicted in Fig. 1. The first bound is obtained using a separation-based scheme that performs independent HT and channel coding, termed the SHTCC scheme, while the second bound, referred to as the JHTCC scheme, is a joint HT and channel coding scheme that uses hybrid coding for communication between the observer and the decision maker.

Inner bound using the SHTCC scheme:
Let $\mathcal{S} = \mathcal{X}$, and let $P_{SXY} = P_{SX}P_{Y|X}\in\mathcal{P}(\mathcal{S}\times\mathcal{X}\times\mathcal{Y})$ be a PMF under which $S - X - Y$ forms a Markov chain. For $x\in\mathcal{X}$, let
$$\Upsilon_{x,P_{SXY}}(y) := \log\left(\frac{P_{Y|X=x}(y)}{P_{Y|S=x}(y)}\right), \quad (26)$$
and define
$$E_{sp}(P_{SX},\theta) := \sum_{s\in\mathcal{S}} P_S(s)\,\psi^*_{P_{Y|S=s},\Upsilon_{s,P_{SXY}}}(\theta). \quad (27)$$
For a fixed $P_{SX}$ and $R\geq 0$, let $E_x(R,P_{SX})$ denote the expurgated exponent [40], [31], given by
$$E_x(R,P_{SX}) := \max_{\rho\geq 1} \ -\rho R - \rho\log\sum_{s,x,\tilde{x}} P_S(s)P_{X|S}(x|s)P_{X|S}(\tilde{x}|s)\left(\sum_y \sqrt{P_{Y|X}(y|x)P_{Y|X}(y|\tilde{x})}\right)^{1/\rho}. \quad (28)$$
Let $\mathcal{W}$ be an arbitrary finite set, i.e., $|\mathcal{W}| < \infty$, and let $\mathcal{F}$ denote the set of all continuous mappings from $\mathcal{P}(\mathcal{U})$ to $\mathcal{P}'(\mathcal{W}|\mathcal{U})$, where $\mathcal{P}'(\mathcal{W}|\mathcal{U})$ is the set of all conditional distributions $P_{W|U}$. Let
$$\theta_L(P_{SX}) := \sum_{s\in\mathcal{S}} P_S(s)\,D\left(P_{Y|S=s}\,\|\,P_{Y|X=s}\right), \quad (29)$$
$$\theta_U(P_{SX}) := \sum_{s\in\mathcal{S}} P_S(s)\,D\left(P_{Y|X=s}\,\|\,P_{Y|S=s}\right), \quad (30)$$
$$\Theta(P_{SX}) := \left(-\theta_L(P_{SX}), \ \theta_U(P_{SX})\right), \quad (31)$$
$$\mathcal{L}(\kappa_\alpha) := \Big\{(\omega,R,P_{SX},\theta)\in\mathcal{F}\times\mathbb{R}_{\geq 0}\times\mathcal{P}(\mathcal{S}\times\mathcal{X})\times\Theta(P_{SX}) : \ \zeta(\kappa_\alpha,\omega)-\rho(\kappa_\alpha,\omega) \leq R < I_P(X;Y|S), \ P_{SXY} := P_{SX}P_{Y|X}, \ \min\{E_{sp}(P_{SX},\theta), E_x(R,P_{SX}), E_b(\kappa_\alpha,\omega,R)\} \geq \kappa_\alpha\Big\},$$
$$\hat{\mathcal{L}}(\kappa_\alpha,\omega) := \Big\{P_{\hat{U}\hat{V}\hat{W}} : D\big(P_{\hat{U}\hat{V}\hat{W}}\,\|\,P_{UV\hat{W}}\big) \leq \kappa_\alpha, \ P_{\hat{W}|\hat{U}} = \omega(P_{\hat{U}}), \ P_{UV\hat{W}} = P_{UV}P_{\hat{W}|\hat{U}}\Big\}, \quad (32)$$
$$E_b(\kappa_\alpha,\omega,R) := \begin{cases} R - \zeta(\kappa_\alpha,\omega) + \rho(\kappa_\alpha,\omega), & \text{if } 0 \leq R < \zeta(\kappa_\alpha,\omega), \\ \infty, & \text{otherwise}, \end{cases}$$
$$\zeta(\kappa_\alpha,\omega) := \max_{P_{\hat{U}\hat{W}} : \exists P_{\hat{V}},\, P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)} I_P(\hat{U};\hat{W}), \quad (33)$$
$$\rho(\kappa_\alpha,\omega) := \min_{P_{\hat{V}\hat{W}} : \exists P_{\hat{U}},\, P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)} I_P(\hat{V};\hat{W}), \quad (34)$$
$$E_1(\kappa_\alpha,\omega) := \min_{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}})\in\mathcal{T}_1(\kappa_\alpha,\omega)} D(P_{\tilde{U}\tilde{V}\tilde{W}}\|Q_{\tilde{U}\tilde{V}\tilde{W}}),$$
$$E_2(\kappa_\alpha,\omega,R) := \begin{cases} \displaystyle\min_{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}})\in\mathcal{T}_2(\kappa_\alpha,\omega)} D(P_{\tilde{U}\tilde{V}\tilde{W}}\|Q_{\tilde{U}\tilde{V}\tilde{W}}) + E_b(\kappa_\alpha,\omega,R), & \text{if } R < \zeta(\kappa_\alpha,\omega), \\ \infty, & \text{otherwise}, \end{cases}$$
$$E_3(\kappa_\alpha,\omega,R,P_{SX}) := \begin{cases} \displaystyle\min_{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}})\in\mathcal{T}_3(\kappa_\alpha,\omega)} D(P_{\tilde{U}\tilde{V}\tilde{W}}\|Q_{\tilde{U}\tilde{V}\tilde{W}}) + E_b(\kappa_\alpha,\omega,R) + E_x(R,P_{SX}), & \text{if } R < \zeta(\kappa_\alpha,\omega), \\ \displaystyle\min_{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}})\in\mathcal{T}_3(\kappa_\alpha,\omega)} D(P_{\tilde{U}\tilde{V}\tilde{W}}\|Q_{\tilde{U}\tilde{V}\tilde{W}}) + \rho(\kappa_\alpha,\omega) + E_x(R,P_{SX}), & \text{otherwise}, \end{cases}$$
$$E_4(\kappa_\alpha,\omega,R,P_{SX},\theta) := \begin{cases} \displaystyle\min_{P_{\hat{V}} : P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)} D(P_{\hat{V}}\|Q_V) + E_b(\kappa_\alpha,\omega,R) + E_m(P_{SX},\theta) - \theta, & \text{if } R < \zeta(\kappa_\alpha,\omega), \\ \displaystyle\min_{P_{\hat{V}} : P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)} D(P_{\hat{V}}\|Q_V) + \rho(\kappa_\alpha,\omega) + E_m(P_{SX},\theta) - \theta, & \text{otherwise}, \end{cases}$$
where
$$\mathcal{T}_1(\kappa_\alpha,\omega) := \Big\{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}}) : P_{\tilde{U}\tilde{W}} = P_{\hat{U}\hat{W}}, \ P_{\tilde{V}\tilde{W}} = P_{\hat{V}\hat{W}}, \ Q_{\tilde{U}\tilde{V}\tilde{W}} := Q_{UV}P_{\tilde{W}|\tilde{U}} \text{ for some } P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)\Big\}, \quad (35)$$
$$\mathcal{T}_2(\kappa_\alpha,\omega) := \Big\{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}}) : P_{\tilde{U}\tilde{W}} = P_{\hat{U}\hat{W}}, \ P_{\tilde{V}} = P_{\hat{V}}, \ H_P(\tilde{W}|\tilde{V}) \geq H_P(\hat{W}|\hat{V}), \ Q_{\tilde{U}\tilde{V}\tilde{W}} := Q_{UV}P_{\tilde{W}|\tilde{U}} \text{ for some } P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)\Big\}, \quad (36)$$
$$\mathcal{T}_3(\kappa_\alpha,\omega) := \Big\{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}}) : P_{\tilde{U}\tilde{W}} = P_{\hat{U}\hat{W}}, \ P_{\tilde{V}} = P_{\hat{V}}, \ Q_{\tilde{U}\tilde{V}\tilde{W}} := Q_{UV}P_{\tilde{W}|\tilde{U}} \text{ for some } P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)\Big\}. \quad (37)$$
We have the following lower bound on $\kappa(\kappa_\alpha)$.

Theorem 4. $\kappa(\kappa_\alpha) \geq \kappa^*_s(\kappa_\alpha)$, where
$$\kappa^*_s(\kappa_\alpha) := \max_{(\omega,R,P_{SX},\theta)\in\mathcal{L}(\kappa_\alpha)} \min\left\{E_1(\kappa_\alpha,\omega), \ E_2(\kappa_\alpha,\omega,R), \ E_3(\kappa_\alpha,\omega,R,P_{SX}), \ E_4(\kappa_\alpha,\omega,R,P_{SX},\theta)\right\}. \quad (38)$$
The proof of Theorem 4 is presented in Section IV-C.
The SHTCC scheme achieving the error-exponent pair $(\kappa_\alpha,\kappa^*_s(\kappa_\alpha))$ is analogous to separate source and channel coding for the lossy transmission of a source over a communication channel with correlated side-information at the receiver [41], however with the objective of reliable HT. In this scheme, the source samples are first compressed to an index, which acts as the message to be transmitted over the channel. In contrast to standard communication problems, there is a need to protect certain messages more reliably than others, and hence an unequal error-protection scheme [35], [37] is used. Briefly, the SHTCC scheme involves
(i) quantization and binning of $\mathbf{u}$ sequences whose type $P_{\mathbf{u}}$ is within a $\kappa_\alpha$-neighbourhood (in terms of KL divergence) of $P_U$, using $\mathbf{V}$ as side-information at the decision maker for decoding; and
(ii) the unequal error-protection channel coding scheme of [35] for protecting a special message which informs the decision maker that $P_{\mathbf{u}}$ lies outside the $\kappa_\alpha$-neighbourhood of $P_U$.
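The only non-elementary ingredient of the SHTCC bound is the expurgated exponent in (28). The following sketch (illustrative; it assumes a BSC with hypothetical crossover $p = 0.1$, a degenerate time-sharing variable $S$, and a uniform input) evaluates $E_x(R,P_{SX})$ by a one-dimensional search over $\rho \geq 1$, and checks the $R = 0$ value against the closed form from [31, Problem 10.26(c)] used in Example 1 below.

```python
import numpy as np
from scipy.optimize import minimize_scalar

p = 0.1                                    # hypothetical BSC crossover
W = np.array([[1 - p, p], [p, 1 - p]])
P_X = np.array([0.5, 0.5])                 # uniform input; S degenerate

def expurgated_exponent(R):
    """E_x(R, P_SX) of (28) for trivial S, via a 1-D search over rho >= 1."""
    bh = np.array([[np.sum(np.sqrt(W[x] * W[xp])) for xp in range(2)]
                   for x in range(2)])     # Bhattacharyya-type kernel
    def neg_obj(rho):
        inner = sum(P_X[x] * P_X[xp] * bh[x, xp] ** (1.0 / rho)
                    for x in range(2) for xp in range(2))
        return -(-rho * R - rho * np.log(inner))
    res = minimize_scalar(neg_obj, bounds=(1.0, 1e4), method="bounded")
    return -res.fun

# At R = 0 the search approaches the closed form -0.25*log(4p(1-p)).
print(expurgated_exponent(0.0), -0.25 * np.log(4 * p * (1 - p)))
```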
Remark 1. In [13, Theorem 1], Han and Kobayashi obtained an inner bound on $\mathcal{R}$ for DHT over a noiseless channel. At a high level, their coding scheme involves type-based quantization of $\mathbf{u}\in\mathcal{U}^n$ sequences whose type $P_{\mathbf{u}}$ lies within a $\kappa_\alpha$-neighbourhood of $P_U$, where $\kappa_\alpha$ is the desired type I error-exponent. As a corollary, Theorem 4 recovers the lower bound for $\kappa(\kappa_\alpha)$ obtained in [13] by
1) setting $E_x(R,P_{SX})$, $E_m(P_{SX},\theta)$ and $E_m(P_{SX},\theta)-\theta$ to $\infty$, which holds when the channel is noiseless, and
2) maximizing over the set $\left\{(\omega,R,P_{SX},\theta)\in\mathcal{F}\times\mathbb{R}_{\geq 0}\times\mathcal{P}(\mathcal{S}\times\mathcal{X})\times\Theta(P_{SX}) : \zeta(\kappa_\alpha,\omega) \leq R < I_P(X;Y|S), \ P_{SXY} := P_{SX}P_{Y|X}\right\} \subseteq \mathcal{L}(\kappa_\alpha)$ in (38).
Then, note that the terms $E_2(\kappa_\alpha,\omega,R)$, $E_3(\kappa_\alpha,\omega,R,P_{SX})$ and $E_4(\kappa_\alpha,\omega,R,P_{SX},\theta)$ all equal $\infty$, and thus the inner bound in Theorem 4 reduces to that given in [13, Theorem 1].
Remark 2. Since the lower bound on $\kappa(\kappa_\alpha)$ in Theorem 4 is not necessarily concave, a tighter bound can be obtained using the technique of "time-sharing", similar to [13, Theorem 3]. We omit its description as it is cumbersome, although straightforward.

Theorem 4 also recovers the lower bound on the optimal T2EE in Stein's regime, i.e., $\kappa_\alpha \to 0$, established in [10, Theorem 2].

Corollary 1. $\lim_{\kappa_\alpha\to 0} \kappa^*_s(\kappa_\alpha) = \kappa_s$, where $\kappa_s$ is the lower bound on the type II error-exponent for a fixed type I error probability constraint and unit bandwidth ratio established in [10, Theorem 2].

The proof of Corollary 1 is provided in Section IV-D. Specializing the lower bound in Theorem 4 to the case of TAI, i.e., when $Q_{UV} = P_U P_V$, we obtain the following result, which also recovers the optimal T2EE for TAI in Stein's regime established in [10, Proposition 7].

Corollary 2.
Let $P_{UV}\in\mathcal{P}(\mathcal{U}\times\mathcal{V})$ be an arbitrary distribution and $Q_{UV} = P_U P_V$. Then,
$$\kappa(\kappa_\alpha) \geq \kappa^*_s(\kappa_\alpha) \geq \kappa^*_I(\kappa_\alpha) := \max_{(\omega,P_{SX},\theta)\in\mathcal{L}^*(\kappa_\alpha)} \min\left\{E_{I,1}(\kappa_\alpha,\omega), \ E_{I,2}(\kappa_\alpha,\omega,P_{SX}), \ E_{I,3}(\kappa_\alpha,\omega,P_{SX},\theta)\right\}, \quad (39)$$
where
$$\mathcal{L}^*(\kappa_\alpha) := \Big\{(\omega,P_{SX},\theta)\in\mathcal{F}\times\mathcal{P}(\mathcal{S}\times\mathcal{X})\times\Theta(P_{SX}) : \ \zeta(\kappa_\alpha,\omega) < I_P(X;Y|S), \ P_{SXY} := P_{SX}P_{Y|X}, \ \min\{E_{sp}(P_{SX},\theta), E_x(\zeta(\kappa_\alpha,\omega),P_{SX})\} \geq \kappa_\alpha\Big\}, \quad (40)$$
$$E_{I,1}(\kappa_\alpha,\omega) := \min_{P_{\hat{V}\hat{W}} : \exists P_{\hat{U}},\, P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega)} \left[I_P(\hat{V};\hat{W}) + D(P_{\hat{V}}\|P_V)\right],$$
$$E_{I,2}(\kappa_\alpha,\omega,P_{SX}) := \rho(\kappa_\alpha,\omega) + E_x(\zeta(\kappa_\alpha,\omega),P_{SX}),$$
$$E_{I,3}(\kappa_\alpha,\omega,P_{SX},\theta) := \rho(\kappa_\alpha,\omega) + E_{sp}(P_{SX},\theta) - \theta,$$
and $\hat{\mathcal{L}}(\kappa_\alpha,\omega)$, $\zeta(\kappa_\alpha,\omega)$ and $\rho(\kappa_\alpha,\omega)$ are as defined in (32), (33) and (34), respectively. In particular,
$$\lim_{\kappa_\alpha\to 0} \kappa(\kappa_\alpha) = \kappa^*_s(0) = \kappa^*_I(0) = \max_{\substack{P_{W|U} : I_P(U;W) \leq C \\ P_{UVW} = P_{UV}P_{W|U}}} I_P(V;W), \quad (41)$$
where $|\mathcal{W}| \leq |\mathcal{U}| + 1$ and $C := C(P_{Y|X})$ denotes the capacity of the channel $P_{Y|X}$.

The proof of Corollary 2 is given in Section IV-E. Its achievability follows from the SHTCC scheme without binning at the encoder.

Next, we consider the opposite of TAI, in which $Q_{UV}$ is an arbitrary joint distribution and $P_{UV} := Q_U Q_V$. We refer to this case as testing against dependence (TAD). Theorem 4 specialized to TAD gives the following corollary.

Corollary 3.
Let $Q_{UV}\in\mathcal{P}(\mathcal{U}\times\mathcal{V})$ be an arbitrary distribution and $P_{UV} = Q_U Q_V$. Then,
$$\kappa(\kappa_\alpha) \geq \kappa^*_s(\kappa_\alpha) = \kappa^*_D(\kappa_\alpha) := \max_{(\omega,P_{SX},\theta)\in\mathcal{L}^*(\kappa_\alpha)} \min\left\{E_{D,1}(\kappa_\alpha,\omega), \ E_{D,2}(\kappa_\alpha,\omega,P_{SX}), \ E_{D,3}(P_{SX},\theta)\right\}, \quad (42)$$
where
$$E_{D,1}(\kappa_\alpha,\omega) := \min_{(P_{\tilde{U}\tilde{V}\tilde{W}},Q_{\tilde{U}\tilde{V}\tilde{W}})\in\mathcal{T}_1(\kappa_\alpha,\omega)} D(P_{\tilde{U}\tilde{V}\tilde{W}}\|Q_{\tilde{U}\tilde{V}\tilde{W}}) \geq \min_{\substack{(P_{\hat{V}\hat{W}},Q_{V\hat{W}}) : P_{\hat{U}\hat{V}\hat{W}}\in\hat{\mathcal{L}}(\kappa_\alpha,\omega) \\ Q_{UV\hat{W}} = Q_{UV}P_{\hat{W}|\hat{U}}}} D(P_{\hat{V}\hat{W}}\|Q_{V\hat{W}}),$$
$$E_{D,2}(\kappa_\alpha,\omega,P_{SX}) := E_x(\zeta(\kappa_\alpha,\omega),P_{SX}),$$
$$E_{D,3}(P_{SX},\theta) := E_{sp}(P_{SX},\theta) - \theta,$$
and $\hat{\mathcal{L}}(\kappa_\alpha,\omega)$, $\mathcal{T}_1(\kappa_\alpha,\omega)$ and $\mathcal{L}^*(\kappa_\alpha)$ are given in (32), (35) and (40), respectively. In particular,
$$\lim_{\kappa_\alpha\to 0} \kappa(\kappa_\alpha) \geq \kappa^*_s(0) = \kappa^*_D(0) \geq \kappa^*_{TAD}, \quad (43)$$
where
$$\kappa^*_{TAD} = \max_{\substack{(P_{W|U},P_{SX}) : I_Q(W;U) \leq I_P(X;Y|S) \\ Q_{UVW} = Q_{UV}P_{W|U} \\ P_{SXY} = P_{SX}P_{Y|X}}} \min\left\{D(Q_V Q_W\|Q_{VW}), \ E_x(I_Q(U;W),P_{SX}), \ \theta_L(P_{SX})\right\}, \quad (44)$$
and $|\mathcal{W}| \leq |\mathcal{U}| + 1$.

The proof of Corollary 3 is given in Section IV-F. In Section III-C, we will consider an example of TAD over a binary symmetric channel (BSC), and compare $\kappa^*_s(\kappa_\alpha) = \kappa^*_D(\kappa_\alpha)$ with the inner bound achieved by the JHTCC scheme, which we introduce next.

Inner bound using the JHTCC scheme:
It is well known that joint source-channel coding schemes offer advantages over separation-based coding schemes in several information-theoretic problems, such as the transmission of correlated sources over a multiple-access channel [34], [42] and the error-exponent in the lossless or lossy transmission of a source over a noisy channel [37], [43], to name a few. Recently, it was shown via an example in [10] that joint schemes also achieve a strictly larger type II error-exponent in DHT problems compared to a separation-based scheme in some scenarios. Motivated by this, we present an inner bound on $\mathcal{R}$ using a generalization of the JHTCC scheme in [10].

Let $\mathcal{W}$ and $\mathcal{S}$ be arbitrary finite sets, and let $\mathcal{F}'$ denote the set of all continuous mappings from $\mathcal{P}(\mathcal{U}\times\mathcal{S})$ to $\mathcal{P}'(\mathcal{W}|\mathcal{U}\times\mathcal{S})$, where $\mathcal{P}'(\mathcal{W}|\mathcal{U}\times\mathcal{S})$ is the set of all conditional distributions $P_{W|US}$. Let
$$\mathcal{L}_h(\kappa_\alpha) := \Big\{\left(P_S, \omega'(\cdot,P_S), P_{X|USW}, P_{X'|US}\right)\in\mathcal{P}(\mathcal{S})\times\mathcal{F}'\times\mathcal{P}'(\mathcal{X}|\mathcal{U}\times\mathcal{S}\times\mathcal{W})\times\mathcal{P}'(\mathcal{X}|\mathcal{U}\times\mathcal{S}) : \ E'_b(\kappa_\alpha,\omega',P_S,P_{X|USW}) > \kappa_\alpha\Big\},$$
$$\hat{\mathcal{L}}_h(\kappa_\alpha,\omega',P_S,P_{X|USW}) := \Big\{P_{\hat{U}\hat{V}\hat{W}\hat{Y}S} : D\big(P_{\hat{U}\hat{V}\hat{W}\hat{Y}|S}\,\|\,P_{UV\hat{W}Y|S}\,\big|\,P_S\big) \leq \kappa_\alpha, \ P_{SUV\hat{W}XY} := P_S P_{UV} P_{\hat{W}|\hat{U}S} P_{X|USW} P_{Y|X}, \ P_{\hat{W}|\hat{U}S} = \omega'(P_{\hat{U}},P_S)\Big\},$$
$$E'_b(\kappa_\alpha,\omega',P_S,P_{X|USW}) := \rho'(\kappa_\alpha,\omega',P_S,P_{X|USW}) - \zeta'(\kappa_\alpha,\omega',P_S),$$
$$\zeta'(\kappa_\alpha,\omega',P_S) := \max_{P_{\hat{U}\hat{W}S} : \exists P_{\hat{V}\hat{Y}},\, P_{\hat{U}\hat{V}\hat{W}\hat{Y}S}\in\hat{\mathcal{L}}_h(\kappa_\alpha,\omega',P_S,P_{X|USW})} I_P(\hat{U};\hat{W}|S),$$
$$\rho'(\kappa_\alpha,\omega',P_S,P_{X|USW}) := \min_{P_{\hat{V}\hat{W}\hat{Y}S} : \exists P_{\hat{U}},\, P_{\hat{U}\hat{V}\hat{W}\hat{Y}S}\in\hat{\mathcal{L}}_h(\kappa_\alpha,\omega',P_S,P_{X|USW})} I_P(\hat{Y},\hat{V};\hat{W}|S),$$
$$E'_1(\kappa_\alpha,\omega') := \min_{(P_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S},Q_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S})\in\mathcal{T}'_1(\kappa_\alpha,\omega')} D\big(P_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}|S}\,\|\,Q_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}|S}\,\big|\,P_S\big),$$
$$E'_2(\kappa_\alpha,\omega',P_S,P_{X|USW}) := \min_{(P_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S},Q_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S})\in\mathcal{T}'_2(\kappa_\alpha,\omega',P_S,P_{X|USW})} D\big(P_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}|S}\,\|\,Q_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}|S}\,\big|\,P_S\big) + E'_b(\kappa_\alpha,\omega',P_S,P_{X|USW}),$$
$$E'_3(\kappa_\alpha,\omega',P_S,P_{X|USW},P_{X'|US}) := \min_{P_{\hat{V}\hat{Y}S} : P_{\hat{U}\hat{V}\hat{W}\hat{Y}S}\in\hat{\mathcal{L}}_h(\kappa_\alpha,\omega',P_S,P_{X|USW})} D\big(P_{\hat{V}\hat{Y}|S}\,\|\,Q_{VY'|S}\,\big|\,P_S\big) + E'_b(\kappa_\alpha,\omega',P_S,P_{X|USW}),$$
where $Q_{SUVX'Y'} := P_S Q_{UV} P_{X'|US} P_{Y'|X'}$ with $P_{Y'|X'} := P_{Y|X}$, and
$$\mathcal{T}'_1(\kappa_\alpha,\omega') := \Big\{(P_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S},Q_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S}) : P_{\tilde{U}\tilde{W}S} = P_{\hat{U}\hat{W}S}, \ P_{\tilde{V}\tilde{W}\tilde{Y}S} = P_{\hat{V}\hat{W}\hat{Y}S}, \ Q_{S\tilde{U}\tilde{V}\tilde{W}\tilde{X}\tilde{Y}} := P_S Q_{UV} P_{\tilde{W}|\tilde{U}S} P_{X|USW} P_{Y|X} \text{ for some } P_{\hat{U}\hat{V}\hat{W}\hat{Y}S}\in\hat{\mathcal{L}}_h(\kappa_\alpha,\omega',P_S,P_{X|USW})\Big\},$$
$$\mathcal{T}'_2(\kappa_\alpha,\omega',P_S,P_{X|USW}) := \Big\{(P_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S},Q_{\tilde{U}\tilde{V}\tilde{W}\tilde{Y}S}) : P_{\tilde{U}\tilde{W}S} = P_{\hat{U}\hat{W}S}, \ P_{\tilde{V}\tilde{Y}S} = P_{\hat{V}\hat{Y}S}, \ H_P(\tilde{W}|\tilde{V},\tilde{Y},S) \geq H_P(\hat{W}|\hat{V},\hat{Y},S), \ Q_{S\tilde{U}\tilde{V}\tilde{W}\tilde{X}\tilde{Y}} := P_S Q_{UV} P_{\tilde{W}|\tilde{U}S} P_{X|USW} P_{Y|X} \text{ for some } P_{\hat{U}\hat{V}\hat{W}\hat{Y}S}\in\hat{\mathcal{L}}_h(\kappa_\alpha,\omega',P_S,P_{X|USW})\Big\}.$$
Then, we have the following result.
Theorem 5.
$$\kappa(\kappa_\alpha) \geq \max\{\kappa^*_h(\kappa_\alpha), \ \kappa^*_u(\kappa_\alpha)\}, \quad (45)$$
where
$$\kappa^*_h(\kappa_\alpha) := \max_{(P_S,\omega',P_{X|USW},P_{X'|US})\in\mathcal{L}_h(\kappa_\alpha)} \min\left\{E'_1(\kappa_\alpha,\omega'), \ E'_2(\kappa_\alpha,\omega',P_S,P_{X|USW}), \ E'_3(\kappa_\alpha,\omega',P_S,P_{X|USW},P_{X'|US})\right\},$$
$$\kappa^*_u(\kappa_\alpha) := \max_{(P_S,P_{X|US})\in\mathcal{P}(\mathcal{S})\times\mathcal{P}'(\mathcal{X}|\mathcal{S}\times\mathcal{U})} \kappa_u(\kappa_\alpha,P_S,P_{X|US}),$$
$$\kappa_u(\kappa_\alpha,P_S,P_{X|US}) := \min_{P_{\hat{V}\hat{Y}|S} : D(P_{\hat{V}\hat{Y}|S}\|P_{VY|S}|P_S) \leq \kappa_\alpha} D\left(P_{\hat{V}\hat{Y}|S}\,\|\,Q_{VY|S}\,\middle|\,P_S\right), \quad (46)$$
with $P_{SUVXY} = P_S P_{UV} P_{X|US} P_{Y|X}$ and $Q_{SUVXY} = P_S Q_{UV} P_{X|US} P_{Y|X}$.

The proof of Theorem 5 is given in Section IV-G. The scheme achieving the error-exponent pair $(\kappa_\alpha,\kappa^*_h(\kappa_\alpha))$ utilizes a generalization of hybrid coding [34]. The factor $\kappa^*_h(\kappa_\alpha)$ corresponds to the error-exponent achieved when a combination of digital and analog coding schemes is used in hybrid coding, while $\kappa^*_u(\kappa_\alpha)$ corresponds to that achieved by uncoded transmission, in which the channel input $\mathbf{X}$ is generated by passing $\mathbf{U}$ through a DMC $P_{X|U}$, along with time-sharing. As a corollary, we recover the lower bound on the optimal T2EE in Stein's regime proved in [10, Theorem 5].

Corollary 4. $\lim_{\kappa_\alpha\to 0} \kappa^*_h(\kappa_\alpha) = \kappa_h$, where $\kappa_h$ is as defined in [10, Theorem 5].

The proof of Corollary 4 is given in Section IV-H.
C. Comparison of the inner bounds
We compare the inner bounds established in Theorem 4 and Theorem 5 for a simple setting of TAD over a BSC. For this purpose, we will use the inner bound $\kappa^*_D(\kappa_\alpha)$ stated in Corollary 3 and $\kappa^*_u(\kappa_\alpha)$, which is achieved by uncoded transmission. Our objective is to illustrate that the JHTCC scheme achieves a strictly tighter bound on $\mathcal{R}$ compared to the SHTCC scheme, at least for some points of the trade-off.

Fig. 2: Comparison of the error-exponents trade-off achieved by the SHTCC and JHTCC schemes for TAD over a BSC in Example 1. The blue curve shows $(\kappa_\alpha, \kappa^*_u(\kappa_\alpha))$ pairs achieved by uncoded transmission, while the red line plots $(\kappa_\alpha, E_x(0))$. Note that $E_x(0)$ is an upper bound on $\kappa^*_D(\kappa_\alpha)$ for all values of $\kappa_\alpha$. Thus, the joint scheme clearly achieves a better error-exponent trade-off for sufficiently small $\kappa_\alpha$.

Example 1.
Let $\mathcal{U} = \mathcal{V} = \mathcal{X} = \mathcal{Y} = \mathcal{S} = \{0,1\}$, let $Q_{UV}$ be a fixed joint PMF on $\{0,1\}^2$, let
$$P_{Y|X} = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$$
be a BSC with crossover probability $p\in(0,1/2)$, and let $P_{UV} = Q_U Q_V$. The comparison of the SHTCC and JHTCC schemes for this example is shown in Fig. 2, where we plot the error-exponents trade-off achieved by uncoded transmission (a lower bound for the JHTCC scheme), and
$$E_x(0) := \max_{P_{SX}\in\mathcal{P}(\mathcal{S}\times\mathcal{X})} E_x(P_{SX},0) = -0.25\log\left(4p(1-p)\right), \quad (47)$$
which is an upper bound on $\kappa^*_D(\kappa_\alpha)$ for any $\kappa_\alpha \geq 0$. To compute $E_x(0)$ in (47), we used the closed-form expression for $E_x(\cdot)$ given in [31, Problem 10.26(c)]. Clearly, it can be seen that the JHTCC scheme outperforms the SHTCC scheme for sufficiently small $\kappa_\alpha$.
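To show how the uncoded-transmission exponent $\kappa_u(\kappa_\alpha)$ from Theorem 5 can be traced numerically for this example, the sketch below treats (46), with a degenerate $S$ and $X = U$, as a convex program over the joint law of $(\hat{V},\hat{Y})$. Since the numerical entries of $Q_{UV}$ and the value of $p$ in Example 1 are not reproduced here, a hypothetical doubly symmetric $Q_{UV}$ and $p = 0.1$ are assumed; the comparison with $E_x(0)$ is therefore illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for Example 1's Q_UV: a DSBS with crossover 0.1.
# S is degenerate and X = U (pure uncoded transmission).
p = 0.1
W = np.array([[1 - p, p], [p, 1 - p]])
Q_UV = np.array([[0.45, 0.05], [0.05, 0.45]])
P_VY = np.full(4, 0.25)                 # null: U independent of V, uniform
Q_VY = (Q_UV.T @ W).flatten()           # alt: Q_VY(v,y) = sum_u Q(u,v)W(y|u)

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

def kappa_u(kappa_alpha):
    """min D(P_hat || Q_VY) s.t. D(P_hat || P_VY) <= kappa_alpha, as in (46)."""
    cons = [{"type": "eq", "fun": lambda x: np.sum(x) - 1},
            {"type": "ineq", "fun": lambda x: kappa_alpha - kl(x, P_VY)}]
    res = minimize(lambda x: kl(x, Q_VY), P_VY, bounds=[(1e-9, 1)] * 4,
                   constraints=cons, method="SLSQP")
    return res.fun

for ka in [0.0, 0.02, 0.05, 0.1]:
    print(ka, kappa_u(ka))
```

With these hypothetical numbers, $\kappa_u(0) \approx 0.26$ exceeds $E_x(0) \approx 0.2554$, mirroring the qualitative conclusion of Example 1 that the joint scheme wins at small $\kappa_\alpha$.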
IV. PROOFS

A. Proof of Proposition 1
Let $(\tilde{\mathbf{x}},\mathbf{x}')\in\mathcal{X}^n\times\mathcal{X}^n$ be sequences that satisfy (13). For simplicity of presentation, let $\bar{\chi}^{(n)}_{P_{Y|X},\tilde{\mathbf{x}},\mathbf{x}',\theta}$, $\bar{\Pi}_{\tilde{x},x',P_{Y|X}}$, $\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}',P_{Y|X}}$, $d_{\max}(P_{\tilde{X}X'},P_{Y|X})$ and $d_{\min}(P_{\tilde{X}X'},P_{Y|X})$ be denoted by $\bar{\chi}^{(n)}_\theta$, $\bar{\Pi}_{\tilde{x},x'}$, $\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}$, $d_{\max}$ and $d_{\min}$, respectively.

Achievability: We need to show that
$$\bar{\kappa}\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right], P_{\tilde{X}X'}\right) \geq \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \theta,$$
for $-d_{\min} < \theta < d_{\max}$. The type I error probability can be upper bounded, for $\theta > -d_{\min}$ and sufficiently large $n$, as follows:
$$\bar{\alpha}_n\left(\bar{\chi}^{(n)}_\theta\right) = \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}(\mathbf{Y}) \geq n\theta\right) \stackrel{(a)}{\leq} e^{-\sup_{\lambda\geq 0}\left(n\theta\lambda - \psi_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}}),\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}}(\lambda)\right)} \quad (48)$$
$$\stackrel{(b)}{=} e^{-\sup_{\lambda\in\mathbb{R}}\left(n\left(\theta\lambda - \frac{1}{n}\psi_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}}),\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}}(\lambda)\right)\right)}, \quad (49)$$
where
(a) follows from the Chernoff bound;
(b) holds because, for $\theta > -d_{\min}$ and sufficiently large $n$, the supremum in (49) always occurs at $\lambda \geq 0$. To see this, note that the term $l_n(\lambda) := \theta\lambda - \frac{1}{n}\psi_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}}),\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}}(\lambda)$ is a concave function of $\lambda$ by Lemma 1 (ii). Also, denoting its derivative w.r.t. $\lambda$ by $l'_n(\lambda)$, we have
$$l'_n(0) = \theta - \frac{1}{n}\mathbb{E}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left[\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}\right] \quad (50)$$
$$= \theta - \frac{1}{n}\sum_{i=1}^n \mathbb{E}_{P_{Y|X}(\cdot|\tilde{x}_i)}\left[\log\left(\frac{P_{Y|X}(Y_i|x'_i)}{P_{Y|X}(Y_i|\tilde{x}_i)}\right)\right] \xrightarrow{(n)} \theta + d_{\min} > 0, \quad (51)$$
where (50) follows from Lemma 1 (i), and (51) is due to Assumption 1 and (13). Thus, by the concavity of $l_n(\lambda)$, its supremum has to occur at $\lambda \geq 0$.

Simplifying the term within the exponent in (49), we obtain
$$\frac{1}{n}\psi_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}}),\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}}(\lambda) = \frac{1}{n}\log\left(\mathbb{E}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left[\frac{P_{Y|X}^{\lambda\,\otimes n}(\mathbf{Y}|\mathbf{x}')}{P_{Y|X}^{\lambda\,\otimes n}(\mathbf{Y}|\tilde{\mathbf{x}})}\right]\right) = \frac{1}{n}\log\left(\mathbb{E}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left[\prod_{i=1}^n \frac{P_{Y|X}^\lambda(Y_i|x'_i)}{P_{Y|X}^\lambda(Y_i|\tilde{x}_i)}\right]\right)$$
$$= \frac{1}{n}\sum_{i=1}^n \log\left(\mathbb{E}_{P_{Y|X}(\cdot|\tilde{x}_i)}\left[\frac{P_{Y|X}^\lambda(Y_i|x'_i)}{P_{Y|X}^\lambda(Y_i|\tilde{x}_i)}\right]\right) = \sum_{\tilde{x},x'} P_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\log\left(\mathbb{E}_{P_{Y|X}(\cdot|\tilde{x})}\left[\frac{P_{Y|X}^\lambda(Y|x')}{P_{Y|X}^\lambda(Y|\tilde{x})}\right]\right) \quad (52)$$
$$\xrightarrow{(n)} \mathbb{E}_{P_{\tilde{X}X'}}\left[\log\left(\mathbb{E}_{P_{Y|X}(\cdot|\tilde{X})}\left[e^{\lambda\bar{\Pi}_{\tilde{X},X'}(Y)}\right]\right)\right], \quad (53)$$
where (53) follows from (13) and Assumption 1. Substituting (53) in (49) and using (7), we obtain, for arbitrarily small (but fixed) $\delta > 0$ and sufficiently large $n$, that
$$\bar{\alpha}_n\left(\bar{\chi}^{(n)}_\theta\right) \leq e^{-\sup_{\lambda\in\mathbb{R}}\left(n\left(\theta\lambda - \mathbb{E}_{P_{\tilde{X}X'}}\left[\log\left(\mathbb{E}_{P_{Y|X}(\cdot|\tilde{X})}\left[e^{\lambda\bar{\Pi}_{\tilde{X},X'}(Y)}\right]\right)\right] - \delta\right)\right)} = e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\sup_{\lambda\in\mathbb{R}}\left(\theta\lambda - \log\mathbb{E}_{P_{Y|X}(\cdot|\tilde{X})}\left[e^{\lambda\bar{\Pi}_{\tilde{X},X'}(Y)}\right]\right)\right] - \delta\right)} = e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \delta\right)}. \quad (54)$$
Similarly, it can be shown that for $\theta < d_{\max}$,
$$\bar{\beta}_n\left(\bar{\chi}^{(n)}_\theta\right) \leq e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|X'),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \delta\right)}. \quad (55)$$
Moreover, for $(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}$, we have
$$e^{\psi_{P_{Y|X}(\cdot|x'),\bar{\Pi}_{\tilde{x},x'}}(\lambda)} = \sum_{y\in\mathcal{Y}} \frac{P_{Y|X}^{\lambda+1}(y|x')}{P_{Y|X}^{\lambda}(y|\tilde{x})} = e^{\psi_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\lambda+1)}.$$
It follows that
$$\psi^*_{P_{Y|X}(\cdot|x'),\bar{\Pi}_{\tilde{x},x'}}(\theta) := \sup_{\lambda\in\mathbb{R}}\left(\lambda\theta - \psi_{P_{Y|X}(\cdot|x'),\bar{\Pi}_{\tilde{x},x'}}(\lambda)\right) = \sup_{\lambda\in\mathbb{R}}\left(\lambda\theta - \psi_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\lambda+1)\right) = \psi^*_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\theta) - \theta.$$
Hence,
$$\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|X'),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] = \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \theta. \quad (56)$$
From (54), (55) and (56), it follows that for $-d_{\min} < \theta < d_{\max}$,
$$\bar{\kappa}\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \delta, \ P_{\tilde{X}X'}\right) \geq \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \theta - \delta.$$
The proof of achievability is completed by noting that $\delta > 0$ is arbitrary and $\bar{\kappa}(\kappa_\alpha,P_{\tilde{X}X'})$ is a continuous function of $\kappa_\alpha$ for a fixed $P_{\tilde{X}X'}$. Next, we prove the converse.

Converse:
Let $\mathcal{I}_n(\tilde{x},x') := \{i\in[n] : \tilde{x}_i = \tilde{x} \text{ and } x'_i = x'\}$. For any $\theta\in\mathbb{R}$ and decision function $\bar{g}_n$, we have from Theorem 2 that
$$\bar{\alpha}_n(\bar{g}_n) + e^{-n\theta}\bar{\beta}_n(\bar{g}_n) \geq \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\log\left(\frac{P_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{x}')}{P_{Y|X}^{\otimes n}(\mathbf{Y}|\tilde{\mathbf{x}})}\right) \geq n\theta\right). \quad (57)$$
Simplifying the right hand side (RHS) of (57), we obtain
$$\mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\log\left(\frac{P_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{x}')}{P_{Y|X}^{\otimes n}(\mathbf{Y}|\tilde{\mathbf{x}})}\right) \geq n\theta\right) = \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\sum_{i=1}^n \log\left(\frac{P_{Y|X}(Y_i|x'_i)}{P_{Y|X}(Y_i|\tilde{x}_i)}\right) \geq n\theta\right)$$
$$= \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\sum_{\tilde{x},x'}\sum_{i\in\mathcal{I}_n(\tilde{x},x')} \log\left(\frac{P_{Y|X}(Y_i|x'_i)}{P_{Y|X}(Y_i|\tilde{x}_i)}\right) \geq \sum_{(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}} nP_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\,\theta\right)$$
$$\stackrel{(a)}{\geq} \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\bigcap_{\tilde{x},x'}\left\{\sum_{i\in\mathcal{I}_n(\tilde{x},x')} \log\left(\frac{P_{Y|X}(Y_i|x'_i)}{P_{Y|X}(Y_i|\tilde{x}_i)}\right) \geq nP_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\,\theta\right\}\right)$$
$$\stackrel{(b)}{=} \prod_{(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}} \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\sum_{i\in\mathcal{I}_n(\tilde{x},x')} \log\left(\frac{P_{Y|X}(Y_i|x'_i)}{P_{Y|X}(Y_i|\tilde{x}_i)}\right) \geq nP_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\,\theta\right),$$
where
(a) follows since the intersection event is contained in the event that the overall sum exceeds $\sum_{(\tilde{x},x')} nP_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\,\theta$;
(b) is due to the independence of the events for different $(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}$.

Let
$$b_{\tilde{x},x',P_{Y|X}}(\theta) := \min_{\tilde{Q}_{\tilde{x}}\in\mathcal{P}(\mathcal{Y}) : \ \mathbb{E}_{\tilde{Q}_{\tilde{x}}}\left[\log\left(\frac{P_{Y|X}(Y|x')}{P_{Y|X}(Y|\tilde{x})}\right)\right] \geq \theta} D\left(\tilde{Q}_{\tilde{x}}\,\|\,P_{Y|X}(\cdot|\tilde{x})\right). \quad (58)$$
Then, for arbitrary $\delta > 0$, $\delta' > \delta$ and sufficiently large $n$, we can write
$$\bar{\alpha}_n + e^{-n\theta}\bar{\beta}_n \stackrel{(a)}{\geq} \prod_{(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}} e^{-nP_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\left(b_{\tilde{x},x',P_{Y|X}}(\theta)+\delta\right)} \quad (59)$$
$$\stackrel{(b)}{\geq} \prod_{(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}} e^{-nP_{\tilde{\mathbf{x}}\mathbf{x}'}(\tilde{x},x')\left(\psi^*_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\theta)+\delta\right)} \quad (60)$$
$$\stackrel{(c)}{=} e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right]+\delta'\right)}, \quad (61)$$
where
(a) follows from [38, Theorem 14.1];
(b) follows since $b_{\tilde{x},x',P_{Y|X}}(\theta) = \psi^*_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\theta)$ by [38, Theorem 13.3] and [38, Theorem 14.3];
(c) is due to (13).

Equation (61) implies that
$$\limsup_{n\to\infty} \min\left\{-\frac{1}{n}\log\bar{\alpha}_n, \ -\frac{1}{n}\log\bar{\beta}_n + \theta\right\} \leq \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] + \delta'. \quad (62)$$
Hence, if for all sufficiently large $n$,
$$\bar{\alpha}_n < e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right]+\delta'\right)}, \quad (63)$$
then
$$\limsup_{n\to\infty} -\frac{1}{n}\log\bar{\beta}_n \leq \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \theta + \delta'. \quad (64)$$
Since $\delta$ (and $\delta'$) is arbitrary, this implies, via the continuity of $\bar{\kappa}(\kappa_\alpha,P_{\tilde{X}X'})$ in $\kappa_\alpha$, that
$$\bar{\kappa}\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right], P_{\tilde{X}X'}\right) \leq \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \theta.$$
To complete the proof, we need to show that $\theta$ can be restricted to lie in $\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})$. Towards this, it suffices to show the following:
(i) $\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(-d_{\min})\right] = 0$;
(ii) $\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(d_{\max})\right] = d_{\max}$; and
(iii) $\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right]$ and $\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta)\right] - \theta$ are convex functions of $\theta$.
We have
$$\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(-d_{\min})\right] = \sup_{\lambda\in\mathbb{R}} \left(-\lambda\,\mathbb{E}_{P_{\tilde{X}X'}}\left[D\left(P_{Y|X}(\cdot|\tilde{X})\,\|\,P_{Y|X}(\cdot|X')\right)\right] - \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\lambda)\right]\right)$$
$$\leq \sum_{\tilde{x},x'} P_{\tilde{X}X'}(\tilde{x},x')\left[\sup_{\lambda_{\tilde{x},x'}\in\mathbb{R}}\left(-\lambda_{\tilde{x},x'}\,D\left(P_{Y|X}(\cdot|\tilde{x})\,\|\,P_{Y|X}(\cdot|x')\right) - \psi_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\lambda_{\tilde{x},x'})\right)\right] \quad (65)$$
$$= 0, \quad (66)$$
where (66) follows since each term inside the square braces in (65) is zero, which in turn follows from Lemma 1 (iii). Also,
$$\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(-d_{\min})\right] = \sum_{\tilde{x},x'} P_{\tilde{X}X'}(\tilde{x},x')\,\psi^*_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(-d_{\min}) \geq 0, \quad (67)$$
where (67) follows from the non-negativity of $\psi^*_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}$ for every $(\tilde{x},x')\in\mathcal{X}\times\mathcal{X}$, stated in Lemma 1 (iii). Combining (66) and (67) proves (i). We also have
$$\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(d_{\max})\right] - d_{\max} = \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|X'),\bar{\Pi}_{\tilde{X},X'}}(d_{\max})\right] = 0, \quad (68)$$
where (68) follows similarly to the proof of (i). This proves (ii). Finally, (iii) follows from Lemma 1 (iii) and the fact that a weighted sum of convex functions with non-negative weights is convex, thus completing the proof.
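The tilting identity (56), $\psi^*_{P_{Y|X}(\cdot|x'),\bar{\Pi}_{\tilde{x},x'}}(\theta) = \psi^*_{P_{Y|X}(\cdot|\tilde{x}),\bar{\Pi}_{\tilde{x},x'}}(\theta) - \theta$, which was used repeatedly above, is easy to check numerically. The following sketch (our own illustration, with a hypothetical binary channel) compares the two sides for one value of $\theta$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

W = np.array([[0.85, 0.15], [0.3, 0.7]])   # hypothetical binary channel
llr = np.log(W[1] / W[0])                  # Pi_bar for x~ = 0, x' = 1

def psi_star(base, theta):
    psi = lambda lam: np.log(np.sum(base * np.exp(lam * llr)))
    res = minimize_scalar(lambda lam: psi(lam) - theta * lam,
                          bounds=(-50, 50), method="bounded")
    return -res.fun

theta = 0.2
lhs = psi_star(W[1], theta)           # conjugate under P_{Y|X}(.|x')
rhs = psi_star(W[0], theta) - theta   # tilted conjugate under P_{Y|X}(.|x~)
print(lhs, rhs)                       # agree up to optimizer tolerance
```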
B. Proof of Theorem 3

Achievability:
Fix $P_{\tilde{X}X'}\in\mathcal{P}(\mathcal{X}\times\mathcal{X})$, and let $\{(\tilde{\mathbf{x}},\mathbf{x}')\in\mathcal{X}^n\times\mathcal{X}^n\}_{n\in\mathbb{N}}$ satisfy (13). For brevity, we will denote $\chi^{(n)}_{P_U,Q_U,\theta_0}$, $\bar{\chi}^{(n)}_{P_{Y|X},\tilde{\mathbf{x}},\mathbf{x}',\theta_1}$, $\bar{\Pi}_{\tilde{x},x',P_{Y|X}}$ and $\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}',P_{Y|X}}$ by $\chi^{(n)}_{\theta_0}$, $\bar{\chi}^{(n)}_{\theta_1}$, $\bar{\Pi}_{\tilde{x},x'}$ and $\bar{\Pi}^{(n)}_{\tilde{\mathbf{x}},\mathbf{x}'}$, respectively. Consider any $\theta_0\in\mathcal{I}(P_U,Q_U)$ and $\theta_1\in\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})$. The achievability scheme is as follows.

The observer first performs the NP test in (10) locally on $\mathbf{u}$ with $\theta = \theta_0$, i.e., $\chi^{(n)}_{\theta_0}$, and outputs the channel input codeword $\mathbf{x}$ distributed according to
$$f_n(\mathbf{x}|\mathbf{u}) = \mathbb{1}\{\mathbf{x}=\tilde{\mathbf{x}}\}\,\mathbb{1}\{\chi^{(n)}_{\theta_0}(\mathbf{u})=0\} + \mathbb{1}\{\mathbf{x}=\mathbf{x}'\}\,\mathbb{1}\{\chi^{(n)}_{\theta_0}(\mathbf{u})=1\}.$$
Based on the received samples $\mathbf{y}\in\mathcal{Y}^n$, the decision maker outputs $g_n(\mathbf{y}) = \bar{\chi}^{(n)}_{\theta_1}(\mathbf{y})$, where $\bar{\chi}^{(n)}_{\theta_1}$ is defined in (20). Let
$$\mathcal{A}_{n,\theta_1} = \left\{\mathbf{y}\in\mathcal{Y}^n : \sum_{i=1}^n \log\left(\frac{P_{Y|X}(y_i|x'_i)}{P_{Y|X}(y_i|\tilde{x}_i)}\right) < n\theta_1\right\}. \quad (69)$$
Let $c_n = (f_n,g_n)$ denote the HT code. The type I error probability can be upper bounded, for any $\delta > 0$ and sufficiently large $n$, as follows:
$$\alpha_n(c_n) \leq \mathbb{P}_{P_U^{\otimes n}}\left(\chi^{(n)}_{\theta_0}(\mathbf{U})=1\right)\mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\mathbf{x}')}\left(\mathcal{A}^c_{n,\theta_1}\right) + \mathbb{P}_{P_U^{\otimes n}}\left(\chi^{(n)}_{\theta_0}(\mathbf{U})=0\right)\mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\mathcal{A}^c_{n,\theta_1}\right)$$
$$\leq \mathbb{P}_{P_U^{\otimes n}}\left(\chi^{(n)}_{\theta_0}(\mathbf{U})=1\right) + \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\mathcal{A}^c_{n,\theta_1}\right)$$
$$\leq e^{-n\left(\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0)-\delta\right)} + e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta_1)\right]-\delta\right)}, \quad (70)$$
where (70) follows from Theorem 1 and Proposition 1. Similarly, the type II error probability can be upper bounded as follows:
$$\beta_n(c_n) \leq \mathbb{P}_{Q_U^{\otimes n}}\left(\chi^{(n)}_{\theta_0}(\mathbf{U})=0\right)\mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\tilde{\mathbf{x}})}\left(\mathcal{A}_{n,\theta_1}\right) + \mathbb{P}_{Q_U^{\otimes n}}\left(\chi^{(n)}_{\theta_0}(\mathbf{U})=1\right)\mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\mathbf{x}')}\left(\mathcal{A}_{n,\theta_1}\right)$$
$$\leq \mathbb{P}_{Q_U^{\otimes n}}\left(\chi^{(n)}_{\theta_0}(\mathbf{U})=0\right) + \mathbb{P}_{P_{Y|X}^{\otimes n}(\cdot|\mathbf{x}')}\left(\mathcal{A}_{n,\theta_1}\right)$$
$$\leq e^{-n\left(\psi^*_{Q_U,\Pi_{P_U,Q_U}}(\theta_0)-\delta\right)} + e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|X'),\bar{\Pi}_{\tilde{X},X'}}(\theta_1)\right]-\delta\right)}$$
$$= e^{-n\left(\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0)-\theta_0-\delta\right)} + e^{-n\left(\mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta_1)\right]-\theta_1-\delta\right)}, \quad (71)$$
where (71) again follows from Theorem 1 and Proposition 1. Thus, from (70) and (71), respectively, we have
$$\liminf_{n\to\infty} -\frac{1}{n}\log\alpha_n(c_n) \geq \min\left\{\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0), \ \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta_1)\right]\right\} - \delta = \zeta_0(\theta_0,\theta_1,P_{\tilde{X}X'}) - \delta,$$
and
$$\liminf_{n\to\infty} -\frac{1}{n}\log\beta_n(c_n) \geq \min\left\{\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0)-\theta_0, \ \mathbb{E}_{P_{\tilde{X}X'}}\left[\psi^*_{P_{Y|X}(\cdot|\tilde{X}),\bar{\Pi}_{\tilde{X},X'}}(\theta_1)\right]-\theta_1\right\} - \delta = \zeta_1(\theta_0,\theta_1,P_{\tilde{X}X'}) - \delta.$$
Since $\delta$ is arbitrary, it follows by varying $P_{\tilde{X}X'}\in\mathcal{P}(\mathcal{X}\times\mathcal{X})$, $\theta_0\in\mathcal{I}(P_U,Q_U)$ and $\theta_1\in\bar{\mathcal{I}}(P_{\tilde{X}X'},P_{Y|X})$ that $\kappa(\kappa_\alpha) \geq \sup\{\kappa_\beta : (\kappa_\alpha,\kappa_\beta)\in\mathcal{R}^*\}$, where $\mathcal{R}^*$ is defined in (23). This completes the proof of achievability.

Converse: From the proof of the converse part of Theorem 1, it follows that for $\theta_0\in\mathcal{I}(P_U,Q_U)$,
$$\mathcal{R} \subseteq \bigcup_{\theta_0\in\mathcal{I}(P_U,Q_U)} \left(\psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0), \ \psi^*_{P_U,\Pi_{P_U,Q_U}}(\theta_0)-\theta_0\right). \quad (72)$$
Let $c_n = (f_n,g_n)$ denote an arbitrary code and let $\mathcal{A}_n\subseteq\mathcal{Y}^n$ denote the corresponding acceptance region.
We have α n ( c n ) = (cid:88) u ∈U n P ⊗ nU ( u ) (cid:88) x ∈X n f n ( x | u ) P P ⊗ nY | X ( ·| x ) ( A cn ) ≥ P P ⊗ nY | X ( ·| ˜ x ) ( A cn ) , (73)for some ˜ x ∈ X n that depends on A n . Similarly, β n ( c n ) ≥ P P ⊗ nY | X ( ·| x (cid:48) ) ( A n ) , (74)for some x (cid:48) ∈ X n . Let ¯ P X X denote the joint type of the sequences (˜ x , x (cid:48) ) . Note that the R.H.S. of (73) and (74)correspond to the type I and type II error probabilities of the HT given in (14). Then, it follows from the conversepart of the proof of Proposition 1 that, if for some θ ∈ ¯ I ( ¯ P X X , P Y | X ) and all sufficiently large n , it holds that α n ( c n ) < e − n E ¯ PX X (cid:20) ψ ∗ PY | X ( ·| X , ¯Π X ,X ( θ ) (cid:21) , (75)then lim sup n →∞ − n log β n ( c n ) ≤ E ¯ P X X (cid:104) ψ ∗ P Y | X ( ·| X ) , ¯Π X ,X ( θ ) (cid:105) − θ . (76)From (75) and (76), we have R ⊆ (cid:91) P X X ∈ P ( X ×X ) (cid:91) θ ∈ ¯ I ( P X X ,P Y | X ) (cid:16) E P X X (cid:104) ψ ∗ P Y | X ( ·| X ) , ¯Π X ,X ( θ ) (cid:105) , E P X X (cid:104) ψ ∗ P Y | X ( ·| X ) , ¯Π X ,X ( θ ) (cid:105) − θ (cid:17) . (77) It follows from (72) and (77) that ( κ α , κ β ) ∈ R only if there exists some P X X and ( θ , θ ) ∈ I ( P U , Q U ) × ¯ I ( P X X , P Y | X ) such that κ α ≤ min (cid:110) ψ ∗ P U , Π PU ,QU ( θ ) , E P X X (cid:104) ψ ∗ P Y | X ( ·| X ) , ¯Π X ,X ( θ ) (cid:105)(cid:111) , and κ β ≤ min (cid:110) ψ ∗ P U , Π PU ,QU ( θ ) − θ , E P X X (cid:104) ψ ∗ P Y | X ( ·| X ) , ¯Π X ,X ( θ ) (cid:105) − θ (cid:111) , from which it follows that κ ( κ α ) ≤ sup { κ β : ( κ α , κ β ) ∈ R ∗ } . This completes the proof. C. Proof of Theorem 4
We will show the achievability of the error-exponent pair $(\kappa_\alpha, \kappa^*_s(\kappa_\alpha))$ by constructing a suitable ensemble of HT codes, and showing that the expected (over this ensemble) type I and type II error probabilities satisfy (6) for the pair $(\kappa_\alpha, \kappa^*_s(\kappa_\alpha))$. Then, an expurgation argument [40] will be used to show the existence of an HT code that satisfies (6) for the same error-exponent pair, thus showing that $(\kappa_\alpha, \kappa^*_s(\kappa_\alpha)) \in \mathcal{R}$ as desired.
Let $n \in \mathbb{N}$, $|\mathcal{W}| < \infty$, $\kappa_\alpha > 0$, $(\omega, R, P_{SX}, \theta) \in \mathcal{L}(\kappa_\alpha)$, and let $\eta > 0$ be a small number. Let $R', R \ge 0$ satisfy
$$R' := \zeta(\kappa_\alpha, \omega), \quad (78)$$
and
$$\zeta(\kappa_\alpha, \omega) - \rho(\kappa_\alpha, \omega) \le R < I_P(X; Y \mid S), \quad (79)$$
where $\zeta(\kappa_\alpha, \omega)$ and $\rho(\kappa_\alpha, \omega)$ are defined in (33) and (34), respectively. Our scheme is as follows:

Encoder
The observer's encoder is composed of two stages: a source encoder followed by a channel encoder.

Source encoder
The source encoding comprises a quantization scheme followed by binning (if necessary). The details are as follows:
Quantization codebook
Let D n ( P U , η ) := (cid:8) P ˆ U ∈ T ( U n ) : D ( P ˆ U || P U ) ≤ κ α + η (cid:9) . (80)Consider some ordering on the types in D n ( P U , η ) and denote the elements as P ˆ U ( i ) for i ∈ (cid:2) |D n ( P U , η ) | (cid:3) . Foreach type P ˆ U ( i ) ∈ D n ( P U , η ) , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , choose a joint type variable P ˆ U ( i ) ˆ W ( i ) ∈ T ( U n × W n ) such that D (cid:16) P ˆ W ( i ) | ˆ U ( i ) || P W ( i ) | U (cid:12)(cid:12) P ˆ U ( i ) (cid:17) ≤ η , (81) I P (cid:16) ˆ U ( i ) ; ˆ W ( i ) (cid:17) ≤ R (cid:48) + η , (82)where P W ( i ) | U = ω ( P ˆ U ( i ) ) . Note that this is always possible for n sufficiently large.Let D n ( P UW , η ) := (cid:8) P ˆ U ( i ) ˆ W ( i ) : i ∈ (cid:2) |D n ( P U , η ) | (cid:3)(cid:9) , (83) R (cid:48) i := I P (cid:16) ˆ U ( i ) ; ˆ W ( i ) (cid:17) + η , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , (84) M (cid:48) i := (cid:34) i − (cid:88) k =1 e nR (cid:48) k : i (cid:88) k =1 e nR (cid:48) k (cid:35) . (85)Let B ( n ) W = (cid:110) W ( j ) , j ∈ (cid:104)(cid:80) |D n ( P U ,η ) | i =1 |M (cid:48) i | (cid:105)(cid:111) denote a random quantization codebook such that the codeword W ( j ) ∼ Unif [ T n (cid:0) P ˆ W ( i ) (cid:1) ] , if j ∈ M (cid:48) i for some i ∈ (cid:2) |D n ( P U , η ) | (cid:3) . Denote a realization of B ( n ) W by B ( n ) W = (cid:110) w ( j ) ∈ W n , j ∈ (cid:104) (cid:80) |D n ( P U ,η ) | i =1 |M (cid:48) i | (cid:105)(cid:111) . Quantization scheme
For a given codebook B ( n ) W and u ∈ T n (cid:0) P ˆ U ( i ) (cid:1) such that P ˆ U ( i ) ∈ D n ( P U , η ) for some i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , let ˜ M ( u , B ( n ) W ) := (cid:8) j ∈ M (cid:48) i : w ( j ) ∈ B ( n ) W and ( u , w ( j )) ∈ T n (cid:0) P ˆ U ( i ) ˆ W ( i ) (cid:1) , P ˆ U ( i ) ˆ W ( i ) ∈ D n ( P UW , η ) (cid:9) . If | ˜ M ( u , B ( n ) W ) | ≥ , let M (cid:48) (cid:16) u , B ( n ) W (cid:17) denote an index selected uniformly at random from the set ˜ M ( u , B ( n ) W ) ,otherwise, set M (cid:48) (cid:16) u , B ( n ) W (cid:17) = 0 . Denoting the support of M (cid:48) (cid:16) u , B ( n ) W (cid:17) by M (cid:48) , we have for sufficiently large n that |M (cid:48) | ≤ |D n ( P U ,η ) | (cid:88) i =1 e nR (cid:48) i ≤ |D n ( P U , η ) | e n (cid:32) max P ˆ U ˆ W ∈D n ( PUW ,η ) I ( ˆ U ; ˆ W )+ η (cid:33) ≤ |D n ( P U , η ) | e n ( R (cid:48) + η ) ≤ e n ( R (cid:48) + η ) , (86)where, in (86), we used (82) and the fact that |D n ( P U , η ) | ≤ ( n + 1) |U| . Binning If |M (cid:48) | > |M| , then the source encoder performs binning as follows:Let R n := log (cid:18) e nR |D n ( P U , η ) | (cid:19) , n ∈ N , M i := [1 + ( i − R n : iR n ] , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , and M := { } (cid:91) (cid:8) ∪ i ∈ [ |D n ( P U ,η ) | ] M i (cid:9) . Note that e nR n ≥ e n ( R − |U| n log( n +1) ) . (87)Let f B denote the random binning function such that for each j ∈ M (cid:48) i , f B ( j ) ∼ Unif [ |M i | ] , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) ,and f B (0) = 0 with probability one. Denote a realization of f B ( j ) by f b , where f b : M (cid:48) → M . Given a codebook B ( n ) W and binning function f b , the source encoder outputs M = f b (cid:16) M (cid:48) (cid:16) u , B ( n ) W (cid:17)(cid:17) for u ∈ U n .If |M (cid:48) | ≤ |M| , then f b is taken to be the identity map (no binning), and in this case, M = M (cid:48) (cid:16) u , B ( n ) W (cid:17) . Channel codebook
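(The channel codebook is constructed next; first, the quantization-and-binning map just described can be summarized schematically as follows. The helper same_joint_type, which checks membership of (u, w) in the prescribed joint type class, and all other names are hypothetical; the type-class bookkeeping is abstracted away.)

import random

def quantize(u, codebook_W, same_joint_type):
    # Return an index j such that (u, w(j)) lies in the prescribed joint type class,
    # chosen uniformly among all such j; return 0 (the error symbol) if none exists.
    candidates = [j for j, w in codebook_W.items() if j != 0 and same_joint_type(u, w)]
    return random.choice(candidates) if candidates else 0

def make_binning(num_indices, num_bins):
    # Uniform random binning f_B: each index gets an independent uniform bin;
    # the error symbol 0 is always mapped to bin 0.
    f_b = {0: 0}
    for j in range(1, num_indices + 1):
        f_b[j] = random.randint(1, num_bins)
    return f_b

def source_encoder(u, codebook_W, f_b, same_joint_type):
    # Quantize, then bin (f_b is taken to be the identity map when no binning is needed).
    return f_b[quantize(u, codebook_W, same_joint_type)]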
Let B ( n ) X := { X ( m ) ∈ X n , m ∈ M} denote a random channel codebook generated as follows:Without loss of generality (w.l.o.g.), denote the elements of the set S = X as , . . . , |X | . The codeword length n is divided into |S| = |X | blocks, where the length of the first block is (cid:100) P S (1) n (cid:101) , the second block is (cid:100) P S (2) n (cid:101) ,so on so forth, and the length of the last block is chosen such that the total length is n . For i ∈ [ |X | ] , let k i := i − (cid:88) l =1 (cid:100) P S ( l ) n (cid:101) + 1 , ¯ k i := i (cid:88) l =1 (cid:100) P S ( l ) n (cid:101) , where the empty sum is defined to be zero. Let s ∈ X n be such that s ¯ k i k i = i , i.e., the elements of s equal i in the i th block for i ∈ [ |X | ] . Let X (0) = s with probability one, and the remaining codewords X ( m ) , m ∈ M\{ } beconstant composition codewords selected such that X ¯ k i k i ( m ) ∼ Unif (cid:104) T (cid:100) P S ( i ) n (cid:101) (cid:16) ˆ P X | S ( ·| i ) (cid:17)(cid:105) , where ˆ P X | S is suchthat T (cid:100) P S ( i ) n (cid:101) (cid:16) ˆ P X | S ( ·| i ) (cid:17) is non-empty and D ( ˆ P X | S || P X | S | P S ) ≤ η . Denote a realization of B ( n ) X by B ( n ) X := { x ( m ) ∈ X n , m ∈ M} . Note that for m ∈ M\{ } and large n , the codeword pair ( x (0) , x ( m )) has joint type(approx) P x (0) x ( j ) = ˆ P SX := P S ˆ P X | S . Channel encoder
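(A schematic sketch of the constant-composition channel codebook just constructed: the blocklength is split into blocks with lengths proportional to P_S, codeword 0 equals the fixed sequence s, and every other codeword is drawn block by block from a conditional type class. The helper uniform_from_type_class, assumed to return a length-L sequence of the requested composition, and all other names are hypothetical.)

import math, random

def build_channel_codebook(n, P_S, P_X_given_S, messages, uniform_from_type_class):
    # Block lengths ceil(P_S[i]*n), with the last block chosen so the total length is n.
    lengths = [math.ceil(p * n) for p in P_S[:-1]]
    lengths.append(n - sum(lengths))
    # Codeword 0 is the fixed sequence s, equal to i on the i-th block.
    s = [i for i, L in enumerate(lengths) for _ in range(L)]
    codebook = {0: s}
    for m in messages:
        if m == 0:
            continue
        word = []
        for i, L in enumerate(lengths):
            # Constant-composition sub-codeword with composition ~ P_X_given_S[i] on block i.
            word += uniform_from_type_class(L, P_X_given_S[i])
        codebook[m] = word
    return codebook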
For a given B ( n ) X , the channel encoder outputs x = x ( m ) for output m ∈ M from the source encoder. Denote thismap by f B ( n ) X : M → X n .Let f n : U n → P ( X n ) denote the encoder induced by the above operations, i.e., f n ( x | u ) = (cid:26) x = f B ( n ) X (cid:16) f b (cid:16) M (cid:48) (cid:16) u , B ( n ) W (cid:17)(cid:17)(cid:17)(cid:27) . (88) Decision function
The decision function at the decision maker consists of three parts: a channel decoder, a source decoder, and a tester.
Channel decoder
The channel decoder first performs a NP test on the channel output y ∈ Y n according to ˜Π θ : Y n → { , } , where ˜Π θ ( y ) := (cid:32) n (cid:88) k =1 log (cid:18) P Y | X ( y k | s k ) P Y | S ( y k | s k ) (cid:19) ≥ nθ (cid:33) . (89)If ˜Π θ ( y ) = 1 , then ˆ M = 0 . Else, for a given B ( n ) X , maximum likelihood (ML) decoding is done on the remaining setof codewords { x ( m ) , m ∈ M\{ }} , and ˆ M is set equal to the ML estimate. Denote the channel decoder inducedby the above mentioned operations by g B ( n ) X , where g B ( n ) X : Y n → M .For a given codebook B ( n ) X , the channel encoder-decoder pair described above induces a distribution P (cid:16) B ( n ) X (cid:17) XY ˆ M | M ( m, x , y , ˆ m | m ) := (cid:26) f B∗ ( n ) X ( m )= x (cid:27) P ⊗ nY | X ( y | x ) (cid:110) ˆ m = g B ( n ) X (cid:111) . (90)Note that P x (0) x ( m ) = ˆ P SX , Y ∼ (cid:81) |X | i =1 P ⊗(cid:100) P S ( k ) n (cid:101) Y | X ( ·| i ) for M = 0 and Y ∼ (cid:81) |X | i =1 P ⊗(cid:100) P S ( k ) n (cid:101) Y | S ( ·| i ) for M = m (cid:54) =0 . Then, it follows similar to Proposition 1 that for any B ( n ) X and n sufficiently large, the NP test in (89) yields P P ( B ( n ) X ) (cid:16) ˆ M = 0 | M = m (cid:17) ≤ e − n ( E sp ( P SX ,θ ) − η ) , m ∈ M\{ } , (91) and P P ( B ( n ) X ) (cid:16) ˆ M (cid:54) = 0 | M = 0 (cid:17) ≤ e − n ( E sp ( P SX ,θ ) − θ − η ) . (92)Moreover, given ˆ M (cid:54) = 0 , it follows from a random coding argument over the ensemble of B nX (see [31, Exercise10.18, 10.24] and [40]) that there exists a deterministic codebook B ∗ ( n ) X such that (91) and (92) holds, and theML-decoding described above asymptotically yields P P ( B∗ ( n ) X ) (cid:16) ˆ M (cid:54) = m | M = m (cid:54) = 0 , ˆ M (cid:54) = 0 (cid:17) ≤ e − n ( E x ( R,P SX ) − η ) . (93)This deterministic codebook B ∗ ( n ) X is used for channel coding. Source decoder
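(Before turning to the source decoder, the two-stage channel decoder just described can be sketched as follows; the arrays P_Y_given_X and P_Y_given_S and all function names are hypothetical, and the codebook details are abstracted away.)

import numpy as np

def channel_decode(y, s, codebook_X, P_Y_given_X, P_Y_given_S, theta):
    # Stage 1: the threshold test (89); if the accumulated LLR between P_{Y|X}(.|s_k)
    # and P_{Y|S}(.|s_k) reaches n*theta, declare that the special codeword x(0) = s was sent.
    n = len(y)
    llr = sum(np.log(P_Y_given_X[s[k], y[k]] / P_Y_given_S[s[k], y[k]]) for k in range(n))
    if llr >= n * theta:
        return 0
    # Stage 2: maximum-likelihood decoding over the remaining codewords m != 0.
    def log_lik(x):
        return sum(np.log(P_Y_given_X[x[k], y[k]]) for k in range(n))
    return max((m for m in codebook_X if m != 0), key=lambda m: log_lik(codebook_X[m]))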
For a given codebook B ( n ) W and inputs ˆ M = ˆ m and V = v , the source decoder first decodes for the quantizationcodeword w ( ˆ m (cid:48) ) (if required) using the empirical conditional entropy decoder (ECED), and then declares the outputof the hypothesis test ˆ H based on w ( ˆ m (cid:48) ) and v . More specifically, if binning is not performed, i.e., if |M| ≥ |M (cid:48) | , ˆ M (cid:48) = ˆ m . Otherwise, ˆ M (cid:48) = ˆ m (cid:48) , where ˆ m (cid:48) := , if ˆ m = 0 , arg min j : f b ( j )= ˆ m H e ( w ( j ) | v ) , otherwise . Denote the source decoder induced by the above operations by g B ( n ) W : M × V n → M (cid:48) . Testing and Acceptance region A n If ˆ m (cid:48) = 0 , ˆ H = 1 is declared. Otherwise, ˆ H = 0 or ˆ H = 1 is declared depending on whether ( ˆ m (cid:48) , v ) ∈ A n or ( ˆ m (cid:48) , v ) / ∈ A n , respectively, where A n denotes the acceptance region for H as specified below:For a given codebook B ( n ) W , let O m (cid:48) denote the set of u such that the source encoder outputs m (cid:48) , m (cid:48) ∈ M (cid:48) \{ } .For each m (cid:48) ∈ M (cid:48) \{ } and u ∈ O m (cid:48) , let Z m (cid:48) ( u ) = { v ∈ V n : ( w ( m (cid:48) ) , u , v ) ∈ J n ( κ α + η, P W m (cid:48) UV ) } , where J n ( · ) is as defined in (1), and P UV W m (cid:48) := P UV P W m (cid:48) | U and P W m (cid:48) | U = ω ( P u ) . (94)For m (cid:48) ∈ M (cid:48) \{ } , we define Z m (cid:48) := { v : v ∈ Z m (cid:48) ( u ) for some u ∈ O m (cid:48) } . Define the acceptance region for H at the decision maker as A n := ∪ m (cid:48) ∈M (cid:48) \ m (cid:48) × Z m (cid:48) , (95) or equivalently as A en := ∪ m (cid:48) ∈M (cid:48) \ O m (cid:48) × Z m (cid:48) . (96)Note that A n is the same as the acceptance region for H in [13, Theorem 1]. Denote the decision function inducedby g B ( n ) X , g B ( n ) W and A n by g n : Y n × V n → ˆ H . Induced probability distribution
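(Before the induced distributions are written out, here is a minimal sketch of the empirical conditional entropy decoder (ECED) and the de-binning step described above; all names are hypothetical, and the guard against an empty bin is only for robustness of the sketch.)

import math
from collections import Counter

def empirical_conditional_entropy(w, v):
    # H_e(w | v): entropy of the joint type of (w, v) minus the entropy of the type of v.
    n = len(w)
    joint, marg = Counter(zip(w, v)), Counter(v)
    H_joint = -sum(c / n * math.log(c / n) for c in joint.values())
    H_v = -sum(c / n * math.log(c / n) for c in marg.values())
    return H_joint - H_v

def source_decode(m_hat, v, codebook_W, f_b):
    # Invert the binning with the ECED; m_hat = 0 is passed through (H1 will be declared).
    if m_hat == 0:
        return 0
    bin_members = [j for j in codebook_W if j != 0 and f_b[j] == m_hat]
    if not bin_members:
        return 0
    return min(bin_members, key=lambda j: empirical_conditional_entropy(codebook_W[j], v))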
Denote the PMF induced by a code c n = ( f n , g n ) w.r.t. codebook B n := (cid:16) B ( n ) W , f b , B ∗ ( n ) X (cid:17) under H and H by P ( B n ,c n ) UV M (cid:48) M XY ˆ M ˆ M (cid:48) ˆ H ( u , v , m (cid:48) , m, x , y , ˆ m, ˆ m (cid:48) , ˆ h ):= P ⊗ nUV ( u , v ) (cid:110) M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) (cid:111) { f b ( m (cid:48) )= m } P (cid:16) B ( n ) X (cid:17) XY ˆ M | M ( m, x , y , ˆ m | m ) (cid:26) g B ( n ) W ( m, v )= ˆ m (cid:48) (cid:27) (cid:110) ˆ h = { ( ˆ m (cid:48) , v ) ∈A cn } (cid:111) , and Q ( B n ,c n ) UV M (cid:48) M XY ˆ M ˆ M (cid:48) ˆ H ( u , v , m (cid:48) , m, x , y , ˆ m, ˆ m (cid:48) , ˆ h ):= Q ⊗ nUV ( u , v ) (cid:110) M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) (cid:111) { f b ( m (cid:48) )= m } P (cid:16) B ( n ) X (cid:17) XY ˆ M | M ( m, x , y , ˆ m | m ) (cid:26) g B ( n ) W ( m, v )= ˆ m (cid:48) (cid:27) (cid:110) ˆ h = { ( ˆ m (cid:48) , v ) ∈A cn } (cid:111) , respectively. For simplicity of presentation, we will denote P ( B n ,c n ) UV M (cid:48) M XY ˆ M ˆ M (cid:48) ˆ H by P ( B n ) and Q ( B n ,c n ) UV M (cid:48) M XY ˆ M ˆ M (cid:48) ˆ H by Q ( B n ) . Let B n := (cid:16) B ( n ) W , f B , B ∗ ( n ) X (cid:17) , B n , and µ n denote the random codebook, its support, and the probabilitymeasure induced by its random construction, respectively. Also, define ¯ P P ( B n ) := E µ n (cid:2) P P ( B n ) (cid:3) and ¯ P Q ( B n ) := E µ n (cid:2) P Q ( B n ) (cid:3) . Analysis of the type I and type II error probabilities
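(Before the formal error analysis, the hypothetical helpers sketched in the preceding subsections can be composed into a single code (f_n, g_n) as follows; the membership test for the acceptance region A_n is abstracted as acceptance_test, and every name below refers to the earlier sketches, not to the paper's formal definitions.)

def f_n(u, B_W, f_b, B_X, same_joint_type):
    # Observer: quantize + bin (source encoder), then map the bin index to a channel codeword.
    m = source_encoder(u, B_W, f_b, same_joint_type)
    return B_X[m]

def g_n(y, v, s, B_W, B_X, f_b, P_Y_given_X, P_Y_given_S, theta, acceptance_test):
    # Decision maker: channel decoder, then ECED, then the test against A_n.
    m_hat = channel_decode(y, s, B_X, P_Y_given_X, P_Y_given_S, theta)
    m_prime_hat = source_decode(m_hat, v, B_W, f_b)
    if m_prime_hat == 0:
        return 1                                           # declare H1
    return 0 if acceptance_test(m_prime_hat, v) else 1     # is (m'-hat, v) in A_n ?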
We analyze the type I and type II error probabilities averaged over the random ensemble of quantization and binning codebooks ( B_W , f_B ). Then, an expurgation technique [40] guarantees the existence of a sequence of deterministic codebooks { B_n }_{n ∈ N} and a code { c_n = ( f_n , g_n ) }_{n ∈ N} that achieves the lower bound given in Theorem 4.

Type I error probability
In the following, random sets where the randomness is induced due to B n will be written using blackboard boldletters, e.g., A n (and equivalently A en ) for the random acceptance region for H . Note that a type I error can occuronly under the following events:(i) E EE := (cid:83) P ˆ U ∈D n ( P U ,η ) (cid:83) u ∈T n ( P ˆ U ) E EE ( u ) , where E EE ( u ) := (cid:8) (cid:64) j ∈ M (cid:48) \{ } , s.t. ( u , W ( j )) ∈ T n (cid:0) P ˆ U ( i ) ˆ W ( i ) (cid:1) , P ˆ U ( i ) = P u , P ˆ U ( i ) ˆ W ( i ) ∈ D n ( P UW , η ) (cid:9) ; (ii) E NE := { ˆ M (cid:48) = M (cid:48) and ( ˆ M (cid:48) , V ) / ∈ A n } ;(iii) E OCE := { M (cid:48) (cid:54) = 0 , ˆ M (cid:54) = M and ( ˆ M (cid:48) , V ) / ∈ A n } ;(iv) E SCE := { M (cid:48) = M = 0 , ˆ M (cid:54) = M and ( ˆ M (cid:48) , V ) / ∈ A n } ;(v) E BE := { M (cid:48) (cid:54) = 0 , ˆ M = M, ˆ M (cid:48) (cid:54) = M (cid:48) and ( ˆ M (cid:48) , V ) / ∈ A n } .Here, E EE corresponds to the event that there does not exist a quantization codeword corresponding to atleast one sequence u of type P u ∈ D n ( P U , η ) ; E NE corresponds to the event, in which, there is neither an error at the channeldecoder nor at the ECED; E OCE and E SCE corresponds to the case, in which, there is an error at the channel decoder(hence also at the ECED); and, E BE corresponds to the case such that there is an error (due to binning) only at theECED.As we show later in (177), it follows from a generalization of the type-covering lemma [31, Lemma 9.1] that ¯ P P ( B n ) ( E EE ) ≤ e − e n Ω( η ) , (97)where Ω( · ) denotes the Big-omega function. Since e n Ω( η ) n ( n ) −−→ ∞ for η > , the event E EE may be safely ignoredfrom the analysis of the error-exponents. Given E c EE holds for some B ( n ) W , it follows from [13, Equation 4.22] that ¯ P P ( B n ) ( E NE |E c EE ) ≤ e − nκ α , (98)for sufficiently large n since the acceptance region is the same as that in [13, Theorem 1]. Next, consider the event E OCE . We have for sufficiently large n that ¯ P P ( B n ) ( E OCE ) ≤ ¯ P P ( B n ) ( M (cid:48) (cid:54) = 0) ¯ P P ( B n ) (cid:16) ˆ M (cid:54) = M | M (cid:48) (cid:54) = 0 (cid:17) ( a ) ≤ ¯ P P ( B n ) (cid:16) ˆ M (cid:54) = M | M (cid:54) = 0 (cid:17) ≤ ¯ P P ( B n ) (cid:16) ˆ M = 0 | M (cid:54) = 0 (cid:17) + ¯ P P ( B n ) (cid:16) ˆ M (cid:54) = M | M (cid:54) = 0 , ˆ M (cid:54) = 0 (cid:17) ( b ) ≤ e − n ( E m ( P SX ,θ ) − η ) + e − n ( E x ( R,P SX ) − η ) (99) = e − n (min { E m ( P SX ,θ ) , E x ( R,P SX ) }− η ) , (100)where(a) holds since the event { M (cid:48) (cid:54) = 0 } is equivalent to { M (cid:54) = 0 } ;(b) holds due to (91) and (93), which holds for B ∗ ( n ) X .Also, the probability of E SCE can be upper bounded as ¯ P P ( B n ) ( E SCE ) ≤ ¯ P P ( B n ) ( M (cid:48) = 0) ≤ ¯ P P ( B n ) ( M (cid:48) = 0 | U ∈ D n ( P U , η )) + ¯ P P ( B n ) ( U / ∈ D n ( P U , η ))= ¯ P P ( B n ) ( E EE ) + ¯ P P ( B n ) ( U / ∈ D n ( P U , η )) ≤ e − nκ α , (101)where (101) is due to (97), the definition of D n ( P U , η ) in (80) and [31, Lemma 2.2, Lemma 2.6] . Finally, considerthe event E BE . Note that this event occurs only when |M| ≤ |M (cid:48) | . Also, M = 0 iff M (cid:48) = 0 , and hence M (cid:48) (cid:54) = 0 and ˆ M = M implies that ˆ M (cid:54) = 0 . Let D n ( P V W , η ):= (cid:26) P ˆ V ˆ W : ∃ ( w , u , v ) ∈ ∪ m (cid:48) ∈M (cid:48) \{ } J n ( κ α + η, P W m (cid:48) UV ) , P W m (cid:48) UV satisfies (94) and P wuv = P ˆ W ˆ U ˆ V (cid:27) . 
We have ¯ P P ( B n ) ( E BE ) = ¯ P P ( B n ) ( E BE , ( M (cid:48) , V ) ∈ A n ) + ¯ P P ( B n ) ( E BE , ( M (cid:48) , V ) / ∈ A n ) . (102)The second term in (102) can be upper-bounded as ¯ P P ( B n ) (cid:0) E BE , ( M (cid:48) , V ) / ∈ A n (cid:1) ≤ ¯ P P ( B n ) (cid:0) ( M (cid:48) , V ) / ∈ A n , E EE (cid:1) + ¯ P P ( B n ) (cid:0) ( M (cid:48) , V ) / ∈ A n , E c EE (cid:1) ≤ e − e n Ω( η ) + ¯ P P ( B n ) (cid:0) ( M (cid:48) , V ) / ∈ A n |E c EE (cid:1) ≤ e − e n Ω( η ) + ¯ P P ( B n ) (cid:0) ( U , V ) / ∈ A en (cid:1) ≤ e − e n Ω( η ) + e − nκ α , (103)where the inequality in (103) follows from [13, Equation 4.22] for sufficiently large n , since the acceptance region A en is the same as that in [13]. Let D n ( P V , η ) := { P ˆ V : ∃ P ˆ V ˆ W ∈ D n ( P V W , η ) } . Since ( M (cid:48) , V ) ∈ A n implies that M (cid:48) (cid:54) = 0 , the first term in (102) can be bounded as ¯ P P ( B n ) (cid:0) E BE , ( M (cid:48) , V ) ∈ A n (cid:1) = (cid:88) ( m (cid:48) ,m ) ∈M (cid:48) ×M ¯ P P ( B n ) (cid:0) E BE , ( M (cid:48) , V ) ∈ A n , M = m, M (cid:48) = m (cid:48) (cid:1) = (cid:88) ( m (cid:48) ,m ) ∈M (cid:48) ×M ¯ P P ( B n ) (cid:0) M = m, M (cid:48) = m (cid:48) , ˆ M = M (cid:1) ¯ P P ( B n ) (cid:16) ˆ M (cid:48) (cid:54) = M (cid:48) , ( ˆ M (cid:48) , V ) / ∈ A n , ( M (cid:48) , V ) ∈ A n (cid:12)(cid:12) M (cid:48) = m (cid:48) , M = m, ˆ M = M (cid:17) ≤ (cid:88) ( m (cid:48) ,m ) ∈M (cid:48) ×M ¯ P P ( B n ) (cid:0) M = m, M (cid:48) = m (cid:48) , ˆ M = M (cid:1) ¯ P P ( B n ) (cid:16) ˆ M (cid:48) (cid:54) = M (cid:48) , ( M (cid:48) , V ) ∈ A n (cid:12)(cid:12) M (cid:48) = m (cid:48) , M = m, ˆ M = M (cid:17) (104) ( a ) = ¯ P P ( B n ) (cid:16) ˆ M (cid:48) (cid:54) = M (cid:48) , ( M (cid:48) , V ) ∈ A n (cid:12)(cid:12) M (cid:48) = 1 , M = 1 , ˆ M = M (cid:17) (105) ( b ) ≤ (cid:88) P v ∈D n ( P V ,η ) (cid:88) v ∈ P v ¯ P P ( B n ) ( V = v | M (cid:48) = 1)¯ P P ( B n ) (cid:16) ∃ j ∈ f − B (1) , j (cid:54) = 1 , H e ( W ( j ) | v ) ≤ H e ( W (1) | v ) (cid:12)(cid:12) M (cid:48) = 1 , V = v (cid:17) , (106)where(a) follows since by the symmetry of the source encoder, binning function and random codebook construction, theterm in (104) is independent of ( m, m (cid:48) ) ;(b) holds since ( M (cid:48) , V ) ∈ A n implies that P v ∈ D n ( P V , η ) and ( V , B W ) − M (cid:48) − ( M, ˆ M ) form a Markov chain. 
Defining P ˆ V = P v , and the event E (cid:48) := { M (cid:48) = 1 , V = v } , we have ¯ P P ( B n ) (cid:0) ∃ j ∈ f − B (1) , j (cid:54) = 1 , H e ( W ( j ) | v ) ≤ H e ( W (1) | v ) (cid:12)(cid:12) E (cid:48) (cid:1) = (cid:88) j ∈M (cid:48) \{ , } ¯ P P ( B n ) (cid:0) f B ( j ) = 1 , H e ( W ( j ) | v ) ≤ H e ( W (1) | v ) (cid:12)(cid:12) E (cid:48) (cid:1) ( a ) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } ¯ P P ( B n ) (cid:0) H e ( W ( j ) | v ) ≤ H e ( W (1) | v ) (cid:12)(cid:12) E (cid:48) (cid:1) (107) ( b ) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } (cid:88) P ˆ W : P ˆ V ˆ W ∈D n ( P V W ,η ) (cid:88) w :( v , w ) ∈T n ( P ˆ V ˆ W ) ¯ P P ( B n ) (cid:0) W (1) = w (cid:12)(cid:12) E (cid:48) (cid:1)(cid:88) ˜ w ∈T n ( P ˆ W ) H e ( ˜ w | v ) ≤ H ( ˆ W | ˆ V ) ¯ P P ( B n ) (cid:0) W ( j ) = ˜ w (cid:12)(cid:12) E (cid:48) ∪ { W (1) = w } (cid:1) (108) ( c ) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } (cid:88) P ˆ W : P ˆ V ˆ W ∈D n ( P V W ,η ) (cid:88) w :( v , w ) ∈T n ( P ˆ V ˆ W ) ¯ P P ( B n ) (cid:0) W (1) = w (cid:12)(cid:12) E (cid:48) (cid:1)(cid:88) ˜ w ∈T n ( P ˆ W ): H e ( ˜ w | v ) ≤ H ( ˆ W | ˆ V ) P P ( B n ) ( W ( j ) = ˜ w ) , (109)where(a) follows since f B ( · ) is the uniform binning function independent of B ( n ) W ;(b) holds due to the fact that if P v ∈ D n ( P V , η ) , then M (cid:48) = 1 implies that ( W (1) , v ) ∈ T n ( P ˆ V ˆ W ) with probabilityone for some P ˆ V ˆ W ∈ D n ( P V W , η ) ;(c) holds since ¯ P P ( B n ) (cid:0) W ( j ) = ˜ w (cid:12)(cid:12) E (cid:48) ∪ { W (1) = w } (cid:1) ≤ P P ( B n ) ( W ( j ) = ˜ w ) , (110)which will be shown in Appendix A.Continuing, we can write for sufficiently large n , ¯ P P ( B n ) (cid:0) ∃ j ∈ f − B (1) , j (cid:54) = 1 , H e ( W ( j ) | v ) ≤ H e ( W (1) | v ) (cid:12)(cid:12) E (cid:48) (cid:1) ( a ) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } (cid:88) P ˆ W : P ˆ V ˆ W ∈D n ( P V W ,η ) (cid:88) w :( v , w ) ∈T n ( P ˆ V ˆ W ) ¯ P P ( B n ) (cid:0) W (1) = w (cid:12)(cid:12) E (cid:48) (cid:1)(cid:88) ˜ w ∈T n ( P ˆ W ): H e ( ˜ w | v ) ≤ H ( ˆ W | ˆ V ) e − n ( H ( ˆ W ) − η ) (111) ( b ) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } (cid:88) P ˆ W : P ˆ V ˆ W ∈D n ( P V W ,η ) (cid:88) w :( v , w ) ∈T n ( P ˆ V ˆ W ) ¯ P P ( B n ) (cid:0) W (1) = w (cid:12)(cid:12) E (cid:48) (cid:1) ( n + 1) |V||W| e nH ( ˆ W | ˆ V ) e − n ( H ( ˆ W ) − η ) (112) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } (cid:88) P ˆ W : P ˆ V ˆ W ∈D n ( P V W ,η ) n + 1) |V||W| e − n ( I ( ˆ W ; ˆ V ) − η ) ( c ) ≤ e nR n (cid:88) j ∈M (cid:48) \{ , } n + 1) |W| ( n + 1) |V||W| e − n (cid:32) min P ˆ V ˆ W ∈D n ( PV W ,η ) I ( ˆ W ; ˆ V ) − η (cid:33) (113) ( d ) ≤ e − n ( R − R (cid:48) + ρ n − η (cid:48) n ) , (114)where ρ n := min P ˆ V ˆ W ∈D n ( P V W ,η ) I ( ˆ V ; ˆ W ) and η (cid:48) n := 3 η + |W| ( |V| + 1) log( n + 1) n + log(2) n + |U| log( n + 1) n . In the above(a) used [31, Lemma 2.3] and the fact that the codewords are chosen uniformly at random from T n ( P ˆ W ) ;(b) follows since the total number of sequences ˜ w ∈ T n ( P ˆ W ) such that P ˜ wv = P ˜ W ˜ V and H ( ˜ W | ˜ V ) ≤ H ( ˆ W | ˆ V ) is upper bounded by e nH ( ˆ W | ˆ V ) , and |T ( W n × V n ) | ≤ ( n + 1) |V||W| ;(c) holds due to [31, Lemma 2.2];(d) follows from (78), (79), (86) and (87).Thus, for sufficiently large n , since ρ n → ρ ( κ α , ω ) + O ( η ) , we have from (102), (103), (106), (114) that ¯ P P ( B n ) ( E BE ) ≤ e − n (min { κ α ,R − ζ ( κ α ,ω )+ ρ ( κ α ,ω ) − O ( η ) } ) . 
(115)
By choice of ( ω, P_SX , θ ) ∈ L ( κ_α ), it follows from (97), (98), (100), (101) and (115) that the type I error probability is upper bounded by e^{−n( κ_α − O(η) )} for large n.

Type II error probability
We analyze the type II error probability averaged over B n . A type II error can occur only under the followingevents:(i) E a := ˆ M = M, ˆ M (cid:48) = M (cid:48) (cid:54) = 0 , ( U , V , W ( M (cid:48) )) ∈ T n (cid:0) P ˆ U ˆ V ˆ W (cid:1) s.t. P ˆ U ˆ W ∈ D n ( P UW , η ) and P ˆ V ˆ W ∈ D n ( P V W , η ) , (ii) E b := M (cid:48) (cid:54) = 0 , ˆ M = M, ˆ M (cid:48) (cid:54) = M (cid:48) , f B ( ˆ M (cid:48) ) = f B ( M (cid:48) ) , ( U , V , W ( M (cid:48) ) , W ( ˆ M (cid:48) )) ∈T n (cid:16) P ˆ U ˆ V ˆ W ˆ W d (cid:17) s.t. P ˆ U ˆ W ∈ D n ( P UW , η ) , P ˆ V ˆ W d ∈ D n ( P V W , η ) , and H e (cid:16) W ( ˆ M (cid:48) ) | V (cid:17) ≤ H e ( W ( M (cid:48) ) | V ) , (iii) E c := M (cid:48) (cid:54) = 0 , ˆ M (cid:54) = M or , ( U , V , W ( M (cid:48) ) , W ( ˆ M (cid:48) )) ∈ T n (cid:16) P ˆ U ˆ V ˆ W ˆ W d (cid:17) s.t. P ˆ U ˆ W ∈ D n ( P UW , η ) and P ˆ V ˆ W d ∈ D n ( P V W , η ) , (iv) E d := (cid:110) M = M (cid:48) = 0 , ˆ M (cid:54) = M, ( V , W ( ˆ M (cid:48) )) ∈ T n (cid:16) P ˆ V ˆ W d (cid:17) s.t. P ˆ V ˆ W d ∈ D n ( P V W , η ) (cid:111) .Similar to (97), it follows that ¯ P Q ( B n ) ( E EE ) ≤ e − e n Ω( η ) . (116) Hence, we may assume that E c EE holds for the type II error-exponent analysis. It then follows from the analysis in[13, Eq. 4.23-4.27] that for sufficiently large n , we have ¯ P Q ( B n ) ( E a |E c EE ) ≤ e − n ( E ( κ α ,ω ) − O ( η )) . (117)Next, consider event E b . Note that this event occurs only when |M| < |M (cid:48) | . Let F ,n ( η ) : = { P ˜ U ˜ V ˜ W ˜ W d ∈ T ( U n × V n × W n × W n ) : P ˜ U ˜ W ∈ D n ( P UW , η ) , P ˜ V ˜ W d ∈ D n ( P V W , η ) and H ( ˜ W d | ˜ V ) ≤ H ( ˜ W | ˜ V ) } . Then, we can write ¯ P Q ( B n ) ( E b ) ≤ (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w , ¯ w ) ∈T n (cid:16) P ˜ U ˜ V ˜ W ˜ Wd (cid:17) (cid:88) m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w ) (cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w , f B ( m (cid:48) ) = f B ( ˆ m (cid:48) ) | U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w ) . (118)The first term in (118) can be written as ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w )= ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = m (cid:48) ) ¯ P Q ( B n ) ( W ( m (cid:48) ) = w | U = u , V = v , M (cid:48) = m (cid:48) ) . (119)Let P ˜ U ˜ V ˜ W ˜ W d denote the joint type of ( u , v , w ( m (cid:48) ) , w ( ˆ m (cid:48) )) for m (cid:48) , ˆ m (cid:48) (cid:54) = 0 . Note that m (cid:48) (cid:54) = 0 and U = u impliesthat P ˜ U ˜ W ∈ D n ( P UW ) . Hence, we can bound the second term in (118) for sufficiently large n as ¯ P Q ( B n ) ( W ( m (cid:48) ) = w | U = u , V = v , M (cid:48) = m (cid:48) ) ≤ e n ( H ( ˜ W | ˜ U ) − η ) , if w ∈ T n ( P ˜ W ) , , otherwise . (120)To obtain (120), we used the facts that given M (cid:48) = m (cid:48) (cid:54) = 0 and U = u , W ( m (cid:48) ) is uniformly distributed in the set T n ( P ˜ W | ˜ U , u ) , and that for sufficiently large n , |T n ( P ˜ W | ˜ U , u ) | ≥ e n ( H ( ˜ W | ˜ U ) − η ) , which in turn follows from [31, Lemma 2.5]. 
On the other hand, the second term in (118) can be bounded asfollows: ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w , f B ( m (cid:48) ) = f B ( ˆ m (cid:48) ) | U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w ) ( a ) ≤ e nR n ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w | U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w ) (121) ( b ) ≤ e nR n P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w ) , (122) where(a) is since f B ( m (cid:48) ) is uniformly distributed independent of the codebook B W ;(b) is due to ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w | U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w ) ≤ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w ) . (123)whose proof is similar to (110).Thus, from (120) and (122), we can bound the term in (118) (for sufficiently large n ) as ¯ P Q ( B n ) ( E b ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w , ¯ w ) ∈T n (cid:16) P ˜ U ˜ V ˜ W ˜ Wd (cid:17) (cid:88) m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = m (cid:48) ) 1 e n ( H ( ˜ W | ˜ U ) − η ) (cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w , ¯ w ) ∈T n (cid:16) P ˜ U ˜ V ˜ W ˜ Wd (cid:17) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˜ U ) − η ) (cid:88) m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( M (cid:48) = m (cid:48) | U = u , V = v ) (cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w , ¯ w ) ∈T n (cid:16) P ˜ U ˜ V ˜ W ˜ Wd (cid:17) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˜ U ) − η ) (cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w , ¯ w ) ∈T n (cid:16) P ˜ U ˜ V ˜ W ˜ Wd (cid:17) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˜ U ) − η ) e n ( R (cid:48) + η ) e n ( H ( ˜ W d ) − η ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w ) ∈T n ( P ˜ U ˜ V ˜ W ) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˜ U ) − η ) e n ( R (cid:48) + η ) e n ( H ( ˜ W d ) − η ) e nH ( ˜ W d | ˜ V ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) e nH ( ˜ U, ˜ V, ˜ W ) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˜ U ) − η ) e n ( R (cid:48) + η ) e n ( H ( ˜ W d ) − η ) e nH ( ˜ W d | ˜ V ) ≤ e − n E ,n , where E ,n := min P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) − H ( ˜ U, ˜ V, ˜ W ) + H ( ˜ U, ˜ V ) + D ( P ˜ U ˜ V || Q UV ) + H ( ˜ W | ˜ U ) + I ( ˜ V ; ˜ W d )+ R − R (cid:48) − η − δ (cid:48) n ,δ (cid:48) n := |U||V||W| n log( n + 1) + |U| n log( n + 1) + log 2 n . (124) Note that since P ˜ U ˜ V ˜ W ˜ W d ∈ F ,n ( η ) implies that P ˜ V ˜ W d ∈ D n ( P V W , η ) , we have E ,n ≥ min P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) − H ( ˜ U, ˜ V, ˜ W ) + H ( ˜ U, ˜ V ) + D ( P ˜ U ˜ V || Q UV ) + H ( ˜ W | ˜ U ) + ρ n + R − R (cid:48) − η − δ (cid:48) n . (125)Simplifying the terms in (125) and using ρ n ( n ) −−→ ρ ( κ α , ω ) + O ( η ) , we obtain by the continuity of KL divergencethat − n log (cid:0) ¯ P Q ( B n ) ( E b ) (cid:1) (cid:38) min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) + E b ( κ α , ω, R ) − O ( η ) , if R < ζ ( κ α , ω ) + η, ∞ , otherwise , = E ( κ α , ω, R ) − O ( η ) . (126)Next, consider the event E c . Assume that |M (cid:48) | > |M| (i.e., binning is required). 
Let F ,n ( η ) := { P ˜ U ˜ V ˜ W ˜ W d ∈ T ( U n × V n × W n × W n ) : P ˜ U ˜ W ∈ D n ( P UW , η ) and P ˜ V ˜ W d ∈ D n ( P V W , η ) } . Then, we can write (for sufficiently large n ) that ¯ P Q ( B n ) ( E c ) ≤ (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) (cid:88) ( u , v , w , ¯ w ) ∈T n (cid:16) P ˜ U ˜ V ˜ W ˜ Wd (cid:17) (cid:88) m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = m (cid:48) , W ( M (cid:48) ) = w ) (cid:88) m (cid:54) =0 , ˆ m (cid:54) =0:ˆ m (cid:54) = m ¯ P Q ( B n ) ( M = m ) P (cid:16) ˆ M = ˆ m | M = m (cid:17) (cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( W ( ˆ m (cid:48) ) = ¯ w , f B ( ˆ m (cid:48) ) = ˆ m | U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = w ) ≤ e nR n (cid:88) P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) e nH ( ˜ U, ˜ V, ˜ W ) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˜ U ) − η ) e n ( R (cid:48) + η ) e n ( H ( ˜ W d ) − η ) e nH ( ˜ W d | ˜ V ) e − n ( E x ( R,P SX ) − η ) (127) ≤ e − n E ,n , where E ,n := min P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) − H ( ˜ U, ˜ V, ˜ W ) + H ( ˜ U, ˜ V ) + D ( P ˜ U ˜ V || Q UV ) + H ( ˜ W | ˜ U ) + E x ( R, P SX )+ ρ n + R − R (cid:48) − O ( η ) − δ (cid:48) n , and δ (cid:48) n is as defined in (124). To obtain (127), we used (93), (120) and (122).On the other hand, if |M (cid:48) | ≤ |M| , it can be shown similarly that ¯ P Q ( B n ) ( E c ) ≤ e − n E (cid:48) ,n , where E (cid:48) ,n := min P ˜ U ˜ V ˜ W ˜ Wd ∈F ,n ( η ) − H ( ˜ U, ˜ V, ˜ W ) + H ( ˜ U, ˜ V ) + D ( P ˜ U ˜ V || Q UV ) + H ( ˜ W | ˜ U ) + E x ( R, P SX )+ ρ n − O ( η ) − |U||V||W| n log( n + 1) − log(2) n . Hence, we obtain − n log (cid:0) ¯ P Q ( B n ) ( E c ) (cid:1) (cid:38) min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) + E b ( κ α , ω, R ) + E x ( R, P SX ) − O ( η ) , if R < ζ ( κ α , ω ) + η, min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) + ρ ( κ α , ω ) + E x ( R, P SX ) − O ( η ) , otherwise , = E ( κ α , ω, R, P SX ) − O ( η ) . (128)Finally, we consider the event E d . Consider |M (cid:48) | > |M| . We have ¯ P Q ( B n ) ( E d ) = (cid:88) u ∈T n ( P ˜ U ): P ˜ U ∈D n ( P U ,η ) ¯ P Q ( B n ) ( U = u , E EE , E d ) + (cid:88) u ∈T n ( P ˜ U ): P ˜ U / ∈D n ( P U ,η ) ¯ P Q ( B n ) ( U = u , E d ) , (129)where (129) follows from the fact that if P ˜ U ∈ D n ( P U , η ) , then E d can occur only if E EE occurs. From (97), forany u ∈ T n ( P ˜ U ) such that P ˜ U ∈ D n ( P U , η ) , we have ¯ P Q ( B n ) ( U = u , E EE , E d ) ≤ e − e n Ω( η ) . 
Next, note that if P ˜ U / ∈ D n ( P U , η ) , then M (cid:48) = 0 is chosen with probability one independent of the codebook B ( n ) W .Hence, we can write the second term in (129) as follows: (cid:88) u ∈T n ( P ˜ U ): P ˜ U / ∈D n ( P U ,η ) ¯ P Q ( B n ) ( U = u , E d ) ≤ (cid:88) u ∈T n ( P ˜ U ): P ˜ U / ∈D n ( P U ,η ) (cid:88) ( v , ¯ w ) ∈T n ( P ˜ V ˜ Wd ): P ˜ V ˜ Wd ∈D n ( P V W ,η ) (cid:88) ˆ m ∈M\{ } ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = M = 0) P (cid:16) ˆ M = ˆ m | M = 0 (cid:17)(cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } ¯ P Q ( B n ) ( f B ( ˆ m (cid:48) ) = ˆ m, W ( ˆ m (cid:48) ) = ¯ w ) ≤ (cid:88) u ∈T n ( P ˜ U ): P ˜ U / ∈D n ( P U ,η ) (cid:88) ( v , ¯ w ) ∈T n ( P ˜ V ˜ Wd ): P ˜ V ˜ Wd ∈D n ( P V W ,η ) ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = M = 0) (cid:88) ˆ m ∈M\{ } ¯ P Q ( B n ) (cid:16) ˆ M = ˆ m | M = 0 (cid:17) (cid:88) ˆ m (cid:48) ∈M (cid:48) \{ } e nR n e n ( H ( ˜ W d ) − η ) ≤ (cid:88) u ∈T n ( P ˜ U ): P ˜ U / ∈D n ( P U ,η ) (cid:88) ( v , ¯ w ) ∈T n ( P ˜ V ˜ Wd ): P ˜ V ˜ Wd ∈D n ( P V W ,η ) ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = M = 0) (cid:88) ˆ m ∈M\{ } ¯ P Q ( B n ) (cid:16) ˆ M = ˆ m | M = 0 (cid:17) e n ( R (cid:48) + η ) e nR n e n ( H ( ˜ W d ) − η ) ≤ (cid:88) u ∈T n ( P ˜ U ): P ˜ U / ∈D n ( P U ,η ) (cid:88) ( v , ¯ w ) ∈T n ( P ˜ V ˜ Wd ): P ˜ V ˜ Wd ∈D n ( P V W ,η ) ¯ P Q ( B n ) ( U = u , V = v , M (cid:48) = M = 0) e − n ( E sp ( P SX ,θ ) − θ − η ) e n ( R (cid:48) + η ) e nR n e n ( H ( ˜ W d ) − η ) (130) ≤ (cid:88) P ˜ U ˜ V ˜ Wd ∈D n ( P U ,η ) c ×D n ( P V W ,η ) e nH ( ˜ U ˜ V ) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) e − n ( E sp ( P SX ,θ ) − θ − η ) e n ( R (cid:48) + η ) e nR n e nH ( ˜ W d | ˜ V ) e n ( H ( ˜ W d ) − η ) ≤ e − n E ,n , where E ,n := min P ˜ U ˜ V ˜ Wd ∈D n ( P U ,η ) c ×D n ( P V W ,η ) D ( P ˜ U ˜ V || Q UV ) + E sp ( P SX , θ ) − θ + ρ n + R − R (cid:48) − O ( η ) − |U||V||W| n log( n + 1) ≥ min P ˜ V : P ˜ V ˜ W ∈D n ( P V W ,η ) D ( P ˜ V || Q V ) + E sp ( P SX , θ ) − θ + ρ n + R − R (cid:48) − O ( η ) − |U||V||W| n log( n + 1) . In (130), we used (92).If |M (cid:48) | ≤ |M| , it can be shown that ¯ P Q ( B n ) ( E c ) ≤ e − n E (cid:48) ,n , where E (cid:48) ,n ≥ min P ˜ V : P ˜ V ˜ W ∈D n ( P V W ,η ) D ( P ˜ V || Q V ) + E sp ( P SX , θ ) − θ + ρ n − O ( η ) − |U||V||W| n log( n + 1) . Hence, we obtain − n log (cid:0) ¯ P Q ( B n ) ( E d ) (cid:1) (cid:38) min P ˜ V : P ˜ V ˜ W ∈D n ( P V W ,η ) D ( P ˜ V || Q V ) + E b ( κ α , ω, R ) + E sp ( P SX , θ ) − θ − O ( η ) if R < ζ ( κ α , ω ) + η, min P ˜ V : P ˜ V ˜ W ∈D n ( P V W ,η ) D ( P ˜ V || Q V ) + ρ ( κ α , ω ) + E sp ( P SX , θ ) − θ − O ( η ) , otherwise , = E ( κ α , ω, R, P SX , θ ) − O ( η ) . (131)Since the exponent of the type II error probability is lower bounded by the minimum of the exponent of the type II error causing events, we have shown from (117), (126), (128) and (131) that for a fixed ( ω, R, P SX , θ ) ∈ L ( κ α ) , ¯ P P ( B n ) (cid:16) ˆ H = 1 (cid:17) ≤ e − n ( κ α − O ( η )) , (132)and ¯ P Q ( B n ) (cid:16) ˆ H = 0 (cid:17) ≤ e − n (¯ κ s ( κ α ,ω,R,P SX ,θ ) − O ( η )) , (133)for all sufficiently large n , where ¯ κ s ( κ α , ω, R, P SX , θ ) := min (cid:8) E ( κ α , ω ) , E ( κ α , ω, R ) , E ( κ α , ω, R, P SX ) , E ( κ α , ω, R, P SX , θ ) (cid:9) . (134) Expurgation:
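The expurgation step detailed below is, at its core, an application of the following standard Markov-inequality estimate (stated with generic placeholder symbols, not the exact constants used in the proof):

% If the ensemble-average error probability satisfies E_{mu_n}[ P_e(B_n) ] <= eps_n,
% then by Markov's inequality the mu_n-measure of codebooks with P_e(B_n) > 2*eps_n
% is at most 1/2; keeping only the complementary set therefore at most doubles the bound,
% i.e., it costs a factor of 2 = e^{n (log 2)/n}, which is negligible in the exponent as n grows.
\mathbb{E}_{\mu_n}\!\left[P_e(\mathcal{B}_n)\right] \le \epsilon_n
\quad\Longrightarrow\quad
\mu_n\!\left(\left\{\mathcal{B}_n : P_e(\mathcal{B}_n) > 2\epsilon_n\right\}\right) \le \frac{1}{2}.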
To complete the proof, we extract a deterministic codebook B ∗ n that satisfies P P ( B∗ n ) (cid:16) ˆ H = 1 (cid:17) ≤ e − n ( κ α − O ( η )) , (135)and P Q ( B∗ n ) (cid:16) ˆ H = 0 (cid:17) ≤ e − n (¯ κ s ( κ α ,ω,R,P SX ,θ ) − O ( η )) . (136)For this purpose, remove a set B (cid:48) n ⊂ B n of highest type I error probability codebooks such that the remaining set B n \ B (cid:48) n has a probability of τ ∈ (0 . , . , i.e., µ n ( B n \ B (cid:48) n ) = τ . Then, it follows from (132) and (133) that forall B n ∈ B n \ B (cid:48) n , P P ( B n ) (cid:16) ˆ H = 1 (cid:17) ≤ e − n ( κ α − O ( η )) , and ˜ P Q ( B n ) (cid:16) ˆ H = 0 (cid:17) ≤ e − n (¯ κ s ( κ α ,ω,R,P SX ,θ ) − O ( η )) , where ˜ P Q ( B n ) = τ E µ n (cid:104) P Q B n { B n ∈ B n \ B (cid:48) n } (cid:105) is a PMF.Perform one more similar expurgation step to obtain B ∗ n = (cid:16) B ∗ ( n ) W , f ∗ b , B ∗ ( n ) X (cid:17) ∈ B n \ B (cid:48) n such that for allsufficiently large n P P ( B∗ n ) (cid:16) ˆ H = 1 (cid:17) ≤ e − n ( κ α − O ( η )) ≤ e − n ( κ α − O ( η ) − log(2) n ) , and P Q ( B∗ n ) (cid:16) ˆ H = 0 (cid:17) ≤ e − n (¯ κ s ( κ α ,ω,R,P SX ,θ ) − O ( η )) ≤ e − n ( ¯ κ s ( κ α ,ω,R,P SX ,θ ) − O ( η ) − log(4) n ) . Maximizing over ( ω, R, P SX , θ ) ∈ L ( κ α ) and noting that η > is arbitrary completes the proof. D. Proof of Corollary 1
Note that
$$\hat{\mathcal{L}}(0, \omega) = \{ P_{UVW} = P_{UV} P_{W|U},\; P_{W|U} = \omega(P_U) \}, \quad \zeta(0, \omega) = I_P(U; W), \quad \rho(0, \omega) = I_P(V; W).$$
The result then follows from Theorem 4 by noting that $\hat{\mathcal{L}}(\kappa_\alpha, \omega)$, $\zeta(\kappa_\alpha, \omega)$ and $\rho(\kappa_\alpha, \omega)$ are continuous in $\kappa_\alpha$, and the fact that $E_{sp}(P_{SX}, \theta)$, $E_x(R, P_{SX})$ and $E_b(\kappa_\alpha, \omega, R)$ are all greater than or equal to zero.

E. Proof of Corollary 2
Consider ( ω, P SX , θ ) ∈ L ∗ ( κ α ) and R = ζ ( κ α , ω ) . Then, ( ω, R, P SX , θ ) ∈ L ( κ α ) . Also, for any ( P ˜ U ˜ V ˜ W , Q ˜ U ˜ V ˜ W ) ∈T ( κ α , ω ) , we have D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) = D ( P ˜ U ˜ W || Q ˜ U ˜ W ) + D (cid:16) P ˜ V | ˜ U ˜ W || Q ˜ V | ˜ U ˜ W | P ˜ U ˜ W (cid:17) ( a ) ≥ D (cid:16) P ˜ V | ˜ U ˜ W || P V | P ˜ U ˜ W (cid:17) (137) = D ( P ˜ V ˜ U ˜ W || P V P ˜ U ˜ W ) ( b ) ≥ D ( P ˜ V ˜ W || P V P ˜ W ) (138) ( c ) = D (cid:0) P ˆ V ˆ W || P V P ˆ W (cid:1) (139) = I P ( ˆ V ; ˆ W ) + D ( P ˆ V || P V ) , (140)where(a) is due to the non-negativity of KL divergence and since Q ˜ V | ˜ U ˜ W = P V ;(b) is due to the monotonicity of KL divergence [38, Theorem 2.2];(c) follows since for ( P ˜ U ˜ V ˜ W , Q ˜ U ˜ V ˜ W ) ∈ T ( κ α , ω ) , P ˜ V ˜ W = P ˆ V ˆ W for some P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α , ω ) .Minimizing over all P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α , ω ) yields that E ( κ α , ω ) = min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) ≥ min P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α ,ω ) (cid:104) I P ( ˆ V ; ˆ W ) + D ( P ˆ V || P V ) (cid:105) (141) = min P ˆ V ˆ W : P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α ,ω ) (cid:104) I P ( ˆ V ; ˆ W ) + D ( P ˆ V || P V ) (cid:105) := E I ( κ α , ω ) , where (141) follows from (140). Next, since ζ ( κ α , ω ) = R , we have that E ( κ α , ω, R ) = ∞ . Also, E ( κ α , ω, R, P SX ) = min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) + ρ ( κ α , ω ) + E x ( R, P SX ) ≥ ρ ( κ α , ω ) + E x ( ζ ( κ α , ω ) , P SX ) := E I ( κ α , ω, P SX ) , (142)and E ( κ α , ω, P SX , θ ) := min P ˆ V : P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α ,ω ) D ( P ˆ V || P V ) + ρ ( κ α , ω ) + E m ( P SX , θ ) − θ = ρ ( κ α , ω ) + E m ( P SX , θ ) − θ := E I ( κ α , ω, P SX , θ ) , (143) where (142) follows by the non-negativity of KL divergence, and (143) is since P UV P W | U ∈ ˆ L ( κ α , ω ) for P W | U := ω ( P U ) . The claim in (39) now follows from Theorem 4.Next, we prove (41). Note that ˆ L (0 , ω ) := { P UV W = P UV P W | U : P W | U = ω ( P U ) } , and L ∗ (0) := { ( ω, P SX , θ ) ∈ F × P ( S × X ) × Θ( P SX ) : I P ( U ; W ) < I P ( X ; Y | S ) , (144) P W | U = ω ( P U ) , P SXY := P SX P Y | X } , since E sp ( P SX , θ ) ≥ and E x ( I P ( U ; W ) , P SX ) ≥ . Hence, we have E I (0 , ω ) ≥ min P ˆ U ˆ V ˆ W ∈ ˆ L (0 ,ω ) I P ( ˆ V ; ˆ W ) = I P ( V ; W ) . (145)Also, ρ (0 , ω ) = I P ( V ; W ) , E I (0 , ω, P SX ) ≥ ρ (0 , ω ) , (146) E I (0 , ω, P SX , θ ) ≥ ρ (0 , ω ) . (147)By choosing P XS = P ∗ X P S where P ∗ X is the capacity achieving input distribution, we have I P ( X ; Y | S ) = C .Then, it follows from (39), (145)-(147), and the continuity of E I ( κ α , ω ) , E I ( κ α , ω, P SX ) and E I ( κ α , ω, P SX , θ ) in κ α that lim κ α → κ ( κ α ) ≥ κ ∗ I (0) . (148)On the other hand, lim κ α → κ ( κ α ) ≤ κ ∗ I (0) (149)follows from the converse proof in [10, Proposition 7 ]. The proof of the cardinality bound |W| ≤ |U| + 1 followsfrom a standard application of the Eggleston-Fenchel-Carath´eodory Theorem [44, Theorem 18], thus completingthe proof. F. Proof of Corollary 3
Specializing Theorem 4 to TAD, note that ρ ( κ α , ω ) = 0 since P ˆ U ˆ V ˆ W = Q U Q V P ˆ W | ˆ U ∈ ˆ L ( κ α , ω ) and I P ( ˆ V ; ˆ W ) = 0 . Also, for R ≥ ζ ( κ α , ω ) , E b ( κ α , ω, R ) = ∞ . Thus, we have L ( κ α ) := ( ω, R, P SX , θ ) ∈ F × R ≥ × P ( S × X ) × Θ( P SX ) : ζ ( κ α , ω ) ≤ R < I P ( X ; Y | S ) ,P SXY := P SX P Y | X , min { E sp ( P SX , θ ) , E x ( R, P SX ) } ≥ κ α , ˆ L ( κ α , ω ) := (cid:110) P ˆ U ˆ V ˆ W : D (cid:0) P ˆ U ˆ V ˆ W || P UV ˆ W (cid:1) ≤ κ α , P ˆ W | ˆ U = ω ( P ˆ U ) , P UV ˆ W = Q U Q V P ˆ W | ˆ U (cid:111) . We have E ( κ α , ω ) := E D ( κ α , ω ) := min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) ( a ) ≥ min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ V ˜ W || Q ˜ V ˜ W ) , ( b ) = min ( P ˆ V ˆ W ,Q V ˆ W ): P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α ,ω ) ,Q UV ˆ W = Q UV P ˆ W | ˆ U D ( P ˆ V ˆ W || Q V ˆ W ) , (150)where(a) follows due to the data processing inequality for KL divergence [38];(b) is since ( P ˜ U ˜ V ˜ W , Q ˜ U ˜ V ˜ W ) ∈ T ( κ α , ω ) implies that P ˜ V ˜ W = P ˆ V ˆ W and Q ˜ U ˜ V ˜ W = Q UV P ˆ W | ˆ U for some P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α , ω ) .Next, note that since R ≥ ζ ( κ α , ω ) , we have that E ( κ α , ω, R ) = ∞ . Also, E ( κ α , ω, R, P SX ) = min ( P ˜ U ˜ V ˜ W ,Q ˜ U ˜ V ˜ W ) ∈T ( κ α ,ω ) D ( P ˜ U ˜ V ˜ W || Q ˜ U ˜ V ˜ W ) + E x ( R, P SX ) (151) ( a ) = E x ( R, P SX ) , (152) E ( κ α , ω, P SX , θ ) = min P ˆ V : P ˆ U ˆ V ˆ W ∈ ˆ L ( κ α ,ω ) D ( P ˆ V || Q V ) + E m ( P SX , θ ) − θ ( b ) = E m ( P SX , θ ) − θ := E D ( P SX , θ ) , (153)where(a) follows by taking P ˆ U ˆ V ˆ W = Q U Q V P W | U ∈ ˆ L ( κ α , ω ) , P W | U = ω ( Q U ) , in the definition of T ( κ α , ω ) . Thisimplies that ( P ˜ U ˜ V ˜ W , Q ˜ U ˜ V ˜ W ) = ( Q UV P W | U , Q UV P W | U ) ∈ T ( κ α , ω ) , and hence that the first term in theRHS of (151) is ;(b) is due to Q U Q V P W | U ∈ ˆ L ( κ α , ω ) for P W | U = ω ( Q U ) .Since E x ( R, P SX ) is a non-increasing function of R and R ≥ ζ ( κ α , ω ) , selecting R = ζ ( κ α , ω ) maximizes E ( κ α , ω, R, P SX ) . Then, (42) follows from (150), (152) and (153).Next, we prove (43). Note that ζ (0 , ω ) = I Q ( U ; W ) , where Q UW = Q U P W | U , P W | U = ω ( Q U ) , and since E sp ( P SX , θ ) ≥ and E x ( I Q ( U ; W ) , P SX ) ≥ , L ∗ (0) = ( ω, P SX , θ ) ∈ F × P ( S × X ) × Θ( P SX ) : I Q ( U ; W ) < I P ( X ; Y | S ) , Q UV W = Q UV P W | U ,P W | U = ω ( Q U ) , P SXY := P SX P Y | X . Also, ˆ L (0 , ω ) = (cid:110) Q U Q V P W | U , P W | U = ω ( Q U ) (cid:111) . (154) By choosing θ = − θ L ( P SX ) (defined in (29)) that maximizes E D ( P SX , θ ) , we have E D (0 , ω ) ≥ min ( P ˆ V ˆ W ,Q V ˆ W ): P ˆ U ˆ V ˆ W ∈ ˆ L (0 ,ω ) ,Q UV ˆ W = Q UV P ˆ W | ˆ U D ( P ˆ V ˆ W || Q V ˆ W ) = min ( P W | U ,P SX ): I Q ( U ; W ) ≤ I P ( X ; Y | S ) ,Q UV W = Q UV P W | U P SXY = P SX P Y | X D ( Q V Q W || Q V W ) , (155) E D (0 , ω, P SX ) = E x ( I Q ( U ; W ) , P SX ) , (156) E D ( P SX , − θ L ( P SX )) := E m ( P SX , − θ L ( P SX )) + θ L ( P SX )= θ L ( P SX ) , (157)where (157) is due to E m ( P SX , − θ L ( P SX )) = 0 which in turn follows similar to (66) and (67) from the definitionof E m ( · , · ) . From (42), (155), (156), (157), and the continuity of E D ( κ α , ω ) , E D ( κ α , ω, P SX ) in κ α , (43) follows.The proof of the cardinality bound |W| ≤ |U| + 1 in the RHS of (155) follows from a standard application ofthe Eggleston-Fenchel-Carath´eodory Theorem [44, Theorem 18]. 
To see this, note that it is sufficient to preserve $\{ Q_U(u), u \in \mathcal{U} \}$, $D(Q_V Q_W \,\|\, Q_{VW})$ and $H_Q(U|W)$, all of which can be written as a linear combination of functionals of $Q_{U|W}(\cdot|w)$ with weights $Q_W(w)$. Thus, it requires $|\mathcal{U}| - 1$ points to preserve $\{ Q_U(u), u \in \mathcal{U} \}$, and one each for $D(Q_V Q_W \,\|\, Q_{VW})$ and $H_Q(U|W)$. This completes the proof.

G. Proof of Theorem 5
We will show that the error-exponent pairs $(\kappa_\alpha, \kappa^*_h(\kappa_\alpha))$ and $(\kappa_\alpha, \kappa^*_u(\kappa_\alpha))$ are achieved by a hybrid coding scheme and an uncoded transmission scheme, respectively. First, we describe the hybrid coding scheme.
Let $n \in \mathbb{N}$, $|\mathcal{W}| < \infty$, $\kappa_\alpha > 0$, and $(P_S, \omega'(\cdot, P_S), P_{X|USW}, P_{X'|US}) \in \mathcal{L}_h(\kappa_\alpha)$. Let $\eta > 0$ be a small number, and choose a sequence $\mathbf{s} \in \mathcal{T}_n(P_{\hat{S}})$, where $P_{\hat{S}}$ satisfies $D(P_{\hat{S}} \| P_S) \le \eta$. Let $R' := \zeta'(\kappa_\alpha, \omega', P_{\hat{S}})$.

Encoding
The encoder performs type-based quantization followed by hybrid coding [34]. The details are as follows:
Quantization codebook
Let D n ( P U , η ) be as defined in (80). Consider some ordering on the types in D n ( P U , η ) and denote the elementsas P ˆ U ( i ) , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) . For each joint type P ˆ S ˆ U ( i ) such that P ˆ U ( i ) ∈ D n ( P U , η ) and ˆ S ⊥⊥ ˆ U ( i ) , choose a jointtype variable P ˆ S ˆ U ( i ) ˆ W ( i ) , P ˆ W ( i ) ∈ T ( W n ) , such that D (cid:16) P ˆ W ( i ) | ˆ U ( i ) ˆ S || P W ( i ) | U ˆ S (cid:12)(cid:12) P ˆ U ( i ) ˆ S (cid:17) ≤ η ,I ( ˆ S, ˆ U ( i ) ; ˆ W ( i ) ) ≤ R (cid:48) + η , where P W ( i ) | U,S = ω (cid:48) ( P ˆ U ( i ) , P ˆ S ) . Define D n ( P SUW , η ) := (cid:110) P ˆ S ˆ U ( i ) ˆ W ( i ) : i ∈ (cid:2) |D n ( P U , η ) | (cid:3)(cid:111) ,R (cid:48) i := I P ( ˆ S, ˆ U ( i ) ; ˆ W ( i ) ) + η , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , (158) Although uncoded transmission is a special case of hybrid coding, we obtain a better error-exponent trade-off by separately analyzing sucha scheme as decoding errors due to compression are not present. M (cid:48) i := [1 + i − (cid:88) m =1 e nR (cid:48) m : i (cid:88) m =1 e nR (cid:48) m ] , i ∈ (cid:2) |D n ( P U , η ) | (cid:3) . Let B ( n ) W = (cid:110) W ( j ) ∈ W n , j ∈ (cid:104)(cid:80) |D n ( P U ,η ) | i =1 e nR (cid:48) i (cid:105)(cid:111) denote a random quantization codebook such that for i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , each codeword W ( j ) , j ∈ M (cid:48) i , is independently selected from T n (cid:0) P ˆ W ( i ) (cid:1) according to uniformdistribution, i.e., W ( j ) ∼ Unif (cid:2) T n (cid:0) P ˆ W ( i ) (cid:1)(cid:3) . Let B ( n ) W denote a realization of B ( n ) W . Type-based hybrid coding
For u ∈ T n (cid:0) P ˆ U ( i ) (cid:1) such that P ˆ U ( i ) ∈ D n ( P U , η ) for some i ∈ (cid:2) |D n ( P U , η ) | (cid:3) , let ¯ M (cid:16) u , B ( n ) W (cid:17) := (cid:110) j ∈ M (cid:48) i : w ( j ) ∈ B ( n ) W and ( s , u , w ( j )) ∈ T n ( P ˆ S ˆ U ( i ) ˆ W ( i ) ) , P ˆ S ˆ U ( i ) ˆ W ( i ) ∈ D n ( P SUW , η ) (cid:111) . If | ¯ M (cid:16) u , B ( n ) W (cid:17) | ≥ , let M (cid:48) (cid:16) u , B ( n ) W (cid:17) denote an index selected uniformly at random from the set ¯ M (cid:16) u , B ( n ) W (cid:17) ,otherwise, set M (cid:48) (cid:16) u , B ( n ) W (cid:17) = 0 . Given B ( n ) W and u ∈ U n , the quantizer outputs M (cid:48) = M (cid:48) (cid:16) u , B ( n ) W (cid:17) , where thesupport of M (cid:48) is M (cid:48) := { } (cid:83) |D n ( P U ,η ) | i =1 M (cid:48) i . Note that for sufficiently large n , it follows similarly to (86) that |M (cid:48) | ≤ e n ( R (cid:48) + η ) . For a given B ( n ) W and u ∈ U n , the encoder transmits X ∼ P ⊗ nX | USW ( ·| u , s , w ( m (cid:48) )) if M (cid:48) = m (cid:48) (cid:54) = 0 , and X (cid:48) ∼ P ⊗ nX (cid:48) | US ( ·| u , s ) if M (cid:48) = 0 . Acceptance region
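(Before the acceptance region is specified, the symbol-by-symbol hybrid transmission just described can be sketched as follows; the conditional PMF lookups P_X_given_USW and P_X_given_US and all other names are hypothetical.)

import numpy as np

rng = np.random.default_rng(1)

def hybrid_transmit(u, s, m_prime, codebook_W, P_X_given_USW, P_X_given_US, alphabet_X):
    # If quantization succeeded (m' != 0), each channel input symbol is drawn from
    # P_{X|USW}(.|u_i, s_i, w_i(m')); otherwise from the fallback law P_{X'|US}(.|u_i, s_i).
    if m_prime != 0:
        w = codebook_W[m_prime]
        probs = [P_X_given_USW[(u[i], s[i], w[i])] for i in range(len(u))]
    else:
        probs = [P_X_given_US[(u[i], s[i])] for i in range(len(u))]
    return np.array([rng.choice(alphabet_X, p=p) for p in probs])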
For a given codebook B ( n ) W and m (cid:48) ∈ M (cid:48) \{ } , let O m (cid:48) denote the set of u such that M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) . Foreach m (cid:48) ∈ M (cid:48) \{ } and u ∈ O m (cid:48) , let Z (cid:48) m (cid:48) ( u ) = (cid:110) ( v , y ) ∈ V n × Y n : ( s , u , ¯ w m (cid:48) , v , y ) ∈ J n (cid:16) κ α + η, P ˆ SUW m (cid:48) V Y (cid:17) (cid:111) , where P ˆ SUW m (cid:48) V XY = P ˆ S P UV P W m (cid:48) | U ˆ S P X | U ˆ SW m (cid:48) P Y | X , (159a) P W m (cid:48) | U ˆ S = ω (cid:48) ( P u , P ˆ S ) and P X | U ˆ SW m (cid:48) = P X | USW . (159b)For m (cid:48) ∈ M (cid:48) \{ } , define Z (cid:48) m (cid:48) := { ( v , y ) : ( v , y ) ∈ Z (cid:48) m (cid:48) ( u ) for some u ∈ O m (cid:48) } . The acceptance region for H is given by A n := ∪ m (cid:48) ∈M (cid:48) \ s × m (cid:48) × Z (cid:48) m (cid:48) , or equivalently as A en := ∪ m (cid:48) ∈M (cid:48) \ s × O m (cid:48) × Z (cid:48) m (cid:48) . Decoding Given codebook B ( n ) W , Y = y , and V = v , if ( v , y ) ∈ (cid:83) m (cid:48) ∈M (cid:48) \{ } Z (cid:48) m (cid:48) , then ˆ M (cid:48) = ˆ m (cid:48) , where ˆ m (cid:48) := arg min j ∈M (cid:48) \ H e ( w ( j ) | v , y , s ) . Otherwise, ˆ M (cid:48) = 0 . Denote the decoder induced by the above operations by g B ( n ) W : S n × V n × Y n → M (cid:48) . Testing If ˆ M (cid:48) = 0 , ˆ H = 1 is declared. Otherwise, ˆ H = 0 or ˆ H = 1 is declared depending on whether ( s , ˆ m (cid:48) , v , y ) ∈ A n or ( s , ˆ m (cid:48) , v , y ) / ∈ A n , respectively. Denote the decision function induced by g B ( n ) W and A n by g n : S n ×V n ×Y n → ˆ H . Induced probability distribution
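(Before the induced distributions are written out, a sketch of the hybrid scheme's decoder described above, which mirrors the ECED of the separation-based scheme but conditions on the triple (v, y, s). The membership check in_union_Z and the reuse of empirical_conditional_entropy from the earlier sketch are hypothetical.)

def hybrid_decode(v, y, s, codebook_W, in_union_Z):
    # If (v, y) lies outside all Z'_{m'}, declare m'-hat = 0 (which triggers H1).
    if not in_union_Z(v, y):
        return 0
    # Otherwise pick the codeword minimizing the empirical conditional entropy given
    # (v, y, s), treating each triple (v_i, y_i, s_i) as a single super-symbol.
    side = list(zip(v, y, s))
    return min((j for j in codebook_W if j != 0),
               key=lambda j: empirical_conditional_entropy(codebook_W[j], side))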
Denote the PMF induced by a code c n = ( f n , g n ) w.r.t. codebook B ( n ) W under H and H by P ( B ( n ) W ,c n ) UV M (cid:48) XY ˆ M (cid:48) ˆ H ( u , v , m (cid:48) , x , y , ˆ m (cid:48) , ˆ h ):= P ⊗ nUV ( u , v ) (cid:110) M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) (cid:111) P ⊗ nX | USW ( x | s , u , w ( m (cid:48) )) P ⊗ nY | X ( y | x ) (cid:26) g B ( n ) W ( v , y , s )= ˆ m (cid:48) (cid:27) (cid:110) ˆ h = { ( s , ˆ m (cid:48) , v , y ) ∈A cn } (cid:111) , if m (cid:48) (cid:54) = 0 ,P ⊗ nUV ( u , v ) (cid:110) M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) (cid:111) P ⊗ nX (cid:48) | US ( x | s , u ) P ⊗ nY | X ( y | x ) (cid:26) g B ( n ) W ( v , y , s )= ˆ m (cid:48) (cid:27) (cid:110) ˆ h = { ( s , ˆ m (cid:48) , v , y ) ∈A cn } (cid:111) , otherwise , and Q ( B ( n ) W ,c n ) UV M (cid:48) XY ˆ M (cid:48) ˆ H ( u , v , m (cid:48) , x , y , ˆ m (cid:48) , ˆ h ):= Q ⊗ nUV ( u , v ) (cid:110) M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) (cid:111) P ⊗ nX | USW ( x | s , u , w ( m (cid:48) )) P ⊗ nY | X ( y | x ) (cid:26) g B ( n ) W ( v , y , s )= ˆ m (cid:48) (cid:27) (cid:110) ˆ h = { ( s , ˆ m (cid:48) , v , y ) ∈A cn } (cid:111) , if m (cid:48) (cid:54) = 0 ,Q ⊗ nUV ( u , v ) (cid:110) M (cid:48) (cid:16) u , B ( n ) W (cid:17) = m (cid:48) (cid:111) P ⊗ nX (cid:48) | US ( x | s , u ) P ⊗ nY | X ( y | x ) (cid:26) g B ( n ) W ( v , y , s )= ˆ m (cid:48) (cid:27) (cid:110) ˆ h = { ( s , ˆ m (cid:48) , v , y ) ∈A cn } (cid:111) , otherwise , respectively. For simplicity of presentation, we will denote B ( n ) W , B ( n ) W , P ( B ( n ) W ,c n ) UV M (cid:48) XY ˆ M (cid:48) ˆ H , Q ( B ( n ) W ,c n ) UV M (cid:48) XY ˆ M (cid:48) ˆ H by B n , B n , P ( B n ) and Q ( B n ) , respectively. Also, let B n and µ n stand for the support and probability measure of B n ,respectively. Also, define ¯ P P ( B n ) := E µ n (cid:2) P P ( B n ) (cid:3) and ¯ P Q ( B n ) := E µ n (cid:2) P Q ( B n ) (cid:3) . Analysis of the type I and type II error probabilities:
We analyze the expected type I and type II error probabilities over B_n. Then, an expurgation technique similar to that in Theorem 4 guarantees the existence of a sequence of deterministic codebooks { B_n }_{n ∈ N} and a code { c_n = ( f_n , g_n ) }_{n ∈ N} that achieves the lower bound given in Theorem 5.

Type I error probability
In the following proof, random sets where the randomness is induced due to B n will be written using blackboardbold letters, e.g., A n (and equivalently A en ) for the random acceptance region for H . Note that a type I error can occur only under the following events:(i) E (cid:48) EE := (cid:83) P ˆ U ∈D n ( P U ,η ) (cid:83) u ∈T n ( P ˆ U ) E (cid:48) EE ( u ) , where E (cid:48) EE ( u ) := (cid:64) j ∈ M (cid:48) \{ } s.t. ( s , u , W ( j )) ∈ T n ( P ˆ S ˆ U ( i ) ˆ W ( i ) ) , P ˆ S ˆ U ( i ) = P su ,P ˆ S ˆ U ( i ) ˆ W ( i ) ∈ D n ( P SUW , η ) . (ii) E (cid:48) NE := { ˆ M (cid:48) = M (cid:48) and ( s , ˆ M (cid:48) , V , Y ) / ∈ A n } .(iii) E (cid:48) ODE := { M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) and ( s , ˆ M (cid:48) , V , Y ) / ∈ A n } .(iv) E (cid:48) SDE := { M (cid:48) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) and ( s , ˆ M (cid:48) , V , Y ) / ∈ A n } .Since R (cid:48) i satisfies (158), we have similar to (97) that ¯ P P B n ( E (cid:48) EE ) ≤ e − e n Ω( η ) . (160)Next, the event E (cid:48) NE can be upper bounded as ¯ P P B n ( E (cid:48) NE |E (cid:48) c EE ) ≤ ¯ P P B n (cid:16) ( s , ˆ M (cid:48) , V , Y ) / ∈ A n | ˆ M (cid:48) = M (cid:48) , E (cid:48) cEE (cid:17) = 1 − ¯ P P B n (( s , U , V , Y ) ∈ A en |E (cid:48) cEE ) . (161)For u ∈ O m (cid:48) , note that similar to [13, Equation 4.17], we have ¯ P P B n (( V , Y ) ∈ Z (cid:48) m (cid:48) ( u ) | U = u , W ( m (cid:48) ) = ¯ w m (cid:48) , E (cid:48) cEE ) ≥ − e − n ( κ α + η − D ( P u || P U ) ) . (162)Then, using (80) and (162), we obtain similar to [13, Equation 4.22] that ¯ P P B n (( s , U , V , Y ) ∈ A ne |E (cid:48) cEE ) ≥ − e − nκ α . (163)Substituting (163) in (161) yields ¯ P P B n ( E (cid:48) NE |E (cid:48) c EE ) ≤ e − nκ α . (164)Next, the probability of event E (cid:48) ODE can be upper bounded as follows: ¯ P P B n ( E (cid:48) ODE )= ¯ P P B n (cid:16) M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , M (cid:48) , V , Y ) ∈ A n , ( s , ˆ M (cid:48) , V , Y ) / ∈ A n (cid:17) + ¯ P P B n (cid:16) M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , M (cid:48) , V , Y ) / ∈ A n , ( s , ˆ M (cid:48) , V , Y ) / ∈ A n (cid:17) ≤ ¯ P P B n (cid:16) M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , M (cid:48) , V , Y ) ∈ A n , ( s , ˆ M (cid:48) , V , Y ) / ∈ A n (cid:17) + ¯ P P B n (cid:16) M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , M (cid:48) , V , Y ) / ∈ A n (cid:17) ( a ) ≤ ¯ P P B n (cid:16) M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , M (cid:48) , V , Y ) ∈ A n , ( s , ˆ M (cid:48) , V , Y ) / ∈ A n (cid:17) + e − e n Ω( η ) + e − nκ α (165) ≤ ¯ P P B n (cid:16) ˆ M (cid:48) (cid:54) = M (cid:48) | M (cid:48) (cid:54) = 0 , ( s , M (cid:48) , V , Y ) ∈ A n (cid:17) + e − e n Ω( η ) + e − nκ α , ( b ) = ¯ P P B n (cid:16) ˆ M (cid:48) (cid:54) = M (cid:48) | M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = 0 , ( s , M (cid:48) , V , Y ) ∈ A n (cid:17) (166) ( c ) ≤ e − n ( ρ (cid:48) ( κ α ,ω (cid:48) ,P S ,P X | USW ) − ζ (cid:48) ( κ α ,ω (cid:48) ,P ˆ S ) − O ( η ) ) , (167)where(a) (165) follows similar to (103) using (160) and (163);(b) is since ( s , M (cid:48) , V , Y ) ∈ A n implies that ˆ M (cid:48) (cid:54) = 0 ;(c) follows similar to (114).Also, ¯ P P B n ( E (cid:48) SDE ) ≤ ¯ P P B n ( M (cid:48) = 0) ≤ ¯ P P B n ( M (cid:48) = 0 |E (cid:48) cEE ) + ¯ P P B n ( E (cid:48) EE ) ( a ) = (cid:88) u : P u / ∈D n ( P U ,η ) P ⊗ nU ( u ) + ¯ P P B n ( E 
(cid:48) EE ) ( b ) ≤ e − nκ α + e − e n Ω( η ) , (168)where(a) follows since given E (cid:48) cEE , M (cid:48) = 0 occurs only for U = u such that P u / ∈ D n ( P U , η ) ;(b) follows from (160), the definition of D n ( P U , η ) and [31, Lemma 1.6].From (160), (164), (167) and (168), it follows via the union bound on probability that the expected type I errorprobability satisfies e − n ( κ α − O ( η )) for sufficiently large n . Type II error probability
Next, we analyze the expected type II error probability over B n . Let D n ( P SV W Y , η ):= P ˆ S ˆ V ˆ W ˆ Y : ∃ ( s , u , v , ¯ w , y ) ∈ ∪ m (cid:48) ∈M (cid:48) \{ } J n (cid:16) κ α + η, P ˆ SUV W m (cid:48) Y (cid:17) , P ˆ SUV W m (cid:48) Y satisfies (159)and P suv ¯ wy = P ˆ S ˆ U ˆ V ˆ W ˆ Y . A type II error can occur only under the following events:(a) E (cid:48) a := ˆ M (cid:48) = M (cid:48) (cid:54) = 0 , ( s , U , V , W ( M (cid:48) ) , Y ) ∈ T n (cid:0) P ˆ S ˆ U ˆ V ˆ W ˆ Y (cid:1) s.t. P ˆ U ˆ W ∈ D n ( P SUW , η ) and P ˆ S ˆ V ˆ W ˆ Y ∈ D n ( P SV W Y , η ) . (b) E (cid:48) b := M (cid:48) (cid:54) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , U , V , W ( M (cid:48) ) , Y , W ( ˆ M (cid:48) )) ∈ T n (cid:16) P ˆ S ˆ U ˆ V ˆ W ˆ Y ˆ W d (cid:17) s.t. P ˆ S ˆ U ˆ W ∈ D n ( P SUW , η ) , P ˆ S ˆ V ˆ W d ˆ Y ∈ D n ( P SV W Y , η ) , and H e (cid:16) W ( ˆ M (cid:48) ) | s , V , Y (cid:17) ≤ H e ( W ( M (cid:48) ) | s , V , Y ) . (c) E (cid:48) c := { M (cid:48) = 0 , ˆ M (cid:48) (cid:54) = M (cid:48) , ( s , V , Y , W ( ˆ M (cid:48) )) ∈ T n (cid:16) P ˆ S ˆ V ˆ Y ˆ W d (cid:17) s.t. P ˆ S ˆ V ˆ W d ˆ Y ∈ D n ( P SV W Y , η ) } . Let F (cid:48) ,n ( η ) := P ˆ S ˜ U ˜ V ˜ W ˜ Y ∈ T ( S n × U n × V n × W n × Y n ) : P ˆ S ˜ U ˜ W ∈ D n ( P SUW , η ) ,P ˆ S ˜ V ˜ W ˜ Y ∈ D n ( P SV W Y , η ) . Then, we can write ¯ P Q B n ( E (cid:48) a ) ≤ (cid:88) P ˆ S ˜ U ˜ V ˜ W ˜ Y ∈F (cid:48) ,n ( η ) (cid:88) ( u , v , ¯ w , y ):( s , u , v , ¯ w , y ) ∈T n ( P ˆ S ˜ U ˜ V ˜ W ˜ Y ) (cid:88) m (cid:48) ∈M (cid:48) \{ } ¯ P Q B n ( U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = ¯ w , Y = y | S = s ) ≤ (cid:88) P ˆ S ˜ U ˜ V ˜ W ˜ Y ∈F (cid:48) ,n ( η ) (cid:88) ( u , v , ¯ w , y ):( s , u , v , ¯ w , y ) ∈T n ( P ˆ S ˜ U ˜ V ˜ W ˜ Y ) (cid:88) m (cid:48) ∈M (cid:48) \{ } ¯ P Q B n ( U = u , V = v , M (cid:48) = m (cid:48) | S = s )¯ P Q B n ( W ( m (cid:48) ) = ¯ w | U = u , V = v , M (cid:48) = m (cid:48) , S = s )¯ P Q B n ( Y = y | U = u , V = v , M (cid:48) = m (cid:48) , W ( m (cid:48) ) = ¯ w , S = s ) ≤ (cid:88) P ˆ S ˜ U ˜ V ˜ W ˜ Y ∈F (cid:48) ,n ( η ) (cid:88) ( u , v , ¯ w , y ):( s , u , v , ¯ w , y ) ∈T n ( P ˆ S ˜ U ˜ V ˜ W ˜ Y ) (cid:88) m (cid:48) ∈M (cid:48) \{ } e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) ¯ P Q B n ( M (cid:48) = m (cid:48) | U = u , V = v , S = s ) (169) e n ( H ( ˜ W | ˆ S, ˜ U ) − η ) e − n ( H ( ˜ Y | ˜ U, ˆ S, ˜ W )+ D ( P ˜ Y | ˜ U ˆ S ˜ W || P Y | USW | P ˜ U ˆ S ˜ W )) ≤ (cid:88) P ˆ S ˜ U ˜ V ˜ W ˜ Y ∈F (cid:48) ,n ( η ) e nH ( ˜ U, ˜ V, ˜ W, ˜ Y | ˆ S ) e − n ( H ( ˜ U, ˜ V )+ D ( P ˜ U ˜ V || Q UV )) 1 e n ( H ( ˜ W | ˆ S, ˜ U ) − η ) e − n ( H ( ˜ Y | ˜ U, ˆ S, ˜ W )+ D ( P ˜ Y | ˜ U ˆ S ˜ W || P Y | USW | P ˜ U ˆ S ˜ W )) ≤ e − n E (cid:48) ,n , (170)where E (cid:48) ,n := min P ˆ S ˜ U ˜ V ˜ W ˜ Y ∈F (cid:48) ,n ( η ) H ( ˜ U, ˜ V ) + D ( P ˜ U ˜ V || Q UV ) + H ( ˜ W | ˆ S, ˜ U ) − η + H ( ˜ Y | ˜ U, ˆ S, ˜ W )+ D (cid:16) P ˜ Y | ˜ U ˆ S ˜ W || P Y | USW | P ˜ U ˆ S ˜ W (cid:17) − H ( ˜ U, ˜ V, ˜ W, ˜ Y | ˆ S ) − n ||U||V||W||Y| log( n + 1) (cid:38) min ( P ˜ U ˜ V ˜ W ˜ Y S ,Q ˜ U ˜ V ˜ W ˜ Y S ) ∈T (cid:48) ( κ α ,ω (cid:48) ,P S ,P X | USW ) D ( P ˜ U ˜ V ˜ W ˜ Y | S || Q UV W Y | S | P S ) − O ( η )= E (cid:48) ( κ α , ω (cid:48) ) − O ( η ) . 
In (169), we used the fact that
$$\bar P_Q^{(\mathcal B_n)}(W(m')=\bar{\mathbf w} \mid U=\mathbf u, V=\mathbf v, S=\mathbf s, M'=m') \leq \begin{cases} \dfrac{1}{e^{n(H(\tilde W|\hat S,\tilde U)-\eta)}}, & \text{if } \bar{\mathbf w} \in T^n(\tilde W), \\[4pt] 0, & \text{otherwise}, \end{cases}$$
which in turn follows from the fact that, given $M'=m'$ and $U=\mathbf u$, $W(m')$ is uniformly distributed in the set $T^n\big(P_{\tilde W|\hat S\tilde U}, \mathbf s, \mathbf u\big)$ and that, for sufficiently large $n$, $\big|T^n\big(P_{\tilde W|\hat S\tilde U}, \mathbf s, \mathbf u\big)\big| \geq e^{n(H(\tilde W|\hat S,\tilde U)-\eta)}$.

Next, we analyze the probability of the event $\mathcal E'_b$. Let
$$\mathcal F'_{2,n}(\eta) := \big\{ P_{\hat S\tilde U\tilde V\tilde W\tilde Y\tilde W_d} \in \mathcal T(\mathcal S^n \times \mathcal U^n \times \mathcal V^n \times \mathcal W^n \times \mathcal Y^n \times \mathcal W^n) : P_{\hat S\tilde U\tilde W} \in D_n(P_{SUW},\eta),\ P_{\hat S\tilde V\tilde W_d\tilde Y} \in D_n(P_{SVWY},\eta) \text{ and } H(\tilde W_d|\hat S,\tilde V,\tilde Y) \leq H(\tilde W|\hat S,\tilde V,\tilde Y) \big\}.$$
Then,
$$\bar P_Q^{(\mathcal B_n)}(\mathcal E'_b) \leq \sum_{P_{\hat S\tilde U\tilde V\tilde W\tilde Y\tilde W_d} \in \mathcal F'_{2,n}(\eta)}\ \sum_{\substack{(\mathbf u,\mathbf v,\bar{\mathbf w},\mathbf y,\mathbf w'):\\ (\mathbf s,\mathbf u,\mathbf v,\bar{\mathbf w},\mathbf y,\mathbf w') \in T^n(P_{\hat S\tilde U\tilde V\tilde W\tilde Y\tilde W_d})}}\ \sum_{m' \in \mathcal M'\setminus\{0\}} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, V=\mathbf v, M'=m', W(m')=\bar{\mathbf w}, Y=\mathbf y \mid S=\mathbf s) \sum_{\hat m' \in \mathcal M'\setminus\{0,m'\}} \bar P_Q^{(\mathcal B_n)}\big(W(\hat m') = \mathbf w' \mid U=\mathbf u, M'=m', W(m')=\bar{\mathbf w}, S=\mathbf s\big)$$
$$\leq \sum_{P \in \mathcal F'_{2,n}(\eta)}\ \sum_{(\mathbf u,\mathbf v,\bar{\mathbf w},\mathbf y,\mathbf w')}\ \sum_{m'} \bigg[ e^{-n(H(\tilde U,\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV}))}\, \bar P_Q^{(\mathcal B_n)}(M'=m' \mid U=\mathbf u, V=\mathbf v, S=\mathbf s)\, \frac{1}{e^{n(H(\tilde W|\hat S,\tilde U)-\eta)}}\, e^{-n(H(\tilde Y|\tilde U,\hat S,\tilde W)+D(P_{\tilde Y|\tilde U\hat S\tilde W}\|P_{Y|USW}|P_{\tilde U\hat S\tilde W}))} \sum_{\hat m' \in \mathcal M'\setminus\{0,m'\}} \bar P_Q^{(\mathcal B_n)}\big(W(\hat m')=\mathbf w' \mid U=\mathbf u, M'=m', W(m')=\bar{\mathbf w}, S=\mathbf s\big) \bigg]$$
$$\leq \sum_{P \in \mathcal F'_{2,n}(\eta)}\ \sum_{(\mathbf u,\mathbf v,\bar{\mathbf w},\mathbf y)}\ \sum_{m'} \bigg[ e^{-n(H(\tilde U,\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV}))}\, \bar P_Q^{(\mathcal B_n)}(M'=m' \mid U=\mathbf u, V=\mathbf v, S=\mathbf s)\, \frac{1}{e^{n(H(\tilde W|\hat S,\tilde U)-\eta)}}\, e^{-n(H(\tilde Y|\tilde U,\hat S,\tilde W)+D(P_{\tilde Y|\tilde U\hat S\tilde W}\|P_{Y|USW}|P_{\tilde U\hat S\tilde W}))} \sum_{\hat m' \in \mathcal M'\setminus\{0,m'\}} \frac{e^{nH(\tilde W_d|\hat S,\tilde V,\tilde Y)}}{e^{n(H(\tilde W_d)-\eta)}} \bigg]$$
$$\leq \sum_{P \in \mathcal F'_{2,n}(\eta)}\ \sum_{(\mathbf u,\mathbf v,\bar{\mathbf w},\mathbf y)} \bigg[ e^{-n(H(\tilde U,\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV}))}\, \frac{1}{e^{n(H(\tilde W|\hat S,\tilde U)-\eta)}}\, e^{-n(H(\tilde Y|\tilde U,\hat S,\tilde W)+D(P_{\tilde Y|\tilde U\hat S\tilde W}\|P_{Y|USW}|P_{\tilde U\hat S\tilde W}))}\, e^{n(\zeta'(\kappa_\alpha,\omega',P_{\hat S})+\eta)}\, \frac{2\, e^{nH(\tilde W_d|\hat S,\tilde V,\tilde Y)}}{e^{n(H(\tilde W_d)-\eta)}} \bigg] \leq e^{-nE'_{2,n}}, \qquad (171)$$
where
$$E'_{2,n} := \min_{P_{\hat S\tilde U\tilde V\tilde W\tilde Y\tilde W_d} \in \mathcal F'_{2,n}(\eta)} \Big[ H(\tilde U,\tilde V) + D(P_{\tilde U\tilde V}\|Q_{UV}) + H(\tilde W|\hat S,\tilde U) - \eta + H(\tilde Y|\tilde U,\hat S,\tilde W) + D\big(P_{\tilde Y|\tilde U\hat S\tilde W}\|P_{Y|USW}|P_{\tilde U\hat S\tilde W}\big) + I(\tilde W_d;\hat S,\tilde V,\tilde Y) - H(\tilde U,\tilde V,\tilde W,\tilde Y|\hat S) - \zeta'(\kappa_\alpha,\omega',P_{\hat S}) \Big] - \frac{\log 2}{n} - \frac{|\mathcal S||\mathcal U||\mathcal V||\mathcal W||\mathcal Y|}{n}\log(n+1)$$
$$\geq \min_{P_{\hat S\tilde U\tilde V\tilde W\tilde Y\tilde W_d} \in \mathcal F'_{2,n}(\eta)} \Big[ D\big(P_{\tilde U\tilde V\tilde W\tilde Y|S}\|Q_{UVWY|S}|P_{\hat S}\big) + I(\tilde W_d;\hat S,\tilde V,\tilde Y) - \zeta'(\kappa_\alpha,\omega',P_{\hat S}) \Big] - \frac{\log 2}{n} - \frac{|\mathcal S||\mathcal U||\mathcal V||\mathcal W||\mathcal Y|}{n}\log(n+1)$$
$$\gtrsim \min_{(P_{\tilde U\tilde V\tilde W\tilde Y S},\,Q_{\tilde U\tilde V\tilde W\tilde Y S}) \in \mathcal T'(\kappa_\alpha,\omega',P_S,P_{X|USW})} D\big(P_{\tilde U\tilde V\tilde W\tilde Y|S}\|Q_{UVWY|S}|P_S\big) + \rho'(\kappa_\alpha,\omega',P_S,P_{X|USW}) - \zeta'(\kappa_\alpha,\omega',P_S) - O(\eta) = E'_2(\kappa_\alpha,\omega',P_S,P_{X|USW}) - O(\eta).$$

Finally, consider the event $\mathcal E'_c$. Similar to (129), we can write
$$\bar P_Q^{(\mathcal B_n)}(\mathcal E'_c) = \sum_{\substack{\mathbf u \in T^n(P_{\tilde U}):\\ P_{\tilde U} \in D_n(P_U,\eta)}} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, \mathcal E'_{EE}, \mathcal E'_c \mid S=\mathbf s) + \sum_{\substack{\mathbf u \in T^n(P_{\tilde U}):\\ P_{\tilde U} \notin D_n(P_U,\eta)}} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, \mathcal E'_c \mid S=\mathbf s). \qquad (172)$$
The first term in (172) decays double-exponentially as $e^{-e^{n\Omega(\eta)}}$. The second term in (172) can be simplified as follows:
$$\sum_{\substack{\mathbf u \in T^n(P_{\tilde U}):\\ P_{\tilde U} \notin D_n(P_U,\eta)}} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, \mathcal E'_c \mid S=\mathbf s) \leq \sum_{\substack{\mathbf u \in T^n(P_{\tilde U}):\\ P_{\tilde U} \notin D_n(P_U,\eta)}}\ \sum_{\substack{(\mathbf v,\mathbf y,\mathbf w'):\,(\mathbf s,\mathbf v,\mathbf y,\mathbf w') \in T^n(P_{\hat S\tilde V\tilde Y\tilde W_d}),\\ P_{\hat S\tilde V\tilde W_d\tilde Y} \in D_n(P_{SVWY},\eta)}}\ \sum_{\hat m \in \mathcal M\setminus\{0\}} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, V=\mathbf v, M'=0, Y=\mathbf y \mid S=\mathbf s) \sum_{\hat m' \in \mathcal M'\setminus\{0\}} \bar P_Q^{(\mathcal B_n)}\big(W(\hat m') = \mathbf w'\big)$$
$$\leq \sum_{\substack{\mathbf u \in T^n(P_{\tilde U}):\\ P_{\tilde U} \notin D_n(P_U,\eta)}}\ \sum_{(\mathbf v,\mathbf y,\mathbf w')} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, V=\mathbf v, M'=0, Y=\mathbf y \mid S=\mathbf s) \sum_{\hat m' \in \mathcal M'\setminus\{0\}} \frac{1}{e^{n(H(\tilde W_d)-\eta)}}$$
$$\leq \sum_{\substack{\mathbf u \in T^n(P_{\tilde U}):\\ P_{\tilde U} \notin D_n(P_U,\eta)}}\ \sum_{\substack{(\mathbf v,\mathbf y):\,(\mathbf s,\mathbf v,\mathbf y) \in T^n(P_{\hat S\tilde V\tilde Y}),\\ P_{\hat S\tilde V\tilde W_d\tilde Y} \in D_n(P_{SVWY},\eta)}} \bar P_Q^{(\mathcal B_n)}(U=\mathbf u, V=\mathbf v)\, \bar P_Q^{(\mathcal B_n)}(Y=\mathbf y \mid U=\mathbf u, V=\mathbf v, M'=0, S=\mathbf s)\, \frac{e^{nH(\tilde W_d|\hat S,\tilde V,\tilde Y)}\, e^{n(R'+\eta)}}{e^{n(H(\tilde W_d)-\eta)}}$$
$$\leq \sum_{P_{\tilde U\hat S\tilde V\tilde W_d\tilde Y} \in D_n(P_U,\eta)^c \times D_n(P_{SVWY},\eta)} e^{nH(\tilde U,\tilde V,\tilde Y|\hat S)}\, e^{-n(H(\tilde U,\tilde V,\tilde Y|\hat S)+D(P_{\tilde U\tilde V\tilde Y|\hat S}\|Q_{UVY'|\hat S}|P_{\hat S}))}\, \frac{e^{nH(\tilde W_d|\hat S,\tilde V,\tilde Y)}\, e^{n(R'+\eta)}}{e^{n(H(\tilde W_d)-\eta)}} \leq e^{-nE'_{3,n}}, \qquad (173)$$
where
$$E'_{3,n} := \min_{P_{\tilde U\hat S\tilde V\tilde W_d\tilde Y} \in D_n(P_U,\eta)^c \times D_n(P_{SVWY},\eta)} D\big(P_{\tilde U\tilde V\tilde Y|\hat S}\|Q_{UVY'|\hat S}|P_{\hat S}\big) + I(\tilde W_d;\hat S,\tilde V,\tilde Y) - R' - O(\eta) - \frac{|\mathcal U||\mathcal V||\mathcal W||\mathcal Y||\mathcal S|}{n}\log(n+1)$$
$$\gtrsim \min_{P_{\hat V\hat Y S}:\, P_{\hat U\hat V\hat W\hat Y S} \in \hat{\mathcal L}_h(\kappa_\alpha,\omega',P_S,P_{X|USW})} D\big(P_{\hat V\hat Y|S}\|Q_{VY'|S}|P_S\big) + \rho'(\kappa_\alpha,\omega',P_S,P_{X|USW}) - \zeta'(\kappa_\alpha,\omega',P_S) - O(\eta) = E'_3(\kappa_\alpha,\omega',P_S,P_{X|USW},P_{X'|US}) - O(\eta).$$
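The bounds (170)–(173) all rest on standard method-of-types estimates: the number of joint types grows only polynomially in $n$, while the probability that an i.i.d. sample from $Q$ has a given type $P$ decays as $e^{-nD(P\|Q)}$ up to a polynomial factor. The short Python sketch below numerically checks this type-class probability estimate for a binary alphabet; it illustrates the tool, not the coding scheme itself, and all function names are ours.

```python
from math import comb, log, exp

def kl(p, q):
    """D(p || q) in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def type_class_prob(counts, q):
    """Exact probability that an i.i.d. Q-sample of length n = sum(counts)
    has empirical counts `counts` (binary alphabet for simplicity)."""
    n, k = sum(counts), counts[0]
    return comb(n, k) * (q[0] ** k) * (q[1] ** (n - k))

n, q = 200, (0.2, 0.8)
counts = (80, 120)                       # type P = (0.4, 0.6)
p = (counts[0] / n, counts[1] / n)
exact = type_class_prob(counts, q)
estimate = exp(-n * kl(p, q))            # method-of-types estimate e^{-n D(P||Q)}
print(exact, estimate)                   # same exponential order; ratio only polynomial in n
```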
Since the exponent of the type II error probability is lower bounded by the minimum of the exponents of the type II error causing events, it follows from (170), (171) and (173) that, for a fixed $\big(P_S, \omega'(\cdot,P_S), P_{X|USW}, P_{X'|US}\big) \in \mathcal L_h(\kappa_\alpha)$,
$$\bar P_P^{(\mathcal B_n)}\big(\hat H = 1\big) \leq e^{-n(\kappa_\alpha - O(\eta))}, \qquad (174a)$$
and
$$\bar P_Q^{(\mathcal B_n)}\big(\hat H = 0\big) \leq e^{-n(\bar\kappa_h(\kappa_\alpha,\omega',P_S,P_{X|USW},P_{X'|US}) - O(\eta))}, \qquad (174b)$$
where
$$\bar\kappa_h\big(\kappa_\alpha,\omega',P_S,P_{X|USW},P_{X'|US}\big) := \min\big\{ E'_1(\kappa_\alpha,\omega'),\ E'_2(\kappa_\alpha,\omega',P_S,P_{X|USW}),\ E'_3(\kappa_\alpha,\omega',P_S,P_{X|USW},P_{X'|US}) \big\} - O(\eta).$$
Performing expurgation similar to the proof of Theorem 4 to obtain a deterministic codebook $\mathcal B_n$ satisfying (174), maximizing over $\big(P_S, \omega'(\cdot,P_S), P_{X|USW}, P_{X'|US}\big) \in \mathcal L_h(\kappa_\alpha)$, and noting that $\eta > 0$ is arbitrary yields that $\kappa(\kappa_\alpha) \geq \kappa^*_h(\kappa_\alpha)$.

Finally, we show that $\kappa(\kappa_\alpha) \geq \kappa^*_u(\kappa_\alpha)$. Fix $P_{X|US}$ and let $P_{UVXY} := P_{UV}\, P_{X|US}\, P_{Y|X}$ and $Q_{UVXY} := Q_{UV}\, P_{X|US}\, P_{Y|X}$. Consider an uncoded transmission scheme in which the channel input is $\mathbf X \sim f_n(\cdot \mid \mathbf u) = P^{\otimes n}_{X|US}(\cdot \mid \mathbf u, \mathbf s)$. Let the decision rule $g_n$ be specified by the acceptance region $\mathcal A_n = \{(\mathbf s,\mathbf v,\mathbf y) : D\big(P_{\mathbf v\mathbf y|\mathbf s}\,\|\,P_{VY|S}\,|\,P_{\mathbf s}\big) \leq \kappa_\alpha + \eta\}$ for some small number $\eta > 0$. Then, it follows from [37, Lemma 2.6] that for sufficiently large $n$,
$$\alpha_n(f_n, g_n) = P^{\otimes n}_{VY|S}(\mathcal A^c_n \mid \mathbf s) \leq e^{-n\kappa_\alpha},$$
$$\beta_n(f_n, g_n) = Q^{\otimes n}_{VY|S}(\mathcal A_n \mid \mathbf s) \leq e^{-n(\kappa^*_u(\kappa_\alpha) - O(\eta))}.$$
The proof is complete by noting that $\eta > 0$ is arbitrary.
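The acceptance region in the uncoded scheme is a Hoeffding-type test on the empirical conditional distribution of $(\mathbf v,\mathbf y)$ given $\mathbf s$. The Python sketch below shows one way such a rule could be evaluated; the tensor layout, the handling of empty conditioning symbols, and all function names are our own illustrative choices rather than the paper's notation, and the nominal $P_{VY|S}$ is assumed to have full support.

```python
import numpy as np

def conditional_divergence(p_vy_given_s, q_vy_given_s, p_s):
    """D(P_{VY|S} || Q_{VY|S} | P_S) = sum_s P_S(s) * D(P_{VY|S=s} || Q_{VY|S=s})."""
    d = 0.0
    for s, ps in enumerate(p_s):
        p, q = p_vy_given_s[s], q_vy_given_s[s]
        mask = p > 0
        d += ps * np.sum(p[mask] * np.log(p[mask] / q[mask]))
    return d

def accept_h0(s_seq, v_seq, y_seq, p_vy_given_s, kappa_alpha, eta, nS, nV, nY):
    """Illustrative decision rule: accept H0 iff the empirical conditional
    distribution of (V, Y) given S is within kappa_alpha + eta, in conditional
    KL divergence, of the nominal P_{VY|S} (shape (nS, nV, nY))."""
    n = len(s_seq)
    counts = np.zeros((nS, nV, nY))
    for s, v, y in zip(s_seq, v_seq, y_seq):
        counts[s, v, y] += 1
    p_s_emp = counts.sum(axis=(1, 2)) / n                            # empirical P_s
    with np.errstate(invalid="ignore"):
        p_vy_emp = counts / counts.sum(axis=(1, 2), keepdims=True)   # empirical P_{vy|s}
    p_vy_emp = np.nan_to_num(p_vy_emp).reshape(nS, nV * nY)
    q = p_vy_given_s.reshape(nS, nV * nY)
    return conditional_divergence(p_vy_emp, q, p_s_emp) <= kappa_alpha + eta
```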
H. Proof of Corollary 4

Note that
$$\hat{\mathcal L}_h(0,\omega',P_S,P_{X|USW}) := \big\{ P_{UV\hat W YS} : P_{SUVWXY} := P_S\, P_{UV}\, P_{W|US}\, P_{X|USW}\, P_{Y|X},\ P_{W|US} = \omega'(P_U, P_S) \big\},$$
$$\zeta'(0,\omega',P_S) := I_P(U;W|S), \qquad \rho'(0,\omega',P_S,P_{X|USW}) := I_P(Y,V;W|S),$$
$$\mathcal L_h(0) := \big\{ \big(P_S, \omega'(\cdot,P_S), P_{X|USW}, P_{X'|US}\big) \in \mathcal P_S \times \mathcal F' \times \mathcal P(\mathcal X \mid \mathcal U\times\mathcal S\times\mathcal W) \times \mathcal P(\mathcal X \mid \mathcal U\times\mathcal S) : I_P(U;W|S) < I_P(Y,V;W|S) \big\},$$
and
$$E'_b(0,\omega',P_S,P_{X|USW}) := I_P(Y,V;W|S) - I_P(U;W|S).$$
The result then follows from Theorem 5 via the continuity of $\hat{\mathcal L}_h(\kappa_\alpha,\cdot,\cdot,\cdot)$, $\zeta'(\kappa_\alpha,\cdot,\cdot)$, $\rho'(\kappa_\alpha,\cdot,\cdot,\cdot)$, $\mathcal L_h(\kappa_\alpha)$ and $E'_b(\kappa_\alpha,\cdot,\cdot,\cdot)$ in $\kappa_\alpha$.
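Corollary 4 thus makes achievability at $\kappa_\alpha = 0$ hinge on the single-letter condition $I_P(U;W|S) < I_P(Y,V;W|S)$, with the resulting exponent being their difference. The hedged Python sketch below computes both conditional mutual informations from a fully specified joint pmf over $(S,U,V,W,Y)$; the tensor layout, the toy distribution, and the helper names are illustrative assumptions.

```python
import numpy as np

def cond_mutual_info(p, a_axes, b_axes, c_axes):
    """I(A; B | C) in nats for a joint pmf given as a NumPy array `p`,
    where a_axes, b_axes, c_axes are disjoint tuples of axis indices."""
    axes = set(range(p.ndim))
    def marg(keep):
        return p.sum(axis=tuple(axes - set(keep)), keepdims=True)
    p_abc = marg(a_axes + b_axes + c_axes)
    p_ac, p_bc, p_c = marg(a_axes + c_axes), marg(b_axes + c_axes), marg(c_axes)
    mask = p_abc > 0
    return float(np.sum(p_abc[mask] * np.log((p_abc * p_c)[mask] / (p_ac * p_bc)[mask])))

# Toy joint pmf P_{S U V W Y} with axes (S, U, V, W, Y); random but normalized.
rng = np.random.default_rng(1)
p = rng.random((2, 2, 2, 2, 2))
p /= p.sum()

i_uw_s = cond_mutual_info(p, a_axes=(1,), b_axes=(3,), c_axes=(0,))      # I(U; W | S)
i_yvw_s = cond_mutual_info(p, a_axes=(4, 2), b_axes=(3,), c_axes=(0,))   # I(Y, V; W | S)
if i_uw_s < i_yvw_s:
    print("exponent at kappa_alpha = 0:", i_yvw_s - i_uw_s)
```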
V. CONCLUSION

We studied the trade-off between the exponents of the type I and type II error probabilities for DHT over a noisy channel. For the RHT problem, we obtained a single-letter characterization of the optimal trade-off between the error-exponents. The direct (achievability) part of the proof shows that the optimal trade-off can be achieved if the observer performs an appropriate NP test locally and communicates its decision to the decision maker using a suitable channel code, while the decision maker performs an appropriate NP test on the channel output. This implies a "separation" between HT and channel coding in this setting, in the sense that no loss of optimality is incurred by optimizing the tasks of HT and channel coding independently of each other. For the DHT problem, we obtained inner bounds on the error-exponents trade-off using the SHTCC and JHTCC schemes. The SHTCC scheme recovers some of the existing bounds in the literature as special cases. We also showed, via an example of TAD, that the JHTCC scheme strictly outperforms the SHTCC scheme at some points of the error-exponents trade-off. An interesting avenue for future research is the exploration of novel outer bounds that could shed light on the tightness of our inner bounds.
APPENDIX A
PROOF OF (97) AND (110)
First, we prove (97). Since $W(j)$, $j \in \mathcal M'_i$, is selected uniformly at random from the set $T^n\big(P_{\hat W(i)}\big)$, we have from [31, Lemma 2.5] that, for any $\mathbf u \in T^n\big(P_{\hat U(i)}\big)$ and sufficiently large $n$,
$$\bar P_P^{(\mathcal B_n)}\Big( (\mathbf u, W(j)) \notin T^n\big(P_{\hat U(i)\hat W(i)}\big) \Big) \leq \bigg( 1 - \frac{e^{n(H(\hat W(i)|\hat U(i)) - \eta)}}{e^{nH(\hat W(i))}} \bigg). \qquad (175)$$
Since the codewords are selected independently of each other, it follows that
$$\bar P_P^{(\mathcal B_n)}\Big( (\mathbf u, W(j)) \notin T^n\big(P_{\hat U(i)\hat W(i)}\big)\ \ \forall\, j \in \mathcal M'_i \Big) \leq \bigg( 1 - \frac{e^{n(H(\hat W(i)|\hat U(i)) - \eta)}}{e^{nH(\hat W(i))}} \bigg)^{e^{nR'_i}} \leq e^{-e^{n(R'_i - I(\hat U(i);\hat W(i)) - \eta)}}. \qquad (176)$$
Hence, by the choice of $R'_i$ in (84), we have for sufficiently large $n$ that
$$\bar P_P^{(\mathcal B_n)}(\mathcal E_{EE}) \leq \sum_{i=1}^{|D_n(P_U,\eta)|} e^{-e^{n\eta}} \leq (n+1)^{|\mathcal U|}\, e^{-e^{n\eta}} \leq e^{-e^{n\Omega(\eta)}}. \qquad (177)$$
This completes the proof of (97).
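The double-exponential decay in (176) comes from the elementary inequality $(1-x)^N \leq e^{-Nx}$ applied with $x = e^{-n(I(\hat U(i);\hat W(i)) + \eta)}$ and $N = e^{nR'_i}$ codewords. The Python snippet below is a small numeric sanity check of that inequality and of the resulting super-exponential decay once $R'_i$ exceeds $I(\hat U(i);\hat W(i)) + \eta$; the parameter values are arbitrary and chosen only for illustration.

```python
import numpy as np

# (1 - x)^N <= exp(-N x): probability that none of N independent codewords
# covers u, when each covers it with probability at least x.
R_prime, I_UW, eta = 0.5, 0.3, 0.05      # rate and mutual information in nats (arbitrary)
for n in (10, 20, 40, 80):
    x = np.exp(-n * (I_UW + eta))        # per-codeword covering probability (lower bound)
    N = np.exp(n * R_prime)              # number of codewords in the bin
    miss_prob_bound = (1.0 - x) ** N     # probability that no codeword is jointly typical
    double_exp = np.exp(-np.exp(n * (R_prime - I_UW - eta)))
    print(n, miss_prob_bound <= double_exp, miss_prob_bound, double_exp)
```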
Next, we prove (110). Note that, by the encoding procedure, $M' = 1$ and $\mathbf w \in T^n\big(P_{\hat W(i)}\big)$ for some $i \in \big[|D_n(P_U,\eta)|\big]$ implies that $U \in T^n\big(P_{\hat U(i)|\hat W(i)}, \mathbf w\big)$ with probability one. Hence, it follows for $j \neq 1$ that
$$\bar P_P^{(\mathcal B_n)}(W(j) = \tilde{\mathbf w} \mid V=\mathbf v, M'=1, W(1)=\mathbf w) = \sum_{\mathbf u \in T^n(P_{\hat U(i)|\hat W(i)}, \mathbf w)} \bar P_P^{(\mathcal B_n)}(U=\mathbf u \mid V=\mathbf v, M'=1, W(1)=\mathbf w)\, \bar P_P^{(\mathcal B_n)}(W(j)=\tilde{\mathbf w} \mid U=\mathbf u, V=\mathbf v, M'=1, W(1)=\mathbf w).$$
Let $B_{-1,j} := \mathcal B^{(n)}_W \setminus \{W(1), W(j)\}$, with realization $b_{-1,j} := \mathcal B^{(n)}_W \setminus \{\mathbf w(1), \mathbf w(j)\}$, and let $\mathcal E := \{U=\mathbf u, V=\mathbf v, M'=1, W(1)=\mathbf w\}$. Then, denoting the support of $B_{-1,j}$ by $\mathbb B_{-1,j}$, we have
$$\bar P_P^{(\mathcal B_n)}(W(j)=\tilde{\mathbf w} \mid \mathcal E) = \sum_{b_{-1,j} \in \mathbb B_{-1,j}} \bar P_P^{(\mathcal B_n)}(B_{-1,j} = b_{-1,j} \mid \mathcal E)\, \bar P_P^{(\mathcal B_n)}(W(j)=\tilde{\mathbf w} \mid \mathcal E, B_{-1,j}=b_{-1,j}). \qquad (178)$$
We can write the term within the summation in (178) as follows:
$$\bar P_P^{(\mathcal B_n)}(W(j)=\tilde{\mathbf w} \mid \mathcal E, B_{-1,j}=b_{-1,j}) = \bar P_P^{(\mathcal B_n)}(W(j)=\tilde{\mathbf w} \mid U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})\; \frac{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(j)=\tilde{\mathbf w}, W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})}{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})}$$
$$= \bar P_P^{(\mathcal B_n)}(W(j)=\tilde{\mathbf w})\; \frac{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(j)=\tilde{\mathbf w}, W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})}{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})}. \qquad (179)$$
Let $N(\mathbf u, b_{-1,j}) = \big| \{ \mathbf w(l) \in b_{-1,j} : l \neq 1, j,\ (\mathbf u, \mathbf w(l)) \in T^n(P_{\hat U(i)\hat W(i)}) \} \big|$. Recall that if there are multiple indices $l$ in the codebook $\mathcal B^{(n)}_W$ such that $(\mathbf u, \mathbf w(l)) \in T^n\big(P_{\hat U(i)\hat W(i)}\big)$, then the encoder selects one of them uniformly at random. Also, note that since $M'=1$, $(\mathbf u, \mathbf w(1)) \in T^n\big(P_{\hat U(i)\hat W(i)}\big)$. Thus, if $(\mathbf u, \tilde{\mathbf w}) \in T^n\big(P_{\hat U(i)\hat W(i)}\big)$, then
$$\frac{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(j)=\tilde{\mathbf w}, W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})}{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})} = \frac{\Big[\tfrac{1}{N(\mathbf u, b_{-1,j})+2}\Big]}{\bar P_P^{(\mathcal B_n)}(M'=1 \mid U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})} \leq \frac{N(\mathbf u, b_{-1,j})+2}{N(\mathbf u, b_{-1,j})+2} = 1. \qquad (180)$$
On the other hand, if $(\mathbf u, \tilde{\mathbf w}) \notin T^n\big(P_{\hat U(i)\hat W(i)}\big)$, then
$$\frac{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(j)=\tilde{\mathbf w}, W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})}{\bar P_P^{(\mathcal B_n)}(M'=1 \mid W(1)=\mathbf w, U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})} = \frac{\Big[\tfrac{1}{N(\mathbf u, b_{-1,j})+1}\Big]}{\bar P_P^{(\mathcal B_n)}(M'=1 \mid U=\mathbf u, V=\mathbf v, B_{-1,j}=b_{-1,j})} \leq \frac{N(\mathbf u, b_{-1,j})+2}{N(\mathbf u, b_{-1,j})+1} \leq 2. \qquad (181)$$
From (178), (179), (180) and (181), (110) follows. A similar argument shows that (110) also holds with $\bar P_Q^{(\mathcal B_n)}$ in place of $\bar P_P^{(\mathcal B_n)}$. The proof of (123) is similar to that of (110) and is hence omitted.

REFERENCES
[1] J. Neyman and E. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Philos. Trans. of the Royal Society of London, vol. 231, pp. 289–337, Feb. 1933.
[2] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on a sum of observations," Ann. Math. Statist., vol. 23, no. 4, pp. 493–507, 1952.
[3] W. Hoeffding, "Asymptotically optimal tests for multinomial distributions," Ann. Math. Stat., vol. 36, no. 2, pp. 369–400, 1965.
[4] R. E. Blahut, "Hypothesis testing and information theory," IEEE Trans. Inf. Theory, vol. 20, no. 4, pp. 405–417, Jul. 1974.
[5] E. Tuncel, "On error exponents in hypothesis testing," IEEE Trans. Inf. Theory, vol. 51, no. 8, pp. 2945–2950, 2005.
[6] R. Ahlswede and I. Csiszár, "Hypothesis testing with communication constraints," IEEE Trans. Inf. Theory, vol. 32, no. 4, pp. 533–542, Jul. 1986.
[7] T. S. Han, "Hypothesis testing with multiterminal data compression," IEEE Trans. Inf. Theory, vol. 33, no. 6, pp. 759–772, Nov. 1987.
[8] H. Shimokawa, T. S. Han, and S. Amari, "Error bound of hypothesis testing with data compression," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Trondheim, Norway, 1994.
[9] M. S. Rahman and A. B. Wagner, "On the optimality of binning for distributed hypothesis testing," IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6282–6303, Oct. 2012.
[10] S. Sreekumar and D. Gündüz, "Distributed hypothesis testing over discrete memoryless channels," IEEE Trans. Inf. Theory, vol. 66, no. 4, Apr. 2020.
[11] T. Berger, "Decentralized estimation and decision theory," in IEEE 7th Spring Workshop on Inf. Theory, Mt. Kisco, NY, Sep. 1979.
[12] H. M. H. Shalaby and A. Papamarcou, "Multiterminal detection with zero-rate data compression," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 254–267, Mar. 1992.
[13] T. S. Han and K. Kobayashi, "Exponential-type error probabilities for multiterminal hypothesis testing," IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 2–14, Jan. 1989.
[14] C. Tian and J. Chen, "Successive refinement for hypothesis testing and lossless one-helper problem," IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4666–4681, Oct. 2008.
[15] W. Zhao and L. Lai, "Distributed testing against independence with multiple terminals," in Proc. Allerton Conf. on Communication, Control, and Computing, Monticello, IL, USA, Oct. 2014.
[16] M. Wigger and R. Timo, "Testing against independence with multiple decision centers," in Int. Conf. on Signal Processing and Communication, Bengaluru, India, Jun. 2016.
[17] S. Salehkalaibar, M. Wigger, and L. Wang, "Hypothesis testing in multi-hop networks," IEEE Trans. Inf. Theory, vol. 65, no. 7, pp. 4411–4433, Jul. 2019.
[18] A. Zaidi and I. E. Aguerri, "Optimal rate-exponent region for a class of hypothesis testing against conditional independence problems," arXiv:1904.03028.
[19] M. Mhanna and P. Piantanida, "On secure distributed hypothesis testing," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, Jun. 2015.
[20] S. Sreekumar and D. Gündüz, "Testing against conditional independence under security constraints," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018.
[21] S. Sreekumar, A. Cohen, and D. Gündüz, "Privacy-aware distributed hypothesis testing," arXiv:1807.02764.
[22] A. Gilani, S. B. Amor, S. Salehkalaibar, and V. Tan, "Distributed hypothesis testing with privacy constraints," Entropy, vol. 21, no. 478, pp. 1–27, May 2019.
[23] G. Katz, P. Piantanida, and M. Debbah, "Distributed binary detection with lossy data compression," IEEE Trans. Inf. Theory, vol. 63, no. 8, pp. 5207–5227, Mar. 2017.
[24] Y. Xiang and Y. H. Kim, "Interactive hypothesis testing with communication constraints," in Proc. Allerton Conf. on Communication, Control, and Computing, Monticello, IL, USA, Oct. 2012.
[25] ——, "Interactive hypothesis testing against independence," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Istanbul, Turkey, Nov. 2013.
[26] G. Katz, P. Piantanida, and M. Debbah, "Collaborative distributed hypothesis testing," arXiv:1604.01292 [cs.IT], Apr. 2016.
[27] U. Hadar, J. Liu, Y. Polyanskiy, and O. Shayevitz, "Error exponents in distributed hypothesis testing of correlations," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2019, pp. 2674–2678.
[28] E. Haim and Y. Kochman, "On binary distributed hypothesis testing," arXiv:1801.00310.
[29] N. Weinberger and Y. Kochman, "On the reliability function of distributed hypothesis testing under optimal detection," IEEE Trans. Inf. Theory, vol. 65, no. 8, pp. 4940–4965, Apr. 2019.
[30] S. Watanabe, "Neyman-Pearson test for zero-rate multiterminal hypothesis testing," IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 4923–4939, Jul. 2018.
[31] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
[32] S. Sreekumar and D. Gündüz, "Distributed hypothesis testing over noisy channels," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, Jun. 2017.
[33] S. Salehkalaibar and M. Wigger, "Distributed hypothesis testing based on unequal-error protection codes," arXiv:1806.05533.
[34] P. Minero, S. H. Lim, and Y. H. Kim, "A unified approach to hybrid coding," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1509–1523, Apr. 2015.
[35] S. Borade, B. Nakiboğlu, and L. Zheng, "Unequal error protection: An information-theoretic perspective," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5511–5539, Dec. 2009.
[36] N. Weinberger, Y. Kochman, and M. Wigger, "Exponent trade-off for hypothesis testing over noisy channels," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019.
[37] I. Csiszár, "On the error exponent of source-channel transmission with a distortion threshold," IEEE Trans. Inf. Theory, vol. 28, no. 6, pp. 823–828, Nov. 1982.
[38] Y. Polyanskiy and Y. Wu, Lecture Notes on Information Theory. [Online]. Available: http://people.lids.mit.edu/yp/homepage/data/itlecturesv5.pdf, 2017.
[39] S. Sreekumar and D. Gündüz, "Hypothesis testing over a noisy channel," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019.
[40] R. Gallager, "A simple derivation of the coding theorem and some applications," IEEE Trans. Inf. Theory, vol. 11, no. 1, pp. 3–18, Jan. 1965.
[41] N. Merhav and S. Shamai, "On joint source-channel coding for the Wyner-Ziv source and the Gelfand-Pinsker channel," IEEE Trans. Inf. Theory, vol. 49, no. 11, pp. 2844–2855, Nov. 2003.
[42] T. Cover, A. E. Gamal, and M. Salehi, "Multiple access channels with arbitrarily correlated sources," IEEE Trans. Inf. Theory, vol. 26, no. 6, pp. 648–657, Nov. 1980.
[43] I. Csiszár, "Joint source-channel error exponent," Prob. of Control and Inf. Theory, vol. 9, no. 5, pp. 315–328, Oct. 1980.
[44] H. G. Eggleston,