Guesswork with Quantum Side Information
Eric P. Hanson∗, Vishal Katariya†, Nilanjana Datta‡, and Mark M. Wilde§

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, UK
Hearne Institute for Theoretical Physics, Department of Physics and Astronomy, and Center for Computation and Technology, Louisiana State University, Baton Rouge, Louisiana 70803, USA

February 4, 2020
Abstract
What is the minimum number of guesses needed on average to guess a realization of a random variable correctly? The answer to this question led to the introduction of a quantity called guesswork by Massey in 1994, which can be viewed as an alternate security criterion to entropy. In this paper, we consider the guesswork in the presence of quantum side information, and show that a general sequential guessing strategy is equivalent to performing a single quantum measurement and choosing a guessing strategy based on the outcome. We use this result to deduce entropic one-shot and asymptotic bounds on the guesswork in the presence of quantum side information, and to formulate a semi-definite program (SDP) to calculate the quantity. We evaluate the guesswork for a simple example involving the BB84 states, both numerically and analytically, and we prove a continuity result that certifies the security of slightly imperfect key states when the guesswork is used as the security criterion.
Information theory, among other things, concerns the security of messages against attacks by malicious agents. Conventionally, it is accepted that the more unpredictable a message is, and the higher the (Shannon) entropy of the distribution from which it is drawn, the more secure it is against brute-force attacks. Therefore, when establishing a secret key or a cipher, the gold standard is to choose a key whose elements are picked uniformly at random from some alphabet.

Entropy, however, is not the only such criterion for security. Another relevant quantity, which is also maximized by messages drawn uniformly, is the guesswork. First put forth by Massey [1], the quantity is operationally described by the following guessing game. Consider the problem of guessing a realization of a random variable X, taking values in a finite alphabet 𝒳, by asking questions of the form "Is X = x?". The guesswork G(X) is defined as the minimum value of the average number of questions of this form that need to be asked until the answer is "yes". In the real world, questions of this form arise from query access to a resource; for example, if a hacker is attempting to guess a user's password on an online portal, he or she can only ask this kind of question (as opposed to, say, "Is X ≥ x?"), and is only allowed a limited number of guesses before being locked out. Therefore, for someone setting up a password, the number of guesses allowed by the portal provides the operational security criterion against which his or her password must compare.

In contrast, the entropy of a distribution is approximately the minimum value of the average number of guesses required to obtain the correct guess when one is allowed to ask questions of the form "Is X ∈ X̃?", where each X̃ is some subset of the alphabet 𝒳 [2, Theorem 5.4.1].
Qualitatively speaking, entropy can be considered to be the query complexity of a binary search-type algorithm, whereas guesswork corresponds to the query complexity of a linear search-type algorithm [3]. Figure 1 illustrates this difference. It is well known that binary search has a smaller complexity than linear search, which leads to the simple claim that the entropy of a distribution is essentially a lower bound on the logarithm of the guesswork. In particular, Massey [1] showed that

G(X) ≥ (1/4) 2^{H(X)} + 1,  (1)

provided that H(X) ≥ 2 bits.

[Figure 1: (a) a linear search-type strategy, asking questions of the form "Is X = x?"; (b) a binary search-type strategy, asking questions of the form "Is X ∈ X̃?".]

∗[email protected] †[email protected] ‡[email protected] §[email protected]

The guesswork has also been studied in the setting in which the guesser has access to side information, in the form of a random variable Y that is correlated with X. In this case, the guesswork is the minimal number of questions of the form "Is X = x?" that is required on average to obtain the correct answer, given the value of Y. The optimal guessing strategy is simply to guess in decreasing order of the conditional probability p_{X|Y}(x|y). Arikan [4] obtained upper and lower bounds on the guesswork and its positive moments, in this scenario as well as in the case without side information. Further work on guesswork in the classical setting has been done in, e.g., [5, 6, 7, 8, 9, 10, 11, 12].

In this paper, we consider a natural generalization of the above guessing problem to the case in which the classical side information is replaced by quantum side information. This generalization was first considered in [13]. In this case, the guesser (say, Bob) holds a quantum system B, instead of a classical random variable (or equivalently, a classical system) Y. Here, the joint state of X and B is given by a classical-quantum (c-q) state, which we denote as ρ_XB (see Section 2 for details). We define guesswork in the presence of quantum side information to be the minimum number of guesses needed, on average, for Bob to correctly guess Alice's choice, by performing a general sequential protocol, as follows. Bob acts on his system B with a quantum instrument, yielding a classical outcome x̂ which he guesses, as well as a post-measurement state.
If his guess is incorrect, he performs another instrument (possibly adapted based on his previous guess), and repeats the protocol until he either guesses correctly or runs out of guesses (which might be the case if he is allowed a limited number of guesses K < |𝒳|).

While the case of classical side information admits a very simple optimal strategy (which amounts to simply sorting the conditional probabilities p_{X|Y}(·|y) in non-increasing order and guessing accordingly), the quantum case requires measurement on the quantum system B, which potentially disturbs the state of B, a priori complicating the analysis of the sequence of guesses in the optimal strategy. We show in Section 3.2, however, that a general sequential strategy is in fact equivalent to performing a single generalized measurement yielding a classical random variable Y of outcomes, and then executing the optimal classical strategy using this Y as the classical side information. The earlier work [13] instead defined guesswork with quantum side information as the latter quantity, i.e., a measured version of the guesswork in the presence of classical side information. While these definitions are equivalent, we consider the definition in terms of a sequential protocol to be a more natural one. Moreover, the above-mentioned equivalence is proved via an explicit construction, allowing such a guessing strategy to be implemented sequentially. The single-measurement protocol could in general involve making a measurement with exponentially (in |𝒳|) many outcomes. Hence it may be more efficient to implement it instead as a sequence of (linearly-many) measurements with linearly-many outcomes, as allowed by the above construction.

We consider moreover a slight generalization of the guesswork in which Bob may only make K ≤ |𝒳| guesses in total, and in which the "cost" of needing to make k guesses is given by a vector c⃗ = (c_1, c_2, ..., c_K), which could be different from (1, 2, ..., K), the latter of which corresponds to the expected value. These generalizations can be better models for certain situations; in the password-guessing example, e.g., Bob may be locked out after K guesses and hence is limited to a small number of guesses, or perhaps one has to wait after each guess before making another, and the time that one waits increases with the number of incorrect guesses. We show that this generalized situation (including the guesswork as a special case) admits a semi-definite programming (SDP) representation in which the number of variables scales as |𝒳|^K, and hence smaller values of K yield smaller problems and better scaling with |𝒳|. See Section 5 for more on the computational aspects of the guesswork.

One can also consider a related task, in which one wishes to maximize what is known as the "guessing probability" p_guess(X|B) [14]. In this case, the guesser is given only one attempt to guess the value of X (and hence is free to perform an arbitrary measurement on his system B). The guessing probability is related to the so-called conditional min-entropy H_min(X|B) of the c-q state ρ_XB. In some sense, we can consider the guesswork to be an extension of the guessing probability. However, the nature of the optimization being done is different: instead of maximizing the probability of success in one attempt, we minimize the total number of guesses required. Therefore, the operations that a guesser performs to minimize the guesswork may be very different from those needed to maximize the guessing probability. Some of the connections between these two tasks have been investigated in [13].

Overview
In Section 2, we formally describe the task. In Section 3, we define classical and quantum guessing strategies in a unified framework, and in Theorem 1 prove that three classes of quantum strategies are equivalent. In Section 4, we establish one-shot and asymptotic entropic bounds on the guesswork, using analogous bounds developed by Arikan for the case of classical side information [4]. In Section 5, we revisit the idea, originally discussed in [13], that the guesswork may be formulated as a semi-definite optimization problem (SDP), and use such a representation to prove that the guesswork is a concave function in Section 5.1 and a Lipschitz continuous function in Section 5.2. We discuss the dual formulation of the SDP in Section 5.3 and a resulting algorithm to efficiently compute upper bounds in Section 5.4, and we present a mixed-integer SDP representation in Section 5.5. Section 6 presents a simple example of the guesswork involving the BB84 states, and Section 7 provides a robustness result for using the guesswork as a security criterion.
Alice chooses a letter x ∈ 𝒳 with some probability p_X(x), where 𝒳 is a finite alphabet. This naturally defines a random variable X ∼ p_X(x). She then sends a quantum system B to Bob, prepared in the state ρ^x_B, which depends on her choice x. Bob knows the set of states {ρ^x_B : x ∈ 𝒳} and the probability distribution {p_X(x) : x ∈ 𝒳}, but he does not know which particular state is sent to him by Alice. Bob's task is to guess x correctly with as few guesses as possible. From Bob's perspective, he therefore has access to the B-part of the c-q state

ρ_XB = Σ_x p_X(x) |x⟩⟨x|_X ⊗ ρ^x_B.  (2)

In the purely classical case, this task reduces to the following scenario: Alice holds the random variable X ∼ p_X(x), and Bob holds a correlated random variable Y and knows the joint distribution of (X, Y). In this case ρ_XB reduces to the state

ρ_XY = Σ_x p_X(x) |x⟩⟨x|_X ⊗ Σ_y p_{Y|X}(y|x) |y⟩⟨y|_Y.  (3)

In this case, if Bob's random variable Y has value y, then an optimal guessing strategy is to sort the conditional distribution p_{X|Y}(·|y) in non-increasing order, so that

p_{X|Y}(x_1|y) ≥ p_{X|Y}(x_2|y) ≥ ... ≥ p_{X|Y}(x_{|𝒳|}|y),  (4)

and simply guess first x_1, then x_2, etc., until he gets it correct [4].

In the case in which Bob's system B is quantum, he is allowed to perform any local operations he wishes on B, and then make a first guess x_1. He is told by Alice whether or not his guess is correct; then he can perform local operations on B, and make another guess, and so forth. We are interested in determining the minimal number of guesses needed on average for a given ensemble {p_X(x), ρ^x_B}_{x∈𝒳} and the associated optimal strategy. More generally, we allow Bob to make K guesses, with possibly K < |𝒳|.
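In the purely classical case, the sorted-guessing strategy of (4) makes the guesswork straightforward to compute. A minimal numpy sketch (the array layout p_xy[x, y] for the joint distribution is our own convention, not the paper's):

```python
import numpy as np

def classical_guesswork(p_xy):
    """G(X|Y) for a joint distribution p_xy[x, y]: for each value y,
    guess in non-increasing order of the conditional probability p(x|y),
    which is the optimal strategy of (4)."""
    total = 0.0
    for y in range(p_xy.shape[1]):
        # joint masses p(x, y) sorted in non-increasing order; the k-th
        # most likely symbol costs k guesses
        col = np.sort(p_xy[:, y])[::-1]
        total += np.dot(np.arange(1, len(col) + 1), col)
    return total

# Uniform X over 4 symbols with trivial side information (|Y| = 1):
# the guesswork is (1 + 2 + 3 + 4)/4 = 2.5.
print(classical_guesswork(np.full((4, 1), 0.25)))   # → 2.5

# Perfectly correlated side information: one guess always suffices.
print(classical_guesswork(0.5 * np.eye(2)))         # → 1.0
```

Note that summing the sorted joint masses directly is equivalent to averaging the conditional guesswork over p_Y.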
Formally, we assume that Bob always makes all K guesses; any guess after the correct guess simply does not factor into the calculation of the minimal number of guesses (see Section 3 for a more detailed definition of the minimal number of guesses). Thus, Bob makes a sequence of guesses (g_1, ..., g_K) ∈ 𝒳^K with some probability.

We could consider the scenario in which Bob makes a guess x_1, then learns whether or not the guess was correct, and uses that information to make his second guess x_2, and so forth. However, if Bob learns that his j-th guess x_j is correct, then it does not matter what he guesses subsequently (it has no bearing on the minimal number of guesses). If the guess is incorrect, then his subsequent guesses do matter, and he should make his next guess accordingly. Hence, in such a protocol, the feedback about whether or not the j-th guess is correct does not help, and Bob might as well assume that each guess is incorrect.

When Alice chooses x* ∈ 𝒳, a guessing strategy for Bob outputs a sequence of guesses g⃗ = (g_1, ..., g_K) ∈ 𝒳^K with some probability p_{G⃗|X}(g⃗|x*). Hence, formally, a guessing strategy for X with K guesses is a random variable G⃗ on 𝒳^K that is correlated with X, such that (X, G⃗) has marginal X ∼ p_X. Note that the definition of a guessing strategy makes no reference to the side information (if any) that Bob has access to; instead, the side information dictates the set of guessing strategies Bob has access to. This allows various types of side information to be analyzed within a uniform framework; in particular, the set of strategies available when Bob has access to some classical side information Y is described in Section 3.1, while the case of quantum side information is described in Section 3.2.

We are interested in the minimal number of guesses required to guess x* correctly.
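This statistic, the position of the first correct guess in the sequence of K guesses (or ∞ when every guess misses), is formalized next; as a plain-code sketch:

```python
import math

def num_guesses(g, x_star):
    """N(g, x*): the position of the first correct guess in the
    sequence g, or infinity if x* never appears among the K guesses."""
    for j, guess in enumerate(g, start=1):
        if guess == x_star:
            return j
    return math.inf

print(num_guesses(('a', 'b', 'c'), 'b'))   # → 2
print(num_guesses(('a', 'b'), 'c'))        # → inf
# Guesses made after the first correct one are irrelevant:
print(num_guesses(('b', 'b', 'a'), 'b'))   # → 1
```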
This is defined as follows:

N(g⃗, x*) := min{ j : g_j = x* } if g_j = x* for some j = 1, ..., K, and N(g⃗, x*) := ∞ otherwise,  (5)

where the outcome ∞ occurs when none of the K guesses are correct. We can view N as a random variable taking values in {1, 2, ..., K, ∞}. Given a guessing strategy G⃗, the quantity of interest is N(G⃗, X), the corresponding random variable. We define

S_K(X) := { N(G⃗, X) : (X, G⃗) ∼ p_{XG⃗} }  (6)

to be the set of all possible random variables N associated to all guessing strategies G⃗ with K guesses. We say two guessing strategies G⃗ and G⃗′ for X with K guesses are equivalent if N(G⃗′, X) = N(G⃗, X).

Note that if G⃗ and G⃗′ are two strategies with K guesses for X that differ only in guesses made after guessing the correct answer, then they are equivalent. This formalizes the notion introduced at the end of the previous section: since guesses made after the correct answer do not change the value of N(g⃗, x*), feedback about whether or not g_j = x* can only lead to equivalent strategies.

Consider a pair of random variables (
X, Y) where X has a finite alphabet 𝒳 and Y has a countable alphabet 𝒴. Alice chooses x* ∈ 𝒳 (with probability p_X(x*)) and Bob is given y ∈ 𝒴 (with probability p_{Y|X}(y|x*)). Bob's task is to guess x*. Since Bob's sequence of guesses (g_1, ..., g_K) can only depend on x* via y, a classical guessing strategy G⃗ is any random variable G⃗ such that the ordered triple (X, Y, G⃗) of random variables forms a Markov chain, which we denote as X − Y − G⃗. Hence, given a joint probability distribution p_XY, we define the set of random variables N associated to classical guessing strategies as follows:

S^Classical_K(p_XY) := { N(G⃗, X) : X − Y − G⃗ } ⊆ S_K(X).  (7)

Let us consider three classes of quantum strategies:

1. Measured strategy: Bob performs an arbitrary POVM {E_y}_{y∈𝒴} on the B-system. Let Y be the random variable with outcomes in a finite alphabet 𝒴 corresponding to his measurement outcomes, i.e.,

p_{Y|X}(y|x) = tr[E_y ρ^x_B], ∀ x ∈ 𝒳, y ∈ 𝒴.  (8)

Bob then employs a classical guessing strategy on (X, Y). The set of random variables corresponding to the possible number of guesses under such a strategy is given by

S^Measured_K(ρ_XB) := { N(G⃗, X) : X − Y − G⃗, Y satisfies (8) for some finite alphabet 𝒴 and POVM {E_y}_{y∈𝒴} }.  (9)

We then observe that

S^Measured_K(ρ_XB) ⊆ S_K(X).  (10)

2. Ordered strategy: Bob performs a measurement with outcomes in 𝒳^K, which are identified with guessing orders; i.e., if the outcome is (x_1, ..., x_K) ∈ 𝒳^K, Bob first guesses x_1, then x_2, and so forth. In this case, Bob performs a POVM {E_g⃗}_{g⃗∈𝒳^K}, and the guessing strategy G⃗ is distributed according to

p_{G⃗|X}(g⃗|x) = tr[E_g⃗ ρ^x_B].  (11)
As above, we define

S^Ordered_K(ρ_XB) := { N(G⃗, X) : (G⃗, X) satisfy (11) for some POVM {E_g⃗}_{g⃗∈𝒳^K} } ⊆ S_K(X).  (12)

It is evident that

S^Ordered_K(ρ_XB) ⊆ S^Measured_K(ρ_XB)  (13)

because any such ordered strategy is a special type of measured strategy (with Y = G⃗). However, any measured strategy can in fact be simulated by an ordered strategy. Suppose we have a measured strategy with alphabet 𝒴, POVM {E_y}_{y∈𝒴}, and G⃗ satisfying X − Y − G⃗. Then

p_{G⃗|X}(g⃗|x) = Σ_{y∈𝒴} p_{G⃗|Y}(g⃗|y) p_{Y|X}(y|x) = Σ_{y∈𝒴} p_{G⃗|Y}(g⃗|y) tr[E_y ρ^x_B],  (14)

where we have used the Markov property for the first equality and (8) for the second equality.

Let Ẽ_g⃗ := Σ_{y∈𝒴} p_{G⃗|Y}(g⃗|y) E_y. Note that {Ẽ_g⃗}_{g⃗∈𝒳^K} is a POVM: each element is positive semi-definite since {E_y}_{y∈𝒴} is a POVM, and

Σ_{g⃗∈𝒳^K} Ẽ_g⃗ = Σ_{g⃗∈𝒳^K} Σ_{y∈𝒴} p_{G⃗|Y}(g⃗|y) E_y = Σ_{y∈𝒴} Σ_{g⃗∈𝒳^K} p_{G⃗|Y}(g⃗|y) E_y = Σ_{y∈𝒴} E_y = I_B,  (15)

using again that {E_y}_{y∈𝒴} is a POVM. Then substituting the definition of Ẽ_g⃗ into (14) yields

p_{G⃗|X}(g⃗|x) = tr[Ẽ_g⃗ ρ^x_B],  (16)

and hence (11) is satisfied with E_g⃗ = Ẽ_g⃗. Therefore,

S^Ordered_K(ρ_XB) = S^Measured_K(ρ_XB).  (17)

3. Sequential quantum strategy: Suppose that Alice chooses x (which occurs with probability p_X(x)), and hence Bob has the state ρ^x_B. To make his first guess, Bob chooses a set of generalized measurement operators {M^{(1)}_{x_1}}_{x_1∈𝒳} and reports the measurement outcome as his guess. He gets outcome x_1 with probability

p_{G_1|X}(x_1|x) = tr[M^{(1)}_{x_1} ρ^x_B M^{(1)†}_{x_1}]  (18)

and his post-measurement state is

M^{(1)}_{x_1} ρ^x_B M^{(1)†}_{x_1} / p_{G_1|X}(x_1|x).  (19)
Note: in general, Bob could perform a unitary operation U on his state before measuring it. However, this would simply correspond to measuring with {M^{(1)}_{x_1} U}_{x_1∈𝒳} instead. Hence, it suffices to simply consider a generalized measurement {M^{(1)}_{x_1}}_{x_1∈𝒳}.

Then, after learning the outcome x_1, Bob chooses a new set of generalized measurement operators {M^{(2|x_1)}_{x_2}}_{x_2∈𝒳}. Note that this set of measurement operators can depend on x_1. Without loss of generality, we can keep the same outcome set 𝒳, since Bob could set, e.g., M^{(2|x_1)}_{x_1} = 0 to avoid guessing the same number twice. Bob measures his state and gets the outcome x_2 with probability

p_{G_2|G_1 X}(x_2|x_1, x) = tr[M^{(2|x_1)}_{x_2} M^{(1)}_{x_1} ρ^x_B M^{(1)†}_{x_1} M^{(2|x_1)†}_{x_2}] / p_{G_1|X}(x_1|x).  (20)

Multiplying by p_{G_1|X}(x_1|x), we see that the joint distribution is given by

p_{G_1 G_2|X}(x_1, x_2|x) = tr[M^{(2|x_1)}_{x_2} M^{(1)}_{x_1} ρ^x_B M^{(1)†}_{x_1} M^{(2|x_1)†}_{x_2}].  (21)

To make his j-th guess, we allow Bob to choose a new set of generalized measurement operators {M^{(j|x_1,...,x_{j−1})}_{x_j}}_{x_j∈𝒳}, which may depend on the previous j − 1 outcomes. Iterating, the joint distribution of all K guesses is

p_{G_1 G_2 ··· G_K|X}(x_1, x_2, ..., x_K|x) = tr[M^{(K|x_1,...,x_{K−1})}_{x_K} ··· M^{(2|x_1)}_{x_2} M^{(1)}_{x_1} ρ^x_B M^{(1)†}_{x_1} M^{(2|x_1)†}_{x_2} ··· M^{(K|x_1,...,x_{K−1})†}_{x_K}].  (22)

Under such a strategy, the set of possible random variables giving the number of guesses is

S^Sequential_K(ρ_XB) := { N(G⃗, X) : (G⃗, X) satisfy (22) for some collections of measurement operators {M^{(j|x_1,...,x_{j−1})}_{x_j}}_{x_j∈𝒳}, j = 1, ..., K, x_1, ..., x_{K−1} ∈ 𝒳 }.  (23)

Theorem 1.
Let ρ_XB be a c-q state as defined in (2), and let K be a natural number with K ≤ |𝒳|. Then

S^Sequential_K(ρ_XB) = S^Ordered_K(ρ_XB) = S^Measured_K(ρ_XB).  (24)

Hence, the three sets of random variables of the number of guesses obtained from the various classes of strategies all coincide, and we call the single class that of quantum strategies, denoted S^Quantum_K(ρ_XB).

Proof.
The second equality was already stated in (17) and proven before it, so it remains to prove the first equality. Consider a sequential strategy, with the notation of point 3 above. Define

E_{x_1,...,x_K} := M^{(1)†}_{x_1} M^{(2|x_1)†}_{x_2} ··· M^{(K|x_1,...,x_{K−1})†}_{x_K} M^{(K|x_1,...,x_{K−1})}_{x_K} ··· M^{(2|x_1)}_{x_2} M^{(1)}_{x_1}.  (25)

We see that E_{x_1,...,x_K} = A†A for A = M^{(K|x_1,...,x_{K−1})}_{x_K} ··· M^{(2|x_1)}_{x_2} M^{(1)}_{x_1}, and hence it is positive semi-definite. Moreover,

Σ_{x_1,...,x_K∈𝒳} E_{x_1,...,x_K} = I_B,  (26)

as can be seen by first summing (25) over x_K, using

Σ_{x_K∈𝒳} M^{(K|x_1,...,x_{K−1})†}_{x_K} M^{(K|x_1,...,x_{K−1})}_{x_K} = I_B  (27)

since {M^{(K|x_1,...,x_{K−1})}_{x_K}}_{x_K∈𝒳} is a POVM, and then similarly summing over x_{K−1}, x_{K−2}, ..., and finally x_1. Let us write E_x⃗, where x⃗ = (x_1, ..., x_K), for E_{x_1,...,x_K}. We have shown that {E_x⃗}_{x⃗∈𝒳^K} is a POVM. Moreover,

p_{G_1 G_2 ··· G_K|X}(x_1, x_2, ..., x_K|x) = tr[E_{x_1,...,x_K} ρ^x_B].  (28)

Hence, Bob's strategy is equivalent to simply performing the single POVM {E_x⃗}_{x⃗∈𝒳^K} once, obtaining an outcome x⃗ = (x_1, ..., x_K), and then making x_1 his first guess, x_2 his second guess, and so forth. That is, any such strategy can be recast as an ordered strategy.

On the other hand, any such ordered strategy can be reformulated as an adaptive strategy, by the following recursive approach. Suppose that we are given {E_y⃗}_{y⃗∈𝒳^K}. For each x_1 ∈ 𝒳, define

M^{(1)}_{x_1} = ( Σ_{x_2,...,x_K∈𝒳} E_{x_1,x_2,...,x_K} )^{1/2},  (29)

where we have chosen the positive semi-definite square root. We have that

Σ_{x_1∈𝒳} M^{(1)†}_{x_1} M^{(1)}_{x_1} = Σ_{x_1∈𝒳} (M^{(1)}_{x_1})² = Σ_{x_1∈𝒳} Σ_{x_2,...,x_K∈𝒳} E_{x_1,x_2,...,x_K} = I_B,  (30)

so {M^{(1)}_{x_1}}_{x_1∈𝒳} is indeed a POVM with outcomes in 𝒳.
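As a numerical sanity check on both directions of this equivalence, the following sketch (for K = |𝒳| = 2 and a qubit B; the dimension, seed, and random ensemble are arbitrary choices) builds a random ordered-strategy POVM, forms the first-step operators as square roots of partial sums as in (29), forms the second-step operators by conjugating with the inverses of the first-step operators and taking a square root, and verifies that the resulting sequential strategy reproduces the ordered strategy's outcome probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2            # dimension of Bob's system B (arbitrary choice)
X = [0, 1]       # alphabet; K = |X| = 2 guesses

def sqrtm_psd(M):
    """Positive semi-definite square root of a Hermitian PSD matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

# A random ordered-strategy POVM {E_(x1,x2)}: conjugate random PSD
# matrices by the inverse square root of their sum so they add to I.
A = {(x1, x2): (lambda M: M @ M.conj().T)(rng.normal(size=(d, d)))
     for x1 in X for x2 in X}
w, V = np.linalg.eigh(sum(A.values()))
S_isqrt = V @ np.diag(w ** -0.5) @ V.conj().T
E = {g: S_isqrt @ A[g] @ S_isqrt for g in A}

# First-step operators M1_{x1} = sqrt(sum_{x2} E_(x1,x2)), and
# second-step operators from conjugation by M1^{-1} plus a square root.
M1 = {x1: sqrtm_psd(sum(E[(x1, x2)] for x2 in X)) for x1 in X}
M2 = {(x1, x2): sqrtm_psd(np.linalg.inv(M1[x1]) @ E[(x1, x2)]
                          @ np.linalg.inv(M1[x1]))
      for x1 in X for x2 in X}

# The first-step operators form a valid measurement: sum_x M1_x^2 = I.
assert np.allclose(sum(M1[x] @ M1[x] for x in X), np.eye(d))

# The sequential strategy reproduces the ordered strategy's statistics:
# tr[M2 M1 rho M1† M2†] = tr[E_(x1,x2) rho] for a random state rho.
Rmat = rng.normal(size=(d, d))
rho = (Rmat @ Rmat.conj().T) / np.trace(Rmat @ Rmat.conj().T)
for g in E:
    seq = M2[g] @ M1[g[0]]
    assert np.isclose(np.trace(seq @ rho @ seq.conj().T),
                      np.trace(E[g] @ rho))
print("sequential strategy reproduces the ordered POVM statistics")
```

The check relies on the operators being generically invertible; a careful implementation would use pseudo-inverses on the supports, as the degenerate cases require.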
Next, for each x_1 ∈ 𝒳, corresponding to obtaining outcome x_1 on the first measurement, we define a POVM {M^{(2|x_1)}_{x_2}}_{x_2∈𝒳} by

M^{(2|x_1)}_{x_2} = ( (M^{(1)}_{x_1})^{−1} Σ_{x_3,...,x_K∈𝒳} E_{x_1,x_2,...,x_K} (M^{(1)}_{x_1})^{−1} )^{1/2}.  (31)

Then

Σ_{x_2∈𝒳} (M^{(2|x_1)}_{x_2})² = (M^{(1)}_{x_1})^{−1} Σ_{x_2∈𝒳} Σ_{x_3,...,x_K∈𝒳} E_{x_1,x_2,...,x_K} (M^{(1)}_{x_1})^{−1}  (32)
= (M^{(1)}_{x_1})^{−1} (M^{(1)}_{x_1})² (M^{(1)}_{x_1})^{−1} = I_B.  (33)

Likewise, we define

M^{(3|x_1,x_2)}_{x_3} = ( (M^{(2|x_1)}_{x_2})^{−1} (M^{(1)}_{x_1})^{−1} Σ_{x_4,...,x_K∈𝒳} E_{x_1,...,x_K} (M^{(1)}_{x_1})^{−1} (M^{(2|x_1)}_{x_2})^{−1} )^{1/2}.  (34)

Then

Σ_{x_3∈𝒳} (M^{(3|x_1,x_2)}_{x_3})² = (M^{(2|x_1)}_{x_2})^{−1} (M^{(1)}_{x_1})^{−1} Σ_{x_3∈𝒳} Σ_{x_4,...,x_K∈𝒳} E_{x_1,...,x_K} (M^{(1)}_{x_1})^{−1} (M^{(2|x_1)}_{x_2})^{−1}  (35)
= (M^{(2|x_1)}_{x_2})^{−1} (M^{(2|x_1)}_{x_2})² (M^{(2|x_1)}_{x_2})^{−1} = I_B.  (36)

Repeating this process, we define

M^{(ℓ|x_1,...,x_{ℓ−1})}_{x_ℓ} = ( (M^{(ℓ−1|x_1,...,x_{ℓ−2})}_{x_{ℓ−1}})^{−1} ··· (M^{(1)}_{x_1})^{−1} Σ_{x_{ℓ+1},...,x_K∈𝒳} E_{x_1,...,x_K} (M^{(1)}_{x_1})^{−1} ··· (M^{(ℓ−1|x_1,...,x_{ℓ−2})}_{x_{ℓ−1}})^{−1} )^{1/2}  (37)

to obtain a POVM for step ℓ (to use when having obtained outcomes x_1, ..., x_{ℓ−1} during the previous steps). At the last step, ℓ = K, there is no sum; namely,

M^{(K|x_1,...,x_{K−1})}_{x_K} = ( (M^{(K−1|x_1,...,x_{K−2})}_{x_{K−1}})^{−1} ··· (M^{(1)}_{x_1})^{−1} E_{x_1,...,x_K} (M^{(1)}_{x_1})^{−1} ··· (M^{(K−1|x_1,...,x_{K−2})}_{x_{K−1}})^{−1} )^{1/2}.  (38)

Lastly, we check that, by design, (25) holds. Thus, we can work backwards from that equation and see that our newly created adaptive strategy yields the same outcomes with the same probabilities as the initial ordered strategy.

Given a random variable X and a maximal number K of allowed guesses, how do we measure the success of a guessing strategy G⃗? We will focus on expectations of N(G⃗, X).
In particular, we consider the expected number of guesses required to guess correctly:

E[N(G⃗, X)] = Σ_{k=1}^K k · p_{N(G⃗,X)}(k) if p_{N(G⃗,X)}(∞) = 0, and E[N(G⃗, X)] = ∞ if p_{N(G⃗,X)}(∞) > 0.  (39)

We can also consider bounded approximations of this quantity, such as

Σ_{k=1}^K k p_{N(G⃗,X)}(k) + c_∞ p_{N(G⃗,X)}(∞),  (40)

where c_∞ ∈ ℝ_+ represents the "cost" of all K guesses being incorrect, and ℝ_+ denotes the strictly positive real numbers. This is a special case of

E_c⃗(N(G⃗, X)) := Σ_{k=1}^K c_k p_{N(G⃗,X)}(k) + c_∞ p_{N(G⃗,X)}(∞),  (41)

where c⃗ = (c_1, ..., c_K, c_∞) ∈ ℝ^{K+1}_+ is a cost vector, satisfying

0 < c_1 ≤ c_2 ≤ ··· ≤ c_K ≤ c_∞.  (42)

We may unify the definitions (39) and (41) by allowing c_∞ = ∞, using the convention that ∞ · 0 = 0. Given a c-q state ρ_XB, we define the minimal expected number of guesses with respect to a cost vector c⃗ as

E*_c⃗(ρ_XB, K) := inf_{N∈S^Quantum_K(ρ_XB)} E_c⃗(N).  (43)

Likewise, given a joint distribution p_XY, let

E*_c⃗(p_XY, K) := inf_{N∈S^Classical_K(p_XY)} E_c⃗(N).  (44)

From the equality

S^Quantum_K(ρ_XB) = S^Measured_K(ρ_XB)  (45)

of Theorem 1 it follows that

E*_c⃗(ρ_XB, K) = inf_{{E_y}_{y∈𝒴}} E*_c⃗(p_XY, K),  (46)

where the infimum is over all finite alphabets 𝒴 and POVMs {E_y}_{y∈𝒴}, and p_XY(x, y) = p_X(x) tr[E_y ρ^x_B]. In the common case in which K = |𝒳| and c⃗ = (1, 2, ..., |𝒳|, ∞), we define the guesswork as

G(X|B) ≡ G(X|B)_ρ := E*_c⃗(ρ_XB, K)  (47)

and likewise define G(X|Y)_p = E*_c⃗(p_XY, K) in the case of classical side information Y.

Remark. In the work [13], guesswork with quantum side information was defined by the right-hand side of (46) (with K = |𝒳| and c⃗ = (1, 2, ..., |𝒳|, ∞)).
Moreover, Proposition 1 of that work shows that the infimum in (46) in that case may be restricted to POVMs whose elements are all rank one.

In this section, we use the results of Section 3 to obtain one-shot and asymptotic entropic bounds on E* in terms of measured versions of bounds known in the classical case. In the case in which K = |𝒳|, Arikan [4] showed that

exp(H^↑_{1/2}(X|Y)_p) / (1 + ln|𝒳|) ≤ G(X|Y)_p ≤ exp(H^↑_{1/2}(X|Y)_p),  (48)

where H^↑_α(X|Y)_p for α ∈ (0, 1) ∪ (1, ∞) denotes the following α-conditional entropy of a joint distribution p_XY:

H^↑_α(X|Y) = (α/(1−α)) ln Σ_{y∈𝒴} ( Σ_{x∈𝒳} p_XY(x, y)^α )^{1/α} = sup_{q_Y} [−D_α(p_XY ‖ p_X ⊗ q_Y)],  (49)

where the supremum is over probability distributions q_Y on 𝒴, and D_α is the α-Rényi relative entropy,

D_α(p_X ‖ q_X) = (1/(α−1)) ln ( Σ_x p_X(x)^α q_X(x)^{1−α} ).  (50)

The second equality of (49) follows from [15, Theorem 4].

Arikan's bound (48) applies to each E*_c⃗(p_XY, K) in (46), and hence by minimizing over the POVMs {E_y}_{y∈𝒴}, we obtain

exp(H^{↑,M}_{1/2}(X|B)_ρ) / (1 + ln|𝒳|) ≤ G(X|B)_ρ ≤ exp(H^{↑,M}_{1/2}(X|B)_ρ),  (51)

where for α ∈ (0, 1) ∪ (1, ∞), H^{↑,M}_α(X|B)_ρ is the B-measured conditional α-Rényi entropy, defined by

H^{↑,M}_α(X|B)_ρ := inf_{{E_y}_{y∈𝒴}} H^↑_α(X|Y)_p,  (52)

where p_XY(x, y) = p_X(x) tr[E_y ρ^x_B] is the joint probability distribution obtained by measuring the B part of ρ_XB via {E_y}_{y∈𝒴}.

Remark. We may expand this quantity as

H^{↑,M}_α(X|B)_ρ = inf_{{E_y}_{y∈𝒴}} sup_{q_Y} [−D_α(p_XY ‖ p_X ⊗ q_Y)],  (53)

where p_XY is induced by the measurement of ρ_XB.
This quantity seems to be different from the conditional entropy induced by the measured Rényi divergence, namely

H^{↑,D^M}_α(X|B)_ρ := sup_{σ_B} −D^M_α(ρ_XB ‖ I_X ⊗ σ_B),  (54)

where the supremum is over states on the B system, and where, for any pair of states (ρ, σ),

D^M_α(ρ ‖ σ) := sup_{{E_z}_z} D_α({tr[E_z ρ]}_z ‖ {tr[E_z σ]}_z)  (55)

is the measured α-Rényi divergence. Indeed, the latter quantity may be expanded to obtain

H^{↑,D^M}_α(X|B)_ρ = sup_{σ_B} inf_{{E_z}_z} [−D_α({tr[E_z ρ_XB]}_z ‖ {tr[E_z (I_X ⊗ σ_B)]}_z)].  (56)

From the min-max inequality, and the fact that collective measurements on XB can simulate measurements on B alone, we have

H^{↑,D^M}_α(X|B)_ρ ≤ H^{↑,M}_α(X|B)_ρ.  (57)

We can consider the asymptotic setting in which Bob receives a sequence of product states ρ^{x⃗}_B := ρ^{x_1}_B ⊗ ··· ⊗ ρ^{x_n}_B, with probability p_X(x_1) ··· p_X(x_n), and aims to guess the full sequence x⃗ = (x_1, ..., x_n). In this case, the problem is characterized by the c-q state ρ^{⊗n}_XB. The one-shot bounds (51) give us

−(1/n) ln(1 + n ln|𝒳|) + (1/n) H^{↑,M}_{1/2}(X^n|B^n)_{ρ^{⊗n}} ≤ (1/n) ln G(X^n|B^n)_{ρ^{⊗n}} ≤ (1/n) H^{↑,M}_{1/2}(X^n|B^n)_{ρ^{⊗n}},  (58)

where H^{↑,M}_{1/2}(X^n|B^n)_{ρ^{⊗n}} can involve collective measurements on the system B^n. Taking n → ∞, we obtain

lim_{n→∞} (1/n) ln G(X^n|B^n)_{ρ^{⊗n}} = lim_{n→∞} (1/n) H^{↑,M}_{1/2}(X^n|B^n)_{ρ^{⊗n}},  (59)

assuming that the limit on the right-hand side exists. Note that we can bound

(1/n) H^{↑,M}_{1/2}(X^n|B^n)_{ρ^{⊗n}} ≤ (1/n) inf_{{E_y}_{y∈𝒴}} H^↑_{1/2}(X^n|Y^n)_{p^{⊗n}}  (60)
= inf_{{E_y}_{y∈𝒴}} H^↑_{1/2}(X|Y)_p  (61)
= H^{↑,M}_{1/2}(X|B)_ρ,  (62)

where the inequality follows from the fact that product measurements are a special case of collective measurements, the first equality follows from the additivity of the classical Rényi entropy (Proposition 1 of [4]), and the second equality from the definition of H^{↑,M}_{1/2}(X|B)_ρ.
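Returning to Arikan's one-shot bound (48), the α = 1/2 sandwich is easy to check numerically for classical side information. A sketch with a randomly drawn joint distribution (the helper names and array layout are ours):

```python
import numpy as np

def H_half(p_xy):
    """H^up_{1/2}(X|Y) = ln sum_y ( sum_x sqrt(p(x,y)) )^2, i.e. (49) at alpha = 1/2."""
    return np.log((np.sqrt(p_xy).sum(axis=0) ** 2).sum())

def guesswork_classical(p_xy):
    """Exact G(X|Y): guess in non-increasing order of p(x|y)."""
    total = 0.0
    for y in range(p_xy.shape[1]):
        col = np.sort(p_xy[:, y])[::-1]
        total += np.dot(np.arange(1, len(col) + 1), col)
    return total

rng = np.random.default_rng(7)
p = rng.dirichlet(np.ones(8)).reshape(4, 2)   # random joint p(x, y), |X| = 4
G = guesswork_classical(p)
lo = np.exp(H_half(p)) / (1 + np.log(4))
hi = np.exp(H_half(p))
assert lo <= G <= hi
print(f"{lo:.4f} <= G(X|Y) = {G:.4f} <= {hi:.4f}")
```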
Moreover, by the data-processing inequality [16],

(1/n) H^{↑,M}_{1/2}(X^n|B^n)_{ρ^{⊗n}} ≥ (1/n) H̃^↑_{1/2}(X^n|B^n)_{ρ^{⊗n}} = H̃^↑_{1/2}(X|B)_ρ,  (63)

where the conditional Rényi entropy H̃^↑_α(C|D)_σ of a bipartite state σ_CD is defined as

H̃^↑_α(C|D)_σ = sup_{ω_D} [ −D̃_α(σ_CD ‖ I_C ⊗ ω_D) ],  (64)

with the optimization with respect to states ω_D, and the sandwiched Rényi relative entropy defined as [17, 18]

D̃_α(ρ ‖ σ) = (1/(α−1)) ln tr[(σ^{(1−α)/2α} ρ σ^{(1−α)/2α})^α].  (65)

The equality in (63) follows from the additivity of H̃^↑_α under tensor products (see, e.g., Corollary 5.2 of [19]). Hence, we obtain

H̃^↑_{1/2}(X|B)_ρ ≤ lim_{n→∞} (1/n) ln G(X^n|B^n)_{ρ^{⊗n}} ≤ H^{↑,M}_{1/2}(X|B)_ρ.  (66)

In the classical case (3), both the left- and right-hand sides reduce to

H^↑_{1/2}(X|Y)_p,  (67)

where p is the underlying classical distribution of (3). Hence, these bounds recover Proposition 5 of [4].

The task of calculating E*_c⃗(ρ_XB, K) as defined in (43) can be written as a semi-definite optimization problem, as was found in [13]. In this section, we present a different derivation of that fact, yielding in (75) a representation dual to the one found in [13]. In Section 5.1 we use this representation to prove that the guesswork G(X|B)_ρ is a concave function of the c-q state ρ_XB. In Section 5.2 we likewise use this representation to obtain a Lipschitz continuity bound on the guesswork. Then in Section 5.3 we compute the dual SDP, recovering the one obtained in [13]. In Section 5.4 we use this dual representation to develop a simple algorithm to obtain upper bounds on the quantity. Lastly, in Section 5.5, we formulate a mixed-integer SDP representation of the problem, whose number of variables and constraints scales polynomially with all the relevant quantities (at the cost of adding binary constraints).
We also provide implementations of these SDP representations [20], using the Julia programming language [21] and the optimization library Convex.jl [22].

Consider an ordered strategy $\vec{G}$ with a set of POVMs $\{E_{\vec{g}}\}_{\vec{g}\in\mathcal{X}^K}$. Then, since $p_{\vec{G},X}(\vec{g}, x) = p_X(x) \operatorname{tr}[E_{\vec{g}} \rho^x_B]$, we have
$c_k\, p_{N(\vec{G},X)}(k) = c_k \sum_{x\in\mathcal{X}} p_X(x) \sum_{\vec{g}\in\mathcal{X}^K : N(\vec{g},x)=k} \operatorname{tr}[E_{\vec{g}} \rho^x_B]$    (68)
and hence
$E_c(N(\vec{G}, X)) = \sum_{k=1}^{K} c_k \sum_{x\in\mathcal{X}} p_X(x) \sum_{\vec{g}\in\mathcal{X}^K : N(\vec{g},x)=k} \operatorname{tr}[E_{\vec{g}} \rho^x_B] + c_\infty \sum_{x\in\mathcal{X}} p_X(x) \sum_{\vec{g}\in\mathcal{X}^K : N(\vec{g},x)=\infty} \operatorname{tr}[E_{\vec{g}} \rho^x_B]$    (69)
$= \sum_{\vec{g}\in\mathcal{X}^K} \sum_{x\in\mathcal{X}} c_{N(\vec{g},x)}\, p_X(x) \operatorname{tr}[E_{\vec{g}} \rho^x_B]$    (70)
$= \sum_{\vec{g}\in\mathcal{X}^K} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$,    (71)
where we define $R_{\vec{g}} := \sum_{x\in\mathcal{X}} p_X(x)\, c_{N(\vec{g},x)}\, \rho^x_B$ for $\vec{g}\in\mathcal{X}^K$. Thus,
$E^*_{\vec{c}}(\rho_{XB}, K) = \text{minimize } \sum_{\vec{g}\in\mathcal{X}^K} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$ subject to $E_{\vec{g}} \ge 0 \;\; \forall\, \vec{g}\in\mathcal{X}^K$, $\sum_{\vec{g}\in\mathcal{X}^K} E_{\vec{g}} = \mathbb{1}_B$.    (72)
The expression in (72) clarifies that $R_{\vec{g}}$ has an interpretation as a cost operator corresponding to the guessing outcome $\vec{g}$. Since $\sum_{\vec{g}\in\mathcal{X}^K} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$ is linear in each positive semi-definite (matrix) variable $E_{\vec{g}}$, (72) gives an SDP representation of $E^*_{\vec{c}}(\rho_{XB}, K)$. This program has $|\mathcal{X}|^K$ variables (each a $d_B \times d_B$ complex positive semi-definite matrix), subject to one equality constraint.
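In the classical case (trivial side information, $d_B = 1$), each $E_{\vec g}$ is a scalar, the optimum of (72) is attained on a single guessing order, and the guesswork reduces to a minimum over orders of $\sum_x p_X(x)\, c_{N(\vec g, x)}$. A brute-force sketch with an illustrative distribution and $c_k = k$, $K = |\mathcal{X}|$:

```python
from itertools import permutations
from math import factorial

# Classical case: the SDP (72) collapses to a minimum over guessing orders g
# of sum_x p_X(x) * c_{N(g,x)}, with c_k = k (illustrative distribution below).
p = [0.5, 0.25, 0.125, 0.125]

def expected_guesses(p, order):
    # N(g, x) = 1-indexed position of x in the guessing order g
    return sum(p[x] * (order.index(x) + 1) for x in range(len(p)))

orders = list(permutations(range(len(p))))
assert len(orders) == factorial(len(p))        # |X|! guessing orders
G = min(expected_guesses(p, g) for g in orders)
assert abs(G - 1.875) < 1e-12   # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*4
```

As expected, the minimizing order guesses in decreasing order of probability, recovering Massey's original guesswork.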
Note, however, that since the cost vector $\vec{c}$ is increasing, any guess $\vec{h}\in\mathcal{X}^K$ with repeated elements is a suboptimal guessing order, in the sense that if $\{E_{\vec{g}}\}_{\vec{g}\in\mathcal{X}^K}$ is a POVM with $E_{\vec{h}} \neq 0$, and $\vec{h}'\in\mathcal{X}^K$ differs from $\vec{h}$ only by replacing repeated elements such that $\vec{h}'$ has no repeated elements, then the POVM defined by
$\tilde{E}_{\vec{g}} := \begin{cases} E_{\vec{g}} & \vec{g}\neq\vec{h} \text{ and } \vec{g}\neq\vec{h}' \\ 0 & \vec{g}=\vec{h} \\ E_{\vec{h}} + E_{\vec{h}'} & \vec{g}=\vec{h}' \end{cases}$    (73)
satisfies $\sum_{\vec{g}\in\mathcal{X}^K} \operatorname{tr}[R_{\vec{g}} \tilde{E}_{\vec{g}}] \le \sum_{\vec{g}\in\mathcal{X}^K} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$. Hence, we may restrict to the outcome space
$\mathcal{X}^K_{\neq} := \{\vec{g}\in\mathcal{X}^K : g_i \neq g_j \;\forall\, i\neq j\} \subseteq \mathcal{X}^K$.    (74)
Note $|\mathcal{X}^K_{\neq}| = \frac{|\mathcal{X}|!}{(|\mathcal{X}|-K)!}$, and in the case in which $K = |\mathcal{X}|$, the set $\mathcal{X}^K_{\neq}$ is just the set of permutations of $\mathcal{X}$. Hence, (72) can be re-written as the following smaller problem:
$E^*_{\vec{c}}(\rho_{XB}, K) = \text{minimize } \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$ subject to $E_{\vec{g}} \ge 0 \;\; \forall\, \vec{g}\in\mathcal{X}^K_{\neq}$, $\sum_{\vec{g}\in\mathcal{X}^K_{\neq}} E_{\vec{g}} = \mathbb{1}_B$.    (75)
Note that in the case $c_\infty = \infty$ and $K < |\mathcal{X}|$, there exists a finite solution if and only if there exists a POVM $\{E_{\vec{g}}\}_{\vec{g}\in\mathcal{X}^K_{\neq}}$ such that for all $x\in\mathcal{X}$ and $\vec{g}\in\mathcal{X}^K_{\neq}$ with $x\notin\vec{g}$, we have $\operatorname{tr}[E_{\vec{g}} \rho^x_B] = 0$. Whether or not this holds depends on the particular state $\rho_{XB}$. However, when $c_\infty < \infty$ or $K = |\mathcal{X}|$, for any state $\rho_{XB}$, the problem (75) has a finite solution, and moreover, for any POVM $\{E_{\vec{g}}\}_{\vec{g}\in\mathcal{X}^K_{\neq}}$, the objective $\sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$ is finite. In the following, we restrict to those two cases.

Remark.
This optimization problem has the same form as that of discriminating quantum states in an ensemble, as described in, e.g., [23, Section 3.2.1]. Note, however, that (1) the $R_{\vec{g}}$ are positive semi-definite but not normalized, and (2) the case of having two copies of the unknown state, in the guessing framework, does not correspond to $R_{\vec{g}}^{\otimes 2}$. Nevertheless, slight modifications to [23, Theorem 3.9] show that a POVM $\{E_{\vec{g}}\}_{\vec{g}\in\mathcal{X}^K_{\neq}}$ is optimal for (75) if and only if
$Y = \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} R_{\vec{g}} E_{\vec{g}}$    (76)
satisfies $Y \le R_{\vec{g}}$ for all $\vec{g}\in\mathcal{X}^K_{\neq}$.

Remark. The set of POVMs is convex, and since the objective function is linear, any minimizer for (75) may be decomposed into extremal POVMs which are also minimizers. By [24, Corollary 2.2], any extremal POVM on a Hilbert space of dimension $d_B$ has at most $d_B^2$ non-zero elements. Hence, there exist minimizers of (75) with at most $d_B^2$ non-zero elements (even though $|\mathcal{X}^K_{\neq}|$ could be far larger than $d_B^2$). Let $S\subseteq\mathcal{X}^K_{\neq}$ be a set of $d_B^2$ points such that there exists $\{\tilde{E}_{\vec{g}}\}_{\vec{g}\in S}$ with $\tilde{E}_{\vec{g}} \ge 0$, $\sum_{\vec{g}\in S} \tilde{E}_{\vec{g}} = \mathbb{1}_B$, and
$E^*_{\vec{c}}(\rho_{XB}, K) = \sum_{\vec{g}\in S} \operatorname{tr}[\tilde{E}_{\vec{g}} R_{\vec{g}}]$.    (77)
Then (75) holds with $\mathcal{X}^K_{\neq}$ replaced by $S$, namely
$E^*_{\vec{c}}(\rho_{XB}, K) = \text{minimize } \sum_{\vec{g}\in S} \operatorname{tr}[R_{\vec{g}} E_{\vec{g}}]$ subject to $E_{\vec{g}} \ge 0 \;\; \forall\, \vec{g}\in S$, $\sum_{\vec{g}\in S} E_{\vec{g}} = \mathbb{1}_B$.    (78)
Note the "$\le$" direction of the equality (78) is trivial, since given a minimizer $\{E_{\vec{g}}\}_{\vec{g}\in S}$ for (78), simply extending it by choosing $E_{\vec{g}} = 0$ for $\vec{g}\notin S$ gives a feasible point for the optimization problem on the right-hand side of (75). The "$\ge$" direction follows from the existence of the $\{\tilde{E}_{\vec{g}}\}_{\vec{g}\in S}$ described above.

Proposition 6.
For each cost vector $\vec{c}$ and $K \le |\mathcal{X}|$, the function
$\rho_{XB} \mapsto E^*_{\vec{c}}(\rho_{XB}, K)$    (79)
from the set of c-q states of the form (2) to $\mathbb{R}_{\ge 0} \cup \{\infty\}$, is concave.

Proof. For $\vec{g}\in\mathcal{X}^K_{\neq}$ and $\rho_{XB}$ a c-q state, the quantity $R^\rho_{\vec{g}} := \sum_{x\in\mathcal{X}} p_X(x)\, c_{N(\vec{g},x)}\, \rho^x_B$ can be expressed as
$R^\rho_{\vec{g}} = \operatorname{tr}_X\!\left[\left(\sum_{x\in\mathcal{X}} c_{N(\vec{g},x)} |x\rangle\langle x|_X \otimes \mathbb{1}_B\right) \rho_{XB}\right]$    (80)
and hence is linear in $\rho_{XB}$. Then for each POVM $(E_{\vec{g}})_{\vec{g}\in\mathcal{X}^K_{\neq}}$,
$\rho_{XB} \mapsto \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[R^\rho_{\vec{g}} E_{\vec{g}}]$    (81)
is linear in $\rho_{XB}$. An arbitrary infimum of concave functions, and in particular of linear functions, is concave, and hence
$E^*_{\vec{c}}(\rho_{XB}, K) \equiv \min_{(E_{\vec{g}})_{\vec{g}\in\mathcal{X}^K_{\neq}}} \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[R^\rho_{\vec{g}} E_{\vec{g}}]$,    (82)
where the minimum is taken over all POVMs on system $B$ with outcomes in $\mathcal{X}^K_{\neq}$, is concave.

Remark. Proposition 6 carries over to the guesswork without side information, $G(X)$, which simply corresponds to the case in which $\rho^x_B \equiv \rho_B$ is independent of $x\in\mathcal{X}$. Since $G(X)$ is manifestly symmetric under permutations of the distribution $p_X$, this proves that $G(X)$ is a Schur-concave function of the distribution $p_X$ (i.e., decreasing in the majorization pre-order; see, e.g., [25] for an overview of majorization and Schur-concave functions). Consequently, the work [26] provides an algorithm to calculate local continuity bounds for $G(X)$.

Proposition 8.
For each cost vector $\vec{c}$ and $K \le |\mathcal{X}|$ such that either $c_\infty < \infty$ or $K = |\mathcal{X}|$, the function
$\rho_{XB} \mapsto E^*_{\vec{c}}(\rho_{XB}, K)$    (83)
from the set of c-q states of the form (2) to $\mathbb{R}_{\ge 0}$, is Lipschitz continuous, satisfying the bound
$|E^*_{\vec{c}}(\rho_{XB}, K) - E^*_{\vec{c}}(\sigma_{XB}, K)| \le \kappa \, \|\rho_{XB} - \sigma_{XB}\|_1$    (84)
for any c-q states $\rho_{XB}$ and $\sigma_{XB}$, where $\kappa = c_\infty$ if $K < |\mathcal{X}|$, and $\kappa = c_{|\mathcal{X}|}$ if $K = |\mathcal{X}|$.

Proof. Define
$f(\rho_{XB}, \{E_{\vec{g}}\}_{\vec{g}\in\mathcal{X}^K_{\neq}}) := \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[R^\rho_{\vec{g}} E_{\vec{g}}]$.    (85)
Then, by linearity (as discussed in the proof of Proposition 6),
$f(\rho_{XB}, \{E_{\vec{g}}\}) - f(\sigma_{XB}, \{E_{\vec{g}}\}) = \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[R^{\rho-\sigma}_{\vec{g}} E_{\vec{g}}]$    (86)
$= \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[\operatorname{tr}_X[C^{(\vec{g})}_{XB} \Delta_{XB}] E_{\vec{g}}]$,    (87)
using (80), where $C^{(\vec{g})}_{XB} := \sum_{x\in\mathcal{X}} c_{N(\vec{g},x)} |x\rangle\langle x| \otimes \mathbb{1}_B \ge 0$ and $\Delta_{XB} := \rho_{XB} - \sigma_{XB}$. Since $C^{(\vec{g})}_{XB}$ and $\Delta_{XB}$ commute, using the c-q structure of each, we have
$f(\rho_{XB}, \{E_{\vec{g}}\}) - f(\sigma_{XB}, \{E_{\vec{g}}\}) = \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \operatorname{tr}[C^{(\vec{g})}_{XB} \Delta_{XB} (\mathbb{1}_X \otimes E_{\vec{g}})]$    (88)
$= \operatorname{tr}\!\Big[\Delta_{XB} \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} C^{(\vec{g})}_{XB} (\mathbb{1}_X \otimes E_{\vec{g}})\Big]$.    (89)
Set
$F_{XB} := \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} C^{(\vec{g})}_{XB} (\mathbb{1}_X \otimes E_{\vec{g}}) = \sum_{x\in\mathcal{X}} \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} c_{N(\vec{g},x)} |x\rangle\langle x| \otimes E_{\vec{g}}$.    (90)
Since $c_{N(\vec{g},x)} \le \kappa$ for each $x\in\mathcal{X}$ and $\vec{g}\in\mathcal{X}^K_{\neq}$, we have that $F_{XB} \le \kappa \sum_{x\in\mathcal{X}} \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} |x\rangle\langle x| \otimes E_{\vec{g}}$ in semi-definite order. Performing the sums, we have $F_{XB} \le \kappa\, \mathbb{1}_X \otimes \mathbb{1}_B$ and hence $\|F_{XB}\|_\infty \le \kappa$.
Thus,
$f(\rho_{XB}, \{E_{\vec{g}}\}) - f(\sigma_{XB}, \{E_{\vec{g}}\}) = \operatorname{tr}[\Delta_{XB} F_{XB}]$    (91)
$\le \|\Delta_{XB} F_{XB}\|_1$    (92)
$\le \|\Delta_{XB}\|_1 \|F_{XB}\|_\infty$    (93)
$\le \kappa\, \|\rho_{XB} - \sigma_{XB}\|_1$,    (94)
using Hölder's inequality in the second-to-last step. Swapping $\rho_{XB}$ and $\sigma_{XB}$ completes the proof.

Next, we compute the dual problem to (75), in the case $K = |\mathcal{X}|$ or $c_\infty < \infty$. Consider the Lagrangian
$L((E_{\vec{g}})_{\vec{g}}, (\lambda_{\vec{g}})_{\vec{g}}, \nu) = \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \langle R_{\vec{g}}, E_{\vec{g}}\rangle - \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \langle \lambda_{\vec{g}}, E_{\vec{g}}\rangle + \Big\langle \nu, \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} E_{\vec{g}} - \mathbb{1}_B \Big\rangle$    (95)
$= \sum_{\vec{g}\in\mathcal{X}^K_{\neq}} \langle R_{\vec{g}} - \lambda_{\vec{g}} + \nu, E_{\vec{g}}\rangle - \operatorname{tr}[\nu]$,    (96)
where we have introduced the Hilbert–Schmidt inner product $\langle A, B\rangle = \operatorname{tr}[A^\dagger B]$, and where $\lambda_{\vec{g}} \ge 0$ is the dual variable to the constraint $E_{\vec{g}} \ge 0$, and $\nu = \nu^\dagger$ is the dual variable to the equality constraint $\sum_{\vec{g}\in\mathcal{X}^K_{\neq}} E_{\vec{g}} = \mathbb{1}_B$. As shown in, e.g., [27], the primal problem (75) can be expressed as
$\min_{(E_{\vec{g}})} \max_{\lambda_{\vec{g}} \ge 0,\, \nu} L((E_{\vec{g}}), (\lambda_{\vec{g}}), \nu)$    (97)
while the dual problem is given by
$\max_{\lambda_{\vec{g}} \ge 0,\, \nu} \min_{(E_{\vec{g}})} L((E_{\vec{g}}), (\lambda_{\vec{g}}), \nu)$.    (98)
If $R_{\vec{g}} - \lambda_{\vec{g}} + \nu \neq 0$ for any $\vec{g}\in\mathcal{X}^K_{\neq}$, then the inner minimization in (98) yields $-\infty$. Hence,
$\min_{(E_{\vec{g}})} L((E_{\vec{g}}), (\lambda_{\vec{g}}), \nu) = \begin{cases} -\infty & \exists\, \vec{g}\in\mathcal{X}^K_{\neq} : R_{\vec{g}} - \lambda_{\vec{g}} + \nu \neq 0 \\ -\operatorname{tr}[\nu] & \text{else.} \end{cases}$    (99)
The constraints $\lambda_{\vec{g}} \ge 0$ and $R_{\vec{g}} - \lambda_{\vec{g}} + \nu = 0$ imply the semi-definite inequality $-\nu \le R_{\vec{g}}$. Writing $Y = -\nu$ and maximizing over $\lambda_{\vec{g}} \ge 0$, (98) becomes
maximize $\operatorname{tr}[Y]$ subject to $Y = Y^\dagger$, $Y \le R_{\vec{g}} \;\; \forall\, \vec{g}\in\mathcal{X}^K_{\neq}$.    (100)
Since (75) is strictly feasible (e.g., $E_{\vec{g}} = \mathbb{1}_B / |\mathcal{X}^K_{\neq}|$ is a strictly feasible point), by Slater's condition, strong duality holds. Hence, (100) attains the same optimal value as (75). The formulation of the problem as given in (100) was previously found in the work [13, Proposition 3].

The dual form of the SDP can be used to generate upper bounds on $E^*_{\vec{c}}(\rho_{XB}, K)$ simply by removing constraints. This provides an algorithm to find an upper bound on the objective function. Decide on some number of constraints $\kappa$ to impose in total. Then:
1. Initialize an empty list $L = \{\}$ corresponding to constraints to impose.
2. Set $Y$ to be the identity matrix, as a first guess at the optimal dual variable.
3. If $Y$ satisfies $Y \le R_{\vec{g}}$ for all $\vec{g}\in\mathcal{X}^K_{\neq}$, then $Y$ is the maximizer of the dual problem (100), and the optimization is solved. Otherwise, find $\vec{g}\in\mathcal{X}^K_{\neq}$ such that $Y \not\le R_{\vec{g}}$, and add $\vec{g}$ to the list $L$.
4. Solve the problem
maximize $\operatorname{tr}[Y]$ subject to $Y = Y^\dagger$, $Y \le R_{\vec{g}} \;\; \forall\, \vec{g}\in L$    (101)
and set $Y$ to be its maximizer.
5. Repeat steps 3 and 4 until the list $L$ has length $\kappa$.
6. Solve the problem one last time, and return the output.
In order to find a constraint that $Y$ violates, a heuristic technique such as simulated annealing can be used. Moreover, in the case that there are too many constraints to fit into memory or check exhaustively, using an iterative technique (such as simulated annealing) is essential. If this algorithm were continued (without imposing a limit on the total number of constraints $\kappa$ to impose), it would eventually yield the true value $E^*_{\vec{c}}(\rho_{XB}, K)$.
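The effect of relaxing constraints is easy to see in the classical case, where the operators $R_{\vec g}$ are scalars and the dual (100) reads $\max\{Y : Y \le R_{\vec g} \;\forall\, \vec g\} = \min_{\vec g} R_{\vec g}$: imposing only the constraints in a list $L$ yields $\min_{\vec g \in L} R_{\vec g}$, a nonincreasing sequence of upper bounds as $L$ grows. A sketch with an illustrative distribution and $c_k = k$:

```python
from itertools import permutations

# Classical case: each R_g is the scalar sum_x p(x) * c_{N(g,x)}, and the dual
# (100) is max{ Y : Y <= R_g for all g } = min_g R_g.  Imposing only the
# constraints in a growing list L gives a nonincreasing sequence of upper bounds.
p = [0.1, 0.4, 0.2, 0.3]                       # illustrative distribution

def R(g):
    # scalar cost for guessing order g (c_k = k)
    return sum(p[x] * (g.index(x) + 1) for x in range(len(p)))

orders = list(permutations(range(len(p))))
true_val = min(R(g) for g in orders)           # value with all constraints
L, bounds = [], []
for g in orders[:5]:                           # add a few constraints at a time
    L.append(g)
    bounds.append(min(R(h) for h in L))        # relaxed dual value
assert all(b >= true_val - 1e-12 for b in bounds)
assert all(a >= b - 1e-12 for a, b in zip(bounds, bounds[1:]))   # nonincreasing
```

This is only a toy scalar analogue of the matrix relaxation, but it illustrates why a truncated constraint list always yields a valid upper bound.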
When the total number of constraints is limited, it provides an upper bound (since it is a relaxation of (100)). However, even with a limit $\kappa$ on the total number of constraints, this algorithm can in theory yield the true value $E^*_{\vec{c}}(\rho_{XB}, K)$. Note that the dual problem to (78) is
maximize $\operatorname{tr}[Y]$ subject to $Y = Y^\dagger$, $Y \le R_{\vec{g}} \;\; \forall\, \vec{g}\in S$,    (102)
where $S\subseteq\mathcal{X}^K_{\neq}$ has $|S| = d_B^2$ and is described in the remark above. Hence, if $L$ in (101) equals $S$, then the algorithm finds the true value $E^*_{\vec{c}}(\rho_{XB}, K)$, not just an upper bound. Thus, $\kappa = d_B^2$ suffices if the constraints $\vec{g}$ can be chosen precisely to obtain $L = S$. In general, finding $S$ is as difficult as solving the original problem. Nonetheless, this motivates why choosing a relatively small value of $\kappa$ (such as $d_B^2$) can still yield good upper bounds.

The problem can be formulated another way as a mixed-integer SDP, i.e. an SDP that has additional integer or binary constraints. Consider a POVM $\{F_j\}_{j=1}^M$ with $M$ outcomes. When outcome $j$ is obtained, Bob guesses in some order $\vec{g}(j)\in\mathcal{X}^K_{\neq}$. Then consider the problem
minimize $\sum_{x\in\mathcal{X},\, j=1,\ldots,M} p_X(x)\, c_{N(\vec{g}(j),x)} \operatorname{tr}[F_j \rho^x_B]$ subject to $F_j \ge 0$, $j = 1,\ldots,M$; $\vec{g}(j)\in\mathcal{X}^K_{\neq}$, $j = 1,\ldots,M$; $\sum_{j=1}^M F_j = \mathbb{1}_B$.    (103)
This optimization is not an SDP, since the dependence on the optimization variables $\{\vec{g}(j)\}_{j=1}^M$ and $\{F_j\}_{j=1}^M$ is not linear, and $\vec{g}(j)\in\mathcal{X}^K_{\neq}$ is a discrete constraint. Consider, however, the case that $K = |\mathcal{X}|$. With this assumption, we will be able to remove the nonlinearity, although not the discrete variables. This yields a mixed-integer SDP: an optimization problem such that if all integer constraints were removed, the result would be an SDP.
We proceed as follows. Under the condition $K = |\mathcal{X}|$, we may restrict to considering guessing orders that are permutations, without loss of generality; other guessing orders have repeated guesses, which can only increase the objective function. In this case, the outcome $\infty$ never occurs, and for each $\vec{g}\in S_{|\mathcal{X}|}$, the quantity $(c_{N(\vec{g},x)})_{x\in\mathcal{X}}$ satisfies
$(c_{N(\vec{g},x)})_{x\in\mathcal{X}} = \vec{g}^{-1}(c)$,    (104)
where $\vec{g}^{-1}$ is the inverse permutation of $\vec{g}$, and $c = (c_k)_{k=1}^K$ is the cost vector (without $c_\infty$). Here, $S_n$ is the set of permutations of $\{1, \ldots, n\}$. Let $P^{(j)}$ be the $|\mathcal{X}| \times |\mathcal{X}|$ matrix representation of the permutation $\vec{g}(j)^{-1}$. Then $(P^{(j)} c)_x = \sum_{y\in\mathcal{X}} P^{(j)}_{xy} c_y = c_{N(\vec{g}(j),x)}$. Hence, the optimization (103) can be reformulated as
minimize $\sum_{x,y\in\mathcal{X},\, j=1,\ldots,M} p_X(x)\, P^{(j)}_{xy} c_y \operatorname{tr}[F_j \rho^x_B]$
subject to $F_j \in M_{d_B} \;\forall\, j\in[M]$; $P^{(j)}_{xy} \in \{0,1\} \;\forall\, j\in[M],\, x,y\in\mathcal{X}$; $F_j \ge 0 \;\forall\, j\in[M]$; $\sum_{j=1}^M F_j = \mathbb{1}_B$; $\sum_{x\in\mathcal{X}} P^{(j)}_{xy} = 1 \;\forall\, j\in[M],\, y\in\mathcal{X}$; $\sum_{y\in\mathcal{X}} P^{(j)}_{xy} = 1 \;\forall\, j\in[M],\, x\in\mathcal{X}$.    (105)
Note that all the constraints are semi-definite or linear, except that each element $P^{(j)}_{xy}$ is a binary variable: $P^{(j)}_{xy} \in \{0,1\}$, which is a particularly simple type of discrete constraint. The non-linearity in the objective function, however, persists. To remove this, we take advantage of the fact that the $P^{(j)}_{xy}$ are binary. In particular, [28, Equations (22)–(24)] provide a clever trick to turn objective functions with terms of the form $zx$, where $z$ is a binary variable and $x$ a continuous variable, into objective functions of a continuous variable $y$ subject to four affine constraints (in terms of $x$ and $z$), as long as $x$ is bounded by known constants.
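The identity (104) can be checked directly: $N(\vec g, x)$ is the 1-indexed position of $x$ in $\vec g$, i.e., the inverse permutation applied to $x$, so applying the permutation matrix of $\vec g^{-1}$ to the cost vector reproduces $(c_{N(\vec g, x)})_x$. A sketch for a small alphabet with $c_k = k$:

```python
from itertools import permutations

c = [1, 2, 3, 4]                      # cost vector c_k = k
for g in permutations(range(4)):      # guessing orders over X = {0,1,2,3}
    # N(g, x): 1-indexed position of x in the order g
    N = [g.index(x) + 1 for x in range(4)]
    # inverse permutation: ginv[x] = 0-indexed position of x in g
    ginv = [g.index(x) for x in range(4)]
    # applying the permutation matrix of ginv to c: (P c)_x = c[ginv[x]]
    Pc = [c[ginv[x]] for x in range(4)]
    assert Pc == [c[n - 1] for n in N] == N
```

With $c_k = k$ the two sides coincide with $N$ itself, which makes the check transparent; for a general increasing cost vector only the first equality holds.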
We reproduce this argument in the following. We first write the objective function entirely in terms of scalar quantities:
$\sum_{x,y\in\mathcal{X},\, j\in[M]} p_X(x)\, P^{(j)}_{xy} c_y \operatorname{tr}[F_j \rho^x_B] = \sum_{k,\ell\in[d_B]} \sum_{x,y\in\mathcal{X},\, j\in[M]} p_X(x) (\rho^x_B)_{k\ell}\, c_y\, P^{(j)}_{xy} (F_j)_{\ell k}$.    (106)
Let $x = (F_j)_{\ell k}$ and $z = P^{(j)}_{xy} \in \{0,1\}$. Then $|x| \le \operatorname{tr}[F_j]/2 \le d_B/2$. Thus $x_L := -d_B/2$ and $x_U := d_B/2$ are lower and upper bounds on $x$, respectively. Hence, the following four inequalities hold trivially:
$z(x - x_L) \ge 0$, $(z-1)(x - x_U) \ge 0$, $z(x - x_U) \le 0$, $(z-1)(x - x_L) \le 0$.    (107)
Now, let $y = xz$. Then we have
$y - z x_L \ge 0$, $y - z x_U \ge x - x_U$, $y - z x_U \le 0$, $y - z x_L \le x - x_L$.    (108)
On the other hand, let us remove the constraint $y = xz$ and consider $y$ as another variable. Then if $z = 0$, the first inequality of (108) implies that $y \ge 0$, while the third implies $y \le 0$, so $y = 0$. On the other hand, if $z = 1$, then the second inequality of (108) implies that $y \ge x$, while the fourth implies that $y \le x$. Hence, either way, $y = xz$. Thus, (108) is equivalent to $y = xz$. With this transformation, (105) can be reformulated as the following:
minimize $\sum_{k,\ell\in[d_B]} \sum_{x,y\in\mathcal{X},\, j\in[M]} p_X(x) (\rho^x_B)_{k\ell}\, c_y\, y_{xy\ell kj}$
subject to $F_j \in M_{d_B} \;\forall\, j\in[M]$; $y_{xy\ell kj} \in \mathbb{R} \;\forall\, x,y\in\mathcal{X},\, \ell,k\in[d_B],\, j\in[M]$; $P^{(j)}_{xy} \in \{0,1\} \;\forall\, j\in[M],\, x,y\in\mathcal{X}$; $F_j \ge 0 \;\forall\, j\in[M]$; $\sum_{j=1}^M F_j = \mathbb{1}_B$; $\sum_{x\in\mathcal{X}} P^{(j)}_{xy} = 1 \;\forall\, j\in[M],\, y\in\mathcal{X}$; $\sum_{y\in\mathcal{X}} P^{(j)}_{xy} = 1 \;\forall\, j\in[M],\, x\in\mathcal{X}$;
$y_{xy\ell kj} + P^{(j)}_{xy}\, d_B/2 \ge 0$, $y_{xy\ell kj} - P^{(j)}_{xy}\, d_B/2 \ge (F_j)_{\ell k} - d_B/2$, $y_{xy\ell kj} - P^{(j)}_{xy}\, d_B/2 \le 0$, $y_{xy\ell kj} + P^{(j)}_{xy}\, d_B/2 \le (F_j)_{\ell k} + d_B/2$, $\forall\, x,y\in\mathcal{X},\, \ell,k\in[d_B],\, j\in[M]$.    (109)
This is a mixed-integer SDP, with a number of constraints and variables that is polynomial in $M$, $d_B$, and $|\mathcal{X}|$. Moreover, if $M \ge d_B^2$, then, as follows from the remark below (75), the mixed-integer SDP (109) attains the same optimal value as (75), namely $E^*_{\vec{c}}(\rho_{XB}, |\mathcal{X}|)$, using that $K = |\mathcal{X}|$. Note, however, that mixed-integer SDPs are not in general efficiently solvable; they encompass mixed-integer linear programs, which are NP-hard. However, in practice they can sometimes be solved quickly. Since the original SDP formulation (75) involves an exponential (in $|\mathcal{X}|$) number of variables (and an exponential number of constraints in its dual formulation (100)), the formulation (109), which instead has a polynomial (in $|\mathcal{X}|$) number of variables, may provide a more practical approach in some cases.
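The four affine constraints in (108) can be sanity-checked exhaustively: with $z \in \{0,1\}$ fixed, the only value of $y$ they admit (up to tolerance) is $y = xz$. A small sketch with an illustrative bound $d_B = 3$:

```python
dB = 3
xL, xU = -dB / 2, dB / 2   # bounds on the continuous variable x

def feasible(x, y, z, tol=1e-9):
    # the four affine constraints of (108), with y treated as a free variable
    return (y - z * xL >= -tol
            and y - z * xU >= x - xU - tol
            and y - z * xU <= tol
            and y - z * xL <= x - xL + tol)

for z in (0, 1):
    for x in (-1.5, -0.3, 0.0, 0.7, 1.5):
        assert feasible(x, x * z, z)             # y = x*z is always allowed
        assert not feasible(x, x * z + 0.05, z)  # nothing above it is
        assert not feasible(x, x * z - 0.05, z)  # nothing below it is
```

This confirms that the linearization forces $y = xz$ without any product of variables appearing in the constraints.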
Mixed-integer SDPs can be solved in various ways; in the code [20], the problem (109) is solved using the library Pajarito.jl [29], which proceeds by solving an alternating sequence of mixed-integer linear problems and SDPs.

As an example, we consider the problem of calculating the guesswork when one has four uniformly distributed letters to guess from, each correlated with one of the four BB84 states [30]. That is,
$\rho_{XB} = \frac{1}{4} \sum_{k=1}^{4} |x_k\rangle\langle x_k|_X \otimes |\psi_k\rangle\langle\psi_k|_B$    (110)
with the four $|\psi_k\rangle$'s being chosen from $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$. This example is firmly in the quantum realm of guesswork, as more information about the side-information system $B$ can be obtained via a quantum measurement than a classical one (in the computational basis, that is).

Figure 2: The guesswork $G(X|B)$ as a function of the parameter $\varphi$, when $X$ is uniformly distributed over $\{1, 2, 3, 4\}$ and the corresponding side-information states are $\{|0\rangle, |1\rangle, |\psi(\varphi)\rangle, |\psi(-\varphi)\rangle\}$, where $|\psi(\varphi)\rangle = \cos\varphi\,|0\rangle + \sin\varphi\,|1\rangle$. We see that for classical states, i.e., when $\varphi = 0$ or $\varphi = \pi/2$, we obtain a maximum value of 1.75. For the BB84 states, we achieve a minimum.

We establish an analytic upper bound on the guesswork by considering a particular POVM and associated sequences of guesses. We consider the POVM consisting of the two orthogonal projectors $|\theta\rangle\langle\theta|$ and $|\theta^\perp\rangle\langle\theta^\perp|$, with $|\theta\rangle := \sin\theta\,|0\rangle + \cos\theta\,|1\rangle$. If the outcome corresponding to $|\theta\rangle\langle\theta|$ is obtained, then we guess in the order corresponding to $(1, +, -, 0)$, while the order for $|\theta^\perp\rangle\langle\theta^\perp|$ is $(0, -, +, 1)$. This yields
$G(X|B) \le \frac{1}{2}\left(1\cdot\cos^2\theta + 2\cdot\frac{1}{2}(1+\sin 2\theta) + 3\cdot\frac{1}{2}(1-\sin 2\theta) + 4\cdot\sin^2\theta\right)$    (111)
$= 1.75 + \frac{3}{2}\sin^2\theta - \frac{1}{4}\sin 2\theta$.    (112)
With the aim of minimizing the guesswork, we choose $\theta = \frac{1}{2}\arctan\frac{1}{3}$, and obtain the right-hand side of (111) as $\frac{1}{4}(10 - \sqrt{10}) \approx 1.709$. More generally, we consider side-information states $\{|0\rangle, |1\rangle, |\psi(\varphi)\rangle, |\psi(-\varphi)\rangle\}$, where $|\psi(\varphi)\rangle = \cos\varphi\,|0\rangle + \sin\varphi\,|1\rangle$. The BB84 states are a special case of this ensemble with $\varphi = \pi/4$. For each of these ensembles, we compute the guesswork using our SDP formulation in (75). The results are shown in Figure 2.
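The optimization in (111)–(112) is elementary to verify numerically; the sketch below evaluates the right-hand side of (111), confirms the closed form (112), and checks that the chosen angle attains $\frac{1}{4}(10-\sqrt{10})$, beating the best classical (computational-basis) value of 1.75:

```python
import math

def upper_bound(theta):
    # right-hand side of (111)
    s2 = math.sin(2 * theta)
    return 0.5 * (1 * math.cos(theta) ** 2
                  + 2 * 0.5 * (1 + s2)
                  + 3 * 0.5 * (1 - s2)
                  + 4 * math.sin(theta) ** 2)

# closed form (112) agrees on a grid of angles
for i in range(100):
    t = i * math.pi / 100
    closed = 1.75 + 1.5 * math.sin(t) ** 2 - 0.25 * math.sin(2 * t)
    assert abs(upper_bound(t) - closed) < 1e-12

theta_star = 0.5 * math.atan(1 / 3)
val = upper_bound(theta_star)
assert abs(val - (10 - math.sqrt(10)) / 4) < 1e-9
assert val < 1.75
# theta_star is a global minimum of the bound (checked on a fine grid)
assert all(upper_bound(i * math.pi / 1000) >= val - 1e-9 for i in range(1000))
```

The grid check confirms that $\theta^\star = \tfrac{1}{2}\arctan\tfrac{1}{3}$ minimizes the upper bound, since the derivative condition reduces to $\tan 2\theta = 1/3$.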
A primitive in any cryptographic scheme is the establishment of a secret key between two communicating parties. Quantum key distribution (QKD) protocols can produce a certifiably secure secret key by using pre-shared entanglement [32]. However, if the protocol is not implemented perfectly, as is the case in realistic scenarios, then some information can leak to an eavesdropper. How secure is the key obtained in this "imperfect" scenario? In other words, if there is a small deviation from the ideal protocol, how does it affect the security of the key? We address this question by considering the guesswork as a security criterion.

Consider two systems $K$ and $E$, where $K$ denotes the key system and encodes the secret key, and $E$ is the system held by the eavesdropper. An ideal key state is of the form $\pi_K \otimes \rho_E$, where $\pi_K$ refers to the maximally mixed state on the key system. This means that the eavesdropper can learn nothing about the key with access to the $E$ system alone. An imperfect key, generally, is the joint state $\rho_{KE}$. Consider the promise that the imperfect key state is $\varepsilon$-close to an ideal one in trace distance:
$\frac{1}{2}\|\rho_{KE} - \pi_K \otimes \rho_E\|_1 \le \varepsilon$.    (113)
For an ideal key state, the expected guesswork for the eavesdropper is $\sum_{k=1}^{|\mathcal{X}|} \frac{k}{|\mathcal{X}|} = \frac{|\mathcal{X}|+1}{2}$. For an imperfect key state satisfying the promise (113), we get the following bound on the guesswork:
$G(X|E) \ge \frac{|\mathcal{X}|+1}{2} - |\mathcal{X}|\varepsilon$.    (114)
This provides a robustness guarantee: imperfect key states continue to have near-maximal guesswork if they remain close to an ideal key state in trace distance.

The proof of the lower bound follows from the application of an analogous result pertaining to guesswork, due to Pliam [33, Theorem 3]. This result states that for any random variable $X$ with probability distribution $p_X$,
$\frac{|\mathcal{X}|+1}{2} - G(X) \le \frac{|\mathcal{X}|}{2}\|p_X - u_X\|_1$,    (115)
where $G(X)$ denotes the guesswork and $u_X$ denotes the uniform distribution.
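Pliam's bound (115) is straightforward to test numerically; the following sketch checks it for randomly generated distributions (illustrative size and seed):

```python
import random
random.seed(1)

def guesswork(p):
    # G(X): guess in decreasing order of probability
    return sum((k + 1) * v for k, v in enumerate(sorted(p, reverse=True)))

n = 6
for _ in range(200):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    p = [v / s for v in w]
    l1 = sum(abs(v - 1 / n) for v in p)          # ||p_X - u_X||_1
    assert (n + 1) / 2 - guesswork(p) <= (n / 2) * l1 + 1e-9
```

The bound is tight at the uniform distribution, where both sides vanish.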
First, we extend the result above in (115) to the case of guesswork with classical side information.

Lemma 9.
For random variables $X$ and $Y$, the following bound holds for the guesswork:
$\frac{|\mathcal{X}|+1}{2} - G(X|Y) \le \frac{|\mathcal{X}|}{2}\|p_{XY} - u_X \otimes p_Y\|_1$.    (116)

Proof.
Consider the case of a joint distribution $p_{XY}$, with conditional distribution $p_{X|Y}$ and marginal distribution $p_Y$, and suppose that the value of $y$ is fixed. Then we can invoke Pliam's bound (115) to find that
$\frac{|\mathcal{X}|+1}{2} - G(X|Y=y) \le \frac{|\mathcal{X}|}{2}\|p_{X|Y=y} - u_X\|_1$,    (117)
where the notation $G(X|Y=y)$ indicates the guesswork (without side information) of a random variable distributed according to $p_{X|Y}(\cdot|y)$. Taking the expectation of both sides with respect to the random variable $Y$, we find that
$\frac{|\mathcal{X}|+1}{2} - \sum_y p_Y(y)\, G(X|Y=y) \le \frac{|\mathcal{X}|}{2}\sum_y p_Y(y)\|p_{X|Y=y} - u_X\|_1$    (118)
$= \frac{|\mathcal{X}|}{2}\sum_y p_Y(y)\sum_x |p_{X|Y}(x|y) - u_X(x)|$    (119)
$= \frac{|\mathcal{X}|}{2}\sum_y\sum_x |p_{X|Y}(x|y)\, p_Y(y) - u_X(x)\, p_Y(y)|$    (120)
$= \frac{|\mathcal{X}|}{2}\sum_y\sum_x |p_{XY}(x,y) - u_X(x)\, p_Y(y)|$    (121)
$= \frac{|\mathcal{X}|}{2}\|p_{XY} - u_X \otimes p_Y\|_1$.    (122)
Using the fact that
$\sum_y p_Y(y)\, G(X|Y=y) = G(X|Y)$,    (123)
we conclude the generalization of (115) to the presence of classical side information:
$\frac{|\mathcal{X}|+1}{2} - G(X|Y) \le \frac{|\mathcal{X}|}{2}\|p_{XY} - u_X \otimes p_Y\|_1$.    (124)

We know from Theorem 1 that a measured strategy for guesswork is equivalent to a quantum strategy. Using that fact, and by combining the promise $\frac{1}{2}\|\rho_{KE} - \pi_K \otimes \rho_E\|_1 \le \varepsilon$ with the result in Lemma 9, we obtain (114).

Remark. Note that Proposition 8 gives the following continuity bound for the guesswork near $\pi_K \otimes \rho_E$:
$|G(X|E)_\rho - G(X|E)_{\pi \otimes \rho_E}| \le 2\varepsilon|\mathcal{X}|$,    (125)
and hence
$G(X|E)_\rho \ge |\mathcal{X}|\left(\frac{1}{2} - 2\varepsilon\right) + \frac{1}{2}$.    (126)
Thus, the bound in (114) is slightly better than what we obtain by employing Proposition 8.

Open questions
Guesswork presents an operationally relevant method to quantify uncertainty, and it has been relatively unexplored in the presence of quantum side information. We hope our investigation opens the door to further analysis of the guesswork and methods to compute it. In particular, our work leaves open the following questions:
1. Does equality hold in (63)? If so, the single-letter expression
$\lim_{n\to\infty} \frac{1}{n}\ln G(X^n|B^n)_{\rho^{\otimes n}} = \widetilde{H}^{\uparrow}_{1/2}(X|B)_\rho$    (127)
holds, matching the classical case [4, Prop. 5].
2. Ref. [34] presented variational expressions for the measured Rényi divergences $D^{\mathbb{M}}_\alpha$ and showed how those lead to efficient ways to compute the divergences. Are there similar variational formulas for $H^{\uparrow,\mathbb{M}}_\alpha(X|B)_\rho$? Such formulas could similarly provide an efficient way to compute the quantity.

Acknowledgements
E.H. would like to thank Harsha Nagarajan for pointing out the transformation in [28, Equations (22)–(24)]. E.H. is supported by the Cantab Capital Institute for the Mathematics of Information (CCIMI). V.K. acknowledges support from the Economic Development Assistantship. M.M.W. acknowledges support from the US National Science Foundation through grant no. 1907615.
References

[1] J. Massey, "Guessing and entropy," in Proceedings of the 1994 IEEE International Symposium on Information Theory. Trondheim, Norway: IEEE, 1994, p. 204. [Online]. Available: http://ieeexplore.ieee.org/document/394764/
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley-Interscience, 2006.
[3] R. Lundin, T. Holleboom, and S. Lindskog, "On the relationship between confidentiality measures: Entropy and guesswork," in Proceedings of the 5th International Workshop on Security in Information Systems, 2007, pp. 135–144.
[4] E. Arikan, "An inequality on guessing and its application to sequential decoding," IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 99–105, Jan. 1996. [Online]. Available: http://ieeexplore.ieee.org/document/481781/
[5] E. Arikan and N. Merhav, "Guessing subject to distortion," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1041–1056, May 1998. [Online]. Available: http://ieeexplore.ieee.org/document/669158/
[6] ——, "Joint source-channel coding and guessing with application to sequential decoding," IEEE Transactions on Information Theory, vol. 44, no. 5, pp. 1756–1769, Sep. 1998. [Online]. Available: http://ieeexplore.ieee.org/document/705557/
[7] D. Malone and W. Sullivan, "Guesswork and entropy," IEEE Transactions on Information Theory, vol. 50, no. 3, pp. 525–526, Mar. 2004. [Online]. Available: http://ieeexplore.ieee.org/document/1273661/
[8] R. Sundaresan, "Guessing under source uncertainty," IEEE Transactions on Information Theory, vol. 53, no. 1, pp. 269–287, Jan. 2007. [Online]. Available: http://ieeexplore.ieee.org/document/4039677/
[9] M. K. Hanawal and R. Sundaresan, "Guessing revisited: A large deviations approach," arXiv:1008.1977 [cs, math], Aug. 2010. [Online]. Available: http://arxiv.org/abs/1008.1977
[10] M. M. Christiansen and K. R. Duffy, "Guesswork, large deviations, and Shannon entropy," IEEE Transactions on Information Theory, vol. 59, no. 2, pp. 796–802, Feb. 2013. [Online]. Available: http://ieeexplore.ieee.org/document/6340341/
[11] I. Sason and S. Verdú, "Improved bounds on lossless source coding and guessing moments via Rényi measures," arXiv:1801.01265 [cs, math], Jan. 2018. [Online]. Available: http://arxiv.org/abs/1801.01265
[12] I. Sason, "Tight bounds on the Rényi entropy via majorization with applications to guessing and compression," Entropy.
[13] Quantum Information & Computation, Jul. 2015. [Online]. Available: https://dl.acm.org/doi/abs/10.5555/2871422.2871424
[14] R. Koenig, R. Renner, and C. Schaffner, "The operational meaning of min- and max-entropy," IEEE Transactions on Information Theory, vol. 55, no. 9, pp. 4337–4347, Sep. 2009. [Online]. Available: http://arxiv.org/abs/0807.1338
[15] S. Fehr and S. Berens, "On the conditional Rényi entropy," IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6801–6810, Nov. 2014.
[16] R. L. Frank and E. H. Lieb, "Monotonicity of a relative Rényi entropy," Journal of Mathematical Physics, vol. 54, no. 12, p. 122201, Dec. 2013, arXiv:1306.5358.
[17] M. Müller-Lennert, F. Dupuis, O. Szehr, S. Fehr, and M. Tomamichel, "On quantum Rényi entropies: a new generalization and some properties," Journal of Mathematical Physics, vol. 54, no. 12, p. 122203, Dec. 2013, arXiv:1306.3142. [Online]. Available: http://arxiv.org/abs/1306.3142
[18] M. M. Wilde, A. Winter, and D. Yang, "Strong converse for the classical capacity of entanglement-breaking and Hadamard channels via a sandwiched Rényi relative entropy," Communications in Mathematical Physics, vol. 331, no. 2, pp. 593–622, Oct. 2014, arXiv:1306.1586.
[19] M. Tomamichel, Quantum Information Processing with Finite Resources - Mathematical Foundations. Springer, 2016, vol. 5. [Online]. Available: http://arxiv.org/abs/1504.00233
[20] E. P. Hanson, "ericphanson/GuessworkQuantumSideInfo.jl," Jan. 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3632965
[21] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, "Julia: A fresh approach to numerical computing," SIAM Review, vol. 59, no. 1, pp. 65–98, 2017.
[22] M. Udell, K. Mohan, D. Zeng, J. Hong, S. Diamond, and S. Boyd, "Convex optimization in Julia," in Proceedings of the First Workshop for High Performance Technical Computing in Dynamic Languages. IEEE Press, 2014, pp. 18–28.
[23] J. Watrous, The Theory of Quantum Information, 1st ed. Cambridge University Press, Apr. 2018.
[24] K. R. Parthasarathy, "Extremal decision rules in quantum hypothesis testing," Infinite Dimensional Analysis, Quantum Probability and Related Topics, vol. 02, no. 04, pp. 557–568, Dec. 1999.
[25] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications.
[26] E. P. Hanson and N. Datta, "Maximum and minimum entropy states yielding local continuity bounds," Journal of Mathematical Physics, vol. 59, no. 4, p. 042204, Apr. 2018. [Online]. Available: https://aip.scitation.org/doi/10.1063/1.5000120
[27] S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK; New York: Cambridge University Press, 2004.
[28] S. Bhela, D. Deka, H. Nagarajan, and V. Kekatos, "Designing power grid topologies for minimizing network disturbances: An exact MILP formulation," arXiv:1903.08354 [math], Mar. 2019. [Online]. Available: http://arxiv.org/abs/1903.08354
[29] C. Coey, M. Lubin, and J. P. Vielma, "Outer approximation with conic certificates for mixed-integer convex problems," arXiv:1808.05290 [math], Aug. 2018. [Online]. Available: http://arxiv.org/abs/1808.05290
[30] C. H. Bennett and G. Brassard, "Quantum cryptography: Public key distribution and coin tossing," in Proceedings of IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, Dec. 1984, pp. 175–179.
[31] M. Yamashita, K. Fujisawa, K. Nakata, M. Nakata, M. Fukuda, K. Kobayashi, and K. Goto, "A high-performance software package for semidefinite programs: SDPA 7," 2010.
[32] A. K. Ekert, "Quantum cryptography based on Bell's theorem," Physical Review Letters, vol. 67, no. 6, pp. 661–663, Aug. 1991.
[33] J. Pliam, "The disparity between work and entropy in cryptology," 1998, appeared in the Theory of Cryptography Library and included in the ePrint Archive. [Online]. Available: http://eprint.iacr.org/1998/024
[34] M. Berta, O. Fawzi, and M. Tomamichel, "On variational expressions for quantum relative entropies,"