Zero-Delay and Causal Secure Source Coding
Yonatan Kaspi and Neri Merhav
Department of Electrical Engineering
Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
Email: {kaspi@tx, merhav@ee}.technion.ac.il

September 27, 2018

Abstract
We investigate the combination of causal/zero-delay source coding and information-theoretic secrecy. Two source coding models with secrecy constraints are considered. We start by considering zero-delay perfectly secret lossless transmission of a memoryless source. We derive bounds on the key rate and coding rate needed for perfect zero-delay secrecy. In this setting, we consider two models, which differ by the ability of the eavesdropper to parse the bit-stream passing from the encoder to the legitimate decoder into separate messages. We also consider causal source coding with a fidelity criterion and side information at the decoder and the eavesdropper. Unlike the zero-delay setting, where variable-length coding is traditionally used but might leak information on the source through the lengths of the codewords, in this setting, since delay is allowed, block coding is possible. We show that in this setting, separation of encryption and causal source coding is optimal.
Index Terms: Source coding, Zero-delay, Secrecy, Causal source coding, Rate-Distortion, Side Information

∗ This research was supported by the Israeli Science Foundation (ISF), grant no. 208/08.

1 Introduction
We consider several source coding problems with secrecy constraints, in which an encoder, referred to as Alice, transmits the output of a memoryless source to a decoder, referred to as Bob. The communication between Alice and Bob is intercepted by an eavesdropper, referred to as Eve. A secret key is shared between Alice and Bob, with which they can respectively encrypt and decrypt the transmission. Attention is restricted to zero-delay and causal settings. Our setting represents time-critical applications, like live multimedia streaming, which need to be transmitted or stored securely so that the contents can be revealed only to authorized parties. Although there is vast literature dealing with source coding with secrecy constraints, as well as works that deal with source coding with delay or causality constraints, very little attention was given in the information theory literature to the combination of those problem areas.

This paper has two main sections. We start with zero-delay source coding and include secrecy constraints. Our goal is to characterize the pairs of coding rate and key rate (to be formally defined later) with which perfectly secure, zero-delay, lossless transmission is possible. Two models of eavesdroppers are considered, which differ in their ability to parse the bit-stream which is transmitted from Alice to Bob. We continue with the causal source coding setting, as defined by Neuhoff and Gilbert [1], in which delay is allowed, but the cascade of encoder and decoder must be a causal function of the source. In this setting, our goal is to characterize the achievable region of the quadruple composed of rate, distortion, key rate and uncertainty at the eavesdropper (equivocation, formally defined later). This setting is later extended to the scenario where side information (SI), correlated with the source, is available to Bob and Eve.
We introduce each of these settings in more depth, and discuss the fundamental difference between them when secrecy is involved, in the sequel, after reviewing relevant past work.

Shannon [2] introduced the information-theoretic notion of secrecy, where secrecy is measured through the remaining uncertainty about the message at the eavesdropper. This information-theoretic approach to secrecy makes it possible to consider secrecy issues at the physical layer, and ensures unconditionally (regardless of the eavesdropper's computing power and time) secure schemes, since it relies only on the statistical properties of the system. Wyner introduced the wiretap channel in [3] and showed that it is possible to send information at a positive rate with perfect secrecy as long as Eve's channel is a degraded version of the channel to Bob. When the channels are clean, two approaches can be found in the literature of secure communication. The first assumes that both Alice and Bob agree on a secret key prior to the transmission of the source. The second approach assumes that Bob and Eve (and possibly Alice) have different versions of SI, and secrecy is achieved through this difference.

For the case of a shared secret key, Shannon showed that in order for the transmission of a DMS to be fully secure, the rate of the key must be at least as large as the entropy of the source. Yamamoto ([4] and references therein) studied various secure source coding scenarios that include an extension of Shannon's result to combine secrecy with rate–distortion theory. In both [2],[4], when no SI is available, it was shown that separation is optimal. Namely, using a source code followed by encryption with the shared key is optimal. The other approach was treated more recently by Prabhakaran and Ramchandran [5], who considered lossless source coding with SI at both Bob and Eve when there is no rate constraint between Alice and Bob. It was shown that the Slepian-Wolf [6] scheme is not necessarily optimal when the SI structure is not degraded.
Coded SI at Bob and SI at Alice was considered in [7]. These works were extended by Villard and Piantanida [8] to the case where distortion is allowed and coded SI is available to Bob. Merhav combined the two approaches with the wire-tap channel [9]. In [10], Schieler and Cuff considered the tradeoff between rate, key rate and distortions at the eavesdropper and legitimate decoder when the eavesdropper has causal access to the source and/or correlated data. Note that we mentioned only a small sample of the vast literature on this subject. In the works mentioned above, there were no constraints on the delay and/or causality of the system. As a result, the coding theorems of the above works introduce arbitrarily long delay and exponential complexity.

The practical need for fast and efficient encryption algorithms for military and commercial applications, along with theoretical advances of the cryptology community, led to the development of efficient encryption algorithms and standards which rely on relatively short keys. However, the security of these algorithms depends on computational complexity and on the intractability assumption of some hard problems. From the information-theoretic perspective, very little work has been done on the intersection of zero-delay or causal source coding and secrecy. A notable exception is [11], which considered the combination of prefix coding and secrecy. The figure of merit in [11] is the expected key consumption, where it is assumed that Alice and Bob can agree on new key bits and use them during the transmission. However, no concrete scheme was given on how these bits will be generated securely.

For causal source coding, it was shown in [1] that, for a discrete memoryless source (DMS), the optimal causal encoder consists of time-sharing between no more than two memoryless quantizers, followed by entropy coding. In [13], Weissman and Merhav extended [1] to include SI at the decoder, encoder or both.
The discussion in [13] was restricted, however, only to settings where the encoder and decoder could agree on the reconstruction symbol. In [14], this restriction was dropped. Zero-delay source coding with SI, for both the single-user and multi-user settings, was also considered in [14].

Without secrecy constraints, the extension of [1] to the zero-delay case is straightforward and is done by replacing the block entropy coding by instantaneous Huffman coding. The resulting bit-stream between the encoder and decoder is composed of the Huffman codewords. However, this cannot be done when secrecy is involved, even if only lossless compression is considered. To see why, consider the case where Eve intercepts a Huffman codeword, and further assume the bits of the codeword are encrypted with a one-time pad. While the intercepted bits give no information on the encoded symbol (since they are independent of it after the encryption), the number of intercepted bits leaks information on the source symbol. For example, if the codeword is short, Eve knows that the encrypted symbol is one with a high probability.¹

¹ Reference [11] was presented in parallel to the ISIT2012 presentation of this paper [12].

2 Zero-Delay Secure Source Coding

In this section, we consider the combination of zero-delay source coding and secrecy. The difference between zero-delay and causal source coding boils down to the replacement of the entropy coding in the scheme of Neuhoff and Gilbert [1] with instantaneous coding. However, as the example given in the Introduction indicates, replacing the entropy coding with instantaneous coding will result in information leakage to Eve, at least when Eve can parse the bit-stream and knows the length of each codeword. In the setup we consider in this section, neither Bob nor Eve has access to SI. The extension to the case where both Bob and Eve have the same SI is straightforward and will be discussed in the sequel. We start with notation and the formal setting in Section 2.1.
Section 2.2 will deal with the setting where Eve has parsing information, while Section 2.3 will deal with the setting where Eve cannot parse the bit-stream.

2.1 Preliminaries
We begin with notation conventions. Capital letters represent scalar random variables (RVs), specific realizations of them are denoted by the corresponding lower case letters, and their alphabets by calligraphic letters. For i < j (i, j positive integers), x_i^j will denote the vector (x_i, ..., x_j), where for i = 1 the subscript will be omitted. We denote the expectation of a random variable X by E(X) and its entropy by H(X). For two random variables X, Y, with finite alphabets 𝒳, 𝒴, respectively, and joint probability distribution P(x, y), the average instantaneous codeword length of X conditioned on Y = y is given by

    L(X|Y = y) ≜ min_{l(·) ∈ 𝒜_X} Σ_{x∈𝒳} P(x|y) l(x),   (1)

where 𝒜_X is the set of all possible length functions l: 𝒳 → Z⁺ that satisfy Kraft's inequality for an alphabet of size |𝒳|. L(X|Y = y) is obtained by designing a Huffman code for the probability distribution P(x|y). With the same abuse of notation common for entropy, we let L(X|Y) denote the expectation of L(X|Y = y) with respect to the randomness of Y. The average Huffman codeword length of X is given by L(X) and will be referred to as the Huffman length of X. For three random variables X, Y, Z, jointly distributed according to P(x, y, z), we let L(X|y, Z) stand for

    L(X|y, Z) ≜ L(X|Y = y, Z) = Σ_z P(z|y) L(X|Y = y, Z = z),   (2)

namely, the average Huffman length of X conditioned on (Y = y, Z), where the expectation is taken with respect to the randomness of Z conditioned on the event that Y = y.
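As a quick numerical sketch of (1)-(2) (the 4-symbol source and joint distribution below are hypothetical, chosen only for illustration), the Huffman length can be computed without building the code tree, since the expected codeword length equals the sum of the merged weights produced by Huffman's algorithm:

```python
import heapq

def huffman_length(probs):
    """Expected Huffman codeword length for the distribution `probs`.
    Uses the fact that E[l(X)] equals the sum of all merged weights
    produced while running Huffman's merging procedure."""
    heap = sorted(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def cond_huffman_length(p_y, p_x_given_y):
    """L(X|Y): average of the per-y Huffman lengths, as in (1)-(2)."""
    return sum(py * huffman_length(px) for py, px in zip(p_y, p_x_given_y))

# Hypothetical source: P(x) = (0.5, 0.25, 0.125, 0.125) gives L(X) = 1.75.
L_X = huffman_length([0.5, 0.25, 0.125, 0.125])

# Conditioning reduces the Huffman length: L(X|Y) <= L(X).
p_y = [0.5, 0.5]
p_x_given_y = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
p_x = [0.4, 0.1, 0.1, 0.4]            # induced marginal of X
L_X2 = huffman_length(p_x)            # 1.8 bits
L_XgY = cond_huffman_length(p_y, p_x_given_y)  # 1.5 bits <= L_X2
```

Here Y identifies which of the two conditional distributions generated X, so the per-y Huffman codes are shorter on average than the code designed for the marginal.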
Since more information at both the encoder and decoder (represented by the conditioning on Y) cannot increase the optimal coding rate, we have that L(X|Y) ≤ L(X), i.e., conditioning reduces the Huffman length of the source (this can also be easily shown directly from (1)).

We consider the following zero-delay source coding problem: Alice wishes to losslessly transmit the output of a DMS X, distributed according to P(x), to Bob. The communication between Alice and Bob is intercepted by Eve. Alice and Bob operate without delay. When Alice observes X_t, she encodes it, possibly using previous source symbols, by an instantaneous code and transmits the codeword to Bob through a clean digital channel. Bob decodes the codeword and reproduces X_t. A communication stage is defined to start when the source emits X_t and to end when Bob reproduces X_t, i.e., Bob cannot use future transmissions to calculate X_t. We will assume that both Alice and Bob share access to a completely random binary sequence, U = (U_1, U_2, ...), which is independent of the source and will be used as key bits. In addition, Alice has access, at each stage, to a private source of randomness, {V_t}, which is i.i.d. and independent of the source and the key. Neither Bob nor Eve has access to {V_t}. Let m_1, m_2, ..., m_n, m_i ∈ N, be a non-decreasing sequence of positive integers. At stage t, Alice uses l(K_t) ≜ m_t − m_{t−1} bits that were not used so far from the key sequence. Let K_t ≜ (U_{m_{t−1}+1}, ..., U_{m_t}) denote the stage-t key. The parsing of the key sequence up to stage t should be the same for Alice and Bob to maintain synchronization. This can be done through a predefined protocol where the key lengths are fixed in advance, or "on the fly" through the transmitted data by a predefined protocol (such as detecting a valid codeword). Note that such a scheme can introduce dependencies between the used keys {K_t} and the data at the encoder (X^t, K^{t−1}, V^t).
Therefore, while U is independent of all other variables, {K_t} is not, and might depend on the encoder's data through the number of bits from U it contains. We define the key rate to be R_k = limsup_{n→∞} (1/n) Σ_{t=1}^n E l(K_t). Let 𝒵 be the set of all finite-length binary strings. Denote Alice's output at stage t by Z_t ∈ 𝒵, and let B_t denote the unparsed sequence, containing l(B_t) bits, that was transmitted until the end of stage t (note that B_t does not contain parsing information and is therefore different from Z^t). The rate of the encoder is defined by R ≜ limsup_{n→∞} (1/n) E l(B_n). When we write Z_t = f(K_t, A, B, C) for some encoding function f and any random variables (A, B, C), this will mean that the length of the key, K_t, depends only on (A, B, C) and not on other variables which are not parameters of f.

Given the keys up to stage t, K^t, Bob can parse B_n into Z_1, ..., Z_t for any n ≥ t. The legitimate decoder is thus a sequence of functions X_t = g_t(K^t, Z^t). The model of the system is depicted in Fig. 1.
Figure 1: Zero-delay secrecy model

As discussed in the Introduction, we will treat two secrecy models. In the first model, we will assume that Eve can detect when each stage starts, i.e., she can parse B_t into Z_1, ..., Z_t. In the second model, we will assume that Eve taps into the bit-stream of a continuous transmission between Alice and Bob, B ≜ B_∞, but has no information on the actual parsing of B into the actual messages, {Z_t}. We treat each of the models in detail in the following subsections.

2.2 Eve Has Parsing Information

In this subsection, we assume that Eve can parse B_n into Z_1, Z_2, ..., Z_n. In order for the system to be fully secure, we follow Shannon [2] in defining what is a secure system.

Definition 1.
When Eve has parsing information, a system is said to be perfectly secured if, for any l, k, m, n,

    P(x_l^k | z_m^n) = P(x_l^k).   (3)

Namely, acquiring any portion of the transmission leaks no information on any portion of the source which was not known to Eve in advance.

Parsing information is usually not part of any source coding model (with the exception of "one-shot" models or one-to-one codes [15]). In this subsection, however, we assume that Eve has this knowledge. It is important to note that this is additional information (or side information) that is given to Eve and, unless it is known in advance (for example, if fixed-length codes are used), it generally cannot be deterministically calculated by observing the encrypted bit-stream which passes from Alice to Bob. The motivation, as discussed in the Introduction, is that sometimes such side information can be obtained by Eve through other means, for example a packet sniffer in a packet network. Moreover, it makes sense to assume that if Eve has such side information, Bob can acquire it as well (if, for example, Eve gets this information through a packet sniffer then, obviously, Bob will have this information as well). It is well known that when a decoder has parsing information, there is no need to use uniquely decodable (UD) codes and the average rate of the encoder can be lower than the entropy of the source [15]. However, it can be easily seen that whenever Eve has parsing information, the rate is lower bounded by ⌈log |𝒳|⌉.

Proposition 1.
When Eve has parsing information, if the system is perfectly secured, then

    R ≥ ⌈log |𝒳|⌉.   (4)

Proof:
Since the number of bits composing Z_t must be independent of the source symbols (otherwise the length of Z_t will leak information, as in the example in the Introduction), the parsing information will assist Bob/Eve to parse the bit-stream into Z_1, Z_2, ..., but will not assist in interpreting the code within Z_t. Therefore, within each Z_t an instantaneous code must be used (because of the zero-delay constraint). By Kraft's inequality, the shortest block that can accommodate a UD code is of length ⌈log |𝒳|⌉, since

    1 ≥ Σ_{x∈𝒳} 2^{−l(x)} ≥ Σ_{x∈𝒳} 2^{−l_max} = |𝒳| 2^{−l_max},   (5)

where l(x) is the length of the codeword of x ∈ 𝒳 and l_max is the length of the longest codeword in the code.

The pessimistic conclusion from Proposition 1 is that there is no hope for compression in this setting. As we will see in the following, although such a rate is possible, the rate will be higher than that when we want to minimize the key rate.

The most general zero-delay encoder is a sequence of functions Z_t = f_t(K^t, V^t, X^t). In this section, we will treat only a subclass of encoders that satisfy the Markov chain

    X_t ↔ Z^t ↔ K^{t−1}.   (6)

Namely, given the past and current encoder outputs, the current source symbol, X_t, does not reduce the uncertainty regarding the past keys. Similarly, knowing past keys will not assist a cryptanalyst in decrypting the current source symbol. Namely, the past keys are either not reused, and if reused, the resulting message is encrypted by new key bits. We claim that this constraint, in the framework of perfect secrecy with a memoryless source, is relatively benign and, in fact, any encoder that calculates a codeword (possibly using the whole history of the source and keys, i.e., with the most general encoder structure), say Ẑ_t, and then outputs Z_t = Ẑ_t ⊕ K_t, will satisfy this constraint. Such a structure seems natural for one-time pad encryption.
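The Kraft argument in (5) is easy to check numerically (a minimal sketch; the alphabet size and length assignments are hypothetical):

```python
def kraft_sum(lengths):
    """Kraft sum of a set of binary codeword lengths; an instantaneous
    code with these lengths exists iff the sum is at most 1."""
    return sum(2.0 ** -l for l in lengths)

# With |X| = 5 symbols, (5) forces l_max >= ceil(log2(5)) = 3:
assert kraft_sum([2] * 5) > 1    # five 2-bit codewords violate Kraft
assert kraft_sum([3] * 5) <= 1   # five 3-bit codewords are feasible
```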
In fact, it turns out that (6) includes a broad class of encoders, and any secure encoder with the general structure Z_t = f_t(K^t, V^t, X^t, Z^{t−1}) satisfies (6). We show this in Appendix 4. As we will see in the following, although (6) allows for an encoder/decoder which can try to use previously sent source symbols (which at stage t are known only to Alice and Bob) to reduce the number of key bits used at stage t, this will not be possible.

The main result of this subsection is the following theorem:

Theorem 1. In the class of codes that satisfy (6), there exists a pair of perfectly secure zero-delay encoder and decoder if and only if R_k ≥ L(X).

Remarks:
1) This theorem is in the spirit of the result of [2], where the entropy is replaced by the Huffman length due to the zero-delay constraint. As discussed in the Introduction, variable-rate coding is not an option when we want the communication to be perfectly secret. This means that the encoder should either output constant-length (short) blocks, or have the transmission length independent of the source symbol in some other way. While, by Proposition 1, no compression is possible, Theorem 1 shows that the rate of the key can be as low as L(X), which is the minimal length for zero-delay coding. In the proof of the direct part of Theorem 1, we show that a constant-rate encoder with block length corresponding to the longest Huffman codeword achieves this key rate. The padding is done by random bits from the encoder's private source of randomness.

2) When both Bob and Eve have SI, say Y_t, where (X_t, Y_t) are drawn from a memoryless source emitting pairs, a theorem in the spirit of Theorem 1 can be derived by replacing the Huffman codes with zero-delay zero-error SI-aware codes, which were derived by Alon and Orlitsky in [16]. These codes also have the property that conditioning reduces their expected length, and therefore the proof will contain the same arguments, with (6) written as X_t ↔ (Z^t, Y^t) ↔ K^{t−1} (see also [14] for examples of these proof techniques using the codes of [16]).

3) Extending Theorem 1 from lossless to lossy source coding is possible by replacing X with X̂, where X̂ will be the output of a zero-delay reproduction coder. If the distortion constraint is "per-letter", then this is straightforward. If the distortion constraint is on the whole block (as in classical rate-distortion), then (6) will impose a strong restriction on the reproduction coder, forcing it to be memoryless (otherwise, knowing past symbols by using K^{t−1} will leak information on the current symbol).
Although zero-delay rate-distortion results suggest that indeed X̂ should be memoryless (see [1], [14]), these results do not restrict the reproduction coder to be memoryless in advance.

We prove the converse and direct parts of Theorem 1 in the following two subsections, respectively. Theorem 1 only lower bounds the key rate. Clearly, there is a tradeoff between the key rate and the coding rate. We propose an achievable region in Section 2.2.3.

2.2.1 Converse Part

For every lossless secure encoder–decoder pair that satisfies (6), we lower bound the key rate as follows:

    Σ_{t=1}^n E l(K_t) = Σ_{t=1}^n L(K_t)
     ≥ Σ_{t=1}^n L(K_t | K^{t−1}, Z^t)   (7)
     = Σ_{t=1}^n L(K_t, X_t | K^{t−1}, Z^t)   (8)
     ≥ Σ_{t=1}^n L(X_t | K^{t−1}, Z^t)   (9)
     = Σ_{t=1}^n L(X_t | Z^t)   (10)
     = Σ_{t=1}^n L(X_t)   (11)
     = nL(X).   (12)

The first equality is true since the key bits are incompressible, and therefore the Huffman length is the same as the number of key bits. (7) is true since conditioning reduces the Huffman length (the simple proof of this is omitted). (8) follows since X_t is a function of (K^t, Z^t) (the decoder's function) and therefore, given (K^{t−1}, Z^t), the code for K_t also reveals X_t. (9) is true since, with the same conditioning on (K^{t−1}, Z^t), the instantaneous code of (K_t, X_t) cannot be shorter than the instantaneous code of X_t. (10) is due to (6) and, finally, (11) is true since we consider a secure encoder. We therefore showed that R_k ≥ L(X).

2.2.2 Direct Part

We construct an encoder–decoder pair that is fully secure with R_k = L(X). Let l_{H-max} denote the length of the longest Huffman codeword of X. We know that l_{H-max} ≤ |𝒳| − 1.
The encoder output will always be l_{H-max} bits long and will be built from two fields. The first field will be the Huffman codeword for the observed source symbol X_t; denote its length by l_H(X_t). This codeword is then XORed with l_H(X_t) key bits. The second field will be composed of l_{H-max} − l_H(X_t) random bits (taken from the private source of randomness) that will pad the encrypted Huffman codeword to be of length l_{H-max}. Regardless of the specific source output, Eve sees constant-length codewords composed of random uniform bits. Therefore, no information about the source is leaked by the encoder outputs. When Bob receives such a block, he starts XORing it with key bits until he detects a valid Huffman codeword. The rest of the bits are ignored. Obviously, the key rate which is needed is L(X).

2.2.3 Achievable Region

The direct proof shown above suggests a single point on the R−R_k plane, namely, the point where R_k = L(X) (its minimal possible value), but the encoder rate is high and is equal to l_{H-max}. However, there are many other achievable points, which can be shown using the same idea as in the direct part of the proof by replacing the Huffman code with another instantaneous code. For every possible instantaneous code for the source X (not necessarily an optimal code for this source), we can repeat the arguments of the direct part of the proof by setting the rate (block length) to be the longest codeword in this code, XORing only the bits which originate from this code, and padding the remainder of the block with random bits. Any such code, which is a sub-optimal code for source coding, will give us an operating point in the R−R_k plane. The extreme case is obtained by using the trivial source code which uses ⌈log |𝒳|⌉ bits for all symbols and XORing all the bits with key bits.
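The constant-length construction of the direct part can be sketched as follows (a minimal illustration, not the paper's formal scheme; the 4-symbol Huffman code and all helper names are assumptions made for the example):

```python
import secrets

# Hypothetical Huffman code for a 4-symbol source; l_H-max = 3.
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
DECODE = {cw: s for s, cw in CODE.items()}
L_MAX = max(len(cw) for cw in CODE.values())

def encode(symbol, key_bits):
    """One-time-pad the Huffman codeword, then pad with private random
    bits so that every transmitted block is exactly L_MAX bits long."""
    cw = CODE[symbol]
    enc = ''.join(str(int(b) ^ next(key_bits)) for b in cw)
    pad = ''.join(str(secrets.randbelow(2)) for _ in range(L_MAX - len(cw)))
    return enc + pad

def decode(block, key_bits):
    """XOR key bits one at a time until a valid codeword appears; the
    padding bits are never decrypted, so key usage stays synchronized."""
    prefix = ''
    for b in block:
        prefix += str(int(b) ^ next(key_bits))
        if prefix in DECODE:
            return DECODE[prefix]
    raise ValueError('no valid codeword in block')

# Shared key stream: Alice and Bob consume the same bits in the same order.
key = [secrets.randbelow(2) for _ in range(64)]
alice_key, bob_key = iter(key), iter(key)
blocks = [encode(s, alice_key) for s in 'abdcba']
decoded = ''.join(decode(blk, bob_key) for blk in blocks)  # 'abdcba'
```

Because the code is prefix-free, no proper prefix of a codeword is itself a codeword, so Bob stops after exactly l_H(X_t) XORs and both sides consume the same number of key bits per stage.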
Since there is only a finite number of possible instantaneous codes (when not counting codes that artificially enlarge the description of shorter codes by adding bits), we will get a finite number of points in the R−R_k plane. The lower convex envelope of these points is achievable by time-sharing. This is illustrated in Fig. 2. We see an interesting phenomenon, where optimal codes for source coding (e.g., the Huffman code) will achieve high encoding rates (albeit a low key rate), while sub-optimal codes will achieve low encoding rates (but a higher key rate).

Figure 2: Achievable region
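The operating points of Fig. 2 can be tabulated for a toy source (a sketch; the distribution and length assignments are hypothetical, with R taken as the longest codeword and R_k as the expected codeword length):

```python
def operating_point(probs, lengths):
    """(R, R_k) induced by an instantaneous code with the given codeword
    lengths: R = longest codeword, R_k = expected codeword length."""
    assert sum(2.0 ** -l for l in lengths) <= 1  # Kraft's inequality
    return max(lengths), sum(p * l for p, l in zip(probs, lengths))

P = [0.5, 0.25, 0.125, 0.125]
# Huffman code: minimal key rate L(X) = 1.75, but R = l_H-max = 3.
huffman_pt = operating_point(P, [1, 2, 3, 3])   # (3, 1.75)
# Trivial fixed-length code: R = ceil(log2 4) = 2, but R_k = 2.
fixed_pt = operating_point(P, [2, 2, 2, 2])     # (2, 2.0)
```

Time-sharing between the two codes traces the segment between these two corner points.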
2.3 Eve Has No Parsing Information

In this subsection, we relax our secrecy requirements and assume that Eve observes the whole transmission from Alice to Bob, but has no information on how to parse the bit-stream B_n into Z_1, ..., Z_n. Note that we did not restrict the eavesdropper in any way, since parsing information is generally not available and cannot be calculated from B_n. Moreover, this setting has a practical motivation, as discussed in the Introduction.

Note that although Eve has no parsing information, she still knows the length of the whole bit-stream, which can still leak information on the source. For example, suppose Alice uses a prefix code that has a 1-bit codeword length for, say, X = a, and sends n symbols to Bob. Even if the bit-stream is encrypted with a one-time pad (and therefore is not parsable by Eve by our assumptions), if Eve sees that the total length of the bit-stream is exactly n bits, she knows with certainty that all the symbols in the bit-stream are a. While such a system is not secure by Definition 1, the probability of the described event vanishes exponentially fast. In this section, we have a relaxed definition of secrecy, which allows for encoders that leak information on the source, but with vanishing probability:

Definition 2.
When Eve has no parsing information, we say that the system is asymptotically secure if the following holds for every t ≤ n and every x ∈ 𝒳:

    P(X_t = x | B_n) → P_X(X_t = x) a.s. as n → ∞.   (13)

This means that when the bit-stream is long enough, the eavesdropper does not learn from it anything about the source symbols, with probability 1. Note that the encoder from Section 2.2.2 trivially satisfies this constraint, since it was a constant block-length encoder and the bits within the block were encrypted by a one-time pad. We will see that with the relaxed secrecy requirement we can reduce the rate of the encoder to be the same as the rate of the key. As in the previous subsection, where we dealt with encoders satisfying (6), here too we will limit the discussion. We will treat only encoders that satisfy

    lim_{n→∞} max_{1 ≤ t ≤ n(1−ε)} ‖P(x_t | B_n, k^{t−1}) − P(x_t | B_n)‖ = 0 a.s.,   (14)

where ‖P − Q‖ denotes the variational distance between the probability measures, ‖P − Q‖ = Σ_x |P(x) − Q(x)|. This constraint means that when n is large, for all t for which a gap remains to the end of the bit-stream, we practically have the Markov chain as in (6), with Z^n replaced by B_n. Note that this requirement is less stringent than (6) and, in fact, any encoder which satisfies (6) and is secure with parsing information (i.e., satisfies Definition 1) will satisfy (14). Since B_t can be parsed with K^t, the margin between t and n in (14) ensures that for any t considered, there is a portion of B_n that cannot be parsed, even if the eavesdropper acquired the previous keys. The discussion that followed the constraint (6) is valid here as well; namely, we only consider encoders that do not reuse old key bits to encrypt new messages.

We have the following theorem:

Theorem 2.
In the class of codes that satisfy (14), there exists a pair of asymptotically secure, zero-delay encoder and decoder if and only if

    R ≥ L(X),  R_k ≥ L(X).

The fact that R ≥ L(X) is trivial, since we deal with a zero-delay lossless encoder. However, unlike the case of Theorem 1, here it can be achieved along with R_k ≥ L(X). Note that if, instead of defining the secrecy constraint as in (13), we required that for every n, t, P(X_t | B_n) = P(X_t), then a counterpart of Theorem 1 would hold here. However, the encoder would, as in the proof of the direct part of Theorem 1, operate at a constant rate. We prove the converse and direct parts of Theorem 2 in the following two subsections, respectively.

2.3.1 Converse Part

Following the arguments of Section 2.2.1, we have:

    Σ_{t=1}^n E l(K_t) = Σ_{t=1}^n L(K_t)
     ≥ Σ_{t=1}^n L(K_t | B_n, K^{t−1})
     = Σ_{t=1}^n L(K_t, X_t | B_n, K^{t−1})
     ≥ Σ_{t=1}^n L(X_t | B_n, K^{t−1})
     ≥ Σ_{t=1}^{n(1−ε)} L(X_t | B_n, K^{t−1}).   (15)

Now, let us define

    𝓑_n = {b_n : max_{1 ≤ t ≤ n(1−ε)} ‖P(x_t | b_n, k^{t−1}) − P(x_t | b_n)‖ ≤ ε, ∀k^{t−1}}.

It can be immediately seen that for b_n ∈ 𝓑_n,

    L(X_t | b_n, K^{t−1}) ≥ L(X_t | b_n) − ε|𝒳| l_max,   (16)

where l_max is the length of the longest codeword for X, which can be bounded by |𝒳|.
Using the definition given in (2), we continue as follows:

    Σ_{t=1}^n E l(K_t) ≥ Σ_{t=1}^{n(1−ε)} L(X_t | B_n, K^{t−1})
     ≥ Σ_{t=1}^{n(1−ε)} Σ_{b_n∈𝓑_n} P(b_n) L(X_t | b_n, K^{t−1})
     ≥ Σ_{t=1}^{n(1−ε)} Σ_{b_n∈𝓑_n} P(b_n) L(X_t | b_n) − n(1−ε)ε|𝒳| l_max.   (17)

We now define 𝓒_n = {b_n : ‖P(x_t | b_n) − P(x_t)‖ ≤ ε} and continue as follows:

    Σ_{t=1}^n E l(K_t) ≥ Σ_{t=1}^{n(1−ε)} Σ_{b_n∈𝓑_n} P(b_n) L(X_t | b_n) − n(1−ε)ε|𝒳| l_max
     ≥ Σ_{t=1}^{n(1−ε)} Σ_{b_n∈𝓑_n∩𝓒_n} P(b_n) L(X_t | b_n) − n(1−ε)ε|𝒳| l_max
     ≥ Σ_{t=1}^{n(1−ε)} Σ_{b_n∈𝓑_n∩𝓒_n} P(b_n) L(X_t) − n(1−ε)ε|𝒳| l_max
     = n(1−ε) P(𝓑_n ∩ 𝓒_n) L(X) − n(1−ε)ε|𝒳| l_max.   (18)

Now, since by (13) and (14) we have that P(𝓑_n ∩ 𝓒_n) → 1 for every ε > 0, and since ε > 0 is arbitrary,

    R_k = limsup_{n→∞} (1/n) Σ_{t=1}^n E l(K_t) ≥ L(X).   (19)

2.3.2 Direct Part

The direct part of the proof is achieved by separation. We show that the simplest memoryless encoder, which encodes X_t using a Huffman code and then XORs the resulting bits with a one-time pad, is optimal here. Therefore, both the coding rate and the key rate of this scheme are equal to L(X). Note that with such a simple encoder, no prior knowledge of n is needed, and therefore such an encoder is suitable for real-time streaming applications. We need to show that (13) holds.

We outline the idea before giving the formal proof. The bits of B_n are independent of X_t, since we encrypted them with a one-time pad. Therefore, only the total length of the bit-stream can leak information. Let l(B_n) represent the number of bits in B_n. By the strong law of large numbers, (1/n) l(B_n) → L(X) a.s.
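This concentration is easy to see in simulation (a Monte Carlo sketch; the 4-symbol source and its Huffman codeword lengths are hypothetical):

```python
import random

random.seed(0)

P_X = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}  # L(X) = 1.75 bits
LEN = {'a': 1, 'b': 2, 'c': 3, 'd': 3}               # Huffman lengths

def normalized_length(n):
    """Draw n i.i.d. symbols and return l(B_n)/n for the Huffman stream."""
    syms = random.choices(list(P_X), weights=list(P_X.values()), k=n)
    return sum(LEN[s] for s in syms) / n

rate = normalized_length(100_000)  # close to L(X) = 1.75 w.h.p.
```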
But if l(B_n) ≈ nL(X), then the only thing that Eve learns is that X^n is typical (or, equivalently, that the law of large numbers is working), and this, of course, is known in advance. There are events where Eve can indeed learn a lot from the length of B_n, but the probability of these events vanishes as n becomes large.

We invoke martingale theory [17] for a formal proof. Note that B_n can be written as B_n = (l(B_n), V_n), where V_n is the number which is represented by the bits of B_n (in base 2) and is uniform over {0, 1, ..., 2^{l(B_n)} − 1}. Given l(B_n), V_n is independent of X^n, since all bits are encrypted. Therefore, we have the following chain: X_t ↔ l(B_n) ↔ B_n. Now, since l(B_n) is a function of B_n, we have that P(X_t | B_n) = P(X_t | B_n, l(B_n)) = P(X_t | l(B_n)).

Let Y_t = l_H(X_t), where l_H(X_t) represents the Huffman codeword length associated with X_t. Clearly, l(B_n) = Σ_{i=1}^n Y_i. Also, for x ∈ 𝒳, define the indicator I_t(x) = 1{X_t = x}. Finally, define the filtration

    𝓕_{−n} = σ{l(B_n), l(B_{n+1}), ...}.   (20)

Since the Y_i's are independent, and since the events in 𝓕_{−∞} (which depend on {Y_i}_{i=1}^∞) are invariant to a finite permutation of the indices of the Y_i's, we have by the Hewitt-Savage zero–one law [17, Theorem 4.1.1, p. 162] that 𝓕_{−∞} = ∩_n 𝓕_{−n} is trivial. Namely, if A ∈ 𝓕_{−∞}, then P(A) ∈ {0, 1}.

For n ≥ t, let M_n = E[I_t(x) | 𝓕_{−n}]. Note that M_n is a bounded (reverse) martingale, which converges almost surely to M_∞ as n → ∞. Since the source is memoryless and since n ≥ t, we have that I_t(x) is independent of Y_{n+1}, Y_{n+2}, ..., and therefore, given l(B_n), X_t is independent of l(B_{n+1}), l(B_{n+2}), .... Thus, we have that M_n = E[I_t(x) | l(B_n), l(B_{n+1}), ...] = P(X_t = x | l(B_n)). Also note that M_n does not depend on t since, due to symmetry, P(X_t = x | l(B_n)) = P(X_1 = x | l(B_n)). This allows us to consider not only finite t, but also t ≤ n that grows with n to infinity.
This allows us to consider not only finite $t$, but also $t \le n$ that grows with $n$ to infinity. Finally, the fact that $\mathcal{F}_{-\infty}$ is trivial implies that $M_\infty$ is a constant, which is the expectation of the indicator, i.e., $P(X_t = x)$. We have therefore shown that $P(X_t = x \mid l(B^n)) \to P(X_t = x)$ a.s.
To see that this encoder satisfies (14), note that
\begin{align*}
P(X_t = x_t \mid B^n, k^{t-1}) &= P(X_t = x_t \mid B_{t,n}, x^{t-1})\\
&= P(X_t = x_t \mid B_{t,n})\\
&= P(X_t = x_t \mid l(B^n) - l(B^{t-1})), \tag{21}
\end{align*}
where $B_{t,n}$ denotes the bit-stream from time $t$ until time $n$. The last equality holds since $X_t \leftrightarrow l(B^n) - l(B^{t-1}) \leftrightarrow B_{t,n}$. Now, for $t \le (1-\epsilon)n$, the analysis carried out above for $P(X_t = x_t \mid B^n)$ is valid here (with a time shift of $t$), since $l(B^n) - l(B^{t-1})$ grows with $n$, and therefore (14) is satisfied.

In this section, we extend [1], [13], [14] to include secrecy constraints. Unlike the discussion of Section 2, here we allow lossy reconstruction and imperfect secrecy.
As in any source coding work, we are interested in the encoder rate $R$ and the minimal distortion $D$ attainable with this rate. With the addition of secrecy constraints, two more figures of merit are added. The first is the uncertainty $h$ (measured by equivocation, to be defined shortly) at the eavesdropper regarding the source. The second is the rate $R_k$ of a private key, shared by Alice and Bob, with which a given uncertainty $h$ is achievable. Note that when D >
0, Bob is also left with uncertainty regarding the original source sequence, since he only knows that it is contained in a $D$-ball around its reconstruction. Therefore, even with no attempt at encryption, Eve will ``suffer'' this same uncertainty. This implies that $(R, R_k, h, D)$ are tied together, and the goal of this section is to find the region of attainable quadruples.
Our system model is depicted in Fig. 3. We will deal with two settings, which differ by the position of the switches denoted by S in Fig. 3, namely, by the availability of SI at Bob and Eve. While the setting without SI is a special case of the setting that includes SI, the two settings are different from a causal rate-distortion standpoint. Without SI, it was shown in [1] that the chain of encoder and decoder is equivalent to a chain of a causal reproduction coder (to be formally defined shortly) followed by lossless compression. However, when SI is available to the decoder, this equivalence does not hold, since the encoder cannot reproduce $\hat{X}$ without the SI. The next two subsections deal with the two settings, starting from the simpler case without SI. Formal definitions of the settings will be given at the beginning of each subsection.

[Figure 3 appears here: Alice maps the source $X^n$ into the bit-stream $Z$; Bob outputs $\hat{X}^n$; Eve intercepts $Z$; the switches S control the availability of the SI $Y^n$ at Bob and $W^n$ at Eve; $K$ is the shared key.]
Figure 3: Causal Model
The notation conventions introduced at the beginning of Section 2.1 for random variables, vectors, etc., will be used here as well. Let $X^n$ be a sequence produced by a memoryless source. The alphabet of $X$, denoted $\mathcal{X}$, as well as all other alphabets in the sequel, is finite. The source sequence is given to Alice. In addition to the source, Alice has access to a secret key, $K$, uniformly distributed over $\{1, 2, \ldots, M_k\}$, which is independent of $X^n$. Bob shares the same key. Alice uses the source and the key to create a binary representation $Z = \{Z_k\}_{k \ge 1}$ (we omit the dependence of $Z$ on $n$ to simplify notation). Bob receives $Z$ and, with the shared key, creates a reproduction $\hat{X}^n$, where $\hat{X}_t \in \hat{\mathcal{X}}$ and $\hat{\mathcal{X}}$ is the reproduction alphabet. As in [1], the cascade of encoder and decoder will be referred to as a reproduction coder, i.e., the reproduction coder is a family of functions $\{f_k\}_{k \ge 1}$ such that $\hat{X}_k = f_k(X^n, K)$. We say that a reproduction function is causal relative to the source if, for all $t$,
\[
\hat{X}_t = f_t(X_{-\infty}^{\infty}, K) = f_t(\tilde{X}_{-\infty}^{\infty}, K) \quad \text{whenever } X_{-\infty}^{t} = \tilde{X}_{-\infty}^{t}. \tag{22}
\]
We are given a distortion measure $d: \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$, where $\mathbb{R}^+$ denotes the set of non-negative real numbers. Let $D_{\min} = \min_{x, \hat{x}} d(x, \hat{x})$. When a source code with a given induced reproduction coder $\{f_k\}$ is applied to the source $X^n$, the average distortion is defined by
\[
D(\{f_k\}) \triangleq \limsup_{n\to\infty} \frac{1}{n} E \sum_{t=1}^{n} d(X_t, \hat{X}_t). \tag{23}
\]
Let $l(Z)$ denote the number of bits in the bit-stream $Z$. The average rate of the encoder is defined by
\[
R = \limsup_{n\to\infty} \frac{1}{n} E\, l(Z). \tag{24}
\]
In our model, an eavesdropper, Eve, intercepts the bit-stream $Z$.
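Before the secrecy constraints enter, the distortion and rate definitions above can be made concrete for the simplest causal reproduction coders, the scalar (memoryless) ones: the sketch below enumerates all scalar reproduction functions of a toy source and finds the least output entropy compatible with each distortion level. The three-letter alphabet, pmf, and Hamming distortion are illustrative assumptions, not from the paper:

```python
import itertools
import math

# Illustrative source: alphabet, pmf, and Hamming distortion are assumptions.
X = [0, 1, 2]
pmf = {0: 0.5, 1: 0.3, 2: 0.2}
Xhat = [0, 1, 2]

def d(x, xhat):
    # Hamming distortion
    return 0.0 if x == xhat else 1.0

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def best_scalar_rate(D):
    # Minimal H(f(X)) over scalar maps f with E d(X, f(X)) <= D.
    best = math.inf
    for f in itertools.product(Xhat, repeat=len(X)):  # f[x] reproduces x
        if sum(pmf[x] * d(x, f[x]) for x in X) <= D + 1e-12:
            out = {}
            for x in X:
                out[f[x]] = out.get(f[x], 0.0) + pmf[x]
            best = min(best, entropy(out))
    return best

for D in (0.0, 0.2, 1.0):
    print(D, round(best_scalar_rate(D), 4))
```

For this pmf, zero distortion forces the identity map (entropy $H(X) \approx 1.485$ bits), distortion $0.2$ allows the least likely symbol to be merged into another ($\approx 0.881$ bits), and distortion $1$ allows a constant output at zero rate; this entropy-versus-distortion tradeoff is exactly the object studied for causal coders in [1].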
We follow the common assumption that Eve is familiar with the encoding and decoding functions, coding techniques, etc., employed by Alice and Bob, but is unaware of the actual realizations of the source and the key. The uncertainty regarding the source sequence $X$ at the eavesdropper after intercepting $Z$ is measured by the per-letter equivocation, which we denote by $h$; namely, $h = \liminf_{n\to\infty} \frac{1}{n} H(X^n \mid Z)$.
Unlike in [1], [13], where the bit representation $Z$ passed from the encoder to the decoder was only an intermediate step between a lossless encoder and a lossless decoder, here $Z$ is important in its own right, as it should leave Eve as oblivious as possible of the source sequence. However, as in [1], [13], applying a lossless code on $Z$ between the encoder and decoder can only improve the coding rate and will not affect the other figures of merit.
Let $\mathcal{R}_{NO-SI}$ denote the set of positive quadruples $(R, R_k, D, h)$ such that, for every $\epsilon > 0$ and sufficiently large $n$, there exist an encoder and a decoder, inducing a causal reproduction coder, satisfying
\begin{align*}
\frac{1}{n} H(Z) &\le R + \epsilon, \tag{25}\\
\frac{1}{n} H(K) &\le R_k + \epsilon, \tag{26}\\
\frac{1}{n} \sum_{t=1}^{n} E\, d(X_t, \hat{X}_t) &\le D + \epsilon, \tag{27}\\
\frac{1}{n} H(X^n \mid Z) &\ge h - \epsilon. \tag{28}
\end{align*}
Our goal in this section is to find the region of quadruples achievable with a causal reproduction coder. To this end, we will need previous results on causal rate-distortion coding [1], which we briefly describe below.
In [1], the same model as described above was considered, without the secrecy constraints. The goal of [1] was to find the tradeoff between $R$ and $D$ under the constraint that the reproduction coder is causal. Toward this goal, the equivalence of the two models in Fig. 4 was proved; namely, the rate of a source code with a given reproduction coder can only be improved if a lossless source code is applied to its output.
This implied that it is enough to consider only systems that first generate the reproduction process and then losslessly encode it, as in Fig. 4b. Indeed, this separation is the basis of many practical coding systems.
Let $R_c(D)$ denote the minimal achievable rate over all causal reproduction coders $\{f_k\}$ with $D(\{f_k\}) \le D$. Also, let
\[
r_c(D) = \min_{f:\; E d(X, f(X)) \le D} H(f(X)), \tag{29}
\]
and finally, let $\bar{r}_c(D)$ denote the lower convex envelope of $r_c(D)$. The following theorem is the main result of [1]:

Theorem 3. ([1], Theorem 3)
\[
R_c(D) = \bar{r}_c(D). \tag{30}
\]

[Figure 4 appears here: (a) the original source coding model, $X^n \to$ Encoder $\to Z \to$ Decoder $\to \hat{X}^n$; (b) the equivalent model, $X^n \to$ reproduction coder $\to \hat{X}^n \to$ entropy coder $\to Z \to$ entropy decoder $\to \hat{X}^n$, where the entropy coder/decoder pair constitutes a lossless code.]
Figure 4: Causal source coding model

It was shown in [1] that $\bar{r}_c(D)$ is achieved by time-sharing no more than two scalar reproduction coders.

The region $\mathcal{R}_{NO-SI}$

Although a scheme that first uses the optimal encoder of [1] to create the same bit-stream and then XORs the resulting bits with key bits is obviously possible, it is not immediately clear that such a separation is optimal. The reason is that one first needs to rule out the possibility that using key bits during the quantization, and using entropy coding with the key as SI at both sides, might improve performance. The following theorem, which is the main result of this subsection, shows that the separation scheme is indeed optimal.
Theorem 4. $(R, R_k, D, h) \in \mathcal{R}_{NO-SI}$ if and only if the following inequalities hold:
\begin{align*}
R &\ge \bar{r}_c(D),\\
h &\le H(X),\\
R_k &\ge h - H(X) + \bar{r}_c(D). \tag{31}
\end{align*}

It is evident from Theorem 4 that the direct part is achieved by the separation scheme proposed above. Theorem 4 is a special case of the more general theorem, which includes SI, that we prove in the next subsection (Theorem 6), and therefore no proof is given here.

In this section, we extend the setting of Section 3.1 to include SI at Bob and Eve. As in the previous section, we start by describing the model and mentioning related causal rate-distortion results before giving our results. In this section, we assume a memoryless source which emits sequences of random variables $(X^n, Y^n, W^n)$. As before, $X^n$ is observed by Alice. Bob and Eve observe $Y^n$ and $W^n$, respectively. The sequences $(X^n, Y^n, W^n)$ are distributed according to the joint distribution
\[
P(x^n, y^n, w^n) = \prod_{t=1}^{n} P(x_t) P(y_t \mid x_t) P(w_t \mid y_t).
\]
Namely, we assume a degraded SI structure, where Bob's SI is better than Eve's SI. Although we do not deal with other possible SI structures, we will discuss such extensions in the sequel. As in the model without SI, Alice and Bob have access to a shared secret key, denoted by $K$, $K \in \{1, 2, \ldots, M_k\}$, which is independent of the source. Alice sends a binary representation $Z = \{Z_k\}_{k \ge 1}$, which is used by the decoder, along with the SI and the key, to create $\hat{X}^n$. We call the cascade of encoder and decoder a reproduction coder, namely, a family of functions $\{f_k\}_{k \ge 1}$ such that $\hat{X}_k = f_k(K, X^n, Y^n)$. A reproduction coder is said to be causal with respect to the source and SI if, for all $t$,
\[
f_t(k, x^n, y^n) = f_t(k, u^n, v^n) \quad \text{whenever } x^t = u^t,\ y^t = v^t. \tag{32}
\]
The distortion constraint (23) and the definition of rate (24) remain the same as in the setting without SI.
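For intuition about how decoder SI changes the picture, the sketch below brute-forces, for a toy binary pair $(X, Y)$, the smallest conditional entropy $H(f(X) \mid Y)$ of a scalar encoder $f$ whose companion decoder $g(f(X), Y)$ meets a distortion target. The binary alphabets, crossover probability $q = 0.1$, and Hamming distortion are illustrative assumptions (Eve's $W$ is not needed for this computation):

```python
import itertools
import math

# Illustrative pair: X ~ Bernoulli(1/2), Y = X flipped with probability q.
q = 0.1
pXY = {(x, y): 0.5 * ((1 - q) if y == x else q) for x in (0, 1) for y in (0, 1)}

def cond_entropy(f):
    # H(f(X) | Y), where f[x] is the message sent for source symbol x.
    H = 0.0
    for y in (0, 1):
        py = sum(pXY[(x, y)] for x in (0, 1))
        ps = {}
        for x in (0, 1):
            ps[f[x]] = ps.get(f[x], 0.0) + pXY[(x, y)] / py
        H -= py * sum(p * math.log2(p) for p in ps.values() if p > 0)
    return H

def best_rate_with_si(D):
    # Minimize H(f(X)|Y) over encoders f and decoders g(s, y) with E d <= D.
    best = math.inf
    for f in itertools.product((0, 1), repeat=2):
        for gt in itertools.product((0, 1), repeat=4):
            g = {(s, y): gt[2 * s + y] for s in (0, 1) for y in (0, 1)}
            dist = sum(p * (x != g[(f[x], y)]) for (x, y), p in pXY.items())
            if dist <= D + 1e-12:
                best = min(best, cond_entropy(f))
    return best

print(round(best_rate_with_si(0.0), 4), round(best_rate_with_si(0.1), 4))
```

Here zero distortion costs $H(X \mid Y) = h_b(0.1) \approx 0.469$ bits, compared with $H(X) = 1$ bit when the decoder has no SI, and allowing distortion $q$ lets the decoder simply output $Y$ at zero rate: the decoder's SI does real work, which is exactly why the encoder alone cannot reproduce $\hat{X}$ in this setting.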
(Footnote: The achieving scheme of Theorem 4 uses the coding scheme of [1] instead of the coding scheme of [14], which is used to prove Theorem 6, but the ideas are essentially the same.)

When SI is available to the decoder, the cascade
of encoder and decoder cannot be recast into a cascade of a reproduction coder followed by lossless coding, as in Fig. 4, since the encoder has no access to the SI the decoder uses to improve its reproduction, and therefore cannot calculate $\hat{X}_t$. A causal reproduction coder in our setting is composed of a family of causal encoding functions, which calculate messages $S_t$ that are causal functions of the source symbols, namely $S_t = e_t(K, X^t)$, and causal decoding functions, which use the encoder messages along with the SI and the key to create the reproduction, namely $\hat{X}_t = g_t(K, S^t, Y^t)$. Note that this representation stems directly from the causality restriction, and every system that induces a causal reproduction coder can be written in this way. As in [1], although causal encoding and decoding functions are enforced, this model allows for arbitrary delays, which can be introduced when transmitting the encoder's output, $S^n$, to the decoder; namely, $Z$ is not necessarily causal in $S^n$. We allow events in which the decoder fails to decode the bit-stream $Z$ into the encoder messages $S^n$. In such an event, we impose no restriction on the dependence of the output on the SI (and therefore on the source). However, we require that such error events occur with small probability; namely, for every $\epsilon > 0$, $P(S^n \ne g(K, Y^n, Z)) < \epsilon$ for large enough $n$ and some function $g$.
The secrecy of the system is measured by the uncertainty of Eve with regard to the source sequence, i.e., by the normalized equivocation $\frac{1}{n} H(X^n \mid W^n, Z)$.
Let $\mathcal{R}_{SI}$ denote the set of positive quadruples $(R, R_k, D, h)$ such that, for every $\epsilon > 0$ and sufficiently large $n$, there exist an encoder and a decoder satisfying the constraints below. A remark on causality: the restriction does not actually force $S_t$ to be a function of $X^t$, as long as, when reproducing $\hat{X}_t$, only $S_i$'s that are functions of $(X_1, X_2, \ldots$
, X_t)$ are used. However, such an ``out of order'' system has an equivalent system, obtained by reordering the indices of the messages so that $S_i$ is a function of $X^i$; the performance of both systems is the same, since the reordering affects neither the entropy coding of $S^n$ nor the calculation of $\hat{X}_t$. The constraints are:
\begin{align*}
\frac{1}{n} H(Z) &\le R + \epsilon, \tag{33}\\
\frac{1}{n} H(K) &\le R_k + \epsilon, \tag{34}\\
\frac{1}{n} \sum_{t=1}^{n} E\, d(X_t, \hat{X}_t) &\le D + \epsilon, \tag{35}\\
\frac{1}{n} H(X^n \mid W^n, Z) &\ge h - \epsilon. \tag{36}
\end{align*}
Our goal is to characterize this region.
In the context of causal source coding, such a model was considered in [13], [14]. In [13], the model was restricted to common reconstruction between the encoder and decoder, meaning that both parties agree on the reconstruction. This restriction prevents the decoder from using the SI when reconstructing the source; the SI can be used only for lossless transmission of the reconstruction, which is calculated at the encoder. In this case, a cascade of a reproduction coder followed by lossless entropy coding that uses the SI is valid. The full treatment of SI for this scenario was recently given in [14].
Let $R_c^{SI}(D)$ denote the minimal achievable rate over all causal reproduction coders with access to SI, $\{f_k\}$, such that $D(\{f_k\}) \le D$. Also, let
\[
r_c^{SI}(D) = \min_{f, g} H(f(X) \mid Y), \tag{37}
\]
where the minimum is over all functions $f: \mathcal{X} \to \mathcal{S}$ and $g: \mathcal{S} \times \mathcal{Y} \to \hat{\mathcal{X}}$ such that $E\, d(X, g(f(X), Y)) \le D$. The alphabet $\mathcal{S}$ is a finite alphabet whose size is part of the optimization process ($|\mathcal{S}| \le |\mathcal{X}|$). Finally, let $\bar{r}_c^{SI}(D)$ be the lower convex envelope of $r_c^{SI}(D)$. The following theorem is proved in [14]:

Theorem 5. ([14], Theorem 4)
\[
R_c^{SI}(D) = \bar{r}_c^{SI}(D). \tag{38}
\]

It was shown that $\bar{r}_c^{SI}(D)$ is achieved by time-sharing at most two sets of scalar encoders ($f$) and decoders ($g$). Moreover, SI lookahead was shown to be useless in the causal setting.

The region $\mathcal{R}_{SI}$

We have the following theorem.
Theorem 6. $(R, R_k, D, h) \in \mathcal{R}_{SI}$ if and only if
\begin{align*}
R &\ge \bar{r}_c^{SI}(D),\\
h &\le H(X \mid W),\\
R_k &\ge h - H(X \mid W) + \bar{r}_c^{SI}(D). \tag{39}
\end{align*}
If $h - H(X \mid W) + \bar{r}_c^{SI}(D) \le 0$, no encryption is needed.

Remark: The above theorem pertains to the degraded SI structure. It was shown in [5] that for lossless secure compression with SI, Slepian-Wolf coding is optimal when the SI is degraded, but not optimal otherwise. For a general SI structure, a simple scheme would apply memoryless quantization (resulting in a new memoryless source) and then apply the scheme of [5]. The output of the scheme of [5] can be further encrypted with key bits as needed to achieve the desired $h$. Such a scheme does not violate the causality restriction. Although schemes that first apply memoryless quantization and then losslessly compress the output are optimal in all known causal source coding settings, it is not clear that this is the case here. The challenge in the converse part is that when applying a causal (but not memoryless) encoder, the resulting process is not necessarily memoryless. We were unsuccessful in proving the converse for a general SI structure.
We prove the converse and direct parts of Theorem 6 in the following two subsections, respectively.

We now proceed to prove the converse part, starting with lower bounding the encoding rate. We assume a given encoder and decoder pair which form a causal reproduction coder with (
R, R_k, D, h) \in \mathcal{R}_{SI}$. By the definition of our model, we have by Fano's inequality [18] that $H(S^n \mid K, Y^n, Z) \le n\epsilon$.
For $n$ large enough and every encoder and decoder pair that induces a causal reproduction coder and satisfies (33)--(36), the following chain of inequalities holds:
\begin{align*}
nR &\ge H(Z) \ge H(Z \mid K, Y^n) - H(Z \mid K, S^n, Y^n)\\
&= I(S^n; Z \mid K, Y^n)\\
&= H(S^n \mid K, Y^n) - H(S^n \mid K, Y^n, Z)\\
&\ge H(S^n \mid K, Y^n) - n\epsilon \tag{40}\\
&= \sum_{t=1}^{n} H(S_t \mid K, S^{t-1}, Y^n) - n\epsilon\\
&\ge \sum_{t=1}^{n} H(S_t \mid K, S^{t-1}, X^{t-1}, Y^n) - n\epsilon\\
&= \sum_{t=1}^{n} H(S_t \mid K, X^{t-1}, Y^n) - n\epsilon \tag{41}\\
&= \sum_{t=1}^{n} H(e_t(K, X^{t-1}, X_t) \mid K, X^{t-1}, Y^n) - n\epsilon\\
&= \sum_{t=1}^{n} \int H(e_t(X_t, k, x^{t-1}) \mid k, x^{t-1}, y^{t-1}, Y_t, Y_{t+1}^n)\, d\mu(k, x^{t-1}, y^{t-1}) - n\epsilon \tag{42}\\
&= \sum_{t=1}^{n} \int H(e_t(X_t, k, x^{t-1}) \mid Y_t)\, d\mu(k, x^{t-1}, y^{t-1}) - n\epsilon, \tag{43}
\end{align*}
where (40) follows from Fano's inequality, and in (41) we used the fact that $S^{t-1}$ is a function of $(K, X^{t-1})$. In (42), $\mu(\cdot)$ denotes the joint probability mass function of its arguments. In the last line, we used the independence of $X_t$ from the key and from the SI at times other than $t$. Now, $e_t(X_t, k, x^{t-1})$ can be seen as a specific function $f$ in the definition of $r_c^{SI}(D)$ (37). Also, with
Also, with( k, x t − , y t − ) fixed, so is s t − and the decoding function30 X t = g t ( k, s t − , S t , y t − , Y t ) can be seen as a specific choice of g ( · , · ) in (37).With this observation, we continue as follows n ( R + (cid:15) ) ≥ n (cid:88) t =1 (cid:90) H ( e t ( X t , k, x t − ) | Y t ) dµ ( k, x t − , y t − ) ≥ n (cid:88) t =1 (cid:90) r SIc ( E [ d ( X t , g t ( e t ( k, x t − , X t ) , s t − , y t − , Y t )) | k, x t − , y t − ]) × dµ ( k, x t − , y t − ) (44) ≥ n (cid:88) t =1 (cid:90) r SIc ( E [ d ( X t , g t ( e t ( k, x t − , X t ) , s t − , y t − , Y t )) | k, x t − , y t − ]) × dµ ( k, x t − , y t − ) (45) ≥ n (cid:88) t =1 r SIc (cid:16) (cid:90) E [ d ( X t , h t ( e t ( k, x t − , X t ) , s t − , y t − , Y t )) | k, x t − , y t − ]) × dµ ( k, x t − , y t − (cid:17) (46) ≥ n (cid:88) t =1 r SIc ( E [ d ( X t , h t ( e t ( K, X t ) , S t − , Y t ))])= n (cid:88) t =1 r SIc (cid:16) E (cid:104) d ( X t , ˆ X t ) (cid:105)(cid:17) ≥ nr SIc (cid:32) n n (cid:88) t =1 E (cid:104) d ( X t , ˆ X t ) (cid:105)(cid:33) (47) ≥ nr SIc ( D ) , (48)where (44) follows from the definition of r SIc ( D ) and the discussion precedingthe last equation block, (45) follows from the definition of r SIc ( D ), (46) and(47) follow from the convexity of r SIc ( D ). Finally, (48) follows from the factthat r SIc ( D ) is non-increasing in D .The key rate can be lower bounded as follows: nR k = H ( K ) ≥ H ( K | Z , W n )= I ( X n ; K | W n , Z ) + H ( K | X n , W n , Z )31 H ( X n | W n , Z ) − H ( X n | K, W n , Z ) + H ( K | X n , W n , Z ) ≥ nh − H ( X n | K, W n , Z ) + H ( K | X n , W n , Z ) ≥ nh − H ( X n | K, W n , Z ) (49)where the line preceding the last is true due to (36). 
We continue by focusing on $H(X^n \mid K, W^n, Z)$:
\begin{align*}
H(X^n \mid K, W^n, Z) &= I(X^n; Y^n \mid K, W^n, Z) + H(X^n \mid K, Y^n, W^n, Z)\\
&\le H(Y^n \mid W^n) - H(Y^n \mid K, X^n, W^n, Z) + H(X^n \mid K, Y^n, Z) \tag{50}\\
&= H(Y^n \mid W^n) - H(Y^n \mid X^n, W^n) + H(X^n \mid K, Y^n, Z) \tag{51}\\
&= I(X^n; Y^n \mid W^n) + H(X^n \mid K, Y^n, S^n, Z) + I(X^n; S^n \mid K, Y^n, Z)\\
&= I(X^n; Y^n \mid W^n) + H(X^n \mid K, Y^n, S^n, Z) + H(S^n \mid K, Y^n, Z) - H(S^n \mid K, X^n, Y^n, Z)\\
&\le nI(X; Y \mid W) + H(X^n \mid K, Y^n, S^n, Z) + n\epsilon \tag{52}\\
&\le n(H(X \mid W) - H(X \mid Y)) + H(X^n \mid K, Y^n, S^n) + n\epsilon, \tag{53}
\end{align*}
where in (50) we used the degraded structure of the source. Eq. (51) is true since $Z$ is a function of $(K, X^n)$ and $K$ is independent of the source. Eq. (52) is true by Fano's inequality and the fact that $S^n$ is a function of $(K, X^n)$. Focusing on the last term of (53), we have
\begin{align*}
H(X^n \mid K, Y^n, S^n) &= H(X^n \mid K, Y^n) - I(X^n; S^n \mid K, Y^n)\\
&= nH(X \mid Y) - I(X^n; S^n \mid K, Y^n)\\
&= nH(X \mid Y) - H(S^n \mid K, Y^n) + H(S^n \mid K, X^n, Y^n)\\
&= nH(X \mid Y) - H(S^n \mid K, Y^n) \tag{54}\\
&\le nH(X \mid Y) - n\bar{r}_c^{SI}(D) + n\epsilon, \tag{55}
\end{align*}
where (54) is true since $S^n$ is a function of $(K, X^n)$. Finally, the last line follows from (40). Combining (55) with (53) into (49), and using the arbitrariness of $\epsilon$, we have shown that $R_k \ge h - H(X \mid W) + \bar{r}_c^{SI}(D)$.

It is seen from Theorem 6 that separation holds in this case. The direct part of the proof is therefore straightforward: first, we apply the scheme that achieves Theorem 5, presented in [14]. Namely, we find at most two encoding functions, $f_1, f_2$, along with the corresponding two decoding functions that use the SI, $g_1, g_2$, which achieve average distortion not greater than $D + \epsilon$. We apply the encoding functions to the source sequence with the appropriate time sharing to create the encoder messages, $S^n$. A Slepian-Wolf code is then applied on $S^n$.
Let the resulting binary representation of the Slepian-Wolf codeword be denoted by $B$. By construction, we have $\frac{1}{n} H(B) \le \bar{r}_c^{SI}(D) + \epsilon$ for $n$ large enough. We now XOR the first $n[h - H(X \mid W) + \bar{r}_c^{SI}(D)]$ bits of $B$ with a one-time pad $K$ of this length, creating the binary sequence $Z$, which is given to the decoder. We have
\begin{align*}
\frac{1}{n} H(K) &= h - H(X \mid W) + \bar{r}_c^{SI}(D), \tag{56}\\
\frac{1}{n} H(Z) &\le \bar{r}_c^{SI}(D) + \epsilon. \tag{57}
\end{align*}
At the decoder, the Slepian-Wolf code is decoded (with failure probability smaller than $\epsilon$ for large enough $n$), and the decoding functions $g_1, g_2$ are applied to $S^n$ and the SI to create the reproduction. We need to show that the equivocation constraint is indeed satisfied by this scheme:
\begin{align*}
H(X^n \mid W^n, Z) &= H(X^n \mid W^n) - I(X^n; Z \mid W^n)\\
&= nH(X \mid W) - H(Z \mid W^n) + H(Z \mid X^n, W^n)\\
&\ge nH(X \mid W) - H(Z) + H(Z \mid B, X^n, W^n)\\
&= nH(X \mid W) - H(Z) + H(B \oplus K \mid B) \tag{58}\\
&= nH(X \mid W) - H(Z) + H(K)\\
&\ge n(h - \epsilon), \tag{59}
\end{align*}
where in (58) we used the fact that, by construction, we have the chain $W^n \to X^n \to B \to Z$. In (59), we used (56) and (57). Note that when the decoding of the Slepian-Wolf code fails, the resulting (arbitrary) reconstruction sequence depends non-causally on the SI, thus breaking the causal reproduction coder structure. However, the probability of such an event can be made negligible as $n$ becomes large.

We investigated the intersection between causal and zero-delay source coding and information-theoretic secrecy. It was shown that simple separation is optimal in the causal setting when we considered encoder and decoder pairs that are causal only with respect to the source. An interesting extension would be to investigate the setting where the use of the key is also restricted to be causal, for example, when the key is streamed along with the source and should not be stored.
This would force the quantizer to encrypt the resulting sequence before the lossless code (see [19], for example). In the zero-delay setting, we considered only perfect secrecy. An interesting and algorithmically challenging research direction is to investigate imperfect secrecy in the zero-delay setting. Moreover, it was mentioned that the extension of our zero-delay results to the case where the same SI is available to Bob and Eve is straightforward. This continues to hold even if Bob and Eve have different versions of the SI, as long as $P(w, y) > 0$ for every $(w, y)$, where $W$ is Eve's SI and $Y$ is Bob's SI. However, when imperfect secrecy is considered, it is not clear how different SI affects the equivocation at Eve. Such a setting is another direction for future research.

Appendix: Proof of Markov chain
We show that any secure encoder satisfying $Z_t = f(K_t, V_t, X_t, Z^{t-1})$ satisfies the Markov chain. To see why this is true, observe that
\[
P(k^{t-1}, x_t, z^t) = P(z^t) P(x_t \mid z^t) P(k^{t-1} \mid x_t, z^t). \tag{60}
\]
Focusing on $P(k^{t-1} \mid x_t, z^t)$, we have
\begin{align*}
P(k^{t-1} \mid x_t, z^t) &= \frac{P(k^{t-1}, x_t, z^{t-1})\, P(z_t \mid k^{t-1}, x_t, z^{t-1})}{P(x_t, z^t)} \tag{61}\\
&= \frac{P(k^{t-1}, z^{t-1})\, P(x_t)\, P(z_t \mid x_t, z^{t-1})}{P(x_t, z^t)} \tag{62}\\
&= \frac{P(k^{t-1}, z^{t-1})\, P(x_t)\, P(z_t \mid z^{t-1})}{P(x_t)\, P(z^t \mid x_t)} \tag{63}\\
&= \frac{P(k^{t-1}, z^{t-1})\, P(z_t \mid z^{t-1})}{P(z^{t-1})\, P(z_t \mid z^{t-1})}\\
&= P(k^{t-1} \mid z^{t-1}), \tag{64}
\end{align*}
where in (61) we note that $x^{t-1}$ can be computed from $(z^{t-1}, k^{t-1})$. In (62), we used the independence of $(k_t, v_t)$ from $k^{t-1}$ and the fact that $z_t$ is a function of $(k_t, v_t, x_t)$; the independence of $x_t$ from $(k^{t-1}, z^{t-1})$ was also used. In (63), we used the secure-encoder assumption (the independence of $X_t$ from $Z^t$). Therefore, we have
\begin{align*}
P(k^{t-1}, x_t, z^t) &= P(x_t, z^t)\, P(k^{t-1} \mid x_t, z^t)\\
&= P(x_t, z^t)\, P(k^{t-1} \mid z^t), \tag{65}
\end{align*}
and we have proved that (6) is satisfied.

References

[1] D. Neuhoff and R. K. Gilbert, “Causal source codes,” IEEE Transactions on Information Theory, vol. 28, no. 5, pp. 701–713, September 1982.
[2] C. E. Shannon, “Communication theory of secrecy systems,”
Bell Systems Technical Journal, vol. 28, no. 4, pp. 656–715, 1949.
[3] A. D. Wyner, “The wire-tap channel,” Bell Systems Technical Journal, vol. 54, no. 8, pp. 1355–1387, 1975.
[4] H. Yamamoto, “Rate-distortion theory for the Shannon cipher system,” IEEE Transactions on Information Theory, vol. 43, no. 3, pp. 827–835, May 1997.
[5] V. Prabhakaran and K. Ramchandran, “On secure distributed source coding,” in Proc. IEEE Information Theory Workshop (ITW ’07), Sept. 2007, pp. 442–447.
[6] D. Slepian and J. Wolf, “Noiseless coding for correlated information sources,” IEEE Transactions on Information Theory, vol. 19, pp. 471–480, 1973.
[7] D. Gunduz, E. Erkip, and H. Poor, “Secure lossless compression with side information,” in Proc. IEEE Information Theory Workshop (ITW ’08), May 2008, pp. 169–173.
[8] J. Villard and P. Piantanida, “Secure multiterminal source coding with side information at the eavesdropper,” CoRR, vol. abs/1105.1658, 2011.
[9] N. Merhav, “Shannon’s secrecy system with informed receivers and its application to systematic coding for wiretapped channels,” IEEE Transactions on Information Theory, vol. 54, no. 6, pp. 2723–2734, June 2008.
[10] C. Schieler and P. Cuff, “Rate-distortion theory for secrecy systems,” CoRR, vol. abs/1305.3905, 2013.
[11] C. Uduwerelle, S.-W. Ho, and T. Chan, “Design of error-free perfect secrecy system by prefix codes and partition codes,” in Proc. 2012 IEEE International Symposium on Information Theory, Cambridge, MA, USA, July 2012, pp. 1593–1597.
[12] Y. Kaspi and N. Merhav, “On real-time and causal secure source coding,” in Proc. 2012 IEEE International Symposium on Information Theory, Cambridge, MA, USA, July 2012, pp. 353–357.
[13] T. Weissman and N. Merhav, “On causal source codes with side information,” IEEE Transactions on Information Theory, vol. 51, no. 11, pp. 4003–4013, November 2005.
[14] Y. Kaspi and N. Merhav, “Zero-delay and causal single-user and multi-user lossy source coding with decoder side information,” submitted to IEEE Transactions on Information Theory, 2013; CoRR, vol. abs/1301.0079, 2013.
[15] N. Alon and A. Orlitsky, “A lower bound on the expected length of one-to-one codes,” IEEE Transactions on Information Theory, vol. 40, no. 5, pp. 1670–1672, 1994.
[16] ——, “Source coding and graph entropies,” IEEE Transactions on Information Theory, vol. 42, no. 5, pp. 1329–1339, September 1996.
[17] R. Durrett, Probability: Theory and Examples, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2010.
[18] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
[19] M. Johnson, P. Ishwar, V. Prabhakaran, D. Schonberg, and K. Ramchandran, “On compressing encrypted data,”