Group Membership Verification with Privacy: Sparse or Dense?
Marzieh Gheisari
Univ Rennes, Inria, CNRS, IRISA, France

Teddy Furon
Univ Rennes, Inria, CNRS, IRISA, France

Laurent Amsaleg
Univ Rennes, Inria, CNRS, IRISA, France
Abstract—Group membership verification checks if a biometric trait corresponds to one member of a group without revealing the identity of that member. Recent contributions provide privacy for group membership protocols through the joint use of two mechanisms: quantizing templates into discrete embeddings, and aggregating several templates into one group representation. However, this scheme has one drawback: the data structure representing the group has a limited size and cannot recognize noisy queries when many templates are aggregated. Moreover, the sparsity of the embeddings seemingly plays a crucial role in the verification performance.

This paper proposes a mathematical model for group membership verification that reveals the impact of sparsity on security, compactness, and verification performance. This model bridges the gap towards a Bloom filter robust to noisy queries. It shows that a dense solution is more competitive unless the queries are almost noiseless.
I. INTRODUCTION
Group membership verification is a procedure checking whether an item or an individual is a member of a group. If membership is positively established, then access to some resources (a building, a file, ...) is granted; otherwise access is refused. This paper focuses on privacy-preserving group membership verification procedures where members must be distinguished from non-members, but where the members of a group should not be distinguished from one another.

To this aim, a few recent contributions have proposed to rely on the aggregation and the embedding of several distinctive templates into a unique and compact high-dimensional feature representing the members of a group [1], [2]. It has been demonstrated that this allows a good assessment of the membership property at test time. It has also been shown that this provides privacy and security. Privacy is enforced because it is impossible to infer from the aggregated feature which original distinctive template matches the one used to probe the system. Security is preserved since nothing meaningful leaks from embedded data [3], [4].

However, [1] and [2] face severe limitations. Basically, it seems impossible to create features representing groups having many members. In this case, the probability to identify true positives vanishes and the false negative rate grows accordingly. Furthermore, the robustness of the matching procedure
fades and becomes unable to absorb even the smallest amount of noise that inherently differentiates the enrolled template of one member from the template captured at query time for this same member. In contrast, features representing only a few group members are robust to noise and cause almost no false negatives. A detailed analysis of [1] and [2] suggests that these limitations originate from the sparsity level of the features representing group members.

This paper investigates the impact of the sparsity level of the high-dimensional features representing group members on the quality of (true positive) matches and on their robustness to noise. It shows that it is possible to trade compactness and sparsity for better security or better verification performance. Sect. II first considers the aggregation of discrete random sequences, and models this compromise with information-theoretic tools. Sect. III applies this viewpoint to binary random sequences and shows that the impact of the noise on the query depends on the sparsity of the sequences. Sect. IV bridges the gap between the templates, i.e. real d-dimensional vectors, and the discrete sequences considered in the previous sections. Sect. V gathers the experimental results for a group membership verification based on faces.

II. DISCRETE SEQUENCES
This section considers the problem of creating a representation Y of a group of n sequences {X_1, ..., X_n}, whose use is to test whether a query sequence Q is a noisy version of one of these n original sequences. This test is done at query time, when the original sequences are no longer available and all that remains is the representation Y.

The sequences are elements of X^m, where X is a finite alphabet of cardinality |X|, say X := {0, 1, ..., |X| − 1}. The sequences follow a statistical model giving a central role to the symbol 0. The symbols of the sequences are independent and identically distributed with

  P(X = s) = 1 − p(|X| − 1) if s = 0,  and  P(X = s) = p otherwise,   (1)

for p ∈ (0, 1/|X|]. Sparsity means that probability p is small; density means that p is close to 1/|X| so that X is uniformly distributed over X.

A. Structure of the group representation

We impose the following conditions on the aggregation a(·) computing the group representation Y = a(X_1, ..., X_n):
• Y is a discrete sequence of the same length, Y ∈ Y^m,
• symbol Y(i) only depends on the symbols {X_1(i), ..., X_n(i)},
• the same aggregation is made index-wise: with abuse of notation, Y(i) = a(X_1(i), ..., X_n(i)), ∀i ∈ [m],
• Y(i) does not depend on any ordering of the set {X_1(i), ..., X_n(i)}.
These requirements are well known in traitor tracing and group testing, where they usually model the collusion attack or the test results over groups. Here, they simplify the analysis, reducing the problem to a single-letter formulation where index i is dropped, involving the symbols {X_1, ..., X_n}, Y and Q.

These conditions motivate a 2-stage construction. The first stage computes the type (a.k.a. histogram or tally) T of the symbols {X_1, ..., X_n}. Denote by T_{|X|,n} the set of possible type values. Its cardinality equals |T_{|X|,n}| = binom(n + |X| − 1, |X| − 1), which might be too big. The second stage applies a surjective function r : T_{|X|,n} → Y, where Y is a much smaller set.
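As an illustration, here is a minimal sketch (ours, not from the paper; all function names are hypothetical) of this two-stage construction for the binary alphabet of Sect. III: the type is computed index-wise, then an optional surjection shrinks it.

```python
import numpy as np

def aggregate(X, r=None):
    """Two-stage group representation: index-wise type, then optional surjection.

    X: (n, m) array of binary symbols (one row per enrolled sequence).
    r: optional dict mapping a type value t in {0..n} to a symbol of Y.
    """
    # Stage 1: the type T(i) counts the '1' symbols among X_1(i), ..., X_n(i).
    # It is index-wise and invariant to any ordering of the sequences.
    T = X.sum(axis=0)
    if r is None:
        return T                      # identity surjection: Y = T
    # Stage 2: surjection r onto a smaller alphabet Y.
    return np.vectorize(r.get)(T)

rng = np.random.default_rng(0)
n, m, p = 16, 1024, 0.5               # dense setup of Sect. III
X = rng.binomial(1, p, size=(n, m))
Y_type = aggregate(X)                                              # Y = T
Y_maj  = aggregate(X, {t: int(t >= n / 2) for t in range(n + 1)})  # majority vote
Y_all1 = aggregate(X, {t: int(t >= 1) for t in range(n + 1)})      # 'All-1'
```

The two surjections shown here (majority vote and 'All-1') are exactly the ones analysed in Sect. III-B.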
B. Noisy query

At enrollment time, the system receives n sequences, aggregates them into the compact representation Y, and then forgets the n sequences. At query time, the system receives a new sequence Q conforming with one of the following hypotheses:
• H1: Q is a noisy version of one of the enrolled sequences. Without loss of generality, Q = X_1 + N.
• H0: Q = X_0 + N, where X_0 shares the same statistical model but is independent of {X_1, ..., X_n}.
We model the source of noise (due to different acquisition conditions) by a discrete communication channel. It is defined by the function W : X × X → [0, 1] with W(q|x) := P(Q = q | X = x). We impose some symmetry w.r.t. the symbol 0: W(s|0) = η0 and W(0|s) = η1, ∀s ∈ X \ {0}.

At query time, the system computes a score S = s(Q, Y) and compares it to a threshold: hypothesis H1 is deemed true if S ≥ τ. This test leads to two probabilities of error:
• P_fp(n, m) is the probability of false positive: P_fp(n, m) := P(S ≥ τ | H0),
• P_fn(n, m) is the probability of false negative: P_fn(n, m) := P(S < τ | H1).
The emphasis on (n, m) is natural. It is expected that: i) the more sequences are aggregated, the less reliable the test is; ii) the longer the sequences are, the more reliable the test is.

C. Figures of merit (C, S, V)

This section presents three information-theoretic quantities (expressed in nats) measuring the performances of the scheme. The first two depend on the statistical model of X (especially p) and the aggregation mechanism a. The last one moreover depends on the channel.
1) Compactness C: The compactness of the group representation is measured by the entropy C := H(Y). It roughly means that the number of typical sequences Y scales exponentially as e^{mH(Y)}, which can theoretically be compressed down to the rate of H(Y) nats per symbol.
2) Security S: We consider an insider aiming at disclosing one of the n enrolled sequences. Observing the group representation Y, the insider's uncertainty is measured by the equivocation S := H(X|Y). This means that the insider does not know which of the e^{mH(X|Y)} typical sequences the enrolled sequences are.
3) Verification V: In our application, the requirement of utmost importance is to have a very small probability of false positive. We are interested in an asymptotic setup where m → +∞. This motivates the use of the false positive error exponent as a figure of merit:

  E_fp(n) := lim_{m → +∞} −(1/m) log P_fp(n, m).   (2)

If E_fp(n) > 0, then P_fp(n, m) exponentially vanishes as m becomes larger. The theory of hypothesis testing shows that E_fp(n) is upper bounded by the mutual information V := I(Y; Q), where Q is a symbol of the query sequence, i.e. a noisy version of X. It means that the necessary length for achieving the requirement P_fp(n, m) < ε is [5]

  m ≥ −log(ε) / V.   (3)

D. Noiseless setup
The bigger V and S, the better the performance in terms of verifiability and security. Yet, they cannot both be big at the same time. The noiseless case, when the channel introduces no error and Q = X, simply illustrates the trade-off:

  V ≤ C,   (4)
  V + S = H(X),   (5)

with H(X) = −p0 log p0 − (1 − p0) log p and p0 := P(X = 0) given by (1). For a given |X|, H(X) is maximised by the dense solution: H(X) ≤ log |X| with equality for p = 1/|X|.
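The trade-off is easy to probe numerically. The following sketch (ours; H_X is a hypothetical helper name) evaluates H(X) under model (1) and checks that the dense choice p = 1/|X| maximizes it:

```python
import numpy as np

def H_X(p, card):
    """Entropy (nats) of the source model (1): P(X=0) = 1 - p*(card-1)."""
    p0 = 1 - p * (card - 1)
    return -p0 * np.log(p0) - (1 - p0) * np.log(p)

card = 4
ps = np.linspace(1e-4, 1 / card, 1000)
H = np.array([H_X(p, card) for p in ps])
assert H.argmax() == len(ps) - 1          # maximum reached at the dense p = 1/|X|
print(H_X(1 / card, card), np.log(card))  # both equal log|X|
```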
III. BINARY ALPHABET

This section explores the binary case where X = {0, 1}. We first set the surjection to the identity function, so that Y = T. Then, the impact of the surjection is investigated.

A. Working with types
In the binary case, there are n + 1 type values. They can be uniquely labelled by the number of symbols '1' in {X_1, ..., X_n}, i.e. T = Σ_{i=1}^{n} X_i ∼ B(n, p).
1) Verification:
In the noiseless case, after some rewriting:

  V = h(p) − Σ_{t=0}^{n} P(T = t) h(t/n),   (6)

with h(p) := −p log(p) − (1 − p) log(1 − p), the entropy of a Bernoulli r.v. B(p). If p = 1/2 and n is large:

  V = 1/(2n) + o(1/n).   (7)

This is not the maximum of this quantity. For large n, the best option is to set p = α/n:

  V = β/n + o(1/n),   (8)

with α ≈ 1.3 and β ≈ 0.57. This was proven in the totally different application of traitor tracing [6, Prop. 3.8].

This section outlines two setups: the dense setup where p = 1/2, and the sparse setup where p goes to 0 when more sequences are packed in the group representation. Both setups share the asymptotic property that V ≈ κ/n for large n. According to (3), we can pack a big number n of sequences into one group representation provided that their length m scales proportionally to n.
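Both regimes can be checked numerically from (6). A short sketch (ours; it assumes scipy is available, and the helper names are our own) reproduces (7) and locates the sparse optimum of (8):

```python
import numpy as np
from scipy.stats import binom

def h(p):
    """Binary entropy in nats."""
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def V_types(n, p):
    """Noiseless V = I(T;Q) of (6); the t = 0 and t = n terms vanish since h(0)=h(1)=0."""
    t = np.arange(1, n)
    return h(p) - binom.pmf(t, n, p) @ h(t / n)

n = 1000
print(n * V_types(n, 0.5))                     # ~0.5, as in (7)
alphas = np.linspace(0.5, 3.0, 251)
a_star = max(alphas, key=lambda a: V_types(n, a / n))
print(a_star, n * V_types(n, a_star / n))      # sparse optimum of (8)
```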
2) Compactness:
The figure of merit for compactness of types is just C = H(T), where T follows a binomial distribution T ∼ B(n, p). In the dense setup p = 1/2, the binomial distribution is approximated by a Gaussian distribution N(n/2, n/4), providing:

  C = (1/2) log(πen/2) + O(1/n).   (9)

In the sparse setup p = α/n, the binomial distribution is approximated by a Poisson distribution P(α) [7]:

  C ≈ α(1 − log(α)) + e^{−α} Σ_{j=0}^{+∞} α^j log(j!) / j!.   (10)

This shows that the types are not compact in the dense setup, where C grows with n; C remains approximately constant in the sparse setup.
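A quick numerical comparison (our sketch) of the exact entropy of T against the approximations (9) and (10):

```python
import numpy as np
from scipy.stats import binom

def H_binom(n, p):
    """Exact entropy (nats) of T ~ B(n, p)."""
    pmf = binom.pmf(np.arange(n + 1), n, p)
    pmf = pmf[pmf > 0]
    return -(pmf * np.log(pmf)).sum()

alpha = 1.3                                      # sparsity level p = alpha/n
for n in (16, 64, 256, 1024):
    dense, sparse = H_binom(n, 0.5), H_binom(n, alpha / n)
    print(n, dense, 0.5 * np.log(np.pi * np.e * n / 2), sparse)
    # dense C grows like (9); sparse C stays near the Poisson entropy (10)
```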
3) Security:
Thanks to (5), we only need to calculate H(X) = h(p). In the dense setup, H(X) = log(2) and S converges to H(X) as n increases. Merging into a single representation protects an individual sequence. If sparse,

  H(X) = (α/n)(1 − log(α/n)) + o(1/n).   (11)

Therefore, S converges to zero as n increases, contrary to the dense setup. It might be more insightful to see that the ratio of uncertainties before and after observing T, i.e. H(X)/H(X|T), converges to 1 in both cases. Merging does provide some security, but sparsity is more detrimental.
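The following sketch (ours; it reuses the same helpers as above) computes S via (5) and the ratio of uncertainties for both setups:

```python
import numpy as np
from scipy.stats import binom

def h(p):
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def V_types(n, p):                        # noiseless V of (6)
    t = np.arange(1, n)
    return h(p) - binom.pmf(t, n, p) @ h(t / n)

for n in (10, 100, 1000):
    for p in (0.5, 1.3 / n):              # dense, then sparse
        S = h(p) - V_types(n, p)          # equivocation via (5)
        print(n, p, S, S / h(p))          # S/H(X) tends to 1 in both setups
```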
B. Adding a surjection

The motivation of the surjection onto a smaller set Y is to bound C as C ≤ log |Y|, ∀n. The Markov chain Q → X → T → Y imposes that V ≤ I(T; Q). The surjection thus provokes a loss in verification, as depicted in Fig. 1. App. A shows that for |Y| = 2, this loss is minimized for:

  r(t) = 0 if t < t_p, and 1 otherwise,   (12)

where t_p is a threshold depending on p.

Fig. 1. The trade-off (S, V, C) for X = {0, 1}, n = 16: Y = T (blue), Y = r(T) for 'All-1' (red) and majority vote (green). The dashed plot represents the projection onto C = 0. Triangles and stars summarize results (7) to (14).

In the dense setup, t_p = n/2 and the surjection corresponds to a majority vote collusion in traitor tracing (a threshold model in group testing). Hence, by [6, Prop. 3.4]:

  V = 1/(πn) + o(1/n).   (13)

In the sparse setup, t_p = 1, which corresponds to an 'All-1' attack in traitor tracing (the perfect model in group testing). Then the best option is to set p = log(2)/n and [6, Prop. 3.3]:

  V = (log 2)²/n + o(1/n).   (14)

From (3), the necessary length is m ≥ −n log(ε)/(log 2)². The main property V ≈ κ/n still holds, but the surjection lowers κ from 0.5 to 1/π ≈ 0.32 (dense), and from ≈0.57 to (log 2)² ≈ 0.48 (sparse). The sparse setup is still the best option w.r.t. V.
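Equations (13) and (14) can be verified by computing I(Y;Q) exactly in the noiseless case. A sketch (ours; the joint-table construction is our own way of expressing the model of Sect. II):

```python
import numpy as np
from scipy.stats import binom

def V_surjection(n, p, r):
    """Noiseless V = I(Y;Q) with Q = X_1 and Y = r(T)."""
    rest = binom.pmf(np.arange(n), n - 1, p)     # T' = T - X_1 ~ B(n-1, p)
    joint_xt = np.zeros((2, n + 1))              # joint P(X_1 = x, T = t)
    joint_xt[0, :n] = (1 - p) * rest             # x = 0: T = T'
    joint_xt[1, 1:] = p * rest                   # x = 1: T = T' + 1
    labels = np.array([r(t) for t in range(n + 1)])
    joint_xy = np.stack([joint_xt[:, labels == y].sum(axis=1)
                         for y in np.unique(labels)], axis=1)
    px = joint_xy.sum(axis=1, keepdims=True)
    py = joint_xy.sum(axis=0, keepdims=True)
    mask = joint_xy > 0
    return (joint_xy[mask] * np.log(joint_xy[mask] / (px @ py)[mask])).sum()

n = 1000
print(n * V_surjection(n, 0.5, lambda t: int(t >= n / 2)))        # ~1/pi, cf. (13)
print(n * V_surjection(n, np.log(2) / n, lambda t: int(t >= 1)))  # ~(log 2)^2, cf. (14)
```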
C. Relationship with the Bloom filter

A Bloom filter is a well-known data structure Y ∈ {0, 1}^m designed for set membership, embedding the items to be enrolled into Y thanks to k hash functions. Its probability of false negative is exactly 0, whereas its probability of false positive is not null. The number of hash functions minimizing P_fp(n, m) is k = ⌊log(2) m/n⌋. Then, the necessary length to meet a required false positive level ε is m ≥ −n log(ε)/(log 2)². These numbers show the connection with our scheme (14).

At the enrollment phase, the hash functions indeed associate to the j-th item a binary sequence X_j indicating which bits of Y have to be set. This sequence is indeed sparse, with k/m ≈ log(2)/n. The necessary length is the same. Indeed, the enrollment phase of a Bloom filter is nothing more than the 'All-1' surjection.

The only difference resides in the statistical model. There are at most k symbols '1' in sequence X_j whereas, in our model, their number follows a binomial distribution B(m, p). Yet, asymptotically as m → ∞, by some concentration phenomenon, the two models get similar. This explains why we end up with similar optimal parameters. Yet, the Bloom filter only works when the query object is exactly one enrolled item, whereas the next section shows that our scheme is robust to noise.
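For concreteness, a textbook Bloom filter implementing these formulas (our sketch; the salted-hash construction is one common way to simulate k hash functions, not something prescribed by the paper):

```python
import math, hashlib

class BloomFilter:
    """m = ceil(-n log(eps) / log(2)^2) bits, k = floor(log(2) m / n) hashes."""
    def __init__(self, n, eps):
        self.m = math.ceil(-n * math.log(eps) / math.log(2) ** 2)
        self.k = max(1, int(math.log(2) * self.m / n))
        self.bits = bytearray(self.m)

    def _indices(self, item):
        # k hash functions simulated by salting a single hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):              # enrollment = the 'All-1' aggregation
        for i in self._indices(item):
            self.bits[i] = 1

    def __contains__(self, item):     # exact queries only: no noise robustness
        return all(self.bits[i] for i in self._indices(item))

bf = BloomFilter(n=1000, eps=0.01)
for j in range(1000):
    bf.add(f"user{j}")
print("user42" in bf, "impostor" in bf)   # True, then False w.h.p.
```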
IV. REAL VECTORS

This section deals with real vectors: n vectors to be enrolled {x_1, ..., x_n} ⊂ R^d, and the query vector q ∈ R^d. All have unit norm. An embedding mechanism E : R^d → X^m makes the connection with the previous section. As in [8], this study models the embedding as a probabilistic function.

A. Binary embedding
For instance, for X = {0, 1}, a popular embedding is:

  X(i) = [x^T U_i > λ_x], ∀i ∈ [m],   (15)

where the U_i are i.i.d. ∼ N(0_d, I_d). This in turn gives i.i.d. Bernoulli symbols {X(i)} with p = 1 − Φ(λ_x) if ‖x‖ = 1.

At query time, the embedding mechanism uses the same random vectors but a different threshold:

  Q(i) = [q^T U_i > λ_q], ∀i ∈ [m].   (16)

Under H1, suppose that q^T x = c < 1. This correlation defines the channel X → Q with the error rates:

  η0 = P(q^T U > λ_q | x^T U ≤ λ_x),   (17)
  η1 = P(q^T U ≤ λ_q | x^T U > λ_x).   (18)

The error rate η0 has the following expression (and similarly for η1):

  η0 = 1 − 1/((1 − p)√(2π)) ∫_{−∞}^{λ_x} Φ((λ_q − cx)/√(1 − c²)) e^{−x²/2} dx.   (19)
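The induced channel is easy to estimate by Monte Carlo. A sketch (ours; the thresholds and the correlation are arbitrary picks for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, c = 256, 20_000, 0.8
lam_x, lam_q = 1.0, 1.2                  # illustrative thresholds

x = rng.normal(size=d); x /= np.linalg.norm(x)
z = rng.normal(size=d); z -= (z @ x) * x; z /= np.linalg.norm(z)
q = c * x + np.sqrt(1 - c ** 2) * z      # unit query at correlation c with x

U = rng.normal(size=(m, d))              # shared random projections of (15)-(16)
X = U @ x > lam_x
Q = U @ q > lam_q

print(X.mean())                          # p, close to 1 - Phi(lam_x)
print(Q[~X].mean())                      # eta0 = P(Q=1 | X=0), cf. (17) and (19)
print((~Q)[X].mean())                    # eta1 = P(Q=0 | X=1), cf. (18)
```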
B. Induced channel

For this embedding, the parameters (λ_x, λ_q, c, d) of the vectors define the setup (p, η0, η1) of the sequences. It is a priori difficult to find the best tuning (λ_x, λ_q). For a fixed λ_x, η0 decreases with λ_q while η1 increases. App. B reveals that V is sensitive to η0, especially with the 'All-1' surjection of the sparse solution. Fig. 2 shows indeed that the dense solution (λ_x, λ_q) = (0, 0) is more robust, unless c is very close to 1. Here, we enforce a surjection (identity, All-1, or majority vote) and make a grid search to find the optimum (λ_x, λ_q) for a given c. It happens that these parameters are better set to 0, i.e. the dense solution, for the identity and the majority vote. As for the 'All-1' surjection, we observe that λ_x is such that p ≈ 1/n and λ_q is slightly bigger than λ_x to lower η0. Yet, this sparse solution is not as good as the dense solution unless c is close to 1, i.e. the query vector is very close to the enrolled vector.

This observation holds only for the embedding function (15). Hashing functions less prone to error η0 may exist.

V. EXPERIMENTAL WORK
We evaluate our scheme on face recognition. Face images come from the LFW [9], CFP [10] and FEI [11] databases. For each dataset, N individuals are enrolled into random groups. There is the same number N_q of positive and negative (impostor) queries.
Fig. 2. V as a function of the correlation c, for d = 256 and n = 15, with the three surjections: types, 'All-1', and majority vote.

Labeled Faces in the Wild:
These are pictures of celebrities under all sorts of viewpoints and in an uncontrolled environment. We use pre-aligned LFW images. The enrollment set consists of the N = 1680 individuals with at least two images in the LFW database. One random template of each individual is enrolled in the system, playing the role of x_i. Some other N_q = 263 individuals were randomly picked in the database to play the role of impostors.

Celebrities in Frontal-Profile:
These are frontal and profile views of celebrities taken in an uncontrolled environment. We only use the frontal images, with N = 400 individuals enrolled in the system. The impostor set is a random selection of N_q = 100 other individuals.

Faculdade de Engenharia Industrial:
The FEI database contains images in frontal view taken in a controlled environment. We use pre-aligned images. Each subject has two frontal images (one with a neutral expression and the other with a smiling facial expression). The database is created by randomly sampling N = 150 individuals to be enrolled, and N_q = 50 impostors.

A. Experimental Setup
Face descriptors are obtained from a pre-trained network based on the VGG-Face architecture, followed by PCA [12]. FEI corresponds to the scenario of employees entering a building with face recognition, whereas CFP is more difficult, and LFW even more difficult. To equalize the difficulty, we apply a dimension reduction (Probabilistic Principal Component Analysis [13]) to d = 128 for FEI and to dataset-specific dimensions for CFP and LFW. The parameters of PPCA are learned on a different set of images, not on the enrolled templates and queries. The vectors are also L2-normalized. With such post-processing, the average correlation between positive pairs equals 0.83 (FEI), 0.78 (CFP), and 0.68 (LFW), with a small standard deviation. Despite the dimension reduction, the hardest dataset is LFW and the easiest FEI.

In one simulation run, the enrollment phase makes random groups with the same number n of members. A user claims she/he belongs to group g. This claim is true under hypothesis H1 and false under hypothesis H0 (i.e. the user is an impostor). Her/his template is quantized to the sequence Q, and (Q, g) is sent to the system, which compares Q to the group
representation Y_g. This is done for all impostors and all queries of enrolled people. One Monte-Carlo simulation is composed of several such runs. The figure of merit is P_fn at a fixed level of P_fp.

Fig. 3. Verification performance P_fn (at fixed P_fp) vs. group size n for the baselines (see Sect. V-B) and our scheme.

B. Exp. 1
Our scheme is compared to the following baselines:
• EoA-SP and AoE-SP [1] (signal processing approach),
• EoA-ML and AoE-ML [2] (machine learning approach).
The drawback of these baselines is that the length m of the data structure is bounded. Here, it is set to its maximum value, i.e. m = d, the dimension of the templates.

Our scheme allows more freedom. Setting m = 8 × d produces a much bigger representation. It is not surprising that our scheme is better than the baselines. Fig. 3 validates our motivation to get rid of the drawback of the baselines with limited m, to achieve better verification performance. These results are obtained with the dense solution. Indeed, despite all our efforts, we could not achieve better results with the sparse solution. This confirms the lesson learnt from Fig. 2: the dense solution outperforms the sparse solution when the average correlation between positive pairs is not very close to 1, as is the case here (0.68 to 0.83).

The improvement is also better as the size of the groups increases. We explain this by the use of the types, i.e. Y = T. Equation (9) shows that C increases with n for the dense solution, compensating for aggregating more templates.

C. Exp. 2
There are two ways of reducing the size of the group representation. The first is to decrease m; the second is to lower C thanks to a surjection. Sect. III-B presented optimal surjections from T_{2,n} to Y = {0, 1}. We found experimentally good surjections onto sets Y of small cardinality. This is done according to the following heuristic, sketched below. Starting from T_{2,n}, we iteratively decrease the size of Y by one. This amounts to merging two symbols of Y. By brute force, we analyse all the pairs of symbols, measuring the loss in V induced by their merging. By merging the best pair, we decrease the number of symbols in Y by one. This process is iterated until the targeted size of Y is achieved. This heuristic is not optimal, but it is tractable. Fig. 4 compares these two means. Employing a coarser surjection is slightly better in terms of verification performance.
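A sketch of this greedy merging heuristic (ours; the joint table comes from the type construction of Sect. III, and all names are hypothetical):

```python
import numpy as np
from itertools import combinations
from scipy.stats import binom

def mutual_info(j):
    """Noiseless V from a joint table j[x, y], in nats."""
    px = j.sum(axis=1, keepdims=True)
    py = j.sum(axis=0, keepdims=True)
    mask = j > 0
    return (j[mask] * np.log(j[mask] / (px @ py)[mask])).sum()

def merge(j, a, b):
    """Fuse two symbols of Y: add columns a and b of the joint table."""
    keep = [k for k in range(j.shape[1]) if k not in (a, b)]
    return np.column_stack([j[:, keep], j[:, a] + j[:, b]])

def greedy_surjection(j, target):
    """Iteratively fuse the pair of Y-symbols whose merging costs the least V."""
    while j.shape[1] > target:
        a, b = max(combinations(range(j.shape[1]), 2),
                   key=lambda ab: mutual_info(merge(j, *ab)))
        j = merge(j, a, b)
    return j

# demo: joint table P(X_1 = x, T = t) of Sect. III, merged down to |Y| = 4
n, p = 16, 0.5
rest = binom.pmf(np.arange(n), n - 1, p)
joint = np.zeros((2, n + 1))
joint[0, :n], joint[1, 1:] = (1 - p) * rest, p * rest
print(mutual_info(joint), mutual_info(greedy_surjection(joint, 4)))
```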
D. Unexpected results

We have argued that FEI < CFP < LFW in terms of difficulty, due to the opposite ordering of the datasets' typical correlation c between positive pairs. Eq. (19) shows that a lower c produces a higher η0 (and η1), whence a lower V. In Fig. 3, the experimental results contradict this intuition.

This may be explained by the Signal-to-Noise Ratio at the template level. We define it as c²/v, where c is the average correlation for positive pairs and v is the variance of this correlation for negative pairs. If a negative query is uniformly distributed over the hypersphere, then its correlation with an enrolled template is approximately distributed as a centered Gaussian distribution with variance v = 1/d.

Yet, d has no impact on p, η0, and η1. We suppose that its impact is tangible on the entropy of the template vectors. Sect. II assumes that the enrolled sequences are statistically independent. This assumption is not granted with the embedding of Sect. IV. Yet, a bigger d favors the independence (or at least the decorrelation) between real template vectors.

VI. CONCLUSION
Our theoretical study justifies that the dense setup is more interesting in terms of verification performance V and security level S, unless we are operating in the high-SNR regime where the positive queries are very well correlated with the enrolled templates. This statement holds for any embedding, yet some are certainly more suited than others depending on d, c, and the geometrical relationship among positive pairs.

ACKNOWLEDGMENT
This work is supported by the CHIST-ERA project ID_IOT (20CH21 167534).
APPENDIX
Let us first explain how V is computed. Denote P_i(q, y) := P(Q = q, Y = y | H_i) and the channel W(q|x) := P(Q = q | X = x), ∀y ∈ Y, q ∈ X and i ∈ {0, 1}. Then,

  V = Σ_{q,y} P_1(q, y) log [P_1(q, y) / P_0(q, y)],   (20)

with P_0(q, y) = P(Q = q) P(Y = y) and

  P_1(q, y) = Σ_{x ∈ X} P(Y = y, X = x) W(q|x).   (21)

A. Surjection to Y = {0, 1}

We assume here the noiseless setup, allowing us to write P(Y = y, X = x) as P(x, y). Inspired by traitor tracing, we consider a probabilistic surjection where P(r(t) = 1) = θ_t. The vector θ ∈ [0, 1]^{n+1} parametrizes the surjection. Denote by ∇_θ V(t) the derivative w.r.t. θ_t. After some lengthy calculus:

  ∇_θ V(t) = n⁻¹ K_1(p, θ) (t − n K_2(p, θ)),   (22)

with K_1(p, θ) = P(T = t) Δ, K_2(p, θ) = [h′(P(0, 1)) − h′(P(Y = 1))] / Δ, and Δ = h′(P(0, 1)) − h′(P(1, 1)).

It is not possible to cancel the gradient ∇_θ V. The optimal θ thus lies on the boundary of the hypercube [0, 1]^{n+1}. This makes the surjection deterministic. Assuming P(Y = 1 | X = 0) < P(Y = 1 | X = 1), then 0 < K_1(p, θ) and 0 < K_2(p, θ) ≤ 1 because h′(·) is strictly decreasing. This makes ∇_θ V(0) < 0, and θ_0 must be set to the lowest possible value, i.e. θ_0 = 0, to increase V at most. This is indeed the case for any θ_t with t < n K_2(p, θ). In the same way, θ_n = 1, and so is θ_t if t > n K_2(p, θ). Yet, for a given θ, n K_2(p, θ) ranges from 0 to n as p increases from 0 to 1. Therefore, θ = (0, ..., 0, 1, ..., 1) is optimal only over an interval of p.
Fig. 4. Verification performance P_fn (at fixed P_fp) vs. mC, for n = 16, on CFP and LFW. This quantity is reduced by decreasing m (dashed lines) or by decreasing C thanks to a surjection (solid lines).

For n odd and p = 1/2, the choice θ_t = 0 if t < (n + 1)/2 and 1 otherwise (i.e. the majority vote) is optimal because K_2(1/2, θ) = 1/2 (P(Y = 1) = 1/2 and P(0, 1) = 1 − P(1, 1)). The 'All-1' surjection θ = (0, 1, ..., 1) makes P(1, 1) = 1, so that ∇_θ V(t) = +∞ if t > 0 and < 0 for t = 0.

B. Impact of the channel
Suppose that η is a parameter of the channel W(·|·). Then

  ∂V/∂η = Σ_{q,y} [∂P_1(q, y)/∂η] log [P_1(q, y) / P_0(q, y)],   (23)

because Σ_{q,y} ∂P_1(q, y)/∂η = ∂[Σ_{q,y} P_1(q, y)]/∂η = 0 and Σ_{q,y} [P_1(q, y)/P_0(q, y)] ∂P_0(q, y)/∂η = Σ_q ∂P(Q = q)/∂η = 0.

Suppose now that η = η0 := W(q|0), ∀q ∈ X \ {0}. Then,

  ∂P_1(q, y)/∂η = P(X = 0, Y = y), ∀q ∈ X \ {0}.   (24)

Taking (23) around the noiseless channel, where η = 0 and P(X = 0, Y = y) = P(0, y) because Q = X:

  ∂V/∂η |_{η=0} = Σ_{y, x≠0} P(0, y) log [P_1(x, y) / P_0(x, y)] + ...   (25)

We only express the first terms to outline that if P_1(x, y) = 0 while P(0, y), and hence P_0(x, y), are not null, then this derivative goes to −∞. A small deviation from the noiseless case with η ≠ 0 has a major detrimental impact on V. That situation happens for sure when working with types, i.e. Y = T: consider the null type t_0 obtained when X_1 = ... = X_n = 0: P(0, t_0) > 0 while P(x, t_0) = 0, ∀x ≠ 0.

One can prove that the surjection can mitigate this effect if ∃t ≠ t_0 : r(t) = r(t_0) and P(x, t) > 0 for some x ≠ 0. This happens with the majority vote of the dense setup but, unfortunately, not with the 'All-1' surjection of the sparse setup.
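This sensitivity can also be observed numerically. The sketch below (ours) feeds the noiseless joint distribution through a binary symmetric channel with parameter η (a simplification: the model above allows η0 ≠ η1) and compares the majority vote against the 'All-1' surjection:

```python
import numpy as np
from scipy.stats import binom

def V_noisy(n, p, r, eta):
    """V = I(Y;Q) when the query passes through a symmetric channel BSC(eta)."""
    rest = binom.pmf(np.arange(n), n - 1, p)
    joint_xt = np.zeros((2, n + 1))              # joint P(X_1 = x, T = t)
    joint_xt[0, :n] = (1 - p) * rest
    joint_xt[1, 1:] = p * rest
    labels = np.array([r(t) for t in range(n + 1)])
    joint_xy = np.stack([joint_xt[:, labels == y].sum(axis=1)
                         for y in np.unique(labels)], axis=1)
    W = np.array([[1 - eta, eta], [eta, 1 - eta]])   # W[q, x] = P(Q=q | X=x)
    joint_qy = W @ joint_xy                           # P(Q = q, Y = y)
    pq = joint_qy.sum(axis=1, keepdims=True)
    py = joint_qy.sum(axis=0, keepdims=True)
    mask = joint_qy > 0
    return (joint_qy[mask] * np.log(joint_qy[mask] / (pq @ py)[mask])).sum()

n = 64
for eta in (0.0, 0.01, 0.05):
    maj  = V_noisy(n, 0.5, lambda t: int(t >= n / 2), eta)
    all1 = V_noisy(n, np.log(2) / n, lambda t: int(t >= 1), eta)
    print(eta, n * maj, n * all1)   # the 'All-1' V collapses much faster
```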
REFERENCES

[1] M. Gheisari, T. Furon, L. Amsaleg, B. Razeghi, and S. Voloshynovskiy, "Aggregation and embedding for group membership verification," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.
[2] M. Gheisari, T. Furon, and L. Amsaleg, "Privacy preserving group membership verification and identification," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
[3] B. Razeghi, S. Voloshynovskiy, D. Kostadinov, and O. Taran, "Privacy preserving identification using sparse approximation with ambiguization," in Proceedings of the IEEE International Workshop on Information Forensics and Security, 2017.
[4] B. Razeghi and S. Voloshynovskiy, "Privacy-preserving outsourced media search using secure sparse ternary codes," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.
[5] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell System Tech. J., vol. 38, pp. 611–656, 1959.
[6] T. Laarhoven, "Search problems in cryptography: from fingerprinting to lattice sieving," Ph.D. dissertation, Eindhoven University of Technology, 2015.
[7] J. Boersma, "Solution to problem 87-6*: The entropy of a Poisson distribution," SIAM Review, vol. 30, no. 2, pp. 314–317, 1988.
[8] A. Andoni, P. Indyk, T. Laarhoven, I. P. Razenshteyn, and L. Schmidt, "Practical and optimal LSH for angular distance," NIPS, 2015. [Online]. Available: http://arxiv.org/abs/1509.02897
[9] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," in Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
[10] S. Sengupta, J.-C. Chen, C. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs, "Frontal to profile face verification in the wild," in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2016.
[11] C. E. Thomaz and G. A. Giraldi, "A new ranking method for principal components analysis and its application to face image analysis," Image and Vision Computing, vol. 28, no. 6, pp. 902–913, 2010.
[12] O. M. Parkhi, A. Vedaldi, A. Zisserman et al., "Deep face recognition," in Proceedings of the British Machine Vision Conference, 2015.
[13] M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," Journal of the Royal Statistical Society: Series B, vol. 61, no. 3, pp. 611–622, 1999.