Nearest neighbor decoding for Tardos fingerprinting codes
Thijs Laarhoven, [email protected], Eindhoven University of Technology, Eindhoven, The Netherlands
ABSTRACT
Over the past decade, various improvements have been made to Tardos' collusion-resistant fingerprinting scheme [Tardos, STOC 2003], ultimately resulting in a good understanding of what is the minimum code length required to achieve collusion-resistance. In contrast, decreasing the cost of the actual decoding algorithm for identifying the potential colluders has received less attention, even though previous results have shown that using joint decoding strategies, deemed too expensive for decoding, may lead to better code lengths. Moreover, in dynamic settings a fast decoder may be required to provide answers in real-time, further raising the question whether the decoding costs of score-based fingerprinting schemes can be decreased with a smarter decoding algorithm. In this paper we show how to model the decoding step of score-based fingerprinting as a nearest neighbor search problem, and how this relation allows us to apply techniques from the field of (approximate) nearest neighbor searching to obtain decoding times which are sublinear in the total number of users. As this does not affect the encoding and embedding steps, this decoding mechanism can easily be deployed within existing fingerprinting schemes, and this may bring a truly efficient joint decoder closer to reality. Besides the application to fingerprinting, similar techniques can be used to decrease the decoding costs of group testing methods, which may be of independent interest.
CCS CONCEPTS
• Security and privacy → DRM; • Theory of computation → Nearest neighbor algorithms; Sorting and searching.

KEYWORDS
collusion-resistance, fingerprinting codes, watermarking, nearest neighbor searching, group testing
ACM Reference Format:
Thijs Laarhoven. 2019. Nearest neighbor decoding for Tardos fingerprinting codes. In
Proceedings of ACM Conference (Conference’17).
ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Conference’17, July 2017, Washington, DC, USA
© 2019 Association for Computing Machinery. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of ACM Conference (Conference’17), https://doi.org/10.1145/nnnnnnn.nnnnnnn.
1 INTRODUCTION

Fingerprinting techniques for digital content provide a way for copyright holders to uniquely mark each copy of their content, to prevent unauthorized redistribution of this content: if a digital "pirate" nevertheless decides to publicly share his (fingerprinted) content with others, the owner of the content can obtain this copy, extract the fingerprint, link it to the responsible user, and take appropriate steps. Digital pirates may try to prevent being caught by collaborating, and forming a mixed copy of the content from their individual copies, thus mixing up the embedded fingerprints as well. To guarantee that collusions of pirates cannot get away with this, collusion-resistant fingerprinting schemes are needed.

Mathematically speaking, collusion-resistant fingerprinting can be modeled as follows. First, the content owner generates code words x_j ∈ {0,1}^ℓ for j = 1, ..., n, corresponding to fingerprints for the n users, where each of the ℓ columns defines one segment of the content. Then, a collusion C of c colluders applies a mixing strategy to their code words {x_j}_{j∈C} to form a new pirate copy y. Here the critical condition we impose on this mixing procedure is the marking assumption, stating that if x_{j,i} = b for all j ∈ C, then also y_i = b, for b ∈ {0,1}. Finally, the owner of the content obtains y, applies a decoding algorithm to y and all the code words {x_j}_{j=1}^n, and outputs a subset of user indices j. This method is successful if, with high probability over the randomness in the code generation and mixing strategy, the decoding algorithm outputs (a subset of) the colluders, without incriminating any innocent, legitimate users.

In the late 1990s, Boneh–Shaw [9] were the first to design a somewhat practical, combinatorial solution for collusion-resistant fingerprinting. Their construction based on error-correcting codes achieved high success probabilities with a code length of ℓ = O(c^4 log n), i.e. scaling logarithmically in the often large number of users n, and quartically in the number of colluders c. The milestone work of Tardos [40] later improved upon this with a code length ℓ = O(c^2 log n), and he proved that this quadratic scaling in c is optimal. Later work focused on bringing down the leading constants in Tardos' scheme [8, 15, 16, 19, 30, 32, 38, 39], leading to an optimal code length of ℓ ∝ (π^2/2) c^2 ln n for the original (symmetric) Tardos score function [24, 25, 37], and an optimal overall code length of ℓ ∝ 2 c^2 ln n when using further improved decoders [12, 15, 21, 32, 33].

The main focus of most literature on collusion-resistant fingerprinting has been on decreasing ℓ – the shorter the fingerprints, the faster the colluders can be traced. Other aspects of these fingerprinting schemes, however, have received considerably less attention in the literature, and in particular the decoding procedure is often neglected. Indeed, with n users and a code length of ℓ = O(c^2 log n), the decoding time is commonly O(ℓ · n) = O(c^2 n log n) for linear decoders, and up to O(c^2 n^k log n) for joint decoders, attempting to decode to groups of k ≤ c colluders simultaneously. In particular, past work has shown that joint decoders achieve superior performance to simple decoders [2, 7, 11, 17, 18, 21, 23, 30], but are often considered infeasible due to their high decoding complexity. Moreover, in dynamic settings [22, 26] where decisions about the accusation of users need to be made swiftly, an efficient decoding method is even more critical. Techniques that can speed up the decoding procedure may therefore be useful for further improving these schemes in practice.

In this work, we study how the decoding method in the score-based fingerprinting framework can be improved, leading to faster decoding times.
In particular, we show that we can typically bring down the decoding costs of simple decoders from O(ℓn) to O(ℓn^ρ), where ρ < 1 depends on c and the instantiation of the score-based framework. This usually comes at a higher space requirement for indexing the code words in a query-efficient data structure, although arbitrary trade-offs between the space and query time can be obtained by tweaking the parameters.

To obtain these improved results, we show that we can model the decoding procedure of Tardos-like fingerprinting as a high-dimensional nearest neighbor search problem. This field of research studies methods of storing data in more refined data structures, such that highly similar vectors to any given query point can be found faster than with a linear search through the data. Applying practical, state-of-the-art techniques from this area, such as the locality-sensitive hashing mechanisms of [3, 10] and the more recent locality-sensitive filtering of [4, 6], this allows us to obtain faster, sublinear decoding times, both in theory and in practice. We give a recipe how to apply these techniques to any score-based fingerprinting framework, and give an explicit, detailed analysis of what happens when we use these techniques in combination with the symmetric score function of Škorić–Katzenbeisser–Celik [37].

To illustrate these results for the symmetric score function, let us provide some explicit decoding costs for the case of c = 2. We obtain a decoding time of O(ℓ n^{3/4}) without using additional memory; a decoding cost of O(ℓ n^{1/3}) when using O(ℓ n^{4/3}) memory; and a subpolynomial decoding cost ℓ n^{o(1)} when using a larger polynomial amount of storage. For larger c the improvement becomes smaller, and in the limit of large c nearest neighbor techniques offer only minor improvements over existing decoding schemes – the main benefit lies in defending against small or moderate collusion sizes.

The remainder of this paper is organized as follows. In Section 2 we first introduce notation, we describe the score-based fingerprinting framework, and we cover basics on nearest neighbor searching. Section 3 describes how to apply nearest neighbor techniques to fingerprinting, and analyzes the theoretical impact of this application. Section 4 covers basic experiments, to illustrate the potential effects of these techniques in practice, and Section 5 finally discusses other aspects of our proposed improvement.
Table 1: Notation used throughout the paper. The first five rows indicate how concepts in fingerprinting translate to concepts in nearest neighbor searching.
FP terminology              NNS terminology
n     Number of users     ~ n          Number of points
ℓ     Code length         ~ d          Dimension of data
x_j   Code word           ~ v_j        Data point
y     Pirate copy         ~ q          Query point
s_j   User score          ~ ⟨v_j, q⟩   Dot product
c     Colluders             α          Approximation factor
p     Probability vector    ρ_q        Query time exponent
g     Score function        ρ_s        Space exponent
2 PRELIMINARIES

We first recall the score-based fingerprinting framework, introduced by Tardos [40], and describe the model considered in this paper. The fingerprinting game consists of three phases: (1) encoding, generating the fingerprints and embedding them in the content; (2) the collusion attack, constructing the mixed fingerprint; and (3) decoding, mapping the mixed fingerprint to a set of accused users.
Encoding. First, the copyright holder generates code words x_j ∈ {0,1}^ℓ for each of the users j = 1, ..., n. To do this, he first generates a probability vector p ∈ [0,1]^ℓ, where each coordinate p_i is drawn independently from some fixed probability distribution F. In the original Tardos scheme, and many of its variants, a truncated version of the arcsine distribution is used, which has cumulative distribution function F given below.

F(p) := (2/π) arcsin √p.    (0 ≤ p ≤ 1)    (1)

For small c and the symmetric Tardos score function, certain discrete distributions are known to be optimal [31], leading to a better performance and shorter code lengths. For large c, these optimal discrete distributions converge to the arcsine distribution (1). After generating each entry p_i from the chosen bias distribution, the code words for the users are generated as follows: for each user j, the i-th entry of their code word x_{j,i} is set to 1 with probability p_i, and 0 with probability 1 − p_i. This assignment is done independently for each i and j. These fingerprints are then embedded in the content and sent to the users. (Note that it is crucial that the bias vector p remains secret, and is not known to the colluders during the attack.)

Collusion attack. Given a collusion C of some size c, the pirates employ a strategy θ, mapping their code words {x_j}_{j∈C} to a mixed copy y ∈ {0,1}^ℓ. Often the attack is modeled by a vector θ = (θ_0, ..., θ_c), where θ_k := Pr(y_i = 1 | Σ_{j∈C} x_{j,i} = k). By the marking assumption, we assume θ_0 = 0 and θ_c = 1.

Decoding. Given y, the content holder attempts to deduce who were responsible for creating this pirate copy. For this, he computes scores s_{j,i} := g(x_{j,i}, y_i, p_i) for some score function g, and then computes the total user scores as s_j := Σ_{i=1}^ℓ s_{j,i}.
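As an informal illustration of this encoding step (the function names and the inverse-CDF sampling shortcut are our own choices, not from the paper), the bias vector and code words can be generated as follows:

```python
import math
import random

def sample_bias_vector(ell, delta=0.0):
    """Draw each p_i from the (truncated) arcsine distribution via inverse CDF:
    F(p) = (2/pi) * arcsin(sqrt(p))  implies  p = sin(pi*u/2)^2 for uniform u.
    For delta > 0, u is restricted so that p lands in [delta, 1 - delta]."""
    lo = (2 / math.pi) * math.asin(math.sqrt(delta)) if delta > 0 else 0.0
    return [math.sin(math.pi * random.uniform(lo, 1 - lo) / 2) ** 2
            for _ in range(ell)]

def generate_codewords(n, p):
    """x_{j,i} = 1 with probability p_i, independently for each user j, segment i."""
    return [[1 if random.random() < p_i else 0 for p_i in p] for _ in range(n)]

p = sample_bias_vector(ell=1000, delta=0.01)  # the bias vector stays secret
X = generate_codewords(n=20, p=p)             # one code word per user
```

This is only a sketch of the model, not the full Tardos scheme; in particular, the choice of cut-off delta and of discrete versus continuous bias distributions follows the discussion above.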
For well-chosen parameters, the cumulative scores s_j are significantly higher for colluders than for innocent users. The actual decision whom to accuse is then made by e.g. setting a threshold z and accusing users j with s_j > z, or by accusing the user with the highest cumulative score. As an example, the symmetric score function of Škorić–Katzenbeisser–Celik [37] is given below.

g(x, y, p) := { +√((1−p)/p),  if x = 1, y = 1;
                −√(p/(1−p)),  if x = 0, y = 1;
                −√((1−p)/p),  if x = 1, y = 0;
                +√(p/(1−p)),  if x = 0, y = 0.    (2)

Since the decoding step remains equally valid after linear transformations (i.e. scaling all user scores by a common positive factor, or shifting all user scores by the same amount), the following score function is equivalent to the symmetric score function described above:

ĝ(x, y, p) := { +1/√(p(1−p)),  if x = y;
                −1/√(p(1−p)),  if x ≠ y.    (3)

To see why, note that the contribution of a segment for the symmetric score function, in terms of how far the scores for a match and a difference are apart, is independent of y:

g(1, 1, p) − g(0, 1, p) = g(0, 0, p) − g(1, 0, p) = 1/√(p(1−p)).    (4)

In other words, as long as the difference between a match and a difference in segment i is proportional to 1/√(p_i(1−p_i)) (with a positive contribution for a match, and a negative contribution for a difference), this only constitutes a scaling/translation of the scores. By scaling the scores by a factor 2, and centering the scores at 0, we obtain the score function ĝ. (Note that the threshold z will have to be scaled and translated by the same amounts to guarantee equivalent error probabilities.)

Nearest neighbor searching. Next, let us recall some definitions, techniques, and results from nearest neighbor searching (NNS). Given a data set {v_1, ..., v_n} ⊂ R^d, this problem asks to index these points in a data structure such that, when later given a query vector q ∈ R^d, one can quickly identify the nearest vector to q in the data set. To measure the performance of an NNS method, we consider the space complexity S = O(n^{1+ρ_s}) and the query time complexity T = O(n^{ρ_q}) to process a query q. Note that a naive linear search, without any indexing of the data, achieves S = O(n) and T = O(n), or ρ_s = 0 and ρ_q = 1. Ideally a good NNS method should achieve ρ_q < 1, perhaps at the cost of ρ_s > 0.

In this paper we restrict our attention to data sets on the unit sphere, under the ℓ_2-norm: we assume that ∥v_j∥ = 1 for all j, and ∥q∥ = 1. The following lemma states that, if the entire data set has a small dot product ⟨v_j, q⟩ := Σ_{i=1}^d v_{j,i} q_i with q, except for one near neighbor v_{j*}, which has a large dot product with q, then finding this unique near neighbor can be done efficiently in sublinear time. The parameter α ≥ 1 plays the role of the approximation factor (denoted c in e.g. [4]) – one obtains a sublinear time complexity for NNS only when either an approximate solution suffices, or there is a guarantee that the data set contains a unique nearest neighbor which is a factor α closer (under the ℓ_2-norm) than all other vectors in the data set.

Lemma 2.1 (NNS complexities [4]). Suppose that the data points v_i and query q have norm 1, and we are given two guarantees:
• For the nearest neighbor v_{j*}, we have ⟨v_{j*}, q⟩ ≥ d_2;
• For all other vectors v_j ≠ v_{j*}, we have ⟨v_j, q⟩ ≤ d_1.
Let α = √(1 − d_1)/√(1 − d_2) ≥ 1, and let ρ_q, ρ_s ≥ 0 satisfy:

α^2 √ρ_q + (α^2 − 1) √ρ_s ≥ √(2α^2 − 1).    (5)

Then we can construct a data structure with Õ(n^{1+ρ_s}) space and preprocessing time, allowing to answer any query q correctly (with high probability) with query time complexity Õ(n^{ρ_q}).

To achieve the above complexities, at a high level the data structure looks as follows. Given the normalized data points v_j, all lying on the unit sphere, we first sample many random vectors r_k on the sphere, and for each of these vectors we store which vectors are close to r_k in a bucket B_k. The key property we use here is (approximate) transitivity of closeness on the sphere: if x and y are close, and y and z are close, then also x and z are more likely to be close than usual. In other words, if v_j is close to q, and v_j is close to r_k (and contained in bucket B_k), then likely q will also be close to r_k.
Therefore, if we create and index many of these buckets B_k and, given q, we compute the dot products of q with the vectors r_k and only check those buckets B_k for potential near neighbors for which ⟨q, r_k⟩ is large, then we may only check a small fraction of the entire data set for potential nearest neighbors, while still finding all near neighbors. (The actual state-of-the-art techniques from [4] are slightly more sophisticated. For further details, we refer the reader to [3, 4, 6].)

Note that Lemma 2.1 only states the scaling behavior of the time and space complexities, and does not state how large the real overhead is in practical scenarios. For actual applications of NNS techniques, we refer the reader to e.g. the benchmarks of [5], comparing implementations of various NNS techniques for their practicality on real-world data sets, including data sets on the sphere.

3 NEAREST NEIGHBOR DECODING

To apply NNS techniques to score-based fingerprinting, let us first show how we can phrase the decoding step of score-based fingerprinting as an NNS problem on the sphere. First, we map the n code words x_j ∈ {0,1}^ℓ to n data points v_j ∈ {−1,1}^ℓ by the linear operation v_j = 2x_j − 1: a 1 in x_j is mapped to a 1 in v_j, and a 0 in x_j to a −1 in v_j. Next, given y ∈ {0,1}^ℓ, we map it to a query vector q as q_i = (2y_i − 1)/√(p_i(1−p_i)): the entries of q are ±1/√(p_i(1−p_i)), depending on the value of y_i. Note that the Euclidean norms of the data and query vectors are given by:

∥v_j∥ = √ℓ,    ∥q∥ = √( Σ_{i=1}^ℓ 1/(p_i(1−p_i)) ).    (6)

To guarantee that all vectors are normalized, we will later have to scale everything down by ∥v_j∥ and ∥q∥ accordingly.
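This translation is easy to verify numerically. The following sketch (helper names are ours) checks on a toy instance that summing the modified scores ĝ over all segments equals the dot product ⟨v_j, q⟩:

```python
import math

def to_data_point(x):
    """v_j = 2 x_j - 1: map the bits {0,1} of a code word to {-1,+1}."""
    return [2 * xi - 1 for xi in x]

def to_query(y, p):
    """q_i = (2 y_i - 1) / sqrt(p_i (1 - p_i))."""
    return [(2 * yi - 1) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p)]

def g_hat(x, y, p):
    """Modified symmetric score: +1/sqrt(p(1-p)) on a match, the negation otherwise."""
    s = 1.0 / math.sqrt(p * (1 - p))
    return s if x == y else -s

# Toy instance: the transformed user score equals the dot product <v_j, q>.
p = [0.5, 0.2, 0.8, 0.3]
x_j = [1, 0, 1, 1]
y = [1, 1, 0, 1]
v_j, q = to_data_point(x_j), to_query(y, p)
score = sum(g_hat(x, yi, pi) for x, yi, pi in zip(x_j, y, p))
assert abs(score - sum(vi * qi for vi, qi in zip(v_j, q))) < 1e-9
```

Dividing both vectors by their norms from (6) then turns the score comparison into a dot product comparison on the unit sphere, as required by Lemma 2.1.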
Observe that with the modified symmetric score function ĝ, the user score s_j can now be equivalently expressed in terms of dot products as follows:

s_j = Σ_{i=1}^ℓ ĝ(x_{j,i}, y_i, p_i) = Σ_{i=1}^ℓ (2x_{j,i} − 1)(2y_i − 1)/√(p_i(1−p_i)) = ⟨v_j, q⟩.    (7)

Therefore, a user score s_j is large iff the dot product between v_j and q is large, i.e. iff v_j and q are near neighbors in space. With the above translation in mind, we can now apply the aforementioned NNS techniques. To apply Lemma 2.1, after normalization we need to provide two guarantees:
• A nearest neighbor v_{j*} (i.e. a code word x_{j*} of a colluder j* ∈ C) must have a large dot product with the query vector q (i.e. must have a high score s_{j*});
• Other vectors v_j (i.e. innocent users j) must have a small dot product with q (i.e. must have a low score s_j).
For this, we could derive similar proven bounds on Pr(s_j > z) for innocent and guilty users, as previously done in e.g. [8, 25, 37, 39, 40], taking into account that the scores have been transformed. Instead let us give a slightly informal, high-level description of what these results may be.

Let H_1 be the hypothesis that user j is innocent, and H_2 the hypothesis that user j is a colluder. Let µ_b = E_{p,x_j,y}(s_j | H_b) for b ∈ {1, 2}. By the central limit theorem, cumulative user scores are distributed approximately normally for large ℓ, and if both variances σ_b^2 = E_{p,x_j,y}(s_j^2 | H_b) − µ_b^2 are small, we may conclude that with high probability these scores are closely concentrated around their means. Then we can estimate the parameters d_1 and d_2 for Lemma 2.1, after normalization, as follows:

d_1 = µ_1/(∥v_j∥ · ∥q∥),    d_2 = µ_2/(∥v_j∥ · ∥q∥).    (8)

Here the expressions for ∥v_j∥ and ∥q∥ follow from (6). Note however that both µ_1 and µ_2 are likely dependent on the collusion attack, and may not be known in advance, before the decoding stage.
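When µ_1 and µ_2 are not known analytically, they can also be estimated empirically. The following rough Monte Carlo sketch is our own construction (not from the paper), assuming the interleaving attack and the uniform bias p_i = 1/2:

```python
import math
import random

def simulate_scores(n, ell, c, trials=200):
    """Monte Carlo estimate of the innocent/colluder score means (mu_1, mu_2)
    under the interleaving attack, for the uniform bias p_i = 1/2."""
    mu1 = mu2 = 0.0
    for _ in range(trials):
        p = [0.5] * ell
        X = [[int(random.random() < pi) for pi in p] for _ in range(n)]
        C = random.sample(range(n), c)
        # Interleaving attack: each segment copies a random colluder's symbol.
        y = [X[random.choice(C)][i] for i in range(ell)]
        q = [(2 * yi - 1) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p)]
        score = lambda j: sum((2 * X[j][i] - 1) * q[i] for i in range(ell))
        innocent = next(j for j in range(n) if j not in C)
        mu1 += score(innocent) / trials
        mu2 += score(C[0]) / trials
    return mu1, mu2

mu1, mu2 = simulate_scores(n=10, ell=100, c=2)
# For c = 2 and p = 1/2 one expects mu1 near 0 and mu2 near ell (here 100).
```

Such estimates could be plugged into (8) when the attack strategy is only known approximately; this is a sanity check rather than a substitute for proven tail bounds.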
Let us first investigate the simplest case of c = 2, i.e. having two colluders. Under the assumption that the colluders work symmetrically, there is no collusion strategy to consider: if they have the same symbol, they output this symbol, and if they receive both a 0 and a 1 they can choose either with equal probability (i.e. θ = (0, 1/2, 1)). For c = 2, the optimal bias distribution is concentrated at p_i = 1/2 for all i, leading to uniformly random codes. This implies ∥q∥ = 2√ℓ and µ_1 = 0, µ_2 = ℓ, resulting in d_1 = 0, d_2 = 1/2, as in Table 2. In Lemma 2.1 this leads to an approximation factor α = √2, and a trade-off between the time and space exponents ρ_q and ρ_s of:

2 √ρ_q + √ρ_s ≥ √3.    (9)

Without increasing the memory (i.e. for ρ_s = 0) this leads to ρ_q ≥ 3/4, i.e. an asymptotic query time complexity for decoding of O(n^{3/4} ℓ). Setting ρ_q = ρ_s we obtain ρ_q ≥ 1/3, i.e. with O(n^{4/3} ℓ) memory, we can obtain a query complexity of O(n^{1/3} ℓ). With a large amount of memory and preprocessing time, we can further get a subpolynomial query time n^{o(1)} ℓ at the cost of a larger polynomial amount of memory.

Table 2: Numerical data for the interleaving attack, using the optimal discrete distributions of [31]. The last three columns correspond to three extreme time–space trade-offs: (I) ρ_q for ρ_s = 0; (II) ρ_q for ρ_q = ρ_s; and (III) ρ_s for ρ_q = 0.

 c   ∥q∥/√ℓ   µ_1/ℓ   µ_2/ℓ   d_1    d_2    α      I      II     III
 1    2.00    0.00    2.00    0.00   1.00   ∞      0.00   0.00    0.00
 2    2.00    0.00    1.00    0.00   0.50   1.41   0.75   0.33    5.00
 3    2.45    0.82    1.36    0.33   0.56   1.22   0.89   0.50    8.00
 4    2.45    0.82    1.22    0.33   0.50   1.15   0.94   0.60   15.05
 5    2.83    1.26    1.56    0.45   0.55   1.11   0.96   0.68   25.86
 6    2.83    1.26    1.51    0.45   0.54   1.09   0.97   0.72   37.67
 7    3.16    1.57    1.78    0.50   0.56   1.07   0.98   0.77   57.38
 8    3.16    1.57    1.75    0.50   0.56   1.06   0.99   0.79   75.(…)
 ∞     ∞       ∞       ∞      (…)    (…)    (…)    (…)    (…)      ∞

For c ≥
3, the collusion strategy affects d_1 and d_2, and the resulting space and time exponents for the decoding phase. For simplicity, let us focus on the strongest and most natural attack, the interleaving attack, where given k ones and c − k zeros in segment i, the collusion sets y_i = 1 with probability k/c (i.e. θ_k = k/c). Equivalently, for each segment the colluders randomly choose one of their members, and output his content. In that case E_{p_i}(y_i) = p_i and we can further simplify the expressions for µ_1 and µ_2:

µ_1 = E_{p,x_j,y}(s_j | H_1) = ℓ · E_p( (p^2 + (1−p)^2 − 2p(1−p)) / √(p(1−p)) ),    (10)

µ_2 = E_{p,x_j,y}(s_j | H_2) = (1 − 1/c) · µ_1 + (ℓ/c) · E_p( 1/√(p(1−p)) ).    (11)

Using the optimal discrete distributions of [31] for small c, optimized for the symmetric score function, and computing the resulting parameters, we obtain Table 2. Although the entire asymptotic trade-off spectrum is defined by Equation (5) and α, we explicitly instantiate these trade-offs in the last three columns, for the near-linear space regime (ρ_s = 0), the balanced regime (ρ_q = ρ_s), and the subpolynomial query time regime (ρ_q = 0).

As one can see in the table, as c increases the time and space exponents for the decoding phase quickly increase. For instance, for c = 6, we can obtain a query time complexity scaling as n^{0.72}, with space scaling as n^{1.72}, or if we insist on using only quasi-linear memory in n, the best query time complexity scales as n^{0.97}. For asymptotically large c, using the arcsine distribution with a cut-off δ > 0 (where δ tends to 0 as c grows [25]), both the innocent and guilty scores scale such that, after normalization, we get d_1, d_2 → 0. The fact that d_1 ≈ d_2 for large c logically follows from the fact that colluders are able to blend in with the crowd better and better as c increases, requiring larger code lengths. Since d_2/d_1 → 1, NNS techniques do not give any improvement in the limit of large c, and the main benefits are obtained when more memory is available to index the code words, and when c is small.

4 EXPERIMENTS

To give an example of the potential speed-up in practice, we performed experiments for the interleaving attack with c = 3 colluders and n = 10^5 users in each of 1000 trials, where we used a fixed code length ℓ and the optimal discrete p-values of [31], p = 0.5 ± 0.289, with equal weights for both possibilities.
NNS data structure.
For the NNS data structure and decoder, we implemented the asymptotically suboptimal but often more practical hyperplane locality-sensitive hashing method of Charikar [10]. For the hyperplane LSH data structure, we chose the number of hash tables as t = 100, and the number of hash hyperplanes per table as k = 16 (see [10] for more details). For each of the t hash tables, each of the n data vectors is stored in one of the 2^k = 65536 hash buckets. To decode q, for each of the t hash tables we (1) compute k inner products with random (sparse [1]) unit vectors, with a total cost of 1600 sparse dot products, and (2) do look-ups in these hash buckets for potential near neighbors (colluders), by computing their scores.

Results.
Figure 1 illustrates (running averages of) how many user scores are commonly computed, i.e. how much work is done in the decoding stage, depending on how high the user scores are; the higher the score s_j, the larger the dot product ⟨v_j, q⟩, and the more likely it is we will find user j (vector v_j) colliding with q in one of the hash tables. From 1000 simulations of the collusion process, approximately 4.2% of all innocent users were considered as potential colluders (i.e. on average 4200 of the 10^5 user scores were computed), and over 31% of all colluders were found through collisions in the hash tables (i.e. on average approximately 1 of the 3 colluders was found). On average, the decoding consists of computing 1600 dot products for the hash table look-ups, and 4200 score computations of innocent users, for a total of around 5800 dot products of length ℓ. Compared to a naive linear search, which requires computing all 10^5 user scores, the decoding is a factor 17 faster. This comes at the cost of requiring 100 hash tables, which each store pointers to all n vectors in memory; since pointers are much smaller than the actual vectors, in practice the NNS data structure only required a factor 2 more memory compared to no indexing.

Theory vs. Practice.
Theoretically, with t = 100 hash tables we are using of the order t · n = n^{1.4} memory, i.e. setting ρ_s = 0.40 (although in practice the memory only increases by a small amount). With c = 3, according to Table 2 we have α ≈ 1.22 for the optimal asymptotic trade-offs, which according to (5) would thus result in ρ_q ≈ 0.58. In reality the average query cost is computing 5800 dot products, corresponding to a query exponent ρ_q = log(5800)/log(10^5) ≈ 0.75. In practice, one may indeed notice that ρ_q is slightly higher than the theoretical values suggest, but the memory increase is commonly much less than expected.

Note that although in most runs at least one colluder is successfully found in the hash tables (and has a large score), sometimes none of the colluders are found, and in some applications finding all colluders may be required. To get a higher success rate of finding colluders, one could for instance use multiprobing [3] to still get a significantly lower decoding cost for finding all colluders compared to a linear search, without further increasing the memory.

Figure 1: An illustration of (running averages of) how many user scores are computed, as a function of the user scores (horizontal axis: score percentile).
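For reference, the hyperplane LSH decoder described above can be sketched as follows. This is a simplified toy version with our own naming, using dense Gaussian projections in place of the sparse projections [1] of the actual experiments:

```python
import random

def hyperplane_hash(v, planes):
    """k-bit hash in the style of [10]: the sign pattern of v against k hyperplanes."""
    h = 0
    for r in planes:
        h = (h << 1) | (sum(ri * vi for ri, vi in zip(r, v)) >= 0)
    return h

def build_tables(V, dim, t=100, k=16):
    """t independent hash tables; each point lands in one of 2^k buckets per table."""
    tables = []
    for _ in range(t):
        planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
        buckets = {}
        for j, v in enumerate(V):
            buckets.setdefault(hyperplane_hash(v, planes), []).append(j)
        tables.append((planes, buckets))
    return tables

def candidate_colluders(q, tables):
    """Only users colliding with q in some table get their score computed."""
    cand = set()
    for planes, buckets in tables:
        cand.update(buckets.get(hyperplane_hash(q, planes), []))
    return cand
```

Decoding then computes exact scores ⟨v_j, q⟩ only for the candidates, which is where the factor-17 saving reported above comes from; the parameters t and k control the time–memory–accuracy balance.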
5 DISCUSSION

Besides the main analysis on the costs of the decoding method, and the associated effect on the memory complexity, let us finally discuss a few more properties and aspects of the techniques outlined in this paper, which may affect how practical this method truly is.
Score function.
Although we only explicitly analyzed the application to the symmetric score function [37], the same techniques can be applied to any score-based scheme. For other score functions, the normalization factor ∥q∥ may depend on the attack strategy however, making an accurate instantiation of the NNS data structure harder unless the attack is known in advance.

Effects on encoding and embedding.
Even if the decoding method is more efficient, deploying this method in practice may not be cost-effective if the method for generating fingerprints and embedding these in the data needs to be modified. This is fortunately not a concern here, as the only thing that needs to be modified is how the owner of the content stores the code words x_j for decoding purposes: the exact same encoding and embedding techniques can still be used.

Decoding accuracy.
One of the main reasons NNS techniques are fast is that they allow for a small margin of error in the decoding procedure. In the application to fingerprinting, this means that the decoder may not always identify colluders from straightforward look-ups in the hash tables. This problem can be mitigated with multiprobing techniques [3, 35], or one could use NNS techniques without false negatives, such as [34].
Overhead of NNS techniques.
With NNS techniques, we reduce the asymptotic decoding time from O(ℓn) to O(ℓn^ρ) with ρ < 1, although the hidden constants may be larger than for a plain linear search through all n code words. Note however that the setting considered here, of solving NNS for data on the sphere, can be handled effectively with practical NNS techniques such as [3, 10], which have previously been proven to be much faster than linear searches on various benchmarks [5], and our preliminary experiments confirm the improvement in practice.

Dynamic settings.
For streaming applications [14, 22, 26, 41], decisions on whether to accuse users or not need to be made in real-time as well. As the NNS techniques considered here commonly rely on static data, it is not directly obvious whether the same speed-ups can be obtained when the data arrives in a streaming fashion. Interested readers may consider [27, 28] for further reading on NNS techniques that may be relevant for streaming data.
Joint decoding.
While simple decoders with a decoding cost linear in n might be considered reasonably efficient, in the joint decoding setting the decoding cost of O(ℓ n^k) for k ≥ 2 quickly becomes prohibitive. There, nearest neighbor techniques may bring the cost down to O(ℓ n^{kρ}), and the improvement may be even more noticeable than for simple decoders.

Group testing.
As discussed in e.g. [20, 21, 23, 29, 36], the group testing problem of detecting infected individuals among a large population using simultaneous testing [13] is equivalent to the fingerprinting problem where the collusion strategy is fixed to the all-1 attack: whenever allowed by the marking assumption, the colluders output the symbol 1. Similar techniques as described above can be applied there to reduce the decoding time complexity, both for simple and joint group testing methods.
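To make the stated equivalence concrete, here is a minimal sketch (naming ours): the outcome vector of pooled group tests with defective set C is exactly the pirate copy produced by the all-1 attack:

```python
def all_one_attack(codewords, colluders):
    """y_i = 1 iff some colluder has a 1 in segment i (an OR of their code words);
    this respects the marking assumption and models pooled group testing, where
    column i of the code matrix selects a pool and C is the set of defectives."""
    ell = len(codewords[0])
    return [int(any(codewords[j][i] == 1 for j in colluders)) for i in range(ell)]

# Three users, four pools/segments; users 0 and 2 are "defective".
X = [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
y = all_one_attack(X, colluders={0, 2})
assert y == [1, 0, 1, 1]
```

With this fixed attack, the same score-based decoding (and hence the same NNS speed-ups) can be applied to identify the defectives.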
ACKNOWLEDGMENTS
The author thanks Peter Roelse for discussions on the potential relevance of these techniques. The author is supported by an NWO Veni grant under project number 016.Veni.192.005.
REFERENCES
[1] Dimitris Achlioptas. 2001. Database-Friendly Random Projections. In PODS. 274–281. https://doi.org/10.1145/375551.375608
[2] Ehsan Amiri and Gábor Tardos. 2009. High Rate Fingerprinting Codes and the Fingerprinting Capacity. In SODA. 336–345. http://dl.acm.org/citation.cfm?id=1496808
[3] Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. 2015. Practical and Optimal LSH for Angular Distance. In NIPS. 1225–1233. https://papers.nips.cc/paper/5893-practical-and-optimal-lsh-for-angular-distance
[4] Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. 2017. Optimal hashing-based time-space trade-offs for approximate near neighbors. In SODA. 47–66. https://doi.org/10.1137/1.9781611974782.4
[5] Martin Aumueller, Erik Bernhardsson, and Alexander Faithfull. 2017. ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. In SISAP. 34–49. https://doi.org/10.1007/978-3-319-68474-1_3
[6] Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. 2016. New directions in nearest neighbor searching with applications to lattice sieving. In SODA. 10–24. https://doi.org/10.1137/1.9781611974331.ch2
[7] Waldemar Berchtold and Marcel Schäfer. 2012. Performance and Code Length Optimization of Joint Decoding Tardos Fingerprinting. In MMSec. 27–32. https://doi.org/10.1145/2361407.2361412
[8] Oded Blayer and Tamir Tassa. 2008. Improved Versions of Tardos' Fingerprinting Scheme. Designs, Codes and Cryptography 48, 1 (2008), 79–103. https://doi.org/10.1007/s10623-008-9200-z
[9] Dan Boneh and James Shaw. 1998. Collusion-Secure Fingerprinting for Digital Data. IEEE Transactions on Information Theory 44, 5 (1998), 1897–1905. https://doi.org/10.1109/18.705568
[10] Moses S. Charikar. 2002. Similarity Estimation Techniques from Rounding Algorithms. In STOC. 380–388. https://doi.org/10.1145/509907.509965
[11] Ana Charpentier, Fuchun Xie, Caroline Fontaine, and Teddy Furon. 2009. Expectation Maximization Decoding of Tardos Probabilistic Fingerprinting Code. In SPIE Media Forensics and Security. 1–15. https://doi.org/10.1117/12.806034
[12] Mathieu Desoubeaux, Cédric Herzet, William Puech, and Gaëtan Le Guelvouit. 2013. Enhanced Blind Decoding of Tardos Codes with New MAP-Based Functions. In MMSP. 283–288. https://doi.org/10.1109/MMSP.2013.6659302
[13] Robert Dorfman. 1943. The Detection of Defective Members of Large Populations. The Annals of Mathematical Statistics.
[14] Amos Fiat and Tamir Tassa. 1999. Dynamic Traitor Tracing. In CRYPTO. 354–371. https://doi.org/10.1007/3-540-48405-1_23
[15] Teddy Furon and Mathieu Desoubeaux. 2014. Tardos Codes for Real. In
WIFS. 24–29. https://doi.org/10.1109/WIFS.2014.7084298
[16] Teddy Furon, Arnaud Guyader, and Frédéric Cérou. 2008. On the Design and Optimization of Tardos Probabilistic Fingerprinting Codes. In IH. 341–356. https://doi.org/10.1007/978-3-540-88961-8_24
[17] Teddy Furon, Arnaud Guyader, and Frédéric Cérou. 2012. Decoding fingerprints using the Markov Chain Monte Carlo method. In WIFS. 187–192. https://doi.org/10.1109/WIFS.2012.6412647
[18] Teddy Furon and Luis Pérez-Freire. 2009. EM Decoding of Tardos Traitor Tracing Codes. In MMSec. 99–106. https://doi.org/10.1145/1597817.1597835
[19] Teddy Furon and Luis Pérez-Freire. 2009. Worst Case Attacks Against Binary Probabilistic Traitor Tracing Codes. In WIFS. 56–60. https://doi.org/10.1109/WIFS.2009.5386484
[20] Thijs Laarhoven. 2013. Efficient Probabilistic Group Testing Based on Traitor Tracing. In ALLERTON. 1358–1365. https://doi.org/10.1109/Allerton.2013.6736699
[21] Thijs Laarhoven. 2014. Capacities and capacity-achieving decoders for various fingerprinting games. In IH&MMSec. 123–134. https://doi.org/10.1145/2600918.2600925
[22] Thijs Laarhoven. 2015. Optimal Sequential Fingerprinting: Wald vs. Tardos. In IH&MMSec. 97–107. https://doi.org/10.1145/2756601.2756603
[23] Thijs Laarhoven. 2016. Search problems in cryptography. Ph.D. Dissertation. Eindhoven University of Technology. http://repository.tue.nl/837539
[24] Thijs Laarhoven and Benne de Weger. 2013. Discrete Distributions in the Tardos Scheme, Revisited. In IH&MMSec. 13–18. https://doi.org/10.1145/2482513.2482533
[25] Thijs Laarhoven and Benne de Weger. 2014. Optimal Symmetric Tardos Traitor Tracing Schemes. Designs, Codes and Cryptography 71, 1 (2014), 83–103. https://doi.org/10.1007/s10623-012-9718-y
[26] Thijs Laarhoven, Jeroen Doumen, Peter Roelse, Boris Škorić, and Benne de Weger. 2013. Dynamic Tardos Traitor Tracing Schemes. IEEE Transactions on Information Theory 59, 7 (2013), 4230–4242. https://doi.org/10.1109/TIT.2013.2251756
[27] Yan-Nei Law and Carlo Zaniolo. 2005. An Adaptive Nearest Neighbor Classification Algorithm for Data Streams. In PKDD, Alípio Mário Jorge, Luís Torgo, Pavel Brazdil, Rui Camacho, and João Gama (Eds.). Springer Berlin Heidelberg, 108–120.
[28] Xiaoyan Liu and Hakan Ferhatosmanoğlu. 2003. Efficient k-NN Search on Streaming Data Series. In Advances in Spatial and Temporal Databases, Thanasis Hadzilacos, Yannis Manolopoulos, John Roddick, and Yannis Theodoridis (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 83–101.
[29] Peter Meerwald and Teddy Furon. 2011. Group Testing Meets Traitor Tracing. In ICASSP. 4204–4207. https://doi.org/10.1109/ICASSP.2011.5947280
[30] Peter Meerwald and Teddy Furon. 2012. Toward Practical Joint Decoding of Binary Tardos Fingerprinting Codes. IEEE Transactions on Information Forensics and Security 7, 4 (2012), 1168–1180. https://doi.org/10.1109/TIFS.2012.2195655
[31] Koji Nuida, Satoshi Fujitsu, Manabu Hagiwara, Takashi Kitagawa, Hajime Watanabe, Kazuto Ogawa, and Hideki Imai. 2009. An Improvement of Discrete Tardos Fingerprinting Codes. Designs, Codes and Cryptography 52, 3 (2009), 339–362. https://doi.org/10.1007/s10623-009-9285-z
[32] Jan-Jaap Oosterwijk, Boris Škorić, and Jeroen Doumen. 2013. Optimal Suspicion Functions for Tardos Traitor Tracing Schemes. In IH&MMSec. 19–28. https://doi.org/10.1145/2482513.2482527
[33] Jan-Jaap Oosterwijk, Boris Škorić, and Jeroen Doumen. 2015. A Capacity-Achieving Simple Decoder for Bias-Based Traitor Tracing Schemes. IEEE Transactions on Information Theory 61, 7 (2015), 3882–3900. https://doi.org/10.1109/TIT.2015.2428250
[34] Rasmus Pagh. 2016. Locality-sensitive hashing without false negatives. In SODA. 1–9. http://arxiv.org/abs/1507.03225
[35] Rina Panigrahy. 2006. Entropy based nearest neighbor search in high dimensions. In SODA. 1186–1195. http://dl.acm.org/citation.cfm?id=1109688
[36] Boris Škorić. 2015. Tally-based simple decoders for traitor tracing and group testing. IEEE Transactions on Information Forensics and Security 10, 6 (2015), 1221–1233. https://doi.org/10.1109/TIFS.2015.2403575
[37] Boris Škorić, Stefan Katzenbeisser, and Mehmet U. Celik. 2008. Symmetric Tardos Fingerprinting Codes for Arbitrary Alphabet Sizes. Designs, Codes and Cryptography 46, 2 (2008), 137–166. https://doi.org/10.1007/s10623-007-9142-x
[38] Boris Škorić and Jan-Jaap Oosterwijk. 2015. Binary and q-ary Tardos Codes, Revisited. Designs, Codes and Cryptography 74, 1 (2015), 75–111. https://doi.org/10.1007/s10623-013-9842-3
[39] Boris Škorić, Tatiana U. Vladimirova, Mehmet U. Celik, and Joop C. Talstra. 2008. Tardos Fingerprinting is Better Than We Thought. IEEE Transactions on Information Theory 54, 8 (2008), 3663–3676. https://doi.org/10.1109/TIT.2008.926307
[40] Gábor Tardos. 2003. Optimal Probabilistic Fingerprint Codes. In STOC. 116–125. https://doi.org/10.1145/780542.780561
[41] Tamir Tassa. 2005. Low Bandwidth Dynamic Traitor Tracing Schemes.