New upper bounds for (b,k)-hashing
Stefano Della Fiore, Simone Costa, Marco Dalai
Department of Information Engineering, University of Brescia
{s.dellafiore001, simone.costa, marco.dalai}@unibs.it
Abstract: For fixed integers $b \ge k$, the problem of perfect $(b,k)$-hashing asks for the asymptotic growth of largest subsets of $\{1, 2, \ldots, b\}^n$ such that for any $k$ distinct elements in the set, there is a coordinate where they all differ. An important asymptotic upper bound for general $b, k$ was derived by Fredman and Komlós in the '80s and improved for certain $b \ne k$ by Körner and Marton and by Arikan. Only very recently better bounds were derived for the general $b, k$ case by Guruswami and Riazanov, while stronger results for small values of $b = k$ were obtained by Arikan, by Dalai, Guruswami and Radhakrishnan and by Costa and Dalai. In this paper, we both show how some of the latter results extend to $b \ne k$ and further strengthen the bounds for some specific small values of $b$ and $k$. The method we use, which depends on the reduction of an optimization problem to a finite number of cases, shows that further results might be obtained by refined arguments at the expense of higher complexity.

Index Terms: perfect hashing, list decoding, zero-error capacity
I. INTRODUCTION
Let $b$, $k$ and $n$ be integers, with $b \ge k$, and let $C$ be a subset of $\{1, 2, \ldots, b\}^n$ with the property that for any $k$ distinct elements we can find a coordinate where they all differ. Such a set can be interpreted, by looking at it coordinate-wise, as a family of $n$ hashing functions on some universe of size $|C|$. The required property then says that the family is a perfect hash family, that is, any $k$ elements in the universe are $k$-partitioned by at least one function. Alternatively, $C$ can be interpreted as a code of rate $\frac{1}{n}\log|C|$ for communication over a channel with $b$ inputs. Assume that the channel is a $b/(k-1)$ channel, meaning that any $k-1$ of the $b$ inputs share one output but no $k$ distinct inputs do (see Figure 1). The required property for $C$ is what is needed for the code to be a zero-error code when list decoding with list-size $k-1$ is allowed. We refer the reader to [8], [9], [13], [14] and [4] for an overview of the more general context of this problem.
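As a concrete illustration of the definition above, the following sketch checks the $(b,k)$-hash property by exhaustive search and evaluates the rate $\frac{1}{n}\log_2|C|$. The function names `is_hash_code` and `rate` and the toy code are assumptions made for illustration only; the brute-force check is practical only for very small codes.

```python
from itertools import combinations
from math import log2


def is_hash_code(code, k):
    """Return True if every k distinct codewords all differ in some coordinate."""
    words = sorted(set(code))          # work with distinct codewords only
    if len(words) < k:
        return True                    # the property holds vacuously
    n = len(words[0])
    for group in combinations(words, k):
        # a coordinate separates the group when it shows k different symbols
        if not any(len({w[i] for w in group}) == k for i in range(n)):
            return False
    return True


def rate(code):
    """Rate (1/n) * log2 |C| of a code given as equal-length tuples."""
    words = set(code)
    n = len(next(iter(words)))
    return log2(len(words)) / n


# Toy example (assumed for illustration): b = k = 3, n = 2.
toy = [(1, 1), (1, 2), (2, 3), (3, 3)]
print(is_hash_code(toy, 3), rate(toy))
```

Adding the codeword `(2, 2)` in place of `(1, 2)` and `(2, 3)` together with `(1, 2)` breaks the property, since the triple `(1, 1)`, `(2, 2)`, `(1, 2)` is not separated by either coordinate.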
Fig. 1. A / channel. Edges represent positive probabilities. Here, zero-error communication is possible when decoding with list-size equal to .

We will call any subset $C$ of $\{1, 2, \ldots, b\}^n$ with the described property a $(b,k)$-hash code. For the reasons mentioned above, bounding the size of $(b,k)$-hash codes is a combinatorial problem which has been of interest both in computer science and information theory. It is known that $(b,k)$-hash codes of exponential size in $n$ can be constructed, and the quantity of interest is usually the rate of such codes. We will thus study the quantity
\[ R_{(b,k)} = \limsup_{n\to\infty} \frac{1}{n}\log|C_n| , \tag{1} \]
where the $C_n$ are $(b,k)$-hash codes of length $n$ with maximal rate. Note that, throughout, all logarithms are to base 2.

Few lower bounds on $R_{(b,k)}$ are known. The first results in this sense were given in [9], [8], and a better bound was derived in [12] for $(b,k) = (3,3)$. More recently, new lower bounds were derived in [16] for infinitely many other values of $k$. The first, landmark result concerning upper bounds was obtained by Fredman and Komlós [9], who showed that
\[ R_{(b,k)} \le \frac{b^{\underline{k-1}}}{b^{k-1}} \log(b-k+2) , \tag{2} \]
where $b^{\underline{k-1}} = b(b-1)\cdots(b-k+2)$. Progress has since been rare. A generalization of the bound given in equation (2) was derived by Körner and Marton [12] in the form
\[ R_{(b,k)} \le \min_{0 \le j \le k-2} \frac{b^{\underline{j+1}}}{b^{j+1}} \log\frac{b-j}{k-j-1} . \tag{3} \]
This was further improved for different values of $b$ and $k$ by Arikan [3]. In the case $b = k$, an improvement was first obtained for $k = 4$ in [2] and then in [6], [7]. It was proved only recently in [10] that the Fredman-Komlós bound is not tight for any $k > 3$; explicit better values were given there for $k = 5, 6$, and for larger $k$ modulo a conjecture which is proved in [5], where further improvements are also obtained for $k = 5, 6$.

In this paper, we develop a new strategy to attack some of the cases which appear not to be optimally handled by those methods, obtaining new bounds for $b = k =$ , . . . , . Furthermore, we also show that our procedure improves on the existing literature for some $b \ne k$ cases, among which for example $(b,k) =$ ( , ), ( , ), ( , ), ( , ). In order to evaluate these $b \ne k$ cases fairly, we first analyze the results (not derived in the referenced papers) which are obtained when the methods of [6] and [5] are extended to $b \ne k$, and compare them with those of [12], [3] and [10].

The generalization of the procedure used in [6] is rather easy and it provides the following bound
\[ R_{(b,k)} \le \left( \frac{1}{\log b} + \frac{b^2}{(b^2-3b+2)\,\log\frac{b-2}{k-3}} \right)^{-1} . \tag{4} \]
In Table I we give a comparison between the bounds (4) and (3), the bounds from [3] and [10], and the generalized bound from [5] for different values of $b$ and $k$. The integers in parentheses for the bound (3) represent the minimizing $j$; a parameter $j$ with the same role is involved in the other bounds and it will be discussed later. For the bounds of [5], [3] and [10] it is equal to $k-2$, while for the bound of [6] it is equal to 2.

In Table II we compare our new bounds with the best known bounds for $b = k =$ , . . . , and for $(b,k) =$ ( , ), ( , ), ( , ), ( , ).

TABLE I
UPPER BOUNDS ON $R_{(b,k)}$. ALL NUMBERS ARE ROUNDED UPWARDS.
$(b,k)$ | [5]* | [6]* | [3] | [10] | [12]
* The generalized bound for the $(b,k)$ case.

TABLE II
UPPER BOUNDS ON $R_{(b,k)}$. ALL NUMBERS ARE ROUNDED UPWARDS.
$(b,k)$ | This work | [5] | [6] | [3] | [10]

The paper is structured as follows. In Section II we give the general structure of the method used in the mentioned recent series of works to find upper bounds using the hypergraph version of Hansel's lemma. In Section III we present the main new ingredient of this paper, which is a way to improve the bounds derived in [5] by means of a more careful analysis of a quadratic form that was also the object of that study. In Section IV, we show how this idea can be effectively implemented after an appropriate reduction of the problem to a list of cases that can be studied exhaustively.

Footnote: The interested reader will find, upon inspection of the proof of Theorem 3 in [6], that, modulo using a hypergraph version of the Hansel lemma, the only new condition to check is that the upper bound given in (4) is greater than log((b - )/(k - )) for every $b \ge k \ge$ .

II. STRUCTURE OF THE GENERAL METHOD
The best upper bounds on $R_{(b,k)}$ available in the literature can all be seen as different applications of a central idea, which is the study of $(b,k)$-hashing by comparison with a combination of binary partitions. This main line of approach to the problem comes from the original work of Fredman and Komlós [9]. A clear and productive formulation of the idea was given by Radhakrishnan in terms of Hansel's lemma [15], which remained the main tool used in all recent results [7], [10] and [5]. We state the lemma here and briefly review, for the reader's convenience, how it was applied in those works.

Lemma 1 (Hansel for Hypergraphs [11], [14]): Let $K_r^d$ be a complete $d$-uniform hypergraph on $r$ vertices and let $G_1, \ldots, G_m$ be $c$-partite $d$-uniform hypergraphs on those same vertices such that $\cup_i G_i = K_r^d$. Let $\tau(G_i)$ be the fraction of non-isolated vertices in $G_i$. Then
\[ \log\frac{c}{d-1} \sum_{i=1}^{m} \tau(G_i) \ge \log\frac{r}{d-1} . \tag{5} \]

The application to $(b,k)$-hashing relies on the following observation. Given a $(b,k)$-hash code $C$, fix any $j$ elements $x_1, x_2, \ldots, x_j$ in $C$, with $j \le k-2$. For any coordinate $i$ let $G_i^{x_1,\ldots,x_j}$ be the $(b-j)$-partite $(k-j)$-uniform hypergraph with vertex set $C \setminus \{x_1, x_2, \ldots, x_j\}$ and edge set
\[ E = \big\{ (y_1, \ldots, y_{k-j}) : x_{1,i}, \ldots, x_{j,i}, y_{1,i}, \ldots, y_{k-j,i} \text{ are all distinct} \big\} . \tag{6} \]
Since $C$ is a $(b,k)$-hash code, $\bigcup_i G_i^{x_1,\ldots,x_j}$ is the complete $(k-j)$-uniform hypergraph on $C \setminus \{x_1, x_2, \ldots, x_j\}$ and so
\[ \log\frac{b-j}{k-j-1} \sum_{i=1}^{n} \tau(G_i^{x_1,\ldots,x_j}) \ge \log\frac{|C|-j}{k-j-1} . \tag{7} \]
This inequality allows one to upper bound $|C|$ by upper bounding the left hand side. Inequality (7) holds for any choice of $x_1, x_2, \ldots, x_j$, so the goal is to exhibit a choice for which the left hand side is not too large. The choice can be deterministic, or we can take the expectation over any random selection.

Note that if $x_{1,i}, x_{2,i}, \ldots, x_{j,i}$ are not all distinct (let us say that they "collide") then the hypergraph in (6) is empty, that is, the corresponding $\tau$ in the left hand side of (7) is zero. So, using codewords $x_1, x_2, \ldots, x_j$ which collide in many coordinates helps in upper bounding $|C|$. On the other hand, in a coordinate $i$ where the codewords do not collide, $\tau(G_i^{x_1,\ldots,x_j})$ depends on what fraction of the code uses the remaining $b-j$ symbols in the alphabet. This can be made small "on average" if $x_1, \ldots, x_j$ are picked randomly.
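The construction in (6) and inequality (7) can be checked numerically on a toy code. The sketch below computes the fraction of non-isolated vertices directly from the edge definition and returns both sides of (7); the helper names `tau` and `hansel_sides` and the example code are hypothetical, and base-2 logarithms are used as in the rest of the paper.

```python
from itertools import combinations
from math import log2


def tau(code, xs, i, k):
    """Fraction of non-isolated vertices of G_i^{x_1..x_j}, straight from (6)."""
    j = len(xs)
    vertices = [w for w in code if w not in xs]
    fixed = [x[i] for x in xs]
    touched = set()
    for edge in combinations(vertices, k - j):
        symbols = fixed + [y[i] for y in edge]
        if len(set(symbols)) == k:     # all k symbols in coordinate i are distinct
            touched.update(edge)
    return len(touched) / len(vertices)


def hansel_sides(code, xs, b, k):
    """Left and right hand sides of (7); requires len(xs) <= k - 2."""
    j = len(xs)
    n = len(code[0])
    lhs = log2((b - j) / (k - j - 1)) * sum(tau(code, xs, i, k) for i in range(n))
    rhs = log2((len(code) - j) / (k - j - 1))
    return lhs, rhs


# Toy (3,3)-hash code of length 2 (assumed for illustration), with j = 1.
toy = [(1, 1), (1, 2), (2, 3), (3, 3)]
lhs, rhs = hansel_sides(toy, [(1, 1)], b=3, k=3)
print(lhs, rhs)
```

When the fixed codewords collide in coordinate `i`, no edge can have `k` distinct symbols, so `tau` returns zero for that coordinate, which is the effect exploited in the text.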
More precisely, let $f_i$ be the probability distribution of the $i$-th coordinate of $C$, that is, $f_{i,a}$ is the fraction of elements of $C$ whose $i$-th coordinate is $a$. Then, we have
\[ \tau(G_i^{x_1,\ldots,x_j}) = \begin{cases} 0 & x_1, \ldots, x_j \text{ collide in coordinate } i \\ \left( \frac{|C|}{|C|-j} \right) \left( 1 - \sum_{h=1}^{j} f_{i,x_{h,i}} \right) & \text{otherwise.} \end{cases} \tag{8} \]
So, one can make the left hand side in (7) small by using $x_1, \ldots, x_j$ which collide in many coordinates and at the same time have, in the remaining coordinates, symbols $x_{h,i}$ for which the $f_{i,x_{h,i}}$ are not too small. This can be obtained "on average" if $x_1, \ldots, x_j$ are picked in some random way over the code, since this will force values with large $f_{i,x_{h,i}}$ to appear frequently as the $i$-th coordinate in some of the $x_1, \ldots, x_j$. There are different ways to turn this into a precise argument to bound the right hand side of (7). We refer the reader to [5] for a detailed discussion, and we only discuss here the procedure as used there, since it is the basis for our current contribution.

The idea is to partition the code $C$ into subcodes $C_\omega$, $\omega \in \Omega$. The only requirement is that each subcode has size which grows unbounded with $n$ and uses in any of its first $\ell$ coordinates only $(j-1)$ symbols. It can be shown, by an easy extension of the method used for the case $b = k$ and $j = k-2$ in [5], that if the original code has rate $R$, then for any $\epsilon > 0$ one can do this with a choice of $\ell = n(R-\epsilon)/\log\left(\frac{b}{j-1}\right)$ for $n$ large enough. Given such a partition of our code, if we select codewords $x_1, \ldots, x_j$ within the same subcode $C_\omega$, they will collide in the first $\ell$ coordinates and the corresponding contribution to the l.h.s. of (7) will be zero. We then add the randomization. We pick randomly one of the subcodes $C_\omega$ and randomly select the codewords $x_1, \ldots, x_j$ within $C_\omega$.
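The closed form in (8) only needs the coordinate distribution of the code. A minimal sketch follows, with hypothetical helper names; it treats every remaining codeword whose symbol differs from those of the chosen codewords as non-isolated, which is what (8) expresses, and on the toy code below it agrees with a direct count of non-isolated vertices.

```python
from collections import Counter


def coordinate_distribution(code, i):
    """f_i: fraction of codewords showing each symbol in coordinate i."""
    counts = Counter(w[i] for w in code)
    return {a: c / len(code) for a, c in counts.items()}


def tau_closed_form(code, xs, i):
    """Right hand side of (8) for the chosen codewords xs in coordinate i."""
    symbols = [x[i] for x in xs]
    if len(set(symbols)) < len(symbols):
        return 0.0                      # the chosen codewords collide
    f = coordinate_distribution(code, i)
    size, j = len(code), len(xs)
    # xs are codewords of `code`, so every symbol below has an entry in f
    return size / (size - j) * (1.0 - sum(f[a] for a in symbols))


# Toy (3,3)-hash code of length 2 (assumed for illustration).
toy = [(1, 1), (1, 2), (2, 3), (3, 3)]
print(tau_closed_form(toy, [(1, 1)], 0), tau_closed_form(toy, [(1, 1)], 1))
```

In coordinate 0 half of the code uses the symbol of the fixed codeword, so the value is smaller than in coordinate 1, where only a quarter of the code does.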
We then upper bound the expected value of the left hand side of (7) under this random selection to obtain an upper bound on $|C|$, that is
\[ \log\frac{|C|-j}{k-j-1} \le \log\frac{b-j}{k-j-1}\; \mathbb{E}_\omega\Big( \mathbb{E}\Big[ \sum_{i\in[\ell+1,n]} \tau(G_i^{x_1,\ldots,x_j}) \,\Big|\, \omega \Big] \Big) = \log\frac{b-j}{k-j-1} \sum_{i\in[\ell+1,n]} \mathbb{E}_\omega\big( \mathbb{E}[ \tau(G_i^{x_1,\ldots,x_j}) \mid \omega ] \big) . \tag{9} \]
Here, each subcode $C_\omega$ is taken with probability $\lambda_\omega = |C_\omega|/|C|$, and $x_1, \ldots, x_j$ are taken uniformly at random (without repetitions) from $C_\omega$.

As mentioned before, let $f_i$ be the probability distribution of the $i$-th coordinate of $C$, and let instead $f_{i|\omega}$ be the distribution of the $i$-th coordinate of the subcode $C_\omega$ (with components, say, $f_{i,a|\omega}$). Then, for $i > \ell$, we can write
\[ \mathbb{E}[ \tau(G_i^{x_1,\ldots,x_j}) \mid \omega ] = (1+o(1)) \sum_{\text{distinct } a_1,\ldots,a_j} f_{i,a_1|\omega}\, f_{i,a_2|\omega} \cdots f_{i,a_j|\omega}\, (1 - f_{i,a_1} - \cdots - f_{i,a_j}) \tag{10} \]
where the $o(1)$ is meant as $n \to \infty$ and is due, under the assumption that $C_\omega$ grows unbounded with $n$, to sampling without replacement within $C_\omega$. Now, since $\lambda_\omega = |C_\omega|/|C|$, $f_i$ is actually the expectation of $f_{i|\omega}$ over the random $\omega$, that is, using a different dummy variable $\mu$ to index the subcodes for convenience,
\[ f_i = \sum_\mu \lambda_\mu f_{i|\mu} . \]
Using this in (10), one notices that when taking further expectation over $\omega$ it is possible to operate a symmetrization in $\omega$ and $\mu$. Denote by $\Psi$ the polynomial function defined for two probability distributions $p = (p_1, p_2, \ldots, p_b)$ and $q = (q_1, q_2, \ldots, q_b)$ as
\[ \Psi(p,q) = \frac{1}{2\,(b-j-1)!} \tag{11} \]
\[ \qquad \times \sum_{\sigma \in S_b} \big( p_{\sigma(1)} p_{\sigma(2)} \cdots p_{\sigma(j)}\, q_{\sigma(j+1)} + q_{\sigma(1)} q_{\sigma(2)} \cdots q_{\sigma(j)}\, p_{\sigma(j+1)} \big) . \tag{12} \]
Then the expectation of (10) over $\omega$ can be written as
\[ \mathbb{E}[ \tau(G_i^{x_1,\ldots,x_j}) ] = (1+o(1)) \sum_{\omega,\mu \in \Omega} \lambda_\omega \lambda_\mu \Psi(f_{i|\omega}, f_{i|\mu}) . \tag{13} \]
In [5], the global maximum of the function $\Psi(p,q)$ over arbitrary distributions $p$ and $q$, say
\[ \Psi_{\max} = \max_{p,q} \Psi(p,q) , \tag{14} \]
was used to deduce the inequality, valid for any $i > \ell$,
\[ \mathbb{E}[ \tau(G_i^{x_1,\ldots,x_j}) ] \le (1+o(1))\, \Psi_{\max} . \tag{15} \]
Then
\[ \log|C| \le (1+o(1))\, (n-\ell)\, \Psi_{\max} \log\frac{b-j}{k-j-1} , \tag{16} \]
from which, using the value of $\ell$ described above, one deduces
\[ R \le (1+o(1)) \left[ 1 - \frac{R}{\log\left(\frac{b}{j-1}\right)} \right] \Psi_{\max} \log\frac{b-j}{k-j-1} . \]
This gives the explicit bound
\[ R_{(b,k)} \le \left( \frac{1}{\Psi_{\max} \log\frac{b-j}{k-j-1}} + \frac{1}{\log\left(\frac{b}{j-1}\right)} \right)^{-1} . \tag{17} \]

A weakness in this bound comes from the fact that distributions $p$ and $q$ that maximize $\Psi(p,q)$ could exhibit some opposing asymmetries, in the sense that they give higher probabilities to different symbols. When $\Psi_{\max}$ is used as a replacement for each of the pairs $f_{i|\omega}$ and $f_{i|\mu}$ in (13), we obtain a rather conservative bound, because pairs $(p,q)$ which give high values for $\Psi(p,q)$ will give low values for $\Psi(p,p)$ and $\Psi(q,q)$, and equation (13) contains a weighted contribution from all pairings of $f_{i|\omega}$ and $f_{i|\mu}$. In other words, observing that (13) is a quadratic form in the distribution $\lambda$ with kernel $\Psi(p,q)$, if the kernel has maximum value $\Psi_{\max}$ in some off-diagonal $(p,q)$-positions to which there correspond small "in-diagonal" values at $(p,p)$ and $(q,q)$, then using $\Psi_{\max}$ as a bound for the whole quadratic form can be quite a conservative approach.

In this paper, we approach (13) more carefully by clustering the possible distributions $f_{i|\omega}$ in different groups depending on how balanced or unbalanced they are, and bounding $\Psi(f_{i|\omega}, f_{i|\mu})$ for $f_{i|\omega}$ and $f_{i|\mu}$ in those different groups. From this, we deduce a bound on the quadratic form.
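The kernel $\Psi$ and the "opposing asymmetries" discussed above can be explored numerically. The sketch below is only an illustration under an explicit assumption: it takes $\Psi(p,q)$ to be the symmetrized sum over ordered $(j+1)$-tuples of distinct symbols, normalized by a factor $1/2$, so that the expected value of $\tau$ equals the quadratic form in $\lambda$ with kernel $\Psi$. The function name `psi` and the example distributions are hypothetical.

```python
from itertools import permutations
from math import prod


def psi(p, q, j):
    """Symmetrized kernel Psi(p, q), with an assumed overall factor 1/2.

    Summing over ordered (j+1)-tuples of distinct symbols is the same as
    summing over all permutations of the b symbols and dividing by (b-j-1)!.
    """
    b = len(p)
    total = 0.0
    for s in permutations(range(b), j + 1):
        head_p = prod(p[a] for a in s[:j])
        head_q = prod(q[a] for a in s[:j])
        total += head_p * q[s[j]] + head_q * p[s[j]]
    return total / 2.0


# b = k = 4 with j = k - 2 = 2 (values chosen for illustration).
uniform = (0.25, 0.25, 0.25, 0.25)
p = (1.0, 0.0, 0.0, 0.0)
q = (0.0, 1 / 3, 1 / 3, 1 / 3)
print(psi(uniform, uniform, 2), psi(p, q, 2), psi(p, p, 2), psi(q, q, 2))
```

For the pair `p`, `q` above the off-diagonal value `psi(p, q, 2)` is much larger than the diagonal values `psi(p, p, 2)` and `psi(q, q, 2)`, which is the kind of imbalance that makes a single global maximum a conservative bound for the whole quadratic form.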
Note that since in the problem under consideration (that is, as $n \to \infty$) we have no limit in the granularity of the distributions $f_{i|\omega}$, the quadratic form that we have to bound might in principle have a limiting value which is only achieved with a continuous distribution $\lambda$ over the simplex of $b$-dimensional distributions $\mathcal{P}_b$. Still, once we consider a finite number of clusters $r$ for the distributions $f_{i|\omega}$, our quadratic form is upper bounded by a corresponding $r$-dimensional one. In our derivation, we will use $b+1$ clusters with some symmetric structure which allows us to further reduce the complexity to an equivalent four dimensional form and then to a quadratic in one single variable.

III. BOUNDING THE QUADRATIC FORM
Based on the discussion in the previous Section, we now enter the problem of determining better upper bounds on the right hand side of (13). We simplify here the notation and consider the quadratic form
\[ \sum_{p,q} \lambda_p \lambda_q \Psi(p,q) \tag{18} \]
where $p$ and $q$ run over an arbitrary finite set of points in the simplex $\mathcal{P}_b$ of $b$-dimensional probability distributions and $\lambda$ is a probability distribution over such set. We consider partitions of $\mathcal{P}_b$ in disjoint subsets to find upper bounds on the quadratic form (18) in terms of simpler ones. If we have a partition $\{\mathcal{P}_b^1, \mathcal{P}_b^2, \ldots, \mathcal{P}_b^r\}$ of $\mathcal{P}_b$ and we define
\[ M_{i,h} = \sup_{p \in \mathcal{P}_b^i,\, q \in \mathcal{P}_b^h} \Psi(p,q) , \qquad \eta_i = \sum_{p \in \mathcal{P}_b^i} \lambda_p , \]
then clearly
\[ \sum_{p,q} \lambda_p \lambda_q \Psi(p,q) \le \sum_{i,h} \sum_{p \in \mathcal{P}_b^i} \sum_{q \in \mathcal{P}_b^h} \lambda_p \lambda_q M_{i,h} \le \sum_{i,h} \eta_i \eta_h M_{i,h} . \tag{19} \]
This is a convenient simplification since we now have an $r$-dimensional problem which we might be able to deal with in some computationally feasible way. We will use this procedure with two different partitions in terms of how balanced or unbalanced the distributions are. We take $b+1$ subsets with some symmetry which allows us to further reduce the complexity.

Partition based on maximum value.
We ๏ฌrst consider apartition of P ๐ in terms of the largest probability value whichappears in a distribution. We use a parameter ๐ < /( ๐ โ ) ; allquantities will depend on ๐ but we do not write this in order to avoid cluttering the notation. We de๏ฌne ๐ sets of unbalanceddistributions q P ๐๐ = { ๐ โ P ๐ : ๐ ๐ > โ ๐ } for every โค ๐ โค ๐ , and correspondingly a set of balanceddistributions q P ๐ = { ๐ โ P ๐ : ๐ ๐ โค โ ๐ โ ๐ } . Note that these are all disjoint sets since ๐ < /( ๐ โ ) .Following the scheme mentioned above, we can consider thevalues ๐ ๐,โ and ๐ ๐ for this speci๏ฌc partition. However, due tosymmetry, the values ๐ ๐,โ can be reduced to only four cases,depending on whether ๐ and ๐ are both balanced, one balancedand one unbalanced, or both unbalanced, either on the samecoordinate or on different coordinates.Assuming โค ๐, โ โค ๐ with ๐ โ โ , the following quantitiesare then well de๏ฌned and independent of the speci๏ฌc valueschosen for ๐ and โ q ๐ = sup ๐,๐ โ q P ๐ ฮจ ( ๐, ๐ ) q ๐ = sup ๐ โ q P ๐ ,๐ โ q P ๐๐ ฮจ ( ๐, ๐ ) q ๐ = sup ๐,๐ โ q P ๐๐ ฮจ ( ๐, ๐ ) q ๐ = sup ๐ โ q P ๐๐ ,๐ โ q P โ๐ ฮจ ( ๐, ๐ ) (20)These values can then be used in (19) in place of the values ๐ ๐,โ . Partition based on the minimum value.
We also consider a partition of $\mathcal{P}_b$ using constraints from below. Again we use a parameter $\epsilon$ which will then be tuned. We assume here $\epsilon < 1/b$. Consider now the following disjoint sets of unbalanced distributions
\[ \widehat{\mathcal{P}}_b^i = \{ p \in \mathcal{P}_b : p_i < \epsilon,\ p_h \ge p_i\ \forall h,\ p_h > p_i\ \forall h < i \} \]
for $1 \le i \le b$, that is, distributions in $\widehat{\mathcal{P}}_b^i$ have a minimum component in the $i$-th coordinate, which is smaller than $\epsilon$, and strictly smaller than any of the preceding components (unless of course $i = 1$). Correspondingly, define a set of balanced distributions as
\[ \widehat{\mathcal{P}}_b^0 = \{ p \in \mathcal{P}_b : p_i \ge \epsilon\ \forall i \} . \]
The symmetry argument mentioned before also applies in this case and we can continue in analogy, replacing the $M_{i,h}$ of (19) with the following quantities
\[ \widehat{M}_1 = \sup_{p,q \in \widehat{\mathcal{P}}_b^0} \Psi(p,q) , \quad \widehat{M}_2 = \sup_{p \in \widehat{\mathcal{P}}_b^0,\, q \in \widehat{\mathcal{P}}_b^i} \Psi(p,q) , \quad \widehat{M}_3 = \sup_{p,q \in \widehat{\mathcal{P}}_b^i} \Psi(p,q) , \quad \widehat{M}_4 = \sup_{p \in \widehat{\mathcal{P}}_b^i,\, q \in \widehat{\mathcal{P}}_b^h} \Psi(p,q) \tag{21} \]
where again $1 \le i, h \le b$ with $i \ne h$.

Applying the above scheme with the symmetric partitions we just defined, we can now rewrite the upper bound of equation (19) in the form
\[ \sum_{p,q} \lambda_p \lambda_q \Psi(p,q) \le \eta_0^2 M_1 + 2 \eta_0 \sum_{i>0} \eta_i M_2 + \sum_{i>0} \eta_i^2 M_3 + 2 \sum_{0<i<h} \eta_i \eta_h M_4 . \tag{22} \]
Call $M$ the maximum value achieved by the right hand side of (22) over all possible probability distributions $\eta = (\eta_0, \eta_1, \ldots, \eta_b)$ (which will of course depend on whether we use the $\widehat{M}_i$'s or the $\check{M}_i$'s in place of the $M_i$'s). The optimization of (22), once the values of the $M_i$'s are known, is easy using the standard Lagrange multipliers method (or see Lemma 2 of [17]). We can then replace $\Psi_{\max}$ in (17) with $M$ to derive the bound
\[ R_{(b,k)} \le \left( \frac{1}{M \log\frac{b-j}{k-j-1}} + \frac{1}{\log\left(\frac{b}{j-1}\right)} \right)^{-1} . \]
We will describe in the next Section our procedure to determine, or upper bound, the values $\widehat{M}_i$, $\check{M}_i$ and the corresponding $M$.
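To make the last two steps concrete, the sketch below evaluates the right hand side of (22) for a given cluster weight vector and turns a value of $M$ into a rate bound, next to the Körner-Marton bound (3) for comparison. It rests on assumptions that should be read as such: the coefficients in `rhs_22` follow from summing (19) over ordered pairs of clusters, `rate_bound` uses the form $(1/(M \log\frac{b-j}{k-j-1}) + 1/\log\frac{b}{j-1})^{-1}$ with base-2 logarithms, and all function names and numeric inputs are hypothetical.

```python
from math import log2


def rhs_22(eta, m1, m2, m3, m4):
    """Right hand side of (22) for eta = (eta_0, eta_1, ..., eta_b)."""
    eta0, rest = eta[0], eta[1:]
    s1 = sum(rest)
    s2 = sum(e * e for e in rest)
    cross = s1 * s1 - s2               # equals 2 * sum_{0<i<h} eta_i * eta_h
    return eta0 ** 2 * m1 + 2 * eta0 * s1 * m2 + s2 * m3 + cross * m4


def rate_bound(m, b, k, j):
    """Rate bound with Psi_max replaced by m; needs 2 <= j <= k - 2."""
    first = 1.0 / (m * log2((b - j) / (k - j - 1)))
    second = 1.0 / log2(b / (j - 1))
    return 1.0 / (first + second)


def korner_marton(b, k):
    """Bound (3): minimum over j = 0, ..., k - 2."""
    best = float("inf")
    falling, power = 1.0, 1.0
    for j in range(k - 1):
        falling *= b - j               # b (b-1) ... (b-j)
        power *= b                     # b^(j+1)
        best = min(best, falling / power * log2((b - j) / (k - j - 1)))
    return best


# Illustration for b = k = 4, j = 2, with all four M values set to 3/8.
eta = (0.4, 0.1, 0.2, 0.2, 0.1)
print(rhs_22(eta, 0.375, 0.375, 0.375, 0.375))
print(rate_bound(0.375, 4, 4, 2), korner_marton(4, 4))
```

When the four constants coincide, the right hand side of (22) collapses to that common value for every weight vector, so the clustering can only help when some of the $M_i$ are strictly smaller than $\Psi_{\max}$. With $M = 3/8$ and $(b,k,j) = (4,4,2)$ the rate bound evaluates to $6/19$, which is also what (4) gives for $b = k = 4$.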
Here we only state the obtained results.

Using the partition based on the maximum value $\{\check{\mathcal{P}}_b^i\}_{i=0,\ldots,b}$ we obtain the following theorem.

Theorem 1:
We have $R_{( , )} \le$ . , $R_{( , )} \le$ . , $R_{( , )} \le$ . , $R_{( , )} \le$ . , $R_{( , )} \le$ . .

Using the partition based on the minimum value $\{\widehat{\mathcal{P}}_b^i\}_{i=0,\ldots,b}$ we obtain the following theorem.

Theorem 2: We have $R_{( , )} \le$ . , $R_{( , )} \le$ . , $R_{( , )} \le$ $\approx$ . .

Based on the results in [7], on its generalization given in equation (4) and on Theorem 2 when $(b,k) =$ ( , ), we are led to formulate the following conjecture.

Conjecture 1:
For $b \ge k > 3$,
\[ R_{(b,k)} \le \min_{2 \le j \le k-2} \left( \frac{1}{\log\frac{b}{j-1}} + \frac{b^{j+1}}{b^{\underline{j+1}} \log\frac{b-j}{k-j-1}} \right)^{-1} . \]
Note that the conjectured expression can be seen as a modification of the Körner-Marton bound in (3) which takes into account the effects of prefix-based partitions.

IV. COMPUTATION OF $M$

Thanks to a straightforward generalization of some lemmas stated and proved in [17], we have determined and inspected using Mathematica all the possible maximum points (see the Appendices in [17]) in which each $\check{M}_i$ (or $\widehat{M}_i$) can be attained, obtaining the following propositions.

Proposition 1:
For ๐ = ๐ โ , we have that ( ๐, ๐ ) ๐ | ๐ | ๐ | ๐ | ๐ ( , ) / ( , ) / ( , ) / ( , ) / . ยท โ . ยท โ ( , ) / . ยท โ . ยท โ | ๐ attained at ( ๐ , . . . , ๐ ; ๐ , . . . , ๐ ) | ๐ attained at ( , , . . . ,
0; 0 , ๐ โ , . . . , ๐ โ ) | ๐ attained at ( โ ๐ , ๐๐ โ , . . . , ๐๐ โ ; 1 โ ๐ , ๐๐ โ , . . . , ๐๐ โ ) | ๐ attained at ( โ ๐ , ๐๐ โ , . . . , ๐๐ โ ,
0; 0 , ๐๐ โ , . . . , ๐๐ โ , โ ๐ ) Proposition 2:
For ๐ = , ( ๐, ๐ ) = ( , ) and ๐ = ( + โ ) we have that (cid:99) ๐ ๐ Attained at point ( ๐ ; ๐ ) Values โ (cid:99) ๐ ( ๐ , โ ๐๐ โ , . . . , โ ๐๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ โ . (cid:99) ๐ ( , ๐ โ , . . . , ๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ = ๐ (cid:99) ๐ ( ๐ , โ ๐๐ โ , . . . , โ ๐๐ โ , ๐ , ๐ผ, . . . , ๐ผ, ๐ฝ ) , ๐ฝ โ . (cid:99) ๐ ( , ๐ โ , . . . , ๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ = ๐ For ๐ = , ( ๐, ๐ ) = ( , ) and ๐ = we have that (cid:99) ๐ ๐ Attained at point ( ๐ ; ๐ ) Values โ (cid:99) ๐ ( ๐ , โ ๐๐ โ , . . . , โ ๐๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ โ . (cid:99) ๐ ( , ๐ โ , . . . , ๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ โ . (cid:99) ๐ ( ๐ , โ ๐๐ โ , . . . , โ ๐๐ โ , ๐ , ๐ผ, . . . , ๐ผ, ๐ฝ ) , ๐ฝ โ . (cid:99) ๐ ( , ๐ โ , . . . , ๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ โ . For ๐ = , ( ๐, ๐ ) = ( , ) and ๐ = we have that (cid:99) ๐ ๐ Attained at point ( ๐ ; ๐ ) Values โ (cid:99) ๐ ( ๐ , . . . , ๐ ; ๐ , . . . , ๐ ) (cid:99) ๐ ( ๐ , โ ๐๐ โ , . . . , โ ๐๐ โ ; ๐พ, ๐ฟ, . . . , ๐ฟ ) , ๐ฟ โ . (cid:99) ๐ ( ๐ , , โ ๐๐ โ , . . . , โ ๐๐ โ ; 0 , , , . . . , ) (cid:99) ๐ ( , , . . . ,
0; 0 , ๐ โ , . . . , ๐ โ ) . The values reported for (cid:98) ๐ are not approximate values of theexact values of (cid:98) ๐ but, instead, they are upper bounds. Remark 1:
We point out that the value $\widehat{M}_1$ for $(b,k) =$ ( , ) is only attained for uniform distributions.

As a consequence of Propositions 1, 2 and equation (22) we are able to evaluate the values of $M$ for both the partitions $\{\check{\mathcal{P}}_b^i\}_{i=0,\ldots,b}$ and $\{\widehat{\mathcal{P}}_b^i\}_{i=0,\ldots,b}$. Then we state the following theorem.

Theorem 3:
Using the partition $\{\check{\mathcal{P}}_b^i\}_{i=0,\ldots,b}$ we get
• for $(b,k) =$ ( , ) we have that $M \approx$ . ;
• for $(b,k) =$ ( , ) we have that $M \approx$ . ;
• for $(b,k) =$ ( , ) we have that $M \approx$ . ;
• for $(b,k) =$ ( , ) we have that $M \approx$ . ;
• for $(b,k) =$ ( , ) we have that $M \approx$ . .

Using the partition $\{\widehat{\mathcal{P}}_b^i\}_{i=0,\ldots,b}$ we get
• for $(b,k) =$ ( , ) we have that $M \approx$ . ;
• for $(b,k) =$ ( , ) we have that $M \approx$ . ;
• for $(b,k) =$ ( , ) we have that $M =$ $\approx$ . .

For the values of $(b,k)$ reported in Table I, except the cases in which $b =$ , $b = k =$ , , and $(b,k) =$ ( , ), ( , ), ( , ), it is interesting to note that the bounds in bold (the generalized bounds [5] or [6]) are achieved for uniform distributions. This means that, for these particular cases, any new upper bounds that can be found on the quadratic form in equation (13) cannot further improve those bounds. However, for such globally balanced codes, one can use a different argument based on the minimum distance of the code to get even stronger upper bounds. A proof that $R_{( , )} <$ / , based on the Aaltonen bound [1], can be found in [17].

REFERENCES
[1] M. Aaltonen, A new upper bound on nonbinary block codes, Discrete Math., vol. 83, pp. 139-160, 1990.
[2] E. Arikan, An upper bound on the zero-error list-coding capacity, IEEE Transactions on Information Theory (1994), 1237-1240.
[3] E. Arikan, An improved graph-entropy bound for perfect hashing, IEEE International Symposium on Information Theory (1994).
[4] S. Bhandari and J. Radhakrishnan, Bounds on the Zero-Error List-Decoding Capacity of the q/(q-1) Channel.
[5] S. Costa and M. Dalai, New bounds for perfect k-hashing, in press, Discrete Applied Mathematics, 2020.
[6] M. Dalai, V. Guruswami, and J. Radhakrishnan, An improved bound on the zero-error list-decoding capacity of the 4/3 channel, IEEE International Symposium on Information Theory (ISIT) (2017), 1658-1662.
[7] M. Dalai, V. Guruswami, and J. Radhakrishnan, An improved bound on the zero-error list-decoding capacity of the 4/3 channel, IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 749-756, Feb. 2020.
[8] P. Elias, Zero error capacity under list decoding, IEEE Transactions on Information Theory (1988), 1070-1074.
[9] M. L. Fredman and J. Komlós, On the Size of Separating Systems and Families of Perfect Hash Functions, SIAM Journal on Algebraic Discrete Methods (1984), 61-68.
[10] V. Guruswami and A. Riazanov, Beating Fredman-Komlós for perfect k-hashing, Leibniz International Proceedings in Informatics (2019).
[11] G. Hansel, Nombre minimal de contacts de fermeture nécessaires pour réaliser une fonction booléenne symétrique de n variables, C. R. Acad. Sci. Paris, pp. 6037-6040, 1964.
[12] J. Körner and K. Marton, New Bounds for Perfect Hashing via Information Theory, European Journal of Combinatorics (1988), 523-530.
[13] J. Körner, Fredman-Komlós bounds and information theory, SIAM Journal on Algebraic Discrete Methods (1986), 560-570.
[14] A. Nilli, Perfect hashing and probability, Combinatorics, Probability and Computing.
[16] arXiv preprint arXiv:1908.08792 (2019).
[17] S. Della Fiore, S. Costa and M. Dalai, Further strengthening of upper bounds for perfect k-Hashing, arXiv preprint arXiv:2012.00620.