Variable-Length Coding for Zero-Error Channel Capacity
Nicolas Charpenay∗ and Maël Le Treust†
ETIS UMR 8051, Université Paris Seine, Université Cergy-Pontoise, ENSEA, CNRS, 6, avenue du Ponceau, 95014 Cergy-Pontoise CEDEX, FRANCE
Email: {nicolas.charpenay ; mael.le-treust}@ensea.fr

∗ Nicolas Charpenay gratefully acknowledges financial support from ENS Paris-Saclay.
† Maël Le Treust gratefully acknowledges financial support from INS2I CNRS, DIM-RFSI, SRV ENSEA, UFR-ST UCP, The Paris Seine Initiative and IEA Cergy-Pontoise. This research has been conducted as part of the project Labex MME-DII (ANR11-LBX-0023-01).
Abstract
The zero-error channel capacity is the maximum asymptotic rate that can be reached with error probability exactly zero, instead of a vanishing error probability. The nature of this problem, essentially combinatorial rather than probabilistic, has motivated a variety of investigations both in Information Theory and Combinatorics. However, the zero-error capacity is still an open problem; for example, the capacity of the noisy-typewriter channel with 7 letters is unknown. In this article, we propose a new approach to construct optimal zero-error codes, based on the concatenation of words of variable length, taken from a generator set. Three zero-error variable-length coding schemes, referred to as "variable-length coding", "intermingled coding" and "automata-based coding", are under study. We characterize their asymptotic performances via linear difference equations, in terms of simple properties of the generator set, e.g. the roots of the characteristic polynomial, the spectral radius of an adjacency matrix, the inverse of the convergence radius of a generator series. For a specific example, we construct an "intermingled" coding scheme that achieves asymptotically the zero-error capacity.

Index Terms
Zero-Error Information Theory, Channel Coding, Analytic Combinatorics, Graph Theory, Combinatorics on Words, Linear Difference Equations, Automata
I. INTRODUCTION
In [1], Shannon investigates zero-error information transmission by considering codes that must allow for a correct decoding with probability one, instead of an asymptotic probability one. This subtle difference radically changes the nature of the problem, as the exact values of the non-null transition probabilities of the channel do not appear anymore. Shannon defined the zero-error capacity of a channel as the maximum asymptotic rate that can be reached with error probability exactly zero. The characterization of the zero-error capacity of an arbitrary channel is a wide open problem, which shares deep connections with Graph Theory. Equivalently, the zero-error capacity
is the asymptotic limit of the independence number of iterated strong products of channel graphs. This problem inspired Berge's notion of perfect graphs [2, Chap. 16], for which the zero-error capacity is given by the one-shot independence number [3, Theorem 4.18]. Over time, this open problem has attracted a lot of attention both from the Information Theory and Combinatorics communities, see [4, Chap. 27].

A. Motivations
The zero-error capacity problem has several applications. In data center storage systems, the automatic treatment of large amounts of data imposes reliability constraints which do not tolerate any positive probability of error, even if arbitrarily small. In [5], Kovačević investigated the zero-error capacity of the duplication channels in DNA-based data storage systems, and in [6], [7], of the timing channels in molecular communication.

The analysis of the zero-error capacity problem provides an important insight into the exponential decrease of the error probability for the classical channel coding problem, see the discussion in [8, pp. 203]. In particular, it shares deep connections with the channel coding problem in the finite-blocklength regime [9], of interest for the transmission of short packets in IoT networks. For example, the error exponent goes to infinity when the coding rate approaches the largest rate of a zero-error code [10], [11].

When considering channels with memory [12], the zero-error capacity problem covers the problems of coding for error correction [13]. In [14], [15], Dalai uses zero-error capacity tools in order to bound the minimum distance of codes. In [16], Bose et al. describe codes which achieve the zero-error capacity of limited-magnitude error channels.
B. Tools and bounds for zero-error capacity
In [1], Shannon gives a sufficient condition to determine the zero-error capacity, based on the existence of an adjacency-reducing mapping for the channel graph. This condition applies to almost all channels with 5 symbols or less, and boils down to specific instances of tessellation covers, studied by Abreu et al. in [17]. In [18], Lovász gives bounds on the zero-error capacity with the well-known $\theta$ function, and determines the capacity for the specific class of auto-complementary and vertex-transitive channel graphs. The $\theta$ number possesses the property of being multiplicative with respect to the strong product. Another graph invariant, called the Rosenfeld number [19], also presents this property. In [20], Hales studies this number and shows that Shannon's adjacency-reducing mapping condition in [1] is not necessary. In [21], Haemers proves another upper bound on the zero-error capacity of a channel, based on the rank of the adjacency matrix of the channel graph. In [22], Alon builds a counterexample for the zero-error capacity of a union of channels, disproving the conjecture formulated by Shannon in [1], which states that the zero-error capacity, defined without the logarithm, is additive with respect to the disjoint union of channels. In [23], the authors study a variant of the zero-error capacity problem in which the strong product of graphs is replaced by the direct product. Instead of considering the independence number of product graphs, Hahn et al. study in [24] the ratio between the independence number and the number of vertices, and they derive bounds based on the chromatic and the fractional chromatic numbers. In [25], Körner and Orlitsky give a review of the literature on zero-error capacity and its variants. The zero-error capacity of $C_7$, corresponding to the noisy-typewriter channel with 7 letters, is unknown. Some recent lower and upper bounds are stated in [26].

C. About cycles
In [27], Gallai extends Shannon's condition to bipartite channel graphs, that is, graphs without odd cycles, leading to further interest in determining the zero-error capacity of odd cycles. The product of odd cycles of different sizes is investigated by Sonnemann and Krafft in [28], and by Vesel in [29], while Bohman and Holzman study the complementary graphs of odd cycles in [30]. The cycle channel graph with 7 vertices, denoted by $C_7$, is depicted in Fig. 1. Its zero-error capacity is still unknown, despite several attempts to build zero-error codes on this channel, see [31], [32], and [26]. In Fig. 2, we depict the best known bounds on the zero-error rate for a small number of channel uses, for the channel graphs $C_5$ and $C_7$. Another interpretation of the zero-error capacity problem for cycles is the tiling problem, studied in [33], whose solutions also provide upper bounds for every finite number of channel uses. In [34], Badalyan and Markosyan determine the maximum independent sets of products of cycle-powers, that are cycles with edges added towards the vertices at distance at most $k$. In [35] and [36], Bohman characterizes the asymptotic zero-error capacity of odd cycles, when the size of the cycle goes to infinity. In [37], Mathew and Östergård improve several lower bounds on the capacities of odd cycles using stochastic search methods.

The graphs with odd cycles are also related to Berge's conjecture [38], later proved in [39] by Chudnovsky et al., namely "a graph $G$ is perfect if and only if neither $G$ nor its complementary graph $\bar{G}$ contains an induced odd cycle of length 5 or more". Since the zero-error capacity of the cycle graph $C_5$ is known, as well as the zero-error capacity of perfect graphs, see [3, Theorem 4.18], $C_7$ is the minimal connected graph for which the zero-error capacity is still an open problem.

D. Variable-length coding
In this paper, we investigate the zero-error capacity by constructing variable-length coding schemes. In [40], Shannon determines the asymptotic performances of variable-length coding via the characteristic polynomial of linear difference equations. In [41], Weidmann et al. study a variable-length arithmetic coding scheme for a joint source-channel coding scenario. In [42], Flajolet and Sedgewick investigate rational expressions and variable-length coding schemes through their respective generating series. In [43], Asadi and Devroye propose a zero-error variable-length communication scheme, assuming that the transmitter has a perfect, but rate-limited, channel feedback. In [44], Guo and Watanabe present a family of graphs where no finite-length code can achieve the zero-error capacity.
Fig. 2: Best known lower and upper bounds on the maximum zero-error rate achievable for a small number of channel uses, see [18], [26], [30]–[33], [37].
E. Recent information-theoretic literature
In [45]–[47], Devroye et al. investigate the zero-error capacity of the primitive relay channel, by proposing a one-shot relaying scheme, termed color-and-forward. They highlight the connection to the zero-error source coding problem with receiver's side information, studied by Witsenhausen in [48]. In [49], the authors define a new notion of product of graphs that allows to compute recursively the optimal relaying scheme. Several multi-user channels, such as the relay, the multiple-access, the broadcast, and the interference channels, are investigated in [50], where necessary and sufficient conditions regarding the positivity of the zero-error capacity are provided. The zero-error capacity with noisy channel feedback is studied in [51] and [52], where dynamic programming provides lower and upper bounds.

In [53], [54], Wang and Shayevitz investigate the combination of zero-error source and channel coding schemes, by introducing the notion of "graph information ratio", also related to the relative Shannon capacity of two graphs, introduced by Körner and Marton in [55]. In [56], Hu and Shayevitz investigate the zero-error broadcasting problem by introducing the $\rho$-capacity function, for which upper and lower bounds are derived. In [57], Ordentlich and Shayevitz investigate the zero-error capacity region of the multiple access channel called the binary adder. They provide a new outer bound that strictly improves upon the bound obtained by Urbanke and Li in [58].

In [59]–[61], Wiese et al. define zero-error wiretap codes by requiring that every output at the eavesdropper can be generated by at least two inputs. They define the zero-error secrecy capacity as the supremum of rates for which there exists a zero-error wiretap code, and they show it either equals zero or the zero-error capacity of the channel between the encoder and the legitimate receiver.

In [62], Ruiz and Pérez-Cruz construct linear codes over rings, and provide a lower bound on the zero-error capacity of the noisy-typewriter channel with an odd number of letters of the form $2^n + 1$, that outperforms Bohman's bound in [30]. In [63], Cullina et al. introduce a different notion of product of graphs by removing edges between sequences which differ in more than $d$ positions. They provide upper and lower bounds on the asymptotic independence number of such an iterated product of graphs. In [64], Dalai improves the bound on the zero-error list-decoding capacity introduced by Elias in [65]. In [66], Xu and Radziszowski study the construction of lower bounds for multicolor Ramsey numbers of product graphs, and their relation to the zero-error capacity. In particular, the authors prove that the supremum of the zero-error capacity over all graphs with independence number equal to 2 cannot be achieved by any finite graph power.

F. Scenarios and contributions
In this paper, we design three coding algorithms that are based on a generator set of zero-error words, referred to as variable-length, intermingled, and automata-based coding schemes. We characterize their respective asymptotic performances via the root of the characteristic polynomial, in Theorem III.4, the spectral radius of the adjacency matrix, in Theorem IV.5, and the inverse of the convergence radius of the generator series, in Theorem VI.10.
• The variable-length coding scheme, in Sec. III, produces channel input sequences by concatenation of zero-error words from a generator set.
• The intermingled coding scheme, in Sec. IV, allows to stop the transmission of a word and switch to another word from the generator set. Thus, additional information is embedded in the positions of such stops and switches.
• An example with the channel graph $C_5 \boxplus 1$ is stated in Sec. V. We construct explicitly the generator set of zero-error words for which the asymptotic rate of the intermingled coding achieves the zero-error capacity.
• The automata-based coding scheme, in Sec. VI, generalizes the two previous algorithms by allowing multiple interleavings of the same word from the generator set.
The paper is organized as follows. The definitions of the zero-error channel capacity and of the maximum independence number of the product graph of the channel are stated in Sec. II. The variable-length coding and the intermingled coding are studied in Sec. III and IV. In Sec. V, we provide an example based on the channel graph $C_5 \boxplus 1$. The automata-based coding scheme is investigated in Sec. VI. The proofs are stated in the Appendices.

II. PRESENTATION OF THE MODEL
A. Notations

⬩ Given a finite set $\mathcal{A}$, we denote by $\mathcal{P}(\mathcal{A})$ its power set, and by $|\mathcal{A}|$ its cardinality.
⬩ We use the following notations for matrix slicing. Given a matrix $M \in \mathcal{M}_{n,p}(\mathbb{R})$, we define
$$M_{:,j} \doteq (M_{1,j}, \ldots, M_{n,j})^T \quad \text{and} \quad M_{i,:} \doteq (M_{i,1}, \ldots, M_{i,p}). \qquad (1)$$
We use the same notation for tuples or words: let $w = x_1 \cdots x_{|w|}$ be a word over the alphabet $\mathcal{X}$, we denote by $|w|$ the length of $w$ and we define
$$w_{i:} = x_i x_{i+1} \ldots x_{|w|} \quad \text{and} \quad w_{:j} = x_1 \ldots x_j. \qquad (2)$$
⬩ We denote by $\mathrm{supp}$ the support of a vector, that is, the set of the indices of its non-null components.
⬩ Let $A \in \mathcal{M}_{m,n}(\mathbb{R})$ and $B \in \mathcal{M}_{p,q}(\mathbb{R})$ be two matrices, we denote by $A \otimes B \in \mathcal{M}_{mp,nq}(\mathbb{R})$ their Kronecker product, that is,
$$A \otimes B \doteq (A_{i_1,j_1} B_{i_2,j_2})_{(i_1,i_2) \in \llbracket 1,m \rrbracket \times \llbracket 1,p \rrbracket, \, (j_1,j_2) \in \llbracket 1,n \rrbracket \times \llbracket 1,q \rrbracket}, \qquad (3)$$
and we also denote $A^{\otimes L} \doteq A \otimes \ldots \otimes A$ ($L$ times).

B. Zero-error capacity
We consider a Discrete Memoryless Channel (DMC) where $\mathcal{X}$ denotes the input alphabet, $\mathcal{Y}$ denotes the output alphabet, and $W = (W_{x,y})_{x \in \mathcal{X}, y \in \mathcal{Y}}$ denotes the transition probabilities.

Definition II.1 (Channel graph)
The channel graph $G_W \doteq (V(G_W), E(G_W))$ is defined by the input alphabet as set of vertices, $V(G_W) = \mathcal{X}$, and $xx' \in E(G_W)$ if $W_{x,y} > 0$ and $W_{x',y} > 0$ for some output $y \in \mathcal{Y}$. Two inputs $x$ and $x'$ are distinguishable if they satisfy $W_{x,y} = 0$ or $W_{x',y} = 0$ for all outputs $y$; equivalently,
$$x \text{ and } x' \text{ are distinguishable} \iff \max_{y \in \mathcal{Y}} \min(W_{x,y}, W_{x',y}) = 0 \iff xx' \notin E(G_W). \qquad (4)$$
A family of inputs is distinguishable if its elements are pairwise distinguishable.
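As an illustration of Definition II.1, the following Python sketch (ours, with a hypothetical transition matrix; numpy is assumed available) builds the edge set of $G_W$ from the positivity pattern of $W$: two inputs are connected exactly when some output has positive probability under both.

```python
import numpy as np

# Hypothetical 3-input, 3-output channel: inputs 0 and 1 share output 1,
# inputs 1 and 2 share output 2, so only the pair (0, 2) is distinguishable.
W = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.4, 0.6],
              [0.0, 0.0, 1.0]])

def channel_graph_edges(W):
    """Edges xx' of G_W: x != x', and W_{x,y} > 0 and W_{x',y} > 0 for some y."""
    n = W.shape[0]
    return {(x, xp) for x in range(n) for xp in range(x + 1, n)
            if np.any((W[x] > 0) & (W[xp] > 0))}

print(channel_graph_edges(W))   # {(0, 1), (1, 2)}
```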
The channel graph is the main tool for the characterization of the zero-error capacity, that is, the maximum asymptotic number of bits that can be transmitted with zero error. Before stating the definition, we introduce Fekete's Lemma, see [67, Lemma 11.6, pp. 103].
Lemma 1 (Fekete)
For every superadditive sequence $(u_l)_{l \in \mathbb{N}}$, i.e. $u_{l+l'} \geq u_l + u_{l'}$ for all $(l, l')$, the limit $\lim_{l \to \infty} \frac{u_l}{l}$ exists and is equal to $\sup_l \frac{u_l}{l}$ (it can be $+\infty$).

Definition II.2 (Zero-error capacity)
Let $N(W, L)$ be the maximum number of distinguishable inputs for $L$ uses of the channel $W$. Equivalently, $N(W, L)$ is the value of the optimization problem
$$\max_{\mathcal{N} \subseteq \mathcal{X}^L} |\mathcal{N}| \qquad (5)$$
$$\text{s.t.} \quad \max_{y \in \mathcal{Y}^L} \big|\mathrm{supp}\big( (W^{\otimes L}_{x,y})_{x \in \mathcal{N}} \big)\big| \leq 1. \qquad (6)$$
Then the zero-error capacity of $W$ is defined by
$$C(W) \doteq \limsup_{L \to \infty} \frac{1}{L} \log_2 N(W, L) \qquad (7)$$
$$\overset{[\text{Fekete}]}{=} \sup_L \frac{1}{L} \log_2 N(W, L), \qquad (8)$$
where the second equality comes from Fekete's lemma, as the sequence $\log_2 N(W, \cdot)$ is superadditive for all $W$. Indeed, for $L + L'$ channel uses, there exist at least $N(W, L) \cdot N(W, L')$ distinguishable inputs, obtained by using successively the codebook for $L$ channel uses and the codebook for $L'$ channel uses.
C. Maximum independent set of the strong product channel graph
The zero-error capacity relates to the graph-theoretic notions of strong product and maximum independent set.

Definition II.3 (Strong product ⊠) Let $G \doteq (V(G), E(G))$ and $G' \doteq (V(G'), E(G'))$, we define the strong product, i.e. the AND product, $G \boxtimes G' \doteq (V(G \boxtimes G'), E(G \boxtimes G'))$ by
$$V(G \boxtimes G') \doteq V(G) \times V(G'), \qquad (9)$$
$$\forall (v_1, v'_1) \neq (v_2, v'_2), \quad (v_1, v'_1)(v_2, v'_2) \in E(G \boxtimes G') \ \text{if} \ \big( v_1 v_2 \in E(G) \text{ or } v_1 = v_2 \big) \text{ AND } \big( v'_1 v'_2 \in E(G') \text{ or } v'_1 = v'_2 \big). \qquad (10)$$
As an example, if $G = G'$ are path channel graphs, then $G \boxtimes G'$ is the king's graph, corresponding to two channel uses.

We denote by $G_W^{\boxtimes L} = G_W \boxtimes \ldots \boxtimes G_W$ the $L$-times iterated strong product, and we give several equivalent interpretations for $x, x' \in \mathcal{X}^L$:
⬩ $(x_l)_{l \leq L}$ and $(x'_l)_{l \leq L}$ are not distinguishable.
⬩ For all $l \leq L$, there exists $y_l$ such that $W_{x_l, y_l} > 0$ and $W_{x'_l, y_l} > 0$.
⬩ $xx' \in E(G_W^{\boxtimes L})$.
⬩ $\langle W^{\otimes L}_{x,:}, W^{\otimes L}_{x',:} \rangle > 0$.

Definition II.4 (Disjoint union ⊞) Given two graphs $G = (V(G), E(G))$ and $G' = (V(G'), E(G'))$, we define $G \boxplus G' = (V(G \boxplus G'), E(G \boxplus G'))$ to be the disjoint union of $G$ and $G'$, that is:
$$V(G \boxplus G') = V(G) \cup V(G'), \qquad (11)$$
$$vv' \in E(G \boxplus G') \ \text{if} \ \big( v, v' \in V(G) \text{ and } vv' \in E(G) \big) \text{ OR } \big( v, v' \in V(G') \text{ and } vv' \in E(G') \big). \qquad (12)$$
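For numerical experiments, the strong product can be assembled directly from adjacency matrices: for simple graphs, $A_{G \boxtimes G'} = (A_G + I) \otimes (A_{G'} + I) - I$, with $\otimes$ the Kronecker product of Sec. II-A. A minimal sketch (the helper names are ours):

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of the cycle graph C_n."""
    A = np.zeros((n, n), dtype=int)
    for v in range(n):
        A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1
    return A

def strong_product(A, B):
    """Adjacency matrix of G ⊠ G', using (A_G + I) ⊗ (A_G' + I) - I."""
    n, p = A.shape[0], B.shape[0]
    return (np.kron(A + np.eye(n, dtype=int), B + np.eye(p, dtype=int))
            - np.eye(n * p, dtype=int))

A2 = strong_product(cycle_adjacency(5), cycle_adjacency(5))   # two uses of C_5
print(A2.shape, A2[0].sum())   # (25, 25) 8: every vertex of C_5 ⊠ C_5 has 8 neighbors
```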
Remark II.5 Since the set of finite graphs equipped with the laws ⊞, ⊠ has a semiring structure, we denote by $1$ the graph with one vertex and by $0$ the graph with zero vertices.

Definition II.6 (Independent set)
An independent set $S$ is a subset of $V(G)$ such that $\forall s, s' \in S$, $ss' \notin E(G)$.

Definition II.7 (Independence number $\alpha$) The independence number of a graph $G$ is the size of the largest independent set of $G$. It is denoted by $\alpha(G)$.
Proposition II.8
The maximum number of distinguishable inputs $N(W, L)$ is the independence number $\alpha(G_W^{\boxtimes L})$ of the product graph $G_W^{\boxtimes L}$. Thus it makes sense to work directly on channel graphs, independently of the channel transition probabilities $W = (W_{x,y})_{x \in \mathcal{X}, y \in \mathcal{Y}}$ that generated the channel graph.
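Proposition II.8 can be checked numerically for small $L$. Reusing the helpers sketched in Sec. II-C, a naive branch-and-bound computation of the independence number recovers $N(W, 1) = 2$ and $N(W, 2) = 5$ for the pentagon $C_5$ (a sketch; exact, but only practical for small products):

```python
import numpy as np

def alpha(A):
    """Independence number by naive branch and bound (fine for ~25 vertices)."""
    neighbors = [set(np.flatnonzero(A[v])) for v in range(A.shape[0])]

    def best(cand):
        if not cand:
            return 0
        v = min(cand)
        rest = cand - {v}
        # Branch: either v is left out, or v is taken and its neighbors dropped.
        return max(best(rest), 1 + best(rest - neighbors[v]))

    return best(set(range(A.shape[0])))

A5 = cycle_adjacency(5)
print(alpha(A5))                        # 2 = N(W, 1)
print(alpha(strong_product(A5, A5)))    # 5 = N(W, 2), e.g. Shannon's code {00, 12, 24, 31, 43}
```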
III. VARIABLE-LENGTH CODING

In this section, we introduce a variable-length coding scheme based on a generator set of words, tailored for zero-error transmission. We characterize the asymptotic coding rate via the unique positive root of the characteristic polynomial of the generator set.
Definition III.1 (Set of words)
For a given finite set $S$, we define the set of words over $S$ by
$$S^* \doteq \bigcup_{l \in \mathbb{N}} S^l, \qquad (13)$$
with the usual concatenation law; the neutral element for that law is the empty word, denoted by $\epsilon$. The length of a word $w \in S^*$ is the integer $l$ such that $w \in S^l$ and is denoted by $|w|$. For a given subset $S' \subseteq S^*$, and integers $l, l'$ such that $l \leq l'$, we define
$$S'[l] \doteq \{ w \in S' \mid |w| = l \}, \qquad (14)$$
$$S'[l : l'] \doteq \{ w \in S' \mid l \leq |w| \leq l' \}. \qquad (15)$$
The variable-length coding is based on the following idea: instead of determining the maximum number of distinguishable inputs over $L$ channel uses when $L$ goes to infinity, we consider codes in $\mathcal{X}^*$ and we determine the asymptotic number of transmitted symbols per channel use.

Definition III.2 (Generator set)
The generator set is a finite subset $\mathcal{C}$ of $\mathcal{X}^*$ composed of words of variable length. The generator set $\mathcal{C}$ is zero-error for the channel $W$ if
$$\forall c, c' \in \mathcal{C} \text{ such that } c \neq c' \text{ and } |c| \leq |c'|, \quad c \, c'_{:|c|} \notin E(G_W^{\boxtimes |c|}), \qquad (16)$$
with the convention that the vertices of $G_W^{\boxtimes |c|}$ are auto-adjacent. At the end of its transmission, the word $c \in \mathcal{C}$ is distinguishable from any other word $c' \in \mathcal{C}$.

The variable-length coding scheme produces the channel input sequences by concatenating the words from the generator set $\mathcal{C}$. The set of all possible channel input sequences of length $L$, constructed with such a procedure, is denoted by $\mathcal{C}^*[L]$. The zero-error property extends naturally from the generator set $\mathcal{C}$ to the set of channel input sequences $\mathcal{C}^*[L]$.

Definition III.3 (Asymptotic rate of variable-length codes)
We consider a generator set $\mathcal{C}$ that is zero-error for the channel $W$. The asymptotic rate of $\mathcal{C}$ is defined by
$$r(\mathcal{C}) \doteq \lim_{L \to \infty} \frac{1}{L} \log_2 |\mathcal{C}^*[L]| \overset{[\text{Fekete}]}{=} \sup_{L \in \mathbb{N}} \frac{1}{L} \log_2 |\mathcal{C}^*[L]|. \qquad (17)$$
The average number of transmitted symbols per channel use is defined by
$$\nu(\mathcal{C}) \doteq \lim_{L \to \infty} \sqrt[L]{|\mathcal{C}^*[L]|} \overset{[\text{Fekete}]}{=} \sup_{L \in \mathbb{N}} \sqrt[L]{|\mathcal{C}^*[L]|} = 2^{r(\mathcal{C})}. \qquad (18)$$
We can apply Fekete's lemma only if the values of $\log_2 |\mathcal{C}^*[l]|$ are finite for all $l$ large enough, i.e. if and only if $\gcd(|c|, c \in \mathcal{C}) = 1$. When $\gcd(|c|, c \in \mathcal{C}) = d \neq 1$, we define the rate as
$$r(\mathcal{C}) \doteq \lim_{L \to \infty} \frac{1}{dL} \log_2 |\mathcal{C}^*[dL]|, \qquad (19)$$
and we take again $\nu(\mathcal{C}) \doteq 2^{r(\mathcal{C})}$ with this new definition.

The asymptotic rate $r(\mathcal{C})$ corresponds to the asymptotic number of bits transmitted per channel use by concatenating the variable-length words from the generator set $\mathcal{C}$. For each generator set $\mathcal{C}$, we have $r(\mathcal{C}) \leq C(W)$ if $\mathcal{C}$ is zero-error.

Theorem III.4 (Rate computation of variable-length codes)
Let $W$ be a DMC and $\mathcal{C} \subseteq \mathcal{X}^*$ a generator set that is zero-error for the channel $W$. We denote by $\bar{l} > 0$ (resp. $\underline{l} > 0$) the maximal length (resp. the minimal length) of the words in $\mathcal{C}$. Then $\nu(\mathcal{C})$ is the unique positive root of the characteristic polynomial
$$X^{\bar{l}} - \sum_{l = \underline{l}}^{\bar{l}} |\mathcal{C}[l]| \, X^{\bar{l} - l} = 0, \qquad (20)$$
where $\mathcal{C}[l] = \{ c \in \mathcal{C} \mid |c| = l \}$.

The proof of Theorem III.4 is stated in App. B and relies on standard properties of linear difference equations, pointed out by Shannon in [40, Part I] for discrete noiseless systems, see also [68, pp. 13]. In Sec. V, we provide an example based on a channel graph for which we compute explicitly the asymptotic rate $r(\mathcal{C})$ of the variable-length code.
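Theorem III.4 reduces the rate computation to a univariate root-finding problem; the generator set only enters through the counts $|\mathcal{C}[l]|$. A minimal sketch (numpy assumed):

```python
import numpy as np

def nu(length_counts):
    """Unique positive root of the characteristic polynomial (20);
    length_counts maps each word length l to the count |C[l]|."""
    lmax = max(length_counts)
    poly = np.zeros(lmax + 1)
    poly[0] = 1.0                     # coefficient of X^lmax
    for l, k in length_counts.items():
        poly[l] -= k                  # coefficient of X^(lmax - l)
    return max(r.real for r in np.roots(poly)
               if abs(r.imag) < 1e-9 and r.real > 0)

# One word of length 1 and five words of length 2, as in Sec. V:
# characteristic polynomial X^2 - X - 5, positive root (1 + sqrt(21))/2.
print(nu({1: 1, 2: 5}), np.log2(nu({1: 1, 2: 5})))   # 2.7913... 1.4810...
```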
Remark III.5 (Linear difference formulation)

We provide an equivalent formulation for the average number of transmitted symbols per channel use, for a variable-length code with generator set $\mathcal{C}$. Given $|\mathcal{C}^*[l]|$ for $l \in \{L - \bar{l} + 1, \ldots, L\}$, one can compute $|\mathcal{C}^*[L+1]|$ as a linear combination of the $(|\mathcal{C}^*[l]|)_{l \in \{L - \bar{l} + 1, \ldots, L\}}$, i.e.
$$|\mathcal{C}^*[L]| = \sum_{l = \underline{l}}^{\bar{l}} |\mathcal{C}[l]| \cdot |\mathcal{C}^*[L - l]|. \qquad (21)$$
In particular, the companion matrix of the characteristic polynomial (20),
$$M = \begin{pmatrix} 0 & 1 & & \\ \vdots & & \ddots & \\ 0 & & & 1 \\ |\mathcal{C}[\bar{l}]| & |\mathcal{C}[\bar{l}-1]| & \cdots & |\mathcal{C}[1]| \end{pmatrix}, \qquad (22)$$
satisfies
$$M \, \big( |\mathcal{C}^*[l]| \big)_{l \in \{L - \bar{l} + 1, \ldots, L\}} = \big( |\mathcal{C}^*[l+1]| \big)_{l \in \{L - \bar{l} + 1, \ldots, L\}}. \qquad (23)$$
Then the asymptotic rate corresponds to the dominant term after computing the Jordan decomposition of $M$. Note that the number of generated words in any bounded-length window $\llbracket L - l, L \rrbracket$ behaves equivalently asymptotically: for all $l \in \mathbb{N}$ we have
$$\sqrt[L]{|\mathcal{C}^*[L - l : L]|} \underset{L \to \infty}{\longrightarrow} \nu(\mathcal{C}). \qquad (24)$$

Proposition III.6 $C(W) = \sup_{\mathcal{C}} \log_2 \nu(\mathcal{C})$.

Proof. [Proposition III.6] On one hand, variable-length codes cannot achieve a better rate than $C(W)$. On the other hand, the fixed-length codes, i.e. subsets of $\mathcal{X}^L$, are included in the set of variable-length codes, so we have
$$C(W) \geq \sup_{\mathcal{C}} \log_2 \nu(\mathcal{C}) \geq \sup_L \frac{\log_2 \alpha(G_W^{\boxtimes L})}{L} = C(W). \qquad (25)$$
Thus finding an optimal variable-length code is equivalent to finding a family of fixed-length codes with optimal supremum rate. For some scenarios, a variable-length code might be far easier to describe than an infinite family of fixed-length codes.

IV. INTERMINGLED CODING
In this section, we generalize the previous variable-length coding scheme, by revisiting Shannon's intermingled coding scheme [1, proof of Th. 4]. Instead of allowing only the concatenation of words taken from the generator set $\mathcal{C} \subseteq \mathcal{X}^*$, the intermingled coding scheme allows to stop the transmission of one word $c \in \mathcal{C}$ and resume it later, in an "intermingling" pattern, so that additional information is embedded in the positions of such stops.

Definition IV.1 (Intermingled codes)
An intermingled code is composed of a generator set $\mathcal{C} \subseteq \mathcal{X}^*$ that is zero-error for the channel $W$, and a succession rule
$$\rho : \prod_{c \in \mathcal{C}} \llbracket 0, |c| - 1 \rrbracket \to \mathcal{P}(\mathcal{C}), \qquad (26)$$
such that $\rho(z)$ is nonempty for all $z \in \prod_{c \in \mathcal{C}} \llbracket 0, |c| - 1 \rrbracket$, which maps the possible transmission states to the characters transmittable at the next time step. The encoder chooses a time horizon $L$, then maps the message to some sequence $(x_l)_{l \leq L} \in \mathcal{X}^L$ based on the following algorithm:
⬩ Initialize $z \leftarrow (0, \ldots, 0)$, a vector of size $|\mathcal{C}|$, and $l \leftarrow 1$.
⬩ While $l \leq L$:
⋄ Choose a word $w \in \rho(z)$ ($\subseteq \mathcal{C}$),
⋄ Transmit $x_l = w_{z_w + 1}$ over the channel,
⋄ $z_w \leftarrow z_w + 1 \mod |w|$,
⋄ $l \leftarrow l + 1$.
An intermingled code $(\mathcal{C}, \rho)$ is zero-error for the channel $W$ if all the possible sequences generated by this algorithm are distinguishable, i.e. for all distinct generated codewords $x$ and $x'$ from $\mathcal{X}^L$, there exists a time step $l \leq L$ such that $x_l x'_l \notin E(G_W)$.
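The number of encoder runs of a given length can be computed by dynamic programming over the state $z$; this quantity equals $|S_L|$ of Definition IV.3 below whenever distinct runs produce distinct sequences, which holds for the zero-error codes considered in this paper. A sketch (the succession rule is assumed to be given as a Python function of the state tuple):

```python
def count_sequences(C, rho, L):
    """Number of length-L runs of the Definition IV.1 encoder that end in the
    all-zero state z, i.e. with every started word completed."""
    C, memo = list(C), {}

    def count(z, remaining):
        if remaining == 0:
            return int(all(v == 0 for v in z))
        if (z, remaining) not in memo:
            total = 0
            for i, w in enumerate(C):
                if w in rho(z):                    # words transmittable in state z
                    z2 = list(z)
                    z2[i] = (z2[i] + 1) % len(w)   # send the letter w[z_w], advance z_w
                    total += count(tuple(z2), remaining - 1)
            memo[(z, remaining)] = total
        return memo[(z, remaining)]

    return count(tuple(0 for _ in C), L)

# Concatenation-only rule of Remark IV.2 below, on the generator set of Sec. V:
C = ("5", "00", "12", "24", "31", "43")
rho = lambda z: C if all(v == 0 for v in z) else [w for i, w in enumerate(C) if z[i] != 0]
print([count_sequences(C, rho, L) for L in range(1, 6)])   # [1, 6, 11, 41, 96]
```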
Remark IV.2
The choice
$$\rho : z \mapsto \begin{cases} \mathcal{C} & \text{if } z = (0, \ldots, 0), \\ \{ c \in \mathcal{C} \mid z_c \neq 0 \} & \text{otherwise}, \end{cases} \qquad (27)$$
corresponds to the variable-length coding scheme of Sec. III, based on the simple concatenation of words. The variable-length codes are a subclass of intermingled codes.

Definition IV.3 (Asymptotic rate of intermingled codes)
The asymptotic rate of an intermingled code $(\mathcal{C}, \rho)$ is defined by
$$r(\mathcal{C}, \rho) \doteq \lim_{L \to \infty} \frac{1}{L} \log_2 |S_L|, \qquad (28)$$
where $S_L$ denotes the set of channel input sequences $x \in \mathcal{X}^L$ that are generated by the algorithm described in Definition IV.1 with time horizon $L$, with $z = (0, \ldots, 0)$ at the last time step. We also define the average number of transmitted symbols per channel use:
$$\nu(\mathcal{C}, \rho) \doteq 2^{r(\mathcal{C}, \rho)}. \qquad (29)$$
The existence of the limit is given by Fekete's lemma, as $\log_2 |S_{L+L'}| \geq \log_2 |S_L| + \log_2 |S_{L'}|$ for all $L, L' \in \mathbb{N}$. Similarly to Definition III.3, if the lengths $L$ for which $S_L \neq \emptyset$ form a family $(k_i)_{i \in I}$ with $d = \gcd(k_i, i \in I) \neq 1$, then Fekete's lemma cannot be applied directly and we redefine the rate by
$$r(\mathcal{C}, \rho) \doteq \lim_{L \to \infty} \frac{1}{dL} \log_2 |S_{dL}|, \qquad (30)$$
and we take again $\nu(\mathcal{C}, \rho) \doteq 2^{r(\mathcal{C}, \rho)}$ with this new definition.

Definition IV.4 (Transition graph)
Let $(\mathcal{C}, \rho)$ be a zero-error intermingled code; we define its transition graph $G = (V(G), E(G))$ by
$$V = \prod_{c \in \mathcal{C}} \llbracket 0, |c| - 1 \rrbracket, \qquad (31)$$
$$\forall v, v' \in V, \quad vv' \in E \ \text{if} \ \exists c \in \rho(v), \ \forall c' \in \mathcal{C}, \ v'_{c'} = v_{c'} + \mathbb{1}[c' = c] \mod |c'|, \qquad (32)$$
i.e. $v'$ can be obtained from $v$ by adding 1, modulo the word length, to exactly one component of $v$, whose index belongs to $\rho(v)$.

Theorem IV.5 (Rate computation of intermingled codes)
We consider an intermingled code $(\mathcal{C}, \rho)$ that is zero-error for the channel $W$. The asymptotic rate satisfies
$$r(\mathcal{C}, \rho) = \log_2 \max_i |\lambda_i(M_G)|, \qquad (33)$$
where $(\lambda_i)_{i \leq |V|}$ are the elements of the spectrum of $M_G$, the adjacency matrix of the transition graph $G$.

The proof of Theorem IV.5 is stated in App. C. In Sec. V, we provide an example based on the channel graph $C_5 \boxplus 1$, for which we compute explicitly the asymptotic rate of the intermingled code.
Remark IV.6 (Optimality of intermingled codes)
In [1, Theorems 3 and 4], Shannon proves the optimality of intermingled codes for two parallel channels, when an adjacency-reducing mapping can be built over one of the two channels.
Given the generator set $\mathcal{C}$ of an intermingled code, one can define the maximal succession rule as
$$\operatorname*{argmax}_{\rho} \ r(\mathcal{C}, \rho) \qquad (34)$$
$$\text{s.t.} \quad (\mathcal{C}, \rho) \text{ is zero-error}. \qquad (35)$$
Thus finding the optimal rate over the set of intermingled codes boils down to an optimization problem over $\mathcal{P}(\mathcal{C})^{\prod_{c \in \mathcal{C}} \llbracket 0, |c| - 1 \rrbracket}$: the succession rule is in general far too complex to describe for this approach to be tractable. The main interest lies instead in the possible existence of an optimal intermingled generator of the family of fixed-length codes that asymptotically reaches the capacity, even if no finite fixed-length code can reach it.

V. EXAMPLE
Let us consider the channel graph $C_5 \boxplus 1$, depicted in Fig. 3, with vertices $\{0, 1, 2, 3, 4\} \cup \{5\}$. We consider two generator sets $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$ and $\mathcal{C}' = \{00, 12, 24, 31, 43\} \cup \{500, 512\}$, that are zero-error for any channel $W$ whose graph is $C_5 \boxplus 1$.

Fig. 3: The channel graph $C_5 \boxplus 1$: the vertices 0–4 form the pentagon and the vertex 5 is isolated.

A. Variable-length coding scheme
The variable-length code obtained with the generator set $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$ is depicted in Fig. 4. The maximal length (resp. the minimal length) is $\bar{l} = 2$ (resp. $\underline{l} = 1$). By Theorem III.4, the asymptotic rate of the variable-length code is $r(\mathcal{C}) = \log_2(\nu(\mathcal{C}))$ where $\nu(\mathcal{C})$ is the positive root of the characteristic polynomial $X^2 - X - 5 = 0$. This gives us $\nu(\mathcal{C}) = \frac{1 + \sqrt{21}}{2} \simeq 2.79$ and a rate $r(\mathcal{C}) \simeq 1.48$, which is inferior to the zero-error capacity of this channel, $C = \log_2(1 + \sqrt{5}) \simeq 1.69$, i.e. $1 + \sqrt{5} \simeq 3.24$, obtained by combining Lovász's capacity result for the graph $C_5$ in [18] with Shannon's result for adjacency-reducing mappings in [1].

Lemma 2 (Small number of channel uses)
For a small number of channel uses and $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$, the transmission rates are given by

Channel uses $L$: 2, 3, 4, 5
$\sqrt[L]{|\mathcal{C}^*[L]|}$: $\sqrt{6} \simeq 2.45$, $\sqrt[3]{11} \simeq 2.22$, $\sqrt[4]{41} \simeq 2.53$, $\sqrt[5]{96} \simeq 2.49$

Fig. 4: Set of channel input sequences obtained by concatenation of words from the generator set $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$.

Let us now consider the generator set $\mathcal{C}' = \{00, 12, 24, 31, 43\} \cup \{500, 512\}$ that is also zero-error for the graph $C_5 \boxplus 1$. The maximal length (resp. the minimal length) of words in $\mathcal{C}'$ is $\bar{l} = 3$ (resp. $\underline{l} = 2$). The corresponding variable-length code achieves a rate $r(\mathcal{C}') = \log_2(\nu(\mathcal{C}'))$ where $\nu(\mathcal{C}')$ is the positive root of the characteristic polynomial $X^3 - 5X - 2 = 0$. This gives us $\nu(\mathcal{C}') = 1 + \sqrt{2} \simeq 2.41$ and a rate $r(\mathcal{C}') \simeq 1.27$, which is also inferior to the known zero-error capacity of this channel, $C = \log_2(1 + \sqrt{5}) \simeq 1.69$, i.e. $1 + \sqrt{5} \simeq 3.24$.

Lemma 3
For a small number of channel uses and $\mathcal{C}' = \{00, 12, 24, 31, 43\} \cup \{500, 512\}$, the transmission rates are given by

Channel uses $L$: 2, 3, 5
$\sqrt[L]{|\mathcal{C}'^*[L]|}$: $\sqrt{5} \simeq 2.24$, $\sqrt[3]{2} \simeq 1.26$, $\sqrt[5]{20} \simeq 1.82$
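Both tables follow from the linear recursion (21), which only involves the counts $|\mathcal{C}[l]|$; a short sketch reproducing them:

```python
def counts(length_counts, L):
    """|C*[l]| for l = 0..L via the linear recursion (21)."""
    c = [1] + [0] * L                      # |C*[0]| = 1: the empty word
    for n in range(1, L + 1):
        c[n] = sum(k * c[n - l] for l, k in length_counts.items() if n >= l)
    return c

print(counts({1: 1, 2: 5}, 5))   # [1, 1, 6, 11, 41, 96]  (Lemma 2)
print(counts({2: 5, 3: 2}, 5))   # [1, 0, 5, 2, 25, 20]   (Lemma 3)
```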
Remark V.1 (One-shot maximum independent set) Coding on the one-shot maximum independent set with $\mathcal{C}'' \doteq \{0, 2, 5\}$ achieves a better rate than with the generator sets $\mathcal{C}$ and $\mathcal{C}'$, as $\nu(\mathcal{C}'') = 3$.

B. Impact of non-dominant eigenvalues

As shown in the proof of Theorem III.4 in App. B, the asymptotic rate can be read from the dominant eigenvalue of the adjacency matrix of the transition graph. Studying the other eigenvalues gives the exact shape and decay rate of the teeth patterns in the rate curves. A closed-form expression can be obtained with a Jordan decomposition of the adjacency matrix. For example, the code $\mathcal{C}'$ generates the following transition matrix:
$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 5 & 0 \end{pmatrix} = P \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 - \sqrt{2} & 0 \\ 0 & 0 & 1 + \sqrt{2} \end{pmatrix} P^{-1}. \qquad (36)$$
Fig. 5: Transmission rates of the generated fixed-length codes corresponding to the generator sets $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$ and $\mathcal{C}' = \{00, 12, 24, 31, 43\} \cup \{500, 512\}$, with their respective limits.

We have, for all $L \in \mathbb{N}$:
$$\begin{pmatrix} |\mathcal{C}'^*[L+1]| \\ |\mathcal{C}'^*[L+2]| \\ |\mathcal{C}'^*[L+3]| \end{pmatrix} = P \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 - \sqrt{2} & 0 \\ 0 & 0 & 1 + \sqrt{2} \end{pmatrix}^L P^{-1} \begin{pmatrix} |\mathcal{C}'^*[1]| \\ |\mathcal{C}'^*[2]| \\ |\mathcal{C}'^*[3]| \end{pmatrix}. \qquad (37)$$
The general solution for linear difference equations [68, pp. 13] writes
$$|\mathcal{C}'^*[L]| = h_1 (1 + \sqrt{2})^L + h_2 (-2)^L + h_3 (1 - \sqrt{2})^L, \qquad (38)$$
where the parameters $(h_1, h_2, h_3) \in \mathbb{R}^3$ are determined by the initial values $(|\mathcal{C}'^*[1]|, |\mathcal{C}'^*[2]|, |\mathcal{C}'^*[3]|) = (0, 5, 2)$. Straightforward computations lead to
$$|\mathcal{C}'^*[L]| = \frac{6 + 5\sqrt{2}}{28} (1 + \sqrt{2})^L + \frac{4}{7} (-2)^L + \frac{6 - 5\sqrt{2}}{28} (1 - \sqrt{2})^L. \qquad (39)$$
We observe two oscillating terms in this expression, which are asymptotically dominated by $(1 + \sqrt{2})^L$, creating the vanishing teeth-shaped patterns on the rate-length curves. The oscillations' amplitude decreases at the rate $\big( \frac{|-2|}{|1 + \sqrt{2}|} \big)^L = (2\sqrt{2} - 2)^L \simeq 0.83^L$. Since $|\mathcal{C}'^*[L]| = h_1 \nu(\mathcal{C}')^L + O(2^L)$ with $h_1 = \frac{6 + 5\sqrt{2}}{28} \neq 0$ and $\big| \frac{-2}{\nu(\mathcal{C}')} \big| < 1$, we have
$$\sqrt[L]{|\mathcal{C}'^*[L]|} - \nu(\mathcal{C}') = e^{\frac{1}{L} \ln(|\mathcal{C}'^*[L]|)} - e^{\ln(\nu(\mathcal{C}'))} \sim e^{\ln(\nu(\mathcal{C}'))} \Big( \frac{1}{L} \ln(|\mathcal{C}'^*[L]|) - \ln(\nu(\mathcal{C}')) \Big) \qquad (40)$$
$$\sim \frac{\nu(\mathcal{C}')}{L} \ln(h_1) + \frac{\nu(\mathcal{C}')}{L} \ln\Big( 1 + O\Big( \frac{(-2)^L}{\nu(\mathcal{C}')^L} \Big) \Big) \sim \frac{\ln(h_1) \, \nu(\mathcal{C}')}{L}. \qquad (41)$$
Thus the speed of convergence towards the asymptotic rate is $O(\frac{1}{L})$.
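A numerical check of the closed form (39) and of the decay rate of the oscillations (a sketch, assuming numpy):

```python
import numpy as np

h1, h2, h3 = (6 + 5 * np.sqrt(2)) / 28, 4 / 7, (6 - 5 * np.sqrt(2)) / 28
closed = lambda L: h1 * (1 + np.sqrt(2))**L + h2 * (-2.0)**L + h3 * (1 - np.sqrt(2))**L

print([round(closed(L)) for L in range(6)])   # [1, 0, 5, 2, 25, 20]: matches (21)
print(2 / (1 + np.sqrt(2)))                   # 0.8284... = 2*sqrt(2) - 2: teeth decay rate
```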
Fig. 6: Value of $\sqrt[L]{|\mathcal{C}'^*[L]|}$ with its oscillations.

In Fig. 6, we represent $\sqrt[L]{|\mathcal{C}'^*[L]|}$ along with the three following functions, for $t \in (0, +\infty)$, that capture its oscillations:
$$f_1(t) = \Big( \frac{6 + 5\sqrt{2}}{28} (1 + \sqrt{2})^t + \frac{4}{7} \, 2^t + \frac{6 - 5\sqrt{2}}{28} (\sqrt{2} - 1)^t \Big)^{\frac{1}{t}}, \qquad (42)$$
$$f_2(t) = \Big( \frac{6 + 5\sqrt{2}}{28} (1 + \sqrt{2})^t - \frac{4}{7} \, 2^t - \frac{6 - 5\sqrt{2}}{28} (\sqrt{2} - 1)^t \Big)^{\frac{1}{t}}, \qquad (43)$$
$$f_3(t) = \Big( \frac{6 + 5\sqrt{2}}{28} (1 + \sqrt{2})^t \Big)^{\frac{1}{t}} = (1 + \sqrt{2}) \Big( \frac{6 + 5\sqrt{2}}{28} \Big)^{\frac{1}{t}}. \qquad (44)$$

C. Intermingled coding scheme
Now we consider the intermingled code $(\mathcal{C}, \rho)$, where $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$ and, denoting by $z_1$ the component associated with the word 5,
$$\rho : (z_1, z_2, z_3, z_4, z_5, z_6) \mapsto \begin{cases} \mathcal{C} & \text{if } (z_2, z_3, z_4, z_5, z_6) = (0, 0, 0, 0, 0), \\ \{5\} \cup \{ c \in \mathcal{C} \mid z_c \neq 0 \} & \text{otherwise}. \end{cases} \qquad (45)$$
This code is zero-error since $z$ has at most one positive component at every time step. Following Definition IV.4, the corresponding transition graph is depicted in Fig. 7.

Proposition V.2
The intermingled coding scheme $(\mathcal{C}, \rho)$ defined by $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$ and equation (45) achieves the zero-error capacity $C = \log_2(1 + \sqrt{5})$ of the channel graph $C_5 \boxplus 1$.

For a given generator set $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$, the intermingled coding scheme performs better than the variable-length coding, since the asymptotic rate $r(\mathcal{C}) = \log_2 \frac{1 + \sqrt{21}}{2} \simeq 1.48$ obtained in Sec. V-A is strictly smaller than the channel capacity $C = \log_2(1 + \sqrt{5}) \simeq 1.69$ obtained with the intermingled code $(\mathcal{C}, \rho)$. We also recall that since $(1 + \sqrt{5})^L \notin \mathbb{N}$ for all $L$, any fixed-length code is suboptimal, as noticed in [25].

Fig. 7: Transition graph of the intermingled coding scheme $(\mathcal{C}, \rho)$ defined by (45), with states $(0, e_1, e_2, e_3, e_4, e_5)$.

Proof. [Proposition V.2] The adjacency matrix $M_G$ of the transition graph depicted in Fig. 7, with the possible transmission states $(0, e_1, e_2, e_3, e_4, e_5)$, where $(e_i)$ denotes the canonical basis of $\mathbb{R}^5$, is given by
$$M_G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (46)$$
The characteristic polynomial of this matrix is $(X - 1 - \sqrt{5})(X - 1 + \sqrt{5})(X - 1)^4$. The intermingled code has a rate equal to the logarithm of the maximal root, that is $1 + \sqrt{5}$, thus corresponding to the zero-error capacity $C = \log_2(1 + \sqrt{5})$ of the channel graph $C_5 \boxplus 1$.

D. Variable-length code for the channel graph $C_7$

We recall that the zero-error capacity corresponding to the channel graph $C_7$, depicted in Fig. 1, is still unknown, see [26]. Let us build a zero-error variable-length code on $C_7$. We consider a generator set $\hat{\mathcal{C}}$ composed of one word of length 1 and six words of length 2; $\nu(\hat{\mathcal{C}})$ is the positive solution of $X^2 = X + 6$, so $\nu(\hat{\mathcal{C}}) = 3$. Note that it is equal to the cardinality of the one-shot maximum independent set, e.g. $\hat{\mathcal{C}}' = \{0, 2, 4\}$.
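The spectral claims of this section are easy to verify numerically. The sketch below rebuilds the adjacency matrix (46), recovers the spectrum $\{1 - \sqrt{5}, 1, 1, 1, 1, 1 + \sqrt{5}\}$, and solves the characteristic polynomial of the $C_7$ generator set:

```python
import numpy as np

# Adjacency matrix (46) of the transition graph of Fig. 7.
M = np.eye(6, dtype=int)    # self-loops: transmitting the isolated letter 5
M[0, 1:] = 1                # from state 0: start any of the five two-letter words
M[1:, 0] = 1                # from state e_i: finish the i-th word

print(np.sort(np.linalg.eigvalsh(M)))   # [1-sqrt(5), 1, 1, 1, 1, 1+sqrt(5)]
print(np.log2(1 + np.sqrt(5)))          # 1.6942..., the capacity of C_5 ⊞ 1

# Sec. V-D: the positive root of X^2 = X + 6 is 3.
print(np.roots([1, -1, -6]))            # [ 3. -2.]
```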
VI. AUTOMATA-BASED CODING

Our previous Theorems III.4 and IV.5 share deep connections with Automata Theory. In this section, we formulate the variable-length and intermingled coding schemes as particular automata. By using the concepts of regular expressions and generator series, we simplify the rate analysis of the optimal coding scheme for the channel graph $C_5 \boxplus 1$.
A. Main concepts
The definitions and characterization proofs in this subsection can be retrieved from [69].
Definition VI.1 (Deterministic finite automaton (DFA))
A deterministic finite automaton (DFA) is a tuple $\mathcal{A} = (\mathcal{X}, \mathcal{S}, \tau, s_{\text{start}}, \mathcal{S}_{\text{accept}})$, which consists of
⋄ a finite alphabet $\mathcal{X}$,
⋄ a finite set of states $\mathcal{S}$,
⋄ a transition function $\tau : \mathcal{S} \times \mathcal{X} \to \mathcal{S}$,
⋄ an initial state $s_{\text{start}} \in \mathcal{S}$,
⋄ a subset of accept states $\mathcal{S}_{\text{accept}} \subseteq \mathcal{S}$.
We extend $\tau$ to $\mathcal{S} \times \mathcal{X}^*$ by defining
$$\tau(s, x_1 \ldots x_n) \doteq \tau(\tau(\ldots \tau(s, x_1), x_2) \ldots, x_n). \qquad (47)$$
An example of automaton is depicted in Fig. 8. The automaton starts in the initial state $s_{\text{start}}$, receives a word $w \in \mathcal{X}^*$ as input, and at each time step $l \leq |w|$ it applies the transition function to determine its new state, that is $s_{l+1} = \tau(s_l, w_l)$. The automaton accepts the word if the final state belongs to $\mathcal{S}_{\text{accept}}$, that is $\tau(s_{\text{start}}, w) \in \mathcal{S}_{\text{accept}}$. Otherwise the word is rejected. The set of words accepted by $\mathcal{A}$ is called the language recognized by $\mathcal{A}$ and is denoted by $L(\mathcal{A})$.

Definition VI.2 (Transition graph of a DFA)
We define the transition graph $G_{\mathcal{A}} \doteq (V(G_{\mathcal{A}}), E(G_{\mathcal{A}}))$ of a DFA $\mathcal{A} = (\mathcal{X}, \mathcal{S}, \tau, s_{\text{start}}, \mathcal{S}_{\text{accept}})$ by
$$V(G_{\mathcal{A}}) = \mathcal{S}, \qquad (48)$$
$$ss' \in E(G_{\mathcal{A}}) \ \text{if} \ \exists x \in \mathcal{X}, \ \tau(s, x) = s'. \qquad (49)$$

Definition VI.3 (Regular expression)
The languages of DFAs are conveniently described by regular expressions, as in (54). Given a regular expression $E$, the corresponding language is denoted by $L(E)$. The regular expressions are built with the letters of $\mathcal{X}$, the symbols $\epsilon$ and $\emptyset$, and the operators $+$, $\cdot$ and $*$, where
$$L(\emptyset) = \emptyset, \quad L(\epsilon) = \{\epsilon\}, \quad \text{and} \quad \forall x \in \mathcal{X}, \ L(x) = \{x\}. \qquad (50)$$
Given two regular expressions $E$ and $E'$, we have
$$L(E^*) = L(E)^*, \qquad (51)$$
$$L(E + E') = L(E) \cup L(E'), \qquad (52)$$
$$L(E \cdot E') = L(E) \cdot L(E') = \{ ww' \mid w \in L(E), \ w' \in L(E') \}. \qquad (53)$$

Definition VI.4 (Regular language)
A regular language is a subset $A \subseteq \mathcal{X}^*$ such that there exists a DFA $\mathcal{A}$ with $A = L(\mathcal{A})$.
There exist other equivalent characterizations of regular languages:
⬩ Languages recognized by nondeterministic finite automata: the deterministic automata have the same recognition power as the nondeterministic ones.
⬩ Languages generated by regular expressions.
Remark VI.5 (Connection with variable-length and intermingled coding in Sec. III and IV)
The variable-length coding scheme presented in Sec. III corresponds to a subclass of regular languages that can be expressed from the generator set with at most one nested star; these are called star-height-one languages. Furthermore, the automaton that recognizes these languages can be found in App. B, with the graph $G$ that can be straightforwardly completed into a transition graph between states, by adding an initial state and a set of final states $F$. The intermingled coding scheme generates a particular class of regular languages as well; the transition function can also be derived from the transition graph of the code, with the adequate completion, as in Fig. 8.

B. Example of a deterministic finite automaton

We consider the deterministic finite automaton with
⬩ alphabet $\mathcal{X} = \llbracket 0, 5 \rrbracket$,
⬩ set of states $\mathcal{S} = \{s_0, \ldots, s_5\} \cup \{\text{sink}\}$,
⬩ $s_{\text{start}} = s_0$ as the initial and only accepting state ($\mathcal{S}_{\text{accept}} = \{s_0\}$),
⬩ a transition function described in Fig. 8. On each state $s_i$, an arrow starts for each letter $x$ of $\mathcal{X}$, and the state $s_j$ at the extremity of this arrow corresponds to the result of the transition function $s_j = \tau(s_i, x)$.
Fig. 8: Deterministic finite automaton (DFA) corresponding to the intermingled coding scheme of Sec. V-C.

This automaton recognizes the language $L(E)$ corresponding to the following regular expression:
$$E = (5 + 0 \cdot 5^* \cdot 0 + 1 \cdot 5^* \cdot 2 + 2 \cdot 5^* \cdot 4 + 3 \cdot 5^* \cdot 1 + 4 \cdot 5^* \cdot 3)^*. \qquad (54)$$
This language is exactly the union over $L \in \mathbb{N}$ of the codewords of length $L$ generated by the intermingled coding pattern over $C_5 \boxplus 1$, as in Sec. V-C.
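A direct simulation of this automaton (a sketch; the two-letter words are the ones assumed for the example of Sec. V, and the exact labels are immaterial to the construction):

```python
# Second letter awaited after each possible first letter of a two-letter codeword.
SECOND = {"0": "0", "1": "2", "2": "4", "3": "1", "4": "3"}

def accepts(word):
    """Run the Fig. 8 DFA; state None stands for s0, otherwise the state records
    the awaited second letter; any unexpected letter leads to the sink."""
    state = None
    for x in word:
        if x == "5":
            continue                 # the letter 5 loops on every non-sink state
        if state is None:
            if x not in SECOND:
                return False         # sink
            state = SECOND[x]        # first letter of a two-letter codeword
        elif x == state:
            state = None             # codeword completed, back to s0
        else:
            return False             # sink
    return state is None             # accept iff no word is left pending

print(accepts("0550"), accepts("51525"), accepts("01"))   # True True False
```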
C. Rational coding scheme
Definition VI.6 (Rational codes)
Let $E$ be a regular expression such that $L(E)$ is infinite and $E = (E')^*$, where $E'$ is a regular expression over the alphabet $\mathcal{X}$. For all $L \in \mathbb{N}$, the codebook of the rational code is defined by $L(E)[L]$, that is, the set of words of length $L$ accepted by the automaton corresponding to the regular expression $E$. This code is zero-error if for all $L \in \mathbb{N}$, the codewords from $L(E)[L]$ are distinguishable, i.e. $\forall x \neq x' \in L(E)[L]$, $xx' \notin E(G_W^{\boxtimes L})$.

Definition VI.7 (Rational code rate)
The asymptotic rate $r(E)$ of a rational code $E$ is defined by $r(E) \doteq \lim_{L \to \infty} \frac{1}{L} \log_2 |L(E)[L]|$, and its average number of transmitted symbols per channel use by $\nu(E) \doteq 2^{r(E)}$. Note that the existence of the limit is given by Fekete's lemma, as for all $L, L' \in \mathbb{N}$ we have
$$\log_2 |L(E)[L + L']| \geq \log_2 |L(E)[L]| + \log_2 |L(E)[L']|. \qquad (55)$$
Let $E'$ be the regular expression such that $E = (E')^*$. If $d = \gcd(|w|, w \in L(E')) \neq 1$, then Fekete's lemma cannot be applied directly and we redefine the rate by
$$r(E) \doteq \lim_{L \to \infty} \frac{1}{dL} \log_2 |L(E)[dL]|, \qquad (56)$$
and we take again $\nu(E) \doteq 2^{r(E)}$ with this new definition.

The asymptotic rate of a rational code can be computed via the generator series of its associated regular expression.
Definition VI.8 (Generator series)
Given a regular expression $E$, we define its generator series by
$$F_E : z \mapsto \sum_{l \in \mathbb{N}} |L(E)[l]| \, z^l. \qquad (57)$$
The generator series $F_E$ of a regular expression $E$ is always a rational fraction, as the terms of the series follow a linear difference equation.

Proposition VI.9 (Recursive computation of the generator series)
Let us denote $E^l \doteq E \cdot \ldots \cdot E$ ($l$ times). If the regular expressions $E$ and $E'$ satisfy $L(E) \cap L(E') = \emptyset$, then we have
$$F_{E + E'}(z) = F_E(z) + F_{E'}(z), \qquad (58)$$
$$F_{E E'}(z) = F_E(z) F_{E'}(z). \qquad (59)$$
If the sets $(L(E^l))_{l \in \mathbb{N}}$ are disjoint, then we have
$$F_{E^*}(z) = \frac{1}{1 - F_E(z)}. \qquad (60)$$
The proof of Proposition VI.9 is stated in App. D. For regular languages, the sequence of the numbers of words of length $L \in \mathbb{N}$ satisfies a linear difference equation. By using the same arguments as in the proof of Theorem IV.5 in App. C, one can determine a closed-form expression for this sequence and easily derive its asymptotic rate.

Theorem VI.10 (Rate computation of rational codes)
Let $E$ be a rational code, and let $\mathcal{A}$ be the automaton that recognizes the language $L(E)$. Then $\nu(E)$ equals the spectral radius of the adjacency matrix of the transition graph of $\mathcal{A}$. Furthermore, $\nu(E)$ is also the inverse of the convergence radius of the generator series $F_E$, or equivalently $\nu(E) = \frac{1}{|p|}$ where $p$ is the pole of the generator series $F_E$ with the smallest modulus.

Although this theorem is a well-known result in Automata Theory, see [42, Proposition 8.1, pp. 20], we provide the proof in App. E.
Theorem VI.11 (Channel generator series)
We define the channel generator series by
$$\sum_{l \in \mathbb{N}} \alpha(G_W^{\boxtimes l}) \, z^l. \qquad (61)$$
Then $2^C$, where $C$ is the zero-error capacity of the channel, is the inverse of the convergence radius of the channel generator series.

Proof. [Theorem VI.11] This is a direct consequence of the Cauchy–Hadamard theorem for power series, see [70, Theorem 2.6, pp. 55], which states that the inverse of the convergence radius of a power series $\sum_{l \in \mathbb{N}} a_l z^l$ is equal to $\limsup_{l \to \infty} \sqrt[l]{a_l}$.

D. Computation of the asymptotic rate based on the generator series
Proposition V.2 shows that the intermingled code $(\mathcal{C}, \rho)$ defined by $\mathcal{C} = \{5\} \cup \{00, 12, 24, 31, 43\}$ and equation (45) is optimal for the channel graph $C_5 \boxplus 1$. The corresponding regular expression writes
$$E \doteq (5 + 0 \cdot 5^* \cdot 0 + 1 \cdot 5^* \cdot 2 + 2 \cdot 5^* \cdot 4 + 3 \cdot 5^* \cdot 1 + 4 \cdot 5^* \cdot 3)^*. \qquad (62)$$
By using the properties (58)-(60) of the generator series $F_E$, we compute the asymptotic rate of the automaton defined by the regular expression $E$:
$$F_E(z) = \frac{1}{1 - F_{5 + 0 \cdot 5^* \cdot 0 + 1 \cdot 5^* \cdot 2 + 2 \cdot 5^* \cdot 4 + 3 \cdot 5^* \cdot 1 + 4 \cdot 5^* \cdot 3}(z)} \qquad (63)$$
$$= \frac{1}{1 - F_5(z) - F_{0 \cdot 5^* \cdot 0}(z) - F_{1 \cdot 5^* \cdot 2}(z) - F_{2 \cdot 5^* \cdot 4}(z) - F_{3 \cdot 5^* \cdot 1}(z) - F_{4 \cdot 5^* \cdot 3}(z)} \qquad (64)$$
$$= \frac{1}{1 - z - \frac{5z^2}{1 - z}} \qquad (65)$$
$$= \frac{z - 1}{4z^2 + 2z - 1}, \qquad (66)$$
where equation (65) comes from (59) and
$$F_{x \cdot 5^* \cdot y}(z) = F_x(z) F_{5^*}(z) F_y(z) = z \cdot F_{5^*}(z) \cdot z \qquad (67)$$
$$= z^2 \sum_{l \in \mathbb{N}} z^l = \frac{z^2}{1 - z}, \qquad (68)$$
as it directly follows from the definition that $F_i(z) = z$ for all $i \in \{0, \ldots, 5\}$.

This rational fraction (66) has two poles $\frac{-1 \pm \sqrt{5}}{4}$. The pole with smallest modulus is $\frac{-1 + \sqrt{5}}{4}$, which has $\sqrt{5} + 1$ as inverse. We retrieve the optimal rate $\log_2(\sqrt{5} + 1)$ of Proposition V.2 with much simpler computations.
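The expansion of $F_E$ and the location of its dominant pole can be checked numerically (a sketch, assuming numpy); the coefficients follow the linear recursion induced by the denominator of (66):

```python
import numpy as np

# Taylor coefficients of F_E(z) = (z - 1)/(4z^2 + 2z - 1):
# a_0 = 1, a_1 = 1, and a_l = 2 a_{l-1} + 4 a_{l-2} for l >= 2.
a = [1, 1]
for _ in range(10):
    a.append(2 * a[-1] + 4 * a[-2])
print(a[:7])           # [1, 1, 6, 16, 56, 176, 576]: intermingled codewords per length
print(a[-1] / a[-2])   # 3.236..., approaching sqrt(5) + 1

poles = np.roots([4, 2, -1])   # poles of F_E: (-1 +/- sqrt(5))/4
print(1 / min(abs(poles)))     # 3.2360... = sqrt(5) + 1
```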
ACKNOWLEDGMENT

The authors thank Iryna Andriyanova for her insightful comments on the Theory of Automata, and Claudio Weidmann for pointing out Shannon's results in [40, Part I].

REFERENCES
[1] C. Shannon, “The zero error capacity of a noisy channel,”
IRE Transactions on Information Theory , vol. 2, no. 3, pp. 8–19, 1956.[2] C. Berge,
Graphs and Hypergraphs , ser. North-Holland mathematical library. Amsterdam, 1973.[3] M. Grötschel, L. Lovász, and A. Schrijver, “Polynomial algorithms for perfect graphs,”
Ann. Discrete Math, vol. 21, pp. 325–356, 1984. [4] S. Klavzar, R. Hammack, and W. Imrich, “Handbook of graph products,” 2011. [5] M. Kovačević, “Zero-error capacity of duplication channels,”
IEEE Transactions on Communications, vol. 67, no. 10, pp. 6735–6742, Oct 2019. [6] M. Kovačević and P. Popovski, “Zero-error capacity of a class of timing channels,”
IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6796–6800, Nov 2014. [7] M. Kovačević, M. Stojaković, and V. Y. F. Tan, “Zero-error capacity of $p$-ary shift channels and FIFO queues,” IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7698–7707, Dec 2017. [8] I. Csiszar and J. Körner,
Information theory: coding theorems for discrete memoryless systems . Cambridge University Press, 2011.[9] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,”
IEEE Transactions on Information Theory ,vol. 56, no. 5, pp. 2307–2359, May 2010.[10] M. Dalai and Y. Polyanskiy, “Bounds on the reliability of a typewriter channel,” in , July 2016, pp. 1715–1719.[11] ——, “Bounds on the reliability function of typewriter channels,”
IEEE Transactions on Information Theory , vol. 64, no. 9, pp. 6208–6222,Sep. 2018.[12] G. Cohen, E. Fachini, and J. Körner, “Zero-error capacity of binary channels with memory,”
IEEE Transactions on Information Theory ,vol. 62, no. 1, pp. 3–7, Jan 2016.[13] R. Ahlswede, N. Cai, and Z. Zhang, “Zero-error capacity for models with memory and the enlightened dictator channel,”
IEEE Transactionson Information Theory , vol. 44, no. 3, pp. 1250–1252, May 1998.[14] M. Dalai, “An elias bound on the bhattacharyya distance of codes for channels with a zero-error capacity,” in , June 2014, pp. 1276–1280.[15] ——, “Elias bound for general distances and stable sets in edge-weighted graphs,”
IEEE Transactions on Information Theory , vol. 61,no. 5, pp. 2335–2350, May 2015.[16] B. Bose, N. Elarief, and L. G. Tallini, “On codes achieving zero error capacities in limited magnitude error channels,”
IEEE Transactionson Information Theory , vol. 64, no. 1, pp. 257–273, Jan 2018.[17] A. Abreu, L. Cunha, T. Fernandes, C. de Figueiredo, L. Kowada, F. Marquezino, D. Posner, and R. Portugal, “The graph tessellation covernumber: extremal bounds, efficient algorithms and hardness,” in
Latin American Symposium on Theoretical Informatics . Springer, 2018,pp. 1–13.[18] L. Lovász, “On the shannon capacity of a graph,”
IEEE Transactions on Information Theory , vol. 25, no. 1, pp. 1–7, 1979.[19] M. Rosenfeld, “On a problem of ce shannon in graph theory,”
Proceedings of the American Mathematical Society , vol. 18, no. 2, pp.315–319, 1967.[20] R. Hales, “Numerical invariants and the strong product of graphs,”
Journal of Combinatorial Theory, Series B , vol. 15, no. 2, pp. 146–155,1973.[21] W. Haemers et al. , “An upper bound for the shannon capacity of a graph,” in
Colloq. Math. Soc. János Bolyai , vol. 25, 1978, pp. 267–272.[22] N. Alon, “The shannon capacity of a union,”
Combinatorica , vol. 18, no. 3, pp. 301–310, 1998.
[23] P. K. Jha and S. Klavzar, “Independence in direct-product graphs,”
Ars Combinatoria , vol. 50, pp. 53–64, 1998.[24] G. Hahn, P. Hell, and S. Poljak, “On the ultimate independence ratio of a graph,”
European Journal of Combinatorics , vol. 16, no. 3, pp.253–261, 1995.[25] J. Körner and A. Orlitsky, “Zero-error information theory,”
IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2207–2229, 1998. [26] S. C. Polak and A. Schrijver, “New lower bound on the shannon capacity of $C_7$ from circular graphs,” Information Processing Letters, vol. 143, pp. 37–40, 2019. [27] T. Gallai, “Graphen mit triangulierbaren ungeraden vielecken,”
Magyar Tud. Akad. Mat. Kutató Int. Közl , vol. 7, pp. 3–36, 1962.[28] E. Sonnemann and O. Krafft, “Independence numbers of product graphs,”
Journal of Combinatorial Theory, Series B , vol. 17, no. 2, pp.133–142, 1974.[29] A. Vesel, “The independence number of the strong product of cycles,”
Computers & Mathematics with Applications , vol. 36, no. 7, pp.9–21, 1998.[30] T. Bohman and R. Holzman, “A nontrivial lower bound on the shannon capacities of the complements of odd cycles,”
IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 721–722, 2003. [31] A. Vesel and J. Žerovnik, “Improved lower bound on the shannon capacity of $C_7$,” Information Processing Letters, vol. 81, no. 5, pp. 277–282, 2002. [32] B. Codenotti, I. Gerace, and G. Resta, “Some remarks on the shannon capacity of odd cycles,”
Ars Combinatoria , vol. 66, pp. 243–258,2003.[33] L. Baumert, R. McEliece, E. Rodemich, H. Rumsey, R. Stanley, and H. Taylor, “A combinatorial packing problem,”
Computers in Algebraand Number Theory , vol. 4, 1971.[34] S. H. Badalyan and S. E. Markosyan, “On the independence number of the strong product of cycle-powers,”
Discrete Mathematics , vol.313, no. 1, pp. 105–110, 2013.[35] T. Bohman, “A limit theorem for the shannon capacities of odd cycles. I,”
Proceedings of the American Mathematical Society , vol. 131,no. 11, pp. 3559–3569, 2003.[36] ——, “A limit theorem for the shannon capacities of odd cycles. II,”
Proceedings of the American Mathematical Society , vol. 133, no. 2,pp. 537–543, 2005.[37] K. A. Mathew and P. R. Östergård, “New lower bounds for the shannon capacity of odd cycles,”
Designs, Codes and Cryptography ,vol. 84, no. 1-2, pp. 13–22, 2017.[38] C. Berge, “Farbung von graphen, deren samtliche bzw. deren ungerade kreise starr sind,”
Wissenschaftliche Zeitschrift , 1961.[39] M. Chudnovsky, N. Robertson, P. D. Seymour, and R. Thomas, “Progress on perfect graphs,”
Mathematical Programming , vol. 97, no.1-2, pp. 405–422, 2003.[40] C. E. Shannon, “A mathematical theory of communication,”
Bell system technical journal , vol. 27, no. 3, pp. 379–423, 1948.[41] S. Ben-Jamaa, C. Weidmann, and M. Kieffer, “Analytical tools for optimizing the error correction performance of arithmetic codes,”
IEEETransactions on Communications , vol. 56, no. 9, pp. 1458–1468, 2008.[42] P. Flajolet and R. Sedgewick, “Analytic combinatorics: functional equations, rational and algebraic functions,” report available [on-line]at https://hal.inria.fr/inria-00072528 , 2001.[43] M. Asadi and N. Devroye, “On the zero-error capacity of channels with rate limited noiseless feedback,” in , Oct 2018, pp. 1141–1146.[44] F. Guo and Y. Watanabe, “On graphs in which the shannon capacity is unachievable by finite product,”
IEEE Transactions on InformationTheory , vol. 36, no. 3, pp. 622–623, May 1990.[45] Y. Chen and N. Devroye, “Zero-error relaying for primitive relay channels,”
IEEE Transactions on Information Theory , vol. 63, no. 12,pp. 7708–7715, Dec 2017.[46] Y. Chen, S. Shahi, and N. Devroye, “Colour-and-forward: Relaying “what the destination needs” in the zero-error primitive relay channel,”in , Sep. 2014, pp. 987–995.[47] Y. Chen and N. Devroye, “On the optimality of colour-and-forward relaying for a class of zero-error primitive relay channels,” in , June 2015, pp. 1272–1276.[48] H. Witsenhausen, “The zero-error side information problem and chromatic numbers (corresp.),”
IEEE Transactions on Information Theory ,vol. 22, no. 5, pp. 592–593, Sep. 1976.[49] M. Asadi, K. Palacio-Baus, and N. Devroye, “A relaying graph and special strong product for zero-error problems in primitive relaychannels,” in , June 2018, pp. 281–285.
[50] N. Devroye, “When is the zero-error capacity positive in the relay, multiple-access, broadcast and interference channels?” in , Sep. 2016, pp. 672–678. [51] M. Asadi and N. Devroye, “On the zero-error capacity of channels with noisy feedback,” in , Oct 2017, pp. 642–649. [52] L. Zhao and H. H. Permuter, “Zero-error feedback capacity of channels with state information via dynamic programming,”
IEEETransactions on Information Theory , vol. 56, no. 6, pp. 2640–2650, June 2010.[53] L. Wang and O. Shayevitz, “Graph information ratio,” in , June 2017,pp. 913–917.[54] L. Wang and O. Shayevitz, “Graph information ratio,”
SIAM Journal on Discrete Mathematics , vol. 31, no. 4, pp. 2703–2734, 2017.[55] J. Körner and K. Marton, “Relative capacity and dimension of graphs,”
Discrete Mathematics , vol. 235, no. 1, pp. 307 – 315, 2001.[56] S. Hu and O. Shayevitz, “The ρ -capacity of a graph,” IEEE Transactions on Information Theory , vol. 63, no. 4, pp. 2241–2253, April2017.[57] O. Ordentlich and O. Shayevitz, “A vc-dimension-based outer bound on the zero-error capacity of the binary adder channel,” in , June 2015, pp. 2366–2370.[58] R. Urbanke and Quinn Li, “The zero-error capacity region of the 2-user synchronous bac is strictly smaller than its shannon capacityregion,” in , June 1998, pp. 61–.[59] M. Wiese, T. J. Oechtering, K. H. Johansson, P. Papadimitratos, H. Sandberg, and M. Skoglund, “Secure estimation and zero-error secrecycapacity,”
IEEE Transactions on Automatic Control, vol. 64, no. 3, pp. 1047–1062, March 2019. [60] M. Wiese, K. H. Johansson, T. J. Oechtering, P. Papadimitratos, H. Sandberg, and M. Skoglund, “Uncertain wiretap channels and secure estimation,” in , July 2016, pp. 2004–2008. [61] ——, “Secure estimation for unstable systems,” in , Dec 2016, pp. 5059–5064. [62] F. J. R. Ruiz and F. Pérez-Cruz, “Zero-error codes for the noisy-typewriter channel,” in , Oct 2011, pp. 495–497. [63] D. Cullina, M. Dalai, and Y. Polyanskiy, “Rate-distance tradeoff for codes above graph capacity,” in , July 2016, pp. 1331–1335. [64] M. Dalai, V. Guruswami, and J. Radhakrishnan, “An improved bound on the zero-error list-decoding capacity of the 4/3 channel,” in , June 2017, pp. 1658–1662. [65] P. Elias, “Zero error capacity under list decoding,”
IEEE Transactions on Information Theory , vol. 34, no. 5, pp. 1070–1074, Sep. 1988.[66] X. Xu and S. P. Radziszowski, “Bounds on shannon capacity and ramsey numbers from product of graphs,”
IEEE Transactions onInformation Theory , vol. 59, no. 8, pp. 4767–4770, Aug 2013.[67] J. H. van Lint and R. M. Wilson,
A Course in Combinatorics , 2nd ed. Cambridge University Press, 2001.[68] D. H. Greene and D. E. Knuth,
Mathematics for the Analysis of Algorithms . Springer Science & Business Media, 2007.[69] P. Linz,
An introduction to formal languages and automata . Jones & Bartlett Learning, 2006.[70] S. Lang,
Complex analysis . Springer Science & Business Media, 1999, vol. 103.[71] P. Lax,
Functional analysis, ser. Pure and applied mathematics. Wiley, 2002.

APPENDIX A
PROOF OF LEMMAS
A. Proof of Lemma 1 (Fekete)
Let $\epsilon > 0$ and let $L'$ be an integer such that $\frac{u_{L'}}{L'} \geq \sup_l \frac{u_l}{l} - \epsilon$. Let $L \geq L'$ and let $(r, q)$ be the remainder and quotient of the Euclidean division of $L$ by $L'$. Then we have $L = L'q + r$ and, by superadditivity,
$$\frac{u_L}{L} \geq \frac{q u_{L'} + u_r}{L} \geq \frac{q L'}{L} \Big( \sup_l \frac{u_l}{l} - \epsilon \Big) + \frac{u_r}{L} \underset{L \to \infty}{\longrightarrow} \sup_l \frac{u_l}{l} - \epsilon. \qquad (69)$$
Since $\epsilon$ is arbitrary, the limit of $\big( \frac{u_L}{L} \big)_{L \in \mathbb{N}}$ equals its supremum.
B. Proof of Lemma 2
For $L = 2$, the average number of transmitted symbols per channel use of the generated fixed-length code is $\sqrt{6} \simeq 2.45$:
$$|\mathcal{C}^*[2]| = |\{55\} \cup \mathcal{C}[2]| = 6. \qquad (70)$$
For $L = 3$, the average number of transmitted symbols per channel use of the generated fixed-length code is $\sqrt[3]{11} \simeq 2.22$:
$$|\mathcal{C}^*[3]| = |\{555\} \cup \{5\} \cdot \mathcal{C}[2] \cup \mathcal{C}[2] \cdot \{5\}| = 1 + 5 + 5 = 11. \qquad (71)$$
For $L = 4$, the average number of transmitted symbols per channel use of the generated fixed-length code is $\sqrt[4]{41} \simeq 2.53$:
$$|\mathcal{C}^*[4]| = |\mathcal{C}[2] \cdot \mathcal{C}[2]| + |\{55\} \cdot \mathcal{C}[2]| + |\{5\} \cdot \mathcal{C}[2] \cdot \{5\}| + |\mathcal{C}[2] \cdot \{55\}| + |\{5555\}| = 25 + 5 + 5 + 5 + 1 = 41. \qquad (72)$$
For $L = 5$, the average number of transmitted symbols per channel use of the generated fixed-length code is $\sqrt[5]{96} \simeq 2.49$: decomposing each sequence according to the positions of the words of $\mathcal{C}[2]$ among the letters 5,
$$|\mathcal{C}^*[5]| = 1 + (5 + 5 + 5 + 5) + (25 + 25 + 25) = 96, \qquad (73)$$
since a single word of $\mathcal{C}[2]$ can occupy 4 relative positions and two words of $\mathcal{C}[2]$ can occupy 3 relative positions.

C. Proof of Lemma 3
For $L = 2$, the average number of transmitted symbols per channel use of the generated fixed-length code is $\sqrt{5} \simeq 2.24$:
$$|\mathcal{C}'^*[2]| = |\mathcal{C}'[2]| = 5. \qquad (74)$$
For $L = 3$, the average number of transmitted symbols per channel use of the generated fixed-length code is $\sqrt[3]{2} \simeq 1.26$:
$$|\mathcal{C}'^*[3]| = |\mathcal{C}'[3]| = 2. \qquad (75)$$
For L = , the average number of transmitted symbols per channel use of the generated fixed length code is √ ≃ . : C ′∗ [ L ] = C ′∗ [ ] = { , , , , } × { , } ∪ { , } × { , , , , } = + = . (76)A PPENDIX BP ROOF OF T HEOREM
APPENDIX B
PROOF OF THEOREM III.4 FOR VARIABLE-LENGTH CODES
Let us index the generator set $\mathcal{C} = \{\kappa_1, \ldots, \kappa_{|\mathcal{C}|}\}$. We define the directed graph $\mathcal{G} \doteq (\mathcal{V}, \mathcal{E})$ with
$$\mathcal{V} = \big\{(i,j) \;\big|\; \kappa_i \in \mathcal{C},\; j \in \llbracket 1, |\kappa_i| \rrbracket\big\}, \tag{77}$$
$$(i,j)(i',j') \in \mathcal{E} \iff \big(j = |\kappa_i| \text{ and } j' = 1\big) \text{ OR } \big(i = i' \text{ and } j' = j+1\big). \tag{78}$$
We also define the set of final nodes by $\mathcal{F} \doteq \{(i, |\kappa_i|) \mid i \leq |\mathcal{C}|\}$.

Fig. 9: The transition graph $\mathcal{G}$ defined by (77) and (78).

We recall that in a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with adjacency matrix $M_{\mathcal{G}}$, the number of paths of length $L$ from a vertex $v$ to another vertex $v'$ is given by $(M_{\mathcal{G}}^{L})_{vv'}$. Thus the number of paths of length $L$ from the vertex $(1, |\kappa_1|)$ to the set $\mathcal{F}$ in $\mathcal{G}$ is given by
$$\sum_{v \in \mathcal{F}} (M_{\mathcal{G}}^{L})_{(1,|\kappa_1|),\,v} = \big\langle M_{\mathcal{G}}^{L},\, I_{(1,|\kappa_1|),\mathcal{F}} \big\rangle, \tag{79}$$
where $I_{(1,|\kappa_1|),\mathcal{F}} \doteq \big(\mathbb{1}\{v = (1,|\kappa_1|) \text{ and } v' \in \mathcal{F}\}\big)_{v,v' \in \mathcal{V}}$.
By construction, the number of distinct words of length $L$ that can be achieved by concatenation of the elements of the generator set $\mathcal{C}$ is equal to the number of paths of length $L$ from the vertex $(1, |\kappa_1|)$ to the set $\mathcal{F}$ in $\mathcal{G}$. Indeed, each branch $(i,1) \to (i,2) \to \ldots \to (i, |\kappa_i|)$ corresponds to the transmission of the word $\kappa_i$, and each path from $(1, |\kappa_1|)$ to the set $\mathcal{F}$ is a succession of such branches. Thus the number of channel input sequences satisfies
$$C^{*}[L] = \sum_{v \in \mathcal{F}} (M_{\mathcal{G}}^{L})_{(1,|\kappa_1|),\,v} = \big\langle M_{\mathcal{G}}^{L},\, I_{(1,|\kappa_1|),\mathcal{F}} \big\rangle. \tag{80}$$
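The correspondence between concatenations and paths can be checked numerically. The following sketch (not the authors' code) builds $M_{\mathcal{G}}$ for a hypothetical generator set $\mathcal{C} = \{0, 12\}$, chosen only for illustration, and verifies (80) against a brute-force enumeration.

```python
import numpy as np

# Minimal sanity check (hypothetical generator set): build the transition
# graph of (77)-(78) for C = {"0", "12"} and verify that the path count of
# (80) matches a brute-force enumeration of concatenations of length L.
C = ["0", "12"]
V = [(i, j) for i, w in enumerate(C) for j in range(1, len(w) + 1)]
idx = {v: k for k, v in enumerate(V)}
M = np.zeros((len(V), len(V)), dtype=np.int64)
for (i, j) in V:
    if j < len(C[i]):
        M[idx[(i, j)], idx[(i, j + 1)]] = 1   # advance inside the word kappa_i
    else:
        for i2 in range(len(C)):
            M[idx[(i, j)], idx[(i2, 1)]] = 1  # word finished: start any new word
F = [idx[(i, len(w))] for i, w in enumerate(C)]  # final nodes (i, |kappa_i|)
start = idx[(0, len(C[0]))]                      # the vertex (1, |kappa_1|)

def path_count(L):
    """C*[L] as the number of length-L paths from (1,|kappa_1|) to F, eq. (80)."""
    return int(np.linalg.matrix_power(M, L)[start, F].sum())

def brute_force(L):
    """Direct enumeration of the concatenations of total length L."""
    words, stack = set(), [""]
    while stack:
        w = stack.pop()
        if len(w) == L:
            words.add(w)
        elif len(w) < L:
            stack.extend(w + c for c in C)
    return len(words)

for L in range(1, 11):
    assert path_count(L) == brute_force(L)
print([path_count(L) for L in range(1, 11)])  # 1, 2, 3, 5, 8, ... (Fibonacci)
```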
Lemma 4: $\nu(\mathcal{C}) = \max_i |\lambda_i(M_{\mathcal{G}})|$, where $(\lambda_i(M_{\mathcal{G}}))_{i \leq |\mathcal{V}|}$ are the elements of the spectrum of $M_{\mathcal{G}}$.

Proof. [Lemma 4] We denote by $\|\cdot\|$ the Euclidean norm for matrices. If $\gcd(|c|,\, c \in \mathcal{C}) = 1$, we have
$$\sqrt[L]{C^{*}[L]} = \sqrt[L]{\big\langle M_{\mathcal{G}}^{L},\, I_{(1,|\kappa_1|),\mathcal{F}} \big\rangle} \tag{81}$$
$$= \sqrt[L]{\|M_{\mathcal{G}}^{L}\|}\; \sqrt[L]{\Big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|},\, I_{(1,|\kappa_1|),\mathcal{F}} \Big\rangle}. \tag{82}$$
We use Gelfand's formula
$$\sqrt[L]{\|M_{\mathcal{G}}^{L}\|} \xrightarrow[L \to \infty]{} \max_i |\lambda_i(M_{\mathcal{G}})|, \tag{83}$$
see [71, Theorem 4, pp. 195]. Now let us show that there exist two positive constants $m_1$ and $m_2$ such that, for all $L$ large enough, $m_1 \leq \big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|}, I_{(1,|\kappa_1|),\mathcal{F}} \big\rangle \leq m_2$. The existence of $m_2$ is given by the Cauchy-Schwarz inequality:
$$\Big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|},\, I_{(1,|\kappa_1|),\mathcal{F}} \Big\rangle \leq \|I_{(1,|\kappa_1|),\mathcal{F}}\| = m_2. \tag{84}$$
Since we assumed the code has a positive rate, $\big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|}, I_{(1,|\kappa_1|),\mathcal{F}} \big\rangle$ cannot converge to 0, as $\sqrt[L]{\|M_{\mathcal{G}}^{L}\|}$ is asymptotically bounded. Thus the constant $m_1$ exists, and $\sqrt[L]{\big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|}, I_{(1,|\kappa_1|),\mathcal{F}} \big\rangle}$ converges to 1 as $L$ goes to infinity. Therefore
$$\sqrt[L]{C^{*}[L]} \xrightarrow[L \to \infty]{} \max_i |\lambda_i(M_{\mathcal{G}})|. \tag{85}$$
This proof can be straightforwardly adapted to the case where $\gcd(|c|,\, c \in \mathcal{C}) = d \neq 1$ by taking $dL$ instead of $L$. ∎
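Under the same toy generator set as above, the convergence (85) can be observed directly: the $L$-th root of the path count approaches the spectral radius.

```python
import numpy as np

# Sketch of Lemma 4 at work for the hypothetical generator set C = {"0","12"}:
# the L-th root of C*[L] approaches the spectral radius of M_G.
M = np.array([[1.0, 1.0, 0.0],   # vertex (1,1): end of "0", restart any word
              [0.0, 0.0, 1.0],   # vertex (2,1) -> (2,2), inside "12"
              [1.0, 1.0, 0.0]])  # vertex (2,2): end of "12", restart any word
start, F = 0, [0, 2]             # (1,|kappa_1|) and the final nodes
rho = max(abs(np.linalg.eigvals(M)))            # here (1+sqrt(5))/2
for L in (5, 20, 80):
    cL = np.linalg.matrix_power(M, L)[start, F].sum()
    print(L, cL ** (1 / L), rho)                # L-th root -> rho
```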
Lemma 5: There exists a unique positive number $\delta$ such that $\sum_{l=1}^{l_{\max}} C[l]\, \delta^{-l} = 1$, where $l_{\max}$ denotes the maximal length of a generator word.

Proof. [Lemma 5] The function
$$f : \mathbb{R}_{+}^{*} \to \mathbb{R}_{+}^{*} \tag{86}$$
$$x \mapsto \sum_{l=1}^{l_{\max}} C[l]\, x^{-l} \tag{87}$$
is strictly decreasing on $\mathbb{R}_{+}^{*}$, tends to $+\infty$ at $0^{+}$ and to $0$ at $+\infty$: hence there exists a unique $\delta$ such that $\sum_{l=1}^{l_{\max}} C[l]\, \delta^{-l} = 1$. Note that $\delta \geq 1$, as the $(C[l])_{l \leq l_{\max}}$ are integers with at least one nonzero term, i.e., $l_{\max} \neq 0$. ∎
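The monotonicity argument of Lemma 5 translates directly into a bisection search for $\delta$. A minimal sketch, with hypothetical counts $C[1] = C[2] = 1$:

```python
# Sketch (hypothetical counts): solve sum_l C[l] * delta**(-l) = 1 by
# bisection, using the fact that f is strictly decreasing (proof of Lemma 5).
def delta(C):  # C maps each word length l >= 1 to the count C[l]
    f = lambda x: sum(c * x ** (-l) for l, c in C.items())
    lo, hi = 1e-9, 1.0
    while f(hi) > 1:          # grow hi until f(hi) <= 1
        hi *= 2
    for _ in range(100):      # maintain the invariant f(lo) > 1 >= f(hi)
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)
    return hi

print(delta({1: 1, 2: 1}))   # golden ratio ~ 1.61803 for C[1] = C[2] = 1
```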
Lemma 6: The value $\max_i |\lambda_i(M_{\mathcal{G}})|$ is the unique positive solution of $X^{l_{\max}} = \sum_{l=1}^{l_{\max}} C[l]\, X^{l_{\max}-l}$.

Proof. [Lemma 6] Let $\nu$ be an eigenvector of $M_{\mathcal{G}}$ for an eigenvalue $\lambda \neq 0$, so that $M_{\mathcal{G}}\nu = \lambda\nu$. Then for all $i \leq |\mathcal{C}|$ and $j < |\kappa_i|$,
$$\lambda\, \nu_{(i,j)} = (M_{\mathcal{G}}\nu)_{(i,j)} = \sum_{k \in \mathcal{V}} (M_{\mathcal{G}})_{(i,j),k}\, \nu_k \tag{88}$$
$$= \sum_{k = (i,\,j+1)} \nu_k = \nu_{(i,j+1)}, \tag{89}$$
and
$$\lambda\, \nu_{(i,|\kappa_i|)} = (M_{\mathcal{G}}\nu)_{(i,|\kappa_i|)} = \sum_{k \in \mathcal{V}} (M_{\mathcal{G}})_{(i,|\kappa_i|),k}\, \nu_k \tag{90}$$
$$= \sum_{k = (\cdot,\,1)} \nu_k = \sum_{i' \leq |\mathcal{C}|} \nu_{(i',1)}. \tag{91}$$
Iterating (89) gives $\nu_{(i,|\kappa_i|)} = \lambda^{|\kappa_i|-1}\, \nu_{(i,1)}$, which combined with (91) yields
$$\sum_{i \leq |\mathcal{C}|} \nu_{(i,1)} = \sum_{i \leq |\mathcal{C}|} \lambda^{-|\kappa_i|+1}\, \nu_{(i,|\kappa_i|)} = \sum_{i \leq |\mathcal{C}|} \lambda^{-|\kappa_i|} \sum_{i' \leq |\mathcal{C}|} \nu_{(i',1)}. \tag{92}$$
Since $\sum_{i \leq |\mathcal{C}|} \nu_{(i,1)} = 0$ would imply $\nu = 0$, $\lambda$ must satisfy the polynomial equation
$$1 = \sum_{i \leq |\mathcal{C}|} \lambda^{-|\kappa_i|} = \sum_{l \leq l_{\max}} C[l]\, \lambda^{-l}. \tag{93}$$
Thus for every eigenvalue $\lambda \neq 0$ we have
$$1 \leq \sum_{l \leq l_{\max}} C[l]\, |\lambda|^{-l}. \tag{94}$$
By Lemma 5 there exists a unique positive real solution $\delta$ of $1 = \sum_{l \leq l_{\max}} C[l]\, X^{-l}$; since the function $f$ of (86)-(87) is strictly decreasing, (94) gives $f(|\lambda|) \geq 1 = f(\delta)$, hence $|\lambda| \leq \delta$. Therefore, if $\delta$ is an eigenvalue of $M_{\mathcal{G}}$, it has maximum modulus.
Now we define $\nu_{\max} \doteq (\delta^{\,j - |\kappa_i|})_{(i,j) \in \mathcal{V}}$ and show that $\nu_{\max}$ is an eigenvector for the eigenvalue $\delta$. Let $i \leq |\mathcal{C}|$ and $j < |\kappa_i|$; then
$$(M_{\mathcal{G}}\, \nu_{\max})_{(i,j)} = (\nu_{\max})_{(i,j+1)} = \delta^{\,j+1-|\kappa_i|} = \delta\, (\nu_{\max})_{(i,j)}, \tag{95}$$
and
$$(M_{\mathcal{G}}\, \nu_{\max})_{(i,|\kappa_i|)} = \sum_{i' \leq |\mathcal{C}|} (\nu_{\max})_{(i',1)} = \delta \sum_{i' \leq |\mathcal{C}|} \delta^{-|\kappa_{i'}|} \tag{96}$$
$$= \delta \sum_{l=1}^{l_{\max}} C[l]\, \delta^{-l} \overset{\text{Lemma 5}}{=} \delta = \delta \cdot \delta^{\,|\kappa_i| - |\kappa_i|} = \delta\, (\nu_{\max})_{(i,|\kappa_i|)}. \tag{97}$$
The condition $M_{\mathcal{G}}\, \nu_{\max} = \delta\, \nu_{\max}$ is verified at each vertex, i.e., $\nu_{\max}$ is an eigenvector for the eigenvalue $\delta$. Thus $\delta = \max_i |\lambda_i(M_{\mathcal{G}})|$, and it is the unique positive solution of $X^{l_{\max}} = \sum_{l=1}^{l_{\max}} C[l]\, X^{l_{\max}-l}$. ∎

The desired result follows directly from Lemmas 4 and 6.
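Lemma 6 can also be checked numerically: the positive root of the characteristic polynomial must coincide with the spectral radius of $M_{\mathcal{G}}$. A sketch with hypothetical counts $C[1] = 1$, $C[2] = 2$:

```python
import numpy as np

# Sketch: check Lemma 6 for hypothetical counts C[1]=1, C[2]=2, i.e. the
# equation X^2 = X + 2.  The positive root (here 2) must equal the spectral
# radius of the M_G built from one length-1 word and two length-2 words.
coeffs = [1, -1, -2]                      # X^2 - C[1] X - C[2]
print(max(np.roots(coeffs).real))         # -> 2.0
# M_G: vertices (1,1),(2,1),(2,2),(3,1),(3,2); final nodes (1,1),(2,2),(3,2)
M = np.zeros((5, 5))
finals, starts = [0, 2, 4], [0, 1, 3]
M[1, 2] = M[3, 4] = 1                     # advance inside the length-2 words
for f in finals:
    for s in starts:
        M[f, s] = 1                       # end of a word -> start of any word
print(max(abs(np.linalg.eigvals(M))))     # -> 2.0 as well
```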
APPENDIX C
PROOF OF THEOREM IV.5 FOR INTERMINGLED CODES
The following proof holds when Fekete's lemma can be applied directly to the sequence $(\log(S_L))_{L \in \mathbb{N}}$, that is, when $S_L \neq 0$ for all $L$ large enough; the proof can be straightforwardly adapted to the other cases by defining the rate as in (30).
Let $(\mathcal{C}, \rho)$ be an intermingled code over the channel $W$, and let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be the transition graph of the transmission states, with adjacency matrix $M_{\mathcal{G}}$. Similarly to the previous section, we recall that for all $L$, the number of paths of length $L$ from $v$ to $v'$ is given by $(M_{\mathcal{G}}^{L})_{vv'}$. Now by construction, $S_L$ is equal to the number of paths of length $L$ starting from $(1, \ldots, 1)$ and finishing at $(1, \ldots, 1)$, that is,
$$S_L = (M_{\mathcal{G}}^{L})_{(1,\ldots,1),(1,\ldots,1)} = \big\langle M_{\mathcal{G}}^{L},\, I_{1,1} \big\rangle, \tag{98}$$
with $I_{1,1} = \big(\mathbb{1}\{v = (1,\ldots,1) \text{ and } v' = (1,\ldots,1)\}\big)_{v,v' \in \mathcal{V}}$. We proceed as in the proof of Lemma 4 in App. B:
$$r(\mathcal{C}, \rho) = \lim_{L \to \infty} \frac{1}{L} \log S_L \tag{99}$$
$$= \lim_{L \to \infty} \frac{1}{L} \log \big\langle M_{\mathcal{G}}^{L},\, I_{1,1} \big\rangle \tag{100}$$
$$= \lim_{L \to \infty} \log\Big(\sqrt[L]{\|M_{\mathcal{G}}^{L}\|}\Big) + \frac{1}{L} \log \Big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|},\, I_{1,1} \Big\rangle. \tag{101}$$
The quantity $\big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|}, I_{1,1} \big\rangle$ is positive, bounded as $L$ goes to infinity by the Cauchy-Schwarz inequality, and does not converge to zero because the rate is positive. Thus $\frac{1}{L} \log \big\langle \frac{M_{\mathcal{G}}^{L}}{\|M_{\mathcal{G}}^{L}\|}, I_{1,1} \big\rangle \to 0$ as $L$ goes to infinity and, by Gelfand's formula, $\log\big(\sqrt[L]{\|M_{\mathcal{G}}^{L}\|}\big)$ converges to $\log \max_i |\lambda_i(M_{\mathcal{G}})|$, where $(\lambda_i)_{i \leq |\mathcal{V}|}$ are the elements of the spectrum of $M_{\mathcal{G}}$.
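The closed-walk count (98) exhibits the same spectral behavior; a toy sketch (with a hypothetical two-state transition graph, not one from the paper):

```python
import numpy as np

# Toy illustration: S_L is the closed-walk count of eq. (98), and
# (1/L) log S_L approaches the log of the spectral radius of M_G.
M = np.array([[1.0, 1.0],
              [2.0, 0.0]])             # hypothetical two-state transition graph
rho = max(abs(np.linalg.eigvals(M)))   # here 2
for L in (10, 40, 160):
    S_L = np.linalg.matrix_power(M, L)[0, 0]
    print(L, np.log(S_L) / L, np.log(rho))
```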
APPENDIX D
PROOF OF PROPOSITION VI.9

For all regular expressions $E$ and $E'$ such that $\mathcal{L}(E) \cap \mathcal{L}(E') = \emptyset$, we have
$$F_{E+E'}(z) = \sum_{l \in \mathbb{N}} \big(\mathcal{L}(E) \cup \mathcal{L}(E')\big)[l]\, z^l \overset{\text{Hyp.}}{=} \sum_{l \in \mathbb{N}} \big(\mathcal{L}(E)[l] + \mathcal{L}(E')[l]\big)\, z^l = F_E(z) + F_{E'}(z). \tag{102}$$
For all regular expressions $E$ and $E'$ we have
$$F_{EE'}(z) = \sum_{l \in \mathbb{N}} \big(\mathcal{L}(E) \cdot \mathcal{L}(E')\big)[l]\, z^l = \sum_{l' \in \mathbb{N}} \sum_{l'' \in \mathbb{N}} \mathcal{L}(E)[l']\, \mathcal{L}(E')[l'']\, z^{l'+l''} = F_E(z)\, F_{E'}(z). \tag{103}$$
For all regular expressions $E$, let us denote $E^l \doteq E \cdot \ldots \cdot E$ ($l$ times). Assume that the sets $(\mathcal{L}(E^l))_{l \in \mathbb{N}}$ are disjoint; then we have
$$F_{E^*}(z) = \sum_{l \in \mathbb{N}} \mathcal{L}(E^*)[l]\, z^l \overset{\text{Hyp.}}{=} \sum_{l \in \mathbb{N}} \Big( \sum_{l' \in \mathbb{N}} \mathcal{L}(E^{l'})[l] \Big) z^l \tag{104}$$
$$= \sum_{l' \in \mathbb{N}} \Big( \sum_{l \in \mathbb{N}} \mathcal{L}(E^{l'})[l]\, z^l \Big) = \sum_{l' \in \mathbb{N}} F_{E^{l'}}(z) = \sum_{l' \in \mathbb{N}} F_E(z)^{l'} = \frac{1}{1 - F_E(z)}. \tag{105}$$
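The identity (105) can be verified as a truncated power series. The following sketch assumes a hypothetical expression $E$ with one word of length 1 and one of length 2, so that $F_E(z) = z + z^2$ and the star is unambiguous:

```python
# Sketch: verify F_{E*}(z) = 1/(1 - F_E(z)) as a truncated power series for a
# hypothetical E with F_E(z) = z + z^2 (one word of length 1, one of length 2).
N = 12
fE = [0, 1, 1] + [0] * (N - 2)           # coefficients of F_E(z) = z + z^2

def mul(a, b):                           # truncated product of power series
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if ai and bj and i + j <= N:
                c[i + j] += ai * bj
    return c

# Left side: sum_{l'} F_E(z)^{l'} (the truncated geometric series of (105))
star, power = [1] + [0] * N, [1] + [0] * N
for _ in range(N):
    power = mul(power, fE)
    star = [s + p for s, p in zip(star, power)]

# Right side: direct count of concatenations of total length L (Fibonacci)
count = [1, 1]
for L in range(2, N + 1):
    count.append(count[L - 1] + count[L - 2])
assert star == count
print(star)   # 1, 1, 2, 3, 5, 8, ...
```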
APPENDIX E
PROOF OF THEOREM VI.10

The following proof holds when Fekete's lemma can be applied directly to $(\log(\mathcal{L}(E)[L]))_{L \in \mathbb{N}}$, that is, when $\mathcal{L}(E)[L] \neq 0$ for all $L$ large enough; the proof can be straightforwardly adapted to the other cases by redefining the rate as in (56).
Let $E$ be a rational code, $\mathcal{A} = (\mathcal{X}, \mathcal{S}, \tau, s_{\text{start}}, \mathcal{S}_{\text{accept}})$ a DFA which recognizes $\mathcal{L}(E)$, and $E'$ the regular expression such that $E = (E')^*$. Let $\mathcal{G}_{\mathcal{A}}$ be the transition graph of the DFA $\mathcal{A}$ and $M_{\mathcal{A}}$ its adjacency matrix. Similarly to the proofs in App. B and C, for all $L \in \mathbb{N}$ we have
$$\mathcal{L}(E)[L] = \sum_{s \in \mathcal{S}_{\text{accept}}} (M_{\mathcal{A}}^{L})_{s_{\text{start}},\, s}, \tag{106}$$
$$\nu(E) = \lim_{L \to \infty} \sqrt[L]{\mathcal{L}(E)[L]} \tag{107}$$
$$= \lim_{L \to \infty} \sqrt[L]{\|M_{\mathcal{A}}^{L}\|}\; \sqrt[L]{\Big\langle \frac{M_{\mathcal{A}}^{L}}{\|M_{\mathcal{A}}^{L}\|},\, I_{s_{\text{start}}, \mathcal{S}_{\text{accept}}} \Big\rangle}, \tag{108}$$
where $I_{s_{\text{start}}, \mathcal{S}_{\text{accept}}} = \big(\mathbb{1}\{s = s_{\text{start}},\, s' \in \mathcal{S}_{\text{accept}}\}\big)_{s,s' \in \mathcal{S}}$. By Gelfand's formula, $\sqrt[L]{\|M_{\mathcal{A}}^{L}\|} \to \max_s |\lambda_s(M_{\mathcal{A}})|$ as $L \to \infty$, where $(\lambda_s(M_{\mathcal{A}}))_{s \in \mathcal{S}}$ is the spectrum of $M_{\mathcal{A}}$. On the other hand, the quantity $\big\langle \frac{M_{\mathcal{A}}^{L}}{\|M_{\mathcal{A}}^{L}\|}, I_{s_{\text{start}}, \mathcal{S}_{\text{accept}}} \big\rangle$ is positive, bounded as $L$ goes to infinity by the Cauchy-Schwarz inequality, and does not converge to zero because the rate is positive. Thus $\sqrt[L]{\big\langle \frac{M_{\mathcal{A}}^{L}}{\|M_{\mathcal{A}}^{L}\|}, I_{s_{\text{start}}, \mathcal{S}_{\text{accept}}} \big\rangle} \to 1$ as $L$ goes to infinity, which concludes the proof.
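To illustrate (106)-(108), the sketch below uses a hypothetical two-state DFA over $\{a, b\}$ accepting the words with no two consecutive $b$'s; the $L$-th root of the number of accepted words of length $L$ approaches the spectral radius of $M_{\mathcal{A}}$.

```python
import numpy as np

# Sketch (toy DFA, not from the paper): count accepted words of length L via
# M_A, eq. (106), and compare the L-th root with the spectral radius of M_A.
# States: 0 = last letter not 'b' (start, accepting), 1 = last letter 'b'
# (accepting); a 'b' from state 1 would be rejecting.
M = np.array([[1.0, 1.0],    # from state 0: 'a' -> 0, 'b' -> 1
              [1.0, 0.0]])   # from state 1: 'a' -> 0 only
start, accept = 0, [0, 1]
rho = max(abs(np.linalg.eigvals(M)))   # here (1+sqrt(5))/2
for L in (5, 20, 80):
    nL = np.linalg.matrix_power(M, L)[start, accept].sum()
    print(L, nL ** (1 / L), rho)       # converges to rho
```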