Fast and Simple Modular Subset Sum
Kyriakos Axiotis, Arturs Backurs, Karl Bringmann, Ce Jin, Vasileios Nakos, Christos Tzamos, Hongxun Wu
Kyriakos Axiotis
Massachusetts Institute of Technology
Arturs Backurs
Toyota Technological Institute at Chicago
Karl Bringmann
Saarland University and Max-Planck Institute for Informatics, Saarland Informatics Campus
Ce Jin
Massachusetts Institute of Technology
Vasileios Nakos
Saarland University and Max-Planck Institute for Informatics, Saarland Informatics Campus
Christos Tzamos
University of Wisconsin-Madison
Hongxun Wu
Institute for Interdisciplinary Information Sciences, Tsinghua University, China
Abstract
We revisit the Subset Sum problem over the finite cyclic group Z_m for some given integer m. A series of recent works has provided near-optimal algorithms for this problem under the Strong Exponential Time Hypothesis. Koiliaris and Xu (SODA'17, TALG'19) gave a deterministic algorithm running in time Õ(m^{5/4}), which was later improved to O(m log^7 m) randomized time by Axiotis et al. (SODA'19).
In this work, we present two simple algorithms for the Modular Subset Sum problem running in near-linear time in m, both efficiently implementing Bellman's iteration over Z_m. The first one is a randomized algorithm running in time O(m log^2 m) that is based solely on rolling hash and an elementary data structure for prefix sums; to illustrate its simplicity we provide a short and efficient implementation of the algorithm in Python. Our second solution is a deterministic algorithm running in time O(m polylog m) that uses dynamic data structures for string manipulation.
We further show that the techniques developed in this work can also lead to simple algorithms for the All Pairs Non-Decreasing Paths Problem (APNP) on undirected graphs, matching the near-optimal running time of Õ(n^2) provided in the recent work of Duan et al. (ICALP'19).

2012 ACM Subject Classification: Theory of computation → Algorithm design techniques
Keywords and phrases
Modular Subset Sum, rolling hash, dynamic strings
Funding
Karl Bringmann and Vasileios Nakos:
This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 850979).
Arturs Backurs:
Supported by an NSF Grant CCF-2006806.
In the Subset Sum problem, one is given a multiset X = { x_1, x_2, ..., x_n } of integers along with an integer target t, and is asked to decide if there exists a subset of X that sums to the target t. In the Modular Subset Sum generalization of the problem, all sums are taken over the finite cyclic group Z_m for some given integer m.
Subset Sum is a fundamental problem in Computer Science known to be NP-complete, but only weakly so, as it admits pseudo-polynomial time algorithms. In particular, the Dynamic Programming algorithm of Bellman [7] solves the problem in O(nt) time. It works by iteratively computing all attainable subset sums when using only the first i integers. More specifically, it starts with S_0 = {0} and computes S_i as S_{i-1} ∪ (S_{i-1} + x_i), where S_{i-1} + x_i = { s + x_i | s ∈ S_{i-1} }.
The above algorithm can be straightforwardly applied to give an O(nm) time algorithm for the modular case. Recent work by Koiliaris and Xu [24] obtained an improved deterministic algorithm running in Õ(m^{5/4}) that relies on structural results from number theory [20]. A follow-up work by Axiotis et al. [4] presented a randomized algorithm that improves the running time to O(m log^7 m) using ideas based on linear sketching. The obtained running time matches (up to subpolynomial factors) the conditional lower bound of Abboud et al. [1] based on the Strong Exponential Time Hypothesis, which implies that no O(m^{1-ε}) algorithm exists for any constant ε > 0.
Our first algorithm (see Section 3) is randomized and runs in time O(m log^2 m). More precisely, the algorithm produces the whole set X* of attainable subset sums of the multiset X in time O(|X*| log^2 m). The idea behind our algorithm is a fast implementation of Bellman's iteration and requires only two elementary techniques, rolling hashing and a data structure for maintaining prefix sums. These techniques are already taught in undergraduate-level algorithms classes.
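Bellman's iteration itself can be stated in a few lines of Python. The following sketch (ours, not the paper's Appendix B code) is the textbook O(nm)-time baseline that the algorithms below accelerate.

```python
# Baseline sketch: Bellman's iteration over Z_m in O(n*m) time.
# This is the textbook dynamic program, not the paper's fast algorithm.
def bellman_modular_subset_sums(xs, m):
    attainable = {0}  # S_0 = {0}
    for x in xs:      # S_i = S_{i-1} ∪ (S_{i-1} + x_i)
        attainable |= {(s + x) % m for s in attainable}
    return attainable
```

For example, `bellman_modular_subset_sums([3, 5], 7)` returns `{0, 1, 3, 5}`, since 3 + 5 ≡ 1 (mod 7).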
We believe that our simple algorithm can serve as an example application when these techniques are introduced. Our second algorithm (see Section 4) is deterministic and solves Modular Subset Sum in time Õ(m) = O(m polylog m). More precisely, the algorithm produces the set X* of attainable subset sums in time Õ(|X*|) = O(|X*| polylog |X*|). This algorithm is based on a classic data structure for string manipulation, and apart from this data structure the algorithm is simple. The idea of solving Modular Subset Sum via dynamic string data structures has already been suggested in [4]; however, the algorithm proposed in [4] runs in time O(|X*| polylog m), which we improve to O(|X*| polylog |X*|).

Techniques for the First Algorithm
We first explain the technical innovation behind our randomized O(m log^2 m) algorithm (Theorem 3 in Section 3). At the core of our argument is a new method for computing the symmetric difference S_1 △ S_2 between two sets S_1, S_2 ⊆ [m] in output-sensitive time upon specific updates on those two sets. The idea is to use hashing to compare the indicator vectors of the two sets. If the two hashes are the same, then the two sets are the same w.h.p. If not, we compute the symmetric difference of the sets S_1 and S_2 by recursing on the first and the second half of the universe, { 1, ..., ⌈m/2⌉ } and { ⌈m/2⌉ + 1, ..., m }. In total, at most log m + 1 hashes need to be computed per element of S_1 △ S_2. Each hash that needs to be computed corresponds to a contiguous interval of the indicator vectors. It can be evaluated in O(log m) time given access to a data structure that maintains prefix sums of a polynomial rolling hash function for the indicator vectors of each of the sets.
All our running time bounds assume that the usual arithmetic operations on log(m)-bit numbers can be performed in constant time. After an O(n + m)-time preprocessing we can assume that n = O(m), see Section 2; after this preprocessing, we can express the running time in terms of m only. We ignore the preprocessing time in most running time bounds stated in this paper; this only hides an additive O(n). Similarly, after an O(n log n)-time preprocessing we can assume that n = O(|X*|), see Section 2. We ignore this preprocessing in our output-sensitive running time bounds; this only hides an additive O(n log n).
We show that this idea can be applied to other problems beyond Modular Subset Sum.
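The hash-and-recurse idea above can be illustrated with a self-contained sketch. For brevity this version recomputes each interval hash from scratch and uses Python's built-in tuple hashing as a stand-in for the polynomial rolling hash; the actual algorithm instead evaluates each hash in O(log m) time from maintained prefix sums.

```python
# Illustrative sketch: output-sensitive symmetric difference via hash-and-recurse.
def symmetric_difference(s1, s2, lo, hi):
    """Positions in [lo, hi) where the indicator vectors of s1 and s2 differ."""
    h1 = hash(tuple(i in s1 for i in range(lo, hi)))
    h2 = hash(tuple(i in s2 for i in range(lo, hi)))
    if h1 == h2:              # equal hashes: the two restrictions agree (w.h.p.)
        return []
    if hi - lo == 1:          # a single position is left, and the sets differ there
        return [lo]
    mid = (lo + hi) // 2      # otherwise recurse on both halves of the universe
    return (symmetric_difference(s1, s2, lo, mid)
            + symmetric_difference(s1, s2, mid, hi))
```

Each element of the symmetric difference forces at most one failed comparison per level of the recursion, i.e., O(log m) hash evaluations per output element.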
In particular, we consider the problem of all-pairs non-decreasing paths (APNP) in undirected graphs, where we obtain the near-optimal running time Õ(n^2), matching the state of the art for this problem, see Appendix A. These two algorithms for Modular Subset Sum and APNP are simple to describe and to analyze. To illustrate their simplicity, we provide short but detailed implementations in Python for both algorithms in the appendix (see Appendix B and C).

Techniques for the Second Algorithm
Now let us describe our deterministic O(m polylog m) algorithm (Theorem 4 in Section 4). The core of this algorithm is again a fast method for computing the symmetric difference S_1 △ S_2 for sets S_1, S_2 ⊆ [m]. Consider the indicator vectors of S_1 and S_2 and interpret them as length-m strings z_1, z_2 over the alphabet {0, 1}. Then the symmetric difference S_1 △ S_2 corresponds to all positions at which the strings z_1, z_2 differ. We thus obtain the first element of the symmetric difference by computing the longest common prefix of z_1 and z_2. Generalizing this idea, we can enumerate the symmetric difference using one longest common prefix query per output element. We implement such queries by using a classic data structure for dynamically maintaining a family of strings under concatenations, splits, and equality tests due to Mehlhorn et al. [26]. Implementing this idea naively leads to a running time of O(|X*| polylog m). By working on the run-length encoding of the strings z_1, z_2, we further improve the running time to O(|X*| polylog |X*|).

Further Related Work
In addition to Modular Subset Sum, there has recently been a lot of interest in obtainingfaster algorithms for other related problems, like non-modular Subset Sum [8, 24, 23] andKnapsack [28, 9, 22, 6, 5, 21, 16], and providing conditional lower bounds [1, 11, 25].
Let X be a multiset of integers in Z_m. Recall that we denote by X* the set of all attainable subset sums of X modulo m. In this section we present a preprocessing that ensures n = O(|X*|), and thus also n = O(m), see Lemma 2. This is inspired by a similar preprocessing by Koiliaris and Xu [24, Lemma 2.4].
For any x ∈ Z_m we write µ_X(x) for the multiplicity of x in X, that is, how often x appears in the multiset X. Note that the cardinality |X| is equal to the total multiplicity Σ_{x=0}^{m-1} µ_X(x).

◮ Lemma 1 (Cf. Lemma 2.3 in [24]). Let x ∈ X with µ_X(x) ≥ 2. Consider the multiset Y resulting from removing two copies of x from X and adding the number 2x mod m to it. Then X* = Y* and |Y| = |X| − 1.

Proof.
Clearly, any subset sum of Y modulo m is also a subset sum of X modulo m. In the other direction, for any subset of X containing at least two copies of x we can replace two of these copies by one copy of 2x mod m, thereby transforming it into a subset of Y with the same sum modulo m. This proves X* = Y*. The equality |Y| = |X| − 1 is immediate. ◭

◮ Lemma 2.
Given a multiset X of n integers in Z_m, in time O(min{ n log n, n + m }) we can compute a multiset Y over Z_m such that Y* = X* and |Y| ≤ min{ n, |X*| }. Note that in particular |Y| ≤ min{ n, m }.

Algorithm 1
Single Step of the Preprocessing

function Preprocessing-Check(x)
    if µ_X(x) ≥ 2 then
        µ_X(x) −= 2
        µ_X(2x mod m) += 1
        Preprocessing-Check(x)
        Preprocessing-Check(2x mod m)

Proof.
We exhaustively apply Lemma 1 by calling
Preprocessing-Check(x) on each x ∈ X. After these calls have ended, Lemma 1 is no longer applicable, and thus the resulting multiset Y satisfies µ_Y(x) ≤ 1 for all x ∈ Z_m. By Lemma 1 we have Y* = X*, and thus the support of Y is a subset of X*, which implies |Y| ≤ |X*|. The inequality |Y| ≤ n is trivial, since the cardinality only decreases throughout this algorithm.
Since each successful check decreases the cardinality, there are O(n) successful checks. Since each successful check calls two additional checks, there are O(n) unsuccessful checks. It follows that the procedure Preprocessing-Check is called O(n) times in total.
It remains to argue in which running time one call of Preprocessing-Check can be implemented. An easy solution is to store an array M of length m such that M[x] = µ_X(x). Initializing M takes time m. One call of Preprocessing-Check can then easily be implemented in time O(1). This yields total time O(n + m).
In order to avoid time (or space) O(m), we can alternatively store all distinct elements of X in a balanced binary search tree T, and store the number µ_X(x) at the node corresponding to x in T. Building this tree initially takes time O(n log n), for sorting the set X. One call of Preprocessing-Check can then be implemented in time O(log n), resulting in a total running time of O(n log n). ◭

We hence assume n ≤ |X*| ≤ m in the remainder of this paper.

To describe our implementation, we consider Bellman's iteration S_i = S_{i-1} ∪ (S_{i-1} + x_i). Our goal is to compute X* = S_n, that is, the set of all attainable modular subset sums. Note that, if we could efficiently compute the new sums C_i := (S_{i-1} + x_i) \ S_{i-1}, we would be able to implement the Bellman iteration as S_i = S_{i-1} ∪ C_i. We will shortly show that C_i can be computed in output-sensitive time O((|C_i| + 1) log^2 m). This implies that the total time to evaluate S_n = C_1 ∪ ... ∪ C_n is O((|C_1| + 1) log^2 m) + ... + O((|C_n| + 1) log^2 m) ≤ O((|X*| + n) log^2 m) ≤ O(|X*| log^2 m) ≤ O(m log^2 m), since the sets C_i are disjoint and their union is of size |X*| ≤ m.
We now argue how to compute these new subset sums, (S_{i-1} + x_i) \ S_{i-1}, efficiently. Here and in the remainder of this paper, we write S + x := { (s + x) mod m | s ∈ S }, for a given modulus m. Instead of considering the set difference between the sets S_{i-1} + x_i and S_{i-1}, we will consider their symmetric difference (S_{i-1} + x_i) △ S_{i-1} = C_i ∪ D_i, where D_i = S_{i-1} \ (S_{i-1} + x_i). An important observation made in [4] is that since the sets S_{i-1} and S_{i-1} + x_i have the same size, the symmetric difference will have size exactly 2|C_i|, as |C_i| = |D_i|. Thus, recovering this larger set C_i ∪ D_i in output-sensitive time is asymptotically the same as recovering only the elements of C_i. For notational convenience, we call the elements of D_i "ghost sums", in the sense that they are not new subset sums.
We now provide a recursive function (Algorithm 2) that, given a set of integers S and integers a, b, x ∈ {0, ..., m}, computes ((S + x) \ S) ∩ [a, b). Calling the function with S = S_{i-1}, a = 0, b = m and x = x_i, we can recover C_i. We will show that the function outputs C_i in time O((|C_i| + 1) log^2 m), which is what we need.

Algorithm 2
Find new subset sums in range [a, b)

function Find-New-Sums(a, b, x, S)
    if (S + x) ∩ [a, b) = S ∩ [a, b) then return ∅
    if b = a + 1 then
        if a ∈ (S + x) \ S then return {a}    ⊲ a is a new subset sum
        else return ∅                          ⊲ a ∈ S \ (S + x) is a ghost sum
    else return Find-New-Sums(a, ⌊(a + b)/2⌋, x, S) ∪ Find-New-Sums(⌊(a + b)/2⌋, b, x, S)

We implement the function efficiently by maintaining a data structure for the characteristic vector of the set S that allows efficient membership queries, updates, and equality checks between different parts of the vector, as required in line 2. We interpret the set S as a characteristic vector and write S_i = 1 if i ∈ S and S_i = 0 if i ∉ S. We also extend this notation to i < 0 and i ≥ m by setting S_i = S_{i mod m}. To check that (S + x) ∩ [a, b) = S ∩ [a, b), we need to check that the binary sequences (S + x)_a, ..., (S + x)_{b-1} and S_a, ..., S_{b-1} are equal. To check the equality of the two sequences we will use polynomial identity testing. In particular, let r be a uniformly random integer from {0, ..., p − 1} for a large enough prime p (which we will choose later). Then, with high probability, it is sufficient to check that Σ_{i=a}^{b-1} (S + x)_i r^i = Σ_{i=a}^{b-1} S_i r^i (mod p) to conclude the equality of the sequences. The latter condition is equivalent to r^x Σ_{i=a-x}^{b-x-1} S_i r^i ≡ Σ_{i=a}^{b-1} S_i r^i (mod p), which we can rearrange to Σ_{i=m+a-x}^{m+b-x-1} S_i r^i ≡ r^{m-x} Σ_{i=a}^{b-1} S_i r^i (mod p). This is the same as f(m + b − x) − f(m + a − x) ≡ r^{m-x} (f(b) − f(a)) (mod p), where f(t) := Σ_{i=0}^{t-1} S_i r^i mod p for all t = 0, ..., 2m.

Correctness
To argue the correctness, we observe that for any two binary sequences x, y ∈ {0, 1}^t, prime p and a random integer r ∈ {0, ..., p − 1} we have Pr[ Σ_i x_i r^i = Σ_i y_i r^i (mod p) ] ≤ t/p if x ≠ y, and Pr[ Σ_i x_i r^i = Σ_i y_i r^i (mod p) ] = 1 if x = y. This is also known as the Rabin-Karp rolling hash function. Choosing p = Θ(m^2 log(m)/δ) suffices to have the algorithm fail with probability at most δ. This is because a single randomized comparison fails with probability at most δ/(m log m), and by a union bound the probability that any of the O(m log m) comparisons performed in the algorithm fails is at most δ. Assuming basic arithmetic operations between O(log m)-bit numbers take constant time, we can choose δ = 1/poly(m) to obtain a high probability of success.

Running Time
We will show that the prefix sums f(t) = Σ_{i=0}^{t-1} S_i r^i (mod p) can be evaluated in time O(log m), which will lead to the required running time for computing C_i, as we will see later. Additionally, we need that the data structure can update the characteristic vector of the set S in O(log m) time (to be able to implement the Bellman iteration S_i = S_{i-1} ∪ C_i efficiently). These requirements can be abstracted as follows. We have a sequence of integers T_0, ..., T_{m-1} and in each step we either want to compute the prefix sum g(t) := Σ_{i=0}^{t-1} T_i for some integer t ∈ {0, ..., m}, or we want to update an arbitrary integer T_i for some i ∈ {0, ..., m − 1}. Our goal is to implement the queries and updates in O(log m) time. This indeed can be done by a simple binary tree. Such a data structure implies that we can check the condition on line 2 in O(log m) time. To bound the final running time, consider a particular position where S_{i-1} and S_{i-1} + x_i differ. This position can cause the condition (S + x) ∩ [a, b) = S ∩ [a, b) to fail (and the algorithm to proceed to line 3) at most O(log m) times. Each time we spend O(log m) time to check the condition, and the total number of positions where S_{i-1} and S_{i-1} + x_i differ is 2|C_i|. In total, this implies that the function outputs C_i in time O(log m) · O(log m) · |C_i| = O(|C_i| log^2 m), assuming that |C_i| > 0. If C_i = ∅ the running time is O(log m). Finally, we observe that we can perform the Bellman iteration S_i = S_{i-1} ∪ C_i in O(|C_i| log m) time. Combining the above, we arrive at our first result.

◮ Theorem 3.
Modular Subset Sum can be solved in O(m log^2 m) time with high probability.

A sample implementation in Python is given in Appendix B. It uses a simple and efficient implementation of binary trees for maintaining prefix sums [17].
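Putting the pieces together, the following compact sketch follows the spirit of the Appendix B implementation (condensed by us; the Fenwick-tree layout, the prime p = 2^61 − 1, and the helper names are our illustrative choices): a Fenwick (binary indexed) tree maintains the prefix sums f(t) of the hashed indicator vector, and find_new_sums recurses exactly as Algorithm 2 does.

```python
import random

def modular_subset_sums(xs, m):
    """All attainable subset sums of xs modulo m (sketch of the O(m log^2 m) algorithm)."""
    p = (1 << 61) - 1                       # large prime for the rolling hash
    r = random.randrange(2, p)              # random base
    rpow = [pow(r, i, p) for i in range(2 * m + 1)]
    tree = [0] * (2 * m + 1)                # Fenwick tree over h[i] = S_i * r^i mod p

    def update(i, delta):                   # point update: h[i] += delta (mod p)
        i += 1
        while i <= 2 * m:
            tree[i] = (tree[i] + delta) % p
            i += i & (-i)

    def f(t):                               # prefix sum f(t) = h[0] + ... + h[t-1] mod p
        acc = 0
        while t > 0:
            acc = (acc + tree[t]) % p
            t -= t & (-t)
        return acc

    S = set()
    def add(v):                             # insert v into S, updating both copies
        S.add(v)
        update(v, rpow[v])
        update(v + m, rpow[v + m])

    def find_new_sums(a, b, x):             # ((S + x) \ S) ∩ [a, b), cf. Algorithm 2
        lhs = (f(m + b - x) - f(m + a - x)) % p
        rhs = rpow[m - x] * ((f(b) - f(a)) % p) % p
        if lhs == rhs:                      # hashes agree: no difference here (w.h.p.)
            return []
        if b == a + 1:                      # new sum if a ∈ (S + x) \ S, else ghost sum
            return [a] if (a - x) % m in S and a not in S else []
        mid = (a + b) // 2
        return find_new_sums(a, mid, x) + find_new_sums(mid, b, x)

    add(0)                                  # S_0 = {0}
    for x in xs:
        for c in find_new_sums(0, m, x % m):
            add(c)                          # Bellman iteration S_i = S_{i-1} ∪ C_i
    return S
```

Note that find_new_sums only reads S_{i-1}; the insertions happen afterwards, exactly as in the Bellman iteration S_i = S_{i-1} ∪ C_i. The hash comparison in find_new_sums is precisely the identity f(m + b − x) − f(m + a − x) ≡ r^{m−x}(f(b) − f(a)) (mod p) derived above.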
This section is devoted to the second algorithm for Modular Subset Sum. In particular, we prove the following theorem. Recall that we can assume n = O(|X*|) (after an O(n log n)-time preprocessing).

◮ Theorem 4.
Modular Subset Sum can be solved by a deterministic algorithm in time O(|X*| polylog |X*|), where X* denotes the set of attainable subset sums of X modulo m.

We first set up the necessary notation on strings. A string z of length |z| is a sequence of letters from an alphabet Σ, referred to as z[0], ..., z[|z| − 1]. By z[i..j] we denote the substring from letter z[i] up to letter z[j]. We write z[..j] as shorthand for z[0..j] and similarly z[i..] for z[i..|z| − 1].
We start by reviewing a classic tool in string algorithms. This is a data structure for efficiently maintaining a family F of strings over alphabet Σ under the following update operations.
AddString(c): Given a letter c ∈ Σ, this operation adds the 1-letter string c to F.
Concatenate(s, s′): Given strings s, s′ ∈ F, concatenate them and add the resulting string to F. The two strings s, s′ remain in F.
Split(s, i): Given a string s ∈ F and a number i, split s into two strings s[..i − 1] and s[i..] and add these strings to F. The string s remains in F.
Equal(s, s′): Given strings s, s′ ∈ F, return true if s = s′.
Note that no string is ever removed from F. Mehlhorn et al. [26] were the first to design a data structure supporting these operations in polylogarithmic time. Their time bounds have been further improved [2, 3, 19], but since we will ignore logarithmic factors we shall not make use of those improvements. The bounds are known to be tight up to a log log m factor in the cell-probe model [27]: in particular, if the query (update) time is log^{O(1)} m, then the update (query) time is Ω(log(m)/log log m).

◮ Theorem 5 ([26]). There is a deterministic data structure for maintaining a family of strings under the operations
AddString, Concatenate, Split, and Equal, such that any sequence of k operations resulting in total size N = Σ_{s∈F} |s| runs in time O(k polylog(kN)).

The data structure even works for very large alphabets Σ, as long as Σ is ordered and we can compare any two letters in time O(1). We observe that as an application of the above we obtain the following data structure.

◮ Lemma 6. There is a deterministic data structure that maintains a length-m string z over alphabet {0, 1}, initialized as z = 0^m, under the following operations, where any sequence of k operations runs in time O(k polylog(km)):
Add(i): Given 0 ≤ i < m, set z[i] := 1,
LCP(i, j): Return the length of the longest common prefix of z[i..] and z[j..].

Proof.
For the initialization of z = 0^m, we first run AddString(0) and then, using O(log m) concatenations, we generate the strings 0^{2^i} and we combine them according to the binary representation of m to obtain the string 0^m.
For Add(i), we split z at i and at i + 1 to obtain the strings z[..i − 1] and z[i + 1..]. We then run AddString(1), and finally we concatenate twice to obtain the resulting string z′ = Concatenate(Concatenate(z[..i − 1], 1), z[i + 1..]).
For a longest common prefix query LCP(i, j), we first split z at i and at j to obtain the strings y_1 := z[i..] and y_2 := z[j..]. Then we perform a binary search for the largest ℓ such that y_1[..ℓ] = y_2[..ℓ]. Each step of the binary search uses two splits, to construct the strings y_1[..ℓ] and y_2[..ℓ], and one equality test.
Hence, we can simulate k operations among Add and LCP using O(k log m) operations among Equal, AddString, Concatenate, and Split. The total string length of the constructed family F is N = O(mk log m), and thus the total time is O(k log m polylog(kN)) = O(k polylog(km)). ◭

Recall that given X = { x_1, ..., x_n } ⊆ Z_m, our aim is to compute the set X* ⊆ Z_m consisting of all subset sums of X. As in our first algorithm for Modular Subset Sum, we follow Bellman's approach by initializing S_0 := {0} and iteratively computing S_i := (S_{i-1} + x_i) ∪ S_{i-1} for i = 1, ..., n. As we have seen in the first algorithm, it suffices to compute the symmetric difference E_i := (S_{i-1} + x_i) △ S_{i-1} in time O((|E_i| + 1) polylog m); then over all iterations we compute S_n = X* in time O(|X*| polylog m).
It remains to show how to compute the symmetric difference E_i in each iteration. To this end, let z be the indicator vector of S_i copied twice, that is, z is a string of length 2m over alphabet {0, 1} where z[j] indicates whether j mod m is in S_i, for any 0 ≤ j < 2m. We maintain the data structure from Lemma 6 for the string z. Since this data structure initializes z as 0^{2m}, we call Add(0) and
Add(m) to initialize z correctly according to S_0 = {0}. At the beginning of the i-th iteration, note that z[m − x_i .. 2m − x_i − 1] is the indicator vector of S_{i-1} + x_i. The query LCP(0, m − x_i) yields a number d′ such that d := d′ + 1 is minimal with z[d] ≠ z[m − x_i + d]. In other words, d is the smallest element of the symmetric difference E_i of S_{i-1} and S_{i-1} + x_i (unless d ≥ m, in which case we have that E_i is the empty set). We find the next element of E_i by calling LCP(d + 1, m − x_i + d + 1). Repeating this argument, we compute the set E_i using O(|E_i| + 1) LCP operations. This finishes the description of how to compute the symmetric difference E_i. We maintain the string z for the next iteration by setting z[d] = z[d + m] = 1 for each d ∈ E_i with d ∉ S_{i-1}. This uses O(|E_i|) Add operations.
In total, we run O(|X*| + n) = O(|X*|) LCP and Add operations on a string of length 2m. This takes total time O(|X*| polylog(|X*| m)) = O(|X*| polylog m) according to Lemma 6. In particular, this running time is bounded by Õ(m). We further improve this running time in Section 4.4 below. For pseudocode see Algorithm 3.

Algorithm 3
Algorithm for Modular Subset Sum using dynamic strings.

function ModularSubsetSumViaDynamicStrings(X, m)
    S := {0}
    Initialize z = 0^{2m} (as in Lemma 6)
    z.Add(0)
    z.Add(m)
    for i = 1, ..., n do
        E_i := ∅
        d := 1 + z.LCP(0, m − x_i)
        while d < m do
            E_i := E_i ∪ {d}
            d := d + 1 + z.LCP(d + 1, m − x_i + d + 1)
        for each d ∈ E_i do
            if d ∉ S then
                S := S ∪ {d}
                z.Add(d)
                z.Add(d + m)
    return S

In order to reconstruct a subset Y ⊆ X summing to a given target t, we augment the above algorithm as follows. We store the set S_i in a balanced binary search tree T_i. For each number d ∈ S_i \ S_{i-1}, in the node corresponding to d in T_i we store a pointer to the node corresponding to d − x_i. At the end of the algorithm, T_n stores S_n = X*, the set of all subset sums of X. Note that computing T_n augmented by these pointers takes total time O(|X*| log |X*|) and thus does not increase the asymptotic running time of the algorithm. With this bookkeeping, given any target integer t ∈ Z_m, we first search for t in T_n to check whether t ∈ X*. If t ∈ X*, then starting from the node corresponding to t in T_n, we follow the stored pointers to reconstruct a subset Y ⊆ X summing to t modulo m. The total running time of this solution reconstruction is O(|Y| + log |X*|). Clearly, we have |Y| ≤ n. This is essentially the only control we have over the size |Y|; in particular, we do not guarantee Y to be a smallest subset summing to t.
We now improve the running time from O(|X*| polylog m) to Õ(|X*|) = O(|X*| polylog |X*|), finishing the proof of Theorem 4. Observe that all steps of the algorithm (including the solution reconstruction) run in time Õ(|X*|), except for Lemma 6. Hence, it suffices to replace this lemma by the following improved variant, which makes use of run-length encoding.

◮ Lemma 7.
There is a deterministic data structure that can initialize z = 0^m and perform k Add and LCP operations in total time O(k polylog k).

Proof.
Recall that we assume that arithmetic operations on O(log m)-bit numbers can be performed in time O(1). In particular, the string length m can be processed in time O(1). Denote by S ⊆ Z_m the set of which z is the indicator vector. That is, initially we have S = ∅, and on operation Add(i) we update S := S ∪ {i}. We store S in a balanced binary search tree T. We also augment T to store at each node the size of its subtree. This allows us to perform the following queries in time O(log |S|):
Rank: Given a number v, determine the number of keys stored in T that are smaller than v,
Select: Given a number v, determine the v-th number stored in T (in sorted order).
Note that T can be updated in time O(log |S|) per operation. The total time for maintaining T during k Add and LCP operations is O(k log k), since |S| ≤ k.
We compress the string z by replacing each run of 0's by one symbol. Specifically, let Σ := {1} ∪ { (0, L) | 0 ≤ L ≤ m }. Note that symbols in Σ can be read and compared in time O(1). We convert the string z ∈ {0, 1}^m to a string C(z) ∈ Σ* by replacing each maximal substring 0^L of z by the symbol (0, L). For simplicity, we also add the symbol (0, 0) between any two consecutive 1's of z. For example, the string z = 10001100 is converted to C(z) = 1 (0,3) 1 (0,0) 1 (0,2). We maintain C(z) using the data structure of Theorem 5, with the help of the binary search tree T, by implementing initialization, Add, and LCP as follows.
Initialization. Given m, we initialize z = 0^m and thus C(z) = (0, m). This string is generated by calling AddString(c) for c = (0, m) ∈ Σ, which takes time O(1).

Add. Given i, we want to set z[i] := 1. We denote by a < i < b the predecessor and successor of i in S, so that z[a..b] = 1 0^{b−a−1} 1. Note that a and b can be computed from T. Using a rank query on i, we can infer the corresponding position h with C(z)[h − 1..h + 1] = 1 (0, b − a − 1) 1. We split C(z) at h and at h + 1 to obtain the strings C(z)[..h − 1] and C(z)[h + 1..]. We then construct the string (0, i − a − 1) 1 (0, b − i − 1) using AddString thrice and Concatenate twice. Finally, we concatenate C(z)[..h − 1] and (0, i − a − 1) 1 (0, b − i − 1) and C(z)[h + 1..] to form the new string C(z) after setting z[i] := 1.

LCP.
Given i, j, let y_1 := z[i..] and y_2 := z[j..]. We first construct the strings C(y_1) and C(y_2). This is similar to the last paragraph: Denote the predecessor and successor of i by a < i ≤ b, so that z[a..b] = 1 0^{b−a−1} 1. Using a rank query, we find the corresponding position h with C(z)[h − 1..h + 1] = 1 (0, b − a − 1) 1. Splitting C(z) at h + 1 and concatenating the resulting suffix after the single symbol (0, b − i) yields C(y_1). (If b − i = 0, the prepended symbol is (0, 0) = (0, b − i), which is still a valid symbol of Σ.) The string C(y_2) is constructed analogously. We now perform a binary search for the largest ℓ such that C(y_1)[..ℓ] = C(y_2)[..ℓ], using two Split and one Equal operation per binary search step. We use a rank and a select query to determine the length ∆ of the string corresponding to C(y_1)[..ℓ], that is, C(z[i..i + ∆ − 1]) = C(y_1)[..ℓ]. If C(y_1)[ℓ + 1] = 1 or C(y_2)[ℓ + 1] = 1, or one of these symbols is undefined (i.e., out of bounds), then LCP(i, j) = ∆ + 1. Otherwise, we have C(y_1)[ℓ + 1] = (0, L_1) and C(y_2)[ℓ + 1] = (0, L_2), and then LCP(i, j) = ∆ + min{L_1, L_2} + 1.
Hence, we can simulate k operations among Add and LCP using O(k log k) operations among Equal, AddString, Concatenate, and Split. The total string length of the constructed family F is N = O(k^2 log k), since after k operations each constructed string has at most O(k) symbols. By Theorem 5, the total time is O(k polylog(kN)) = O(k polylog k). ◭

References

[1] Amir Abboud, Karl Bringmann, Danny Hermelin, and Dvir Shabtay. SETH-based lower bounds for subset sum and bicriteria path. In
SODA, pages 41–57. SIAM, 2019.
[2] Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Dynamic pattern matching. Technical report, Department of Computer Science, University of Copenhagen, 1998. DIKU Report 98/27, 16 pages, http://cs.au.dk/~gerth/papers/diku-98-27.pdf.
[3] Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern matching in dynamic texts. In SODA, pages 819–828. ACM/SIAM, 2000.
[4] Kyriakos Axiotis, Arturs Backurs, Ce Jin, Christos Tzamos, and Hongxun Wu. Fast modular subset sum using linear sketching. In SODA, pages 58–69. SIAM, 2019.
[5] Kyriakos Axiotis and Christos Tzamos. Capacitated dynamic programming: Faster knapsack and graph algorithms. In ICALP, volume 132 of LIPIcs, pages 19:1–19:13, 2019.
[6] MohammadHossein Bateni, MohammadTaghi Hajiaghayi, Saeed Seddighin, and Cliff Stein. Fast algorithms for knapsack via convolution and prediction. In STOC, pages 1269–1282. ACM, 2018.
[7] Richard E. Bellman. Dynamic Programming. Princeton University Press, 1957.
[8] Karl Bringmann. A near-linear pseudopolynomial time algorithm for subset sum. In SODA, pages 1073–1084. SIAM, 2017.
[9] Timothy M. Chan. Approximation schemes for 0-1 knapsack. In SOSA@SODA, volume 61 of OASIcs, pages 5:1–5:12, 2018.
[10] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comput., 9(3):251–280, 1990.
[11] Marek Cygan, Marcin Mucha, Karol Wegrzycki, and Michal Wlodarczyk. On problems equivalent to (min, +)-convolution. ACM Trans. Algorithms, 15(1):14:1–14:25, 2019.
[12] Ran Duan, Yong Gu, and Le Zhang. Improved time bounds for all pairs non-decreasing paths in general digraphs. In ICALP, volume 107 of LIPIcs, pages 44:1–44:14, 2018.
[13] Ran Duan, Ce Jin, and Hongxun Wu. Faster algorithms for all pairs non-decreasing paths problem. In ICALP, volume 132 of LIPIcs, pages 48:1–48:13, 2019.
[14] Ran Duan, Ce Jin, and Hongxun Wu. Faster algorithms for all pairs non-decreasing paths problem. arXiv preprint arXiv:1904.10701, 2019.
[15] Ran Duan and Seth Pettie. Fast algorithms for (max, min)-matrix multiplication and bottleneck shortest paths. In SODA, pages 384–391. SIAM, 2009.
[16] Friedrich Eisenbrand and Robert Weismantel. Proximity results and faster algorithms for integer programming using the Steinitz lemma. In SODA, pages 808–816. SIAM, 2018.
[17] Peter M. Fenwick. A new data structure for cumulative frequency tables. Software: Practice and Experience, 24(3):327–336, 1994.
[18] François Le Gall. Powers of tensors and fast matrix multiplication. In ISSAC, pages 296–303. ACM, 2014.
[19] Pawel Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Lacki, and Piotr Sankowski. Optimal dynamic strings. In SODA, pages 1509–1528. SIAM, 2018.
[20] Yahya Ould Hamidoune, Anna S. Lladó, and Oriol Serra. On complete subsets of the cyclic group. Journal of Combinatorial Theory, Series A, 115(7):1279–1285, 2008.
[21] Klaus Jansen and Lars Rohwedder. On integer programming and convolution. In ITCS, volume 124 of LIPIcs, pages 43:1–43:17, 2019.
[22] Ce Jin. An improved FPTAS for 0-1 knapsack. In ICALP, volume 132 of LIPIcs, pages 76:1–76:14, 2019.
[23] Ce Jin and Hongxun Wu. A simple near-linear pseudopolynomial time randomized algorithm for subset sum. In SOSA@SODA, volume 69 of OASIcs, pages 17:1–17:6, 2019.
[24] Konstantinos Koiliaris and Chao Xu. Faster pseudopolynomial time algorithms for subset sum. ACM Trans. Algorithms, 15(3):40:1–40:20, 2019.
[25] Marvin Künnemann, Ramamohan Paturi, and Stefan Schneider. On the fine-grained complexity of one-dimensional dynamic programming. In ICALP, volume 80 of LIPIcs, pages 21:1–21:15, 2017.
[26] Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183–198, 1997.
[27] Mihai Pătraşcu and Erik D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM Journal on Computing, 35(4):932–963, 2006.
[28] Donguk Rhee. Faster fully polynomial approximation schemes for knapsack problems. Master's thesis, Massachusetts Institute of Technology, 2015.
[29] Virginia Vassilevska, Ryan Williams, and Raphael Yuster. All pairs bottleneck paths and max-min matrix products in truly subcubic time.
Theory Comput. , 5(1):173–189, 2009. Virginia Vassilevska Williams. Nondecreasing paths in a weighted graph or: How to optimallyread a train schedule.
ACM Trans. Algorithms , 6(4):70:1–70:24, 2010. Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In
STOC , pages 887–898. ACM, 2012.
A A Simple and Fast Algorithm for All-Pairs Non-Decreasing Paths
In the APNP problem, given an edge-weighted graph, the goal is to compute for every pair of nodes a and b the minimum cost of a path from a to b that uses non-decreasing edge weights. The cost of such a path is defined to be the largest edge weight encountered on the path. A number of works have successively improved the running time for the directed and the undirected case of APNP [30, 12, 13]. The directed case is a generalization of the max-min matrix product [29, 15], and the best known algorithm for both problems runs in time Õ(n^((ω+3)/2)), where ω is the exponent of fast matrix multiplication [10, 31, 18]. In contrast, the undirected case is known to be solvable in Õ(n^2) time [13]. We show how to solve the undirected APNP problem by a simple algorithm in time O(n^2 log n). This improves the previously best result in terms of log-factors, and it is optimal up to a single log-factor. For simplicity, in the following we call the undirected case of APNP simply APNP. In this section we prove the following theorem.

Theorem 8.
All-Pairs Non-Decreasing Paths can be solved in O(n^2 log n) time w.h.p.

Let G = (V, E) be an undirected graph with n = |V| nodes and m = O(n^2) edges having edge weights w(e) for e ∈ E. A path is a sequence of edges e_1, e_2, . . . , e_ℓ such that e_i and e_{i+1} share an endpoint for all 1 ≤ i ≤ ℓ − 1. A non-decreasing path is a path satisfying w(e_i) ≤ w(e_{i+1}) for all 1 ≤ i ≤ ℓ − 1. The weight of this non-decreasing path is defined to be w(e_ℓ), the weight of the last edge. The All Pairs Non-Decreasing Paths Problem (APNP) asks to determine the minimum weight non-decreasing path between every pair of vertices.

For simplicity, we focus on the strictly increasing version of the problem, where no two edges have equal weight. The general case can be converted to the distinct-weights case through a simple reduction (see Lemma 23 in Duan et al. [14]). The reduction looks for connected components formed by edges of the same weight and replaces these edges with new ones with distinct weights. This preprocessing step runs in O(n^2) time, and the number of edges in the new graph at most doubles. It thus suffices to focus on the distinct-weights case.

The algorithm starts by ordering all edges of the graph from the smallest weight to the largest and inspecting the edges in this order. For every vertex u of the graph we maintain the set of vertices v that can be reached from u by a non-decreasing path using only the edges inspected so far. Initially the sets of all vertices are empty. The first time a vertex v is added to the set corresponding to a vertex u determines the cost of the minimum non-decreasing path from u to v. In particular, if the vertex v is added to the set corresponding to the vertex u in the phase when we are inspecting edge e, the weight of the minimum non-decreasing path from u to v is equal to the weight w(e) of the edge e. Let C_e be the set of newly discovered reachability pairs added in the phase when inspecting the edge e. We will shortly describe how to compute C_e in output-sensitive time O(|C_e| log n + 1). This implies that the total running time is upper bounded by O(n^2 log n), since the total number of node pairs is upper bounded by n^2 and each pair is discovered at most once.

Now we describe how to compute C_e in O(|C_e| log n + 1) time. Let e = (a, b). Let u be a vertex that can reach a but not b via a non-decreasing path using only the edges inspected so far (not including e). We observe that, by adding the edge e = (a, b), the vertex u can now reach vertex b (by first going to a and then traversing the edge e). Similarly, if u can reach b but not a, then after adding e the vertex u can reach a. On the other hand, if u can reach both a and b (or neither), no new pairs involving u are discovered when inspecting edge e. Let R_a be the set of vertices u that can reach a but not b (right before inspecting e), and let R_b be the set of vertices that can reach b but not a. We conclude that C_e = ((R_a \ R_b) × {b}) ∪ ((R_b \ R_a) × {a}). Therefore it is sufficient to compute R_a \ R_b and R_b \ R_a in O(|C_e| log n + 1) time. If we spend O(log n) time per single vertex from one of these two sets, we obtain the required running time. We use a similar idea as for Modular Subset Sum. Let R_{a,i} = 1 if the i-th vertex of the graph belongs to R_a and R_{a,i} = 0 otherwise. For a random integer r ∈ {0, . . . , p − 1} (for a large enough prime p) we build a tree data structure that stores the elements of the sequence R_{a,1} · r^1, R_{a,2} · r^2, R_{a,3} · r^3, . . . in its leaves. In particular, we associate the leaves of a complete binary tree with the elements of the sequence, and each internal node recursively stores the sum of the values of its children. We can update an element of the sequence by spending O(log n) time on the data structure. Furthermore, if we have data structures for R_a and R_b, we can recursively inspect subtrees (whose hash values disagree) of the two data structures to find all elements of R_a \ R_b and R_b \ R_a.
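To illustrate the subtree-comparison step in isolation, here is a self-contained Python sketch (our illustration, not the implementation from Appendix C; the names HashedSet and diff are hypothetical, and the prime P is an arbitrary large choice). It maintains hash-annotated complete binary trees over two subsets of {0, ..., n−1} and descends only into subtrees whose hashes disagree:

```python
import random

P = 2**61 - 1  # a hypothetical large prime; any p much larger than n works w.h.p.

class HashedSet:
    """A subset of {0, ..., n-1} stored in a complete binary tree whose
    node values are sums of r^i (mod P) over the elements i below that node."""
    def __init__(self, n, powr):
        self.N = 1 << (n - 1).bit_length()   # leaves: smallest power of two >= n
        self.tree = [0] * (2 * self.N)       # node 1 is the root; element i is leaf N + i
        self.powr = powr
    def add(self, i):
        node = self.N + i
        while node > 0:                      # update the O(log n) ancestors of leaf i
            self.tree[node] = (self.tree[node] + self.powr[i]) % P
            node >>= 1

def diff(A, B, node=1):
    """All elements on which A and B differ, found in O(|output| * log n) time
    w.h.p. by descending only into subtrees whose hashes disagree."""
    if A.tree[node] == B.tree[node]:
        return []
    if node >= A.N:
        return [node - A.N]
    return diff(A, B, 2 * node) + diff(A, B, 2 * node + 1)

# usage: two sets over {0, ..., 7} differing in elements 3 and 4
r = random.randint(1, P - 1)
powr = [pow(r, i, P) for i in range(8)]
A, B = HashedSet(8, powr), HashedSet(8, powr)
for i in (1, 3, 5): A.add(i)
for i in (1, 4, 5): B.add(i)
print(sorted(diff(A, B)))  # [3, 4] w.h.p.
```

The same descent applied to the trees of R_a and R_b reports exactly the symmetric difference needed for C_e.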
The time spent to find one element is O(log n). Thus, if we store such a data structure for each vertex of the graph, we can update the structures efficiently and compute C_e in time O(|C_e| log n + 1) for any edge e. A sample implementation in Python is given in Appendix C.

B Python Implementation of Modular Subset Sum
Below we present a simple implementation of our first algorithm for Modular Subset Sum (Theorem 3) in Python. It maintains a binary indexed tree that keeps track of the prefix sums of polynomial hashes of the characteristic vector of the attainable subset sums. To easily deal with rollover due to the cyclicity of the mod operation, a separate copy of the characteristic vector is kept translated by m. The code can also be found at https://ideone.com/YlLwMQ.

The function takes as input a list of numbers W and the modulus m, and returns a list of length m, where the entry at position s is None if s is not a possible subset sum of W modulo m, or contains the last number from W that was added to create the subset sum s.

import random

def ModularSubsetSum(W, m):
    p = 2**89 - 1                     # a large prime, p > m log m (the exact constant was lost in extraction)
    r = random.randint(0, p - 1)      # r in [0, p)
    powr = [1]                        # powr[i] = r**i (mod p)
    for i in range(2 * m):
        powr.append((powr[-1] * r) % p)
    tree = [0] * (2 * m)              # binary indexed tree over positions 0..2m-1

    def read(i):                      # prefix sum of the first i entries
        if i <= 0: return 0
        return tree[i - 1] + read(i - (i & -i))

    def update(i, v):                 # add v at position i
        while i < len(tree):
            tree[i] += v
            i += (i + 1) & -(i + 1)

    def FindNewSums(a, b, w):
        h1 = (read(b) - read(a)) * powr[m - w] % p    # hash of S ∩ [a, b)
        h2 = (read(b + m - w) - read(a + m - w)) % p  # hash of (S + w) ∩ [a, b)
        if h1 == h2: return []
        if b == a + 1:
            if sums[a] is None: return [a]
            return []
        return FindNewSums(a, (a + b) // 2, w) + FindNewSums((a + b) // 2, b, w)

    def AddNewSum(s, w):
        sums[s] = w
        update(s, powr[s]); update(s + m, powr[s + m])

    sums = [None] * m
    AddNewSum(0, 0)
    for w in W:
        for s in FindNewSums(0, m, w % m):
            AddNewSum(s, w)
    return sums

Example
Find all modular subset sums mod 8 attainable with the numbers 1, 3 and 6:

ModularSubsetSum([1, 3, 6], 8)

Recovering the subset
To recover the subset making a particular subset sum, we repeatedly subtract the last number added in the subset sum s until we get down to 0.

def RecoverSubset(sums, s):
    if sums[s] is None: return None
    if s <= 0: return []
    return RecoverSubset(sums, (s - sums[s]) % len(sums)) + [sums[s]]

sums = ModularSubsetSum([1, 3, 6], 8)
RecoverSubset(sums, 4)   # example arguments; the original calls were lost in extraction
RecoverSubset(sums, 2)

C Python Implementation of All-Pairs Non-Decreasing Paths
Below we present a simple implementation in Python of our algorithm for computing minimum weight non-decreasing paths between all pairs of n vertices. It takes as input a list E of the edges of the graph, in increasing order of their weights, and the number n of vertices. Note that the actual weights of the edges do not matter, only their relative order. The algorithm returns an n × n matrix path, where path[u][v] = None if there is no way to reach v from u by traversing edges with increasing weights; otherwise path[u][v] = par, where par is the previous vertex on the minimum weight non-decreasing path from u to v. For every vertex of the graph the algorithm keeps track of partial hashes of the set of vertices that can reach this vertex, in a tree data structure. The code can also be found at https://ideone.com/S9RAhX.

import random

def AllPairsNonDecreasingPaths(E, n):
    p = 2**89 - 1                     # a large prime, p > n log n (the exact constant was lost in extraction)
    r = random.randint(0, p - 1)      # r in [0, p)
    powr = [1]                        # powr[i] = r**i (mod p)
    for i in range(n):
        powr.append((powr[-1] * r) % p)
    N = 1 << (n - 1).bit_length()     # number of leaves: smallest power of two >= n
    tree = [[0] * (2 * N) for _ in range(n)]

    def update(v, node, val):         # add val to leaf `node` of v's tree and its ancestors
        while node > 0:
            tree[v][node] += val
            node >>= 1

    def FindNewPaths(a, b, node):     # vertices reaching exactly one of a, b
        if tree[a][node] == tree[b][node]: return []
        if node >= N:
            u = node - N
            if path[u][a] is None: return [(u, a, b)]
            return [(u, b, a)]
        return FindNewPaths(a, b, 2 * node) + FindNewPaths(a, b, 2 * node + 1)

    def AddNewPath(u, v, par):
        path[u][v] = par
        update(v, u + N, powr[u])

    path = [[None] * n for _ in range(n)]
    for i in range(n):
        AddNewPath(i, i, i)
    for (a, b) in E:
        for (u, v, par) in FindNewPaths(a, b, 1):
            AddNewPath(u, v, par)
    return path
Example
Find all pairs non-decreasing paths in a graph with 3 nodes and edges (1, 2) and (0, 1), listed in increasing order of weight:

AllPairsNonDecreasingPaths([(1, 2), (0, 1)], 3)

Recovering the path between two vertices
To recover a specific path between two vertices, we repeatedly move to the node visited right before the destination until we reach the source.

def RecoverPath(path, u, v):
    if path[u][v] is None: return None
    if u == v: return [u]
    return RecoverPath(path, u, path[u][v]) + [v]

path = AllPairsNonDecreasingPaths([(1, 2), (0, 1)], 3)
RecoverPath(path, 2, 0)   # example arguments; the original calls were lost in extraction
RecoverPath(path, 0, 2)
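For contrast, the plain Bellman iteration that the hashed data structure of Appendix B accelerates can also be written directly. The following quadratic-time sketch (our illustration, not from the paper; the function name is hypothetical) makes the set update S ← S ∪ (S + w) explicit and produces the same output format as ModularSubsetSum:

```python
def BellmanModularSubsetSum(W, m):
    """Textbook Bellman iteration over Z_m in O(m * len(W)) time:
    maintain the set S of attainable residues and, for each number w,
    record the residues newly reachable in S ∪ (S + w)."""
    sums = [None] * m   # sums[s] = last number added to reach s, as in Appendix B
    sums[0] = 0
    S = {0}
    for w in W:
        new = {(s + w) % m for s in S} - S   # residues discovered in this phase
        for s in new:
            sums[s] = w
        S |= new
    return sums

print(BellmanModularSubsetSum([1, 3, 6], 8))  # [0, 1, 6, 3, 3, None, 6, 6]
```

On the running example with numbers 1, 3, 6 modulo 8, every residue except 5 is attainable; the near-linear algorithm of Appendix B computes the same table while avoiding the O(m) scan per number.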