How many longest increasing subsequences are there?
Phil Krabbe,* Hendrik Schawe,† and Alexander K. Hartmann‡
Institut für Physik, Universität Oldenburg, 26111 Oldenburg, Germany
Laboratoire de Physique Théorique et Modélisation, UMR-8089 CNRS, CY Cergy Paris Université, 95000 Cergy, France
(Dated: March 31, 2020)

We study the entropy S of longest increasing subsequences (LIS), i.e., the logarithm of the number of distinct LIS. We consider two ensembles of sequences, namely random permutations of integers and sequences drawn i.i.d. from a limited number of distinct integers. Using sophisticated algorithms, we are able to exactly count the number of LIS for each given sequence. Furthermore, we not only measure averages and variances for the considered ensembles of sequences, but we sample very large parts of the probability distribution p(S) with very high precision. In particular, we are able to observe the tails of extremely rare events which occur with probabilities smaller than 10^{-…}. We show that the distribution of the entropy of the LIS is approximately Gaussian, with deviations in the far tails which might vanish in the limit of long sequences. Further, we propose a large-deviation rate function which fits our observed data best.

I. INTRODUCTION
Imagine a game of numbers: given a sequence of n numbers, mark the largest subset of numbers such that every marked number is larger than (or equal to) all marked numbers appearing left of it in the sequence. The marked numbers form a (weakly) increasing subsequence. The number of marked elements is called the length l. If the subsequence maximizes l over all possible subsequences, it is called a longest (weakly) increasing subsequence (LIS) [1]. An early study of this problem was by Stanisław Ulam [2], who used it as a toy example to illustrate the Monte Carlo method in a textbook, which led to its byname Ulam's problem. It should be noted, though, that in the same year Ref. [3] also discussed the connection of LIS to Young tableaux.

Ulam's study found that LIS of random permutations have a mean length l which grows with the size n of the sequence as ⟨l⟩ = c√n. The Monte Carlo simulations provided a first numerical estimate of c; later, the value c = 2 was proven rigorously [4]. But the length of LIS of permutations attracted much more interest. In mathematics, the whole distribution p(l) was analyzed. First, upper and lower tails were determined rigorously [5–7], and later it was proven that the central part is a Tracy-Widom distribution [8]. At the time, this result was an unexpected connection between LIS and random matrix theory, where this Tracy-Widom distribution describes the fluctuations of the largest eigenvalues of the Gaussian unitary ensemble, i.e., an ensemble of Hermitian random matrices. In the following years it turned out that the LIS is an extremely simple model at the center of a growing class of seemingly unrelated problems. Beginning with a mapping of a 1+1-dimensional polynuclear growth model of the Kardar-Parisi-Zhang type onto LIS [9], a plethora of models were shown to exhibit the properties of LIS of random permutations, namely that their fluctuations are distributed according to one of the Tracy-Widom distributions. Examples range from other surface growth processes, like a direct mapping of a ballistic deposition model onto LIS [10] or experimental observations of a Tracy-Widom distribution in the fluctuations of real surface growth [11], to the totally asymmetric exclusion process [12] and directed polymers [13].
For an overview of and insight into the connections between these models, there are several review articles [14–16]. Recently, ensembles of sequences other than the random permutation were studied, like random walks with different distributions of their jump lengths [17–20]. Besides its role in mathematics and physics, the LIS found applications in computer science, where it is suggested as a measure of sortedness of large amounts of data [21], or to find structures in time series while preserving the privacy of the data, which is useful in the context of, e.g., fraud detection using financial data streams [22]. Also in bioinformatics the LIS found applications in the context of sequence alignment, e.g., for DNA or protein sequences [23].

Here, we are interested in another property of this famous problem. First, note that the LIS is not necessarily unique for a given sequence. For example, consider the sequence σ = (7, …) used in Fig. 2: while the length l = 3 of the longest increasing subsequence is uniquely defined, this sequence has M = 7 distinct LIS. The length l is thoroughly studied, but about the number M of distinct LIS of a given sequence, only very little is known. Nevertheless, for the above-mentioned applications like determining sortedness or fraud detection, the actual number of distinct LIS allows one to much better estimate the reliability of decisions based on the LIS calculation. For this reason, and to gain fundamental insight into the structure of the solution space, we study this quantity here by computer simulations [24], namely its logarithm, i.e., the entropy S = ln M.

One of the few known results is that the number of increasing subsequences of a fixed length grows exponentially in n [1, 25]. Although this suggests that it is infeasible to count the LIS by enumeration, we will introduce in Sec. II B an algorithm to count the LIS efficiently without the need to enumerate them.
Due to the exponential growth, it indeed makes sense to measure the entropy S rather than M itself. Since we want to explore the whole distribution of the entropy, including the tails of extremely rare events with probabilities of, say, 10^{-…}, we have to apply a sophisticated Markov chain sampling scheme, which will be explained in Sec. II C. Finally, Sec. III shows the results of our study before Sec. IV concludes it. But first, we introduce the two ensembles we studied in Sec. II A.

II. MODELS AND METHODS
For completeness, we define the LIS in a more formal way than in the introduction. Let σ = (σ_1, σ_2, …, σ_n) be a sequence of numbers. A LIS is a longest subsequence λ_σ = (σ_{i_1}, σ_{i_2}, …, σ_{i_l}) with σ_{i_1} ≤ σ_{i_2} ≤ … ≤ σ_{i_l} such that 1 ≤ i_1 < i_2 < … < i_l ≤ n. We denote by M the number of distinct sequences λ_σ fulfilling this property, and S = ln M is the entropy.

A. Ensembles of random sequences
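As an illustration of these definitions, the following brute-force sketch (not the authors' method; exponential in n, hence only usable for tiny sequences) enumerates all weakly increasing subsequences of a short example and determines l, M, and S = ln M:

```python
# Brute-force illustration of the definitions of l, M, and S = ln M.
# Exponential in n -- only for very short sequences.
import math
from itertools import combinations

def lis_count_bruteforce(seq):
    """Return (l, M): weak LIS length and number of distinct LIS."""
    n = len(seq)
    for l in range(n, 0, -1):          # try lengths from longest down
        count = sum(
            1
            for idx in combinations(range(n), l)
            if all(seq[idx[j]] <= seq[idx[j + 1]] for j in range(l - 1))
        )
        if count > 0:                  # longest length with at least one IS
            return l, count
    return 0, 0

l, M = lis_count_bruteforce([2, 1, 3])  # weak LIS: (2, 3) and (1, 3)
print(l, M, math.log(M))                # -> 2 2 0.6931...
```

Distinct LIS are counted as distinct index sets here; for a permutation (all elements distinct) this coincides with distinct value sequences.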
In this study, we scrutinize two ensembles of random sequences. First, and more in depth, random permutations, for which an example with one LIS marked is visualized in Fig. 1(a).

FIG. 1. Visualization of two sequences σ. The horizontal axis shows the index i of the value σ_i. The elements belonging to one LIS are marked by circles. (a) Random permutation. (b) Random sequence with 11 distinct elements.

Second, we study a parameterized random sequence consisting of at most K + 1 distinct ordered elements. We call this the "K ensemble". An example for K = 10 is shown in Fig. 1(b). In the limit K = 0, the sequence consists only of identical elements and therefore has a unique LIS with a length of l = n. The other limit K → ∞ consists of sequences with unique elements, which can be mapped to a permutation by replacing each element by its rank, which in turn does not change the LIS. Thus, with K we can interpolate between a non-degenerate LIS and the well-known case of random permutations. Indeed, the length of the LIS of this ensemble was studied in Ref. [26].

As a technical remark, note that the algorithms explained in the following section find the strictly monotonically increasing subsequence, but for the K ensemble we want to find the weakly increasing subsequence. With a simple mapping of the sequence (with elements from ℕ) to a new sequence α_i = σ_i + i/n of rational numbers, we can apply algorithms for strict LIS on α to find all weak LIS of σ.

B. Counting the number of distinct LIS
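A minimal sketch of this tie-breaking map (using exact fractions to avoid floating-point ties; the 0-based index convention is an assumption of this sketch):

```python
# Sketch of the mapping alpha_i = sigma_i + i/n described above: equal
# elements of sigma become strictly increasing in alpha in order of their
# index, so every weak LIS of sigma corresponds to a strict LIS of alpha.
from fractions import Fraction

def weak_to_strict(seq):
    n = len(seq)
    # 0-based index; any convention works as long as ties break by position
    return [Fraction(x) + Fraction(i, n) for i, x in enumerate(seq)]

sigma = [1, 1, 0, 1]             # a weak LIS is (1, 1, 1) at indices 0, 1, 3
alpha = weak_to_strict(sigma)    # [1, 5/4, 1/2, 7/4]
# the strict LIS of alpha uses exactly the same indices 0, 1, 3
```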
Algorithms to find the length of the LIS are rather simple, and some variety exists. A popular choice is patience sort [27], originally a sorting algorithm especially suited for partially sorted data [28], which can be simplified to an efficient algorithm that finds the length of the LIS of a given sequence in time O(n ln n) [4]. But there are more alternatives, e.g., a fast algorithm in O(n ln ln n) [29, 30], approximate algorithms for sequences whose members cannot be stored [31], or algorithms which are exact within a sliding window [32]. Even for the enumeration of LIS there is literature introducing algorithms [29], which are able to, e.g., generate LIS with special properties [33].

Here, we introduce a method to count (and enumerate) the distinct LIS of any sequence efficiently. Note that we do not claim to be the first to introduce an algorithm to count the number of LIS. Some of the existing enumeration algorithms could be extended with the same principle we use to allow for efficient counting. Also, there is at least one algorithm description for counting the LIS in a well-known programmer forum [34]. However, we could not find any reference to published literature. Therefore we show our approach, which is an extension of patience sort.

Like patience sort, this method takes elements sequentially from the front of the sequence and places them on top of a selected stack from a set (s_1, …, s_k) of stacks, such that each stack s_i is sorted in decreasing order, i.e., the smallest element is on top of the stack, and the number of stacks is minimal. Thus, in the beginning there is just one stack, containing the first element of the given sequence. This placement can be achieved by always placing the current element taken from the sequence on the leftmost stack whose top element is larger than the current element.
If this is not possible, i.e., if all top elements are smaller than the current element, one opens a new stack s_{k+1} right of the currently rightmost stack. Note that therefore the top elements of the stacks are sorted ascendingly, so the correct stack for each element can be found via binary search. The final number of stacks is equal to the length l of the LIS [4].

For counting the LIS, we extend this algorithm by introducing pointers. The basic idea is that for any LIS, exactly one number is taken from each stack [35]. The pointers take care of the order constraints in the following way: each time an element is placed on a stack, pointers are added to some elements of the previous stack. This idea is already described in Ref. [4], but additionally to the pointers mentioned there, which point to the currently topmost element of the previous stack, we also add pointers to all elements of the previous stack which are smaller than the current element. The meaning of such a pointer from an element σ_{j'} to σ_j (j' > j) is that in a LIS, σ_j can appear directly before σ_{j'}. An example structure is shown in Fig. 2.

FIG. 2. Construction of the DAG for the example sequence (7, …), which has M = 7 distinct LIS; panels (a)–(c) show successive stages of the construction.

The set of all pointers, i.e., edges, forms a directed acyclic graph (DAG). The DAG can be used to enumerate all LIS by following all paths originating from any element of the rightmost stack. This will yield all LIS in reversed order. For our purpose, we just have to count all paths originating from the rightmost stack. Therefore we propagate through the DAG the information by how many paths each element can be reached. All elements of the rightmost stack are initialized with 1. Each element of the stack to the left is then assigned the sum of the counts of all elements pointing to it. This is repeated until the leftmost stack is reached.
The sum of the counts over all elements of the leftmost stack is the total number of LIS.

To estimate the run time, note that we have to iterate over all incoming edges, of which there are at most O(n²) in a DAG with n nodes. The construction takes the maximum of the number of edges, for constructing the pointers, and O(n ln n), for constructing the stacks, such that the runtime of this algorithm is O(n²) in the worst case. Note, however, that typical DAGs generated here have far fewer edges. We observed that typically the length of a LIS, and therefore the number of stacks, is O(√n). Each stack can only be connected to the previous stack. Assuming that stacks are typically of size O(√n), this leads to at most O(n) edges between each pair of neighboring stacks and therefore O(n√n) total edges.

C. Sampling rare events
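The counting scheme just described can be sketched as follows. Instead of building the DAG explicitly and propagating path counts from the rightmost stack, this equivalent sketch propagates the counts forward while the stacks are built: each element's count sums over exactly its incoming DAG edges. It assumes strict LIS; the i/n mapping of Sec. II A covers the weak case.

```python
# Sketch of patience-sort-based LIS counting (strict LIS). When an element
# lands on stack j, its count is the sum of counts of all elements already
# on stack j-1 that are smaller -- exactly the pointers (DAG edges) of the
# construction above. Worst case O(n^2), matching the maximal edge count.
from bisect import bisect_left

def count_lis(seq):
    """Return (l, M): length of the strict LIS and number of distinct LIS."""
    tops = []    # current top element of each stack, ascending
    stacks = []  # stacks[j]: list of (value, count) in placement order
    for x in seq:
        j = bisect_left(tops, x)   # leftmost stack whose top is >= x
        # number of LIS prefixes of length j+1 ending in x
        c = 1 if j == 0 else sum(cv for v, cv in stacks[j - 1] if v < x)
        if j == len(tops):         # all tops are smaller: open a new stack
            tops.append(x)
            stacks.append([])
        else:
            tops[j] = x
        stacks[j].append((x, c))
    if not stacks:
        return 0, 0
    return len(stacks), sum(cv for _, cv in stacks[-1])

print(count_lis([2, 1, 4, 3]))   # -> (2, 4): (2,4), (2,3), (1,4), (1,3)
```

The forward counts are correct because an element in stack j−1 is a valid predecessor exactly if it was placed earlier and has a smaller value, which is precisely the condition checked at placement time.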
Using the algorithm above, we can determine the number of LIS M for arbitrary sequences. Therefore, generating random sequences allows us to sample S = ln M, build histograms from the samples, and estimate the distribution p(S) from them. But to observe any event which occurs with a probability r, we would have to generate O(1/r) samples just to observe it once, and even more to reduce the statistical error enough to determine the probability with reasonable accuracy. Since we would like to know the probability distribution also in the extremely rare-event tails, we have to use a more sophisticated method than this simple sampling.

Our approach is to bias the ensemble in a controlled way towards extremely improbable configurations, gather enough samples there, and correct for the bias afterwards. This leads to small statistical errors across large parts of the support. This method [36] was successfully applied to a wide range of problems, from graph theory [37, 38], over stochastic geometry [39], nonequilibrium work distributions [40] and the Kardar-Parisi-Zhang equation [41], to the exploration of the tails of the distribution of the LIS length for random permutations and random walks [19].

The method is inspired by equilibrium thermodynamics, where the Metropolis algorithm [42] is used to generate samples of systems in the canonical ensemble at some temperature T, which governs the typical values of the energies observed in the system. Here, we identify the energy with our observable of interest S. This allows us to use the "temperature" parameter to bias the generated states towards improbable values of S.

This method builds a Markov chain consisting of sequences σ^(i), where i is the step counter of the chain. For each step in the chain, from the present sequence σ^(i) a trial sequence σ' is constructed by performing some changes to σ^(i).
For the standard ensemble of random permutations, where we have performed large-deviation simulations, the change we used is to swap two elements. The trial sequence is accepted, i.e., σ^(i+1) = σ', with the Metropolis acceptance probability

P_acc = min(1, e^{−ΔS/T}),

depending on the temperature T and the change ΔS of the entropy between σ' and σ^(i). Otherwise the previous sequence is repeated in the chain, i.e., σ^(i+1) = σ^(i). This procedure is sketched in Fig. 3 and will eventually result in sequences σ occurring in the chain which are distributed according to

Q_T(σ) = (1/Z_T) e^{−S(σ)/T} Q(σ),   (1)

where Q(σ) is the natural distribution of sequences, which we would obtain from simple sampling, and Z_T is the partition function of our artificial temperature ensemble.

FIG. 3. Sketch of a Markov chain of sequence realizations generated by swaps of two random elements of the permutation. All distinct LIS are marked by lines of distinct color. The acceptance of a sequence as the next sequence of the Markov chain depends on the number of LIS M of the realization, since the energy is identified with S = ln M.

From here it is just a question of elementary algebra to connect our estimates of the probability density function in the artificial temperature ensemble, p_T(S), to the distribution of the unbiased ensemble we want to study, p(S):

p_T(S) = Σ_{σ | S(σ)=S} Q_T(σ)   (2)
       = Σ_{σ | S(σ)=S} (1/Z_T) e^{−S(σ)/T} Q(σ)   (3)
       = (1/Z_T) e^{−S/T} p(S).   (4)

Depending on the value of T, a simulation will generate data for S in a specific interval. Thus, to obtain the distribution p(S) over a large range of the support, we performed simulations for many values of T. This requires finely tuned values of the temperatures.
The ratios of the constants Z_T can be obtained from the overlaps of p_{T_i} and p_{T_j}, since the actual distribution must be unique, i.e.,

p_{T_j}(S) e^{S/T_j} Z_{T_j} = p_{T_i}(S) e^{S/T_i} Z_{T_i}.   (5)

We used on the order of 30 temperatures per size n, where larger sizes typically required more temperatures. Also, as for all Markov chain Monte Carlo simulations, one has to carefully ensure the equilibration of the process and discard sequences of the chain which are still too strongly correlated with previous sequences. Note that equilibration can be ensured rather conveniently [36] by performing two sets of simulations starting with very different initial sequences, with low and with high values of S, respectively. The Markov chains can be considered equilibrated when the values of S agree within fluctuations between the two sets.

III. RESULTS
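A minimal sketch of the biased sampling described above, for the permutation ensemble with pair-swap moves. The function name and the generic `entropy` callback, which in the real simulation would return S = ln M from the counting algorithm of Sec. II B, are illustrative assumptions of this sketch:

```python
# Metropolis sampling of permutations biased by exp(-S/T), as in Eq. (1).
# 'entropy' is a placeholder for S = ln M; negative T biases towards
# large S, small positive T towards small S.
import math
import random

def metropolis_chain(n, T, steps, entropy, seed=42):
    rng = random.Random(seed)
    sigma = list(range(n))
    rng.shuffle(sigma)               # random initial permutation
    S = entropy(sigma)
    trajectory = []
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        sigma[i], sigma[j] = sigma[j], sigma[i]      # trial move: swap
        S_new = entropy(sigma)
        dS = S_new - S
        # accept with probability min(1, exp(-dS/T))
        if dS / T <= 0 or rng.random() < math.exp(-dS / T):
            S = S_new
        else:
            sigma[i], sigma[j] = sigma[j], sigma[i]  # reject: undo swap
        trajectory.append(S)
    return trajectory
```

In practice one would additionally discard the equilibration phase and thin the chain; histograms of the remaining trajectory estimate p_T(S), which Eq. (4) relates back to p(S).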
First, we study the behavior of typical sequences for the two ensembles. In the second part, we will investigate the large-deviation behavior of the standard permutation ensemble.
A. Typical behavior
To investigate the typical behavior of the permutation ensemble, we consider different system sizes up to large sequences of n = 524288 = 2^19 elements. The estimated probability density function p(S) of the LIS entropy of permutations is shown in the range of typical probabilities in Fig. 4(a). These data are collected from a large number of simple-sampling runs for each system size. Clearly, the mean value and the width of the distribution increase with n.

Indeed, for the mean we observe a growth of the form

⟨S⟩ = c√n.   (6)

(Also see Fig. 5(b) below.) Note that the fits resulted in rather large reduced χ² goodness-of-fit values (caused by the very high precision of the measured means), suggesting that there are corrections to this form for finite sizes. Under the assumption that the above relation is correct, we obtained our best estimate for the prefactor c; a corresponding fit for the width yields σ_S = b√n with a prefactor b. This corresponds to a growth of the mean entropy ⟨S⟩ ≈ c√n and of the mean count ⟨M⟩ ≈ e^{c√n}. In contrast, the mean number ⟨m⟩ of increasing subsequences (IS) of the typical LIS length ⟨l⟩ = 2√n also grows exponentially in √n [25, Eq. (11.5)], but with a larger rate in the exponent. Since c is smaller than this rate, our numerical results suggest the actual mean count of LIS to be much lower (and e^{⟨S⟩} even lower, in accordance with Jensen's inequality). In other words, the number of longest increasing subsequences is exponentially lower than the number of increasing subsequences of the same length.

FIG. 4. (a) Probability density p(S) of the entropy S for random permutations of different lengths n (n = 2048 up to n = 524288), obtained with simple sampling. (b) The probability densities p(S) collapse onto an approximate standard Gaussian shape for multiple system sizes if shifted by their mean ⟨S⟩ ≈ c√n and scaled by their width σ ≈ b√n. Note the logarithmic vertical axis.

This can be understood in the following way: we consider IS of a given length l̄, which is the average LIS length for this value of n. Now, looking at the ensemble of sequences, some will have a LIS length l < l̄. For them, there are no IS of length l̄, so they will not contribute any IS to the average. This will be some fraction of the sequences. Some sequences will have a LIS length of l = l̄; they will contribute all their LIS to the average. Finally, some fraction of sequences will have a LIS length l > l̄. Here, all subsequences of length l̄ of all LIS will be IS contributing to the average. Since there are exponentially many such subsequences (and maybe even more IS which are not subsequences of a LIS), they will dominate the average number of IS, thus leading to a stronger exponential growth as compared to the average number of LIS.

We use our estimates for the mean and the standard deviation to rescale the distributions for different system sizes in Fig. 4(b) and observe a collapse onto a shape which can be approximated well by a standard Gaussian. Notably, the strongest deviations from this scaling form occur for small sizes, while the larger sizes seem to converge to the limiting shape. This is expected, since the corrections to the scaling mentioned above should be stronger for smaller values of n. We backed this observation by classical normality tests [43–45], which are able to distinguish this distribution from a normal distribution with very high confidence at small system sizes, but become less confident for the largest system sizes (details not shown here).
In particular, even the weak Kolmogorov-Smirnov test is not able to distinguish the distributions from a normal distribution at a significance level below 10% for the largest system sizes. Due to our limited sample size, the tails of the measured distribution are subject to statistical errors. In Sec. III B we will present higher-quality data for the far tails to show how far into the tails the approximation by a Gaussian shape remains valid. Even for extremely rare events we cannot exclude the possibility that the distribution converges to a Gaussian in the large-n limit.

Next, we look at the ensemble of random sequences with a limited number K + 1 of distinct elements. For constant values of K, Fig. 5(a) shows the average entropy. The trivial case K = 0, which allows only one LIS of length l = n, corresponds to an entropy of S = 0 and is not visualized. The case K = 1, which consists of sequences containing two distinct elements, has a low and, interestingly, almost n-independent entropy. There are typically only one or two distinct LIS in such sequences, independent of the length of the sequence. Our data for larger values of K show two phenomena: first, larger values of K lead to larger entropies, and second, the dependence of the entropy on the length of the sequence diminishes in the limit of large n, i.e., for each fixed K there should be a limiting entropy approached for n → ∞. Indeed, fits of a function f(n) = b + a/ln(n − c), with the limiting value lim_{n→∞} f(n) = b, to our data confirm this guess for all values of K we considered. Note that the shape of the fitting form is purely heuristic. We first tried standard shapes, like approaching a constant with a power law or an exponential, but they did not work out well.

Since b seems to grow roughly linearly with K (not shown), this leads to the conjecture that for K ∝ n the saturation of the entropy should not occur and the entropy should instead grow with the size n.
Indeed, we observed this behavior for the permutation case, which is identical to the K → ∞ limit, as explained in Sec. II A. In Fig. 5(b) we can observe a quick convergence with increasing n to the behavior of the random permutation. Our conjecture Eq. (6) for its growth is visualized as a line.

B. The Far Tails
In this section we study the distribution of the entropy of the LIS for the random permutation ensemble with a focus on the far tails. Due to the much larger numerical effort, we are able to show results up to sequence lengths of n = 8192.

FIG. 5. Average entropy S as a function of the sequence length n, for the permutation ensemble and for the K ensemble with different values of K. Error bars are usually smaller than the width of the lines. (a) K ensemble with constant K (K = 1, 5, 10, 25, 50); lines are fits of the form f(n) = b + a/ln(n − c). (b) Permutation ensemble and K ∝ n (K = 10% and K = 100% of n); the line is the growth ⟨S⟩ ≈ c√n observed for random permutations.

The data presented in the previous section, which were obtained via simple sampling, resulted in a distribution that appeared to be very well approximated by a Gaussian. We want to investigate whether this is still true when including our high-precision estimates of the far tails. Here, we can observe a slightly faster than Gaussian decay, see Fig. 6(a). There, the distribution for a selection of sequence lengths n is shown, including the far tail, and fitted with (normalized) Gaussians. While they describe the high-probability region of the distribution (shown in the inset) very well, the deviation becomes stronger for increasingly rare events.

FIG. 6. (a) Probability density p(S) for multiple system sizes, with extremely high precision data for the far tails. The inset shows a zoom on the high-probability region. The lines are fits to Gaussian distributions, which fit very well in the high-probability region, but do not describe the tails of the distribution. (b) Same rescaling of the axes as in Fig. 4(b). This shows that the different system sizes move towards the Gaussian for larger sizes. The lines are linear interpolations of all available data points; not all of them are shown as symbols for clarity. The inset shows the area A between the logarithm of the rescaled distribution and the logarithm of a standard normal distribution, with a fit used for extrapolation.

To test whether this deviation remains in the n → ∞ limit, we rescale the distributions, as in Fig. 4(b), to be independent of the system size, see Fig. 6(b). First, we see that this collapse does not work as well in the far tails as it does in the high-probability region. Interestingly, there is a crossing for different system sizes. Left of the crossing, larger sizes tend towards the Gaussian, which hints that for larger sizes the Gaussian approximation becomes better even in the intermediate tails. Careful examination of the crossing shows that its position depends on the system size: larger systems cross farther to the right than smaller systems. This is again a hint that the Gaussian approximation becomes valid over larger ranges of the distribution for larger system sizes.

To quantify this observation, we cannot use classical statistical tests as we could for the data obtained via simple sampling. Instead we use a crude estimate of similarity, similar to one already used in [46]. We compare the area A between the logarithm of the scaled empirical distributions p_s(x) = σp(S), where x = (S − ⟨S⟩)/σ (cf. Fig. 6(b)), and the logarithm of a standard normal probability density function p_G, to estimate whether they become more similar for larger sizes. To be comparable across all system sizes, we restrict this difference to the largest range of the horizontal axis for which we have data for all sizes, i.e.,

A = ∫ |ln p_s(x) − ln p_G(x)| dx.   (7)

Using this method we observe a strictly decreasing area, as shown in the inset of Fig. 6(b).
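Eq. (7) can be evaluated numerically from binned data; a sketch using the trapezoidal rule, where the common grid and the density values are assumed inputs:

```python
# Sketch of Eq. (7): area between the log of the rescaled empirical
# density p_s and the log of a standard normal density p_G, on a common
# grid of x = (S - <S>)/sigma values, via the trapezoidal rule.
import math

def log_area(xs, ps):
    pg = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]
    ys = [abs(math.log(a) - math.log(b)) for a, b in zip(ps, pg)]
    return sum(
        (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) / 2
        for i in range(len(xs) - 1)
    )

# if the empirical density is exactly Gaussian, the area vanishes
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
gauss = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]
print(log_area(xs, gauss))   # -> 0.0
```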
If we extrapolate it using a power law with an offset, A(n) = c + an^{−b}, we obtain an estimate for the remaining offset c in the limit n → ∞.

Next, we can use our empirical data for the distribution to test whether a large-deviation principle holds, that is, whether the behavior of the distribution can be expressed by a rate function Φ in the n → ∞ limit, defined by Φ(s) = −lim_{n→∞} (1/n) ln p_n(s S_{max,n}) [47], where S_{max,n} is the maximum possible value for a given n and therefore s ∈ [0, 1]. The rescaling s = S/S_{max,n} is done to describe the largest fluctuations by one size-independent function Φ(s). Since we have the distributions p_n(S) for multiple finite n, we can calculate empirical rate functions Φ_n(s) for each n and extrapolate whether they converge to a limiting curve, which would be a strong hint that this curve is the size-independent rate function Φ(s) and would establish a large-deviation principle. If a rate function exists, it governs the fluctuations around the mean. For example, the existence of a rate function with mild properties implies the law of large numbers and the central limit theorem for the corresponding process. For brevity, we will omit the n-subscripts of S_{max,n} and p_n(S).

To analyze our data, we have to determine S_max for the LIS. A maximum entropy is achieved if, for many groups of elements, one can choose independently between different elements. Thus, consider a sequence which consists of a group of k decreasing elements, followed by k decreasing elements which are all larger than the elements before, and so on. An example for n = 9 and k = 3 is (3, 2, 1, 6, 5, 4, 9, 8, 7). Each LIS has length n/k and contains, from each block of k, one arbitrary element, resulting in M = k^{n/k} distinct LIS. The entropy S = ln M = (n/k) ln k is maximized at k = e and, since we are limited to integer k, at k = 3. This results in a maximum entropy of S_max = n ln 3/3, i.e., linear in n.
In Fig. 7(a) the empirical rate functions are visualized, but no convergence to a common tail is visible. The best common tail we can generate occurs for a slightly modified rate function with an unusual exponent, Φ_u(s) = −lim_{n→∞} n^{−1/2} ln p(S). Note, however, that such an exponent is not out of the question. For example, the rate function of the right tail of the distribution of the rescaled length l̃ = l/√n of the LIS behaves as Φ(l̃) = −lim_{n→∞} n^{−1/2} ln p(l) [7]. Our result is shown in Fig. 7(b) in double-logarithmic scale to emphasize the collapse of the right tail onto a power law ∼ s^κ with a slope of κ ≈ 2, consistent with the previously observed almost Gaussian right tail.

FIG. 7. (a) Usual empirical rate function Φ(s) = −lim_{n→∞} (1/n) ln p(S). No convergence is visible; the curves shift to the left with increasing sequence length n. (b) Empirical rate function with unusual exponent Φ_u(s) = −lim_{n→∞} n^{−1/2} ln p(S) for the random permutation case in log-log scale to emphasize the convergence to a common tail with a power-law shape.

Finally, we want to understand what leads to sequences with atypically many or few distinct LIS. For this purpose we used the sequences generated by simple sampling and by the large-deviation approach to study the correlation of the entropy with the length of the corresponding LIS. This might give insight into the qualitative mechanisms governing the degeneracy of the LIS. This correlation is visualized in Fig. 8 for random permutations of length n = 8192. Typical permutations have a LIS entropy around ⟨S⟩ and a LIS length around ⟨l⟩. Extremely degenerate LIS, however, necessitate longer LIS. This is somewhat counterintuitive, since there are more increasing subsequences of shorter length [1]. We assume therefore that the mechanism here is that these rare sequences have a structure which is in some sense modular (cf. Sec. III B for the configuration of maximum entropy), allowing for many almost identical LIS. These may differ independently in many places and can therefore combine such small differences combinatorially. Then, since a longer LIS has more members, the combinatorial character leads to an entropy advantage for long LIS. This higher number of combinatorial possibilities becomes necessary at some point to support even more degenerate LIS, thus we see a very strong correlation for extremely high values of S. However, we assume that for very long LIS, the entropy has to decrease again. In the extreme case of l = n the sequence has to be sorted and can only contain a single LIS.
Since we do not observe very long LIS in our sampled data, they must be combinatorially suppressed. In order to access this region, one would have to perform a biased sampling with respect to the length and measure the entropy, or, even better, measure the two-dimensional distribution p(S, l) by a two-temperature large-deviation approach, which would be numerically very demanding.
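The maximum-entropy block construction of Sec. III B can be verified numerically for small n. This brute-force sketch (for illustration only, not the authors' production code) checks M = k^{n/k} for the n = 9, k = 3 example and confirms that the entropy rate (ln k)/k is maximized over the integers at k = 3:

```python
# Check of the S_max argument: blocks of k decreasing elements give
# M = k^(n/k) distinct (strict) LIS, and (ln k)/k is maximal at k = 3.
import math
from itertools import combinations

def lis_length_and_count(seq):
    """Brute force: length and number of distinct strict LIS (tiny n only)."""
    n = len(seq)
    for l in range(n, 0, -1):
        cnt = sum(
            1
            for idx in combinations(range(n), l)
            if all(seq[idx[a]] < seq[idx[a + 1]] for a in range(l - 1))
        )
        if cnt:
            return l, cnt
    return 0, 0

seq = [3, 2, 1, 6, 5, 4, 9, 8, 7]        # n = 9, k = 3 block construction
print(lis_length_and_count(seq))         # -> (3, 27), i.e. M = 3^(9/3)

rates = {k: math.log(k) / k for k in range(1, 7)}
print(max(rates, key=rates.get))         # -> 3, hence S_max = n ln(3)/3
```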
FIG. 8. The length l of the LIS as a function of the entropy S for sequence length n = 8192. The dark gray data points are gathered using simple sampling and represent typical sequences. The other data points are collected with the Markov chain Monte Carlo (MCMC) method described in Sec. II C and represent extremely rare sequences with atypical entropy.

IV. CONCLUSIONS
Here, we studied the entropy S of longest increasing subsequences of random permutations by counting the number of distinct LIS. Using an extension of the patience-sort algorithm, this number can be readily obtained for any given sequence. In particular, we applied Markov chain Monte Carlo techniques to explore the far tail of the probability distribution of S in the regime of extremely rare events with probabilities less than 10^−.

Concerning the typical behavior, we found that the average entropy grows as the square root of the length of the permutation, i.e., the number of LIS grows exponentially, as expected. The fluctuations of the entropy are Gaussian to a good approximation, but show deviations from this shape in the far tails.

Further, we used the data of the far tails to empirically scrutinize the rate function, the central piece of large-deviation theory. For the right tails we propose a rate function with an unusual exponent, Φ_u(S/S_max) = −lim_{n→∞} ln p(S)/n^{1/2}, which behaves as a power law in S/S_max and towards which the right tails of all studied system sizes seem to converge. This means that the standard large-deviation principle, where one would see convergence with a factor 1/n instead of 1/n^{1/2}, does not hold, but the tails of the distribution can still be described by some rate function. Note that for the distribution of LIS lengths, a rate function with a factor different from 1/n was also found in previous work.

In addition to random permutations, we studied an ensemble with a limited number of distinct elements in the sequences.
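As a simple illustration of the counting task summarized above, the number of distinct LIS of a given sequence can be obtained with a standard O(n²) dynamic program over LIS lengths and multiplicities. This is only a minimal sketch, not the paper's faster patience-sort-based method; the function name `lis_entropy` and the weakly increasing convention (ties allowed, as in the ensembles with repeated elements) are choices made here for illustration.

```python
import math

def lis_entropy(seq):
    """Return (LIS length l, number of distinct LIS, entropy S = ln count).

    Uses the weakly increasing convention (ties allowed); for permutations
    of distinct integers this coincides with the strictly increasing case.
    """
    n = len(seq)
    length = [1] * n  # length[i]: longest weakly increasing subsequence ending at i
    count = [1] * n   # count[i]: number of such longest subsequences ending at i
    for i in range(n):
        for j in range(i):
            if seq[j] <= seq[i]:  # weakly increasing: ties allowed
                if length[j] + 1 > length[i]:
                    # found a strictly longer way to end at i: reset the count
                    length[i] = length[j] + 1
                    count[i] = count[j]
                elif length[j] + 1 == length[i]:
                    # another way of the same length: accumulate multiplicity
                    count[i] += count[j]
    l = max(length)
    num = sum(c for c, m in zip(count, length) if m == l)
    return l, num, math.log(num)
```

For example, the sequence (1, 3, 2, 4) has the two LIS (1, 3, 4) and (1, 2, 4), hence l = 3 and S = ln 2.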
For a fixed number of distinct elements, we observed that S converges to a constant value which is independent of the sequence length for long sequences. If the number of distinct symbols is proportional to the sequence length, the entropy converges to the same LIS-entropy as for random permutations for large system sizes.

Also, the data structure used to count the LIS can be used to perform unbiased sampling of all LIS, which is a line of research the authors are currently pursuing. Here, other ensembles of sequences could also be of interest, such as one-dimensional random walks. Finally, for future research it could be interesting, yet numerically extremely demanding, to study two-dimensional distributions like p(S, l).

ACKNOWLEDGMENTS
HS would like to thank Naftali Smith for interesting discussions about LIS and acknowledges the OpLaDyn grant obtained in the 4th round of the Trans-Atlantic program Digging into Data Challenge (2016-147 ANR OPLADYN TAP-DD2016). The simulations were performed at the HPC Cluster CARL, located at the University of Oldenburg (Germany) and funded by the DFG through its Major Research Instrumentation Programme (INST 184/108-1 FUGG) and the Ministry of Science and Culture (MWK) of the Lower Saxony State.

[1] D. Romik,
The Surprising Mathematics of Longest Increasing Subsequences (Cambridge University Press, USA, 2015).
[2] S. M. Ulam, in Modern Mathematics for the Engineer: Second Series, Dover Books on Engineering Series, edited by E. Beckenbach and M. Hestenes (Dover Publications, Incorporated, 2013) Chap. 11, pp. 261–281.
[3] C. Schensted, Canadian Journal of Mathematics, 179–191 (1961).
[4] D. Aldous and P. Diaconis, Bulletin of the American Mathematical Society, 413 (1999).
[5] T. Seppäläinen, Probability Theory and Related Fields, 221 (1998).
[6] B. F. Logan and L. A. Shepp, Advances in Mathematics, 206 (1977).
[7] J.-D. Deuschel and O. Zeitouni, Combinatorics, Probability and Computing, 247 (1999).
[8] J. Baik, P. Deift, and K. Johansson, Journal of the American Mathematical Society, 1119 (1999).
[9] M. Prähofer and H. Spohn, Phys. Rev. Lett., 4882 (2000).
[10] S. N. Majumdar and S. Nechaev, Physical Review E, 011103 (2004).
[11] K. A. Takeuchi and M. Sano, Phys. Rev. Lett., 230601 (2010).
[12] K. Johansson, Communications in Mathematical Physics, 437 (2000).
[13] J. Baik and E. M. Rains, Journal of Statistical Physics, 523 (2000).
[14] T. Kriecherbauer and J. Krug, Journal of Physics A: Mathematical and Theoretical, 403001 (2010).
[15] S. N. Majumdar, in Complex Systems: Lecture Notes of the Les Houches Summer School 2006, Les Houches, edited by J. Bouchaud, M. Mézard, and J. Dalibard (Elsevier Science, 2006) Chap. 4.
[16] I. Corwin, Random Matrices: Theory and Applications, 1130001 (2012).
[17] O. Angel, R. Balka, and Y. Peres, Mathematical Proceedings of the Cambridge Philosophical Society, 173 (2017).
[18] J. R. G. Mendonça, Journal of Physics A: Mathematical and Theoretical, 08LT02 (2017).
[19] J. Börjes, H. Schawe, and A. K. Hartmann, Phys. Rev. E, 042104 (2019).
[20] J. R. G. Mendonça, H. Schawe, and A. K. Hartmann, Phys. Rev. E, 032102 (2020).
[21] P. Gopalan, T. S. Jayram, R. Krauthgamer, and R. Kumar, in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07 (Society for Industrial and Applied Mathematics, USA, 2007) pp. 318–327.
[22] L. Bonomi and L. Xiong, Transactions on Data Privacy, 73 (2016).
[23] H. Zhang, Bioinformatics, 1391 (2003).
[24] A. K. Hartmann, Big Practical Guide to Computer Simulations (World Scientific, Singapore, 2015).
[25] J. M. Hammersley, in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Theory of Statistics (University of California Press, Berkeley, Calif., 1972) pp. 345–394.
[26] C. Houdré and T. J. Litherland, "On the longest increasing subsequence for finite and countable alphabets," in High Dimensional Probability V: The Luminy Volume, Collections, Vol. 5, edited by C. Houdré, V. Koltchinskii, D. M. Mason, and M. Peligrad (Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2009) pp. 185–212.
[27] C. L. Mallows, SIAM Review, 375 (1963).
[28] B. Chandramouli and J. Goldstein, in ACM SIGMOD International Conference on Management of Data (SIGMOD 2014) (ACM SIGMOD, 2014).
[29] S. Bespamyatnikh and M. Segal, Information Processing Letters, 7 (2000).
[30] M. Crochemore and E. Porat, Information and Computation, 1054 (2010).
[31] A. Arlotto, V. V. Nguyen, and J. M. Steele, Stochastic Processes and their Applications, 3596 (2015).
[32] M. H. Albert, A. Golynski, A. M. Hamel, A. López-Ortiz, S. Rao, and M. A. Safari, Theoretical Computer Science, 405 (2004).
[33] Y. Li, L. Zou, H. Zhang, and D. Zhao, Proc. VLDB Endow., 181–192 (2016).
[34] A. Salauyou, (2014), https://stackoverflow.com/questions/22923646/number-of-all-longest-increasing-subsequences/22945390.
[35] Other numbers in the same stack appear earlier in the same sequence but are larger, or they appear later but are smaller.
[36] A. K. Hartmann, Phys. Rev. E, 056102 (2002).
[37] A. K. Hartmann, The European Physical Journal B, 627 (2011).
[38] H. Schawe and A. K. Hartmann, The European Physical Journal B, 73 (2019).
[39] H. Schawe, A. K. Hartmann, and S. N. Majumdar, Phys. Rev. E, 062159 (2018).
[40] A. K. Hartmann, Phys. Rev. E, 052103 (2014).
[41] A. K. Hartmann, P. L. Doussal, S. N. Majumdar, A. Rosso, and G. Schehr, Europhys. Lett., 67004 (2018).
[42] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, The Journal of Chemical Physics, 1087 (1953).
[43] R. D'Agostino and E. S. Pearson, Biometrika, 613 (1973).
[44] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing (Cambridge University Press, 2007).
[45] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. Jarrod Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, arXiv e-prints, arXiv:1907.10121 (2019), arXiv:1907.10121 [cs.MS].
[46] H. Schawe and A. K. Hartmann, arXiv preprint arXiv:2003.03415 (2020).
[47] H. Touchette, Physics Reports 478, 1 (2009).