Universality in Numerical Computations with Random Data. Case Studies
Percy Deift∗, Govind Menon†, Sheehan Olver‡ and Thomas Trogdon∗

June 28, 2018
Abstract
The authors present evidence for universality in numerical computations with random data. Given a (possibly stochastic) numerical algorithm with random input data, the time (or number of iterations) to convergence (within a given tolerance) is a random variable, called the halting time. Two-component universality is observed for the fluctuations of the halting time, i.e., the histogram for the halting times, centered by the sample average and scaled by the sample variance (see Eqs. 1 and 2 below), collapses to a universal curve, independent of the input data distribution, as the dimension increases. Thus, up to two components, the sample average and the sample variance, the statistics for the halting time are universally prescribed. The case studies include six standard numerical algorithms, as well as a model of neural computation and decision making. A link to relevant software is provided in [1] for the reader who would like to do computations of their own.
In earlier work [2], two of the authors (P.D. and G.M., together with C. Pfrang) considered the problem of computing the eigenvalues of a real, n × n random symmetric matrix M = (M_ij). They considered matrices chosen from different ensembles E using a variety of different algorithms A. Let S_n denote the space of real, n × n symmetric matrices. Standard eigenvalue algorithms involve iterations of isospectral maps ϕ = ϕ_A : S_n → S_n, spec(ϕ_A(M)) = spec(M) for M ∈ S_n. If M ∈ S_n is given, one considers the sequence of matrices M_{k+1} = ϕ(M_k), k ≥ 0, with M_0 = M. Clearly, spec(M_{k+1}) = spec(M_k) = ··· = spec(M), and under appropriate conditions M_k = ϕ_A^(k)(M) converges to a diagonal matrix, Λ = diag(λ_1, ..., λ_n). Necessarily, the λ_i's are the desired eigenvalues of M.

(∗ Courant Institute, New York, NY, USA; † Brown University, Providence, RI, USA; ‡ The University of Sydney, Sydney, NSW, Australia)

In [2], the authors discovered the following phenomenon: For a given accuracy ǫ, a given matrix size n (ǫ small, n large, in an appropriate scaling range) and a given algorithm A, the fluctuations in the time to compute the eigenvalues to accuracy ǫ with the given algorithm A were universal, independent of E. More precisely, they considered fluctuations in the deflation time T. (The notion of deflation time is generalized to the notion of halting time in subsequent calculations.) Recall that if an n × n matrix has block form

    M = [ M_11  M_12 ; M_21  M_22 ],

where M_11 is k × k and M_22 is (n − k) × (n − k) for some 1 ≤ k ≤ n − 1, then M̂ = diag(M_11, M_22) is obtained from M by deflation. If ‖M_12‖ = ‖M_21‖ ≤ ǫ, then the eigenvalues {λ_i} of M differ from the eigenvalues {λ̂_i} of M̂ by O(ǫ). Let T = T_{ǫ,n,A,E}(M) be the time (= number of iterations of ϕ_A) it takes to deflate a random matrix M, chosen from an ensemble E, to order ǫ, using algorithm A; i.e., T is the smallest time such that for some k, 1 ≤ k ≤ n − 1,

    ‖(ϕ_A^(T)(M))_12‖ = ‖(ϕ_A^(T)(M))_21‖ ≤ ǫ.

As explained in [2], T is a useful measure of the time required to compute the eigenvalues of M: Generically, at worst O(n) deflations are needed to compute the eigenvalues of M, and at best, O(log n). The fluctuations τ_{ǫ,n,A,E}(M) of T are defined by

    τ_{ǫ,n,A,E}(M) = ( T_{ǫ,n,A,E}(M) − ⟨T_{ǫ,n,A,E}⟩ ) / σ_{ǫ,n,A,E},   (1)

where ⟨T_{ǫ,n,A,E}⟩ is the sample average of T_{ǫ,n,A,E}(M) taken over matrices M from E, and σ_{ǫ,n,A,E} is the sample variance. For a given E, a typical sample size in [2] was of order 5,000 to 10,000 matrices M, and the output of the calculations in [2] was recorded in the form of a histogram for τ_{ǫ,n,A,E}.

Most of the calculations in [2] concerned three eigenvalue algorithms: the QR algorithm, the QR algorithm with shifts (the version of QR used in practice), and the Toda algorithm. The QR algorithm is based on the factorization of a(n invertible) matrix M as M = QR, where Q is orthogonal and R is upper-triangular with R_ii > 0. Given M ∈ S_n with M = QR, M′ = ϕ_A(M) = ϕ_QR(M) ≡ RQ. Clearly, M′ = Q^T M Q ∈ S_n and spec(M′) = spec(M). Practical implementation of the QR algorithm requires the use of a shift, i.e., the QR algorithm with shifts [3]. As shown in [2], shifting does not affect universality. The Toda algorithm involves the solution M(t) of the Toda equation

    dM/dt = [B(M), M] = B(M)M − M B(M),    B(M) = M_+ − (M_+)^T,

where M_+ is the upper triangular part of M and M(t = 0) = M. For all t > 0, spec(M(t)) = spec(M), and as t → ∞ we again have M(t) → Λ = diag(λ_1, ..., λ_n), where {λ_i} are the eigenvalues of M. For the convenience of the reader, in Figure 1 we reproduce, in particular, histograms for τ_{ǫ,n,A,E} from [2] for the QR algorithm (A = QR) with two different ensembles and varying values of n and ǫ.

From Figure 1, we see that eigenvalue computation with the QR algorithm exhibits two-component universality, i.e., the fluctuations τ_{ǫ,n,A,E} obey a universal law for all ensembles E under consideration. The same is true for all three algorithms considered in [2]; the laws are different, however, for different algorithms A.

Figure 1: The observation of two-component universality for τ_{ǫ,n,A,E} when A = QR (frequency versus normalized deflation time). This figure is taken from [2]. Overlayed histograms demonstrate the collapse of the histogram of τ_{ǫ,n,A,E} to a single curve. See the Appendix for the definitions of our choices for E. In the top-left figure, E = GOE, and 40 histograms for τ_{ǫ,n,A,E} are plotted one on top of the other for a range of values of ǫ = 10^{−k} and n. The top-right figure gives the corresponding histograms for E = BE. In the lower figure, all 40 + 40 histograms are overlayed and universality is evident: the data appears to follow a universal law for the fluctuations.

In the current paper, the work in [2] has been extended in various ways as follows. All matrix ensembles are described in the Appendix.

In the first set of computations, the authors consider the eigenvalue problem for random matrices M ∈ S_n using the Jacobi algorithm (see, e.g., [4]): for M ∈ S_n, choose i < j such that |M_ij| ≥ max_{i′<j′} |M_{i′j′}|, and set ϕ_A(M) = G^T M G, where G is the Givens rotation in the (i, j) plane chosen so that (G^T M G)_{ij} = 0. Histograms of the halting time fluctuations τ_{ǫ,n,A,E} for the Jacobi algorithm are given in Figure 2.

Figure 2: The observation of two-component universality for τ_{ǫ,n,A,E} when A = Jacobi. All histograms are produced with 16,000 samples. We see two-component universality emerge for n sufficiently large: the histograms follow a universal (independent of E) law.

In a second set of computations, the authors consider the QR algorithm acting on complex Hermitian matrices drawn from unitary invariant ensembles; apart from GUE, such ensembles have dependent entries, and special techniques are needed to sample the matrices.
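Returning to the deflation-time experiment above: it is easy to reproduce in outline. The following sketch (our illustrative Python/numpy reimplementation, not the authors' Mathematica package [1], and with far smaller sample and matrix sizes than in [2]) iterates the unshifted QR map on GOE matrices, records the deflation times T, and normalizes them as in Eq. (1):

```python
import numpy as np

def qr_deflation_time(M, eps, max_iter=5_000):
    """Iterate the unshifted QR map phi(M) = RQ until the matrix deflates,
    i.e. until some off-diagonal block M[:k, k:] has norm <= eps.
    Returns the number of iterations (the deflation time T)."""
    n = M.shape[0]
    for T in range(1, max_iter + 1):
        Q, R = np.linalg.qr(M)
        M = R @ Q                                   # spec(RQ) = spec(M)
        for k in range(1, n):
            # M stays symmetric, so ||M_12|| = ||M_21||
            if np.linalg.norm(M[:k, k:]) <= eps:
                return T
    return max_iter

def goe(n, rng):
    """GOE sample, normalized as in Appendix A.1."""
    X = rng.standard_normal((n, n))
    return (X + X.T) / np.sqrt(n)

rng = np.random.default_rng(0)
times = np.array([qr_deflation_time(goe(10, rng), 1e-6) for _ in range(100)])
tau = (times - times.mean()) / times.std()          # fluctuations, Eq. (1)
```

Histogramming tau for several ensembles E and overlaying the results is, in outline, the experiment reported in Figure 1.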
A novel technique for sampling such unitary ensembles was introduced recently [8] by two of the authors, S.O. and T.T., together with N. R. Rao, taking advantage of the representation of the eigenvalues of M as a determinantal point process whose kernel is given in terms of orthogonal polynomials (see also [9]). Using this sampling technique, the authors of the present paper have considered the QR algorithm for various unitary ensembles. (Here M = QR, where Q is unitary and again R is upper triangular with R_ii > 0.) Histograms for the halting (= deflation) time fluctuations τ_{ǫ,n,A,E}, A = QR, are given in Figure 3, and again two-component universality is evident.

Figure 3: The observation of two-component universality for τ_{ǫ,n,A,E} when A = QR, E = QUE, COSH, GUE, at a fixed small value of ǫ. Here we are using deflation time (= halting time), as in [2]. The left figure displays three histograms, one each for GUE, COSH and QUE, when n = 70. The right figure displays the same information for n = 150. All histograms are produced with 16,000 samples. Two-component universality emerges for n sufficiently large: the histograms follow a universal (independent of E) law. This is surprising because COSH and QUE have eigenvalue distributions that differ significantly from GUE, in that they do not follow the so-called semi-circle law. These histograms also appear to collapse to the same curve as in Figure 1. This is a further surprise, given the well-known fact that Orthogonal and Unitary Ensembles give rise to different (eigenvalue) universality classes.

In a third set of computations in this paper, the authors start to address the question of whether two-component universality is just a feature of eigenvalue computation, or is present more generally in numerical computation. In particular, the authors consider the solution of the linear system of equations W x = b, where W is real and positive definite, using the conjugate gradient (CG) method. The method is iterative (see, e.g., [10] and also Remark 1 below), and at iteration k of the algorithm an approximate solution x_k of W x = b is found and the residual r_k = W x_k − b is computed. (The notation ‖·‖ is used to denote the standard ℓ² norm on n-dimensional Euclidean space.) For any given ǫ > 0, the method is halted when ‖r_k‖ < ǫ, and the halting time k_ǫ(W, b) is recorded. The authors consider n × n matrices W chosen from two different positive definite ensembles E (see Appendix A.3) and vectors b = (b_j) chosen independently with iid entries {b_j}.

Figure 4: The observation of two-component universality for τ_{ǫ,n,A,E} when A = CG and E = cLOE, cPBE, at a fixed small value of ǫ. The left figure displays two histograms, one for cLOE and one for cPBE, when n = 100. The right figure displays the same information for n = 500. All histograms are produced with 16,000 samples. Two-component universality is evident for n sufficiently large: the histograms follow a universal (independent of E) law. The critical scaling (see Appendix A.3) has a significant impact on the distribution of the condition number and forces ⟨k_{ǫ,n,A,E}⟩ ≈ n^α, α < 1. If the scaling m = 2n is chosen in the ensemble E, then the CG method converges too quickly and the halting time tends to take only 10-15 different values for each value of m; no interesting limiting statistics are present. Conversely, if m = n, the CG method converges slowly (⟨k_{ǫ,m,A,E}⟩ ≫ m) and rounding errors dominate the computation. Experiments do not indicate two-component universality if m = 2n or m = n. The scaling m = n + 2⌊√n⌋ identifies a critical scaling region. Within this scaling region, we see two-component universality emerge for n sufficiently large: the histograms follow a universal (independent of E) law.
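The CG halting-time experiment can be sketched as follows (a minimal numpy sketch, our code rather than the authors' package [1]; the ensemble is the cLOE of Appendix A.3 at the critical scaling, and the tolerance here is our choice):

```python
import numpy as np

def cg_halting_time(W, b, eps):
    """Run conjugate gradient on W x = b from x0 = 0 and return the first
    iteration k with ||r_k|| < eps (the halting time k_eps)."""
    n = len(b)
    x = np.zeros(n)
    r = b.copy()                    # r_0 = b - W x_0 = b
    p = r.copy()
    rs = r @ r
    for k in range(1, 10 * n):
        Wp = W @ p
        alpha = rs / (p @ Wp)
        x += alpha * p
        r -= alpha * Wp
        rs_new = r @ r
        if np.sqrt(rs_new) < eps:
            return k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return 10 * n                   # did not converge within the cap

# cLOE with the critical scaling m = n + 2*floor(sqrt(n)) (Appendix A.3)
rng = np.random.default_rng(1)
n = 100
m = n + 2 * int(np.sqrt(n))
X = rng.standard_normal((n, m))
W = X @ X.T / m
b = rng.uniform(-1, 1, n)
k_eps = cg_halting_time(W, b, 1e-8)
```

Repeating this over many samples (W, b) and normalizing the recorded k_eps as in Eq. (1) produces the histograms of Figure 4.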
Given ǫ (small) and n (large), and (W, b) ∈ E, the authors record the halting time k_{ǫ,n,A,E}, A = CG, and compute the fluctuations

    τ_{ǫ,n,A,E}(W, b) = ( k_{ǫ,n,A,E}(W, b) − ⟨k_{ǫ,n,A,E}⟩ ) / σ_{ǫ,n,A,E},   (2)

defined by analogy with (1). The histograms for τ_{ǫ,n,A,E} are given in Figure 4, and again two-component universality is evident.

In a fourth set of computations, the authors again consider the solution of W x = b, but here W has the form I + X, where X ≡ X_n is a random, real non-symmetric matrix, and b = (b_j) is independent with uniform iid entries {b_j}. As W = I + X is (almost surely) no longer positive definite, the conjugate gradient algorithm breaks down, and the authors solve (I + X)x = b using the Generalized Minimal Residual (GMRES) algorithm [11]. Again, the algorithm is iterative, and at iteration k of the algorithm an approximate solution x_k of (I + X)x = b is found and the residual r_k = (I + X)x_k − b is computed.

Figure 5: The observation of two-component universality for τ_{ǫ,n,A,E} when A = GMRES, E = cSGE, cSBE, at a fixed small value of ǫ. The left figure displays two histograms, one for cSGE and one for cSBE, when n = 100. The right figure displays the same information for n = 500. All histograms are produced with 16,000 samples. The critically scaled ensembles cSBE and cSGE are of the form I + X_n with ‖X_n‖ ≈ 2. If the matrix is too close to the identity, the halting time takes almost constant values, e.g., k_{ǫ,n,A,E} = 8, independent of n. If the matrix is too far from the identity, the fact that it is unstructured makes GMRES perform poorly, and the algorithm typically completes in n steps, the maximum possible number of iterations (see Remark 1 below). With the proper scaling of X, we see two-component universality emerge for n sufficiently large: the histograms follow a universal (independent of E) law.
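The GMRES iteration with a residual-based halting rule can likewise be sketched. The following is a plain, unrestarted Arnoldi/least-squares implementation (our illustrative code, not the authors'; a single small cSGE sample, with a tolerance of our choosing):

```python
import numpy as np

def gmres_halting_time(A, b, eps):
    """Unrestarted GMRES on A x = b from x0 = 0; returns the first
    iteration k at which the residual norm drops below eps."""
    n = len(b)
    beta = np.linalg.norm(b)
    Q = np.zeros((n, n + 1))        # Krylov basis
    H = np.zeros((n + 1, n))        # Hessenberg matrix
    Q[:, 0] = b / beta
    for k in range(1, n + 1):
        v = A @ Q[:, k - 1]         # Arnoldi step: extend the basis
        for j in range(k):
            H[j, k - 1] = Q[:, j] @ v
            v -= H[j, k - 1] * Q[:, j]
        H[k, k - 1] = np.linalg.norm(v)
        if H[k, k - 1] > 1e-14:
            Q[:, k] = v / H[k, k - 1]
        # residual of the small least-squares problem min ||beta e1 - H y||
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)[0]
        r = np.linalg.norm(H[:k + 1, :k] @ y - e1)
        if r < eps:
            return k
    return n

rng = np.random.default_rng(4)
n = 100
A = np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # cSGE sample
b = rng.uniform(-1, 1, n)
k_halt = gmres_halting_time(A, b, 1e-8)
```

In exact arithmetic this iteration terminates in at most n steps, which is the dichotomy discussed in Remark 1 below.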
As before, for any given ǫ > 0, the method is halted when ‖r_k‖ < ǫ, and k_{ǫ,n,A,E}(X, b) is recorded. As in the conjugate gradient problem (Section 1.3), the authors compute the histograms for the fluctuations of the halting time τ_{ǫ,n,A,E} (2) for two ensembles E, where now A = GMRES. The results are given in Figure 5, where again two-component universality is evident.

Remark 1. The computations in Sections 1.3 and 1.4 are particularly revealing for the following reason. Both the CG and GMRES algorithms proceed by generating approximations x_k to the solution in progressively larger subspaces V_k of R^n, x_k ∈ V_k, dim V_k = k (almost surely). These algorithms terminate in at most n steps, in the absence of rounding errors. If the matrix W in the case of CG, or I + X in the case of GMRES, is too close to the identity, then the algorithm will converge in O(1) steps, essentially independent of n. On the other hand, if W or I + X is too far from the identity, the algorithm will converge only after n steps (GMRES) or be dominated by rounding errors (CG). Thus in both cases there are no meaningful statistics. What the calculations in Sections 1.3 and 1.4 reveal is that if the ensembles for CG and GMRES are such that the matrices W and I + X, respectively, are typically not too close to, and not too far from, the identity, then the algorithms exhibit significant statistical fluctuations, and two-component universality is immediately evident (for more discussion see the captions for Figures 4 and 5). Analogous considerations apply in Section 1.5 below.

In a fifth set of computations, the authors raise the issue of whether two-component universality is just a feature of finite-dimensional computation, or is also present in problems which are intrinsically infinite-dimensional. In particular, is the universality present in numerical computations for PDEs?
As a case study, the authors consider the numerical solution of the Dirichlet problem ∆u = 0 in a star-shaped region Ω ⊂ R², with u = f on ∂Ω. The boundary is described by a periodic function of the angle θ, r = r(θ), and similarly f = f(θ), 0 ≤ θ ≤ 2π. Two ensembles, BDE and UDE (as described in Appendix A.6), are derived from a discretization of the problem with specific choices for r, defined by a random Fourier series. The boundary condition f is chosen randomly by letting {f(2πj/n)}_{j=0}^{n−1} be iid uniform on [−1, 1]. The histograms for τ_{ǫ,n,A,E} from these computations are given in Figure 6, and again two-component universality is evident. What is surprising, and quite remarkable, about these computations is that the histograms for τ_{ǫ,n,A,E} in this case are the same as the histograms for τ_{ǫ,n,A,E} in Figure 5 (see Figure 6 for the overlayed histograms). In other words, UDE and BDE are structured with random components, whereas cSGE and cSBE have no structure, yet they produce the same statistics (modulo two components).

Figure 6: The observation of two-component universality for τ_{ǫ,n,A,E} when A = GMRES, E = UDE, BDE, at a fixed small value of ǫ. The left figure displays two histograms, one for UDE and one for BDE, when n = 100. The right figure displays the same information for n = 500. The bottom figure consists of four histograms, two taken from Figure 5 (E = cSGE, cSBE) and two taken from the right figure above (E = UDE, BDE). All histograms are produced with 16,000 samples. It is interesting to note two properties. First, as we observe from our computations, BDE and UDE are of the form I + X_n, where X_n has a norm that grows proportionally to some fractional power of n. While this type of growth in the case of Section 1.4 (Figure 5) would cause GMRES to take its maximum possible number of iterations, that is, k = n (aside from round-off errors; see the comments below Figure 4), nevertheless, in the context of Section 1.5, non-trivial statistics emerge. In light of Remark 1, we conjecture that structure is necessary for GMRES to perform well when the perturbation of the identity has an unbounded spectral radius in the large n limit. The second, and most important, feature is that two-component universality for matrices of the form I + X_n persists as the computations are moved from structured randomness (UDE and BDE) to unstructured randomness (cSBE and cSGE): the histograms follow a universal (independent of E) law.

In all the computations discussed so far, the randomness in the computations resides in the initial data. In the sixth set of computations, the authors consider an algorithm which is intrinsically stochastic. They consider a genetic algorithm to compute Fekete points (see [12, p. 142]). Such points P* = (P*_1, P*_2, ..., P*_N) ∈ R^N are the global minimizers of the objective function

    H(P) = (2 / (N(N − 1))) Σ_{1≤i<j≤N} log |P_i − P_j|^{−1} + (1/N) Σ_{i=1}^{N} V(P_i)

for real-valued functions V = V(x) which grow sufficiently rapidly as |x| → ∞. It is well known (see, e.g., [12]) that as N → ∞, the counting measures δ_{P*} = (1/N) Σ_{i=1}^{N} δ_{P*_i} converge to the so-called equilibrium measure µ_V, which plays a key role in the asymptotic theory of the orthogonal polynomials generated by the measure e^{−NV(x)} dx on R. Genetic algorithms involve two basic components, "mutation" and "crossover". The authors implement the genetic algorithm in the following way.

The Algorithm. Fix a distribution D on R. Draw an initial population P = P_0 = {P_i}_{i=1}^{n} consisting of n = 100 vectors in R^N, N large, with elements that are iid uniform on a symmetric interval about the origin. A map F_D(P) : (R^N)^n → (R^N)^n is defined by one of the following two procedures:

• Mutation: Pick one individual P ∈ P at random (uniformly). Then pick two integers n_1, n_2 from {1, 2, ..., N} at random (uniformly and independently).
Three new individuals are created:

– P̃_1: draw n_1 iid numbers {x_1, ..., x_{n_1}} from D and perturb the first n_1 elements of P: (P̃_1)_i = (P)_i + x_i, i = 1, ..., n_1, and (P̃_1)_i = (P)_i for i > n_1.

– P̃_2: draw N − n_1 iid numbers {y_{n_1+1}, ..., y_N} from D and perturb the last N − n_1 elements of P: (P̃_2)_i = (P)_i + y_i, i = n_1 + 1, ..., N, and (P̃_2)_i = (P)_i for i ≤ n_1.

– P̃_3: draw |n_1 − n_2| iid numbers {z_1, ..., z_{|n_1−n_2|}} from D and perturb elements n_* = 1 + min(n_1, n_2) through n^* = max(n_1, n_2): (P̃_3)_i = (P)_i + z_{i−n_*+1}, i = n_*, ..., n^*, and (P̃_3)_i = (P)_i for i ∉ {n_*, ..., n^*}.

• Crossover: Pick two individuals P, Q from P at random (independently and uniformly). Then pick two numbers n_1, n_2 from {1, 2, ..., N} (independently and uniformly). Two new individuals are created:

– P̃_1: replace the n_1-th element of P with the n_1-th element of Q and perturb it (additively) with a sample of D.

– P̃_2: replace the n_2-th element of Q with the n_2-th element of P and perturb it (additively) with a sample of D.

At each step, the application of either crossover or mutation is chosen with equal probability. The new individuals are appended to P, and P ↦ P′ = F_D(P) ∈ (R^N)^n is constructed by choosing the 100 P_i's in P̃ which yield the smallest values of H(P). The algorithm produces a sequence of populations P_0, P_1, ..., P_k, ... in (R^N)^n, P_{k+1} = F_D(P_k), n = 100, and halts, with halting time recorded, for a given ǫ, when

    min_{P ∈ P_k} H(P) − inf_{P ∈ R^N} H(P) < ǫ.

The histograms for the fluctuations τ_{ǫ,N,A,E}, with A = Genetic, are given in Figure 7, for two choices of V, the convex choice V(x) = x² and a second, nonquadratic choice, and different choices of E ≃ D. For V(x) = x², inf_{P∈R^N} H(P) is known explicitly, and for the second choice of V, inf_{P∈R^N} H(P) is approximated by a long run of the genetic algorithm. As before, two-component universality is evident.
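One generation of the genetic algorithm can be sketched as follows (a simplified Python reading of the mutation and crossover moves above: V(x) = x², a reduced population size, and our own segment and initialization conventions, which are assumptions rather than the authors' exact choices):

```python
import numpy as np

def H(P):
    """Objective of Section 1.6 with V(x) = x^2: pairwise logarithmic
    repulsion plus external field, averaged over pairs and points."""
    N = len(P)
    d = np.abs(P[:, None] - P[None, :])
    np.fill_diagonal(d, 1.0)                      # log 1 = 0 on the diagonal
    return -np.log(d).sum() / (N * (N - 1)) + np.mean(P ** 2)

def generation(pop, scale, rng):
    """One step: apply mutation (3 offspring) or crossover (2 offspring)
    with equal probability, append the offspring, keep the fittest."""
    n, N = pop.shape
    offspring = []
    if rng.random() < 0.5:                        # mutation
        P = pop[rng.integers(n)]
        n1, n2 = rng.integers(1, N + 1, size=2)
        lo, hi = min(n1, n2), max(n1, n2)
        for a, b in ((0, n1), (n1, N), (lo, hi)): # three perturbed copies
            Q = P.copy()
            Q[a:b] += rng.uniform(-scale, scale, b - a)
            offspring.append(Q)
    else:                                         # crossover
        P, Q = pop[rng.integers(n)], pop[rng.integers(n)]
        i1, i2 = rng.integers(N, size=2)
        P2, Q2 = P.copy(), Q.copy()
        P2[i1] = Q[i1] + rng.uniform(-scale, scale)
        Q2[i2] = P[i2] + rng.uniform(-scale, scale)
        offspring += [P2, Q2]
    allpop = np.vstack([pop, np.array(offspring)])
    fitness = np.array([H(p) for p in allpop])
    return allpop[np.argsort(fitness)[:n]]        # selection

rng = np.random.default_rng(5)
N, n = 8, 20
pop = rng.uniform(-1, 1, (n, N))
best = [min(H(p) for p in pop)]
for _ in range(200):
    pop = generation(pop, 1.0 / (10 * N), rng)    # D uniform on [-1/(10N), 1/(10N)]
    best.append(min(H(p) for p in pop))
```

Because the parents are retained in the selection step, the minimum of H over the population is non-increasing from one generation to the next, so the halting criterion above is eventually met.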
(After mutation we have P̃ = P ∪ {P̃_1, P̃_2, P̃_3}, and after crossover, P̃ = P ∪ {P̃_1, P̃_2}.)

Figure 7: The observation of two-component universality for τ_{ǫ,N,A,E} when A = Genetic, at a fixed small value of ǫ, and E ≃ D, where D is chosen to be either uniform on [−1/(10N), 1/(10N)] or taking values ±1/(10N) with equal probability. The top row is created with the choice V(x) = x² and the bottom row with the second choice of V. Each of the plots in the left column displays two histograms, one for each choice of D, when N = 10. The right column displays the same information for N = 40. All histograms are produced with 16,000 samples. It is evident that the histograms collapse onto a universal curve, one for each V.

1.7 Curie–Weiss Model

In the seventh and final set of computations, the authors pick up on a common notion in neuroscience that the human brain is a computer with software and hardware. If this is indeed so, then one may speculate that two-component universality should certainly be present in some cognitive actions. Indeed, such a phenomenon is in evidence in the recent experiments of Bakhtin and Correll [13]. In [13], data from experiments with 45 human participants was analyzed. The participants are shown 200 pairs of images. The images in each pair consist of nine black disks of variable size. The disks in the images within each pair have approximately the same area, so that there is no a priori bias. The participants are then asked to decide which of the two images covers a larger (black) area, and the time T required to make a decision is recorded. For each participant, the decision times for the 200 pairs are collected and the fluctuation histogram is tabulated. The experimental results are in good agreement with a dynamical Curie–Weiss model frequently used in describing decision processes [14].
As each of the 45 participants operates, presumably, in his or her own stochastic neural environment, this is a remarkable demonstration of two-component universality in cognitive action.

At its essence, the Curie–Weiss model is Glauber dynamics on the hypercube {−1, 1}^N with a microscopic approximation of a drift-diffusion process. Consider N variables {X_i(t)}_{i=1}^{N}, X_i(t) ∈ {−1, 1}. The state of the system at time t is X(t) = (X_1(t), X_2(t), ..., X_N(t)). The transition probabilities are given through the expressions

    P( X_i(t + ∆t) = −X_i(t) | X(t) = x ) = c_i(x)∆t + o(∆t),

where c_i(x) is the spin flip intensity. The observable considered is M(X(t)) = (1/N) Σ_{i=1}^{N} X_i(t) ∈ [−1, 1]. The process is started with M(X(0)) = 0, a state with no a priori bias, as in the case of the experimental setup. The halting (or decision) time for this model is

    T = inf{ t : |M(X(t))| ≥ ǫ },

the time at which the system makes a decision. Here ǫ ∈ (0, 1) may not be small.

This model is simulated by first sampling an exponential random variable with mean λ(t) = (Σ_i c_i(X(t)))^{−1} to find the time ∆t at which the system changes state. Sampling the random variable Y, P(Y = i) = c_i(X(t)) λ(t), i = 1, 2, ..., N, produces an integer j, determining which spin flipped. Define X_i(t + s) ≡ X_i(t) if s ∈ [0, ∆t) for i = 1, 2, ..., N, and then X_i(t + ∆t) ≡ X_i(t) for i ≠ j, X_j(t + ∆t) ≡ −X_j(t). This procedure is repeated with t replaced by t + ∆t to evolve the system.

Central to the application of the model is the assumption on the statistics of the spin flip intensity c_i(x). If one changes the basic statistics of the c_i's, will the limiting histograms for the fluctuations of T be affected as N becomes large?
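The simulation procedure just described amounts to a Gillespie-type algorithm. A minimal Python sketch with the intensity o_i(x) = e^{−βx_i M(x)} follows (the decision threshold and the system size here are our own illustrative choices, since ǫ need not be small):

```python
import math
import random

def decision_time(N, beta, thresh, rng, max_steps=10**6):
    """Gillespie simulation of the Curie-Weiss decision model with spin
    flip intensity c_i(x) = exp(-beta * x_i * M(x)) (the choice o_i).
    Starts from a balanced state, M = 0; halts when |M| >= thresh."""
    x = [1] * (N // 2) + [-1] * (N // 2)   # M(X(0)) = 0 (N even)
    rng.shuffle(x)
    t, S = 0.0, 0                          # S = sum of spins, M = S / N
    for _ in range(max_steps):
        M = S / N
        if abs(M) >= thresh:
            return t                       # decision reached
        c = [math.exp(-beta * xi * M) for xi in x]
        total = sum(c)
        t += rng.expovariate(total)        # waiting time, mean 1/total
        u, acc, j = rng.random() * total, 0.0, 0
        for j, cj in enumerate(c):         # flip spin j with prob c_j/total
            acc += cj
            if acc >= u:
                break
        x[j] = -x[j]
        S += 2 * x[j]
    return t

rng = random.Random(6)
T = decision_time(50, 1.3, 0.5, rng)
```

Collecting T over many runs and normalizing as in Eq. (1) gives the histograms of Figure 8; swapping in u_i or v_i for o_i is a one-line change.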
In response to this question, the authors consider the following choices for E ≃ c_i(x) (with β = 1.3):

    c_i(x) = o_i(x) = e^{−βx_i M(x)} (the case studied in [13]),
    c_i(x) = u_i(x) = e^{−βx_i (M(x) − M³(x)/3)}, or
    c_i(x) = v_i(x) = e^{−βx_i (M(x) + M³(x))}.

The resulting histograms for the fluctuations τ_{ǫ,N,A,E} of T are given in Figure 8. Once again, two-component universality is evident. Thus the universality in the decision process models mirrors the universality observed among the 45 participants in the experiment of Bakhtin and Correll. (In [13] the authors do not display the histogram for the fluctuations directly, but such information is easily inferred from their figures; see Figure 6 in [13].)

Figure 8: The observation of two-component universality for τ_{ǫ,N,A,E} when A = Curie–Weiss, E ≃ o_i, u_i, v_i, at a fixed value of ǫ and β = 1.3. The left figure displays three histograms, one for each choice of E, when N = 50. The right figure displays the same information for N = 200. All histograms are produced with 16,000 samples. The histogram for E = o_i corresponds to the case studied in [13, 14]. It is clear from these computations that the fluctuations collapse onto the universal curve for E = o_i. Thus, reasonable changes in the spin flip intensity do not appear to change the limiting histogram. This indicates why the specific choice made in [13] of E = o_i is perhaps enough to capture the behavior of many individuals.

Two distinct themes are combined in this work: (1) the notion of universality in random matrix theory and statistical physics; (2) the use of random ensembles in scientific computing. The origin of both these ideas dates to the 1950s in the work of (1) Wigner [7, 15], and (2) von Neumann and Goldstine [16]. There has been considerable progress in the rigorous understanding of universality in random matrix theory (see, e.g., [17, 18] and the references therein). In contrast, the performance of numerical algorithms on random ensembles is less understood, though results in this area include probabilistic bounds for condition numbers and halting times for numerical algorithms [19, 20, 21].

The work presented here reveals empirical evidence for two-component universality in several numerical algorithms. The results of [2] and Sections 1.1-1.5 reveal universal fluctuations of halting times for iterative algorithms in numerical linear algebra on random matrix ensembles with both dependent and independent entries. In each instance, the process of numerical computation on a random matrix may be viewed as the evolution of a random ensemble by a deterministic dynamical system. In a similar light, the algorithms of Sections 1.6 and 1.7 may be seen as stochastic dynamical systems, with that in Section 1.7 having a close connection with neural computation. In all these examples, the empirical observations presented here suggest new universal phenomena in non-equilibrium statistical mechanics. The results of Sections 1.4 and 1.5 reveal that numerical computations with a structured ensemble with some random components may have the same statistics (modulo two components) as an unstructured ensemble. This brings to mind the situation in the 1950s when Wigner introduced random matrices as a model for scattering resonances of neutrons off heavy nuclei: the neutron-nucleus system has a well-defined and structured Hamiltonian, but nevertheless the resonances for neutron scattering are well-described statistically by the eigenvalues of an (unstructured) random matrix.

Materials

All algorithms discussed here are implemented in Mathematica. A package is available for download [1] that contains all relevant data and the code to generate this data. The package supports parallel evaluation for most algorithms and runs easily on personal computers.
A.1 Gaussian Ensembles

The Gaussian Orthogonal Ensemble (GOE) is given by (X + X^T)/√n, where X is an n × n matrix of standard iid Gaussian variables. The Gaussian Unitary Ensemble (GUE) is given by (X + X*)/√n, where X is an n × n matrix of standard iid complex Gaussian variables.

A.2 Bernoulli Ensemble

The Bernoulli Ensemble (BE) is given by an n × n matrix X consisting of iid random variables that take the values ±1/√n with equal probability, subject only to the constraint X^T = X.

A.3 Positive Definite Ensembles

The critically-scaled Laguerre Orthogonal Ensemble (cLOE) is given by W = XX^T/m, where X is an n × m matrix with standard iid Gaussian entries. The critically-scaled positive definite Bernoulli ensemble (cPBE) is given by W = XX^T/m, where X is an n × m matrix consisting of iid Bernoulli variables taking the values ±1 with equal probability. In both cases the critical scaling is m = n + 2⌊√n⌋.

A.4 Shifted Ensembles

The critically-scaled shifted Bernoulli Ensemble (cSBE) is given by I + X/√n, where X is an n × n matrix consisting of iid Bernoulli variables taking the values ±1. The critically-scaled shifted Gaussian Ensemble (cSGE) is given by I + X/√n, where X is an n × n matrix of standard iid Gaussian variables. With this scaling, P( | ‖X/√n‖ − 2 | > ǫ ) → 0 as n → ∞ [22].

A.5 Unitary Ensembles

The Quartic Unitary Ensemble (QUE) is a complex, unitary-invariant ensemble with probability distribution proportional to e^{−n tr M⁴} dM. The Cosh Unitary Ensemble (COSH) has its distribution proportional to e^{−tr cosh M} dM.

A.6 Dirichlet Ensembles

We consider the numerical solution of the equation ∆u = 0 in Ω and u = f on ∂Ω. Here we let Ω be the star-shaped region interior to the curve (x, y) = (r(θ) cos(θ), r(θ) sin(θ)), where r(θ) for 0 ≤ θ < 2π is given by r(θ) = 1 + Σ_{j=1}^{m} ( X_j cos(jθ) + Y_j sin(jθ) ), and X_j and Y_j are iid random variables on [−1/(2m), 1/(2m)].
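The ensembles of Appendices A.1-A.4, and the random boundary of A.6, are straightforward to sample. A short numpy sketch (our code, not the package [1]):

```python
import numpy as np

def goe(n, rng):
    """A.1: GOE, (X + X^T)/sqrt(n) with X iid standard Gaussian."""
    X = rng.standard_normal((n, n))
    return (X + X.T) / np.sqrt(n)

def be(n, rng):
    """A.2: symmetric Bernoulli ensemble, entries +/- 1/sqrt(n)."""
    X = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)
    return np.triu(X) + np.triu(X, 1).T

def cloe(n, rng):
    """A.3: cLOE, W = X X^T / m at the critical scaling m = n + 2*floor(sqrt(n))."""
    m = n + 2 * int(np.sqrt(n))
    X = rng.standard_normal((n, m))
    return X @ X.T / m

def csge(n, rng):
    """A.4: cSGE, I + X/sqrt(n) with X iid standard Gaussian (non-symmetric)."""
    return np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)

def ude_radius(m, rng):
    """A.6 (UDE): r(theta) = 1 + sum_j X_j cos(j theta) + Y_j sin(j theta),
    X_j, Y_j iid uniform on [-1/(2m), 1/(2m)]."""
    X = rng.uniform(-1 / (2 * m), 1 / (2 * m), m)
    Y = rng.uniform(-1 / (2 * m), 1 / (2 * m), m)
    j = np.arange(1, m + 1)
    return lambda th: 1 + np.sum(X * np.cos(j * th) + Y * np.sin(j * th))

rng = np.random.default_rng(7)
W = cloe(400, rng)
lam = np.linalg.eigvalsh(W)      # positive, with eigenvalues close to 0
norm = np.linalg.norm(csge(400, rng) - np.eye(400), 2)   # ~ 2 for large n [22]
r = ude_radius(8, rng)
# the perturbation is bounded by 2m * 1/(2m) = 1, so r stays in (0, 2)
radii = np.array([r(t) for t in np.linspace(0, 2 * np.pi, 400)])
```

The small eigenvalues of the critically scaled W are exactly what makes the CG halting time of Section 1.3 fluctuate, and the bound on r(θ) guarantees that the sampled region Ω is star-shaped.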
The boundary integral equation

    πu(P) − ∫_{∂Ω} u(Q) (∂/∂n_Q) log|P − Q| dS_Q = −f(P),   P ∈ ∂Ω,

is solved by discretizing in θ with n points and applying the trapezoidal rule, with n = 2m (see [23]). For the Bernoulli Dirichlet Ensemble (BDE), X_j and Y_j are Bernoulli variables taking values ±1/(2m) with equal probability. For the Uniform Dirichlet Ensemble (UDE), X_j and Y_j are uniform variables on [−1/(2m), 1/(2m)].

Acknowledgments

We acknowledge the partial support of the National Science Foundation through grants NSF-DMS-130318 (TT), NSF-DMS-1300965 (PD) and NSF-DMS-07-48482 (GM), and the Australian Research Council through the Discovery Early Career Research Award (SO). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding sources.

References

[1] Trogdon T (2014) Numerical Universality: https://bitbucket.org/trogdon/numericaluniversality.

[2] Pfrang C, Deift P, Menon G (2012) How long does it take to compute the eigenvalues of a random symmetric matrix? arXiv Prepr. arXiv:1203.4635.

[3] Parlett BN (1998) The Symmetric Eigenvalue Problem (SIAM, Philadelphia, PA), p 398.

[4] Golub GH, Van Loan CF (2013) Matrix Computations (JHU Press, Baltimore, MD), p 756.

[5] Deift P, Nanda T, Tomei C (1983) Ordinary differential equations and the symmetric eigenvalue problem. SIAM J. Numer. Anal.

[6] SIAM J. Matrix Anal. Appl.

[7] Mehta ML, Random Matrices (Academic Press, New York, NY), p 688.

[8] Olver S, Rao NR, Trogdon T (2014) Sampling unitary invariant ensembles. arXiv Prepr. arXiv:1404.0071.

[9] Li XH, Menon G (2013) Numerical solution of Dyson Brownian motion and a sampling scheme for invariant matrix ensembles. J. Stat. Phys.

[10] Saad Y, Iterative Methods for Sparse Linear Systems (SIAM, Philadelphia, PA), p 528.

[11] Saad Y, Schultz MH (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput.
[12] Saff EB, Totik V, Logarithmic Potentials with External Fields (Springer, New York, NY), p 509.

[13] Bakhtin Y, Correll J (2012) A neural computation model for decision-making times. J. Math. Psychol.

[14] Ann. Henri Poincaré.

[15] Wigner EP. Math. Proc. Cambridge Philos. Soc.

[16] von Neumann J, Goldstine HH. Proc. AMS.

[17] Deift P, Gioev D, Random Matrix Theory: Invariant Ensembles and Universality, Courant Lecture Notes in Mathematics (Amer. Math. Soc., Providence, RI), Vol. 18, p 217.

[18] Erdős L, Yau H (2012) Universality of local spectral statistics of random matrices. Bull. Am. Math. Soc.

[19] Math. Comput.

[20] SIAM J. Matrix Anal. Appl.

[21] Bull. Am. Math. Soc.

[22] Ann. Probab.