Universality in Numerical Computations with Random Data. Case Studies
Percy Deift∗, Govind Menon†, Sheehan Olver‡ and Thomas Trogdon∗

June 28, 2018
Abstract
The authors present evidence for universality in numerical computations with random data. Given a (possibly stochastic) numerical algorithm with random input data, the time (or number of iterations) to convergence (within a given tolerance) is a random variable, called the halting time. Two-component universality is observed for the fluctuations of the halting time, i.e., the histogram for the halting times, centered by the sample average and scaled by the sample variance (see Eqs. 1 and 2 below), collapses to a universal curve, independent of the input data distribution, as the dimension increases. Thus, up to two components, the sample average and the sample variance, the statistics for the halting time are universally prescribed. The case studies include six standard numerical algorithms, as well as a model of neural computation and decision making. A link to relevant software is provided in [1] for the reader who would like to do computations of their own.
In earlier work [2], two of the authors (P.D. and G.M., together with C. Pfrang) considered the problem of computing the eigenvalues of a real, n × n random symmetric matrix M = (M_ij). They considered matrices chosen from different ensembles E using a variety of different algorithms A. Let S_n denote the space of real, n × n symmetric matrices. Standard eigenvalue algorithms involve iterations of isospectral maps ϕ = ϕ_A : S_n → S_n, spec(ϕ_A(M)) = spec(M) for M ∈ S_n. If M ∈ S_n is given, one considers the sequence of matrices M_{k+1} = ϕ(M_k), k ≥ 0, with M_0 = M. Clearly, spec(M_{k+1}) = spec(M_k) = ··· = spec(M), and under appropriate conditions M_k = ϕ_A^(k)(M) converges to a diagonal matrix, Λ = diag(λ_1, ..., λ_n). Necessarily, the λ_i's are the desired eigenvalues of M.

(∗ Courant Institute, New York, NY, USA; † Brown University, Providence, RI, USA; ‡ The University of Sydney, Sydney, NSW, Australia)

In [2], the authors discovered the following phenomenon: For a given accuracy ǫ, a given matrix size n (ǫ small, n large, in an appropriate scaling range) and a given algorithm A, the fluctuations in the time to compute the eigenvalues to accuracy ǫ with the given algorithm A were universal, independent of E. More precisely, they considered fluctuations in the deflation time T. (The notion of deflation time is generalized to the notion of halting time in subsequent calculations.) Recall that if an n × n matrix has block form

    M = [ M_11  M_12 ; M_21  M_22 ],

where M_11 is k × k and M_22 is (n − k) × (n − k) for some 1 ≤ k ≤ n − 1, then M̂ = diag(M_11, M_22) is obtained from M by deflation. If ‖M_12‖ = ‖M_21‖ ≤ ǫ, then the eigenvalues {λ_i} of M differ from the eigenvalues {λ̂_i} of M̂ by O(ǫ). Let T = T_{ǫ,n,A,E}(M) be the time (= number of iterations of ϕ_A) it takes to deflate a random matrix M, chosen from an ensemble E, to order ǫ, using algorithm A; i.e., T is the smallest time such that for some k, 1 ≤ k ≤ n − 1,

    ‖(ϕ_A^(T)(M))_12‖ = ‖(ϕ_A^(T)(M))_21‖ ≤ ǫ.

As explained in [2], T is a useful measure of the time required to compute the eigenvalues of M: Generically, at worst O(n) deflations are needed to compute the eigenvalues of M, and at best, O(log n). The fluctuations τ_{ǫ,n,A,E}(M) of T are defined by

    τ_{ǫ,n,A,E}(M) = ( T_{ǫ,n,A,E}(M) − ⟨T_{ǫ,n,A,E}⟩ ) / σ_{ǫ,n,A,E},   (1)

where ⟨T_{ǫ,n,A,E}⟩ is the sample average of T_{ǫ,n,A,E}(M) taken over matrices M from E, and σ_{ǫ,n,A,E} is the sample variance. For a given E, a typical sample size in [2] was of order 5,000 to 10,000 matrices M, and the output of the calculations in [2] was recorded in the form of a histogram for τ_{ǫ,n,A,E}.

Most of the calculations in [2] concerned three eigenvalue algorithms: the QR algorithm, the QR algorithm with shifts (the version of QR used in practice), and the Toda algorithm. The QR algorithm is based on the factorization of a(n invertible) matrix M as M = QR, where Q is orthogonal and R is upper-triangular with R_ii > 0. Given M ∈ S_n with M = QR, M′ = ϕ_A(M) = ϕ_QR(M) ≡ RQ. Clearly, M′ = Q^T M Q ∈ S_n and spec(M′) = spec(M). Practical implementation of the QR algorithm requires the use of a shift, i.e., the QR algorithm with shifts [3]. As shown in [2], shifting does not affect universality. The Toda algorithm involves the solution M(t) of the Toda equation

    dM/dt = [B(M), M] = B(M)M − M B(M),    B(M) = M_+ − (M_+)^T,

where M_+ is the upper triangular part of M and M(t = 0) = M. For all t > 0, spec(M(t)) = spec(M), and as t → ∞ we again have M(t) → Λ = diag(λ_1, ..., λ_n), where {λ_i} are the eigenvalues of M. For the convenience of the reader, in Figure 1 we reproduce, in particular, histograms for τ_{ǫ,n,A,E} from [2] for the QR algorithm (A = QR) with two different ensembles and varying values of n and ǫ.

From Figure 1, we see that eigenvalue computation with the QR algorithm exhibits two-component universality, i.e., the fluctuations τ_{ǫ,n,A,E} obey a universal law for all ensembles E under consideration. The same is true for all three algorithms considered in [2]; the laws are different, however, for different algorithms A.

Figure 1: The observation of two-component universality for τ_{ǫ,n,A,E} when A = QR (frequency versus normalized deflation time). This figure is taken from [2]. Overlayed histograms demonstrate the collapse of the histogram of τ_{ǫ,n,A,E} to a single curve. See the Appendix for the definitions of our choices for E. In the top-left figure, E = GOE, and 40 histograms for τ_{ǫ,n,A,E} are plotted one on top of the other for a range of values of ǫ = 10^{−k} and n. The top-right figure gives the corresponding histograms for E = BE. In the lower figure, all 40 + 40 histograms are overlayed and universality is evident: the data appears to follow a universal law for the fluctuations.

In the current paper, the work in [2] has been extended in various ways as follows. All matrix ensembles are described in the Appendix.

In the first set of computations, the authors consider the eigenvalue problem for random matrices M ∈ S_n using the Jacobi algorithm (see, e.g., [4]): for M ∈ S_n, choose i < j such that |M_ij| ≥ max_{i′<j′} |M_{i′j′}|, and set ϕ_A(M) = G^T M G, where G is the Givens rotation in the (i, j) plane chosen so that (G^T M G)_{ij} = 0. Histograms of the halting time fluctuations τ_{ǫ,n,A,E} for the Jacobi algorithm are given in Figure 2.

Figure 2: The observation of two-component universality for τ_{ǫ,n,A,E} when A = Jacobi. All histograms are produced with 16,000 samples. We see two-component universality emerge for n sufficiently large: the histograms follow a universal (independent of E) law.

In a second set of computations, the authors consider the QR algorithm acting on complex Hermitian matrices drawn from unitary invariant ensembles; apart from GUE, such ensembles have dependent entries, and special techniques are needed to sample the matrices.
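Returning to the deflation-time experiment above: it is easy to reproduce in outline. The following sketch (our illustrative Python/numpy reimplementation, not the authors' Mathematica package [1], and with far smaller sample and matrix sizes than in [2]) iterates the unshifted QR map on GOE matrices, records the deflation times T, and normalizes them as in Eq. (1):

```python
import numpy as np

def qr_deflation_time(M, eps, max_iter=5_000):
    """Iterate the unshifted QR map phi(M) = RQ until the matrix deflates,
    i.e. until some off-diagonal block M[:k, k:] has norm <= eps.
    Returns the number of iterations (the deflation time T)."""
    n = M.shape[0]
    for T in range(1, max_iter + 1):
        Q, R = np.linalg.qr(M)
        M = R @ Q                                   # spec(RQ) = spec(M)
        for k in range(1, n):
            # M stays symmetric, so ||M_12|| = ||M_21||
            if np.linalg.norm(M[:k, k:]) <= eps:
                return T
    return max_iter

def goe(n, rng):
    """GOE sample, normalized as in Appendix A.1."""
    X = rng.standard_normal((n, n))
    return (X + X.T) / np.sqrt(n)

rng = np.random.default_rng(0)
times = np.array([qr_deflation_time(goe(10, rng), 1e-6) for _ in range(100)])
tau = (times - times.mean()) / times.std()          # fluctuations, Eq. (1)
```

Histogramming tau for several ensembles E and overlaying the results is, in outline, the experiment reported in Figure 1.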
A novel technique for sampling such unitary ensembles was introduced recently [8] by two of the authors, S.O. and T.T., together with N. R. Rao, taking advantage of the representation of the eigenvalues of M as a determinantal point process whose kernel is given in terms of orthogonal polynomials (see also [9]). Using this sampling technique, the authors of the present paper have considered the QR algorithm for various unitary ensembles. (Here M = QR, where Q is unitary and again R is upper triangular with R_ii > 0.) Histograms for the halting (= deflation) time fluctuations τ_{ǫ,n,A,E}, A = QR, are given in Figure 3, and again two-component universality is evident.

Figure 3: The observation of two-component universality for τ_{ǫ,n,A,E} when A = QR, E = QUE, COSH, GUE, at a fixed small value of ǫ. Here we are using deflation time (= halting time), as in [2]. The left figure displays three histograms, one each for GUE, COSH and QUE, when n = 70. The right figure displays the same information for n = 150. All histograms are produced with 16,000 samples. Two-component universality emerges for n sufficiently large: the histograms follow a universal (independent of E) law. This is surprising because COSH and QUE have eigenvalue distributions that differ significantly from GUE, in that they do not follow the so-called semi-circle law. These histograms also appear to collapse to the same curve as in Figure 1. This is a further surprise, given the well-known fact that Orthogonal and Unitary Ensembles give rise to different (eigenvalue) universality classes.

In a third set of computations in this paper, the authors start to address the question of whether two-component universality is just a feature of eigenvalue computation, or is present more generally in numerical computation. In particular, the authors consider the solution of the linear system of equations W x = b, where W is real and positive definite, using the conjugate gradient (CG) method. The method is iterative (see, e.g., [10] and also Remark 1 below), and at iteration k of the algorithm an approximate solution x_k of W x = b is found and the residual r_k = W x_k − b is computed. (The notation ‖·‖ is used to denote the standard ℓ² norm on n-dimensional Euclidean space.) For any given ǫ > 0, the method is halted when ‖r_k‖ < ǫ, and the halting time k_ǫ(W, b) is recorded. The authors consider n × n matrices W chosen from two different positive definite ensembles E (see Appendix A.3) and vectors b = (b_j) chosen independently with iid entries {b_j}.

Figure 4: The observation of two-component universality for τ_{ǫ,n,A,E} when A = CG and E = cLOE, cPBE, at a fixed small value of ǫ. The left figure displays two histograms, one for cLOE and one for cPBE, when n = 100. The right figure displays the same information for n = 500. All histograms are produced with 16,000 samples. Two-component universality is evident for n sufficiently large: the histograms follow a universal (independent of E) law. The critical scaling (see Appendix A.3) has a significant impact on the distribution of the condition number and forces ⟨k_{ǫ,n,A,E}⟩ ≈ n^α, α < 1. If the scaling m = 2n is chosen in the ensemble E, then the CG method converges too quickly and the halting time tends to take only 10-15 different values for each value of m; no interesting limiting statistics are present. Conversely, if m = n, the CG method converges slowly (⟨k_{ǫ,m,A,E}⟩ ≫ m) and rounding errors dominate the computation. Experiments do not indicate two-component universality if m = 2n or m = n. The scaling m = n + 2⌊√n⌋ identifies a critical scaling region. Within this scaling region, we see two-component universality emerge for n sufficiently large: the histograms follow a universal (independent of E) law.
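The CG halting-time experiment can be sketched as follows (a minimal numpy sketch, our code rather than the authors' package [1]; the ensemble is the cLOE of Appendix A.3 at the critical scaling, and the tolerance here is our choice):

```python
import numpy as np

def cg_halting_time(W, b, eps):
    """Run conjugate gradient on W x = b from x0 = 0 and return the first
    iteration k with ||r_k|| < eps (the halting time k_eps)."""
    n = len(b)
    x = np.zeros(n)
    r = b.copy()                    # r_0 = b - W x_0 = b
    p = r.copy()
    rs = r @ r
    for k in range(1, 10 * n):
        Wp = W @ p
        alpha = rs / (p @ Wp)
        x += alpha * p
        r -= alpha * Wp
        rs_new = r @ r
        if np.sqrt(rs_new) < eps:
            return k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return 10 * n                   # did not converge within the cap

# cLOE with the critical scaling m = n + 2*floor(sqrt(n)) (Appendix A.3)
rng = np.random.default_rng(1)
n = 100
m = n + 2 * int(np.sqrt(n))
X = rng.standard_normal((n, m))
W = X @ X.T / m
b = rng.uniform(-1, 1, n)
k_eps = cg_halting_time(W, b, 1e-8)
```

Repeating this over many samples (W, b) and normalizing the recorded k_eps as in Eq. (1) produces the histograms of Figure 4.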
Given ǫ (small) and n (large), and (W, b) ∈ E, the authors record the halting time k_{ǫ,n,A,E}, A = CG, and compute the fluctuations

    τ_{ǫ,n,A,E}(W, b) = ( k_{ǫ,n,A,E}(W, b) − ⟨k_{ǫ,n,A,E}⟩ ) / σ_{ǫ,n,A,E},   (2)

defined by analogy with (1). The histograms for τ_{ǫ,n,A,E} are given in Figure 4, and again two-component universality is evident.

In a fourth set of computations, the authors again consider the solution of W x = b, but here W has the form I + X, where X ≡ X_n is a random, real non-symmetric matrix, and b = (b_j) is independent with uniform iid entries {b_j}. As W = I + X is (almost surely) no longer positive definite, the conjugate gradient algorithm breaks down, and the authors solve (I + X)x = b using the Generalized Minimal Residual (GMRES) algorithm [11]. Again, the algorithm is iterative, and at iteration k of the algorithm an approximate solution x_k of (I + X)x = b is found and the residual r_k = (I + X)x_k − b is computed.

Figure 5: The observation of two-component universality for τ_{ǫ,n,A,E} when A = GMRES, E = cSGE, cSBE, at a fixed small value of ǫ. The left figure displays two histograms, one for cSGE and one for cSBE, when n = 100. The right figure displays the same information for n = 500. All histograms are produced with 16,000 samples. The critically scaled ensembles cSBE and cSGE are of the form I + X_n with ‖X_n‖ ≈ 2. If the matrix is too close to the identity, the halting time takes almost constant values, e.g., k_{ǫ,n,A,E} = 8, independent of n. If the matrix is too far from the identity, the fact that it is unstructured makes GMRES perform poorly, and the algorithm typically completes in n steps, the maximum possible number of iterations (see Remark 1 below). With the proper scaling of X, we see two-component universality emerge for n sufficiently large: the histograms follow a universal (independent of E) law.
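The GMRES iteration with a residual-based halting rule can likewise be sketched. The following is a plain, unrestarted Arnoldi/least-squares implementation (our illustrative code, not the authors'; a single small cSGE sample, with a tolerance of our choosing):

```python
import numpy as np

def gmres_halting_time(A, b, eps):
    """Unrestarted GMRES on A x = b from x0 = 0; returns the first
    iteration k at which the residual norm drops below eps."""
    n = len(b)
    beta = np.linalg.norm(b)
    Q = np.zeros((n, n + 1))        # Krylov basis
    H = np.zeros((n + 1, n))        # Hessenberg matrix
    Q[:, 0] = b / beta
    for k in range(1, n + 1):
        v = A @ Q[:, k - 1]         # Arnoldi step: extend the basis
        for j in range(k):
            H[j, k - 1] = Q[:, j] @ v
            v -= H[j, k - 1] * Q[:, j]
        H[k, k - 1] = np.linalg.norm(v)
        if H[k, k - 1] > 1e-14:
            Q[:, k] = v / H[k, k - 1]
        # residual of the small least-squares problem min ||beta e1 - H y||
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)[0]
        r = np.linalg.norm(H[:k + 1, :k] @ y - e1)
        if r < eps:
            return k
    return n

rng = np.random.default_rng(4)
n = 100
A = np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # cSGE sample
b = rng.uniform(-1, 1, n)
k_halt = gmres_halting_time(A, b, 1e-8)
```

In exact arithmetic this iteration terminates in at most n steps, which is the dichotomy discussed in Remark 1 below.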
As before, for any given ǫ > 0, the method is halted when ‖r_k‖ < ǫ, and k_{ǫ,n,A,E}(X, b) is recorded. As in the conjugate gradient problem (Section 1.3), the authors compute the histograms for the fluctuations of the halting time τ_{ǫ,n,A,E} (2) for two ensembles E, where now A = GMRES. The results are given in Figure 5, where again two-component universality is evident.

Remark 1. The computations in Sections 1.3 and 1.4 are particularly revealing for the following reason. Both the CG and GMRES algorithms proceed by generating approximations x_k to the solution in progressively larger subspaces V_k of R^n, x_k ∈ V_k, dim V_k = k (almost surely). These algorithms terminate in at most n steps, in the absence of rounding errors. If the matrix W in the case of CG, or I + X in the case of GMRES, is too close to the identity, then the algorithm will converge in O(1) steps, essentially independent of n. On the other hand, if W or I + X is too far from the identity, the algorithm will converge only after n steps (GMRES) or be dominated by rounding errors (CG). Thus in both cases there are no meaningful statistics. What the calculations in Sections 1.3 and 1.4 reveal is that if the ensembles for CG and GMRES are such that the matrices W and I + X, respectively, are typically not too close to, and not too far from, the identity, then the algorithms exhibit significant statistical fluctuations, and two-component universality is immediately evident (for more discussion see the captions for Figures 4 and 5). Analogous considerations apply in Section 1.5 below.

In a fifth set of computations, the authors raise the issue of whether two-component universality is just a feature of finite-dimensional computation, or is also present in problems which are intrinsically infinite-dimensional. In particular, is the universality present in numerical computations for PDEs?
As a case study, the authors consider the numerical solution of the Dirichlet problem ∆u = 0 in a star-shaped region Ω ⊂ R², with u = f on ∂Ω. The boundary is described by a periodic function of the angle θ, r = r(θ), and similarly f = f(θ), 0 ≤ θ ≤ 2π. Two ensembles, BDE and UDE (as described in Appendix A.6), are derived from a discretization of the problem with specific choices for r, defined by a random Fourier series. The boundary condition f is chosen randomly by letting {f(2πj/n)}_{j=0}^{n−1} be iid uniform on [−1, 1]. The histograms for τ_{ǫ,n,A,E} from these computations are given in Figure 6, and again two-component universality is evident. What is surprising, and quite remarkable, about these computations is that the histograms for τ_{ǫ,n,A,E} in this case are the same as the histograms for τ_{ǫ,n,A,E} in Figure 5 (see Figure 6 for the overlayed histograms). In other words, UDE and BDE are structured with random components, whereas cSGE and cSBE have no structure, yet they produce the same statistics (modulo two components).

Figure 6: The observation of two-component universality for τ_{ǫ,n,A,E} when A = GMRES, E = UDE, BDE, at a fixed small value of ǫ. The left figure displays two histograms, one for UDE and one for BDE, when n = 100. The right figure displays the same information for n = 500. The bottom figure consists of four histograms, two taken from Figure 5 (E = cSGE, cSBE) and two taken from the right figure above (E = UDE, BDE). All histograms are produced with 16,000 samples. It is interesting to note two properties. First, as we observe from our computations, BDE and UDE are of the form I + X_n, where X_n has a norm that grows proportionally to some fractional power of n. While this type of growth in the case of Section 1.4 (Figure 5) would cause GMRES to take its maximum possible number of iterations, that is, k = n (aside from round-off errors; see the comments below Figure 4), nevertheless, in the context of Section 1.5, non-trivial statistics emerge. In light of Remark 1, we conjecture that structure is necessary for GMRES to perform well when the perturbation of the identity has an unbounded spectral radius in the large n limit. The second, and most important, feature is that two-component universality for matrices of the form I + X_n persists as the computations are moved from structured randomness (UDE and BDE) to unstructured randomness (cSBE and cSGE): the histograms follow a universal (independent of E) law.

In all the computations discussed so far, the randomness in the computations resides in the initial data. In the sixth set of computations, the authors consider an algorithm which is intrinsically stochastic. They consider a genetic algorithm to compute Fekete points (see [12, p. 142]). Such points P* = (P*_1, P*_2, ..., P*_N) ∈ R^N are the global minimizers of the objective function

    H(P) = (2 / (N(N − 1))) Σ_{1≤i<j≤N} log |P_i − P_j|^{−1} + (1/N) Σ_{i=1}^{N} V(P_i)

for real-valued functions V = V(x) which grow sufficiently rapidly as |x| → ∞. It is well known (see, e.g., [12]) that as N → ∞, the counting measures δ_{P*} = (1/N) Σ_{i=1}^{N} δ_{P*_i} converge to the so-called equilibrium measure µ_V, which plays a key role in the asymptotic theory of the orthogonal polynomials generated by the measure e^{−NV(x)} dx on R. Genetic algorithms involve two basic components, "mutation" and "crossover". The authors implement the genetic algorithm in the following way.

The Algorithm. Fix a distribution D on R. Draw an initial population P = P_0 = {P_i}_{i=1}^{n} consisting of n = 100 vectors in R^N, N large, with elements that are iid uniform on a symmetric interval about the origin. A map F_D(P) : (R^N)^n → (R^N)^n is defined by one of the following two procedures:

• Mutation: Pick one individual P ∈ P at random (uniformly). Then pick two integers n_1, n_2 from {1, 2, ..., N} at random (uniformly and independently).
Three new individuals are created:

– P̃_1: draw n_1 iid numbers {x_1, ..., x_{n_1}} from D and perturb the first n_1 elements of P: (P̃_1)_i = (P)_i + x_i, i = 1, ..., n_1, and (P̃_1)_i = (P)_i for i > n_1.

– P̃_2: draw N − n_1 iid numbers {y_{n_1+1}, ..., y_N} from D and perturb the last N − n_1 elements of P: (P̃_2)_i = (P)_i + y_i, i = n_1 + 1, ..., N, and (P̃_2)_i = (P)_i for i ≤ n_1.

– P̃_3: draw |n_1 − n_2| iid numbers {z_1, ..., z_{|n_1−n_2|}} from D and perturb elements n_* = 1 + min(n_1, n_2) through n^* = max(n_1, n_2): (P̃_3)_i = (P)_i + z_{i−n_*+1}, i = n_*, ..., n^*, and (P̃_3)_i = (P)_i for i ∉ {n_*, ..., n^*}.

• Crossover: Pick two individuals P, Q from P at random (independently and uniformly). Then pick two numbers n_1, n_2 from {1, 2, ..., N} (independently and uniformly). Two new individuals are created:

– P̃_1: replace the n_1-th element of P with the n_1-th element of Q and perturb it (additively) with a sample of D.

– P̃_2: replace the n_2-th element of Q with the n_2-th element of P and perturb it (additively) with a sample of D.

At each step, the application of either crossover or mutation is chosen with equal probability. The new individuals are appended to P, and P ↦ P′ = F_D(P) ∈ (R^N)^n is constructed by choosing the 100 P_i's in P̃ which yield the smallest values of H(P). The algorithm produces a sequence of populations P_0, P_1, ..., P_k, ... in (R^N)^n, P_{k+1} = F_D(P_k), n = 100, and halts, with halting time recorded, for a given ǫ, when

    min_{P ∈ P_k} H(P) − inf_{P ∈ R^N} H(P) < ǫ.

The histograms for the fluctuations τ_{ǫ,N,A,E}, with A = Genetic, are given in Figure 7, for two choices of V, the convex choice V(x) = x² and a second, nonquadratic choice, and different choices of E ≃ D. For V(x) = x², inf_{P∈R^N} H(P) is known explicitly, and for the second choice of V, inf_{P∈R^N} H(P) is approximated by a long run of the genetic algorithm. As before, two-component universality is evident.
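One generation of the genetic algorithm can be sketched as follows (a simplified Python reading of the mutation and crossover moves above: V(x) = x², a reduced population size, and our own segment and initialization conventions, which are assumptions rather than the authors' exact choices):

```python
import numpy as np

def H(P):
    """Objective of Section 1.6 with V(x) = x^2: pairwise logarithmic
    repulsion plus external field, averaged over pairs and points."""
    N = len(P)
    d = np.abs(P[:, None] - P[None, :])
    np.fill_diagonal(d, 1.0)                      # log 1 = 0 on the diagonal
    return -np.log(d).sum() / (N * (N - 1)) + np.mean(P ** 2)

def generation(pop, scale, rng):
    """One step: apply mutation (3 offspring) or crossover (2 offspring)
    with equal probability, append the offspring, keep the fittest."""
    n, N = pop.shape
    offspring = []
    if rng.random() < 0.5:                        # mutation
        P = pop[rng.integers(n)]
        n1, n2 = rng.integers(1, N + 1, size=2)
        lo, hi = min(n1, n2), max(n1, n2)
        for a, b in ((0, n1), (n1, N), (lo, hi)): # three perturbed copies
            Q = P.copy()
            Q[a:b] += rng.uniform(-scale, scale, b - a)
            offspring.append(Q)
    else:                                         # crossover
        P, Q = pop[rng.integers(n)], pop[rng.integers(n)]
        i1, i2 = rng.integers(N, size=2)
        P2, Q2 = P.copy(), Q.copy()
        P2[i1] = Q[i1] + rng.uniform(-scale, scale)
        Q2[i2] = P[i2] + rng.uniform(-scale, scale)
        offspring += [P2, Q2]
    allpop = np.vstack([pop, np.array(offspring)])
    fitness = np.array([H(p) for p in allpop])
    return allpop[np.argsort(fitness)[:n]]        # selection

rng = np.random.default_rng(5)
N, n = 8, 20
pop = rng.uniform(-1, 1, (n, N))
best = [min(H(p) for p in pop)]
for _ in range(200):
    pop = generation(pop, 1.0 / (10 * N), rng)    # D uniform on [-1/(10N), 1/(10N)]
    best.append(min(H(p) for p in pop))
```

Because the parents are retained in the selection step, the minimum of H over the population is non-increasing from one generation to the next, so the halting criterion above is eventually met.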
(After mutation we have P̃ = P ∪ {P̃_1, P̃_2, P̃_3}, and after crossover, P̃ = P ∪ {P̃_1, P̃_2}.)

Figure 7: The observation of two-component universality for τ_{ǫ,N,A,E} when A = Genetic, at a fixed small value of ǫ, and E ≃ D, where D is chosen to be either uniform on [−1/(10N), 1/(10N)] or taking values ±1/(10N) with equal probability. The top row is created with the choice V(x) = x² and the bottom row with the second choice of V. Each of the plots in the left column displays two histograms, one for each choice of D, when N = 10. The right column displays the same information for N = 40. All histograms are produced with 16,000 samples. It is evident that the histograms collapse onto a universal curve, one for each V.

1.7 Curie–Weiss Model

In the seventh and final set of computations, the authors pick up on a common notion in neuroscience that the human brain is a computer with software and hardware. If this is indeed so, then one may speculate that two-component universality should certainly be present in some cognitive actions. Indeed, such a phenomenon is in evidence in the recent experiments of Bakhtin and Correll [13]. In [13], data from experiments with 45 human participants was analyzed. The participants are shown 200 pairs of images. The images in each pair consist of nine black disks of variable size. The disks in the images within each pair have approximately the same area, so that there is no a priori bias. The participants are then asked to decide which of the two images covers a larger (black) area, and the time T required to make a decision is recorded. For each participant, the decision times for the 200 pairs are collected and the fluctuation histogram is tabulated. The experimental results are in good agreement with a dynamical Curie–Weiss model frequently used in describing decision processes [14].
As each of the 45 participants operates, presumably, in his or her own stochastic neural environment, this is a remarkable demonstration of two-component universality in cognitive action.

At its essence, the Curie–Weiss model is Glauber dynamics on the hypercube {−1, 1}^N with a microscopic approximation of a drift-diffusion process. Consider N variables {X_i(t)}_{i=1}^{N}, X_i(t) ∈ {−1, 1}. The state of the system at time t is X(t) = (X_1(t), X_2(t), ..., X_N(t)). The transition probabilities are given through the expressions

    P( X_i(t + ∆t) = −X_i(t) | X(t) = x ) = c_i(x)∆t + o(∆t),

where c_i(x) is the spin flip intensity. The observable considered is M(X(t)) = (1/N) Σ_{i=1}^{N} X_i(t) ∈ [−1, 1]. The process is started with M(X(0)) = 0, a state with no a priori bias, as in the case of the experimental setup. The halting (or decision) time for this model is

    T = inf{ t : |M(X(t))| ≥ ǫ },

the time at which the system makes a decision. Here ǫ ∈ (0, 1) may not be small.

This model is simulated by first sampling an exponential random variable with mean λ(t) = (Σ_i c_i(X(t)))^{−1} to find the time ∆t at which the system changes state. Sampling the random variable Y, P(Y = i) = c_i(X(t)) λ(t), i = 1, 2, ..., N, produces an integer j, determining which spin flipped. Define X_i(t + s) ≡ X_i(t) if s ∈ [0, ∆t) for i = 1, 2, ..., N, and then X_i(t + ∆t) ≡ X_i(t) for i ≠ j, X_j(t + ∆t) ≡ −X_j(t). This procedure is repeated with t replaced by t + ∆t to evolve the system.

Central to the application of the model is the assumption on the statistics of the spin flip intensity c_i(x). If one changes the basic statistics of the c_i's, will the limiting histograms for the fluctuations of T be affected as N becomes large?
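The simulation procedure just described amounts to a Gillespie-type algorithm. A minimal Python sketch with the intensity o_i(x) = e^{−βx_i M(x)} follows (the decision threshold and the system size here are our own illustrative choices, since ǫ need not be small):

```python
import math
import random

def decision_time(N, beta, thresh, rng, max_steps=10**6):
    """Gillespie simulation of the Curie-Weiss decision model with spin
    flip intensity c_i(x) = exp(-beta * x_i * M(x)) (the choice o_i).
    Starts from a balanced state, M = 0; halts when |M| >= thresh."""
    x = [1] * (N // 2) + [-1] * (N // 2)   # M(X(0)) = 0 (N even)
    rng.shuffle(x)
    t, S = 0.0, 0                          # S = sum of spins, M = S / N
    for _ in range(max_steps):
        M = S / N
        if abs(M) >= thresh:
            return t                       # decision reached
        c = [math.exp(-beta * xi * M) for xi in x]
        total = sum(c)
        t += rng.expovariate(total)        # waiting time, mean 1/total
        u, acc, j = rng.random() * total, 0.0, 0
        for j, cj in enumerate(c):         # flip spin j with prob c_j/total
            acc += cj
            if acc >= u:
                break
        x[j] = -x[j]
        S += 2 * x[j]
    return t

rng = random.Random(6)
T = decision_time(50, 1.3, 0.5, rng)
```

Collecting T over many runs and normalizing as in Eq. (1) gives the histograms of Figure 8; swapping in u_i or v_i for o_i is a one-line change.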
In response to this question, the authors consider the following choices for E ≃ c_i(x) (with β = 1.3):

    c_i(x) = o_i(x) = e^{−βx_i M(x)} (the case studied in [13]),
    c_i(x) = u_i(x) = e^{−βx_i (M(x) − M³(x)/3)}, or
    c_i(x) = v_i(x) = e^{−βx_i (M(x) + M³(x))}.

The resulting histograms for the fluctuations τ_{ǫ,N,A,E} of T are given in Figure 8. Once again, two-component universality is evident. Thus the universality in the decision process models mirrors the universality observed among the 45 participants in the experiment of Bakhtin and Correll. (In [13] the authors do not display the histogram for the fluctuations directly, but such information is easily inferred from their figures; see Figure 6 in [13].)

Figure 8: The observation of two-component universality for τ_{ǫ,N,A,E} when A = Curie–Weiss, E ≃ o_i, u_i, v_i, at a fixed value of ǫ and β = 1.3. The left figure displays three histograms, one for each choice of E, when N = 50. The right figure displays the same information for N = 200. All histograms are produced with 16,000 samples. The histogram for E = o_i corresponds to the case studied in [13, 14]. It is clear from these computations that the fluctuations collapse onto the universal curve for E = o_i. Thus, reasonable changes in the spin flip intensity do not appear to change the limiting histogram. This indicates why the specific choice made in [13] of E = o_i is perhaps enough to capture the behavior of many individuals.

Two distinct themes are combined in this work: (1) the notion of universality in random matrix theory and statistical physics; (2) the use of random ensembles in scientific computing. The origin of both these ideas dates to the 1950s in the work of (1) Wigner [7, 15], and (2) von Neumann and Goldstine [16]. There has been considerable progress in the rigorous understanding of universality in random matrix theory (see, e.g., [17, 18] and the references therein). In contrast, the performance of numerical algorithms on random ensembles is less understood, though results in this area include probabilistic bounds for condition numbers and halting times for numerical algorithms [19, 20, 21].

The work presented here reveals empirical evidence for two-component universality in several numerical algorithms. The results of [2] and Sections 1.1-1.5 reveal universal fluctuations of halting times for iterative algorithms in numerical linear algebra on random matrix ensembles with both dependent and independent entries. In each instance, the process of numerical computation on a random matrix may be viewed as the evolution of a random ensemble by a deterministic dynamical system. In a similar light, the algorithms of Sections 1.6 and 1.7 may be seen as stochastic dynamical systems, with that in Section 1.7 having a close connection with neural computation. In all these examples, the empirical observations presented here suggest new universal phenomena in non-equilibrium statistical mechanics. The results of Sections 1.4 and 1.5 reveal that numerical computations with a structured ensemble with some random components may have the same statistics (modulo two components) as an unstructured ensemble. This brings to mind the situation in the 1950s when Wigner introduced random matrices as a model for scattering resonances of neutrons off heavy nuclei: the neutron-nucleus system has a well-defined and structured Hamiltonian, but nevertheless the resonances for neutron scattering are well-described statistically by the eigenvalues of an (unstructured) random matrix.

Materials

All algorithms discussed here are implemented in Mathematica. A package is available for download [1] that contains all relevant data and the code to generate this data. The package supports parallel evaluation for most algorithms and runs easily on personal computers.
A.1 Gaussian Ensembles

The Gaussian Orthogonal Ensemble (GOE) is given by (X + X^T)/√n, where X is an n × n matrix of standard iid Gaussian variables. The Gaussian Unitary Ensemble (GUE) is given by (X + X*)/√n, where X is an n × n matrix of standard iid complex Gaussian variables.

A.2 Bernoulli Ensemble

The Bernoulli Ensemble (BE) is given by an n × n matrix X consisting of iid random variables that take the values ±1/√n with equal probability, subject only to the constraint X^T = X.

A.3 Positive Definite Ensembles

The critically-scaled Laguerre Orthogonal Ensemble (cLOE) is given by W = XX^T/m, where X is an n × m matrix with standard iid Gaussian entries. The critically-scaled positive definite Bernoulli ensemble (cPBE) is given by W = XX^T/m, where X is an n × m matrix consisting of iid Bernoulli variables taking the values ±1 with equal probability. In both cases the critical scaling is m = n + 2⌊√n⌋.

A.4 Shifted Ensembles

The critically-scaled shifted Bernoulli Ensemble (cSBE) is given by I + X/√n, where X is an n × n matrix consisting of iid Bernoulli variables taking the values ±1. The critically-scaled shifted Gaussian Ensemble (cSGE) is given by I + X/√n, where X is an n × n matrix of standard iid Gaussian variables. With this scaling, P( | ‖X/√n‖ − 2 | > ǫ ) → 0 as n → ∞ [22].

A.5 Unitary Ensembles

The Quartic Unitary Ensemble (QUE) is a complex, unitary-invariant ensemble with probability distribution proportional to e^{−n tr M⁴} dM. The Cosh Unitary Ensemble (COSH) has its distribution proportional to e^{−tr cosh M} dM.

A.6 Dirichlet Ensembles

We consider the numerical solution of the equation ∆u = 0 in Ω and u = f on ∂Ω. Here we let Ω be the star-shaped region interior to the curve (x, y) = (r(θ) cos(θ), r(θ) sin(θ)), where r(θ) for 0 ≤ θ < 2π is given by r(θ) = 1 + Σ_{j=1}^{m} ( X_j cos(jθ) + Y_j sin(jθ) ), and X_j and Y_j are iid random variables on [−1/(2m), 1/(2m)].
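The ensembles of Appendices A.1-A.4, and the random boundary of A.6, are straightforward to sample. A short numpy sketch (our code, not the package [1]):

```python
import numpy as np

def goe(n, rng):
    """A.1: GOE, (X + X^T)/sqrt(n) with X iid standard Gaussian."""
    X = rng.standard_normal((n, n))
    return (X + X.T) / np.sqrt(n)

def be(n, rng):
    """A.2: symmetric Bernoulli ensemble, entries +/- 1/sqrt(n)."""
    X = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)
    return np.triu(X) + np.triu(X, 1).T

def cloe(n, rng):
    """A.3: cLOE, W = X X^T / m at the critical scaling m = n + 2*floor(sqrt(n))."""
    m = n + 2 * int(np.sqrt(n))
    X = rng.standard_normal((n, m))
    return X @ X.T / m

def csge(n, rng):
    """A.4: cSGE, I + X/sqrt(n) with X iid standard Gaussian (non-symmetric)."""
    return np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)

def ude_radius(m, rng):
    """A.6 (UDE): r(theta) = 1 + sum_j X_j cos(j theta) + Y_j sin(j theta),
    X_j, Y_j iid uniform on [-1/(2m), 1/(2m)]."""
    X = rng.uniform(-1 / (2 * m), 1 / (2 * m), m)
    Y = rng.uniform(-1 / (2 * m), 1 / (2 * m), m)
    j = np.arange(1, m + 1)
    return lambda th: 1 + np.sum(X * np.cos(j * th) + Y * np.sin(j * th))

rng = np.random.default_rng(7)
W = cloe(400, rng)
lam = np.linalg.eigvalsh(W)      # positive, with eigenvalues close to 0
norm = np.linalg.norm(csge(400, rng) - np.eye(400), 2)   # ~ 2 for large n [22]
r = ude_radius(8, rng)
# the perturbation is bounded by 2m * 1/(2m) = 1, so r stays in (0, 2)
radii = np.array([r(t) for t in np.linspace(0, 2 * np.pi, 400)])
```

The small eigenvalues of the critically scaled W are exactly what makes the CG halting time of Section 1.3 fluctuate, and the bound on r(θ) guarantees that the sampled region Ω is star-shaped.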
The boundary integral equation

    πu(P) − ∫_{∂Ω} u(Q) (∂/∂n_Q) log|P − Q| dS_Q = −f(P),   P ∈ ∂Ω,

is solved by discretizing in θ with n points and applying the trapezoidal rule, with n = 2m (see [23]). For the Bernoulli Dirichlet Ensemble (BDE), X_j and Y_j are Bernoulli variables taking values ±1/(2m) with equal probability. For the Uniform Dirichlet Ensemble (UDE), X_j and Y_j are uniform variables on [−1/(2m), 1/(2m)].

Acknowledgments

We acknowledge the partial support of the National Science Foundation through grants NSF-DMS-130318 (TT), NSF-DMS-1300965 (PD) and NSF-DMS-07-48482 (GM), and the Australian Research Council through the Discovery Early Career Research Award (SO). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding sources.

References

[1] Trogdon T (2014) Numerical Universality: https://bitbucket.org/trogdon/numericaluniversality.

[2] Pfrang C, Deift P, Menon G (2012) How long does it take to compute the eigenvalues of a random symmetric matrix? arXiv Prepr. arXiv:1203.4635.

[3] Parlett BN (1998) The Symmetric Eigenvalue Problem (SIAM, Philadelphia, PA), p 398.

[4] Golub GH, Van Loan CF (2013) Matrix Computations (JHU Press, Baltimore, MD), p 756.

[5] Deift P, Nanda T, Tomei C (1983) Ordinary differential equations and the symmetric eigenvalue problem. SIAM J. Numer. Anal.

[6] SIAM J. Matrix Anal. Appl.

[7] Mehta ML, Random Matrices (Academic Press, New York, NY), p 688.

[8] Olver S, Rao NR, Trogdon T (2014) Sampling unitary invariant ensembles. arXiv Prepr. arXiv:1404.0071.

[9] Li XH, Menon G (2013) Numerical solution of Dyson Brownian motion and a sampling scheme for invariant matrix ensembles. J. Stat. Phys.

[10] Saad Y, Iterative Methods for Sparse Linear Systems (SIAM, Philadelphia, PA), p 528.

[11] Saad Y, Schultz MH (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput.
[12] Saff EB, Totik V, Logarithmic Potentials with External Fields (Springer, New York, NY), p 509.

[13] Bakhtin Y, Correll J (2012) A neural computation model for decision-making times. J. Math. Psychol.

[14] Ann. Henri Poincaré.

[15] Wigner EP. Math. Proc. Cambridge Philos. Soc.

[16] von Neumann J, Goldstine HH. Proc. AMS.

[17] Deift P, Gioev D, Random Matrix Theory: Invariant Ensembles and Universality, Courant Lecture Notes in Mathematics (Amer. Math. Soc., Providence, RI), Vol. 18, p 217.

[18] Erdős L, Yau H (2012) Universality of local spectral statistics of random matrices. Bull. Am. Math. Soc.

[19] Math. Comput.

[20] SIAM J. Matrix Anal. Appl.

[21] Bull. Am. Math. Soc.

[22] Ann. Probab.