A computing strategy and programs to resolve the Gerstenhaber Problem for commuting triples of matrices
JOHN HOLBROOK AND KEVIN C. O’MEARA
Abstract.
We describe a MATLAB program that could produce a negative answer to the Gerstenhaber Problem by the construction of three commuting n × n matrices A, B, C over a field F such that the subalgebra F[A, B, C] they generate has dimension greater than n. This problem has remained open for nearly 60 years, following Gerstenhaber's surprising result (Annals Math.) that dim F[A, B] ≤ n for any two commuting matrices A, B. The property fails for four or more commuting matrices. We also make the MATLAB files freely available.
Date: 15 June, 2020.

Easily-stated, longstanding problems have always fascinated mathematicians (think of Fermat's Last Theorem). The Gerstenhaber Problem (GP) for commuting matrices fits into this category. But to our knowledge, our project is the first serious attempt to use a computer to provide a negative answer, despite very strong evidence that this is the case (see, for example, [10], [18]). Recently, the second author showed in [18] that the GP is Turing computable, marking a huge reduction in the complexity of the problem. Moreover, he showed it suffices to answer the GP in the case of the fields F = Z/p of integers mod p for primes p. Thus the GP is ideally suited to a computer search. We outline a computing strategy that the authors have developed during the last few years. We will also make the MATLAB files freely available upon request to the authors using the e-mail [email protected] so that others can join in this safari. Graduate students with a knowledge of basic algebra and MATLAB could find this an attractive project (and a possible path to early fame!), particularly those with access to fast computers and parallel computing.

The authors are not experts in computing. One important goal of our paper is to encourage some of the very talented young folk out there, those who really are experts in computing, to join in. Undoubtedly fresh ideas would then flow. We chose to write our programs in the programming language MATLAB, which was introduced in the early 1980s with particular emphasis on efficient handling of operations with large arrays (matrices being a special case). Nowadays MATLAB does so much more, of course, but it is still generally regarded as the best programming language for numerical linear algebra calculations with very large matrices. So many of our subroutines rely heavily on the basic "calculus" of matrices — row operations and echelon forms (in our case, essentially for integer matrices, but at times of the order of 10,000 × 10,000). Mod p arithmetic does not get around this. Some readers may prefer using a different programming language.

The GP makes sense over any field F. However, the difficulties of quickly producing commuting triples and accurately calculating dimension depend on F. Even with the rational field Q, where the GP can be reduced to integer matrices after clearing fractions, these calculations are no picnic. It is inherent in the GP that large powers of matrices (maybe the 50th power of a 100 × 100 matrix) need to be considered. But then we quickly face roundoff errors. This will cast serious doubts on the accuracy of dimension (which is a highly discontinuous function of the matrix entries). Within
MATLAB, where the standard precision is 16 places, one can opt for higher precision by using variable precision arithmetic, vpa, to any specified degree of accuracy. But this dramatically slows calculations (and we expect to have to run hundreds of millions of trials to flush out a counterexample for the GP). On the other hand, matrix calculations done mod p (even for a 4-digit prime) are fast and accurate.

A crucial tool in our approach is the use of the Weyr canonical form for matrices, which is far better suited to commuting problems than its cousin, the Jordan form. In fact, it would be extremely difficult to formulate our approach in terms of the Jordan form (because of a simultaneous triangularization result mentioned in Section 2). In Section 2, we record some of the properties of the Weyr form, which are then built into our computer programs. Essentially these allow a recursive approach, starting with a pair of smaller-sized commuting matrices, to both the rapid construction of random commuting triples of large matrices and calculation of the dimension of the subalgebra they generate.

It appears that the choice of settings for parameters in our programs, particularly the sparseness of the matrix entries, will be an important factor in producing some dim F[A, B, C] > n, a case which the authors informally term a "Eureka". We know Eurekas exist with commuting quadruples of matrices, but surprisingly they don't occur that frequently in computer calculations without carefully setting the parameters. We use such productive settings in the commuting quadruples case (for the same matrix size n) as a guide to how best to set them for commuting triples.

Most of our testing to date has been done on just laptops, and working with commuting triples of matrices of size ranging from 15 × 15 to 50 × 50. The three sample trials given near the end of the paper indicate the sort of thing we have done. (Our suggestion to a potential user of our programs, as a first step to understanding our approach, is to re-run these sample trials.) However, the authors feel that the Loch Ness monster probably lives in deeper water, closer to 100 × 100. Although our programs are written for mod p arithmetic, it would be a simple matter for readers to change the arithmetic to suit their needs (e.g. real or complex arithmetic). With readers in mind who might wish to modify our computer programs, perhaps using the languages MAGMA or GAP, we have made the description of the programs given here (and documentation within the code) a high priority.

(Footnote: This frustration with arithmetic blowouts was a strong motivation to go to mod p calculations, and in turn a factor in leading the second author to the Turing computability of the GP. So in a sense, here a computer formulation led to an important conjecture and later proof.)
1. A brief history of the Gerstenhaber Problem
The following is just a bare sketch of some of the history behind the GP. Much fuller accounts can be found in the references at the end of our paper. As for the authors' own accounts, Chapter 5 of [20] is devoted to Gerstenhaber's original theorem, while Chapter 7 develops from scratch all the algebraic geometry necessary to understand how the powerful techniques of that area (irreducible varieties, algebraic geometry dimension, etc.) have impacted the GP. The paper [10] devotes many pages to the history and recent developments.

In 1961 Gerstenhaber [4] established his (most surprising) theorem that dim F[A, B] ≤ n for all commuting n × n matrices A, B over any field F. (For a single "commuting" matrix A this is an immediate consequence of the Cayley-Hamilton theorem.) His proof used algebraic geometry, but later purely linear-algebraic proofs were also found by Barría-Halmos [1] and Laffey-Lazarus [13]. The earlier Holbrook paper [9] may be mentioned in this connection, but it deals only with complex matrices. Whether Gerstenhaber's result extends to three commuting matrices A, B, C is what is known as the Gerstenhaber Problem (GP). That is, must dim F[A, B, C] ≤ n for all n × n commuting matrices A, B, C? Recall F[A, B, C] is the smallest (unital) subalgebra of the matrix algebra M_n(F) of all n × n matrices over F that contains A, B, C. Thus here, F[A, B, C] is the space of all polynomials in the commuting matrices A, B, C (including the scalar matrices cI). Without commutativity we can easily have F[A, B] = M_n(F), giving dim F[A, B] = n^2, the maximum possible.

Historically, two of the most significant papers related to Gerstenhaber's theorem are the 1955 Motzkin-Taussky paper [14] and the 1992 Robert Guralnick paper [5]. The former established that the variety C(2, n) of commuting pairs of n × n matrices is always irreducible. Note this paper predates Gerstenhaber's paper [4], but Motzkin and Taussky in [14] seemed unaware of Gerstenhaber's result to come later. (Also Gerstenhaber doesn't mention [14], although in his paper he establishes the Motzkin-Taussky irreducibility result.) Guralnick in his paper simplified the Motzkin-Taussky argument and showed how quickly Gerstenhaber's theorem follows from the Motzkin-Taussky theorem. Moreover, he showed that the variety of commuting triples of matrices need not be irreducible, and therefore that method of attack could not answer the GP. It also gave weight to the possibility of a negative answer to the GP.

The current state-of-play is that there are no examples known where dim F[A, B, C] > n. However, it is known that the GP has a positive answer for n = 1, 2, . . . , 11 when F is of characteristic 0. The proofs are long and intricate, and use a mix of algebraic geometry and matrix canonical forms from linear algebra (see for instance Klemen Šivic's papers).

By an algebraic geometry argument, it is known that, for a fixed matrix size n, dim F[A, B, C] ≤ n over an algebraically closed field F (and therefore for subfields thereof) whenever the variety C(3, n) of all commuting triples of n × n matrices is irreducible. This is true in characteristic 0 for n < 12 (Klemen Šivic and others) but not when n > 28 (Robert Guralnick [5], Holbrook-Omladič [11]). What happens in the integer interval [12, 28] remains open.

The linear-algebraic proofs of Gerstenhaber's theorem proceed inductively on the matrix size n, and centre on a fixed form spanning set for F[A, B] of n words in A, B depending only on the Jordan (or Weyr) structure of the first matrix A. However, such fixed spanning sets for commuting triples do not exist. See [18] for a discussion of this.

Based on the above history (particularly the failure of the Motzkin-Taussky theorem to extend to commuting triples of matrices, and likewise the linear-algebraic techniques), and their own experiments, the authors firmly believe the GP will turn out to have a negative answer. The question is whether a specific counterexample is accessible with current computing techniques.

2. The central role played by the Weyr form
By a standard argument in linear algebra (using the generalized eigenspace decomposition), the GP reduces to considering commuting triples A, B, C of nilpotent matrices, that is, matrices some power of which is the zero matrix. The smallest such power is called the nilpotent index of the matrix (note by the Cayley-Hamilton theorem, the nilpotent index of an n × n matrix X ∈ M_n(F) can't exceed n, although always X^n = 0). When F is algebraically closed, the nilpotent matrices are exactly those whose only eigenvalue is 0. It is also well-known that, over an algebraically closed field, commuting matrices X_1, X_2, . . . , X_k can be simultaneously triangularised, that is, for some invertible matrix P, all the P⁻¹X_iP are upper triangular (in fact must then be strictly upper triangular if the matrices are nilpotent). But as was shown by O'Meara and Vinsonhaler [19] in 2006 (see also Theorem 2.3.5 in [20] for an account), we can do much better than this:

(Footnote: This in no way lessens the credit due to Gerstenhaber for having the mathematical insight that the dimension bound for commuting pairs might be the same as for a single matrix — the result is so unexpected and few would have even conjectured it.)

CRITICAL PROPERTY FOR OUR APPROACH. If A_1, A_2, . . . , A_k are commuting n × n matrices over an algebraically closed field F, then there is a similarity transformation that puts A_1 in Weyr form and simultaneously puts A_2, . . . , A_k in upper triangular form. (The algebraically closed assumption can be dropped if the matrices involved have all their eigenvalues in F, such as with nilpotent matrices.)

This fact applied to three commuting matrices A, B, C is at the heart of our computing algorithm. It fails for the Jordan form. Of course, it is enough for the purposes of the GP to establish whether dim F[A, B, C] ≤ n after such a simultaneous similarity transformation, because the transformed algebra is isomorphic to the original. In summary, and henceforth assumed:

The Reduction. The commuting triple A, B, C takes the form W, K, L where W is a nilpotent Weyr matrix and K, L are strictly upper triangular.

So what is the Weyr form? The form was introduced by the Czech mathematician Eduard Weyr in 1885, but then largely forgotten. It has been periodically rediscovered over the years by a handful of people (including O'Meara and Vinsonhaler), and now appears finally to be gaining traction. The 2nd edition [12] of the Horn and Johnson "
Matrix Analysis" (2013) covers the Weyr form in Chapter 3. The monograph [20] by O'Meara, Clark, and Vinsonhaler (2011) gives a comprehensive account of the form. We will restrict our brief discussion of the Weyr form to nilpotent matrices.

The best view of a nilpotent Weyr matrix W is that it is a blocked matrix form of a basic r × r nilpotent Jordan matrix

J = | 0 1         |
    |   0 1       |
    |     .  .    |
    |       0  1  |
    |          0  |.

In the simplest case where n = dr for some specified positive integers d, r, the Weyr analogue is the r × r blocked matrix with d × d blocks

W = | 0 I         |
    |   0 I       |
    |     .  .    |
    |       0  I  |
    |          0  |.

Here 0 and I are the d × d zero matrix and identity matrix respectively. In general, the blocking can be in terms of a specified partition n = n_1 + n_2 + · · · + n_r of n, with n_1 ≥ n_2 ≥ · · · ≥ n_r, that determines the sizes of the diagonal blocks. We then call (n_1, n_2, . . . , n_r) the Weyr structure of W. The structure is said to be homogeneous if n_1 = n_2 = · · · = n_r. The first superdiagonal blocks in the general case take the form

W_{j,j+1} = | I_j |
            |  0  |

where I_j is the n_{j+1} × n_{j+1} identity matrix and 0 is the (n_j − n_{j+1}) × n_{j+1} zero matrix (equivalently, W_{j,j+1} is a reduced row-echelon matrix of full column rank). All other blocks are zero. A picture is worth a thousand words:

W = | 0 0 0 1 0 0 |
    | 0 0 0 0 1 0 |
    | 0 0 0 0 0 0 |
    | 0 0 0 0 0 1 |
    | 0 0 0 0 0 0 |
    | 0 0 0 0 0 0 |

is the 6 × 6 nilpotent Weyr matrix of Weyr structure x = (3, 2, 1). A matrix K that centralizes (commutes with) a nilpotent Weyr matrix W has an especially nice form, a blocked matrix version of the Toeplitz matrices which centralize a basic nilpotent Jordan matrix J. This was first observed by Belitskii in 1983 (an account in English was given in [2]). We won't state the description precisely here, just illustrate with the above W of structure (3, 2, 1): such a K must look like

K = | a b d g i l |
    | 0 c e h j m |
    | 0 0 f 0 k n |
    | 0 0 0 a b g |
    | 0 0 0 0 c h |
    | 0 0 0 0 0 a |.

The centralizing matrices are then completely determined by their top row of blocks. We denote the (i, j) block of a centralizing matrix K by K_{ij}. Thus in the above displayed K, its top row blocks are

K_{11} = | a b d |   K_{12} = | g i |   K_{13} = | l |
         | 0 c e |            | h j |            | m |
         | 0 0 f |            | 0 k |            | n |.
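Building the nilpotent Weyr matrix of a given structure is purely mechanical (the package's WMatrix routine, described in Section 5, does this in MATLAB). As an illustration only, here is a hypothetical Python sketch of ours for the same construction:

```python
def weyr_matrix(x):
    """Nilpotent Weyr matrix (as a list of rows) of Weyr structure
    x = (n_1, ..., n_r) with n_1 >= n_2 >= ... >= n_r.
    Block (j, j+1) is [I; 0] of size n_j x n_{j+1}; all other blocks zero."""
    n = sum(x)
    W = [[0] * n for _ in range(n)]
    starts = [sum(x[:j]) for j in range(len(x))]   # first row/col of each block
    for j in range(len(x) - 1):
        r0, c0 = starts[j], starts[j + 1]
        for i in range(x[j + 1]):     # identity part of the superdiagonal block
            W[r0 + i][c0 + i] = 1
    return W
```

For x = (3, 2, 1) this reproduces the 6 × 6 matrix displayed above.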
The forms of the top row blocks, outside of the homogeneous case, are not arbitrary but follow a certain staircase template. See [10], [18], or [20] for more details. This approach is important for computing: given the Weyr matrix W, we construct the other pair K, L of strictly upper triangular, commuting matrices by specifying the centralizing form for their top row of blocks (thereby guaranteeing both commute with W), and then solving linear equations on their top row entries to ensure K and L commute with each other. Such calculations done with the Jordan form would be extremely difficult (the centralizers don't take a nice block upper triangular form).

Summing Up: In the reduction above to W, K, L with W in Weyr form, not only are K, L strictly upper-triangular as ordinary matrices, they also have the nice centralizing form as block matrices relative to the Weyr structure of W. A win-win!

One of the on-screen, blow-by-blow displays from our main routines is the output of so-called leading edge dimensions (LED) resulting from the constructed commuting triple W, K, L. It is not an essential display but one of the most interesting, and possibly the best guide to how close we are to getting Eurekas for our particular setting of parameters. (Users who don't want this or other displays can easily suppress them.) We won't go into the definition of the LED here (the interested reader can look at [18], [10], or [20]), but rather illustrate a couple of features by way of examples (taken from actual trials).
Example 1.
Suppose we are constructing commuting triples W, K, L where W has the Weyr structure x = (8, 8, 8, 4, 1, 1, 1, 1, 1, 1), so n = 8 + 8 + 8 + 4 + 1 + 1 + 1 + 1 + 1 + 1 = 34. Here is an actual display (s is the seed used for the random number generator in MATLAB):

s = 8146
dim = 32
LED =
8 8 8 4 1 1 1 1 1 1
5 8 10 3 1 1 1 1 1 1

In the two-line display for the LED, the first row gives the Weyr structure components. Beneath each is the corresponding leading edge dimension for that component. The output dim is dim F[W, K, L] provided all steps were completed during the construction, but dim = 0 indicates that certain linear equations had no solutions and the process had to stop. Now notice:

(1) dim is the sum of the LED.
(2) If two successive Weyr structure components are the same, then the LED for the second is at least as big as the first. (E.g. for the second and third components.)

These features hold in general. In fact, the sum of the LED at any given stage agrees with the dimension of the corresponding algebra of top-left corners (projecting here is an algebra homomorphism because the matrices are block upper-triangular). So if the program is set up to display in real time (step-by-step), one can anticipate a Eureka if the process were to complete. This is illustrated in our second example. □
Example 2.
Consider the output

s = 8206
dim = 0
LED =
8 8 8 8 2 1 1 1
6 9 9 0 0 0 0 0

Here the sum of the first three LED is 24. And since the 4th structure component is the same as the 3rd, if the 4th step were to complete, we would have the sum of the LED to this point at least 24 + 9 = 33 > 32, a Eureka for F[π(W), π(K), π(L)], where π is the projection map onto the top-left 4 × 4 corner of blocks. □

In view of the above discussion, it is tempting to arrange that the 1st LED (= dim(Z/p)[K_{11}, L_{11}]) be large. The biggest this can be is actually the 1st structure component n_1 (by Gerstenhaber's theorem for commuting pairs — note W contributes nothing to the dimension of the first corner). A nontrivial theorem for commuting pairs of k × k matrices A, B says that if dim F[A, B] = k, then the only matrices that centralize both A and B are polynomials in A, B (see [13], [15]). From this it follows that in the homogeneous case (and in trials so far for the nonhomogeneous cases), if the 1st LED is the maximum possible (namely n_1), then dim F[W, K, L] = n. So a Eureka can never occur this way. This illustrates an important principle: don't get greedy with the 1st LED. On the other hand, if the 1st LED is small, there may be too much ground to make up to obtain a Eureka. It is a delicate balance.
3. The three main routines
Here we outline the three main routines in our MATLAB package. Various subroutines that the main routines call will be described in Sections 4, 5, 6, 7. There is a total of 38 routines in all (plus 3 sample trials), but all most users will need to do is to run some of the main ones. (The number of printed pages of code from our package runs to 63.) Examples of this will be given in Section 8. Of course, if the user wants to modify some routines, that will require a better understanding of the program code. For those not very familiar with programming in MATLAB, we recommend the text [8].

The three principal routines are CommTriplesI, CommTriplesII, CommQuads. We look at each in turn. (Note that Comm is an abbreviation for Commuting.)
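All three routines repeatedly generate random mod p entries that are zero with a prescribed probability (the sparsity inputs p0, p1, p2 discussed below). The helper names here are our own invention, not part of the package; a minimal Python sketch of the idea:

```python
import random

def sparse_mod_p_entry(p, q, rng):
    """Zero with probability q, otherwise a uniform nonzero residue mod p."""
    if rng.random() < q:
        return 0
    return rng.randrange(1, p)

def sparse_mod_p_block(rows, cols, p, q, seed):
    """Random rows x cols block over Z/p with entrywise sparsity q.
    A fixed seed makes the block reproducible, mirroring the role of
    the seed inputs s1, s2 in the package."""
    rng = random.Random(seed)
    return [[sparse_mod_p_entry(p, q, rng) for _ in range(cols)]
            for _ in range(rows)]
```

Setting q = 1 forces the zero block; q = 0 gives a dense block.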
CommTriplesI. This has 15 input arguments and 5 output arguments:

[EUREKAS,ES,W,K,L] = CommTriplesI(x,p,p0,p1,p2,pc,Min,Max,ICR,CS,TR,a,b,s1,s2)
The most important inputs are the chosen Weyr structure x of the desired nilpotent Weyr matrix W (so x is given as a partition of the chosen matrix size n, which is the sum of the partition entries, and so need not be separately inputted), the prime p for the mod p arithmetic, and the seed range [s1,s2]. Thus the program runs through each seed s in the integer interval [s1, s2] and attempts to construct a commuting triple A, B, C with A = W. If the process completes (more on that later), the routine calculates dim(Z/p)[A, B, C]. If this dimension exceeds the matrix size n, then the seed s that produced it is recorded in the vector ES (of "Eureka Seeds"). The first triple for which this happens is recorded in the outputs W,K,L. If no Eureka occurred during the run, all W,K,L are set to zero, and ES will be the empty array [ ]. On the other hand, the output EUREKAS returns the number of Eurekas that occurred (EUREKAS = 0 indicates there were none).

The inputs p0, p1, p2, pc determine the sparsity of entries at various stages. Thus p0, p1, p2 take values in the real interval [0, 1] and determine the probability of a particular entry being zero. The first of these gives sparsity for the first blocks B_{11}, C_{11} of B and C. After that, sparsity is set as p1 for the set percentage pc of the remaining blocks B_{12}, C_{12}, . . . , then at p2 for the rest of the blocks. The reason for this comes from experience with constructing commuting Eureka quadruples: p1 is set lower so as to produce an early LED that has jumped considerably above the corresponding Weyr structure component. Then p2 is set much higher so as to improve the chances of the process completing (the linear equations that determine this are more likely to have a solution). See the earlier comments on the behaviour of the LED. Unlike
CommTriplesII, the routine CommTriplesI chooses for the first block B_{11} of B an n_1 × n_1 nilpotent Weyr matrix, where x = (n_1, n_2, . . . , n_r), subject to certain constraints. The advantage of this is that then the first block C_{11} of C has a known form because of the earlier centralizing description, and we can choose such a C_{11} randomly (subject to the above constraints). (If CommTriplesI is being run continuously over several hours or days, it is useful to display the output EUREKAS periodically, so one can check for success from time to time.) It turns out that there is no loss of generality in making this choice of B_{11} if x has a homogeneous structure. But it is also known that this simplification is not always possible in the nonhomogeneous cases, whence B_{11} and C_{11} should be computed as random, strictly upper triangular, commuting n_1 × n_1 matrices, as CommTriplesII does. This is no problem if n_1 is smallish, say n_1 ≤ 10. But it really slows the number of trials one can do in a given run once say n_1 ≥ 15. It is a tradeoff between speed and generality. But since the analogue, CommQuads, of CommTriplesI works well in producing Eurekas for commuting quadruples of matrices, we tend to favour making the first corner of B a Weyr matrix. The remaining blocks B_{1j}, C_{1j} are then chosen from random solutions of certain linear equations.

Inputs Min and Max give, respectively, the minimum and maximum allowable 1st LED. Setting Min and Max at moderate fractions of n_1 often works well, but in some situations the restrictions need to be stronger. Experimentation is advised.

Input ICR ("Index Corner Restriction") sets the proportion of n_1 = x(1) (= 1st corner size) used to bound the nilpotent indices of B_{11} and C_{11}. This imposes a bound (ICR × n_1 + r) on the nilpotent indices of the final constructed B, C. See [18] for the rationale for this. Setting ICR = 0.5 often works well.

The input CS ("Change Step") sets the number of trials run for the fixed choice of 1st blocks B_{11} and C_{11}, before changing to a new pair (but keeping the same W). And input TR ("Test Runs") sets the number of trials in an optimizing process to determine the choice of the next random (but subject to the above constraints) pair B_{11}, C_{11}, so as to improve the completion rate for resulting triples A, B, C (by increasing the rank of a certain coefficient matrix G in a linear system). Setting TR = 3000 often works well.

Finally, the inputs a,b determine the interval of fit [a, b] for a beta distribution for the output of dimensions. One can take b = (n + 1) + 0.5 and a to be 0.5 less than the theoretical minimum dimension (or estimate thereof). The accuracy of the estimate of a is not critical. The fitted distribution is updated and displayed periodically, after every 1000 trials. Also displayed is the estimate so far of the probability of a Eureka occurring, IF one exists for our particular settings. On this basis, a user can decide whether to continue or try new settings. We stress, however, that the beta distribution is NOT used as a predictor of whether a Eureka exists here, but rather of the number of trials needed to reasonably assure us otherwise. See [18] for more details. In the case of commuting quadruples
A, B, C, D of matrices, our beta fit worked very well for the predicted estimates (say of getting dim(Z/p)[A, B, C, D] = n + 1 or n + 2, or even n + 3). In large trials the estimates closely matched the actual number of these. So we see no reason why the predictions can't be relied on also for commuting triples.

CommTriplesII. This has 11 input arguments, 9 in common with CommTriplesI, and 6 output arguments, 5 in common with CommTriplesI:

[EUREKAS,ES,W,K,L,HIST] = CommTriplesII(x,p,p1,p2,pc,B11,C11,a,b,s1,s2)

The inputs that have been dropped are p0, Min, Max, ICR, CS, TR. The two new inputs are B11,C11, and these are the first blocks of the constructed pair B, C. We give two subroutines for this (CommPairI, CommPairII), of which the user can choose one, that calculate random, strictly upper-triangular, commuting n_1 × n_1 matrices B_{11}, C_{11}. Users can, if they wish, achieve the earlier constraints involving Min and Max, and the size of the 1st LED, by running the subroutines enough times to achieve these. Again, x = (n_1, n_2, . . . , n_r) is the Weyr structure of the first matrix A of the commuting triples A, B, C that is constructed. The new output HIST records the number of times dim = k occurred during the trials for k = 1, . . . , n + 5. This enables a series of trials for different B_{11}, C_{11} to be run and their outputs to be collated (see SampleTrial2).

CommQuads. This constructs, for a given Weyr structure x and seed range [s1,s2], commuting quadruples W,K,L,M of matrices with W the nilpotent Weyr matrix of structure x:

[EUREKAS,MaxExcess,MaxSeed] = CommQuads(x,p,p0,p1,p2,pc,a,b,s1,s2)

The constructed quadruples are not displayed within the routine (they could be if the user wishes), although dimensions and
LED are displayed. The routine is designed principally to suggest promising parameter settings for snaring Eurekas in the case of commuting triples. The code for CommQuads is less sophisticated than that for commuting triples. The inputs are a subset of those for CommTriplesI. Again, EUREKAS is the number of Eurekas the trials produced. Output MaxExcess is the maximum excess of dimension over matrix size that occurred in the production of Eurekas, while MaxSeed is the first seed that gave MaxExcess. Another promising sign of a good choice of parameters is by how far some of the LED have jumped above the corresponding Weyr structure component.

(Footnote: CommPairII is faster although not always guaranteed to produce strictly upper-triangular matrices.)

4. Three important supporting subroutines
The three most important subroutines are WorkHorseI, WorkHorseII, and Dim.

WorkHorseI.

[dim,LED,E,S,W,K,L] = WorkHorseI(x,p,p1,p2,pc,K11,L11,G,C,PivInfo,s)

This constructs a commuting triple
W,K,L of matrices for a single inputted seed s and Weyr structure x. The first matrix is the nilpotent Weyr matrix of structure x. The inputs p, p1, p2, pc are the same as for CommTriplesI, while K11, L11 are the 1st corners of K, L. The flags E, S signal if a Eureka was produced, respectively whether the process completed, according to the values 1 and 0. Output dim is dim(Z/p)[W, K, L] if the process completed, otherwise dim = 0. And LED gives the leading edge dimensions.

To understand the inputs G, C, PivInfo (and only users who might want to modify the code need to read the details carefully), we need to go into how the top row blocks K_{1k}, L_{1k} are calculated for k > 1. Let x = (n_1, n_2, . . . , n_r). The K_{1k}, L_{1k} are calculated recursively. The construction of X = K_{1k} and Y = L_{1k} from the earlier top row blocks is determined by a linear system Gv = w where

G = | M |
    | D |,

with D a certain 2ab × 2ab diagonal matrix for a = n_1, b = n_k, and M = [M_1, M_2] an augmentation of two ab × ab matrices

M_1 = I_b ⊗ K_{11} − K_{11}(1 : b, 1 : b)^t ⊗ I_a,
M_2 = −(I_b ⊗ L_{11} − L_{11}(1 : b, 1 : b)^t ⊗ I_a).

The matrices v, w are column vectors

v = | vec(Y) |        w = | vec(Z) |
    | vec(X) |,           |   0    |,

where 0 is the 2ab × 1 zero vector and

Z = K_{12} L_{1,k−1}(1 : n_2, 1 : n_k) + K_{13} L_{1,k−2}(1 : n_3, 1 : n_k) + · · · + K_{1,k−1} L_{12}(1 : n_{k−1}, 1 : n_k) − L_{12} K_{1,k−1}(1 : n_2, 1 : n_k) − L_{13} K_{1,k−2}(1 : n_3, 1 : n_k) − · · · − L_{1,k−1} K_{12}(1 : n_{k−1}, 1 : n_k).

Recall the vectorisation of an m × n matrix X, vec(X), is the mn × 1 column vector obtained by stacking the columns of X in order. (The underlying identity is vec(AXB) = (B^t ⊗ A)vec(X), which converts matrix equations in an unknown X into ordinary linear systems in vec(X).) The system Gv = w comes from the requirement that the top left k × k corners of blocks of K and L commute (role of M) and that the new blocks conform to the shape of matrices centralizing W (role of D). Notice that G is 3ab × 2ab, v is 2ab × 1, and w is 3ab × 1. To solve the system we put G in loose row-echelon form using elementary row operations, and record an invertible matrix C such that CG is the desired form (and so (CG)v = Cw is an equivalent system).

Notice that each G associated with the pair of blocks in a given position (there are r − 1 positions) depends only on the first corners K_{11}, L_{11} and the structure x. Therefore, if we are performing a whole series of trials by running through a seed range [s1, s2] (and thereby generating new solutions for the following blocks of K and L), but involving the same first corners K_{11}, L_{11}, it makes sense to record all the G and C that are to be used. (That way, we avoid having to repeat the row operations each time.) This greatly increases efficiency of trialing (much faster). When the individual G are stacked in an array, this gives the input argument G. To make this possible, before stacking, we augment each G with a zero matrix if necessary so that all the G have the same number of columns, namely 2n_1n_2. Similarly, stacking the individual C, after augmenting with a zero matrix, gives the input C. From these arrays we can quickly recover each of the original, individual G, C as required. Note that G is 3n_1(n − n_1) × 2n_1n_2 and C is 3n_1(n − n_1) × 3n_1n_2.

This leaves only the input argument PivInfo. It is a (3(r − 1) + 1) × 2n_1n_2 array that records the pivot positions (and more) of the various reduced matrices CG, thereby allowing rapid solving of the various linear equations as they arise.

WorkHorseII.
1) + 1) × n n array that records the pivot positions (and more) of the various reduced matrices CG , thereby allowing rapid solving of the various linear equations as they arise. WorkHorseII . [dim,LED,E,S,W,K,L] = WorkHorseII(x,p,p1,p2,pc,K11,L11,s) This routine is similar to
WorkHorseI but without the inputs
G,C,PivInfo . Theyare calculated within the routine itself. The routine is designed to be used inconjunction with routines
CommPairI or CommPairII to produce the 1 st corners K11,L11 . Since this is much slower than the method used in
CommTriplesI ,to include
G,C,PivInfo as inputs as well would make
WorkHorseII too slow,particularly over short runs.
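The linear systems Gv = w used by the WorkHorse routines rest on the standard identity vec(AXB) = (B^t ⊗ A)vec(X) for column-stacking vectorisation: it is what converts block commutation equations in the unknown blocks X, Y into ordinary linear systems. The following self-contained Python check of the identity mod p is ours, purely for illustration:

```python
def mat_mul(A, B, p):
    """Matrix product over Z/p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def vec(X):
    """Column-stacking vectorisation of X."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def kron(A, B):
    """Kronecker product A (x) B."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def mat_vec(M, v, p):
    return [sum(m * t for m, t in zip(row, v)) % p for row in M]

# check vec(A X B) = (B^t kron A) vec(X) on a small example mod 7
p = 7
A, X, B = [[1, 2], [3, 4]], [[5, 6], [0, 1]], [[2, 0], [1, 3]]
lhs = vec(mat_mul(mat_mul(A, X, p), B, p))
rhs = mat_vec(kron(transpose(B), A), vec(X), p)
assert lhs == rhs
```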
WorkHorseII can be used in its own right to produce a commuting triple of matrices. But its main purpose is to support the code of CommTriplesII. The subroutine is also used to produce the 1st corners of K, L, M in CommQuad. A similar technique could be used to construct k commuting matrices A_1, A_2, . . . , A_k, with the first matrix a nilpotent Weyr matrix, by constructing the first corners of A_2, A_3, . . . , A_k using the program for k − 1 commuting matrices. (WorkHorseIII is the analogue of WorkHorseII for commuting quadruples.)
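In principle, the dimension of (Z/p)[A, B, C] is just the rank mod p of the matrix whose columns are the vectorised words in the generators. The brute-force Python sketch below is our own illustration of that idea (the package's Dim, described next, is far faster because it exploits the Weyr structure):

```python
def mat_mul(A, B, p):
    """Matrix product over Z/p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def rank_mod_p(rows, p):
    """Rank over Z/p (p prime) by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rank, col = 0, 0
    ncols = len(rows[0]) if rows else 0
    while rank < len(rows) and col < ncols:
        piv = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if piv is None:
            col += 1
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col] % p, p - 2, p)      # Fermat inverse mod p
        rows[rank] = [(e * inv) % p for e in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col] % p
                rows[i] = [(e - f * g) % p for e, g in zip(rows[i], rows[rank])]
        rank += 1
        col += 1
    return rank

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def dim_subalgebra(mats, n, p):
    """Dimension over Z/p of the unital subalgebra generated by the
    commuting n x n matrices in mats: form the words
    A_1^(i_1) ... A_k^(i_k) with exponents below n (enough by
    Cayley-Hamilton), vectorise each word, and take the rank mod p."""
    words = [identity(n)]
    for A in mats:
        new_words = []
        for Wd in words:
            cur = Wd
            for _ in range(n):            # powers 0, 1, ..., n-1 of A
                new_words.append(cur)
                cur = mat_mul(cur, A, p)
        words = new_words
    rows = [[Wd[i][j] for j in range(n) for i in range(n)] for Wd in words]
    return rank_mod_p(rows, p)
```

For the 2 × 2 nilpotent Jordan block N, dim_subalgebra([N, N, N], 2, p) returns 2, the span of I and N.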
Dim. [dim,LED] = Dim(W,K,L,x,p)

The subroutine Dim returns the dimension dim of the subalgebra generated by the three inputted commuting matrices W,K,L, where W is the nilpotent Weyr matrix of Weyr structure x. All calculations are done mod p for the inputted prime p. Output LED gives the leading edge dimensions.

Here is a sketch of how the subroutine works. Let x = (n_1, n_2, ..., n_r). Suppose the nilpotent indices of W, K, L are respectively a, b, c (actually a = r = the length of the partition x). Then (Z/p)[W, K, L] is spanned as a vector space by the words W^i K^j L^k for 0 ≤ i ≤ a − 1, 0 ≤ j ≤ b − 1, 0 ≤ k ≤ c − 1. Taking the vectorisation, vec(W^i K^j L^k), of each of these words and making them the columns of a matrix M, we now have that the dimension is the rank of M. There are a number of ways to find rank(M). For instance, we could call the built-in rank routine in MATLAB (it would need to be modified for mod p arithmetic). However, this would be very slow because the matrix M can be huge: it is n² × abc, where n is the matrix size of W, K, L. Also, a, b, c can be as large as n, so potentially we are looking at an n² × n³ matrix. In our search for Eurekas, we may have to consider matrices bigger than 100 × 100, which makes M bigger than 10,000 × 1,000,000. These are big matrices if we are running tens of thousands of trials.

Our way around these problems is to take advantage of the special form the matrices K, L have that comes from their centralizing of W. Firstly, since the W^i K^j L^k centralize W, they are completely determined by their top row blocks. Thus we need only take the vectorisation of their top row blocks. We also order the set {(i, j, k)} of indices in a special way so that the corresponding parts of the matrix M form an approximately lower triangular matrix. This allows us to calculate rank(M) recursively, using various elementary row operations to achieve a loose, row-echelon form of M (whence rank(M) = the number of nonzero rows of the reduced form). The earlier pivots are noted for subsequent clearing below, but the earlier corner has zeros to its right in M and is not touched again. Basically, we are recursively calculating the dimension of the algebra generated by the top left m × m corner of blocks of W, K, L for m = 1, 2, ..., r. We need these dimensions anyway for the LED. Also we can check whether a Eureka has occurred at one of these corners. This is important because an early Eureka could be lost for the full matrices.

In summary, our subroutine Dim computes dimension very quickly and accurately.
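For readers who want to experiment, the naive version of this computation (vectorise the words, then find the rank mod p) is easy to reproduce. The following Python sketch is our illustration, not the authors' optimised routine: it ranges i, j, k up to the matrix size rather than the exact nilpotent indices, makes no use of the top-row-block shortcut, and is practical only for small n:

```python
import numpy as np

def rank_mod_p(M, p):
    # Rank of an integer matrix over Z/p by Gaussian elimination,
    # using Fermat inverses (p prime) to normalise pivots.
    M = np.array(M, dtype=np.int64) % p
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        piv = next((r for r in range(rank, rows) if M[r, c] != 0), None)
        if piv is None:
            continue
        M[[rank, piv]] = M[[piv, rank]]
        M[rank] = (M[rank] * pow(int(M[rank, c]), p - 2, p)) % p
        for r in range(rows):
            if r != rank and M[r, c] != 0:
                M[r] = (M[r] - M[r, c] * M[rank]) % p
        rank += 1
    return rank

def dim_naive(W, K, L, p):
    # dim (Z/p)[W,K,L]: rank of the matrix whose rows are the
    # vectorised words W^i K^j L^k (the transpose of the matrix M above).
    n = W.shape[0]
    words = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                word = np.linalg.matrix_power(W % p, i) @ \
                       np.linalg.matrix_power(K % p, j) @ \
                       np.linalg.matrix_power(L % p, k)
                words.append((word % p).flatten())
    return rank_mod_p(np.array(words), p)
```

The elimination in rank_mod_p is also the simplest model of the "loose" row reduction described in the subroutines for solving linear equations below, except that there pivots are deliberately not normalised to 1.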
Subroutines connected with the Weyr form
We won't go into much detail here. The interested reader can look at the code and the accompanying documentation. The letter W in a routine name is shorthand for "Weyr".

The routine W = WMatrix(x) returns the nilpotent Weyr matrix W of a given Weyr structure x, given as a partition x = (n_1, n_2, ..., n_r) of the underlying matrix size n. It is independent of the underlying field. On the other hand, x = WStruc(A,p) gives the Weyr structure x of a given nilpotent matrix A mod p. To complete the "three Weyr amigos", we have [W,C] = WForm(A,p), which returns the Weyr form W of the inputted nilpotent matrix A, together with an invertible matrix C that achieves the similarity transformation (C^(-1)AC = W). The arithmetic is done modulo the prime p. As with all our routines, it would not be hard for a user to change the arithmetic to suit other situations (such as rational matrices), once safeguards are built in to avoid roundoff errors.

We talked in Section 2 about how a matrix K that centralizes a given nilpotent Weyr matrix W of structure x has a special blocked form, and is determined by its top row of blocks. In turn, these blocks have a certain staircase form. The routine T = Template(x) gives a template for these blocks, as an n_1 × n matrix of zeros and ones, where a 0 entry says that this entry must be zero, and an entry 1 says that the entry in this position can be arbitrary. This is often used in tandem with the routine K = TopRowToFull(A,x) to obtain the full n × n centralizing matrix K that has the n_1 × n matrix A as its top row of blocks. Finally, K = RandCentMatrix(x,p,q,s) returns a random (subject to sparsity probability q) strictly upper-triangular matrix K that centralizes the nilpotent Weyr matrix W of the inputted structure x. The answer depends on the inputted seed s that is to be used in the MATLAB random number generator. The input p is the prime that determines the range {0, 1, 2, ..., p − 1} of allowable entries in K.
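To make the block structure concrete, here is a hedged Python sketch of the two easiest of these routines (counterparts of WMatrix and WStruc); it illustrates the definitions and is not the authors' code. The nilpotent Weyr matrix of structure (n_1, ..., n_r) has, as its only nonzero blocks, the n_i × n_{i+1} superdiagonal blocks with an identity on top and zeros below; and the structure is recovered from the nullities of successive powers:

```python
import numpy as np

def weyr_matrix(x):
    # Nilpotent Weyr matrix of structure x = (n1 >= n2 >= ... >= nr):
    # block (i, i+1) is the n_i x n_{i+1} matrix [I; 0].
    n = sum(x)
    W = np.zeros((n, n), dtype=int)
    offsets = np.concatenate(([0], np.cumsum(x)))
    for i in range(len(x) - 1):
        for t in range(x[i + 1]):
            W[offsets[i] + t, offsets[i + 1] + t] = 1
    return W

def weyr_struct(A):
    # Weyr structure of a nilpotent matrix:
    # n_k = nullity(A^k) - nullity(A^(k-1)).
    # (Exact rank here for simplicity; the package works mod p instead.)
    n = A.shape[0]
    parts, P, prev = [], np.eye(n, dtype=int), 0
    while True:
        P = P @ A
        nullity = n - np.linalg.matrix_rank(P)
        if nullity == prev:
            break
        parts.append(nullity - prev)
        prev = nullity
    return tuple(parts)
```

For instance, weyr_matrix((2, 1)) is the 3 × 3 matrix whose only nonzero entry is a 1 in position (1, 3).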
Subroutines for solving linear equations
As is to be expected, if our routines involve solving linear equations, we should be using row operations. At the risk of reinventing the wheel, we have written our own versions to suit arithmetic mod p and our particular inductive techniques in addressing the GP. Minimising the number of row operations (such as not making pivots 1 or swapping rows) and speed of execution are our priorities. The letter L in a routine name is shorthand for "loose".

The routine LFormI takes the "form" [L,C,P,r] = LFormI(A,TI,p). Here A is the matrix to be reduced and p is the prime for mod arithmetic. The output L is row equivalent to A and has a loose row echelon form: no two pivots (first nonzero entries of nonzero rows) lie in the same column, pivots need not be 1, and zero rows can be interspersed with nonzero rows. Note that our form need not have a staircase (echelon) shape. Output r is the rank of A (the number of nonzero rows of L), and P is a column vector that records the pivot positions: P(j) = i if row i contains a pivot in column j, otherwise P(j) = 0. Input TI ("To Invert") takes the values 1 and 0 according to whether we wish to produce an invertible matrix that gives (under left multiplication) the form L or not (for instance, we don't need the invertible matrix if all we are interested in is the rank of A). If so desired, output C gives such an invertible matrix; otherwise C is the identity matrix.

Routine [L,C,r] = LFormII(A,TI,p) is similar to LFormI except now output P has been dropped, and entries above and below a pivot are 0 (whence C may no longer be lower triangular).

Routine [L,C,Q,r] = LFormIII(A,B,TI,P,r1,p) is more interesting. Here the input A is an m × n matrix already in the form of L in LFormI, P is the column vector of the pivot positions, and r1 is the rank of A. The input B is a t × n matrix. Then output L is a matrix in the loose form of LFormI of the augmented matrix [A; B] (A stacked on top of B). Output Q gives the new pivot positions, r is the new rank, and C is an invertible matrix that realises the reduction to L.

The routine Inv uses
LFormII to invert a matrix or record that it is not invertible. The techniques in our routines for solving a linear system Ax = b using some of the above forms are pretty standard. SolveI uses LFormI and works through the equations in the column-order of the pivots, starting with the far rightmost pivot and working backwards. On the other hand, SolveII uses LFormII and reads off the solutions directly.

The beta fit routine
The final subroutine we have chosen to highlight is BetaFit. As described in Section 3, it is a useful guide to the number of trials needed to flush out a Eureka if such exist for our particular setting of parameters. A beta distribution takes the form p(x) = Cx^d (1 − x)^e, x ∈ [0, 1], for constants C, d, e; being defined over the real interval [0, 1], it is a continuous distribution. (Formal descriptions state this differently in terms of "shape parameters" α, β.) On a general (finite) interval [a, b], we need to replace x in the right-hand side of p(x) by (x − a)/(b − a) and replace C by C/(b − a) (but retain the same d, e).

From a series of trials for commuting triples W, K, L of n × n matrices, we view the histogram of the computed dimensions as a crude approximation to the true density function. (The histogram rectangle, say at dim = 10, should be viewed as sitting on [9.5, 10.5].) We fit a beta distribution over some interval [a, b] to the histogram and from that estimate the probability of getting dim = n + 1 if such a Eureka exists.

Figure 1 shows the fitted beta distribution (red), in the shifted form p(x) = C((x − a)/(b − a))^d (1 − (x − a)/(b − a))^e, for the histogram (blue) of dimensions from commuting triples of 20 × 20 matrices W, K, L with W the nilpotent Weyr matrix of structure x = (7, , , , ) and p = 101. (This was suggested by commuting quadruples which gave the promising output of dimensions 21, 22, and even 23 with probabilities 0.0367, 0.0039, 3. × 10^(− ) respectively, which closely matched the actual number of occurrences.) The program used to produce the underlying data needed for Figure 1 is similar to that in SampleTrial1 given later. The green base is a reminder of the interval of fit. And the green ∗ are simply a visual guide as to where the base (integer) dimension is centred. The fit here for commuting triples looks good over the distribution of our gathered raw data, and suggests no reason why it can't be relied on for a Eureka probability prediction if one exists here. On the assumption that p(x) gives even a halfway reasonable fit to the upper tail of the underlying distribution, dim = 21 = n + 1 should be popping up about 1 in 500 trials (and certainly at least every 10,000 trials). The fact that it didn't occur at all in some 2 million trials makes it a reasonable inference that the GP does not fail here. So we should look elsewhere.

Here is the form of the routine BetaFit:

[alpha,beta,C,M,V,Estimate] = BetaFit(n,a,b,Hist)

Most of the input and output arguments are self-explanatory. Input Hist is the histogram of the dimensions recorded from a (large) series of trials. It takes the form of a 1 × n row vector whose ith component is the number of times dim = i occurred. (n as always is the underlying matrix size.) Output Estimate gives the probability of a Eureka occurring if one exists. The calculations used to fit the beta probability density function involve only the mean and variance of the data, given as outputs M,V, not higher moments. But it works fine because we are not after a precise probability estimate (that would be very difficult). The interested reader can look at the actual code for more detail.

This concludes our description of some of the routines in our package. There are a total of 38 routines in all, although some of them are at quite a low level of programming. To assist the reader in keeping track of the overall structure, we include the following chart of dependency relations between certain highlighted routines, with an arrow from program A to program B indicating that A calls B.
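As an aside before the dependency chart: the core of such a moment-based fit is elementary and easy to reproduce. The following Python sketch is our illustration with a simplified interface (it is not the package's BetaFit, which works directly from the histogram and also returns C, M, V, and Estimate): it recovers the shape parameters α, β from the mean and variance, and integrates the fitted density over a bin such as the Eureka bin [n + 0.5, n + 1.5]. Note the convention: the standard beta density is proportional to x^(α−1)(1 − x)^(β−1), so the exponents d, e above correspond to α − 1, β − 1.

```python
import math

def beta_moments(mean, var, a, b):
    # Method-of-moments fit of a beta density on [a, b]: rescale the
    # sample mean and variance to [0, 1], then solve for alpha, beta.
    m = (mean - a) / (b - a)
    v = var / (b - a) ** 2
    common = m * (1 - m) / v - 1          # requires v < m(1 - m)
    return m * common, (1 - m) * common   # alpha, beta

def tail_mass(alpha, beta, a, b, lo, hi, steps=1000):
    # Probability the fitted density assigns to [lo, hi] (e.g. the bin
    # [n + 0.5, n + 1.5] for a Eureka), by midpoint-rule integration.
    C = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    def pdf(x):
        t = (x - a) / (b - a)
        if t <= 0 or t >= 1:
            return 0.0
        return C * t ** (alpha - 1) * (1 - t) ** (beta - 1) / (b - a)
    h = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * h) for i in range(steps)) * h
```

For example, a sample with mean 0.5 and variance 0.05 on [0, 1] fits the symmetric density with α = β = 2.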
[Dependency chart relating the highlighted routines: CommTriplesI, CommTriplesII, RandCentMatrix, WorkHorseI, WorkHorseII, BetaFit, Template, SolveII, Dim, SolveI, Rank, LFormII, LFormIII, and LFormI.]

Some examples of running the main routines
Many Mathematics Departments have a general licence for MATLAB that can be shared with the faculty. So for those not very familiar with MATLAB, our advice is to enquire whether your department has such a licence. (Ask a friendly technician to install MATLAB on your office computer and to give a brief tutorial on how to run programs.) Our package of programs for the Gerstenhaber Problem should then be downloaded into a folder within MATLAB.

Below are three sample trials involving actual MATLAB code (they are also included in our package). Rather than entering the code directly into MATLAB's command window, it is better to create a separate m-file SampleTrial.m and execute that by typing in the command window

tic, SampleTrial, toc

and hitting the enter button. An advantage of this is that it is easy to make small changes (changing parameter settings, for example) in future trials. Note that the tic, toc will return the running time for the trial (these are optional).

As a guide to the running times of each sample trial, we give the times from a basic 4-core-processor laptop of around 2.5 GHz speed (and using a 64-bit version of MATLAB). Our first trial SampleTrial1 illustrates CommTriplesI and takes around 1 hour. (The authors would regard this as a very short trial.)

% SampleTrial1  Tests CommTriplesI on matrix size n = 29
%               with structure x = [9,7,7,4,1,1].
n = 29
x = [9,7,7,4,1,1]
p = 97          % chosen prime
p0 = 0.5;
s1 = 1
s2 = 25000      % so 25,000 trials
a = 12.5;
b = 30.5;
ICR = 0.6
CS = 1000;      % K11,L11 change after 1000 trials
TR = 3000;      % 3000 experiments for optimal K11,L11
p1 = 0.88
p2 = 0.93
pc = 50;        % 2nd and 3rd blocks have p1, last three p2
Min = 4
Max = 7
[EUREKAS,ES,W,K,L] = CommTriplesI(x,p,p0,p1,p2,pc,Min,Max,ICR,CS,TR,a,b,s1,s2)

Our second trial illustrates
CommTriplesII and takes under 1 and 1/2 hours.

% SampleTrial2  Tests CommTriplesII on matrix size n = 18
%               with structure x = [ ]. Makes 20% changes in 1st corner
%               in a series of runs of 2000 trials each. Then fits beta
%               distribution (and estimates pr(dim = n+1))
%               over all 40,000 trials.
n = 18
x = [ ]
p = 37
q = 0.82
p1 = 0.75
p2 = 0.90
pc = 60
a = 5.5
b = 19.5
EUREKAS = 0;
FirstES = 0;
Hist = zeros(1,n+5);
for t = 0:19
    if t > 0
        if EUREKAS == 0
            disp('No Eurekas so far.'),
        else
            disp('EUREKA WAS FOUND AT SEED'), FirstES
        end
        pause(5)
    end
    [B11,C11] = CommPairII(x(1),1,p,q,t), % could use CommPairI here instead
    s1 = 1 + 2000*t;
    s2 = 2000 + 2000*t;
    [Eurekas,ES,W,K,L,HIST] = CommTriplesII(x,p,p1,p2,pc,B11,C11,a,b,s1,s2);
    Hist = Hist + HIST;
    if ( (EUREKAS == 0) & (Eurekas ~= 0) ) == 1
        FirstES = ES(1);
    end
    EUREKAS = EUREKAS + Eurekas;
end
EurekasOverAll = EUREKAS,
Hist,
display('Beta distribution over all our choices of B11,C11'),
Hist = [Hist(1:n),0,0,0];
[alpha,beta,CC,MM,VV,Estimate] = BetaFit(n,a,b,Hist);
FinalEstimate = Estimate,
TotalCompletedCases = sum(Hist),

Our final trial illustrates
CommQuads. Its code is a couple of pages long, so we refer the reader to the code and documentation given within our package of routines. The running time is around 2 and 1/4 hours. However, a user need not run the full trial to see what is going on because there are periodic updates on the results so far. The best way of getting an overview of the output is to watch for the changes in the displayed beta fit graph, at about 3 minute intervals after the completion of 1000 trials for a particular choice of settings. From this one can quickly see the number of Eurekas, maximum excess of dimension over matrix size, etc. Here is the introductory description of the trial.

% SampleTrial3  For matrix size n = 15, the trial illustrates
%               how to choose promising parameter settings for
%               CommTriplesI and CommTriplesII to snag Eurekas,
%               based on the routine CommQuads and promising
%               settings for commuting quads of n x n matrices
%               that actually produce lots of Eurekas.
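Stripped of the matrix generation itself, the control flow shared by these sample trials (run seeded batches of trials, accumulate a histogram of dimensions, flag Eurekas) can be sketched in Python. Here trial_fn is a hypothetical stand-in for a CommTriples-style experiment that returns a dimension, not anything from the authors' package:

```python
import random

def run_batches(n, batches, batch_size, trial_fn):
    # Accumulate a histogram of dimensions over several batches of trials,
    # recording the first seed at which a Eureka (dim > n) occurs.
    hist = [0] * (n + 5)              # slots for dims 1 .. n+5, as in SampleTrial2
    first_eureka, eurekas = None, 0
    for t in range(batches):
        for s in range(1 + batch_size * t, 1 + batch_size * (t + 1)):
            dim = trial_fn(random.Random(s))   # each trial gets its own seed
            hist[dim - 1] += 1
            if dim > n:
                eurekas += 1
                if first_eureka is None:
                    first_eureka = s
    return hist, eurekas, first_eureka
```

Seeding each trial individually, as the sample trials do with s1, s2, makes any Eureka reproducible: rerunning the single recorded seed regenerates the witnessing triple.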
The references below include those mentioned explicitly in our text along with a number of others that are related to the GP.
Acknowledgement.
The authors wish to thank Ian Coope, Jean-Pierre Schoch, Pace Nielsen, Robert Carlson, and Graham Wood for their helpful advice.
References

[1] Barría, J.; Halmos, P., Vector bases for two commuting matrices, Linear and Multilinear Algebra (1990), 147–157.
[2] Belitskii, G., Normal forms in matrix spaces, Integral Equations Operator Theory (2000), 251–283.
[3] Bergman, G. M., Commuting matrices, and modules over artinian local rings, preprint (2013), readable online at http://math.berkeley.edu/~gbergman/papers/unpub
[4] Gerstenhaber, M., On dominance and varieties of commuting matrices, Ann. Math. (1961), 324–348.
[5] Guralnick, R. M., A note on commuting pairs of matrices, Linear and Multilinear Algebra (1992), 71–75.
[6] Guralnick, R. M.; Sethuraman, B. A., Commuting pairs and triples of matrices and related varieties, Linear Algebra Appl. (2000), 139–148.
[7] Han, Y., Commuting triples of matrices, Electron. J. Linear Algebra (2005), 274–343 (electronic).
[8] Higham, D. J.; Higham, N. J., MATLAB Guide, 2nd edition, SIAM, Philadelphia, 2005.
[9] Holbrook, J., Polynomials in a matrix and its commutant, Linear Algebra Appl. (1982), 293–301.
[10] Holbrook, J.; O'Meara, K. C., Some thoughts on Gerstenhaber's theorem, Linear Algebra Appl. (2015), 267–295.
[11] Holbrook, J.; Omladič, M., Approximating commuting operators, Linear Algebra Appl. (2001), 131–149.
[12] Horn, R. A.; Johnson, C. R., Matrix Analysis, Second Edition, Cambridge University Press, Cambridge, 2013.
[13] Laffey, T. J.; Lazarus, S., Two-generated commutative matrix subalgebras, Linear Algebra Appl. (1991), 249–273.
[14] Motzkin, T. S.; Taussky, O., Pairs of matrices with property L. II, Trans. Amer. Math. Soc. (1955), 387–401.
[15] Neubauer, M. G.; Saltman, D., Two-generated commutative subalgebras of M_n(F), J. Algebra (1994), 545–562.
[16] Neubauer, M. G.; Sethuraman, B. A., Commuting pairs in the centralizers of 2-regular matrices, J. Algebra (1999), 174–181.
[17] Ngo, N.; Šivic, K., On varieties of commuting nilpotent matrices, Linear Algebra Appl. (2014), 237–262.
[18] O'Meara, K. C., The Gerstenhaber problem for commuting triples of matrices is "decidable", Comm. Algebra (2020), 453–466.
[19] O'Meara, K. C.; Vinsonhaler, C., On approximately simultaneously diagonalizable matrices, Linear Algebra Appl. (2006), 39–74.
[20] O'Meara, K. C.; Clark, J.; Vinsonhaler, C. I., Advanced Topics in Linear Algebra: Weaving Matrix Problems through the Weyr Form, Oxford University Press, Oxford, 2011.
[21] O'Meara, K. C.; Watanabe, J., Weyr structures of matrices and relevance to commutative finite-dimensional algebras, Linear Algebra Appl. (2017), 364–386.
[22] Omladič, M., A variety of commuting triples, Linear Algebra Appl. (2004), 233–245.
[23] Rajchgot, J.; Satriano, M., New classes of examples satisfying the three matrix analogue of Gerstenhaber's theorem, J. Algebra (2018), 245–270.
[24] Rajchgot, J.; Satriano, M.; Shen, W., Some combinatorial cases of the three matrix analog of Gerstenhaber's theorem, to appear, Proceedings of the 2019 AWM Research Symposium.
[25] Sethuraman, B. A., The algebra generated by three commuting matrices, Math. Newsl. (2011), 62–67.
[26] Sethuraman, B. A.; Šivic, K., Jet schemes of the commuting matrix pairs scheme, Proc. Amer. Math. Soc. (2009), 3953–3967.
[27] Shapiro, H., The Weyr characteristic, Amer. Math. Monthly (1999), 919–929.
[28] Šivic, K., On varieties of commuting triples, Linear Algebra Appl. (2008), 2006–2029.
[29] Šivic, K., On varieties of commuting triples II, Linear Algebra Appl. (2012), no. 2, 461–489.
[30] Šivic, K., On varieties of commuting triples III, Linear Algebra Appl. (2012), no. 2, 393–460.
[31] Wadsworth, A. R., The algebra generated by two commuting matrices, Linear and Multilinear Algebra (1990), 159–162.

AFFILIATIONS
(John Holbrook) University of Guelph, Guelph, Ontario, Canada.
(Kevin O'Meara) University of Canterbury, Christchurch, New Zealand.

E-MAIL
[email protected] (John Holbrook)
[email protected] (Kevin O'Meara)