Faster Tensor Canonicalization
Benjamin E. Niehoff
Institute for Theoretical Physics, KU Leuven
Celestijnenlaan 200D, B-3001 Leuven, Belgium
ben.niehoff@kuleuven.be
Abstract
The Butler-Portugal algorithm for obtaining the canonical form of a tensor expression with respect to slot symmetries and dummy-index renaming suffers, in certain cases with a high degree of symmetry, from O(n!) explosion in both computation time and memory. We present a modified algorithm which alleviates this problem in the most common cases—tensor expressions with subsets of indices which are totally symmetric or totally antisymmetric—in polynomial time. We also present an implementation of the label-renaming mechanism which improves upon that of the original Butler-Portugal algorithm, thus providing a significant speed increase for the average case as well as the highly-symmetric special case. The worst-case behavior remains O(n!), although it occurs in more limited situations unlikely to appear in actual computations. We comment on possible strategies to take if the nature of a computation should make these situations more likely.
1 Introduction

Computer algebra systems have become quite powerful at simplifying expressions with polynomials, integrals, derivatives, special functions, etc. All of these fall under the category of what we will call scalar algebra, meaning that they deal with objects that are functions from R → R or C → C. A field that is still quite in development is the extension of this computational power to tensor algebra, by which we mean the manipulation of objects with "indices" such as vectors v^i, matrices M_ij, and higher-rank tensors T_ijkℓ. Such indexed objects are a way to denote multidimensional arrays (and thus represent objects of multilinear algebra). The indices refer to the components of the object in a basis of some vector space like R^n; for example, one can write:

    v = v^i e_i ≡ v^1 e_1 + v^2 e_2 + … + v^n e_n ,    (1.1)

where v ∈ R^n is a vector and {e_i} is a given basis. Where multiple indices occur on a single tensor, these refer to the component along the tensor product of the corresponding basis elements, as in T = T^{ijkℓ} e_i ⊗ e_j ⊗ e_k ⊗ e_ℓ. Alternatively, we can choose to think of indices abstractly, referring merely to the structure of a tensor and its contractions, rather than literally referring to the components of the underlying multidimensional array [1]. "Upstairs" (superscript) and "downstairs" (subscript) indices are to be distinguished, and when the same index is repeated both upstairs and downstairs, it is meant to be contracted or summed over, as shown in (1.1).
In either case, whether indices are thought of as literal or abstract, the algebra of indexed objects is as follows: We can think of each index as a slot which has been filled with a label such as i, j, k, ℓ, etc. The slots are, for our purposes, an ordered sequence of "blanks" in a tensor expression into which labels can go,

    T_◦◦◦◦ ,    (1.2)

whereas the labels i, j, k, ℓ are a means to "name" these slots, and either indicate that a pair of slots have been contracted (in the case that a label is repeated), or indicate that a slot is free; that is, available to be contracted. The positional order of the labels has meaning, and can be used to effectively indicate notions like the transpose of matrices:

    (M^⊤)_ij ≡ M_ji .    (1.3)

The free labels must be "balanced" in any sum or equation in order for it to be a valid mathematical sentence. For repeated labels which indicate a contraction, one may use any available symbol; one calls these "dummy" labels, as the particular symbol used is unimportant. A final use of labels is one we will call component labels, which refer to particular components of a tensor (in a particular concrete basis); for example,

    T_1123    (1.4)

refers to the component of T along the tensor product basis element e_1 ⊗ e_1 ⊗ e_2 ⊗ e_3. The task of a tensor computer algebra system—that is, to manipulate algebraic expressions with indexed objects—is complicated by two types of symmetries: re-labelling symmetries, which we have already hinted at, and intrinsic symmetries, which we will define in a moment. Re-labelling symmetries arise when there are dummy labels present, as illustrated by

    T^{ab}{}_{ab} = T^{bc}{}_{bc} .    (1.5)

Since the labels used to indicate each contraction are unimportant, the two sides of (1.5) are equal, even though they do not look the same.
In order to see that they are equal, the system must either canonicalize the sequence of dummy labels used (thus putting each monomial into a standard form), or use some sort of isomorphism-detection algorithm (in order to match one monomial to the other).

With only dummy-relabelling to contend with, this problem would be rather simple. However, a tensor can also have intrinsic symmetries, which are symmetries that involve permuting the tensor's slots. For example, a symmetric matrix M_ij = M_ji is symmetric irrespective of the labels placed into the slots of M itself.

[Footnote: Thus in T_{kikj}, the first and third slots are contracted, while the second and fourth slots are free. We will frequently refer to the slots by their names, saying that i, j are free, while k is contracted. Note that a given label may only be repeated twice, once upstairs and once downstairs, to indicate a single contraction.]

Intrinsic symmetries come in two distinct forms, which we will call mono-term symmetries and multi-term symmetries. Some simple examples of mono-term symmetries are the slot-exchange symmetries of the Riemann curvature tensor,

    R_abcd = −R_bacd ,    R_abcd = R_cdab ,    (1.6)

whereas an example of a multi-term symmetry is the algebraic Bianchi identity:

    R_abcd + R_bcad + R_cabd = 0 .    (1.7)

In the context of a calculation (that is, when labels have been placed in the slots), the intrinsic symmetries of a tensor monomial can interact with its re-labelling symmetries in a complicated way. For example, a tensor T_abcd which has no intrinsic symmetries may acquire the appearance of a slot symmetry when contracted,

    T^{ab}{}_{ab} = T^{ba}{}_{ba} ,    (1.8)

because we are free to exchange dummy labels. Similarly, if a tensor A_ab = −A_ba is intrinsically antisymmetric, then placing component labels into its slots can have non-trivial results:

    A_11 = A_22 = 0 ,    A_12 = −A_21 .    (1.9)

Here we can think of the first equalities as arising from the contradiction between the antisymmetry of A and the symmetry of exchanging identical component labels.
The same sort of situation can happen with dummy labels, if say we take M^{ab} A_{ab} where M is symmetric and A is antisymmetric. In general, all of these complications may be mixed within a single expression: a tensor monomial may have intrinsic slot symmetries (of both single- and multi-term types), some free indices, some dummy indices, and some concrete component labels, all at the same time.

Thus any CAS with the power to simplify tensor expressions is faced with a challenge to efficiently manipulate index symmetries. The greatest amount of progress has been made on mono-term symmetries, whereas multi-term symmetries are considerably more difficult to manage. For mono-term symmetries, there are two main approaches: index canonicalization and index isomorphism.

The index canonicalization approach seeks to reduce each monomial in an expression to its canonical form, which is defined as the least possible arrangement of its indices, modulo slot symmetries and label exchanges, where least is defined by lexicographical order. Simplification can then proceed as it does in a standard (i.e. non-tensorial) CAS, as all objects which are equal have been reduced to the same form. While processing a given monomial, the same algorithm can easily detect if the monomial is itself equal to zero due to its own internal symmetries (say, by having a pair of symmetric slots which are contracted against a pair of antisymmetric ones); thus, the canonicalization approach can also pre-emptively simplify some parts of expressions. The canonicalization strategy is well-suited for use as a plugin or package for a standard CAS such as Maple or Mathematica, since it reduces terms to a form which can then be manipulated by pre-existing routines.

By contrast, the index isomorphism approach works by attempting to match the various tensor monomials of an expression onto each other by constructing a bijection between their free indices (subject to symmetries).
With every successful bijection, the expression can be simplified by combining the corresponding terms. This approach has two advantages: First, the algorithm to make a single match could conceivably run faster than a canonicalization algorithm, since one has a definite goal in mind rather than needing to search the space of equivalent configurations for the "least" one. Second, a matching algorithm can treat compound expressions such as T^{ac} (S^b{}_{bcd} + V_{cd}) as a single monomial (·)^a{}_d, and thus simplify the expression tree without flattening it; whereas the canonical form of such an expression is not obvious, and may be ill-defined. However, the disadvantages of the isomorphism approach may outweigh the advantages: First, while a single matching test might run faster than canonicalization, one still has to make up to O(n²) matching tests, vs. O(n) canonicalizations (where n here is the number of terms in an expression). Second, a matching algorithm alone has no good way of detecting whether a single monomial vanishes due to its own symmetry, so in the end one has to run a separate test for that. And finally, attempting to use the index isomorphism approach within an existing (scalar) CAS is cumbersome and requires circumventing what that system already knows about expression simplification; it is an approach better suited to standalone software.

There are many implementations of the index canonicalization strategy in currently-available CAS software. Some of the more popular ones include the open-source
Canon package [2] for
Maple, which is also included in the
GRtensorII [3] suite of packages; the open-source xPerm package [4] of the package suite xAct [5] for
Mathematica; the standalone open-source tensor CAS
Cadabra [6] (which in fact uses the same code from xPerm); and the tensor subsystem of the standalone open-source Python CAS
SymPy [7]. Since Version 9,
Mathematica also includes a built-in implementation of tensor canonicalization, which is not open-source, but we can nevertheless draw conclusions about its algorithm from testing. There is currently only one software package which uses the index isomorphism strategy: the standalone tensor CAS Redberry [8].

[Footnote: At the time of writing, however, we do not know of any software which actually implements such an algorithm in a way that is efficient in the presence of slot symmetries. The notion of canonical-ness of compound expressions has not been explored in the literature, nor implemented in any software package to our knowledge, and it may well be that one can come up with a workable definition!]

These canonicalization implementations are based on the Butler-Portugal algorithm [10], which is efficient in the most common cases. However, in tensor calculus in higher dimensions (of particular interest to the high-energy theoretical physics community), there are certain types of simplification problems which thwart the Butler-Portugal algorithm and cause it to consume O(n!) memory and take O(n!) time. These are problems where tensors with many indices and a high degree of symmetry are contracted with each other; thus the slots of these highly-symmetric tensors are filled with dummy indices and are subject to a high degree of re-labelling ambiguity. The overlap of slot symmetries with re-labelling symmetries inhibits the algorithm's ability to deduce whether two arrangements of indices are actually different.

In this article we present a modification which improves upon the Butler-Portugal algorithm by automatically recognizing (in polynomial time) these worst-case situations, and adapting to avoid processing O(n!) redundant steps. Thus for tensor monomials with total symmetry or total antisymmetry over some subset(s) of slots, our improved algorithm can canonicalize them in polynomial time, even subject to index re-labelling symmetries. In the course of implementing these modifications for these worst-case situations, we introduce a number of data structures which significantly increase the algorithm's average-case performance as well.

Aside from (subsets of indices with) total symmetry or total antisymmetry, there are still situations that can lead to O(n!) behavior (such as a symmetric group which acts on pairs of indices rather than individuals), and there is no efficient means to identify all potential sources of combinatorial explosion. Thus our algorithm still has worst-case time and space complexities of O(n!), although the cases in which this behavior occurs are much less frequent in practical calculations.
However, in the event that one does expect these situations to occur, we discuss some possible strategies based upon the concepts introduced in this work.

This paper is structured as follows: in Section 2 we review tensor symmetries and permutation groups, and precisely define the problem to be solved; in Section 3, we review the Butler-Portugal algorithm; in Section 4 we present the improved algorithm; in Section 5 we give the results of performance testing on a few different types of tensor inputs; and in Section 6 we discuss these results and potential further improvements.

[Footnote: We note that some attempt has been made at improving the canonicalization of (anti)symmetric tensors in [11], although it is far from complete.]

2 Tensor symmetries and permutation groups
In this work we will focus on the mono-term symmetries of indexed objects and their relation to permutation groups. A mono-term symmetry of a tensor can be described by a signed permutation (ε, π) ∈ {−1, +1} × S_n, where S_n is the symmetric group on the tensor's slots:

    T_{a_1 a_2 … a_n} = ε T_{π(a_1 a_2 … a_n)} .    (2.1)

That is, the tensor returns to itself, up to a sign, after permuting its slots in some specific way. This type of symmetry encompasses symmetric and antisymmetric matrices, and most of the examples discussed in the Introduction.

[Footnote: One could also consider promoting ε to be a root of unity.]

[Footnote: By contrast, to describe a multi-term symmetry of a tensor (such as the one in the algebraic Bianchi identity (1.7)) requires several signed permutations {(ε_1, π_1), (ε_2, π_2), …, (ε_k, π_k)}, and takes the form

    T_{a_1 a_2 … a_n} − ε_1 T_{π_1(a_1 a_2 … a_n)} − ε_2 T_{π_2(a_1 a_2 … a_n)} − … − ε_k T_{π_k(a_1 a_2 … a_n)} = 0 .

Written in this way, we see that (2.1) arises as a special case. Dealing with multi-term symmetries efficiently is an important open problem in the symbolic computation literature, which we will not address here. One approach used in the Cadabra system [6] is to enforce the multi-term symmetry by projection onto Young tableaux, but this has the tendency to introduce many extra terms into an expression. Another approach is described in [12] and is based on graph isomorphism.]

For mono-term symmetries, one can address the canonicalization problem purely from the standpoint of permutation groups [10], which are well-studied [9, 13, 14]. Thus using this well-honed tool, we will focus on the mono-term symmetries.

First let us formalize the definition of the problem. It is useful first of all to think of a tensor monomial

    T_{bdafce}    (2.2)

as a map from a space of slots into a space of labels, which therefore informs us which label should go into which slot. We will refer to this map as a configuration g : S → L, where S is the space of slots and L is the space of labels. The configuration corresponding to (2.2) can then be written as

    g = ( 1 2 3 4 5 6 ; b d a f c e ) .    (2.3)

We note that g is essentially a permutation—if one instead replaces the labels a through f by numbers 1 to 6, then the notation (2.3) becomes the standard Cauchy two-line notation for a permutation g. Thus if we act first with a permutation (in cyclic notation) to exchange slots (1, 2), and then afterwards act with one to exchange (2, 3), we obtain

    g′ = ( 3 1 2 4 5 6 ; b d a f c e ) = ( 1 2 3 4 5 6 ; d a b f c e )  ⇒  T_{dabfce} .    (2.4)

The slot symmetries of a tensor monomial form a group S which acts on the slot space S.

If a tensor monomial has multiple repeated labels, such as when there are dummy pairs or concrete component labels, we may clarify what happens by treating the repeated labels as distinct. We can then implement their "sameness" via symmetries which exchange the labels. For example, we could have

    T_{abbc} = T_{a b_1 b_2 c} ,    (2.5)

where we attach a subscript to some of the labels to distinguish identical ones. Then the configuration could be given as

    ( 1 2 3 4 ; a b_1 b_2 c ) .    (2.6)

Due to the repetition of labels, even if T has no slot symmetries, the following configurations are equivalent to (2.6) (assuming the presence of a metric tensor):

    ( 1 2 3 4 ; a b_2 b_1 c ) ,    ( 1 2 3 4 ; a b^1 b^2 c ) ,    ( 1 2 3 4 ; a b^2 b^1 c ) ,    (2.7)

where the metric has been used to raise or lower the members of the pair. Thus we see that repeated labels of either type give us a symmetry group L that acts on label space L. These symmetries are not intrinsic to the tensor, but arise from the context; i.e., having placed labels in the slots of the tensor.

The mono-term symmetries of a tensor monomial then consist of the union of two distinct symmetry actions: those symmetries, intrinsic to the definition of T, which act on the slots S; and those symmetries, arising from context, which act on the labels L. Again treating the configuration g as a permutation, the group of slot symmetries S act from the right of g, and the group of label symmetries L act from the left.
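The right action just described can be made concrete in a few lines of Python (our own illustration, not code from the paper): a configuration is an array of labels, and a slot permutation s acts on the right by sending g to the array i ↦ g⟦s⟦i⟧⟧, reproducing (2.4).

```python
# A configuration maps slots to labels; here slot i holds g[i] (0-indexed).
g = ['b', 'd', 'a', 'f', 'c', 'e']

def act_right(g, s):
    """Right action of a slot permutation: new slot i holds the label
    that g assigned to slot s[i]."""
    return [g[s[i]] for i in range(len(g))]

def transposition(n, i, j):
    """The slot permutation exchanging slots i and j (0-indexed)."""
    s = list(range(n))
    s[i], s[j] = s[j], s[i]
    return s

g1 = act_right(g, transposition(6, 0, 1))   # exchange slots 1 and 2
g2 = act_right(g1, transposition(6, 1, 2))  # then exchange slots 2 and 3
print(g2)  # ['d', 'a', 'b', 'f', 'c', 'e'], i.e. T_dabfce as in (2.4)
```

Note that acting "first with s1, then s2" composes the slot permutations on the inside, matching the right-action convention described in the text.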
Thus the set of all equivalent configurations of the indices of a tensor monomial is the double coset LgS, and the task of index canonicalization is precisely that of finding the canonical representative of a double coset.

[Footnote: For dummy pairs we adopt the convention that the "downstairs" label comes first. This is opposite the convention of [10, 4].]
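For small groups one can enumerate the double coset LgS directly and take its least element; the following brute-force sketch (ours; the two groups here are hypothetical toy examples, and real canonicalizers never enumerate the coset like this) illustrates the definition on a 4-slot configuration.

```python
from itertools import product

def compose(p, q):
    # (p ∘ q)[i] = p[q[i]]
    return tuple(p[q[i]] for i in range(len(q)))

# Configuration: slot i holds label g[i] (labels are 1..4, slots 0-indexed).
g = (2, 4, 1, 3)

identity = (0, 1, 2, 3)
S = [identity, (1, 0, 2, 3)]         # toy slot group: may swap the first two slots
relabel = {1: 1, 2: 2, 3: 4, 4: 3}   # toy label group: may exchange labels 3 <-> 4
L = [lambda x: x, lambda x: relabel[x]]

# The double coset LgS: a slot symmetry acts on the right (permute slots),
# a relabelling acts on the left (rename the label found in each slot).
coset = {tuple(l(lab) for lab in compose(g, s)) for l, s in product(L, S)}
canonical = min(coset)   # least configuration in lexicographic order
print(canonical)         # (2, 3, 1, 4)
```

The canonical representative is simply the lexicographic minimum of this (here four-element) coset; the point of the Butler-Portugal algorithm is to find it without ever listing the coset.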
In all of this discussion thus far, we have ignored the fact that tensor symmetries are signed permutations. In fact, both groups S and L can have signed permutations; the ones in L can come from "antisymmetric metrics", used in spinor formalisms in certain dimensions, which can introduce minus signs when raising/lowering contracted indices, e.g.

    ψ^α χ_α = − ψ_α χ^α .    (2.8)

This corresponds, in label space, to the signed permutation −(α_1, α_2) that exchanges the two (distinguished) α's. It is not difficult to expand our formalism to include signed permutations. As was observed in [4], the sign is just an element of a Z_2 group, which can be represented as a permutation group. One just needs to add an extra two columns at the end of g:

    g = ( 1 2 3 4 5 6 7 8 ; b d a f c e + − ) ,    (2.9)

and in label space, allocate two arbitrary symbols for tracking the order of the last two columns. The action of the signed permutation −(2, 4) on (2.9) then gives

    g′ = ( 1 2 3 4 5 6 7 8 ; d b a f c e − + ) ,    (2.10)

and thus, one can work out whether the sign of any expression has flipped by checking the last pair of columns.

We point out that g has a fairly obvious computer implementation which makes clear our notions of left- and right-actions. If we store the configuration of (2.3), (2.9) in computer memory as the array

    g = ⟨b, d, a, f, c, e, +, −⟩ ,    (2.11)

then the map g can be implemented as array access i ↦ g⟦i⟧, for i ∈ S. One can easily extend this notion to the group multiplication of permutations, and thus for ℓ ∈ L and s ∈ S, one has schematically

    g′ = ℓ⟦g⟦s⟧⟧ ,    (2.12)

which corresponds to the stated notions of left and right.

[Footnote: Of course, to encode the symbols of the label alphabet, we should use numbers in the same range as the slot numbers, so this becomes g = ⟨2, 4, 1, 6, 3, 5, 7, 8⟩. So in the end, g is truly just a permutation map.]
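The sign-column bookkeeping is easy to sketch as well (an illustration under our own encoding assumptions, not the paper's code): a signed label permutation that exchanges the labels b and d and flips the overall sign acts on every entry of the array, the two sign symbols included.

```python
# Configuration array with two trailing sign columns, as in (2.11).
g = ['b', 'd', 'a', 'f', 'c', 'e', '+', '-']

# Encode labels a..f as 1..6 and the two sign symbols as 7, 8.
code = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, '+': 7, '-': 8}
decode = {v: k for k, v in code.items()}

# The signed permutation -(2,4): exchange labels 2 <-> 4 (i.e. b <-> d)
# and flip the sign by exchanging the two sign symbols 7 <-> 8.
ell = {1: 1, 2: 4, 3: 3, 4: 2, 5: 5, 6: 6, 7: 8, 8: 7}

# Left action: relabel every entry of g, sign columns included.
g_new = [decode[ell[code[x]]] for x in g]
print(g_new)  # ['d', 'b', 'a', 'f', 'c', 'e', '-', '+'], matching (2.10)
```

Reading off the last two columns ('-', '+' instead of '+', '-') tells us the overall sign has flipped.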
Finally, one of the main advantages of treating the configuration as a slots-to-labels map, rather than labels-to-slots, is that it becomes trivial to order configurations lexicographically; for example one has that ⟨b, a, f, c, e, d, +, −⟩ ≺ ⟨b, c, a, d, f, e, +, −⟩, etc., for the lexicographical ordering ≺. Our choice to append the sign columns to the end rather than the beginning is motivated by this lexicographic ordering: if a set of configurations contains both +g and −g, then these two configurations will be adjacent when the set is sorted lexicographically. Thus it becomes easy to detect when the only tensor consistent with a given set of index permutations is the zero tensor.

3 The Butler-Portugal algorithm

In [10], Manssur and Portugal adapt Butler's algorithm [9] for finding the canonical representative of a double coset to the specific problem of simplifying tensor monomials with dummy indices. Their main modification, which results in a highly-efficient algorithm for most tensor monomials in practice, is to take advantage of the simple and predictable nature of the label permutation group L for dummy indices. In [4], Martín-García extends the algorithm of [10] to include dummy indices drawn from multiple separate pools (in the case that a tensor has indices belonging to different vector bundles), as well as component labels. The label permutation group L retains a simple structure, which allows these modifications to run equally efficiently.

Like many algorithms in finite group theory, the Butler-Portugal algorithm makes use of the concepts of a base and strong generating set to deal with the permutation groups in an efficient way. We refer the reader to Appendix A for a brief presentation of these concepts.

Notational conventions
Before delving into algorithms, we briefly mention some notational conventions. We will use curly braces {…} to denote sets, in the strict sense of being unordered. To denote arrays or lists, whose elements are ordered by their position, we use angle brackets ⟨…⟩. To denote array access, we use double square brackets; thus g⟦i⟧ is the i-th entry of g. Double square brackets can also denote the action of a permutation, if their argument is itself a permutation map rather than a single number; thus g⟦s⟧ denotes the permutation that results by acting with g on s. The convention for ⟦·⟧ is left multiplication; whereas in cyclic notation, the equivalent action is right multiplication. Thus the array g⟦s⟧ is the same as the array corresponding to s · g in cyclic notation; that is, the action on some element i is "first s, then g":

    g⟦s⟧⟦i⟧ = g⟦s⟦i⟧⟧ = i^{s·g} .    (3.1)

In the pseudocode of the algorithms we present, we will adopt a few conventions on typefaces. Variables will be indicated by sans-serif type, and arrays will be indicated by Sans-Serif-with-Caps. Function names will take
Small-Caps-Camel-Case.

[Footnote: Alternatively, if one wants to use cyclic notation for permutations, it is more convenient to work with the right-action i ↦ i^g, in which case the notions of left and right are reversed: g′ = s · g · ℓ.]

We choose an ordering ≺ in which all of the free indices come first, and then the component indices and dummy indices of various types. For dummy indices, the ordering is defined such that they come in pairs, lower first and then upper. Therefore the indices of the tensor T_{abfgccddee} are in lexicographical order.

The label symmetry group

The observation of [10] which contributes most to the speed of its algorithm is that the label symmetry group L takes a simple form. As an example, take the tensor monomial

    T_a{}^a{}_b{}^b{}_c{}^c ,    (3.2)

whose indices come only in contracted pairs. Acting on the label space L = {a_1, a_2, b_1, b_2, c_1, c_2}, we can immediately see that the label symmetry group is given by the signed permutations

    L = { +(1,2), +(3,4), +(5,6), +(1,3)(2,4), +(3,5)(4,6) } ,    (3.3)

and moreover, this is a strong generating set with respect to the base B = ⟨1, 3, 5⟩. More importantly, however, it is simple to re-order the base: If instead we want a strong generating set with respect to the base B = ⟨3, 1, 5⟩, then we need only conjugate the set (3.3) by the signed permutation +(1,3)(2,4), which obtains

    L = { +(3,4), +(1,2), +(5,6), +(1,3)(2,4), +(1,5)(2,6) } .    (3.4)

Butler's algorithm [9] for the canonical representative of a double coset relies on repeatedly changing the base order for one of the two subgroups. The best known algorithms for an arbitrary base rearrangement on generic groups require O(n²) time [16, 17]. The conjugation operation, however, is much faster. Portugal observed [10] that for tensor canonicalization, the fixed, simple structure of the label symmetry group always allows for a much-streamlined base-change-by-conjugation.

A similar situation occurs with component labels. This time take the tensor monomial

    T_{1111} ,    (3.5)

with label space L = {1_1, 1_2, 1_3, 1_4}.
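Before moving on, we note that the base-change-by-conjugation from (3.3) to (3.4) is mechanical and can be checked directly. The following sketch is our own illustration (signs are omitted, since every generator here carries a + sign): it conjugates the generating set by (1,3)(2,4).

```python
def compose(p, q):
    # Arrays as permutations of {0..n-1}: (p ∘ q)[i] = p[q[i]].
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)

def from_cycles(n, cycles):
    """Build a permutation array from 1-indexed disjoint cycles."""
    p = list(range(n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a - 1] = b - 1
    return tuple(p)

n = 6
# Strong generating set (3.3) for the dummy labels of (3.2):
gens = [from_cycles(n, [(1, 2)]), from_cycles(n, [(3, 4)]),
        from_cycles(n, [(5, 6)]),
        from_cycles(n, [(1, 3), (2, 4)]), from_cycles(n, [(3, 5), (4, 6)])]

# Conjugate every generator by c = (1,3)(2,4) to re-order the base <1,3,5> -> <3,1,5>.
c = from_cycles(n, [(1, 3), (2, 4)])
conjugated = [compose(compose(c, p), inverse(c)) for p in gens]

# The last generator (3,5)(4,6) becomes (1,5)(2,6), as in (3.4):
print(conjugated[-1] == from_cycles(n, [(1, 5), (2, 6)]))  # True
```

No group-membership machinery is needed: conjugating each generator is a constant number of array compositions.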
The label symmetry group is just the symmetric group on four elements, whose strong generating set we can easily write down:

    L = { +(1,2), +(2,3), +(3,4) } ,    (3.6)

with respect to the base B = ⟨1, 2, 3, 4⟩. Again, re-ordering the base can be done by conjugation, thus avoiding any more expensive group theory algorithms.

The slot symmetry group

The slot symmetry group S is less well-behaved, because in principle it can be any permutation group. However, Butler's algorithm to find the canonical double coset representative only requires changing the base order of one of the subgroups. Therefore we can leave the base of the slot symmetry group intact, and change only the base of the label symmetry group.

We will adopt a further convenience that the base of the slot symmetry group will be chosen to be a complete, position-ordered base. Thus if a tensor has 6 slots, the base chosen for its slot symmetry group will always be B = ⟨1, 2, 3, 4, 5, 6⟩ (or, including the two columns for the sign, B = ⟨1, 2, 3, 4, 5, 6, 7, 8⟩). Typically this results in some redundancy; for example, if a slot symmetry group only moves points {1, 2, 3}, then there is no reason to include points {4, 5, 6} in the base. There is some small overhead cost in skipping past the redundant base entries; however, the tradeoff here is simplicity in describing the algorithm. We will always canonicalize with respect to strict lexicographic order ≺, as it is more human-readable than alternatives.

The algorithm

We are now prepared to describe the Butler-Portugal algorithm for tensor canonicalization. The basic strategy is straightforward: Proceed slot by slot from 1 to n and find, for each slot, the least (lexicographically) label which can be moved into that slot via a combination of slot and label symmetries. Once a slot has been filled with the least available label, that slot is frozen and we move on to the next slot.
Since the slots are consumed in position-order and the base for the slot symmetry group is also in position-order, freezing a slot is the same as moving to the next S^(i) in the stabilizer chain of S. The labels, on the other hand, are consumed in an order which cannot be predicted before running the algorithm. Thus we must re-order the base of the label symmetry group at every step. At each step the least possible label is placed at the first available position, and thus at the end we are guaranteed to have the arrangement of labels, out of all elements in LgS, that is lexicographically first.

The main complication of the algorithm is that at each step, there may be multiple different ways in which the least label can be moved into the current slot, thus bifurcating the search tree. We are forced to track each possibility, because only at some future step will we be able to resolve the correct choice.

[Footnote: We note that this convention is a departure from Portugal's, which uses an ordering ≺_B defined by the base order. So if the base is given as B = ⟨3, 1, 2⟩, one first extends this to a complete ordering by appending the remaining slot numbers, B = ⟨3, 1, 2, 4, 5, 6⟩, and then ≺_B is defined relative to this ordering (and hence 3 ≺_B 1, for example). The running time of an algorithm can be sensitive to the length of the base, which in turn is influenced by the choice of base order, and in some cases it may be worth finding a more optimal choice (however, optimizing the base order is itself a costly operation!). We prefer to keep the positional/lexicographic order for simplicity, although it is not hard to modify the algorithms here for base-ordering.]

Algorithm 3.1 The Butler-Portugal algorithm for tensor canonicalization
Input: Initial configuration g_init; slot symmetry group S; label symmetry group L
Output: Canonical configuration g_can, or zero if the tensor vanishes

 1: procedure Butler-Portugal(g_init, S, L)
 2:     Configs ← {g_init}
 3:     for i ← 1 to n do
 4:         Δ_i^S ← {orbit of slot i under S^(i)}
 5:         global-least-label ← n;  Config-Slots-of-Least-Label ← {}
 6:         for each g ∈ Configs do
 7:             for each j ∈ Δ_i^S do
 8:                 Δ^L_{g⟦j⟧} ← {orbit of label g⟦j⟧ under L^(i)}
 9:                 least-label-in-orbit ← lexicographically-least label in Δ^L_{g⟦j⟧}
10:                 if least-label-in-orbit < global-least-label then
11:                     global-least-label ← least-label-in-orbit
12:                     Config-Slots-of-Least-Label ← {}
13:                 end if
14:                 if least-label-in-orbit = global-least-label then
15:                     Config-Slots-of-Least-Label ← Append(Config-Slots-of-Least-Label, (g, j))
16:                 end if
17:             end for
18:         end for
19:         L ← Re-order-Base(L, i, global-least-label)
20:         Next-Configs ← {}
21:         for each (g, j) ∈ Config-Slots-of-Least-Label do
22:             s ← Coset-Rep(j, S^(i))
23:             ℓ ← Permutation-Inverse(Coset-Rep(g⟦j⟧, L^(i)))
24:             Next-Configs ← Append(Next-Configs, ℓ⟦g⟦s⟧⟧)
25:         end for
26:         Configs ← Remove-Duplicates(Sort(Next-Configs))
27:         if Configs contains equal configurations +g and −g of opposite sign then
28:             return 0
29:         end if
30:     end for
31:     return g_can = first element of Configs
32: end procedure

Suppose, for example, that T has the cyclic slot symmetry

    T_{abc} = T_{bca} = T_{cab} ,    (3.7)

and we wish to canonicalize an expression like

    T_{cbc} U_{baa} ,    (3.8)

where U is some other tensor that has no symmetries. We can find three different ways to put the label a into the first slot; one for each of the cyclic configurations of (3.7), while simultaneously exchanging the labels (a, c) or (a, b) as appropriate,

    T_{aba} U_{bcc} ,    T_{acc} U_{abb} ,    and    T_{aab} U_{bcc} ,    (3.9)

and in the last case we have also applied the metric tensor to swap the raised and lowered a. Since we must now freeze the first slot, it is clear that only the last configuration in (3.9) will lead to the least possible arrangement by lexicographic order; however, one cannot see which of these three possibilities should be kept until we examine the second slot. It is not hard to imagine situations that frustrate the decision process for arbitrarily many steps before finally allowing us to resolve the correct choice, and thus we must potentially store many intermediate configurations as we traverse the search tree. Thus in addition to iterating over the slots, we must also iterate over the set of intermediate configurations {g_i}. We store the set of configurations at the current level of the search tree in a variable Configs, and prepare the set of configurations at the next level of the search tree in
Next-Configs.

In Algorithm 3.1 we present a high-level view of the algorithm of [10], glossing over some details of implementation. The groups S and L are presumed to be stored via their bases and strong generating sets, and the base for S is ordered by slot position. The algorithm will internally modify L by re-ordering its base in line 19 once for each slot, when a label is selected to be placed in that slot. Each step of the main iteration over slots is split into two phases: First, in lines 4–18, we obtain the least available label that can be moved into the current slot under the action of the symmetry groups S(i) and L(i), which respectively stabilize the already-consumed slots and the already-consumed labels. At the same time, we populate the set Config-Slots-of-Least-Label with ordered pairs (g, j), where g ∈ Configs is one of the nodes at this level of the search tree, and j is a slot reachable from i (via some s ∈ S(i)) wherein it is possible to encounter the least-available label (via some ℓ ∈ L(i)). In the second phase, from lines 19–25, we iterate through each of the (g, j) just collected and populate the set Next-Configs with the next level of the search tree, given by ℓ⟦g⟦s⟧⟧, where s⟦i⟧ = j and ℓ⟦g⟦j⟧⟧ = global-least-label.

The function Re-order-Base(L, i, global-least-label) in line 19 re-orders the base of L such that the i-th base point becomes global-least-label, while keeping the first i − 1 points unmodified. The main insight of [10] is that Re-order-Base can be implemented for the label symmetry group via conjugation by the appropriate label swap. The function Coset-Rep(j, S(i)) in line 22 returns a group element of S(i) which maps slot i to slot j (there may be several such group elements; it does not matter which is returned). Similarly, the call to Coset-Rep(g⟦j⟧, L(i)) in line 23 returns a group element of L(i) which maps the i-th base point (which was earlier set to global-least-label) onto the label g⟦j⟧. The functions Permutation-Inverse and Append are straightforward.

We now comment briefly on the data structures needed to represent the various quantities in Algorithm 3.1. In order to implement
Coset-Rep we presume that some sort of
Schreier tree structure (various examples are described in [16, 17, 18, 19]) has been constructed in the course of enumerating the orbits Δ_i^S and Δ^L_{g⟦j⟧}; Coset-Rep then constructs a group element by tracing the Schreier tree from the specified leaf back to the root. In fact, since the base of S is never re-ordered, it is possible to compute Δ_i^S ahead-of-time, if computational time is at a higher premium than memory space. There are a few different options to store all of the Δ_i^S together with their Schreier trees, such as a labelled branching [18] or a Schreier vector structure [19].
Config-Slots-of-Least-Label to store not the actual configuration elements g , but rather an index into the Configs array,etc.
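The Schreier-tree lookup that Coset-Rep performs can be sketched as follows. This is a minimal illustration of our own in Python, not the paper's implementation: it uses a plain breadth-first Schreier vector rather than the balanced "cube" trees discussed later, and the function names are ours.

```python
from collections import deque

def build_schreier_vector(root, gens):
    """Enumerate the orbit of `root` under permutation generators
    (tuples of images of 0..n-1), recording for each orbit point the
    generator used to reach it.  The root is marked with None."""
    sv = {root: None}
    queue = deque([root])
    while queue:
        p = queue.popleft()
        for g in gens:
            if g[p] not in sv:
                sv[g[p]] = g
                queue.append(g[p])
    return sv

def coset_rep(j, root, sv, n):
    """Trace the Schreier vector from leaf j back to the root, composing
    generators to obtain some group element mapping root to j."""
    invert = lambda g: tuple(sorted(range(n), key=lambda x: g[x]))
    compose = lambda a, b: tuple(a[b[x]] for x in range(n))  # a after b
    u = tuple(range(n))
    while j != root:
        g = sv[j]
        u = compose(u, g)
        j = invert(g)[j]
    return u

# Orbit of slot 0 under the cyclic generator (0 1 2):
gens = [(1, 2, 0)]
sv = build_schreier_vector(0, gens)
print(coset_rep(2, 0, sv, 3))  # (2, 0, 1): an element mapping slot 0 to 2
```

The balanced-tree variants cited above bound the length of the traced path, which is what keeps each Coset-Rep call cheap inside the inner loop.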
In [10], Portugal shows by direct testing that for the case of many Riemann tensors with random contractions, Algorithm 3.1 appears to have an average-case time complexity of O(n) for n the number of slots (in fact, our measurements in Section 5 show that the particular implementation in Mathematica runs slightly faster than O(n)). While the Riemann tensor is presented as having one of the more complicated slot symmetry groups that is typically encountered in computations, in fact it represents one of the easier (non-trivial) cases for the algorithm to resolve. The worst-case behavior actually comes from fully symmetric tensors (or fully antisymmetric) when they are fully contracted, for example:

T bdcfae S ebfdac , where T abcdef = T (abcdef) . (3.10)

Such tensor monomials suffer from a re-labelling ambiguity which maximally frustrates the decision-making of the algorithm, rendering it completely unable to prune the search tree. Let us follow this example through a few iterations of the algorithm to see why:

In our particular implementation of the algorithm in Section 4.3, we have chosen a "cube Schreier tree" structure as detailed in [17]. This structure requires more time to create than a standard Schreier tree, but the tree is guaranteed to be balanced and thus minimizes the time used by
Coset-Rep.

Beginning with (3.10), there are 6 ways to move the label a into slot 1, which correspond to the following slot and label symmetries:

s ∈ S : id (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
ℓ ∈ L : (a, b) (a, d) (a, c) (a, f) id (a, e) , (3.11)

which in turn give the configurations

T adcfbe S eafdbc , T abcfde S ebfadc , T adbfce S ebfdca , (3.12)
T adcbfe S ebadfc , T adcfbe S ebfdac , T adcfeb S abfdec . (3.13)

Now we must examine slot 2 on each of these configurations. For each configuration, we will find that there are 5 distinct ways to move label b into slot 2, and thus the set of configurations will grow to 6 · 5 entries. At the next step, on each of these entries we will find 4 ways to move label c into slot 3, and so on, so that the largest intermediate result contains 6! = 720 entries, and the total time examining all of the intermediate results is 1 + 6 + 6 · 5 + 6 · 5 · 4 + . . . + 6! = 1957, multiplied of course by whatever time it takes to execute each iteration of the internal loops. Clearly in such cases the Butler-Portugal algorithm will experience combinatorial explosion in both time and space. In a sense this behavior is especially frustrating, because to the end-user with human eyes, the canonical result is "obviously" the one with all of the indices sorted:

T abcdef S abcdef , (3.14)

and yet the algorithm will take quite a while to work this out. Increase the number of indices to 12 and the algorithm will use 24 GB of RAM and take a full day. Increase the number of indices to 14 and the algorithm (if allowed to write intermediate results to disk) will completely fill a 4 TB hard drive over the course of 30 years. This problem is noted by the authors of [5], and they attempt to mitigate it by first sorting any tensor products (such as
T S here) by increasing group order of their slot symmetry groups. Thus S, since it has no symmetries, will be put first, and then the algorithm becomes linear, as there is no longer any ambiguity of intermediate results. This is a dramatic improvement; however, it is still thwarted when both tensor factors have a high degree of symmetry, as in

T bdcfae T ebfdac . (3.15)

The space requirements here are calculated assuming each configuration is an array of (n + 2)

If one is doing the sort of computation where one expects to see a lot of symmetric/antisymmetric tensors, even with a moderate number of indices, the behavior of Section 3.4 is troubling. One needs an efficient solution. One naïve strategy is to treat fully (anti)symmetric tensors in a special way, perhaps by simply sorting their indices (and inserting the appropriate sign for antisymmetric tensors). But we can quickly see that this will not be sufficient, for two reasons: First, such tensors can appear in tensor products with tensors having other symmetries, and thus to be fully general, an algorithm must be able to deal with arbitrary (anti)symmetric subsets of indices, rather than having special code only for the case of full (anti)symmetry. Second, when including dummy contractions in such tensor products, it is not enough merely to sort the (anti)symmetric subsets of indices, because dummy labels may be exchanged while canonicalizing neighboring tensors, thus causing the notion of "sorted" to change as well. A more intelligent approach is needed.
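The combinatorial growth described in Section 3.4 is easy to verify numerically. As a small sketch of our own: with n mutually interchangeable dummy labels, the unpruned search tree holds n!/(n−k)! configurations after k slots have been fixed, so the total number of configurations examined (counting the initial one) is 1 + n + n(n−1) + · · · + n!.

```python
from math import factorial

def configs_examined(n):
    """Total configurations visited by the unpruned Butler-Portugal
    search on a fully contracted, totally (anti)symmetric rank-n
    monomial: 1 + sum over k of n!/(n-k)!."""
    return 1 + sum(factorial(n) // factorial(n - k) for k in range(1, n + 1))

print(configs_examined(6))  # 1957, the count quoted in Section 3.4
```

The peak memory use is the largest level of the tree, n! configurations, which is what produces the RAM and disk figures quoted above as n grows.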
It turns out that a useful way to approach the problem is to employ the
Penrose graphical notation for tensor contractions, first published in [28] (see also [29] for an alternative version). In this notation, tensors are represented by various (arbitrary) shapes, and their indices are represented by lines emanating from these shapes. For example, one might have tensors θ and χ given by:

θ a bc = θ abc , χ ab c d = χ abcd . (4.1)

Lines going to the top of the diagram represent upper indices, while lines going to the bottom of the diagram represent lower indices. To represent contracted indices, we merely connect the lines:

θ af c χ bfde . (4.2)

Of course, the label f is meaningless; the important information is the line connecting θ and χ which shows the contraction. One also has special symbols for the metric, inverse metric, and Kronecker delta,

a b = g ab , a b = g ab , a b = δ ba , (4.3)

which are made only of connecting lines, and which obey the obvious graphical relation

g ac g cb = a c b = a b = δ ba . (4.4)

Combining these with the shapes for tensors like θ and χ, one can create diagrams for tensor contractions of arbitrary complexity. There are some additional features of the notation which we will not need here; we refer the reader to [28] for more detail.

For our purposes, it is useful to enlarge the diagrams a bit and write in the slot number to which a line is attached. For example, the tensor contraction

T abcdef U edfcgh (4.5)

can be represented by the diagram in Figure 1, where the slots are numbered 1-12 in the order they appear in (4.5). It is instructive to start from the arrangement in Figure 1 and walk through the steps a human might take to canonicalize it. Suppose, for example, that the tensor T is totally symmetric over the slots {3, 4, 5, 6}. This situation is depicted in Figure 2(a). The dummy labels c, d, e, f are not meaningful, but the objective is to find a combination of slot- and label-rearrangements which minimizes (4.5) with respect to lexicographical ordering.

Figure 1: Penrose graphical notation for the tensor contraction T abcdef U edfcgh . The slots are numbered 1-12 in positional order.

The first step is to "disentangle" the contracted edges in Figure 2(a) by moving the endpoints which are attached to the slots {3, 4, 5, 6} where the slot symmetry acts. This results in

T abedfc U edfcgh , (4.6)

as shown in Figure 2(b). The final step is to re-name all of the dummy labels in order of appearance, which produces

T abcdef U cdefgh , (4.7)

as shown in Figure 2(c). Assuming the tensor U has no slot symmetries, then we have achieved the least possible lexical order, with the dummies ⟨c, d, e, f, c, d, e, f⟩ repeated in order.

The above process is a strict application of the available symmetry groups: first using slot symmetries to make the two sets of dummies match in order ⟨e, d, f, c, e, d, f, c⟩, and then using label symmetries to rename these into the correct order. It is easy to describe this process in words, but to put it in an algorithm which visits each slot only once, scanning from left to right, is a bit non-trivial. Since the Butler-Portugal algorithm does not backtrack once it has made a decision to place a label in a slot, it must instead retain enough information that it can make the decision later. In this case, it will see several ways to put ⟨c, d, e, f⟩ into slots ⟨3, 4, 5, 6⟩, and only begin to resolve which of those possibilities to retain when it reaches slot 7. The decisions needed to reduce the possibilities down to the 1 correct one are not fully made until it visits slot 10. Of course, we do not wish to introduce backtracking into the algorithm. Instead, we will slightly redefine the problem in a way that makes it amenable to a strict left-to-right pass without having to retain (and iterate over!) a factorial number of intermediate states. Our main observation is that the symmetries at one end of an edge in a Penrose tensor diagram induce symmetries of the opposite end.
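This observation can be made concrete with a toy calculation, a sketch of our own using bare index strings rather than the paper's data structures: globally renaming two dummy labels and then using T's total symmetry on slots 3-6 to restore T's index order leaves T untouched but permutes the labels on U.

```python
def relabel_and_resymmetrize(t, u, x, y):
    """Swap dummy labels x and y in both index strings, then use the
    assumed total symmetry of T on its slots 3-6 (0-based 2..5) to sort
    those slots back into order."""
    swap = str.maketrans({x: y, y: x})
    t2, u2 = t.translate(swap), u.translate(swap)
    t2 = t2[:2] + "".join(sorted(t2[2:6])) + t2[6:]
    return t2, u2

# T_abcdef U_edfcgh with T totally symmetric on slots 3-6:
t2, u2 = relabel_and_resymmetrize("abcdef", "edfcgh", "c", "d")
print(t2, u2)  # abcdef ecfdgh -- T is unchanged, U's labels permuted
```

The net effect is a rearrangement acting purely on the U-side ends of the contraction edges, which is exactly the "induced symmetry of the opposite end" described above.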
Thus the slot symmetries of tensor T can "propagate" along the connecting lines to tensor U, and induce new symmetries which appear to act on the slots of U (as we shall see, however, they actually act on the labels!). This fact is somewhat obvious from the Penrose graphical notation, but is obscured in the standard index notation because the "propagation" of symmetries along contractions may be effected only by executing slot and label symmetries at the same time.

Figure 2: We use the total symmetry group on slots {3, 4, 5, 6} to "disentangle" the contractions. In (a), we have the initial configuration T abcdef U edfcgh . In (b), we act with symmetry to re-arrange the ends attached to {3, 4, 5, 6}, giving us T abedfc U edfcgh . Finally in (c), we re-name the dummy labels, giving T abcdef U cdefgh .

To illustrate, we walk through the same example. We begin with (4.5), as shown in Figure 3(a). Now, instead of rearranging the endpoints attached to {3, 4, 5, 6}, let us actually freeze them in place; after all, the labels there are already ⟨c, d, e, f⟩, which is what we want. But there is a symmetry among the slots {3, 4, 5, 6} which we have not used, and which is crucial to obtain the correct result. Let us "propagate" this symmetry forward, along the legs attached at {3, 4, 5, 6}, to the slots {7, 8, 9, 10} on tensor U. In fact, for reasons we will soon explain, we will consider this symmetry to be associated to the labels {e, d, f, c} which are present at slots {7, 8, 9, 10}, as depicted in Figure 3(b). The tensor U does not actually have a symmetry in the slots {7, 8, 9, 10}, but by simultaneously applying the slot symmetries on {3, 4, 5, 6} and the standard label symmetries, one can give the appearance that U has such a symmetry:

T abcdef U edfcgh = T abcdef U fedcgh = T abcdef U cdefgh = etc. (4.8)

This is what we mean by a "propagated" symmetry. Thus we may freeze the T-attached endpoints of the contractions in place, and treat the U-attached endpoints as though they are free to move. This allows us to rearrange the labels {e, d, f, c} on tensor U independently of those on tensor T, and thus reach the canonical configuration as shown in Figure 3(c). With this notion of symmetry propagation along dummy contractions, we can effectively implement the necessary decision process in a single left-to-right pass without storing many intermediate configurations.
Instead we must store information about the symmetries to be propagated forward. For arbitrary symmetry groups, this might become unwieldy; but if this method is used only for totally (anti)symmetric subgroups, then it is simple to implement. One needs only a single array of length n to store all of the information regarding propagated symmetries (as will be described in Section 4.2). One can then take advantage of this information to make a single definite decision at each new visited slot, completely eliminating the O(n!) intermediate steps needed in the Butler-Portugal approach.

Figure 3: Alternate way to view the process. In (a), we have the initial configuration T abcdef U edfcgh . In (b), we freeze slots {3, 4, 5, 6} in place. The configuration remains T abcdef U edfcgh , but we propagate the symmetry along the edges, so that the original {3, 4, 5, 6} slot symmetry of T becomes a label symmetry of the labels attached to {7, 8, 9, 10} of U. Finally, in (c) we apply this label symmetry to re-arrange the U-attached endpoints (remembering that the T-attached ends do not participate!), giving the final configuration T abcdef U cdefgh .

We have stated that slot symmetries which have been propagated along dummy contractions become label symmetries, and this requires some explanation. Up until now, the tensor U had no symmetries of its own. But suppose instead it had a single symmetry (9, 11)(10, 12), which exchanges the last two pairs of slots. Our lexicographical ordering prioritizes free indices ahead of dummies, so we should use this symmetry to move ⟨g, h⟩ ahead of ⟨f, c⟩. But the propagated symmetries, which originated from the total symmetry of slots {3, 4, 5, 6} of T, must remain associated to the contraction edges which attach to slots {3, 4, 5, 6}. Therefore, the propagated symmetries will now be associated with slots {7, 8, 11, 12} as shown in Figure 4. It is clear that we should really associate the propagated symmetry with the labels {e, d, f, c}, and thus the generators

{ (c₂, d₂), (d₂, e₂), (e₂, f₂) } (4.9)

should be appended to the label group L (here, as before, the subscript 2 refers to the appearance of the label as an upper index; namely on the tensor U).
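Generators of this form are produced mechanically. As a minimal sketch of our own (writing c2 for c₂, etc.): a totally symmetric dummy set propagated onto the U-side label copies is generated by adjacent transpositions of those copies, which can then be appended to L.

```python
def propagated_generators(label_copies):
    """Adjacent transpositions generating the full symmetric group on
    the propagated (U-side) copies of a totally symmetric dummy set."""
    return [(label_copies[k], label_copies[k + 1])
            for k in range(len(label_copies) - 1)]

# Propagating T's total symmetry onto the upper copies c2..f2:
print(propagated_generators(["c2", "d2", "e2", "f2"]))
# [('c2', 'd2'), ('d2', 'e2'), ('e2', 'f2')]
```

For a totally antisymmetric subset the same transpositions are used, each carrying a minus sign.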
In this section we give several data structures which are needed to implement the symmetry-propagation mechanism described in Section 4.1.1. These data structures are also useful for improving the basic Butler-Portugal algorithm, as they can reduce most of the label-group computations of Algorithm 3.1 to mere array lookups.

Figure 4: Symmetry propagation with additional slot symmetries. If U also has a slot symmetry, for example (9, 11)(10, 12), then the propagated symmetry (over the labels {c, d, e, f}) must move to the new slots. Hence propagated symmetries are label symmetries, not slot symmetries.

These data structures we will refer to as the Values array, the
Label-Groups array, the
Symmetric-Subsets array, and the
Propagated-Symmetries array. All of these are simple length- n arrays, and thus using these data structures incurs anadditional O ( n ) memory cost only.We will also describe Config-Slots-of-Least-Value , which is a slightly different way of trackingthe current level of the search tree, and the notion of a
Least-Value-Set , which is just a usefulway to group the entries of
Config-Slots-of-Least-Value.

Values array

Lines 4–18 of Algorithm 3.1 are occupied with finding the least available label which can be moved into the current slot. The costliest parts of this code are line 4 and line 8, where the orbits Δ_i^S and Δ^L_{g⟦j⟧} are computed. In fact, the slot orbit Δ_i^S can be computed once-and-for-all at the time a tensor and its symmetries are declared, and stored in a type of Schreier tree structure, which we will not detail here (see, for example, [16, 17, 18, 19]). However, the label orbit Δ^L_{g⟦j⟧} depends on which labels have been placed in a tensor's slots, which comes from the context of a specific tensor expression. It would appear that this must be computed on-the-fly. However, we can introduce the concept of the value of a label, which can achieve two useful things simultaneously: First, it can reduce lines 8 and 9 of Algorithm 3.1 to a simple array lookup, thus skipping the computation of Δ^L_{g⟦j⟧} altogether; second, it can concisely encode all the data as to which indices are free, or belong to a given dummy set or repeated set, into a single array of length n. The concept is simple, and satisfies the properties of Definition 1. Take, for example, the tensor expression

T 00abbc = T 0₁ 0₂ a b₁ b₂ c , (4.10)

The exception is when a label belongs to a dummy set for a vector bundle on which there is no defined metric. Then upstairs labels cannot mix with downstairs ones. The mechanics of this case will be explained in Section 4.2.2.

Definition 1: Properties of the
Values array.
V1.
The value of a label is equal to the number encoding the least label in the group to which the label belongs;
V2.
Each label in a group of interchangeable labels has the same value;
V3.
Every free index belongs to its own group;
V4.
A group of interchangeable labels is usually contiguous in label space.

where as before, we insert the subscripts 1, 2 to distinguish identical labels. The labels have a natural lexicographical ordering ⟨a, c, 0₁, 0₂, b₁, b₂⟩ which we use as the index into the array. Then the Values array can be written

Label : a c 0₁ 0₂ b₁ b₂
Value : 1 2 3 3 5 5 . (4.11)

As a more elaborate example, consider again the contraction

T abcdef U edfcgh = T ab c₁ d₁ e₁ f₁ U e₂ d₂ f₂ c₂ gh . (4.12)

The Values array corresponding to this expression is given by

Label : a b g h c₁ c₂ d₁ d₂ e₁ e₂ f₁ f₂
Value : 1 2 3 4 5 5 5 5 5 5 5 5 . (4.13)

Notice in each case that all labels which can be interchanged by label symmetries have the same value; moreover, that value is equal to the (numerical) index of the least label itself. Then, for example, if one needs to know the least label that can be moved into slot 7 of (4.12), the process is simple:

1. Look up the label in slot 7 of (4.12): e₂.
2. In the Values array (4.13), look up the value of e₂: 5.
3. In the list of sorted labels ⟨a, b, g, h, c₁, c₂, d₁, d₂, e₁, e₂, f₁, f₂⟩, take the 5th entry: c₁.

and thus we see that c₁ is the least label that can be moved into slot 7, without having to explicitly enumerate the orbit Δ^L_{g⟦7⟧}. Of course, since the algorithm visits the slots in order, we should have already placed a, b, and four of the dummy labels by the time we reach slot 7. Once a label has been consumed

Table 1: Group codes for
Label-Groups array.
Code | Meaning
∅ | No label exchange
∼ | Component labels such as ⟨0, 1, 2, 3⟩
+ | Dummy labels with symmetric metric such as g ab
− | Dummy labels with antisymmetric metric such as ε αβ
⊥ | Lower dummy labels with no metric
⊤
| Upper dummy labels with no metric

by placing it in a slot, it is no longer interchangeable with the unused labels, and the
Values array must reflect that fact for the correct functioning of the algorithm. As we shall detail in Section 4.3, the
Values array will be updated at the end of every step to ensure that the properties of Definition 1 continue to hold.
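As a concrete sketch (our own illustration with hypothetical names, writing c1, c2 for c₁, c₂), the lookup described above for the expression T abcdef U edfcgh with a symmetric metric reduces to two array accesses:

```python
# Sorted label list and Values entries per (4.13); both are fixed-size,
# length-n structures.
labels = ["a", "b", "g", "h", "c1", "c2", "d1", "d2",
          "e1", "e2", "f1", "f2"]
values = {"a": 1, "b": 2, "g": 3, "h": 4,
          "c1": 5, "c2": 5, "d1": 5, "d2": 5,
          "e1": 5, "e2": 5, "f1": 5, "f2": 5}

def least_label_for(label):
    """The least label interchangeable with `label`: its value points at
    the corresponding (1-based) entry of the sorted label list, so no
    label orbit ever needs to be enumerated."""
    return labels[values[label] - 1]

slots = ["a", "b", "c1", "d1", "e1", "f1",   # T_abcdef
         "e2", "d2", "f2", "c2", "g", "h"]   # U_edfcgh
print(least_label_for(slots[6]))  # slot 7 holds e2; the least label is c1
```

When a label is consumed, the update described in the text amounts to bumping the values of the remaining members of its group so that Definition 1 again holds.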
Label-Groups array
The
Values array can efficiently keep track of which subsets of labels are interchangeable; however, it does not contain information about how they can be interchanged. In Algorithm 3.1, this information is contained in the label symmetry group L, which must be continually updated in line 19 via the procedure Re-order-Base whenever any two labels are swapped. The main insight of [10] is that Re-order-Base can be implemented by conjugating the members of L, which is much faster than a generic base-change algorithm.

However, even this can be improved upon. To this end, we introduce another data structure of length n, the Label-Groups array. Instead of literally storing a base-and-strong-generating-set for the label group L, we will store an array of group codes which indicate how labels may be exchanged among a given equal-value set. Like the Values array, the Label-Groups array is indexed by the labels, in lexicographic order, and thus should be thought of as living "in label space". Each entry in the Label-Groups array shall be one of the group codes as described in Table 1 (these can be implemented, for example, by an enum type or similar). It is useful to display the
Values array along with the
Label-Groups array, as together these constitute all the information about the label symmetry group L. To revisit our earlier examples, the expression

T 00abbc = T 0₁ 0₂ a b₁ b₂ c (4.14)

corresponds to the Values and
Label-Groups
Label : a c 0₁ 0₂ b₁ b₂
Value : 1 2 3 3 5 5
Group : ∅ ∅ ∼ ∼ + + , (4.15)

and the expression of our earlier example,

T abcdef U edfcgh = T ab c₁ d₁ e₁ f₁ U e₂ d₂ f₂ c₂ gh , (4.16)

corresponds to

Label : a b g h c₁ c₂ d₁ d₂ e₁ e₂ f₁ f₂
Value : 1 2 3 4 5 5 5 5 5 5 5 5
Group : ∅ ∅ ∅ ∅ + + + + + + + + , (4.17)

if we assume that there is a metric which can exchange upper and lower indices. If no metric exists, then we must remember that only labels which can be exchanged can have the same value; thus the Values and Label-Groups arrays will read

Label : a b g h c₁ c₂ d₁ d₂ e₁ e₂ f₁ f₂
Value : 1 2 3 4 5 6 5 6 5 6 5 6
Group : ∅ ∅ ∅ ∅ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ . (4.18)

In this case, the set of labels with value 5 is not contiguous (likewise the set with value 6), and thus our algorithm cannot strictly rely on the Values array being a non-decreasing sequence. However, the entries in the
Label-Groups array give enough hinting to deduce that the
Values array must have a contiguous sequence of ⟨5, 6⟩ blocks, and this fact we can take advantage of. In any of these cases, the Values and
Label-Groups arrays in (4.15), (4.17), and (4.18) contain sufficient information to reconstruct any necessary label exchange to bring the least label into a slot containing a given label. Suppose, for example, we are given the label e₂ in the presence of a symmetric metric, thus corresponding to (4.17). We construct the label exchange as follows:

1. The value of e₂ in (4.17) is 5, which means it can be swapped with the label in the 5th position, c₁.
2. The group of e₂ is +, which means that each of the swaps (c₁, e₁)(c₂, e₂) and (e₁, e₂) is allowed.
3. The position of e₂ is 10; since (10 − 5) is odd, we determine that both exchanges must be made in order to map e₂ ↦ c₁. This can be achieved by

(e₁, e₂) · (c₁, e₁)(c₂, e₂) = (c₁, e₂, c₂, e₁) . (4.19)

By contrast, suppose instead that we have no metric, as in (4.18). In this case, the label exchange can be constructed as follows:

1. The value of e₂ in (4.18) is 6, which means it can be swapped with the label in the 6th position, c₂.
2. The group of e₂ is ⊤, which means it is an upper index. In our conventions, the lower indices are ordered before each corresponding upper one; therefore we conclude that each of c, e should be paired with the label immediately to its left in the array.
3. With this information, we can construct the necessary label exchange: (c₁, e₁)(c₂, e₂).

Thus we can see how the Values and
Label-Groups arrays can be used to quickly circumvent many of the more expensive steps in the inner loops of Algorithm 3.1. It should also be straightforward how one might construct these arrays in the first place; the
Values are merely a meansof encoding the various sets of indices that appear in an expression, and the
Label-Groups arefixed entirely by the combination of label types and type of metric tensor currently defined (ornot defined) on a given vector bundle.
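The two recipes above can be sketched in code. This is our own illustration under assumed conventions (labels stored pairwise, the lower copy at the odd 1-based position and the upper copy at the even one; names like `label_exchange` are ours), not the paper's implementation:

```python
labels = ["a", "b", "g", "h", "c1", "c2", "d1", "d2",
          "e1", "e2", "f1", "f2"]
values = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5]  # per (4.17), '+' group

def label_exchange(pos):
    """Transpositions mapping the '+'-group label at 1-based position
    `pos` onto the least label of its group (found at position
    values[pos-1]), following the three steps in the text."""
    v = values[pos - 1]
    lo_p = pos - 1 + pos % 2        # odd position of this label's pair
    lo_v = v - 1 + v % 2            # odd position of the target's pair
    swaps = []
    if lo_p != lo_v:                # the pair swap, e.g. (c1,e1)(c2,e2)
        swaps += [(labels[lo_v - 1], labels[lo_p - 1]),
                  (labels[lo_v], labels[lo_p])]
    if (pos - v) % 2 == 1:          # the metric swap, e.g. (e1,e2)
        swaps.append((labels[lo_p - 1], labels[lo_p]))
    return swaps

print(label_exchange(10))  # e2 -> c1 needs both the pair and metric swaps
```

For e₁ (position 9), the parity test fails and only the pair swap is emitted, matching the discussion above; the ⊥/⊤ no-metric case would pair each copy with the label immediately to its left instead.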
Symmetric-Subsets array
The previous data structures can be used to improve the performance of the standard Butler-Portugal algorithm even without considering the new concepts introduced in Section 4.1.1. In thissection and the next, we will introduce further data structures for implementing the symmetry-propagation ideas which allow fast canonicalization of totally (anti)symmetric subsets of indices.The first of these is the
Symmetric-Subsets array, which is an extra input to the algorithm, oflength n , which stores condensed information on where to find (anti)symmetric subsets in the slotsymmetry group S . The Symmetric-Subsets array can be generated by O ( n ) group-membershiptests (slightly modified for signed permutations) to find the elementary length-2 cycles whichgenerate each (anti)symmetric subgroup (see [9, 13, 14]). Since the information contained in Symmetric-Subsets is derived only from the slot symmetry group S , it can be obtained ahead oftime at tensor declaration time and stored at O ( n ) cost, which is trivial compared to the cost ofthe Schreier vector structure needed for S itself. The structure of the
Symmetric-Subsets array is simple: The positions in the array represent slots (and thus
Symmetric-Subsets lives in slot space, in contrast to
Values and
Label-Groups

We point out that the
Symmetric-Subsets array in fact describes redundant information which is already strictly present in the Schreier vector structure of S. However, this information is in a special format which allows quick manipulation by taking advantage of the simple structure of totally (anti)symmetric groups. One could set Symmetric-Subsets to the zero array and the algorithm will still produce correct results, but more slowly.

Definition 2: Properties of the
Symmetric-Subsets array.
S1.
A zero entry indicates that a slot does not participate in any (anti)symmetric subsets.
S2.
Positive integers represent a symmetric subset of slots. All slots with the same positiveinteger can be exchanged with each other.
S3.
Negative integers represent antisymmetric subsets. All slots with the same negative integer can be exchanged at the cost of a minus sign.
S4.
In addition, we shall require that the integers be chosen in a sequence of increasing absolute value, and that each absolute value is used for only one subset (thus, 1 and −1 should not both appear, but rather 1 and −2, etc.).

which live in label space). Each entry is an integer, with meanings as given in Definition 2. For example, suppose we have a tensor T abcdef which is symmetric on the last 4 slots; thus the slot symmetry group S is given by the strong generating set

S = { +(3, 4), +(4, 5), +(5, 6) } . (4.20)

The corresponding Symmetric-Subsets array is then given by

Slot : 1 2 3 4 5 6
Subset : 0 0 1 1 1 1 . (4.21)

Similarly, the Riemann tensor R abcd has the slot symmetry group given by the strong generating set

S = { −(1, 2), +(1, 3)(2, 4), −(3, 4) } , (4.22)

and the corresponding Symmetric-Subsets array is

Slot : 1 2 3 4
Subset : −1 −1 −2 −2 . (4.23)

Finally, consider the contraction T abcdef R cdgh . This has the slot symmetry group given by

S = { +(3, 4), +(4, 5), +(5, 6), −(7, 8), +(7, 9)(8, 10), −(9, 10) } , (4.24)

and the Symmetric-Subsets array will read

Slot : 1 2 3 4 5 6 7 8 9 10
Subset : 0 0 1 1 1 1 −2 −2 −3 −3 . (4.25)

Of course, we know that the expression T abcdef R cdgh = 0, because the contraction of the indices cd will bring the symmetry of slots ⟨3, 4⟩ in conflict with the antisymmetry of slots ⟨7, 8⟩. Thus we can appreciate the usefulness of having pre-computed the structure in (4.25); in cases like this one, this will allow our algorithm to detect zero contractions much earlier than Butler-Portugal, as will be shown in Section 5. In addition we note that Symmetric-Subsets, being a container for slot-symmetry information, does not take into account that cd are dummy indices while ef are free; the slots {3, 4, 5, 6} participate in the same slot symmetry regardless of which indices they contain.

Propagated-Symmetries array
In this section we will discuss the
Propagated-Symmetries array, whose purpose is to track theinformation needed to implement the symmetry-propagation mechanism of Section 4.1.1. Thetypes of information that need to be stored are twofold: First, one needs to know which portionsof the
Symmetric-Subsets are attached to dummy indices; second, one needs to know where theother “ends” of those dummy contractions reside, in the sense of the Penrose graphical notationintroduced in Section 4.1. Both of these needs can be served using a single array of length n , whose structure is similar to that of Symmetric-Subsets , but with an additional distinctionbetween odd- and even-numbered entries, which will be explained shortly.Before defining the entries in
Propagated-Symmetries , however, we must decide whether itshould be indexed by slots or labels. We first note that the symmetries described in
Propagated-Symmetries are label symmetries, as suggested by Figure 4, and thus it seems sensible for
Propagated-Symmetries to live in label space. However, as will be explained in Section 4.2.4,the
Propagated-Symmetries array can never be accessed directly, because each configuration g ∈ Configs which we must iterate over (as in line 6 of Algorithm 3.1) may have been reached by a different combination of slot and label symmetries. That is, the labels may have been renamed, and/or the slots shifted. The total configuration is given by g = ℓ⟦g_init⟦s⟧⟧, where (ℓ, s) are the independent label- and slot-actions which have been accumulated in the course of the algorithm. The
Propagated-Symmetries array is a global variable whose information is neededin every configuration we iterate over. Thus if
Propagated-Symmetries is label-indexed, we must know ℓ in order to access it properly; alternatively, if it is slot-indexed, we must know s. Neither convention seems to offer a computational advantage. We will choose Propagated-Symmetries to be slot-indexed, despite the fact that its entries represent label symmetries, because we find it easier to relate to the diagrams of Section 4.1 this way. The properties satisfied by
Propagated-Symmetries are given in Definition 3. Of all of the In our version of the Butler-Portugal algorithm in Algorithm 3.1, we store only the total configuration g ,whereas in the original version in [10], the label- and slot-actions ( (cid:96), s ) are stored separately. efinition 3: Properties of the
Propagated-Symmetries array.
P1.
A zero entry indicates that the label at a given slot does not participate in any (anti)symmetric subsets.
P2. An odd integer (positive or negative) indicates that the label at a given slot has a symmetry or antisymmetry, according to the sign of the integer. All labels at slots with the same integer participate in the same symmetry. Odd integers may be entered for both dummy labels and component labels.

P3. An even integer (positive or negative) indicates that the dummy label at a given slot has a symmetry/antisymmetry which has been propagated from its corresponding partner with odd-numbered entry. If the even-numbered entry is k, then its partner symmetry is the odd number of the same sign, and one less in absolute value, (sign k) · (|k| − 1).

P4.
The process which creates the entries shall work left-to-right in slot order, alternately enumerating the odd-numbered (anti)symmetric sets and then propagating to their corresponding even-numbered partners. If at any time this process would cause a new (nonzero) entry to overwrite an old (nonzero) entry, the entry with lower absolute value shall take precedence (such conflicts can happen when both ends of a contraction are attached to the same set of symmetric slots, or when the "far" end of a contraction is attached to another symmetric subset, distinct from the subset to which the "near" end is attached).
P5.
If there are labels whose slots have the same
Symmetric-Subsets entry, but whose
Values entries are different (thus, they belong to different component-label or dummy sets), then they will be given different odd integers; likewise, any of their propagated partners must receive different (and corresponding) even integers. Thus the entries in
Propagated-Symmetries must always represent valid label symmetries.
P6.
Nonzero entries of either type will be entered only if they are shared by at least two dummy labels (after applying Rule P4); "symmetric subsets of length 1" are equivalent to non-symmetric labels, and should have the entry 0.
P7.
As with the
Symmetric-Subsets array, odd integers will be chosen in a sequence of increasing absolute value, and each absolute value will be used for only one subset. Even integers are always entered as partners of odd ones, and hence also form such a sequence. The even-numbered sequence may have gaps, either because component labels do not have partners, or as a result of Rule P4.

data structures we will introduce, this one has the most complex system of rules, and some examples will help clarify how it is to be filled out. First, suppose we have a tensor T abcdef which is symmetric in its last 4 slots, and thus has the Symmetric-Subsets array

Slot
Subset . (4.26)

We have previously considered the expression T abbc = T ab b c , (4.27)

whose Values array is

Label a c b b
Value . (4.28)

Now consider how to fill out its Propagated-Symmetries array according to the rules in Definition 3. It has one subset of equivalent component labels { , } , but they are in slots ⟨ , ⟩ which do not have a slot symmetry. Therefore the ⟨ , ⟩ entries of Propagated-Symmetries should be 0. Next, there is one dummy pair, { b , b } in slots ⟨ , ⟩ . These slots are in a symmetric subset with each other, and thus should receive the odd-number entry 1. Because of the precedence established in Rule P4, there will be no entries for propagated symmetries, because both ends of the contraction between the two b labels are already attached to the same symmetric subset. The Propagated-Symmetries array is therefore

Slot
Symmetry . (4.29)

Next, consider a slightly more elaborate example: the expression T abcdef R cdgh . Again we take T to be symmetric on its last 4 slots, and R is the Riemann tensor. We first write out the Values array (remembering that free indices are always sorted before dummies):

Label a b e f g h c c d d
Value . (4.30)

Next, the Symmetric-Subsets array is as in (4.25):

Slot
Subset − − − − . (4.31)

Finally, to construct the Propagated-Symmetries array, we note that the only non-free labels are in slots { , , , } , and by the precedence given in Rule P4, we obtain

Slot
Symmetry . (4.32)

In particular, we do not get − , − in slots ⟨ , ⟩ . Thus we see that Propagated-Symmetries allows us to "look ahead" and see that this contraction must be zero, because the entries in slots ⟨ , ⟩ of Propagated-Symmetries disagree in sign with those in ⟨ , ⟩ of Symmetric-Subsets .

As a final example, consider the expression T ab cd R cedf , again with the same tensors. This time, the Values array is

Label a b e f c c d d
Value , (4.33)

and the Symmetric-Subsets array remains the same as in (4.25):

Slot
Subset − − − − . (4.34)

However, to construct the Propagated-Symmetries array, we must apply Rule P5, since there are both component labels and dummy labels appearing in the symmetric subset of slots { , , , } , and these label sets have different entries in the Values array. The symmetric subset "splits" into two pieces, giving the
Propagated-Symmetries arraySlot
Symmetry . (4.35)

Notice that slots { , } must be marked , , which shows that they have an induced symmetry propagated from slots { , } . There are no entries with the number 2, because the labels in slots { , } are component labels and do not have partners.

Finally, note that because we have chosen Propagated-Symmetries to be slot -indexed, we must store the slot symmetry s which was used to reach the current configuration g from the initial one g init . In Algorithm 4.1, we will re-define Configs to be the set of ordered pairs ( g, s ) . Alternatively, one could store ( ℓ, s ) as was done in the original implementation of Butler-Portugal [10] and use the relation g = ℓ ⟦ g init ⟦ s ⟧⟧ ; however, we find it more useful to store ( g, s ) , as ℓ alone is not typically needed.

Config-Slots-of-Least-Value array and its
Least-Value-Set entries
One final data structure we will need is the
Config-Slots-of-Least-Value array, which replaces the
Config-Slots-of-Least-Label array appearing in Algorithm 3.1. The overall purpose of
Config-Slots-of-Least-Value will be the same: to store the configurations and slot numbers at which one canfind a label with least value. However, because the properties of the label symmetry group are If we had chosen
Propagated-Symmetries to be label-indexed, we would instead need to store ( g, (cid:96) ) . Values , Label-Groups , and
Propagated-Symmetries ,some slight changes will be necessary.First, as mentioned in Section 4.2.4, the
Configs array will now have to store ordered pairs { ( g, s ) } where g is the total slot-to-label map of the configuration, and s is the total action of the slot symmetry group which brought us to g = ℓ ⟦ g init ⟦ s ⟧⟧ . Therefore each entry in the Config-Slots-of-Least-Value array must contain a reference to the ordered pair ( g, s ) ∈ Configs from which the current least-value slot numbers originate (that is, ( g, s ) is a reference to the parent node in the search tree).

In addition, each entry of Config-Slots-of-Least-Value must contain an ordered pair of slot numbers ( p, q ) rather than just a single one. The reason is that label symmetries can come from two different sources: the original label symmetry group L which is represented by the combination of Values and
Label-Groups ; and the new label symmetries which come from propagation of slot symmetries, as represented in
Propagated-Symmetries . Thus the
Config-Slots-of-Least-Value array will have the following structure:

Configuration   ( g_1 , s_1 )   ( g_2 , s_2 )   . . .   ( g_m , s_m )
Slot pair       ( p_1 , q_1 )   ( p_2 , q_2 )   . . .   ( p_m , q_m )   , (4.36)

where one can think of each ( g_k , s_k ) as a reference into the Configs array. We do not expect each of the ( g_k , s_k ) to be distinct, as it is possible that a given configuration contains multiple labels of (equal) least value in the orbit of the slot currently under consideration. We will find it useful to collect together all of the Config-Slots-of-Least-Value entries coming from the same configuration ( g, s ) into a Least-Value-Set ; thus, the
Least-Value-Set corresponding to a given ( g, s ) is precisely the set of child nodes of the search tree under the parent node ( g, s ) . The meanings of the slot pairs ( p_k , q_k ) are as follows:

1. The p_k are the slot numbers in the slot-symmetry orbit ∆_i^S where a least-value label can be found directly.

2. The q_k are the slot numbers where a least-value label can be found by application of Propagated-Symmetries , even if that slot is outside the orbit ∆_i^S .

Usually, one has p_k = q_k for each k , except in the case where the Propagated-Symmetries have been used to step outside the usual slot-symmetry orbit to find the least-value label. Then p_k gives the point in the slot-symmetry orbit we jumped away from, and q_k gives the place where the least-value label was actually found. These two slot numbers together give us implicit (One could alternatively use a Schreier-tree structure to store the entire label group as was done in Algorithm 3.1, but this would sacrifice quite a bit of speed, as one would have to add the propagated symmetries by "sifting" them into the Schreier tree, and then re-compute orbits, etc.)
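As an illustration, the grouping of Config-Slots-of-Least-Value entries into Least-Value-Sets can be sketched in Python. This is our own minimal sketch, not the paper's Mathematica implementation; the function and variable names are invented, and configurations and slot actions are stood in for by strings.

```python
# Hypothetical sketch: group Config-Slots-of-Least-Value entries by their
# parent node (g, s), yielding one Least-Value-Set per parent, as described
# above.  Each entry carries the slot pair (p, q).
from collections import defaultdict

def group_into_least_value_sets(config_slots_of_least_value):
    """config_slots_of_least_value: iterable of ((g, s), (p, q)) entries."""
    least_value_sets = defaultdict(list)
    for parent, slot_pair in config_slots_of_least_value:
        least_value_sets[parent].append(slot_pair)
    return dict(least_value_sets)

entries = [
    (("g1", "s1"), (2, 2)),   # p == q: label found directly in the orbit
    (("g1", "s1"), (4, 7)),   # p != q: label reached via a propagated symmetry
    (("g2", "s2"), (2, 2)),
]
sets_by_parent = group_into_least_value_sets(entries)
```

Here `sets_by_parent[("g1", "s1")]` collects the two child nodes under the parent `("g1", "s1")`, mirroring how a Least-Value-Set collects the children of one search-tree node.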
Having developed the necessary preliminary concepts, we now present the improved algorithm. We will first give an overview in Algorithm 4.1 of the main loop, which iterates once over the slots of the initial configuration g init . There are several subprocedures called by the main loop, most of whose code we relegate to Appendix B, with the exception of Append-Non-Redundant-Instances , which contains the essential logic that prevents iteration over redundant branches of the search tree.

The general strategy of Algorithm 4.1 is the same as Algorithm 3.1: namely, to visit each slot one at a time from left to right, determine the least label which can be moved into that slot, place that label, and then move on to the next slot. As before, if there are multiple ways in which the (same) least label can be moved into the current slot, then the search tree bifurcates, and the next iteration must look at all of the resulting possibilities. The key difference is that in Algorithm 4.1, we make an effort to prune these extra branches whenever they would be redundant. Not all types of redundancy are detected, but we do handle the most common case of (anti)symmetric subgroups filled with dummy labels. We apply the symmetry propagation concepts of Section 4.1.1 in order to make an early decision about placing such dummy labels into slots, while preserving the symmetry information needed to ensure that later slots can be canonicalized. The same information can be used for early detection of zero results.

The main loop over the slots of g init runs from lines 4 through 21 of Algorithm 4.1.
In line 5, we obtain the orbit ∆_i^S of the current slot i under the slot symmetry group S (an operation which can be made quite fast if we have stored S in the form of a Schreier tree structure ahead of time). In line 6, we populate the Config-Slots-of-Least-Value array described in Section 4.2.5, as well as record the least-value label which can be moved into this slot (either via the slot symmetry group S , or via the extra label symmetries in Propagated-Symmetries ).

Next, in lines 8 through 15, we iterate over the next level of nodes of the search tree, grouped by their parent node ( g, s ) ∈ Configs in the form of a
Least-Value-Set . On each
Least-Value-Set we make two passes; first to iteratively add information to the
Propagated-Symmetries array, andthen to use the information to make decisions about which leaf nodes to append to
Next-Configs .Note that the data in
Propagated-Symmetries is filled out according to the rules in Definition 3,but it is not done all at once; rather, only the
Symmetric-Subsets which are encountered within Thus
Config-Slots-of-Least-Value could be implemented as a multi-dimensional array containing
Least-Value-Set ’s, although one must take into account that each
Least-Value-Set may be of different length. For the timing data in Section 5, we have chosen a flat implementation.

Algorithm 4.1 A faster algorithm for tensor canonicalization
Input: Configuration g init ; slot symmetry group S ; Values , Label-Groups , and Symmetric-Subsets
Output: Canonical configuration g can , or zero if the tensor vanishes

procedure Canonicalize ( g init , S , Values , Label-Groups , Symmetric-Subsets )
    Configs ← { ( g init , id) }
    Propagated-Symmetries ← ⟨0, 0, . . . , 0⟩_n
    for i ← 1 to n do
        ∆_i^S ← { orbit of slot i under S^(i) }
        { least-value , Config-Slots-of-Least-Value } ← Get-Least-Value-Instances ( i , ∆_i^S , Configs , Values , Propagated-Symmetries )
        Next-Configs ← {}
        for each ( g, s ) ∈ Configs do
            Least-Value-Set ← { all entries of Config-Slots-of-Least-Value coming from ( g, s ) }
            Propagated-Symmetries ← Update-Propagated-Symmetries ( Least-Value-Set , ( g, s ) , least-value , Label-Groups , Symmetric-Subsets , Propagated-Symmetries )
            if Zero-Due-to-Propagated-Symmetries ( ( g, s ) , least-value , Label-Groups , Symmetric-Subsets , Propagated-Symmetries ) then
                return 0
            end if
            Next-Configs ← Append-Non-Redundant-Instances ( Next-Configs , Least-Value-Set , ( g, s ) , least-value , S , i , Label-Groups , Symmetric-Subsets , Propagated-Symmetries )
        end for
        { Values , Label-Groups } ← Update-Values-and-Label-Groups ( Values , Label-Groups , least-value )
        Configs ← Remove-Duplicates ( Sort ( Next-Configs ) )
        if Configs contains equal configurations + g and − g of opposite sign then
            return 0
        end if
    end for
    return g can = first element of Configs
end procedure
Least-Value-Set are entered. Thus at any given time before the last iteration ofthe algorithm, the
Propagated-Symmetries array is incomplete; however, it always contains justenough information to be used in
Append-Non-Redundant-Instances . In lines 16 through 20, we do some final steps before moving on to the next slot iteration. Firstwe must update the
Values and
Label-Groups arrays to maintain the properties in Definition 1and Table 1. This effectively implements “re-ordering the base” for the label symmetry group, asthe lowest-value label will be marked as used and its exchange symmetry with other labels willbe erased. Finally we sort and remove duplicates from
Next-Configs , copy the result back into
Configs , and check for zero.The subprocedures
Get-Least-Value-Instances , Update-Propagated-Symmetries ,and
Update-Values-and-Label-Groups are intended to populate the various data structuresdefined in Section 4.2 and maintain the conditions of their definitions for each loop iteration. Theconditional
Zero-Due-to-Propagated-Symmetries constitutes the early check for zero usingthe extra information assembled in the data structures of Section 4.2. However, we emphasizehere that this check is not optional; rather, it is necessary to take into account all the possibilitiesfor a zero result that might be contained in
Propagated-Symmetries immediately. This allows
Append-Non-Redundant-Instances to be quite liberal in its rejection of redundant search-tree branches. A detailed description of these subprocedures can be found in Appendix B.However, we single out
Append-Non-Redundant-Instances for discussion here, as this subprocedure contains the key logic which prevents factorial growth of intermediate results. We present this subprocedure in Algorithm 4.2. The rejection functionality is achieved with a simple check in lines 5 through 11 which ensures that within the current
Least-Value-Set , only onerepresentative from each symmetry subset is kept. This is done via an array
Visited-Subsets which marks whether a subset has been visited; in lines 6 and 7, we use the absolute value of
Symmetric-Subsets ⟦ p ⟧ as the index into Visited-Subsets . Note that for any ( p, q ) within a Least-Value-Set , all of the entries
Values ⟦ q ⟧ in the Values array must be the same (since they are the least value), and thus any two entries belonging to the same symmetric subset must in fact be exchangeable (even if there may be labels with other values in the other slots of this symmetric subset!). In order to rely on this complete exchange equivalence (i.e., the slots are totally (anti)symmetric, and we likewise treat the labels as totally (anti)symmetric), we must Propagated symmetries always come from slot symmetries, and if there is a symmetric subgroup to be propagated, then it is always true that all of its slot numbers should be listed in the orbit ∆_i^S , and thus appear in the current Least-Value-Set . Thus at the very latest, each symmetry which can be propagated is entered into
Propagated-Symmetries immediately before it is needed for the decision-making phase. One could populate theentire
Propagated-Symmetries ahead of time, but doing so iteratively as we do here prevents us having to do any more work than necessary (in the event, for example, that the entire algorithm is short-circuited by a zero detection).

Algorithm 4.2 Append-Non-Redundant-Instances
Input:
Current list of
Next-Configs and
Least-Value-Set ; configuration ( g, s ) ; the least-value inthis configuration; slot group S ; slot i ; Label-Groups , Symmetric-Subsets ,and
Propagated-Symmetries
Require:
The possibility of obtaining a zero result merely from the conflict of
Propagated-Symmetries with
Symmetric-Subsets has already been taken into account
Output:
New
Next-Configs with non-redundant items added from
Least-Value-Set procedure Append-Non-Redundant-Instances ( Next-Configs , Least-Value-Set , ( g, s ) , least-value , S, i,
Label-Groups , Symmetric-Subsets , Propagated-Symmetries ) Visited-Subsets ← (cid:104) , , . . . , (cid:105) n for each ( p, q ) ∈ Least-Value-Set do label ← g (cid:74) q (cid:75) (cid:46) Note that we use q here if (cid:0) Symmetric-Subsets (cid:74) p (cid:75) (cid:54) = 0 (cid:1) then if (cid:0) Visited-Subsets (cid:74) | Symmetric-Subsets (cid:74) p (cid:75) | (cid:75) = 0 (cid:1) then Visited-Subsets (cid:74) | Symmetric-Subsets (cid:74) p (cid:75) | (cid:75) ← else skip to next ( p, q ) end if end if ˜ (cid:96) ← id if (cid:0) Propagated-Symmetries (cid:74) s (cid:74) q (cid:75)(cid:75) (cid:54) = 0 and p (cid:54) = q (cid:1) then ε ← Sign (cid:0)
Propagated-Symmetries (cid:74) s (cid:74) q (cid:75)(cid:75) (cid:1) ˜ (cid:96) ← ˜ (cid:96) (cid:74) signed permutation ε × ( label , g (cid:74) p (cid:75) ) in array form (cid:75) end if ˜ (cid:96) ← ˜ (cid:96) (cid:74) Label-Permutation-from-Group (cid:0) label , least-value , Label-Groups (cid:74) label (cid:75) (cid:1) (cid:75) ˜ s ← Coset-Rep (cid:0) p, S ( i ) (cid:1) (cid:46) We require p here, not q g (cid:48) ← ˜ (cid:96) (cid:74) g (cid:74) ˜ s (cid:75)(cid:75) s (cid:48) ← s (cid:74) ˜ s (cid:75) Next-Configs ← Append (cid:0)
Next-Configs , ( g (cid:48) , s (cid:48) ) (cid:1) end for return Next-Configs end procedure
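The pruning check at the heart of Append-Non-Redundant-Instances can be sketched in Python. This is a simplified stand-in with invented names, not the paper's code: within one Least-Value-Set, only the first slot pair touching each (anti)symmetric subset is kept, since all least-value labels inside the same totally (anti)symmetric subset are interchangeable and would generate redundant search-tree branches.

```python
# Hypothetical sketch of the Visited-Subsets logic: keep one representative
# (p, q) slot pair per (anti)symmetric subset within a Least-Value-Set.
def prune_redundant(least_value_set, symmetric_subsets):
    """least_value_set: list of (p, q) slot pairs (0-indexed slots);
    symmetric_subsets: per-slot subset ids (0 = slot not in any subset)."""
    visited = set()
    kept = []
    for (p, q) in least_value_set:
        subset = abs(symmetric_subsets[p])
        if subset != 0:
            if subset in visited:
                continue          # redundant branch: subset already represented
            visited.add(subset)
        kept.append((p, q))
    return kept

# Slots 2..5 all lie in symmetric subset 1: of the three candidate pairs
# touching that subset, only the first survives.
subsets = [0, 0, 1, 1, 1, 1]
print(prune_redundant([(2, 2), (3, 3), (4, 4), (0, 0)], subsets))
# → [(2, 2), (0, 0)]
```

In the full algorithm the kept pairs then each produce one new configuration ( g′, s′ ), so the number of children per node stays bounded by the number of distinct symmetric subsets rather than the number of interchangeable labels.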
Zero-Due-to-Propagated-Symmetries ).

Rather than perform detailed complexity analysis, we will present data in Section 5 which shows that our algorithm, like the original Butler-Portugal algorithm, is polynomial in the most common situations. Here we will make only a few comments which relate to our improvements to the algorithm.

We have made two main improvements: First, we have chosen a means to represent the label symmetry group L by distributing this information among several small arrays: Values , Label-Groups , and
Propagated-Symmetries for storing the additional label symmetries which might beadded during the course of the algorithm. Each of these arrays is just a single row of length n , andthus we incur an O ( n ) memory cost, which is dwarfed by the O ( n ) cost of storing the Schreier-tree structure of the slot symmetry group S . While the new data structures require special logicto deal with them, their benefit is to reduce all group-theory computations involving the labelsymmetry group to either to array lookups which take O (1) time, or single scans through thearray which take O ( n ) time. In either case, the time taken to compute with the label symmetrygroup is now insignificant.Second, the two arrays Symmetric-Subsets and
Propagated-Symmetries allow us to eliminate equivalent choices from the search tree when presented with several interchangeable labels in a set of slots which are (anti)symmetric. At a cost of O ( n ) space, we can avoid having to store O ( n !) intermediate results, which take O ( n !) time to process, a dramatic improvement.

However, we note that O ( n !) behavior has not been eliminated entirely. Our algorithm is only capable of detecting subsets of indices which are totally symmetric or antisymmetric. But there are other possibilities which lead to combinatorial explosion. For example, suppose that a tensor has many indices which are pairwise symmetric, with a slot symmetry group generated by the pairwise exchanges

S = { (1, 3)(2, 4), (3, 5)(4, 6), . . . , (2n−3, 2n−1)(2n−2, 2n) } . (4.37)

Such a slot symmetry group has order n ! and will not be detected because it does not involve the direct exchange between two slots. One can imagine other factorial-size groups such as alternating groups or groups which symmetrize length- k subsets, etc. These are not caught by the algorithm, and thus it will exhibit O ( n !) behavior in these cases. Fortunately, however, these cases are rare in actual calculation—in contrast to the (anti)symmetric case which we have addressed!

One can see, in fact, that it is impossible to eliminate all possible sources of O ( n !) behavior (as should be expected [9]). Take the simple case of symmetric exchanges of length- k subsets, for example. It may be possible, with some cleverly-designed data structures, to write an algorithm which—given data about these exchange symmetries ahead of time—can resolve canonicalization problems in polynomial time for such groups. But even if such an algorithm were designed, this is not enough, because that data about the symmetric exchange of length- k subsets must first be somehow obtained.
To generate it algorithmically for a specific k requires O ( n^{k+1} ) group membership tests (and thus, in our particular case, which resolves the question for length-1 exchanges, one can construct the Symmetric-Subsets structure using O ( n^2 ) group-membership tests). To do so for all k (thus, up to k = n/2 ) would then require O ( n !) group membership tests, and so we gain nothing; removing one head from the hydra only springs forth more.

We point out, however, that there is a case in which data about length- k exchanges can be provided ahead of time, in polynomial time: the case where a tensor monomial is constructed out of the tensor product of several identical factors,

T_{abc} T_{def} T_{ghi} · · · . (4.38)

In such cases, it may be useful to have an algorithm which efficiently handles this type of symmetry. But even so, it will still be a special case, and O ( n !) behavior is still possible by some other means.

To test the effectiveness of Algorithm 4.1, we have fully implemented it and tested it in several situations against two implementations of the Butler-Portugal algorithm: the one in xPerm [4] which is used to simplify tensor expressions in xAct [5], and the built-in function
TensorReduce which was introduced in
Mathematica version 9. For the purpose of these tests, Algorithm 4.1has been implemented in
Mathematica code, and wrapped within the command
Compile[..., CompilationTarget → "C"] . This causes Mathematica to automatically generate C code (from the enclosed Mathematica code), which is then compiled to machine language via an external C compiler. We have used the most up-to-date version of xPerm , from xAct version . The function
TensorReduce is that available in
Mathematica version , which was used torun all of these tests. It is therefore conceivable that one might achieve slightly better performance by coding directly in C , whichwould allow more control over memory resources.
In each test, we run each of the three implementations on randomized input with certain constraints as will be explained below. We have attempted to increase the input size until the asymptotic behavior becomes clearly visible, or at least until the total running time begins to exceed a few minutes. For each input size, we run the test several times (typically 10, except for the Riemann tensor tests, which used 30 trials in order to generate enough results for each of the zero and nonzero bins). Different random input is generated for each trial. In each graph, the error bars represent the full range of running times over the trials for a given input size. The plot marks are placed at the geometric mean of the running times (thus, in these log plots, representing the mean of the logs).

Each of the three implementations takes slightly different input, but we have made our best effort to eliminate as much overhead as possible. Algorithm 4.1 takes a pre-computed Schreier tree structure as input, as well as
Symmetric-Subsets , and represents the label symmetry group via theinputs
Values and
Label-Groups . For xPerm , we have given it a pre-computed base-and-strong-generating-set (BSGS) for the slot symmetry group, and checked a flag indicating that this doesnot need to be re-computed. Unfortunately with
Mathematica ’s built-in function
TensorReduce , there is no way to specify that the slot symmetries are already a BSGS, and so it spends time running the Schreier-Sims algorithm attempting to construct a BSGS, which dominates the timing for low numbers of slots. Therefore we should point out that the data here are not entirely "fair"; however, the differences in performance are quite significant, in some cases differing by a whole power of n and not just a constant factor.

We also point out that each of these three implementations produces slightly different output ; that is, each algorithm is using a slightly different notion of what "canonical" means. xPerm works in two stages, first moving all of the free indices as far forward as possible, then fixing them in place and canonicalizing the remaining indices by lexicographic order. Mathematica uses a definition of "canonical" based on the slot numbers of the contracted pairs, rather than the labels that appear in those slots, and so it may produce a different ordering. Algorithm 4.1 uses similar criteria for "canonical" as does xPerm , but works in one stage only, moving free and dummy indices as they appear; it is possible for them to give different results. Nevertheless, we have tested Algorithm 4.1 extensively for correctness, before running these timing tests. The
Mathematica source code used to implement the algorithm and run these tests can be found at https://github.com/bniehoff/tensor-canonicalizer .

We start with three tests which do not significantly stress any of the algorithms, and give a sort of basis of comparison which may help counteract the imbalance just mentioned. First, in Figure 5, we run the algorithms with free indices, with total slot symmetry. Thus there is no
label renaming ambiguity; the algorithms need only sort the list of indices.

Figure 5: Free indices with total symmetry. (Log plot of running time versus number of indices for Mathematica, xPerm, and Algorithm 4.1.)

From Figure 5 it is clear that
Mathematica is doing extra work by running Schreier-Sims every time; furthermore it is clear that at around 50 indices, Mathematica changes strategy to use a randomized Schreier-Sims algorithm. To obtain an estimate on the asymptotic complexity, we attempt to fit a power law to the data points with at least 100 indices (thus ignoring the transient effects for low numbers of indices). We obtain the following power law growth:
Mathematica ∼ O ( n . ) , xPerm ∼ O ( n . ) , Algorithm 4.1 ∼ O ( n . ) . (5.1)

The slightly smaller power in the growth rate for Algorithm 4.1 compared to Mathematica is likely due to the fact that Algorithm 4.1 starts from slot-symmetry data where the orbits ∆_i^S are pre-computed (we point out that in a system where tensors are declared in advance, it is always possible to do this). Another possible explanation is that the original version of Butler-Portugal canonicalizes the free indices separately from the dummy ones using a slightly different algorithm [4, 30].

Next we run two basic tests with dummy indices in Figure 6. In the first test there are no slot symmetries, but instead we give a random sequence of dummy indices. Thus, all the dummies need to be renamed in the optimal order, but since there are no slot symmetries, there should be no ambiguity in the search tree. In Figure 6(a), one can clearly see the advantage of our data structures Values and
Label-Groups for representing the label group L , which allow us to reduce the computation of the label-group orbit ∆_g^L ⟦ j ⟧ to a simple array lookup. Using a similar method of best-fit power law while throwing out the points with a small number of indices, we obtain the time complexities

Mathematica ∼ O ( n . ) , xPerm ∼ O ( n . ) , Algorithm 4.1 ∼ O ( n . ) . (5.2)
Figure 6: Dummy indices without symmetry (a), and with cyclic symmetry (b). (Log plot of running time versus number of dummy pairs for Mathematica, xPerm, and Algorithm 4.1.)

In Figure 6(b) we run a test which eliminates both the advantages of Algorithm 4.1 and the drawbacks of the original Butler-Portugal algorithm. The algorithms are given the task to canonicalize the expression

T^{π(a_1 a_2 ··· a_n)} U_{π(a_1 a_2 ··· a_n)} ,   T_{a_1 a_2 ··· a_n} = T_{a_2 ··· a_n a_1} ,   U_{a_1 a_2 ··· a_n} = U_{a_2 ··· a_n a_1} , (5.3)

where the tensors T, U have a cyclic symmetry over all of their indices. In the
Mathematica algorithm, since the strong-generating-set of a cyclic group only has one generator, the Schreier-Sims step finishes quickly without significantly impacting performance. In Algorithm 4.1, one gets no benefit from the symmetry-propagation mechanism, because there are no (anti)symmetric subsets of slots. Nevertheless, the cyclic symmetry, combined with dummy renaming, provides enough computational challenge that it dominates in both cases, and Figure 6(b) shows nearly identical performance. The best-fit power law complexity is around O ( n ) .

The next test is to repeat the benchmarking done in [10, 4, 31] for randomly-contracted Riemann monomials of the form

R_{◦◦◦◦} R_{◦◦◦◦} R_{◦◦◦◦} · · · , (5.4)

(The behavior of xPerm in Figure 6(b) is somewhat strange in that there is a sharp corner around 35 dummy pairs where it suddenly jumps to a different power-law behavior. We have discussed this with the author of [4] but cannot ascertain the cause. xPerm does not show any strange behavior in any other tests.)
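A sketch of one way to generate such a randomized contraction is given below. This is our own illustration, not the paper's test harness: the 4m slots of the monomial are paired up uniformly at random, with both slots of a pair receiving the same dummy label.

```python
# Hypothetical benchmark-input generator (names and conventions invented):
# randomly contract the 4m slots of R....R.... (m Riemann factors) into 2m
# dummy pairs.
import random

def random_riemann_contraction(m, seed=None):
    """Return a random assignment of 2m dummy labels to 4m slots; each label
    appears on exactly two slots (the two ends of one contraction)."""
    rng = random.Random(seed)
    slots = list(range(4 * m))
    rng.shuffle(slots)
    assignment = [None] * (4 * m)
    for d in range(2 * m):
        a, b = slots[2 * d], slots[2 * d + 1]
        assignment[a] = assignment[b] = d   # both ends of dummy pair d
    return assignment

labels = random_riemann_contraction(3, seed=0)  # 3 Riemann factors, 12 slots
```

Feeding such assignments to each canonicalizer, with the Riemann slot symmetries fixed, reproduces the shape of the benchmark: some random contractions vanish by antisymmetry, the rest do not.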
where each ◦ should be filled with a random dummy index. Since the Riemann tensor has antisymmetries, these invariants are occasionally zero. The cases which turn out to be zero tend to have vastly shorter timing, and so we separate results into two bins, as shown in Figure 7.

Figure 7: Randomized Riemann contractions, nonzero (a) and zero (b) results. (Log plot of running time versus number of Riemann tensors for Mathematica, xPerm, and Algorithm 4.1.)

In Figure 7(a), for nonzero results we obtain the best-fit complexities
Mathematica ∼ O ( n . ) , xPerm ∼ O ( e . n ) , Algorithm 4.1 ∼ O ( n . ) . (5.5)

We observe that not only is Algorithm 4.1 faster than Mathematica by at least an order of magnitude, it is also faster by about n . in time complexity. It appears xPerm is not well-fit by a power law in this case, although perhaps we have only managed to capture its transient behavior.

For invariants which turn out to be zero, Figure 7(b) shows an even more dramatic improvement of up to 3 orders of magnitude and the following best-fit complexities:

Mathematica ∼ O ( n . ) , xPerm ∼ O ( n . ) , Algorithm 4.1 ∼ O ( n . ) . (5.6)

All of the algorithms are faster; again we see that Algorithm 4.1 is faster than Mathematica by about one power of n and a large constant factor. Here we see the benefits of the symmetry-propagation mechanism for early zero detection. (Unlike [31], we do not include derivatives of the Riemann tensor, but only algebraic invariants. We also do not impose any dimension-dependent identities, but use only the Riemann tensor symmetries.)
Figure 8: Total symmetry with contracted dummies, both frustrated (a) and randomized (b). (Log plot of running time versus number of dummy pairs for Mathematica, xPerm, and Algorithm 4.1.)
Next we run a test which exhibits the condition Algorithm 4.1 was specifically designed for: namely, contractions between totally symmetric tensors T and U :

T_{a_1 a_2 ··· a_n} = T_{(a_1 a_2 ··· a_n)} ,   U_{a_1 a_2 ··· a_n} = U_{(a_1 a_2 ··· a_n)} , (5.7)

as shown in Figure 8. We run two different versions of the test. First in Figure 8(a) is the "maximally frustrated" situation, where the dummy pairs are distributed so that each pair has one leg on T and one leg on U :

T^{π(a_1 a_2 ··· a_n)} U_{π(a_1 a_2 ··· a_n)} . (5.8)

This situation represents the absolute worst-case scenario for the Butler-Portugal algorithm. In Figure 8(b), we do instead a fully randomized test, where the indices are distributed across both tensor factors randomly. This is a slightly better situation for Mathematica and xPerm , but the results are not much different. As expected, Algorithm 4.1 clearly exhibits better performance in this test, showing polynomial growth whereas
Mathematica and xPerm grow as O ( n !) . Thedata points for Algorithm 4.1 in the frustrated case of Figure 8(a) are well-fit by the power lawAlgorithm 4.1 ∼ O ( n . ) . (5.9)While Mathematica and xPerm are clearly factorial, there are too few data points to get a goodfit. 43
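The combinatorial explosion in the frustrated test comes entirely from equivalent relabelings: since U is totally symmetric, the contraction (5.8) takes the same value for every pairing permutation π, so all n! index arrangements are equivalent and a canonicalizer must collapse them to one representative. This is easy to verify numerically with toy symmetric "tensors" (illustrative Python, not part of the tested implementation):

```python
import itertools
import random

n, d = 3, 3        # tensor rank and index dimension (toy sizes)
random.seed(0)

def symmetric_tensor():
    """A totally symmetric rank-n 'tensor': the component depends
    only on the multiset of index values."""
    vals = {}
    def component(idx):
        key = tuple(sorted(idx))
        if key not in vals:
            vals[key] = random.random()
        return vals[key]
    return component

T = symmetric_tensor()
U = symmetric_tensor()

def contract(pi):
    """Full contraction T_{a1..an} U^{a_{pi(1)}..a_{pi(n)}}."""
    return sum(T(idx) * U(tuple(idx[p] for p in pi))
               for idx in itertools.product(range(d), repeat=n))

base = contract(tuple(range(n)))
# Every pairing permutation pi gives the same value:
assert all(abs(contract(pi) - base) < 1e-12
           for pi in itertools.permutations(range(n)))
```

The assertion holds for every π precisely because U's components are invariant under index reordering, which is the equivalence the Butler-Portugal search must otherwise discover permutation by permutation.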
Figure 9: Pairwise symmetry with contracted dummies, both frustrated and randomized. (a) Frustrated; (b) Randomized. Each panel plots running time against the total number of indices for Mathematica, xPerm, and Algorithm 4.1.
In the final test, we try a situation which is not addressed by our improvements, and in which we expect factorial growth in all cases. This test uses tensors T and U which have total symmetry among pairs of indices,

T_{abcdef…} = T_{cdabef…} = T_{abefcd…} = etc.,  (5.10)

and similarly for U. The results of this test are shown in Figure 9, again separated into the "frustrated" case in Figure 9(a), where each dummy pair is split between T and U, and the randomized case in Figure 9(b), where the dummy pairs are randomly distributed across both tensors.

As expected, Figure 9(a) shows clear factorial growth, and is also the only case in these tests where Algorithm 4.1 performs worse than Mathematica, clearly growing at a faster rate. We expect that this is due to the extra processing done with Propagated-Symmetries, which is wasted effort in this particular test case. By contrast, the performance of all three algorithms in Figure 9(b) is more equal (because when the labels are randomly distributed, not every pair of slots is necessarily equivalent to every other pair); nevertheless, for high numbers of indices we begin to see the patterns of Figure 9(a) repeated.

We suspect that some adjustments in implementation could bring the performance of Algorithm 4.1 more in line with Mathematica in the pairwise-symmetric case (and similar) without sacrificing the performance benefits of the prior tests. However, we point out that real-world occurrences of the pairwise symmetry tested in Figure 9 are somewhat rare. If one did expect to see them frequently in some given application, it would be worth developing some more sophisticated version of the symmetry-propagation mechanism to bring them into the realm of polynomial time.
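One way to see why the pairwise symmetry (5.10) resists the simple-symmetry propagation is to look at the slot-symmetry group it generates: exchanging length-2 blocks among k pairs yields only k! permutations, and none of them is a simple two-slot exchange. A brute-force sketch (illustrative helper code, not part of the implementation):

```python
def group_closure(gens, n):
    """Brute-force closure of permutation generators on n points."""
    ident = tuple(range(n))
    elems = {ident}
    frontier = [ident]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = tuple(g[h[i]] for i in range(n))  # compose permutations
            if gh not in elems:
                elems.add(gh)
                frontier.append(gh)
    return elems

k = 3                      # number of index pairs
n = 2 * k                  # total slots
# Generators exchanging adjacent *pairs* of slots, as in (5.10):
gens = []
for i in range(k - 1):
    g = list(range(n))
    g[2*i:2*i+2], g[2*i+2:2*i+4] = g[2*i+2:2*i+4], g[2*i:2*i+2]
    gens.append(tuple(g))

print(len(group_closure(gens, n)))   # k! = 6, far smaller than (2k)! = 720
```

The group is S_k acting on blocks, so it is small, but because its generators move four slots at a time, the simple two-slot (anti)symmetries that Algorithm 4.1 propagates never arise, and the search degenerates toward the factorial behavior seen in Figure 9(a).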
Conclusions

It is important in many applications of symbolic computation to have algorithms which can efficiently deal with expressions with tensor indices. The Butler-Portugal algorithm [10] has been the state-of-the-art for some time, finding implementations in several popular software packages [4, 5, 7, 6, 3, 2]. We have presented two improvements on the Butler-Portugal algorithm: first, a faster way to deal with the label-exchange symmetries implicitly rather than explicitly; and second, a way to canonicalize contractions over subsets of symmetric or antisymmetric slots in polynomial time (where previously, they took factorial time).

The improvements are evident in the data of Section 5 for all of the most common situations actually found in tensor calculations. These improvements are achieved using a number of different data structures: First we introduce the Values and Label-Groups arrays, which not only allow an efficient specification of the problem (i.e., the partitioning of labels into free indices, component indices, and various separate pools of dummies of different types), but also allow one to reduce most label-group computations to simple array lookups. We also introduce the Symmetric-Subsets and Propagated-Symmetries arrays, which allow efficient tracking of the subsets of slots which are totally (anti)symmetric, and of how those symmetries interact with other slots in the expression via propagation over contracted dummies.

In particular, the contractions-over-symmetric-subsets problem is dramatically reduced from factorial to polynomial complexity in both time and memory. This is achieved via the concept of symmetry propagation, wherein the slot symmetries at one end of a contraction can "travel" to the other end, where they become label symmetries. In this work, we have focused on propagating simple (anti)symmetries (i.e., those which are generated by the simple exchange of two slots), because these symmetries are the simplest to describe, and they address the worst-case situations likely to be encountered in real computations.

However, we point out that the symmetry-propagation concept applies to generic symmetries as well. We strongly suggest this approach as an avenue to future improvement. The main challenge is designing an efficient means to describe such generic symmetries. One needs a data structure which is both simple to construct and simple to work with, and thus there are trade-offs to consider. We expect, however, that such an approach will be useful for cases like the pairwise symmetry of Section 5.4, provided that the pairs can be provided ahead of time (which is true if these symmetries derive from the tensor product of many identical factors). We also expect that one may find efficient algorithms to enumerate Young-tableau-like symmetries, where one has totally (anti)symmetric exchange of totally (anti)symmetric length-k subsets. Despite these potential future improvements, it is still true that the generic worst case is O(n!). However, if it is known in advance how such behavior could arise in a given real-world calculation, we are confident that the ideas presented here will be useful in developing further modifications that can reduce such behavior to polynomial.

One should also consider the broader context in which tensor index canonicalization occurs: namely, to simplify algebraic expressions with tensor contractions. As we have pointed out, there are other approaches to this problem, based on matching terms rather than canonicalizing them [8], which may be faster in certain cases. Also, there is the largely-unexplored arena of multi-term symmetries, for which a graph-theory approach seems to be useful [12]. A complete tensor computer algebra system should probably combine aspects of all of these approaches. We look forward to addressing many of these considerations in future work.

Acknowledgements
I would like to thank J. M. Martín García for useful discussions and extensive comments on theinitial draft. This work is supported in part by ERC grant ERC-2013-CoG 616732-HoloQosmos.
Appendices

A Efficient computation with permutation groups
A permutation group on a sequence of n symbols can in principle be quite large, and one needs an efficient method to manage computations. In what has now become the standard method [9, 13, 14], one uses the notion of a strong generating set and its corresponding base. This is essentially a way to give a hierarchical structure to a group, which paves the way for doing many (though not all) problems of finite group theory in polynomial time.

To define a base-and-strong-generating-set (or BSGS), let us first define a few group-theory notions. Suppose G ⊆ S_n is a permutation group acting on a set Ω of n symbols. Let α = ⟨α_1, α_2, …, α_n⟩ be a fixed ordering of the points of Ω. Then, for example, a permutation g ∈ G acts by rearranging the ordering to α^g = ⟨α_1^g, α_2^g, …, α_n^g⟩. If any of the α_i are not moved by g, then we say g stabilizes these α_i. Given a subset of points {α_i} ⊆ Ω, the set of all g ∈ G which stabilize these points gives a subgroup of G called the pointwise stabilizer of {α_i}. The standard way to notate a pointwise stabilizer is with a subscript,

G_{α_1 α_2 … α_k} ⊆ G,  (A.1)

which means that G_{α_1 α_2 … α_k} is the pointwise stabilizer of all the α_i listed.

Given the ordering α, we can then consider the pointwise stabilizers of the subsequences of α: G_{α_1}, G_{α_1 α_2}, G_{α_1 α_2 α_3}, etc. Introducing a special notation for the stabilizer of the first i − 1 points of α,

G^{(i)} ≡ G_{α_1 α_2 … α_{i−1}},  (A.2)

we can then observe that each G^{(i)} contains G^{(i+1)} as a subgroup, and thus the group G can be decomposed into the point stabilizer chain

G = G^{(1)} ⊇ G^{(2)} ⊇ ⋯ ⊇ G^{(n−1)} ⊇ G^{(n)} = {id}.  (A.3)

The stabilizer chain gives a hierarchical way to organize a group which permits many efficient methods of computation. Given an ordering α and its corresponding stabilizer chain, the base of the group G with respect to the ordering α is defined as the shortest subsequence of α whose pointwise stabilizer is trivial.
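The stabilizer chain can be made concrete by brute force for a small group. For G = S_4 with the ordering α = ⟨0, 1, 2, 3⟩, the orders along the chain fall as 24 ⊇ 6 ⊇ 2 ⊇ 1. A minimal sketch (pure enumeration, not the efficient Schreier-Sims construction used in practice):

```python
from itertools import permutations

n = 4
# The full symmetric group S_4; each element is a tuple g with g[i] = image of i.
G = [tuple(p) for p in permutations(range(n))]

def pointwise_stabilizer(group, points):
    """All g in `group` fixing every point in `points`."""
    return [g for g in group if all(g[a] == a for a in points)]

# Orders along the stabilizer chain G^(1) ⊇ G^(2) ⊇ ... for α = ⟨0,1,2,3⟩,
# where G^(i) stabilizes the first i-1 points.
orders = [len(pointwise_stabilizer(G, range(i))) for i in range(n + 1)]
print(orders)  # [24, 6, 2, 1, 1]
```

The first trivial stabilizer occurs after fixing ⟨0, 1, 2⟩, so that prefix is the base: fixing the fourth point adds nothing, which is why a base is generally shorter than the full ordering.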
That is, if k ≤ n is the lowest index for which G^{(k)} = {id}, then the base with respect to this ordering is B = ⟨α_1, α_2, …, α_{k−1}⟩. Since the pointwise stabilizer of B is the identity, each member g of the group G is uniquely identified by the image B^g = ⟨α_1^g, α_2^g, …, α_{k−1}^g⟩. Thus a base serves as a sort of index or set of coordinates.

A strong generating set of G with respect to a base B is a set of permutations S = {g_1, g_2, …, g_N} which generates the group G, with the added condition that certain subsets of S also generate each of the subsequence stabilizers G^{(i)}. Specifically, S satisfies the property

⟨S ∩ G^{(i)}⟩ = G^{(i)}, for each G^{(i)}.  (A.4)

It is important to note that, as a generating set for G, S may have redundant generators which are needed in order to satisfy the property (A.4). For example, the symmetric group S_5 may be generated by the set (in cyclic notation)

{(1, 2), (1, 2, 3, 4, 5)},  (A.5)

but this is not a strong generating set. To obtain the property (A.4) for the base B = [1, 2, 3, 4], we can use instead the set

{(1, 2), (2, 3), (3, 4), (4, 5)}.  (A.6)

When constructing a strong generating set, it is still desirable to eliminate redundant generators, and there are standard algorithms for doing this. (Here the notation ⟨·⟩ means the group generated by the enclosed set; we trust this does not cause confusion with the usual meaning of ⟨…⟩ in this article.)

Given a BSGS, one can efficiently test membership in G, find the order of G, or generate a random element of G from a uniform distribution. To construct a BSGS in the first place requires the Schreier-Sims algorithm, whose main variant runs in polynomial time. For detailed descriptions of these and other group theory algorithms, we refer the reader to [9, 13, 14].

B Subprocedures of the improved algorithm
Here we collect the various subprocedures of Algorithm 4.1 whose explicit code does not yield any great insights worthy of discussion in the main text, yet whose function is nevertheless not entirely obvious and ought to be written out.

The first of these is Algorithm B.1 for the subprocedure Get-Least-Value-Instances, which traverses the current level of the search tree and generates the next level, by populating the Config-Slots-of-Least-Value array with all of the configurations (g, s) and slot pairs (p, q) at which one can find the least-value label. In particular, in lines 8 through 16, we see how the slot number q is obtained, by starting from p and traversing any even-numbered symmetric sets available in Propagated-Symmetries. We only use the even-numbered sets because those are the result of symmetry propagation; the odd-numbered sets are already available in the slot symmetry group S, and thus should have been reached as one of the slots p ∈ Δ_i^S.

Next, in Algorithm B.2, we give the subprocedure Update-Propagated-Symmetries, which updates the
Propagated-Symmetries array to reflect any new symmetric subsets found thus far in the search tree. Some effort must be taken to maintain the conditions of Definition 3, which ensure the correct behavior of Algorithm 4.2. The function Get-Next-Odd-Number in line 10 must maintain an internal state, so that it returns a new odd number every time it is called, in sequence, starting from 1. The function Remove-Singletons in line 27 removes any nonzero entry from Next-Propagated-Symmetries that occurs only once, thus preserving Rule P6 of Definition 3. Alternatively, one could avoid entering singletons in the first place, although this requires a bit more logic.

In Algorithm B.3 we give the subprocedure
Zero-Due-to-Propagated-Symmetries, which looks for conflicts between Symmetric-Subsets and Propagated-Symmetries which may cause the expression to be zero. In line 12 is a check whether a propagated slot symmetry conflicts with an original slot symmetry. In line 21 is a check for dummy contractions where both ends lie in the same symmetric subset of slots, to see whether the metric (anti)symmetry (if there is a metric) agrees with the slot (anti)symmetry. We note that the functionality of Zero-Due-to-Propagated-Symmetries could also have been integrated into Update-Propagated-Symmetries, so that zero checks are made as soon as new symmetries are entered; this is the method used in the actual implementation tested in Section 5, although the logic becomes more complex.

Finally, in Algorithm B.4 we give
Update-Values-and-Label-Groups, whose purpose is to maintain the conditions of Definition 1 and Table 1 within the Values and Label-Groups arrays. Whenever the least-value label is consumed, we mark this by increasing the value of every label after it, within the set of labels whose value was least-value. This creates a new subset of labels with value (least-value + 1). The details of this behavior must be modified depending on whether the label is a component index, or a dummy index with or without metric. For dummy indices, we must offset the two entries at least-value and (least-value + 1), increasing those remaining to (least-value + 2). For example, take the Values array of (4.13):

Label: a b g h c_1 c_2 d_1 d_2 e_1 e_2 f_1 f_2
Value: 0 1 2 3  4   4   4   4   4   4   4   4   (B.1)

If we place the label c_1 into a slot, then we must update the Values array to

Label: a b g h c_1 c_2 d_1 d_2 e_1 e_2 f_1 f_2
Value: 0 1 2 3  4   5   6   6   6   6   6   6   (B.2)

This effectively marks the partner of a previously-used dummy for preferential treatment (as it must automatically precede the rest of the dummies in lexicographical order). In this case c_2 has a lower value than the remaining dummies {d_1, d_2, e_1, e_2, f_1, f_2}, because c_2 is no longer interchangeable with them, and it should be preferred as it comes earlier in order.

Algorithm B.1 A subprocedure to populate
Config-Slots-of-Least-Value
Input: Slot i; orbit Δ_i^S; the Configs, Values, and Propagated-Symmetries arrays
Output: The least-value found in the orbit, and the Config-Slots-of-Least-Value array

1:  procedure Get-Least-Value-Instances(i, Δ_i^S, Configs, Values, Propagated-Symmetries)
2:      least-value ← n
3:      Config-Slots-of-Least-Value ← {}
4:      for each (g, s) ∈ Configs do
5:          for each p ∈ Δ_i^S do
6:              value ← Values⟦g⟦p⟧⟧
7:              symmetry ← Propagated-Symmetries⟦s⟦p⟧⟧
8:              q ← p
9:              if (symmetry ≠ 0 and symmetry is even) then
10:                 for k ← i to n do
11:                     if (Propagated-Symmetries⟦s⟦k⟧⟧ = symmetry and Values⟦g⟦k⟧⟧ < value) then
12:                         value ← Values⟦g⟦k⟧⟧
13:                         q ← k
14:                     end if
15:                 end for
16:             end if
17:             if (value < least-value) then
18:                 least-value ← value
19:                 Config-Slots-of-Least-Value ← {}
20:             end if
21:             if (value = least-value) then
22:                 Config-Slots-of-Least-Value ← Append(Config-Slots-of-Least-Value, ⟨(g, s), (p, q)⟩)
23:             end if
24:         end for
25:     end for
26:     return {least-value, Config-Slots-of-Least-Value}
27: end procedure

Algorithm B.2 A subprocedure to add new entries to
Propagated-Symmetries
Input: The current Least-Value-Set; its configuration (g, s); least value for this slot least-value; the Label-Groups, Symmetric-Subsets, and Propagated-Symmetries arrays
Output: Propagated-Symmetries updated with the provided information

1:  procedure Update-Propagated-Symmetries(Least-Value-Set, (g, s), least-value, Label-Groups, Symmetric-Subsets, Propagated-Symmetries)
2:      Next-Propagated-Symmetries ← Propagated-Symmetries
3:      for each (p, q) ∈ Least-Value-Set do    ▷ Note that we use only q
4:          label ← g⟦q⟧
5:          if (Label-Groups⟦label⟧ ≠ ∅ and Symmetric-Subsets⟦q⟧ ≠ 0 and Propagated-Symmetries⟦s⟦q⟧⟧ = 0) then
6:              current-entry ← Next-Propagated-Symmetries⟦s⟦q⟧⟧
7:              if (current-entry ≠ 0 and current-entry is even) then
8:                  new-entry ← Sign(current-entry) · (|current-entry| − 1)
9:              else
10:                 new-entry ← Sign(Symmetric-Subsets⟦q⟧) · Get-Next-Odd-Number()
11:             end if
12:             Next-Propagated-Symmetries⟦s⟦q⟧⟧ ← new-entry
13:             if (Label-Groups⟦label⟧ is one of {+, −, ⊥, ⊤}) then
14:                 if ((label − least-value) is odd or Label-Groups⟦label⟧ = ⊤) then
15:                     partner-label ← label − 1
16:                 else
17:                     partner-label ← label + 1
18:                 end if
19:                 partner-slot ← g⁻¹⟦partner-label⟧
20:                 partner-entry ← Sign(new-entry) · (|new-entry| + 1)
21:                 if (Next-Propagated-Symmetries⟦s⟦partner-slot⟧⟧ = 0 or |partner-entry| < |Next-Propagated-Symmetries⟦s⟦partner-slot⟧⟧|) then
22:                     Next-Propagated-Symmetries⟦s⟦partner-slot⟧⟧ ← partner-entry
23:                 end if
24:             end if
25:         end if
26:     end for
27:     Propagated-Symmetries ← Remove-Singletons(Next-Propagated-Symmetries)
28:     return Propagated-Symmetries
29: end procedure

Algorithm B.3 A subprocedure to do early zero checking
Input: Configuration (g, s); least value for this slot least-value; the Label-Groups, Symmetric-Subsets, and Propagated-Symmetries arrays
Output: true or false

1:  procedure Zero-Due-to-Propagated-Symmetries((g, s), least-value, Label-Groups, Symmetric-Subsets, Propagated-Symmetries)
2:      Symmetries-of-Visited-Labels ← ⟨0, 0, …, 0⟩ (length n)
3:      for p ← 1 to n do
4:          label ← g⟦p⟧
5:          symmetry ← Propagated-Symmetries⟦s⟦p⟧⟧
6:          if (symmetry ≠ 0) then
7:              Symmetries-of-Visited-Labels⟦label⟧ ← symmetry
8:              if (Label-Groups⟦label⟧ = ∼ and symmetry < 0) then
9:                  return true
10:             end if
11:             if (Label-Groups⟦label⟧ is one of {+, −, ⊥, ⊤}) then
12:                 if (symmetry is even and Sign(symmetry) ≠ Sign(Symmetric-Subsets⟦p⟧)) then
13:                     return true
14:                 end if
15:                 if ((label − least-value) is odd or Label-Groups⟦label⟧ = ⊤) then
16:                     partner-label ← label − 1
17:                 else
18:                     partner-label ← label + 1
19:                 end if
20:                 partner-symmetry ← Symmetries-of-Visited-Labels⟦partner-label⟧
21:                 if (partner-symmetry = symmetry) and [(Label-Groups⟦label⟧ = + and symmetry < 0) or (Label-Groups⟦label⟧ = − and symmetry > 0)] then
22:                     return true
23:                 end if
24:             end if
25:         end if
26:     end for
27:     return false
28: end procedure

Algorithm B.4 A subprocedure to maintain the defining conditions of Values and Label-Groups
Input: Values, Label-Groups, and the least-value that was just used
Output: The new Values and Label-Groups after "freezing" the label least-value and its dummy partner, if one exists

procedure Update-Values-and-Label-Groups(Values, Label-Groups, least-value)
    if Label-Groups⟦least-value⟧ = ∅ then
        return {Values, Label-Groups}
    else if Label-Groups⟦least-value⟧ = ∼ then
        Label-Groups⟦least-value⟧ ← ∅
        loop-start ← least-value + 1
        value-threshold ← least-value ; value-increment ← 1
    else if Label-Groups⟦least-value⟧ is one of {+, −} then
        Values⟦least-value + 1⟧ ← Values⟦least-value + 1⟧ + 1
        Label-Groups⟦least-value⟧ ← ∅ ; Label-Groups⟦least-value + 1⟧ ← ∅
        loop-start ← least-value + 2
        value-threshold ← least-value ; value-increment ← 2
    else if Label-Groups⟦least-value⟧ = ⊥ then
        Label-Groups⟦least-value⟧ ← ∅ ; Label-Groups⟦least-value + 1⟧ ← ∅
        loop-start ← least-value + 2
        value-threshold ← least-value + 1 ; value-increment ← 2
    else if Label-Groups⟦least-value⟧ = ⊤ then
        Label-Groups⟦least-value − 1⟧ ← ∅ ; Label-Groups⟦least-value⟧ ← ∅
        loop-start ← least-value + 1
        value-threshold ← least-value ; value-increment ← 1
    end if
    j ← loop-start
    while j ≤ n and Values⟦j⟧ ≤ value-threshold do
        Values⟦j⟧ ← Values⟦j⟧ + value-increment
        j ← j + 1
    end while
    return {Values, Label-Groups}
end procedure

References

[1] R. Penrose and W. Rindler, Spinors and Space-Time. Cambridge Monographs on Mathematical Physics. Cambridge Univ. Press, Cambridge, UK, 2011.

[2] L. R. U. Manssur and R. Portugal, "The Canon package: a fast kernel for tensor manipulators,"
Computer Physics Communications (Feb., 2004) 173–180.

[3] P. Musgrave, D. Pollney, and K. Lake, "GRTensorII software," 1996. Queen's University, Kingston, Ontario, Canada.

[4] J. M. Martín-García, "xPerm: fast index canonicalization for tensor computer algebra," Computer Physics Communications no. 8, (2008) 597–603.

[5] J. M. Martín-García, "xAct: Efficient tensor computer algebra for Mathematica," 2004.

[6] K. Peeters, "A field-theory motivated approach to symbolic computer algebra," Comput. Phys. Commun. (2007) 550–558, arXiv:cs/0608005 [cs.SC].

[7] A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, Š. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, "SymPy: symbolic computing in Python," PeerJ Computer Science (Jan., 2017) e103.

[8] D. A. Bolotin and S. V. Poslavsky, "Introduction to Redberry: a computer algebra system designed for tensor manipulation," arXiv:1302.1219 [cs.SC].

[9] G. Butler, Fundamental Algorithms for Permutation Groups, vol. 559 of Lecture Notes in Computer Science. Springer, 1991.

[10] L. R. U. Manssur, R. Portugal, and B. F. Svaiter, "Group-Theoretic Approach for Symbolic Tensor Manipulation," International Journal of Modern Physics C (2002) 859–879, math-ph/0107032.

[11] T. Bäckdahl, "SymManipulator," 2011–2015.

[12] H. Li, Z. Li, and Y. Li, "Riemann Tensor Polynomial Canonicalization by Graph Algebra Extension," ArXiv e-prints (Jan., 2017), arXiv:1701.08487 [cs.SC].

[13] D. F. Holt, B. Eick, and E. A. O'Brien, Handbook of Computational Group Theory. CRC Press, 2005.

[14] Á. Seress, Permutation Group Algorithms, vol. 152. Cambridge University Press, 2003.

[15] A. L. Cauchy, "Mémoire sur le nombre des valeurs qu'une fonction peut acquérir, lorsqu'on y permute de toutes les manières possibles les quantités qu'elle renferme," Journal de l'École Polytechnique (1815).

[16] C. A. Brown, L. Finkelstein, and P. W. Purdom, Jr., "A new base change algorithm for permutation groups," SIAM Journal on Computing no. 5, (1989) 1037–1047.

[17] G. Cooperman and L. Finkelstein, "A fast cyclic base change for permutation groups," in Papers from the International Symposium on Symbolic and Algebraic Computation, ISSAC '92, pp. 224–232. ACM, New York, NY, USA, 1992.

[18] M. Jerrum, "A compact representation for permutation groups," Journal of Algorithms no. 1, (1986) 60–78.

[19] D. E. Knuth, "Efficient representation of perm groups," Combinatorica no. 1, (1991) 33–43.

[20] M. A. Vasiliev, "Higher spin gauge theories in four-dimensions, three-dimensions, and two-dimensions," Int. J. Mod. Phys. D5 (1996) 763–797, arXiv:hep-th/9611024 [hep-th].

[21] M. A. Vasiliev, "Higher spin gauge theories in various dimensions," Fortsch. Phys. (2004) 702–717, arXiv:hep-th/0401177 [hep-th].

[22] E. Cremmer, B. Julia, and J. Scherk, "Supergravity Theory in Eleven-Dimensions," Phys. Lett. B76 (1978) 409–412.

[23] D. Z. Freedman and A. Van Proeyen, Supergravity. Cambridge Univ. Press, Cambridge, UK, 2012.

[24] R. Feger and T. W. Kephart, "LieART: A Mathematica application for Lie algebras and representation theory," Comput. Phys. Commun. (2015) 166–195, arXiv:1206.6379 [math-ph].

[25] W. Fulton, Young Tableaux: With Applications to Representation Theory and Geometry, vol. 35. Cambridge University Press, 1997.

[26] W. Fulton and J. Harris, Representation Theory: A First Course, vol. 129. Springer Science & Business Media, 2013.

[27] C. R. Frye and C. J. Efthimiou, "Spherical Harmonics in p Dimensions," arXiv:1205.3548 [math.CA].

[28] R. Penrose, "Applications of negative dimensional tensors," Combinatorial Mathematics and its Applications (1971) 221–244.

[29] P. Cvitanović, Group Theory: Birdtracks, Lie's and Exceptional Groups. 2008.

[30] R. Portugal and B. F. Svaiter, "Group-theoretic Approach for Symbolic Tensor Manipulation: I. Free Indices," ArXiv Mathematical Physics e-prints (July, 2001), math-ph/0107031.

[31] J. M. Martín-García, D. Yllanes, and R. Portugal, "The Invar tensor package: Differential invariants of Riemann," Comput. Phys. Commun. (2008) 586–590, arXiv:0802.1274 [cs.SC].