Symmetry-Aware Reservoir Computing

Wendson A. S. Barbosa,∗ Aaron Griffith, Graham E. Rowlands,† Luke C. G. Govia, Guilhem J. Ribeill, Minh-Hai Nguyen, Thomas A. Ohki, and Daniel J. Gauthier‡

Department of Physics, Ohio State University, 191 W. Woodruff Ave., Columbus, OH 43210, USA
Quantum Engineering and Computing, Raytheon BBN Technologies, Cambridge, MA 02138, USA
We demonstrate that matching the symmetry properties of a reservoir computer (RC) to the data being processed can dramatically increase its processing power. We apply our method to the parity task, a challenging benchmark problem that highlights the benefits of symmetry matching. Our method outperforms all other approaches on this task, even artificial neural networks (ANNs) hand-crafted for this problem. The symmetry-aware RC can obtain zero error using an exponentially reduced number of artificial neurons and training data, greatly speeding up the time-to-result. We anticipate that generalizations of our procedure will have widespread applicability in information processing with ANNs.
Reservoir computing [1–3] is an emerging machine-learning (ML) paradigm based on artificial neural networks (ANNs) that is ideally suited for a variety of tasks, such as learning dynamical systems from time-series data [4, 5] or classifying structures in data [6, 7]. In comparison to other ML approaches, reservoir computing requires much smaller data sets for training, and the training time can be orders of magnitude faster while maintaining high performance [8, 9], making RCs suitable for deployment on edge-computing devices [10].

The core of an RC is a pool of N artificial neurons with recurrent connections, known as the reservoir and illustrated in Fig. 1, along with an input layer that broadcasts the input data to the reservoir and an output layer that forms a weighted sum of the values of the reservoir nodes and provides the computation result. Differing from other approaches, the relative weights of the connections of the input layer W^in and within the reservoir W^r are generated randomly at instantiation of the RC and held fixed, although their overall scale can be adjusted. Only the weights of the output layer W^out are adjusted during training, which is a linear optimization problem that can be solved using standard tools and is the reason for the short training time.

Even though the RC is a complex network with random weights, it still possesses symmetries that can substantially impact RC performance depending on the symmetries of the data being processed. This point was noted and addressed in an ad hoc way when using an RC to forecast the dynamics of the Lorenz '63 chaotic attractor [11–14].

FIG. 1. Reservoir Computer. The Boolean time series u(t) is input into the reservoir through either serial (i) or parallel (ii) input-layer schemes.
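Because only W^out is trained, the training step reduces to ridge regression on the recorded reservoir states. A minimal NumPy sketch of such a readout fit (array shapes and names are illustrative, not the authors' code):

```python
import numpy as np

def train_readout(G, Y, alpha=1e-8):
    """Ridge-regression fit of the output layer: given reservoir features
    G (N x timesteps) and desired outputs Y (outputs x timesteps), return
    W_out minimizing ||Y - W_out G||^2 + alpha*||W_out||^2."""
    N = G.shape[0]
    # closed-form solution: W_out = Y G^T (G G^T + alpha I)^(-1)
    return Y @ G.T @ np.linalg.inv(G @ G.T + alpha * np.eye(N))

# toy check: targets generated by a known linear map are recovered
rng = np.random.default_rng(0)
G = rng.standard_normal((50, 200))
W_true = rng.standard_normal((2, 50))
Y = W_true @ G
W_out = train_readout(G, Y)
print(np.allclose(W_out @ G, Y))  # True
```

Because this solve is linear, retraining after a change to the output layer costs only one more regression, which is what keeps RC training fast.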
Also, it has been known for some time that symmetry plays an important, but often subtle, role in regulating synchronized behavior on complex networks [15–18], which has important applications for coupled biological or engineered oscillator networks. Moreover, an important aspect of an RC, the fading memory [2], is related to generalized synchronization of the input data and the reservoir [13]. Fading memory is also known as the 'echo state property' [19, 20], in that the reservoir 'forgets' the input after a time. Thus, good RC performance requires generalized synchronization, and symmetry likely affects it.

Here, we study a classification task that especially highlights the issue of symmetry differences between the data and the RC. Furthermore, we demonstrate how to realize a 'symmetry-aware' RC that can be discovered automatically using optimization tools [14, 21]. Specifically, we use an RC to compute the parity of a sequence of digital bits, a known challenging ML task because the problem is linearly inseparable [22–24]. Hand-crafted ANNs can tackle this problem with reasonable accuracy (see, for example, Refs. [25–27]), but a generic ANN requires that the network size [28] and training time [23] increase exponentially with the parity order n (defined precisely below) to reach a user-defined accuracy. We match the RC symmetry to the data by making only straightforward changes to the input and output layers. The symmetry-aware RC requires exponentially smaller N in comparison to the non-aware RC and has similar or better performance than the hand-crafted ANNs. This work paves the way for improving the performance of RCs on other tasks using automated tools that can discover symmetries.

The parity task—
The task we consider is to determine the parity of each sequence of n bits in a signal u(t), which is a Boolean time series where each bit has a time duration T and assumes either the value +1 or -1. The RC is trained to predict the nth-order parity function

$P_n(t) = \prod_{i=0}^{n-1} u(t - iT)$,

which inspection reveals has two symmetries:

• Parity-order symmetry: The parity function has an inversion symmetry that depends on n. For n odd, an n-bit sequence will have its parity changed from p to -p if all its bits are flipped, i.e., (u, p) → (-u, -p). On the other hand, (u, p) → (-u, p) for n even.

• Sequence-order permutation symmetry: The parity of a sequence is the same under permutation of its bits. Thus, the parity depends only on the number of positive (or negative) bits in the sequence.

For future reference, we divide the set of possible n-bit input sequences to the parity function into sets L_n(l) according to the number of ones l in the sequence. This divides the 2^n possible inputs into sets of size $|L_n(l)| = \binom{n}{l}$. For each n, there are n + 1 such sets. Because all n-bit sequences containing l ones are equivalent under the permutation symmetry and consequently have the same parity, it should be possible to train a symmetry-aware RC that shares this symmetry with a small number of sequences that cover these n + 1 distinct sets, rather than all 2^n possible inputs.

The RC—
In our RC implementation, also known as an echo state network, the reservoir node dynamics r are governed by

$\dot{r}(t) = -\gamma r(t) + \gamma f(W^r r(t) + W^{in} u(t) + b)$,   (1)

where γ is the decay rate, f(·) is the nonlinear activation function, and b is a bias. While γ and b can be different for each node, we take them to be the same for simplicity. The reservoir output is given by v(t) = W^out g(r(t)), where g(·) is often taken as a linear function, but we allow it to be nonlinear to adjust the RC symmetry, as described below. The two-component vector v(t) = {v_1(t), v_2(t)} projects the reservoir onto the two parity labels: the RC output parity is +1 for each time span T if, averaged over a window ΔT, the component v_1 is larger than v_2, and -1 otherwise. Here, ΔT is the measurement window within T used for the reservoir output calculation; it starts at an initial time T_0 and finishes at T_0 + ΔT.

Training the RC uses supervised learning, where an input signal u_train drives the reservoir and the desired output Y is known in advance. We use ridge regression to find the (2 × N) output matrix W^out by minimizing ||Y - W^out g(r)||^2 + α||W^out||^2, where the ridge parameter α prevents overfitting.

The RC is instantiated using the following procedures. The components of W^in are chosen randomly from a zero-mean normal distribution with variance ρ_in and probability σ for a non-zero coefficient, which specifies the input connectivity. The adjacency matrix W^r has a spectral radius ρ_r and a node in-degree k, which is the number of connections from other reservoir nodes. The RC performance depends on the hyperparameter set p = (T_0, ΔT, γ, ρ_r, σ, ρ_in), which is selected using a Bayesian optimizer [14] (see Supplemental Material [29]).

A symmetry-aware RC—
First, we describe how an RC is typically used to solve the parity task [30–39] and why this standard approach violates the symmetries described above. In previous works, u is injected into the reservoir as serial data, as shown in Fig. 1(i). Because of the RC fading memory, required for good performance, bits earlier in the sequence are partially forgotten by the time the nth bit is injected into the reservoir. Also, information from one n-bit sequence spills into the next sequence. Thus, the combination of serial data input and fading memory violates the sequence-order permutation symmetry. No adjustment of the RC hyperparameters can fully fix this symmetry mismatch, and the problem becomes more pronounced as n increases.

The parity-order symmetry is also not respected in a standard RC. In the reservoir computing community, it is common to use f(r) = tanh(r), g(r) = r, and b = 0. In this case, the RC possesses the inversion symmetry (u, r, v) → -(u, r, v), which respects the parity-order symmetry for n odd, but not for n even. Thus, we expect poor performance for n even. This might be the reason why many works on RCs only try to solve the parity task for n odd, as in Refs. [30–33], for example. On the other hand, other related works in which the RC does not present such a symmetry address both odd and even n, as in Refs. [34, 35]. However, for the latter, whether symmetry rules play an important role in their systems is not discussed.

We make changes to both the input and output layers to solve these problems and realize a symmetry-aware RC; no change to the reservoir is required. To address the sequence-order permutation symmetry, we make two changes to the input layer. First, we use a tapped delay line for the input data, as shown in Fig. 1(ii), which converts the serial data into an n-bit parallel word. Serial-to-parallel conversion is a common method in high-speed electronics and hence can be achieved in hardware without loss of RC throughput.
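A software model of the tapped delay line, together with the parity target it must reproduce, can be sketched in a few lines (hypothetical helper names; bits take the values ±1 as in the text):

```python
import numpy as np

def serial_to_parallel(u, n):
    """Tapped delay line: convert a serial ±1 bit stream into n-bit parallel
    words [u(t), u(t-T), ..., u(t-[n-1]T)], one column per time step."""
    words = np.stack([np.roll(u, i) for i in range(n)])
    return words[:, n - 1:]  # discard start-up columns containing wrapped bits

def parity_target(u, n):
    """Desired n-th order parity P_n(t) = prod_{i=0}^{n-1} u(t - iT)."""
    return serial_to_parallel(u, n).prod(axis=0)

bits = np.array([1, -1, -1, 1, -1, 1])
print(parity_target(bits, 3))  # parity of each sliding 3-bit window
```

With ±1 encoding the parity is simply the product over the window, which is what makes the sliding-window structure explicit.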
Here, u(t) is an n-dimensional vector with components [u(t), u(t-T), ..., u(t-[n-1]T)]^⊤, where ⊤ indicates the transpose. The second modification is to broadcast all n components of the data vector simultaneously to each reservoir node, with an identical weight determined by W^in. We also reset all reservoir nodes to zero after the time T when a new sequence is input. These changes restore the sequence-order permutation symmetry.

The parity-order symmetry can be respected to some extent by changing the symmetry of f or g, or by taking b ≠ 0. However, changing the symmetry of f also affects the inhibitory-versus-excitatory aspect of the signals and hence can have a negative impact on RC performance. Similarly, it is difficult (or impossible, depending on f) to obtain a purely even or odd symmetry by adjusting b. On the other hand, adjusting g can provide symmetry matching. In the Supplemental Material [29], we compare all three approaches and demonstrate that adjusting only g gives rise to a highly performing RC for the parity task. In the results presented below, we set f(x) = tanh(x) and b = 0, and set the RC symmetry by squaring a portion η_r of the nodes before the output multiplication, so that g(r_i) = r_i^2 for i ≤ η_r N and g(r_i) = r_i for i > η_r N. An optimization routine can be used to select η_r.

FIG. 2. RC performance as a function of the symmetry-breaking parameter. (a) Top: segment of the input testing signal u. Bottom: desired output P_6 (continuous black line) and the optimized RC output (dashed line) for η_r = 0 (left) and η_r = 1 (right). (b) BER as a function of η_r for n even. The vertical bars are limited by the q_1 and q_3 quartiles and the vertical lines by the minimum and maximum BER values. (c) Same as (b), but for n odd.

Results—
We now demonstrate that an RC can be designed to achieve zero error for the P_n task using a small N when both symmetries are taken into account. As a baseline, we perform the parity task applied to the test binary time-series data shown in the top panel of Fig. 2(a) for n = 6, using the common RC configuration of serial data input with η_r = 0 and a reservoir size of N = 100. The reservoir is trained using a random binary time series with 1000 bits and with optimized hyperparameters. For testing, we measure the bit error rate (BER) on a 1000-bit random time series different from that used in training. Comparing the ground truth and the RC-predicted parity in the bottom left panel of Fig. 2(a), we see that the RC performs poorly, with a BER of 0.4, essentially not much better than guessing.

Next, we modify only the output layer by taking η_r = 1 so that the parity-order symmetry is respected for this case, where n is even. The reservoir is retrained with this new output layer and the hyperparameters re-optimized. Dramatically, the BER drops to zero, as seen in the bottom right panel of Fig. 2(a), albeit for this fairly large reservoir. To our knowledge, there are no previous reports of obtaining zero error for P_6 in the reservoir computing literature, demonstrating the importance of respecting the parity-order symmetry.

To explore this point further, we measure the BER as a function of η_r, as seen in Figs. 2(b) and 2(c). For each point, we optimize the hyperparameters for 10 different reservoirs and average our results. For n = 2 or 3, the sequences are short enough that zero-error performance is obtained even when the symmetry is not fully satisfied (η_r should be equal to 1 for n even and 0 for n odd to fully satisfy the parity-order symmetry). However, for larger n, matching this symmetry becomes more important. For P_6, the mean BER remains well above zero for η_r = 0 (the standard deviation, here 0.009, is used as the error interval), demonstrating that satisfying the parity-order symmetry alone is not enough to obtain zero error for this reservoir size.

We expect that the performance of the RC will improve as N increases, as is generally found in the RC literature. To explore the reservoir size required to obtain zero error on the parity task, we set η_r to respect the parity-order symmetry, instantiate 50 different RCs, and optimize the hyperparameters for each. Figure 3(a) shows the mean BER (color scale) for each N and n. Here, we stop increasing N when all 50 RCs reach BER = 0. The width of the horizontal bars indicates the fraction of reservoirs with BER = 0, with the minimum width for small N indicating that no reservoir has zero error. The white star indicates the smallest N for which at least one of the 50 RCs obtains BER = 0. While we only go up to n = 7 due to the exponentially increasing computational cost, the fit (dashed line) shows an exponential scaling of N to obtain BER = 0 for these RCs that respect the parity-order symmetry but use serial input.

We find a remarkable improvement in the RC performance when respecting both symmetries. We use the parallel input scheme discussed above while simultaneously setting η_r to satisfy the parity-order symmetry. As seen in Fig. 3(b), a small reservoir obtains zero error even for n = 7, an exponential reduction in N in comparison to the serial-input case, which does not respect the sequence-order permutation symmetry. To our knowledge, no previous works in the reservoir computing literature completely solve the parity task using such small networks. Figure 3(c) shows that N continues to scale linearly for n up to 100. Past work using hand-crafted ANNs suggested a scaling of N ∼ log(n + 1) [26], but full accuracy was not obtained when such ANNs were tested; their success rate decreases with increasing n.

FIG. 3. Mean BER as a function of N and n. The dashed lines represent the fit of the network-size scaling needed to obtain a mean BER = 0 (black bars). (a) Only the parity-order symmetry is respected. The y-axis starts with N = 1 and N = 10, then N is incremented by 10. The fit shows an exponential scaling, with coefficient of determination R^2 = 0.994. (b) and (c) Both the parity-order and sequence-order permutation symmetries are respected, and the fits show linear scaling in n, with R^2 = 0.96 for n ≤ 10 and R^2 = 0.99 for 10 ≤ n ≤ 100.

Size of the training data set—
As a final thought on using RCs for solving the parity task, we note that previous work trained the RC with a long random bit sequence. Commonly, it is found that the performance increases with the length of this training set. We hypothesize that the reason the performance improves with longer random binary sequences in past work is partly that the RC is more likely to be presented with the entire set of unique sequences the longer the data set is. To quantify this point, we find that the expected number of n-bit-long sequences required in the training time series is given approximately by the coupon-collector expression $E(n) = 2^n [1/2^n + 1/(2^n - 1) + 1/(2^n - 2) + \cdots + 1] = 2^n H_{2^n}$, where H_M is the Mth harmonic number [40]. Because the parity task involves a sliding window with n bits being processed at a time, there is re-use of bits from one sequence to the next. Accounting for this re-use, the training time series only needs to contain, on average, [E(n) + n - 1] bits. As an example, E(3) ≈ 22, so that we need to train the reservoir with a 24-bit-long random sequence on average.

For a fully symmetry-aware RC, each sequence in the set L_n(l) is equivalent, so the reservoir only needs to be trained on any one sequence in each set. Furthermore, the NOT of a sequence in L_n(l) (equivalent to u → -u) is found in the set L_n(n - l), and the parity-order symmetry ensures that the RC will give the correct result just by training on the original sequence; that is, the NOT of the sequence is not needed.

To quantitatively predict the number of sequences required to train the reservoir based on this line of reasoning, we introduce the parameter s, the minimum of the number of 1's or -1's in a sequence. Its maximum value s_max is n/2 for n even and (n - 1)/2 for n odd. With this notation, the number of n-bit-long sequences needed for training is s_max + 1. Because of the sliding window and bit re-use mentioned above, the required training length is only n + s_max bits, an exponential reduction in comparison to the standard method of training a non-symmetry-aware RC. A simple way to construct the training data set in this case is to make the first n bits equal to -1 and the following s_max bits equal to 1. We used this procedure for the RCs of Figs. 3(b) and 3(c), which greatly reduced the computation time required to generate these plots, in addition to the savings obtained by using a much smaller N.

Conclusion—
Our work highlights the importance of matching the symmetry of an RC to the symmetry of the data being processed, and shows that these symmetries can be satisfied by making changes only to the input and output layers of the RC. While we swept the output-layer symmetry parameter η_r in Figs. 2(b) and 2(c) and the optimal value can be found by visual inspection, a Bayesian optimization routine could be used to automatically find the best value. Furthermore, a mixture of serial and parallel data structures in the input layer could be similarly used and optimized automatically. This suggests that our approach can be used to discover symmetries in other problems that may be more complex than the simple parity task considered here. As a recent example, by taking into account symmetries in the electronic wavefunction, a deep-neural-network approach for solving the electronic Schrödinger equation [41] outperformed previous state-of-the-art solutions for this problem.

Of note is the observation that a symmetry-aware RC has vastly improved performance. For the parity task, which is traditionally considered to be a hard ML problem, we obtain an exponential reduction in the number of reservoir nodes and the training-set size needed to obtain zero error. In principle, the symmetry considerations we have used to achieve this drastic improvement in performance for reservoir computing can be applied to other neuromorphic and machine-learning approaches, such as ANNs. Future research is required to determine whether similar performance improvements can be found in these methodologies when symmetry is a design consideration.

W.B. and D.J.G. gratefully acknowledge the financial support of Raytheon BBN Technologies through project

∗ [email protected]
† [email protected]
‡ [email protected]

[1] H. Jaeger and H. Haas, Science, 78 (2004).
[2] W. Maass, T. Natschläger, and H. Markram, Neural Comput., 2531 (2002).
[3] D. J. Gauthier, SIAM News, 12 (2018).
[4] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, Phys. Rev. Lett.
024102 (2018).
[5] C. Klos, Y. F. Kalle Kossio, S. Goedeke, A. Gilra, and R.-M. Memmesheimer, Phys. Rev. Lett., 088103 (2020).
[6] A. Jalalvand, G. Van Wallendael, and R. Van De Walle, in (2015) pp. 146–151.
[7] I. Shani, L. Shaughnessy, J. Rzasa, A. Restelli, B. R. Hunt, H. Komkov, and D. P. Lathrop, Chaos, 123130 (2019).
[8] P. Vlachas, J. Pathak, B. Hunt, T. Sapsis, M. Girvan, E. Ott, and P. Koumoutsakos, Neural Netw., 191 (2020).
[9] A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian, Nonlinear Process. Geophys., 373 (2020).
[10] D. Canaday, A. Griffith, and D. J. Gauthier, Chaos, 123119 (2018).
[11] J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, Chaos, 121102 (2017).
[12] Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, and E. Ott, Chaos, 041102 (2017).
[13] Z. Lu, B. R. Hunt, and E. Ott, Chaos, 061104 (2018).
[14] A. Griffith, A. Pomerance, and D. J. Gauthier, Chaos, 123108 (2019).
[15] L. M. Pecora, F. Sorrentino, A. M. Hagerstrom, T. E. Murphy, and R. Roy, Nat. Commun., 4079 (2014).
[16] T. Nishikawa and A. E. Motter, Phys. Rev. Lett., 114101 (2016).
[17] J. D. Hart, L. Larger, T. E. Murphy, and R. Roy, Philos. Trans. R. Soc. A, 20180123 (2019).
[18] F. Molnar, T. Nishikawa, and A. E. Motter, Nat. Phys., 351 (2020).
[19] H. Jaeger, German National Research Center for Information Technology GMD Report No. 148 (2001).
[20] I. B. Yildiz, H. Jaeger, and S. J. Kiebel, Neural Netw., 1 (2012).
[21] J. Yperman and T. Becker, arXiv:1611.05193.
[22] C. Thornton, in Advances in Artificial Intelligence, edited by G. McCalla (Springer Berlin Heidelberg, Berlin, Heidelberg, 1996) pp. 362–374.
[23] M. Grochowski and W. Duch, in Constructive Neural Networks, Studies in Computational Intelligence, Vol. 258, edited by L. Franco, D. A. Elizondo, and J. M. Jerez (Springer, Berlin, Heidelberg, 2009) pp. 49–70.
[24] S. Shalev-Shwartz, O. Shamir, and S. Shammah, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, ICML'17 (JMLR.org, 2017) pp. 3067–3075.
[25] B. M. Wilamowski, D. Hunter, and A. Malinowski, in Proc. 2003 IEEE IJCNN, Vol. 4 (2003) pp. 2546–2551.
[26] D. Hunter, H. Yu, M. S. Pukish, III, J. Kolbusz, and B. M. Wilamowski, IEEE Trans. Industr. Inform., 228 (2012).
[27] M. Z. Arslanov, Z. E. Amirgalieva, and C. A. Kenshimov, Open Eng. (2016).
[28] M. L. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geometry (The MIT Press, Cambridge, MA, 1969).
[29] See Supplemental Material at [URL will be inserted by publisher] for details on hyperparameter optimization and a comparison of symmetry-breaking parameters.
[30] N. Bertschinger and T. Natschläger, Neural Comput., 1413 (2004).
[31] S. Dasgupta, F. Wörgötter, and P. Manoonpong, in Engineering Applications of Neural Networks, edited by C. Jayne, S. Yue, and L. Iliadis (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 31–40.
[32] D. Snyder, A. Goudarzi, and C. Teuscher, Phys. Rev. E, 042808 (2013).
[33] J. Schumacher, H. Toutounji, and G. Pipa, in Artificial Neural Networks and Machine Learning – ICANN 2013, edited by V. Mladenov, P. Koprinkova-Hristova, G. Palm, A. E. P. Villa, B. Appollini, and N. Kasabov (Springer Berlin Heidelberg, Berlin, Heidelberg, 2013) pp. 26–33.
[34] J. C. Coulombe, M. C. A. York, and J. Sylvestre, PLoS One, 1 (2017).
[35] G. Dion, S. Mejaouri, and J. Sylvestre, J. Appl. Phys., 152132 (2018).
[36] T. Furuta, K. Fujii, K. Nakajima, S. Tsunegi, H. Kubota, Y. Suzuki, and S. Miwa, Phys. Rev. Applied, 034063 (2018).
[37] T. Kanao, H. Suto, K. Mizushima, H. Goto, T. Tanamoto, and T. Nagasawa, Phys. Rev. Applied, 024052 (2019).
[38] S. Tsunegi, T. Taniguchi, K. Nakajima, S. Miwa, K. Yakushiji, A. Fukushima, S. Yuasa, and H. Kubota, Appl. Phys. Lett., 164101 (2019).
[39] S. Watt and M. Kostylev, Phys. Rev. Applied, 034057 (2020).
[40] P. Flajolet, D. Gardy, and L. Thimonier, Discrete Appl. Math., 207 (1992).
[41] J. Hermann, Z. Schätzle, and F. Noé, Nat. Chem., 891 (2020).

Supplemental Material: Symmetry-Aware Reservoir Computing

Wendson A. S. Barbosa, Aaron Griffith, Graham E. Rowlands, Luke C. G. Govia, Guilhem J. Ribeill, Minh-Hai Nguyen, Thomas A. Ohki, and Daniel J. Gauthier

Department of Physics, Ohio State University, 191 W. Woodruff Ave., Columbus, OH 43210, USA
Quantum Engineering and Computing, Raytheon BBN Technologies, Cambridge, MA 02138, USA
HYPER-PARAMETER OPTIMIZATION
The choice of the hyper-parameter set is very important for RC performance. We use a Gaussian-Process-based Bayesian optimizer, available in the skopt python module, to find the optimal set of hyper-parameters (T_0, ΔT, γ, ρ_r, σ, ρ_in) for each case studied in this work. We keep k = 10 (k = N for N < 10) for the serial and k = 1 for the parallel input scheme. For the reservoir integration, we used a simple Euler algorithm with time step dt = 0.01 T and saved the reservoir state every 5 steps. Table S1 shows the scanned range for each hyper-parameter. The optimal hyper-parameters may change for different RC topologies, i.e., different W^r and W^in chosen before optimization.

            T_0 [T]   ΔT [T]     γ [T^-1]    ρ_r        ρ_in      σ
  Serial    0-0.5     0.05-0.5   0.1-5.0     0.1-2.0    0.1-1.0   0.1-1.0
  Parallel  0-1       0.05-1     0.1-20.0    0.1-20.0   0.1-1.0   0.1-1.0

TABLE S1. Hyper-parameter space scanned by the Bayesian optimizer.
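The Euler integration described above can be sketched as follows (toy sizes; for brevity, W^r here is a scaled dense random matrix rather than one constructed with a prescribed spectral radius and in-degree):

```python
import numpy as np

def euler_step(r, u, Wr, Win, gamma, dt, b=0.0):
    """One Euler step of  dr/dt = -gamma*r + gamma*tanh(Wr r + Win u + b)."""
    return r + dt * (-gamma * r + gamma * np.tanh(Wr @ r + Win @ u + b))

rng = np.random.default_rng(1)
N, n = 20, 3                         # reservoir size, input dimension
Wr = 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)
Win = rng.standard_normal((N, n))
r = np.zeros(N)
u = np.ones(n)                       # constant drive, for illustration only
for _ in range(500):                 # dt = 0.01 in units of the bit time T
    r = euler_step(r, u, Wr, Win, gamma=2.0, dt=0.01)
print(r.shape)  # (20,)
```

With f = tanh and a small step γ·dt, the node states stay bounded in (-1, 1), consistent with Eq. (1) of the main text.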
Figure S1(a) shows the distribution of optimal parameters for the 50 RCs that have BER = 0, represented by the black bars in Fig. 3(a) of the main text, for the serial input case. Figure S1(b) shows the optimal-parameter histograms for the 50 RCs that reach BER = 0 in the parallel case shown in Fig. 3(b) of the main text (black bars). The full set of optimal hyper-parameters is available upon request.
FIG. S1. Optimal-parameter distributions of the 50 RCs that have BER = 0 for 2 ≤ n ≤ 7.

RC SYMMETRY BREAKING PARAMETERS
The RC inversion symmetry can be adjusted in three different ways:

• Changing the symmetry of f: we use f = tanh^2 as the nonlinearity for a portion η_f of the nodes.
• Changing the symmetry of g: we square r(t) for the portion η_r of nodes just before the output matrix multiplication.
• Adding a bias b: we introduce a bias b ≠ 0 in the argument of f.

Figure S2 shows a box plot of the parity-classification BER when the RC has its symmetry adjusted separately by η_f, η_r, and b. When one of these three parameters is adjusted, the other two are set to zero. For each of the three cases, 5 different RC instances are optimized. The mean BER is represented by the red triangles.

FIG. S2. Parity-classification BER for η_f, η_r, and b as the symmetry-breaking parameter. The box plot represents a set of 5 optimized RC instances. The mean BER is represented by the red triangles, the blue box is limited by the q_1 and q_3 quartiles, the orange horizontal line stands for the median, and the vertical lines are limited by the minimum and maximum BERs among the 5 instances.

We find that the best RC performance (mean BER = 0) is obtained when we adjust η_r. In this case, the symmetry is broken at the output layer, and all the network nodes have dynamics whose states can take either negative or positive values. This is not the case when we break the symmetry by adjusting η_f: there, a portion of the nodes has its state constrained to be always positive due to its nonlinearity f = tanh^2. These nodes are always excitatory to the rest of the network, which may limit the network's inhibitory behavior and decrease its computational capacity. The bias is the worst of the three symmetry-breaking parameters. The high mean BER for the parity classification in comparison to the other two parameters is explained by the impossibility of obtaining an even function whenever there is a b ≠ 0 offset inside f.
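As an illustrative NumPy sketch of the η_r mechanism compared above (not the authors' implementation): squaring the first η_r·N node states makes the readout features even under r → -r when η_r = 1, and leaves them odd when η_r = 0:

```python
import numpy as np

def g(r, eta_r):
    """Output nonlinearity: square the first eta_r*N node states before the
    output matrix multiplication; leave the remaining states linear."""
    k = int(eta_r * len(r))
    out = r.astype(float)
    out[:k] = out[:k] ** 2
    return out

r = np.array([0.3, -0.7, 0.2, 0.5])
# eta_r = 1: fully even features, g(-r) == g(r), matching n-even parity
print(np.array_equal(g(-r, 1.0), g(r, 1.0)))   # True
# eta_r = 0: fully odd features, g(-r) == -g(r), matching n-odd parity
print(np.array_equal(g(-r, 0.0), -g(r, 0.0)))  # True
```

Intermediate values of η_r mix even and odd features, which is why the BER in Fig. S2 depends so strongly on where η_r sits between these two limits.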