Adaptive model selection in photonic reservoir computing by reinforcement learning
Kazutaka Kanno*, Makoto Naruse, and Atsushi Uchida
Department of Information and Computer Sciences, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan
Department of Information Physics and Computing, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
*[email protected]
ABSTRACT
Photonic reservoir computing is an emergent technology toward beyond von Neumann computing. Although photonic reservoir computing provides superior performance in environments whose characteristics coincide with the training datasets for the reservoir, the performance is significantly degraded if these characteristics deviate from the original knowledge used in the training phase. Here, we propose a scheme of adaptive model selection in photonic reservoir computing using reinforcement learning. In this scheme, a temporal waveform is generated by different dynamic source models that change over time. The system autonomously identifies the best source model for the task of time series prediction using photonic reservoir computing and reinforcement learning. We prepare two types of output weights for the source models, and the system adaptively selects the correct model using reinforcement learning, where the prediction errors are associated with rewards. We succeed in adaptive model selection when the source signal is temporally switched between two different dynamic system models, as well as when the signal is a mixture from the same models but with different mixing ratios or parameter values. This study paves the way for autonomous behavior in photonic artificial intelligence and could lead to new applications in load forecasting and multi-objective control, where frequent environment changes are expected.
Introduction
Reservoir computing involves information processing based on recurrent neural networks. This method is known to be suitable for temporal or sequential information processing, such as time series prediction and speech recognition. In reservoir computing, the input data to be processed are fed into a recurrent neural network, which is called a reservoir. The reservoir network produces a transient response when the input signal is injected. The reservoir computing processing result is the weighted linear sum of the node states in the reservoir. The main characteristic of reservoir computing is that the input weights and reservoir are fixed, being specified by the physical characteristics of the reservoir, while the output weights are trained. These characteristics significantly reduce the computational cost of learning compared with that of standard recurrent neural networks. Nonlinear mapping of the input data into a high-dimensional space is required to achieve reservoir functionality for successful computation. This functionality can be realized using other nonlinear dynamic systems instead of recurrent neural networks, and reservoir computing based on various types of nonlinear dynamic systems has been proposed. Photonic implementation of reservoir computing is one example, where a semiconductor laser with a delayed feedback loop is used as a reservoir. One of the advantages of photonic reservoir computing is that it enables fast information processing with low learning cost using established optoelectronic devices. It has been reported that speech recognition at a rate of 1.1 Gb/s can be achieved using photonic reservoir computing. Reservoir computing can, however, only adapt to input signals that are used to train the output weights of the reservoir. In other words, reservoir computing does not work well if the incoming signals do not correspond with the training datasets.
In reality, environmental conditions may change the characteristics of the observations, which could induce variations of the input that differ from the original knowledge used in the training phase. Additionally, the input signals could be generated by many different dynamic source models, and the source model may be switched dynamically in time, or the signal may be a mixture of different source models. It may be difficult to train a reservoir computing system to produce the correct outputs for all different models or arbitrary environmental conditions. To solve this serious issue, we propose a scheme of reservoir computing combined with reinforcement learning in this study. In this scheme, training is conducted with respect to individual input signals generated by a designated model. Hence, multiple output weights of the reservoir are obtained, corresponding to the different types of source signals in the training phase. In the task execution phase, one of the output weights of the reservoir is selected by reinforcement learning such that the minimum prediction error for the given input signals is achieved. This adaptive model selection scheme is expected to be useful for applications such as load forecasting, multi-objective control, and signal recovery in communication, where environmental changes or diverse types of input signals are expected; hence, the preparation of multiple output weights of the reservoir prior to execution and dynamic model selection would be highly effective. Reinforcement learning is a machine learning scheme concerned with the problem of training an action policy to maximize the total reward. The multi-armed bandit (MAB) problem is a fundamental problem in reinforcement learning, whose goal is to maximize the total reward when agents select one of multiple slot machines with unknown hit probabilities in finite trials.
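For reference, the exploration-exploitation trade-off at the heart of the MAB problem can be illustrated with a minimal epsilon-greedy sketch. The hit probabilities and all parameter values below are hypothetical; the present study instead uses the chaos-based decision-making method described later.

```python
import random

def epsilon_greedy_bandit(probs, n_trials=1000, eps=0.1, seed=0):
    """Minimal epsilon-greedy play of a multi-armed bandit.

    probs: hit probability of each slot machine (unknown to the agent).
    Returns the total reward and the per-arm empirical reward estimates.
    """
    rng = random.Random(seed)
    counts = [0] * len(probs)     # number of pulls per arm
    values = [0.0] * len(probs)   # running mean reward per arm
    total = 0
    for _ in range(n_trials):
        if rng.random() < eps:                  # explore a random arm
            arm = rng.randrange(len(probs))
        else:                                   # exploit current best estimate
            arm = max(range(len(probs)), key=lambda i: values[i])
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, values

total, est = epsilon_greedy_bandit([0.3, 0.7])  # hypothetical hit probabilities
```

With enough trials, the agent's estimate for the better arm exceeds that of the worse arm, so exploitation concentrates on the higher-reward machine.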
The idea of adaptive model selection stems from associating the slot machines in the MAB problem with the trained output weights of the reservoir. Therefore, the strategy used to solve the MAB problem could be effective in adaptive model selection. Furthermore, several methods of photonic decision making have been demonstrated with operation in the gigahertz regime by utilizing chaotic laser time series. Notably, both reservoir computing and dynamic model selection can be performed on a photonic platform for ultrafast operation. In this study, we numerically demonstrate adaptive model selection using decision making based on chaotic laser outputs in photonic reservoir computing with reinforcement learning. We consider a situation in which the input signal is generated by one of two dynamic models, specifically, the Lorenz model or the Rössler model, and the input signal is switched in time between the two models to mimic environmental changes. We train the reservoir using the time series generated by either one of the two models and prepare two types of output weights for the reservoir corresponding to the two models. We perform time series prediction of the input signal using reservoir computing. Generally, if the output weights of a reservoir do not correspond to the characteristics of the actual input signals, for instance due to environmental changes, a larger prediction error is obtained. In reinforcement learning, action policies are trained based on rewards, and the prediction errors in reservoir computing are regarded as rewards in this study. The proposed scheme autonomously changes the output weights of the reservoir according to the given input signals to reduce the prediction error. We numerically demonstrate correct adaptive model selection for different configurations of the dynamic models.

Adaptive model selection based on decision making in photonic reservoir computing
We propose a scheme for adaptive model selection based on decision making in photonic reservoir computing. Figure 1 schematically illustrates the architecture of the proposed approach. The scheme comprises three parts: photonic reservoir computing, reinforcement learning, and generation of chaotic laser outputs. In this study, we numerically implement photonic reservoir computing and reinforcement learning. We use experimentally generated chaotic temporal waveforms of the laser outputs for reinforcement learning in the numerical simulations. The photonic reservoir computing system consists of a semiconductor laser with optical feedback. (See the
Methods section for details.) In this scheme, chaotic time series prediction is numerically performed using photonic reservoir computing, where the predicted signal is generated using two dynamical models: the Lorenz and Rössler models. Considering the situation in which the source of the input signal changes over time, mimicking environmental changes, single-point prediction is performed using photonic reservoir computing. Two types of reservoir output weights are prepared, which are trained by chaotic time series generated separately using the Lorenz and Rössler models. Two predicted time series are generated based on the two output weights. In the adaptive model selection, the prediction errors for the two output weights are utilized to determine which model should be used for time series prediction.

Figure 1. Schematic diagram of adaptive model selection using reservoir computing and reinforcement learning. The system comprises three parts: photonic reservoir computing, reinforcement learning, and a chaotic laser system. LD is laser diode, PM is phase modulator, CIRC is optical circulator, ATT is optical attenuator, FC is optical fiber coupler, PD is photodetector, ISO is optical isolator, and OSC is digital oscilloscope.
The input chaotic time series is denoted by u(n) (Fig. 1). The task of the reservoir is to conduct a single-point prediction of u(n); that is, the reservoir computing predicts u(n + 1) when u(n) is injected into the reservoir. Two types of output weights are trained separately using the time series from the two models and are represented as w_R and w_L. The reservoir produces two predicted outputs, p_R(n) and p_L(n), using the output weights w_R and w_L, respectively.
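The two predictions are linear readouts that share the same reservoir state; only the weight vector differs. A minimal sketch (NumPy, with hypothetical state and weight values standing in for the trained quantities):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200                       # number of virtual nodes (see the Methods section)
x = rng.standard_normal(N)    # node states x_i(n) for the current input u(n)
w_R = rng.standard_normal(N)  # weights trained on the Rossler model (hypothetical values)
w_L = rng.standard_normal(N)  # weights trained on the Lorenz model (hypothetical values)

# Two single-point predictions of u(n+1) from the same reservoir response
p_R = float(w_R @ x)
p_L = float(w_L @ x)

# The corresponding prediction errors, once u(n+1) becomes available
u_next = 0.5                  # hypothetical next input value
e_R = abs(u_next - p_R)
e_L = abs(u_next - p_L)
```

Because the reservoir response is computed only once per input, evaluating additional readouts adds only one inner product per trained model.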
Adaptive model selection is performed by determining whether the prediction error e_R(n) or e_L(n) is smaller. The method of decision making based on chaotic laser outputs is employed to select one of the two output weights. In this method, a chaotic time series generated by a semiconductor laser with optical feedback is used, and the sampled data of the chaotic output are compared with a threshold value T(n). If the chaotic laser output is larger (smaller) than T(n), p_R(n) (p_L(n)) is selected. Then, we go to the next time step (n → n + 1) and use the next input datum u(n + 1). From u(n + 1), p_R(n), and p_L(n), the errors can be obtained as e_R(n) = |u(n + 1) − p_R(n)| and e_L(n) = |u(n + 1) − p_L(n)|. The smaller prediction error is determined by comparing e_R(n) and e_L(n), and T(n) is changed accordingly. If e_R(n) is smaller (larger) than e_L(n), then T(n) is decreased (increased). The change in T(n) increases the probability of selecting the predicted output with the smaller error. By repeating the change in T(n) based on the comparison of e_R(n) and e_L(n), T(n) becomes much smaller or larger than the support of the probability distribution of the chaotic laser output, and only one of p_R(n) or p_L(n) is selected. In this way, the correct one of w_R and w_L for adaptive model selection is determined. T(n) is changed via the threshold adjuster A(n) and is defined as follows:

T(n) = { k m_th        (⌊A(n)⌋ > m_th)
       { k ⌊A(n)⌋      (−m_th ≤ ⌊A(n)⌋ ≤ m_th)     (1)
       { −k m_th       (⌊A(n)⌋ < −m_th),

where ⌊A(n)⌋ is the integer nearest to A(n). In this study, ⌊A(n)⌋ is assumed to take the values −m_th, ..., −1, 0, 1, ..., m_th, where m_th is a natural number. Hence, the number of threshold levels is 2 m_th + 1. The threshold number m_th and the coefficient k in Eq. (1) determine the range of T(n). The range of T(n) is limited from −k m_th to k m_th by setting T(n) = k m_th when A(n) > m_th and T(n) = −k m_th when A(n) < −m_th.
A(n) is changed based on the relationship between the magnitudes of e_R(n) and e_L(n) as follows:

A(n + 1) = { α A(n) − 1     (e_R(n) ≤ e_L(n))     (2)
           { α A(n) + 1     (e_R(n) > e_L(n)),

where α is referred to as the forgetting (memory) parameter. A large value of α means that the dynamics of A(n) holds memory of the initial value of
A(n). In this scheme, the sum of the hit probabilities of the two slot machines (models) is fixed at 1, because one of the two models is always selected. Therefore, the threshold shift is fixed at 1. A temporal waveform of chaotic laser outputs used for decision making was experimentally obtained from a semiconductor laser with optical feedback. The semiconductor laser was subjected to delayed optical feedback from an external fiber reflector, inducing chaotic temporal waveforms in the intensity of the laser output. The chaotic output was detected using a photodetector and sampled by a high-speed digital oscilloscope at a sampling interval of 10 ps. In this study, decision making is performed at a sampling interval of 50 ps, because it has been reported that this sampling interval yields the best performance due to the existence of a negative correlation. The vertical resolution of the digital oscilloscope was 8 bits, so the sampled data had 8-bit resolution. In the decision-making method, the chaotic data sampled by the oscilloscope are compared with T(n). We thus limited the range of T(n) to
−128 ≤ T(n) ≤ 128. To determine the shift of T(n), m_th = 8 and k = 16 were used in this study. The number of threshold levels was thus 2 m_th + 1 = 17.

Results and Discussion

Adaptive model selection between Rössler and Lorenz models
We numerically demonstrate adaptive model selection based on decision making in chaotic time series prediction. To generate a prediction target, we use two models, the RΓΆssler and Lorenz models, which are well-known models that can produce chaotic behaviors (see the
Methods section for details). A time series is generated using one of the two models, and the models are switched over time. Figure 2 shows the input signals produced by the two models. The first 500 points of the time series are generated by the Lorenz model, which is then switched to the RΓΆssler model for the next 500 points. After that, the model is periodically switched every 500 points.
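A switched prediction target of this kind can be sketched as follows. The standard Lorenz and Rössler equations are integrated here with a simple Euler scheme; the parameter values, step sizes, sub-sampling, and per-segment normalization are illustrative assumptions, not necessarily those used in this study.

```python
import numpy as np

def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz equations (standard parameter values)."""
    x, y, z = s
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def rossler_step(s, dt=0.05, a=0.2, b=0.2, c=5.7):
    """One Euler step of the Rossler equations (standard parameter values)."""
    x, y, z = s
    return (x + dt * (-y - z),
            y + dt * (x + a * y),
            z + dt * (b + z * (x - c)))

def switched_target(n_total=2000, block=500, sub=10):
    """Alternate Lorenz/Rossler x-components every `block` points,
    normalizing each segment to zero mean and unit variance."""
    out = []
    for k in range(n_total // block):
        step = lorenz_step if k % 2 == 0 else rossler_step
        s = (1.0, 1.0, 1.0)
        for _ in range(500):          # discard the initial transient
            s = step(s)
        seg = []
        for _ in range(block):
            for _ in range(sub):      # sub-sample the continuous flow
                s = step(s)
            seg.append(s[0])
        seg = np.asarray(seg)
        out.append((seg - seg.mean()) / seg.std())
    return np.concatenate(out)

u = switched_target()  # 2,000 points: Lorenz first, switched every 500 points
```

The per-segment normalization keeps the two sources on a comparable amplitude scale, as in the waveform of Fig. 2.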
Figure 2.
Temporal waveform generated using the Lorenz and Rössler models. The first 500 points of the waveform are produced using the Lorenz model, and the model is switched every 500 points.

The values of the output weights w_R and w_L of the reservoir for the Rössler and Lorenz models are presented in Figs. 3(a) and 3(d), respectively. The i-th element of each weight vector corresponds to the i-th virtual node in photonic reservoir computing. The time series generated using the Rössler and Lorenz models are employed to calculate w_R and w_L, respectively, in the training procedure. The number of points used for training is 5,000, and the two weight vectors differ from each other.
Figure 3.
Prediction results obtained using reservoir computing. (a), (d) Trained output weights w_R and w_L of the reservoir for the Rössler and Lorenz models, respectively. (b), (e) Predicted time series obtained from w_R and w_L, respectively. (c), (f) Prediction errors calculated from the differences between the input and predicted time series.

Figures 3(b) and 3(e) show the predicted time series p_R(n) and p_L(n), which were generated using w_R and w_L, respectively. Figures 3(c) and 3(f) depict the prediction errors e_R(n) and e_L(n), respectively. These figures are enlarged in 300 ≤ n ≤ 700, which includes the switching of the time series from the Lorenz model to the Rössler model at n = 500. e_L(n) < e_R(n) when 300 < n < 500, where the prediction target is the Lorenz model. Meanwhile, e_R(n) < e_L(n) when 500 < n < 700, where the target is the Rössler model.

Figure 4(a) shows the difference between the two prediction errors, Δe(n) = e_R(n) − e_L(n). The relationship between the magnitudes of the errors can be determined from Δe(n): a positive value of Δe(n) indicates that e_L(n) < e_R(n) in Fig. 4(a). The temporal evolution of T(n) is shown in Fig. 4(b). T(n) for decision making varies based on Δe(n) and increases to 128 after fluctuating around 0 at small time steps. The predicted output is selected by comparing T(n) with the chaotic laser output, and the selection result is shown in Fig. 4(c). While T(n) fluctuates around 0 in Fig. 4(b), either p_R(n) or p_L(n) may be selected. After T(n) reaches 128, only p_L(n) can be selected. Thus, the predicted output corresponding to the target (the Lorenz model) is selected successfully.

Figure 4. (a) Time series of the difference between the two prediction errors, Δe(n) = e_R(n) − e_L(n). (b) Temporal dynamics of T(n) for decision making. (c) Model selected by decision making at each step. The Lorenz model is selected for n > 14.

To investigate the adaptation of model selection to sudden environmental changes, we demonstrate time series prediction with model switching. The target time series shown in Fig. 2 consists of 2,000 points, so model selection is repeated 2,000 times. The prediction of the time series with 2,000 points is repeated 100 times. In each trial, different time series are generated by the two models and used as the prediction targets. We calculate the correct model selection rate, denoted by
CMSR(n), which is defined as the ratio of the number of trials (out of 100) in which the predicted output corresponding to the target model is selected at time n. If CMSR(n) = 1, then the model used for time series prediction at time n perfectly agrees with the original input signal source model.

Figure 5. Correct model selection rate (CMSR) in adaptive model selection based on decision making in time series prediction. The models are switched between the Lorenz and Rössler models every 500 steps, as shown in Fig. 2.
Figure 5 shows the temporal evolution of
CMSR(n), which increases quickly to 1 after the prediction begins. When the target model is switched at n = 500, 1000, and 1500, CMSR(n) decreases to 0. After the switch,
CMSR(n) increases to 1 again. Therefore, the correct model is selected adaptively under model switching (i.e., environmental changes). In addition, we note that in general situations the switching of the source model may occur at random times. The cases of model selection with different switching times are presented in the Supplementary Information.
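The selection dynamics of Eqs. (1) and (2) can be sketched compactly. Here a uniform pseudo-random sample stands in for the experimental chaotic laser output, the error sequences are hypothetical constants, and the forgetting parameter alpha is an assumed value; m_th = 8 and k = 16 follow the reconstruction given earlier.

```python
import random

def select_models(e_R, e_L, m_th=8, k=16, alpha=0.99, seed=0):
    """Threshold-based selection between p_R and p_L from two error sequences.

    Following Eqs. (1)-(2): A(n) is decreased when e_R(n) <= e_L(n) and
    increased otherwise; T(n) = k * round(A(n)), clipped to [-k*m_th, k*m_th].
    A sample larger than T(n) selects p_R(n); otherwise p_L(n) is selected.
    """
    rng = random.Random(seed)
    A = 0.0
    choices = []
    for eR, eL in zip(e_R, e_L):
        Ar = max(-m_th, min(m_th, round(A)))
        T = k * Ar                                  # Eq. (1)
        sample = rng.randint(-128, 127)             # stand-in for 8-bit chaotic data
        choices.append('R' if sample > T else 'L')
        A = alpha * A - 1 if eR <= eL else alpha * A + 1   # Eq. (2)
    return choices

# Rossler errors consistently smaller -> T(n) drifts to -128 -> 'R' dominates
choices = select_models([0.1] * 200, [0.5] * 200)
```

Once the threshold saturates at one end of the sample range, only one predicted output can be selected, mirroring the behavior of T(n) in Fig. 4(b).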
Adaptive model selection with mixed time series from RΓΆssler and Lorenz models
Adaptive model selection between the Rössler and Lorenz models is a simple case because the difference between e_R(n) and e_L(n) is large, as shown in Fig. 4. In this subsection, a more difficult case of adaptive model selection is described, in which a mixed time series is used as the prediction target, as shown in Fig. 6(a). A mixed time series is generated from the Rössler and Lorenz models, where the two kinds of time series are mixed with different ratios. The two mixed time series are given by x_1 = a x_L + (1 − a) x_R and x_2 = (1 − a) x_L + a x_R, where x_R and x_L represent the time series generated by the Rössler and Lorenz models, respectively, and the coefficient a is the ratio of the Lorenz model in x_1. The mixed time series is shown in Fig. 6(b), which is obtained with a fixed to 0.8 and with x_1 and x_2 used alternately every 500 points. The time series generated by the Rössler and Lorenz models are used to train w_R and w_L, respectively. The aim of model selection using the mixed time series is to select the model with the larger mixing ratio, that is, the Lorenz model for x_1 and the Rössler model for x_2 at a = 0.8.

Figure 6. (a) Schematic diagram of switching in the mixed time series. The mixing ratio is represented as a. (b) Temporal waveform generated by mixing time series from the Lorenz and Rössler models, x_1 = a x_L + (1 − a) x_R and x_2 = (1 − a) x_L + a x_R for a = 0.8. x_1 and x_2 are switched every 500 points. x_1 is used for 0 < n ≤ 500 and 1000 < n ≤ 1500, while x_2 is used for
500 < n ≤ 1000 and 1500 < n ≤ 2000.

The temporal evolution of Δe(n), T(n), and the selected sequence of the predicted output are summarized in Figs. 7(a), 7(b), and 7(c), respectively. To obtain the mixed time series, a is fixed at 0.8, so the ratio of the Lorenz model is larger than that of the Rössler model. Initially, Δe(n) fluctuates around 0, as can be seen in Fig. 7(a), indicating that it could be difficult to identify the correct model. However, the threshold reaches 128 at approximately n > 35, as shown in Fig. 7(b), although Δe(n) fluctuates around 0. Only p_L(n) is selected after the threshold reaches 128. In other words, correct model selection is achieved for the mixed time series, since p_L(n) corresponds to the Lorenz model, whose waveform is dominant in the input signal.

Figure 7. (a) Time series of Δe(n) = e_R(n) − e_L(n). (b) Temporal dynamics of T(n) for decision making. (c) Model selected by decision making at each time step. x_1 is selected for n > 31.

CMSR(n) is calculated to examine the adaptation ability of model selection for the mixed time series. Figure 8(a) shows the temporal evolution of
CMSR(n), and an enlarged view is provided in Fig. 8(b). The target time series is shown in Fig. 6(b), which is obtained by alternating between x_1 and x_2 every 500 points at a = 0.8. The red curve represents CMSR(n) for the mixed time series. The black curve is the same as that in Fig. 5 and is included for comparison; it corresponds to the case in which x_1 and x_2 are switched for a = 1.0. CMSR(n) quickly increases to 1 in both curves after the prediction begins. When the model is switched at n = 500, 1000, and 1500, CMSR(n) decreases to 0. However,
CMSR(n) quickly increases to 1 after the switch. Therefore, the correct model is selected adaptively under environmental changes, which means successful model selection. For the mixed time series, the difference between e_R(n) and e_L(n) fluctuates around 0, as shown in Fig. 7(a). The fluctuation of Δe(n) around 0 results in a slower increase of CMSR(n). However,
CMSR(n) reaches 1, and the correct model is selected successfully.

Figure 8. (a) Correct model selection rate (CMSR) in the time series prediction task. (b) Enlarged view of (a). The red curve represents the case in which the input signal is a mixed time series consisting of x_1 and x_2, as shown in Fig. 6(b). For comparison, the black curve represents the case in which the input signal is switched between the Lorenz and Rössler models every 500 steps, as shown in Fig. 2.

The possibility of model selection in the mixed time series is investigated while changing a. Figure 9 shows CMSR(n) at n = 300 as a function of a. In total, 1,000 trials are conducted in the calculation of CMSR(n). The target in the model selection is the Rössler model for a < 0.5 and the Lorenz model for a ≥ 0.5. When a < 0.50, CMSR(n) becomes 1 and the target model (the Rössler model) is successfully selected. On the other hand,
CMSR(n) does not always reach 1 when a ≥ 0.50, where the target is the Lorenz model. For a large value of a (a ≥ 0.70), CMSR(n) reaches 1. However, CMSR(n) < 1 when a < 0.70. In particular, CMSR(n) is close to 0.03 when a = 0.55, where the Rössler model is selected in most trials. The reason the Lorenz model is not selected when a = 0.55 is that e_L(n) for the Lorenz model is larger than e_R(n) for the Rössler model. The output weights for the Lorenz model are trained from a time series whose attractor has a butterfly structure, and the Lorenz model shows characteristic oscillations in the upward and downward directions in the time series. The prediction accuracy for the output weights of the Lorenz model decreases if the butterfly structure does not appear in the mixed time series. The butterfly structure can be identified for a = 0.8 in Fig. 6; however, it cannot be clearly observed for a < 0.7 in the mixed time series. Consequently, the prediction accuracy decreases and the Rössler model is selected when a = 0.55.

Figure 9. Correct model selection rate (CMSR) at n = 300 as a function of a. The correct selection is the Rössler model for a < 0.5 and the Lorenz model for a ≥ 0.5.

Adaptive model selection between Rössler models with different parameter values
In the previous two cases, adaptive model selection between two different models (the Rössler and Lorenz models) was investigated. In this subsection, Rössler models with different parameter values are considered, where a change of the parameter value corresponds to model switching, as shown in Fig. 10(a). This situation of parameter switching is expected to be more difficult than switching between different models. Figure 10(b) shows the prediction target. The parameter that is changed in the Rössler model is denoted by b (see the Methods section) and is switched between two values b_1 and b_2, for both of which the Rössler model shows chaotic dynamics. The first 500 points in the target time series are generated using b_1. The switching interval is 500 points, and the total number of points in the time series is 2,000. The weights w_1 and w_2 are trained for b_1 and b_2, respectively.

Figure 10. (a) Schematic diagram of switching in the case of the Rössler model with different values of b. (b) Temporal waveform generated from the Rössler model with b = b_1 and b = b_2. In the first 500 points of the time series, b_1 is used, and switching between b_1 and b_2 is performed every 500 points.

CMSR(n) is investigated in the selection between the Rössler models with the two parameter values. Figure 11 shows the temporal evolution of
CMSR(n). The target time series is depicted in Fig. 10(b), where the value of b is switched at n = 500, 1000, and 1500. CMSR(n) quickly increases to 1 after the parameter value is switched. Therefore, adaptive model selection is effectively performed under parameter switching.
Figure 11.
Correct model selection rate (CMSR) in the time series prediction task. The target model is the Rössler model with parameter values b_1 and b_2, and the time series is shown in Fig. 10(b). b is changed every 500 points.

The dependence of model selection on the parameter value is investigated using different values of b_2, with b_1 fixed at 0.6. In this case, the target model is fixed to the Rössler model with b = b_1. Figure 12(a) shows CMSR(n) as a function of b_2 at time steps n = 100 (black curve) and n = 300 (red curve). We focus on how the difference between b_1 and b_2 is related to the speed of adaptation in model selection. In Fig. 12(a), CMSR(n) is small near b_2 = b_1. CMSR(n) increases and approaches 1 as the difference between b_1 and b_2 increases. Therefore, if the two parameter values are far apart, the correct model can be selected. In addition, the adaptation speed increases as the difference between b_1 and b_2 increases.

Figure 12. (a) Correct model selection rate (CMSR) at n = 100 (black curve) and n = 300 (red curve) as a function of b_2 for the Rössler model. b_1 is fixed at 0.6. (b) Bifurcation diagram of the Rössler model as a function of b. Local maxima in a time series of x_R are plotted in the bifurcation diagram. (c) The maximum Lyapunov exponent as a function of b.

The temporal dynamics of the Rössler model is also related to the adaptation speed. The bifurcation diagram and maximum Lyapunov exponent of the Rössler model are investigated as b is changed, as shown in Figs. 12(b) and 12(c), respectively. The bifurcation diagram is generated from the local maxima of the time series of the Rössler model. The Lyapunov exponent quantifies the unpredictability of a dynamic system, and a positive value of the maximum Lyapunov exponent indicates chaotic dynamics. Here, the maximum Lyapunov exponent is positive in three regions, except in narrow periodic windows (e.g., near b = 0.51). In Fig. 12(a), CMSR(n) at n = 100 does not reach 1 in certain regions of b_2 (e.g., b_2 ≥ 0.70).
However, CMSR(n) exceeds 0.99 at n = 100 in the parameter regions where the dynamics is periodic (including b_2 ≤ 0.12 and near b_2 = 0.36). Although CMSR(n) at n = 100 does not reach 1 when b_2 ≥ 0.70, it approaches 1 at n = 300. Therefore, the adaptation speed is slow if the temporal dynamics at b = b_1 and b = b_2 are both chaotic, and fast if the dynamics of the two target models differ (e.g., chaotic versus periodic oscillations).

Conclusions
We proposed an adaptive model selection scheme using reinforcement learning for applications in photonic reservoir computing. Two types of time series were generated using the Rössler and Lorenz models and were exchanged over time to emulate dynamic environmental changes of the incoming signals. We prepared two types of output weights for the Rössler and Lorenz models prior to execution of the prediction task and identified one of the two models for accurate time series prediction using photonic reservoir computing. We succeeded in identifying the correct model adaptively using the prediction errors as rewards in reinforcement learning. The adaptive model selection was also achieved in the case of a mixed time series obtained from the Lorenz and Rössler models with different ratios. We also investigated the adaptive selection of Rössler models with different parameter values. The model selection became easier as the difference between the two parameter values increased. Although two models in reservoir computing were considered in the present study, a scalable architecture should be possible; indeed, our former work demonstrated a solution for bandit problems with up to 64 arms using chaotic time series. We consider that constructing a single universal reservoir computing model that can deal with any possible input is most likely impossible; hence, dynamic and autonomous model selection will be a promising means of expanding the computing abilities of photonic artificial intelligence.

Methods

Photonic reservoir computing scheme
In reservoir computing, nonlinear mapping of the input information into a higher-dimensional phase space is required for successful computation. A recurrent neural network with a large number of nodes provides this nonlinear mapping in conventional reservoir computing. Instead of a recurrent neural network, a semiconductor laser with a delayed feedback loop can be utilized for photonic reservoir computing. The reservoir in Fig. 1 consists of a semiconductor laser with delayed optical feedback, and a network is virtually emulated by the laser and the delayed feedback loop. In this scheme, the nodes of the network are implemented virtually by dividing the laser output in time into short intervals θ, which are called node intervals. The virtual nodes are defined by dividing the feedback delay time τ into intervals of θ, so the number of nodes is given by N = τ/θ. Therefore, a small value of θ increases N; however, too small a value of θ degrades the processing performance of reservoir computing. We used θ = 0.1 ns in this study, and the delay time τ of the reservoir was fixed at 20 ns; hence, N = τ/θ = 200. The input information to be processed is injected into the reservoir after preprocessing. We consider discrete-time input data u_n (n = 1, 2, … is the discrete time), which are injected into the reservoir for a duration of τ each to feed the input data to all of the virtual nodes. Before the input data are injected, a mask signal m(t) is multiplied with u_n. The mask acts as the input weights for the virtual nodes and generates transient dynamics in the reservoir. To apply the same input weights to all of the input data, the period of the mask is equal to τ. The mask used in this study was a piecewise-constant step function with step interval θ, whose value was randomly chosen from four discrete levels (a four-level digital mask).
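To make the preprocessing and readout concrete, the following sketch constructs a four-level piecewise-constant mask and trains output weights by least squares on toy node states. The mask levels {−1, −0.5, 0.5, 1}, the random matrix standing in for the node states, and the target are illustrative assumptions, not the values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200            # number of virtual nodes: tau / theta = 20 ns / 0.1 ns
# Four-level digital mask, one value per node interval theta; these
# four levels are assumed for illustration.
levels = np.array([-1.0, -0.5, 0.5, 1.0])
mask = rng.choice(levels, size=N)

def masked_input(u_n, gamma=1.0):
    # s(t) = gamma * m(t) * u_n for (n-1)*tau <= t < n*tau:
    # each input datum is spread over all N virtual nodes.
    return gamma * mask * u_n

# --- Output layer: train the weights w_i by least squares ---
# Toy stand-in for the reservoir responses x_i(n); in the real system
# these are sampled from the laser output, one value per virtual node.
N_tr = 1000
X = rng.standard_normal((N_tr, N))
w_true = rng.standard_normal(N)
y_hat = X @ w_true                      # target function for training

# Minimize the sum of (y(n) - y_hat(n))^2 over the output weights;
# lstsq returns the least-squares solution of X w = y_hat.
w, *_ = np.linalg.lstsq(X, y_hat, rcond=None)
y = X @ w                               # RC output: weighted sum of node states
mse = float(np.mean((y - y_hat) ** 2))
```

Because the toy target is itself a linear combination of the node states, the least-squares fit recovers the weights essentially exactly; for real reservoir data the residual error is what the prediction task measures.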
The multiplication of the input signal by the mask can be expressed as follows:

s(t) = γ m(t) u_n  ((n − 1)τ ≤ t < nτ),  (3)

where γ is the coefficient that scales the amplitude of s(t). A weighted linear combination of the virtual node states is calculated in the output layer, and the calculation result is the output of the RC. The RC output y(n) for the n-th input datum is given by the following equation:

y(n) = Σ_{i=1}^{N} w_i x_i(n),  (4)

where x_i is the node state and w_i is the output weight for the i-th node state. x_i is extracted from the temporal output of the reservoir, and w_i is trained by minimizing the mean-square error between the target function ŷ(n) and the RC output y(n) as follows:

Σ_{n=1}^{N_tr} (y(n) − ŷ(n))² → min,  (5)

where N_tr is the number of input data used for training.

Numerical model for photonic reservoir
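Numerically, the laser reservoir of this section can be simulated by Euler-Maruyama integration of the Lang-Kobayashi equations (6) and (7), with a ring buffer supplying the delayed field E(t − τ). The following is a minimal sketch: the feedback strength, noise strength, pump level, and all device parameters here are illustrative assumptions (Table 1 lists the values actually used), and no input signal is applied (φ(t) = 0).

```python
import numpy as np

# Illustrative parameters (assumptions; see Table 1 for the actual values).
GN, N0, eps = 8.4e-13, 1.4e24, 2.0e-23   # gain, transparency density, saturation
tau_p, tau_s = 1.9e-12, 2.0e-9           # photon / carrier lifetimes [s]
alpha = 3.0                              # linewidth enhancement factor
lam, c0 = 1.547e-6, 2.998e8              # wavelength [m], speed of light [m/s]
omega = 2.0 * np.pi * c0 / lam           # angular optical frequency
tau = 20e-9                              # feedback delay time [s]
kappa = 1.0e10                           # feedback strength [1/s] (assumed)
Nth = N0 + 1.0 / (GN * tau_p)            # threshold carrier density
J = 2.0 * Nth / tau_s                    # pump J = j * J_th with j = 2 (assumed)
D = 1e6                                  # spontaneous-emission noise strength (assumed)

dt = 1e-13
n_delay = int(round(tau / dt))
buf = np.zeros(n_delay, dtype=complex)   # delay line storing E(t - tau)

rng = np.random.default_rng(1)
E, Nc = 1e3 + 0j, N0

for i in range(30000):                   # ~3 ns of transient dynamics
    E_del = buf[i % n_delay]             # E(t - tau); zero during the first tau
    buf[i % n_delay] = E
    phi = 0.0                            # feedback phase modulation (no input here)
    gain = GN * (Nc - N0) / (1.0 + eps * abs(E) ** 2)
    dE = 0.5 * (1.0 + 1j * alpha) * (gain - 1.0 / tau_p) * E \
         + kappa * E_del * np.exp(1j * (phi - omega * tau))
    dN = J - Nc / tau_s - gain * abs(E) ** 2
    noise = np.sqrt(D * dt) * (rng.standard_normal() + 1j * rng.standard_normal())
    E = E + dt * dE + noise
    Nc = Nc + dt * dN

intensity = abs(E) ** 2
```

In the reservoir computing scheme, φ(t) would carry the masked input s(t), and the node states x_i(n) would be sampled from the resulting intensity time series once per node interval θ.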
The reservoir is an external-cavity semiconductor laser with feedback phase modulation. The temporal dynamics of the laser is described by the Lang-Kobayashi equations:

dE(t)/dt = [(1 + iα)/2] {G_N (N(t) − N_0)/(1 + ε|E(t)|²) − 1/τ_p} E(t) + κ E(t − τ) exp[i{φ(t) − ωτ}] + ξ(t),  (6)

dN(t)/dt = J − N(t)/τ_s − [G_N (N(t) − N_0)/(1 + ε|E(t)|²)] |E(t)|²,  (7)

where E is the slowly varying complex electric-field amplitude and N is the carrier density. G_N is the gain coefficient; N_0 is the carrier density at transparency; ε is the gain saturation coefficient; α is the linewidth enhancement factor; τ_p and τ_s are the photon and carrier lifetimes, respectively; and J is the injection current of the laser. J is given by the product of the lasing threshold current J_th and the normalized injection current j. ω is the angular optical frequency of the laser and is given by ω = 2πc/λ, where λ = 1547 nm is the optical wavelength of the laser and c is the speed of light. These parameter values are shown in Table 1. The second term on the right-hand side of Eq. (6) represents the optical feedback. κ is the feedback strength and is given by κ = r(1 − r_0²)/(r_0 τ_in), where r_0 is the reflectivity of the laser facet, r is the reflectivity of the external mirror, and τ_in is the round-trip time in the internal cavity of the laser. τ is the feedback delay time and is related to the number of virtual nodes N (N = τ/θ). φ(t) − ωτ represents the phase shift of the feedback light due to the phase modulation and the feedback delay. φ(t) is the input signal for reservoir computing, and the input signal is injected into the reservoir via feedback phase modulation. The last term ξ(t) on the right-hand side of Eq. (6) represents the effect of spontaneous emission noise. ξ(t) is normalized white Gaussian noise with the properties ⟨ξ(t)⟩ = 0 and ⟨ξ(t′)ξ*(t)⟩ = δ(t′ − t), where ⟨·⟩ denotes the ensemble average and δ is the Dirac delta function.

Chaotic dynamical models for generating prediction targets
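The two chaotic prediction targets defined in this section can be generated numerically with a fixed-step RK4 integrator; a minimal sketch follows. The Rössler constants b = 0.2 and c = 5.7 are illustrative assumptions (a = 0.2 appears in the equations of this section), the step sizes are arbitrary, and subtracting the mean before dividing by the standard deviation is an extra assumption on top of the variance normalization described in the text.

```python
import numpy as np

def rk4(f, s, dt):
    # One fixed-step fourth-order Runge-Kutta update for ds/dt = f(s).
    k1 = f(s); k2 = f(s + 0.5*dt*k1); k3 = f(s + 0.5*dt*k2); k4 = f(s + dt*k3)
    return s + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

def rossler(s, a=0.2, b=0.2, c=5.7):
    # b = 0.2 and c = 5.7 are assumed illustrative values.
    x, y, z = s
    return np.array([-y - z, x + a * y, b + z * (x - c)])

def lorenz(s):
    x, y, z = s
    return np.array([10.0 * (y - x), -x * z + 28.0 * x - y, x * y - (8.0 / 3.0) * z])

def trajectory(f, s0, n_steps, dt, discard):
    # Integrate, keep the x variable (used for the prediction test),
    # and discard the initial transient.
    s = np.array(s0, dtype=float)
    xs = np.empty(n_steps)
    for i in range(n_steps):
        s = rk4(f, s, dt)
        xs[i] = s[0]
    return xs[discard:]

x_R = trajectory(rossler, [1.0, 1.0, 1.0], 60000, 0.01, 10000)
x_L = trajectory(lorenz, [1.0, 1.0, 1.0], 60000, 0.001, 10000)

# Normalize each series to unit variance so the two models cannot be
# told apart from the amplitudes alone.
x_R = (x_R - x_R.mean()) / x_R.std()
x_L = (x_L - x_L.mean()) / x_L.std()
```

After normalization, both series have unit variance, so a classifier (or the reservoir) must rely on the temporal structure of the dynamics rather than the signal amplitude.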
The prediction targets were generated by the Rössler and Lorenz models, which are well-known models that can generate deterministic chaos. The temporal dynamics of the Rössler and Lorenz models are represented by the following equations. For the Rössler model,

dx_R/dt = −y_R − z_R,  (8)

dy_R/dt = x_R + 0.2 y_R,  (9)

dz_R/dt = b + z_R(x_R − c),  (10)

where c is a fixed parameter. For the Lorenz model,

dx_L/dt = 10(y_L − x_L),  (11)

dy_L/dt = −x_L z_L + 28 x_L − y_L,  (12)

dz_L/dt = x_L y_L − (8/3) z_L.  (13)

In the Rössler model, the parameter b was set to a fixed value unless otherwise specified. The variables x_R and x_L were used for the prediction test. The time series of x_R and x_L were normalized by their variances so that the models cannot be identified from the amplitudes of the time series.

Table 1.
Parameter values used in numerical simulations

Symbol | Parameter | Value
G_N | Gain coefficient | 8.… × 10⁻… m³ s⁻¹
N_0 | Carrier density at transparency | 1.… × 10… m⁻³
ε | Gain saturation coefficient | 2.… × 10⁻…
τ_p | Photon lifetime | 1.… × 10⁻… s
τ_s | Carrier lifetime | 2.… × 10⁻… s
τ_in | Round-trip time in the internal cavity | 8.… × 10⁻… s
r_0 | Reflectivity of the laser facet | 0.…
α | Linewidth enhancement factor | 3.…
λ | Optical wavelength of the laser | 1.547 × 10⁻⁶ m
c | Speed of light | 2.998 × 10⁸ m s⁻¹
r | Reflectivity of the external mirror | 0.…
j | Normalized injection current of the laser | 2.…
τ | Feedback delay time | 20.0 × 10⁻⁹ s

References
1. Jaeger, H. & Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 78–80 (2004).
2. Jaeger, H. The "echo state" approach to analysing and training recurrent neural networks. GMD-Report 148, Ger. Natl. Res. Inst. for Comput. Sci. (2001).
3. Verstraeten, D., Schrauwen, B., Stroobandt, D. & Campenhout, J. V. Isolated word recognition with the liquid state machine: A case study. Inf. Process. Lett., 521–528 (2005).
4. Cover, T. M. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput., 326–334 (1965).
5. Appeltant, L. et al. Information processing using a single dynamical node as a complex system. Nat. Commun., 468 (2011).
6. Snyder, D., Goudarzi, A. & Teuscher, C. Computational capabilities of random automata networks for reservoir computing. Phys. Rev. E, 042808 (2013).
7. Nakajima, K., Hauser, H., Li, T. & Pfeifer, R. Information processing via physical soft body. Sci. Rep., 10487 (2015).
8. Bueno, J. et al. Reinforcement learning in a large-scale photonic recurrent neural network. Optica, 756–760 (2018).
9. Tanaka, G. et al. Recent advances in physical reservoir computing: A review. Neural Networks, 100–123 (2019).
10. Paquot, Y. et al. Optoelectronic reservoir computing. Sci. Rep., 287 (2012).
11. Larger, L. et al. Photonic information processing beyond Turing: An optoelectronic implementation of reservoir computing. Opt. Express, 3241–3249 (2012).
12. Brunner, D., Soriano, M. C., Mirasso, C. R. & Fischer, I. Parallel photonic information processing at gigabyte per second data rates using transient states. Nat. Commun., 1364 (2013).
13. Kuriki, Y., Nakayama, J., Takano, K. & Uchida, A. Impact of input mask signals on delay-based photonic reservoir computing with semiconductor lasers. Opt. Express, 5777–5788 (2018).
14. Hong, T. & Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast., 914–938 (2016).
15. Marler, R. & Arora, J. Survey of multi-objective optimization methods for engineering. Struct. Multidiscip. Optim., 369–395 (2004).
16. Jarajreh, M. A. et al. Artificial neural network nonlinear equalizer for coherent optical OFDM. IEEE Photonics Technol. Lett., 387–390 (2015).
17. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction, 2nd edn. (The MIT Press, 2018).
18. Robbins, H. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 527–535 (1952).
19. Lai, T. & Robbins, H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math., 4–22 (1985).
20. Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature, 876 (2006).
21. Naruse, M., Terashima, Y., Uchida, A. & Kim, S.-J. Ultrafast photonic reinforcement learning based on laser chaos. Sci. Rep., 8772 (2017).
22. Homma, R. et al. On-chip photonic decision maker using spontaneous mode switching in a ring laser. Sci. Rep., 9429 (2019).
23. Mihana, T., Terashima, Y., Naruse, M., Kim, S.-J. & Uchida, A. Memory effect on adaptive decision making with a chaotic semiconductor laser. Complexity, 4318127 (2018).
24. Lorenz, E. N. Deterministic nonperiodic flow. J. Atmospheric Sci., 130–141 (1963).
25. Rössler, O. E. An equation for continuous chaos. Phys. Lett. A, 397–398 (1976).
26. Uchida, A. Optical Communication with Chaotic Lasers: Applications of Nonlinear Dynamics and Synchronization (Wiley-VCH, Weinheim, 2012).
27. Kim, S.-J., Naruse, M., Aono, M., Ohtsu, M. & Hara, M. Decision maker based on nanoscale photo-excitation transfer. Sci. Rep., 2370 (2013).
28. Mihana, T. et al. Decision making for the multi-armed bandit problem using lag synchronization of chaos in mutually coupled semiconductor lasers. Opt. Express, 26989–27008 (2019).
29. Naruse, M. et al. Scalable photonic reinforcement learning by time-division multiplexing of laser chaos. Sci. Rep., 10890 (2018).
30. Bueno, J., Brunner, D., Soriano, M. C. & Fischer, I. Conditions for reservoir computing performance using semiconductor lasers with delayed optical feedback. Opt. Express, 2401–2412 (2017).
31. Takano, K. et al. Compact reservoir computing with a photonic integrated circuit. Opt. Express, 29424–29439 (2018).
32. Lang, R. & Kobayashi, K. External optical feedback effects on semiconductor injection laser properties. IEEE J. Quantum Electron., 347–355 (1980).
33. Nguimdo, R. M. et al. Prediction performance of reservoir computing systems based on a diode-pumped erbium-doped microchip laser subject to optical feedback. Opt. Lett., 375–378 (2017).

Acknowledgments
This work was supported in part by JSPS KAKENHI JP19H00868 and JST CREST JPMJCR17N2.
Author contributions
All authors have contributed to the development and/or implementation of the concept. K. K. performed the numerical simulations and analyzed the data. K. K., M. N., and A. U. contributed to the discussion of the results. K. K., M. N., and A. U. contributed to the writing of the manuscript.
Competing Interests