Chaos | 2021

On explaining the surprising success of reservoir computing forecaster of chaos? The universal machine learning dynamical system with contrast to VAR and DMD.

 

Abstract


Machine learning has become a widely popular and successful paradigm, especially in data-driven science and engineering. A major application problem is data-driven forecasting of future states from a complex dynamical system. Artificial neural networks have evolved as a clear leader among many machine learning approaches, and recurrent neural networks are considered to be particularly well suited for forecasting dynamical systems. In this setting, the echo-state networks or reservoir computers (RCs) have emerged for their simplicity and computational complexity advantages. Instead of a fully trained network, an RC trains only readout weights by a simple, efficient least squares method. What is perhaps quite surprising is that nonetheless, an RC succeeds in making high quality forecasts, competitively with more intensively trained methods, even if not the leader. There remains an unanswered question as to why and how an RC works at all despite randomly selected weights. To this end, this work analyzes a further simplified RC, where the internal activation function is an identity function. Our simplification is not presented for the sake of tuning or improving an RC, but rather for the sake of analysis of what we take to be the surprise being not that it does not work better, but that such random methods work at all. We explicitly connect the RC with linear activation and linear readout to well developed time-series literature on vector autoregressive (VAR) averages that includes theorems on representability through the Wold theorem, which already performs reasonably for short-term forecasts. In the case of a linear activation and now popular quadratic readout RC, we explicitly connect to a nonlinear VAR, which performs quite well. Furthermore, we associate this paradigm to the now widely popular dynamic mode decomposition; thus, these three are in a sense different faces of the same thing. We illustrate our observations in terms of popular benchmark examples including Mackey-Glass differential delay equations and the Lorenz63 system.

Volume 31 1
Pages \n 013108\n
DOI 10.1063/5.0024890
Language English
Journal Chaos

Full Text