Synergetic Learning Systems: Concept, Architecture, and Algorithms
Ping Guo · Qian Yin
Abstract
Drawing on the idea that brain development is a Darwinian process of "evolution + selection," and the idea that the current state of our universe is a local equilibrium state of many bodies, self-organized and evolved under the drive of temperature and gravity, in this work we describe an artificial intelligence system called the "Synergetic Learning Systems." The system is composed of two or more subsystems (models, agents, or virtual bodies), and it is an open complex giant system. Inspired by natural intelligence, the system achieves intelligent information processing and decision-making in a given environment through cooperative/competitive synergetic learning. Natural intelligence evolved under the law that "it is not the strongest of the species that survives, but the one most responsive to change," while an artificial intelligence system should adopt the law of "human selection" during its evolution. We therefore expect that the proposed system architecture can also be adapted to human-machine synergy or multi-agent synergetic systems. It is also expected that, under our design criteria, the proposed system will eventually achieve artificial general intelligence through long-term coevolution.
Keywords
Artificial Intelligence Systems · Synergetic Learning · Self-Organization · Synergetic Evolution · Complex Network Structure
Ping Guo
Image Processing & Pattern Recognition Lab., Beijing Normal University, Beijing 100875, China
E-mail: [email protected]

Qian Yin
Image Processing & Pattern Recognition Lab., Beijing Normal University, Beijing 100875, China
E-mail: [email protected]
1 Introduction

During the development of artificial intelligence (AI), scholars of different disciplines have refined their understanding of artificial intelligence, put forward different viewpoints, and produced different academic schools of thought. Three major schools of thought have had great influence on artificial intelligence research: symbolism, connectionism, and actionism.

– Symbolism, also known as logicism, psychologism, or computerism, is based on the assumption of physical symbol systems (i.e., symbolic operating systems) and the principle of limited rationality.
– The main principle of connectionism, also known as bionicism or physiologism, is the connection mechanism of neural networks and the learning algorithms between them.
– Actionism, also known as evolutionism or cyberneticism, is based on cybernetics and perception-action control systems.

After two troughs, the development of artificial intelligence benefited from the persistence and continuous efforts of Hinton et al. In 2006, the concept and algorithm of deep belief networks were proposed, which rekindled the passion for neural networks in the field of artificial intelligence. In 2012, Hinton and his student Alex Krizhevsky designed AlexNet, based on a Convolutional Neural Network (CNN), and took advantage of the powerful parallel computing power of GPUs in an image recognition competition at the forefront of computer vision. Its test error rate was 15.3%, much lower than the 26.2% error rate of the second-place entry. In 2015, LeCun, Bengio and Hinton jointly published a review paper [35] in the journal Nature, and neural networks experienced a strong resurgence under the name "deep learning".
Thanks to the explosive growth of interconnected data, the significant increase in computing power, and the development and maturity of deep learning algorithms, we have ushered in the third wave of development since the emergence of artificial intelligence.

With the passage of time and the expansion of research, deep learning has encountered bottlenecks, and the theory of artificial intelligence has stagnated. Gary Marcus, a professor of psychology at New York University, poured cold water on deep learning, criticizing its various problems, as detailed in the literature [37]. Academician Xu Zongben, a professor at Xi'an Jiaotong University, stated that [51]:
"It is difficult to design topologies, to anticipate effects, and to explain mechanisms with deep learning. There is no solid mathematical theory to support solving these three problems. Solving these problems is the main focus of future research on deep learning."

Recently, M. Mitchell Waldrop published a review article in the Proceedings of the National Academy of Sciences (PNAS) entitled "News Features: What are the Limitations of Deep Learning?" [47]. In this PNAS feature, Waldrop briefly describes the history of deep learning and believes that
all the gains in computing power have made artificial intelligence flourish today. However, deep learning has many limitations, including vulnerability to adversarial attacks, low learning efficiency, application instability, lack of common sense, and lack of interpretability. From a computability point of view, more and more people in the field of artificial intelligence research believe that to overcome the shortcomings of deep learning, some fundamentally new concepts and ideas are needed.

Yann LeCun gave a speech entitled "Learning World Models: the Next Step towards AI" at the opening ceremony of IJCAI-2018 [34]. LeCun said that the future of the artificial intelligence revolution will be neither supervised learning nor pure reinforcement learning, but rather a world model with common-sense reasoning and predictive ability. Intuitively, the world model contains general background knowledge about how the world works, the ability to predict the consequences of actions, and the ability to perform long-term planning and reasoning. Yann LeCun summarized three learning paradigms, namely, reinforcement learning, supervised learning, and self-supervised learning, and believes that self-supervised learning (formerly known as predictive learning) is a potential research direction for realizing the world model. At the end of the lecture, Yann LeCun noted the mutual drive and promotion between technology and science, such as telescopes and optics, steam engines and thermodynamics, or computers and computer science. He also raised several questions:
1. What is the equivalent of "thermodynamics" in intelligence?
2. Are there underlying principles behind artificial intelligence and natural intelligence?
3. Are there simple principles behind learning?
4. Is the brain a collection of a large number of "hacks" that evolved?
As there are many schools of thought in basic research on artificial intelligence, it is difficult to construct a unified theory to answer these questions. However, we believe that computational intelligence (CI) is a new stage in the development of artificial intelligence. CI is a nature-inspired intelligence and a paradigm that has the potential to solve most of these problems. CI draws on the phenomena and laws of physics, chemistry, mathematics, biology, psychology, physiology, neuroscience, and computer science, and integrates the three artificial intelligence schools of thought into an organic whole. A system formed by the integration of multiple disciplines and technologies can realize complementary advantages, will be more effective than any single discipline or technology, and can achieve greater results. Therefore, in order to overcome the shortcomings of deep learning, we propose to use cognitive neuroscience mechanisms and mathematical tools in machine learning to develop a new generation of artificial intelligence. Based on computational intelligence, we should develop Synergetic Learning Systems (SLS) [13] to establish a theoretical foundation for an intelligent "thermodynamics." For a more detailed analysis of the current status and development trends of basic artificial intelligence research, please refer to the literature [50].
The structure of Part I (this paper) is as follows: the methodology for developing Synergetic Learning Systems is given in section 2, section 3 introduces the basic concept of the Synergetic Learning Systems, section 4 describes the architecture of the Synergetic Learning Systems, section 5 lists the relevant optimization algorithms, and the last section summarizes and analyzes future directions for the Synergetic Learning Systems.
2 Methodology

We know that a methodology is a theoretical system aimed at solving problems, usually involving the elaboration of problem stages, tasks, tools, and methodological techniques.

We believe that an artificial intelligence system needs to be analyzed and studied systematically at multiple scales, levels, and perspectives. "Multi-scale" here refers to the study of artificial intelligence systems at the macroscopic, microscopic, and mesoscopic scales. Macroscopic, microscopic, and mesoscopic are our "three perspectives" [23].

In his theme report for the Second China System Science Conference, "How does the brain work as a whole?", Academician Guo Aike, a neuroscientist and biophysicist in China, mentioned: "The brain function linkage map should be drawn at the macroscopic brain scale, the mesoscopic neural network scale, and the microscopic synapse scale, and more scales can be considered." [11]

We can also draw on the research results of basic disciplines, such as the evolution of the universe on a macroscopic scale. The main physical parameters for the evolution of the universe are temperature and gravity. According to the Big Bang theory, the initial temperature was very high, and the current galactic structure evolved through cooling and gravity. Therefore, the temperature of a system is a very important basic quantity. In the evolution of a synergetic learning system, the simulated annealing algorithm can be considered in order to solve combinatorial optimization problems.

To analyze and study an artificial intelligence system systematically is to adopt the methodology of systems science and use the viewpoint of systems to understand and grasp the essence, movement, and development of artificial intelligence. Our proposed Synergetic Learning Systems is inspired by natural intelligence, is based on cognitive neuroscience mechanisms and computational intelligence, integrates multidisciplinary knowledge, and adopts the mode of complex-system thinking.
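The simulated-annealing idea mentioned above can be sketched concretely. The following minimal Python sketch is illustrative only: the Ising-like energy, the single-flip neighborhood move, and the geometric cooling schedule are assumptions of this example, not part of the SLS design. It accepts uphill moves with probability exp(−ΔE/T) and cools the temperature gradually:

```python
import math
import random

def simulated_annealing(energy, init_state, neighbor, t0=1.0, cooling=0.95, steps=2000):
    """Generic simulated annealing: accept worse states with prob exp(-dE/T)."""
    state, e = init_state, energy(init_state)
    best, best_e = state, e
    t = t0
    for _ in range(steps):
        cand = neighbor(state)
        de = energy(cand) - e
        if de < 0 or random.random() < math.exp(-de / t):
            state, e = cand, energy(cand)
            if e < best_e:
                best, best_e = state, e
        t = max(t * cooling, 1e-3)  # geometric cooling with a small floor
    return best, best_e

# Toy combinatorial problem: minimize an Ising-like energy over +/-1 vectors.
random.seed(0)
n = 12
J = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def energy(s):
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def neighbor(s):
    i = random.randrange(n)          # flip one randomly chosen spin
    return s[:i] + [-s[i]] + s[i + 1:]

best, best_e = simulated_annealing(energy, [1] * n, neighbor)
print(best_e <= energy([1] * n))     # annealing never returns worse than the start
```

The same accept/cool loop applies to any discrete state space once an energy and a neighborhood move are supplied.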
The Synergetic Learning Systems is based on the concepts of systems science. Our use of the term "synergetic" is influenced by the idea of "synergetic learning" [24][25]. However, SLS includes both cooperative and competitive synergetic learning among its various elements, as well as concepts and methods such as system evolution and evolutionary games.

In the field of neural network research, Hinton's Boltzmann machine, the Helmholtz machine, and the restricted Boltzmann machine draw on the concepts and research methods of statistical physics. Therefore, we also need to
look at the problem with physical thinking during the research process. Statistical mechanics sees the essence through phenomena, in which the phenomena are the observed data and the essence is the law. In statistical mechanics, probability distributions, mathematical models, and other tools are used to systematically quantify and analyze the general laws and randomness behind observed data. For a more detailed discussion, please refer to [23].
3 The Basic Concept of the Synergetic Learning Systems

The Synergetic Learning Systems we propose is an information processing system, which is equivalent to an intelligent "thermodynamic system." As we know, neural networks process information to achieve intelligent information processing and decision making in a given environment. Whereas nature follows the rule of "natural selection, survival of the fittest," an artificial intelligence system should adopt a "human selection" rule when it evolves. We believe that this law is the principle of free energy. Nature likes to find the state of a physical system with minimum free energy, so free energy can also be utilized as an objective function of system evolution.

The concept of free energy comes from statistical physics. It refers to the part of the energy of a system that can be converted into external work in a certain thermodynamic process. In a particular thermodynamic process, the "useful energy" of the system's external output can be divided into the Helmholtz free energy and the Gibbs free energy.

At NIPS'93, Hinton et al. [27] established the relationship between the auto-encoder and the minimum description length (MDL) principle via the Helmholtz free energy, and also transformed the auto-encoder into a restricted Boltzmann machine. Hinton borrowed concepts from statistical physics and used them to explain the deep belief network. Based on the viewpoint of statistical physics and the relationship among free energy, internal energy, and entropy in the canonical ensemble, the interpretability problem of the model can be addressed. In statistical physics, an ensemble represents the collection of a large number of possible states of a system under certain conditions. In the canonical ensemble, the relationship among the free energy F, internal energy E, entropy S, and temperature T of the state functions is F = E − TS, and the relationship between the free energy and the partition function Z is

F = −k_B T ln Z,

where k_B is the Boltzmann constant.
The entropy is a linear combination of the free energy F, temperature T, and average internal energy ⟨E⟩: S = (⟨E⟩ − F)/T. The partition function is Z = Σ_i Ω(E_i) e^{−βE_i}, where Ω(E) is the degeneracy of the energy level and β = 1/(k_B T). In the canonical ensemble, the probability distribution function is the Boltzmann distribution, p_i = (1/Z) e^{−βE_i}. Therefore, as long as the free energy of the ensemble is defined and the combined network model is optimized by the principle of least action, the desired neural network systems can be obtained.

The energy-based learning algorithm [36] is also derived from statistical physics. Hinton also persuaded the neuroscientist Karl Friston to accept the idea that the best way to explore the brain is to think of it as a Bayesian probability machine. In 2010, Friston published a paper titled "The free-energy principle: a unified brain theory?" in Nature Reviews Neuroscience, explaining the brain's operating mechanism using the principle of free energy [8]. From the work of Hinton et al. and Friston, we are convinced that the study of an "intelligent thermodynamics" system based on the principle of free energy will undoubtedly contribute to the inspiration and success of the Synergetic Learning Systems theory.

To break through the dilemmas of deep learning, namely, the difficulty of designing the network topology, of predicting the effects, and of explaining the network mechanism, we propose to build Synergetic Learning Systems to establish a "grand unified theory" of intelligence.

Drawing on systems theory to study the SLS, the most fundamental Synergetic Learning Systems we designed has two subsystems (or models): the system reduction model (discriminative model) and the system evolution model (generative model).
The evolution of the system is described by a differential dynamical system.

At IJCAI 2018, one of Yann LeCun's questions was, "is there a simple rule behind learning?" We think there should be. It is well known that there is a simple and elegant principle in physics: the principle of least action.

For a mechanical system, the variation of the action under the principle of least action yields the Lagrangian equations describing the mechanical system. If we define the total action of the system as the sum of the actions of the gravitational field and the matter field, Einstein's equations of general relativity can also be derived from the principle of least action.

Our proposed Synergetic Learning Systems is an information processing system based on the principle of free energy. Therefore, we assume that in an SLS, the action of the system is equal to the free energy. The principle of free energy in the SLS is then equivalent to the principle of least action, and the free energy is equivalent to the Hamiltonian of a mechanical system. Our proposition is:

Free energy == Hamiltonian;
Principle of free energy == principle of least action.

Therefore, for a given environment (data), as long as the "Hamiltonian" of the neural network system is defined, the self-organization and evolution of the neural network structure can be studied systematically through multi-view and multi-scale dynamics equations. This also gives us an important concept: the principle of least action is the first principle for artificial intelligence.
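The canonical-ensemble identities used above (F = −k_B T ln Z and S = (⟨E⟩ − F)/T) can be checked numerically on a toy system. The three-level spectrum below is an arbitrary illustration, with k_B set to 1 for convenience:

```python
import math

# Toy three-level system; k_B = 1 by convention here.
energies = [0.0, 1.0, 2.0]
T = 0.5
beta = 1.0 / T

Z = sum(math.exp(-beta * E) for E in energies)        # partition function
p = [math.exp(-beta * E) / Z for E in energies]       # Boltzmann distribution p_i
F = -T * math.log(Z)                                  # free energy, F = -k_B T ln Z
E_avg = sum(pi * Ei for pi, Ei in zip(p, energies))   # internal energy <E>
S = -sum(pi * math.log(pi) for pi in p)               # Gibbs entropy

# The canonical-ensemble identity F = <E> - T S holds to machine precision:
print(abs(F - (E_avg - T * S)) < 1e-12)               # → True
```

Substituting p_i = e^{−βE_i}/Z into S = −Σ p_i ln p_i gives S = β⟨E⟩ + ln Z, from which ⟨E⟩ − TS = −T ln Z = F follows exactly; the script only confirms this algebra numerically.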
The dynamics of the SLS should be described by differential dynamic equations. What kind of equation is this? We know that in the field of chemical research, the "free energy" of a one-component system can be represented in variational form as a restricted evolution equation [48]:

∂φ/∂t = −δF/δφ,   (1)

where the "free energy" F is given by:

F = ∫_{−∞}^{∞} [ (D/2) (∂φ/∂x)² − V(φ) ] dx.   (2)

In the above formula, V(φ) is the chemical potential and R(φ) = dV(φ)/dφ. Therefore, we have

∂φ/∂t = D ∂²φ/∂x² + R(φ).   (3)

Generalizing the equation to a multi-component system, we obtain from the variation of the "free energy" F:

∂Ψ/∂t = ∇·(D(Ψ)∇Ψ) + ∇·f(Ψ) + g(Ψ),   (4)

in which Ψ(x, θ, t) = {φ₁, φ₂, ..., φ_n}, D(Ψ) is the diffusion matrix, and ∇Ψ = (∂Ψ/∂x₁) e₁ + (∂Ψ/∂x₂) e₂ + ... + (∂Ψ/∂x_n) e_n. The term ∇·f(Ψ) is called the convection vector and g(Ψ) the reaction vector [52].

Therefore, the free energy of the system is equal to the action; applying the variational calculus to the action according to the principle of least action (the principle of free energy), the reaction-diffusion equation is derived. Hence the dynamics of the SLS are described by the reaction-diffusion equation. In statistical physics, dissipative systems theory is a theoretical description of the self-organization of non-equilibrium systems, and the reaction-diffusion equation is also utilized to model such systems [49].

Below we describe the architecture of the Synergetic Learning Systems.

4 Architecture of the Synergetic Learning Systems

The unity of structure and function is one of the basic concepts in biology. The brain has a complex neural network structure.
Therefore, a unified architecture is very important, and it should be unified with the function of the system.

Aristotle, the famous ancient Greek philosopher, proposed that "the whole is greater than the sum of the parts," which is an ancient, simple holistic view and a basic principle of modern systems theory. In accordance with this basic
principle, we have designed an SLS that consists of two or more subsystems. The SLS should not be a simple isolated system. It contains at least two subsystems; in this way the performance of the system may demonstrate that the whole is greater than the sum of the parts, whereas a single part on its own remains only a part. It is well known that the brain is an open complex giant system; if we intend to simulate brain structures with neural networks, the SLS should also be an open complex giant system.

According to Dr. Hsue-Shen Tsien's classification method [45], if there are many kinds of subsystems and a hierarchical structure in a system, the relationships between them are very complicated, giving rise to a complex giant system. If this system is also open, it is called an open complex giant system. Openness here refers to the exchange of energy, information, or matter between the system and the outside world. To be more precise: 1. the system and its subsystems have various exchanges of information with the outside world; 2. each subsystem in the system acquires knowledge through learning.

Academician Guo Aike elaborated on the working principle of the brain and the roots of its intelligence: "How does the human brain work as a whole? 'The Tao produced One, One produced Two, Two produced Three, Three produced All things.' My initial understanding is that it is the result of multi-module synergetic operation; I believe that the function of the brain is the result of the collaboration of thousands of subsystems with different specialized skills, which is the result of the entangled combination of millions of years of evolution" [11]. Therefore, referring to this neurocognitive mechanism, the overall working state of the SLS is also the result of the coordinated operation of multiple subsystems (multi-modules).

The visual organ of
Drosophila consists of more than 750 ommatidia (unit eyes). William Bialek, a theoretical physicist at Princeton University, has shown that these units work together to create a visual system that enables highly accurate computations [26]. This system illustrates the synergy of individuals in a complex system, and it is one of the biological bases for our proposed SLS.

Figure 1 is a schematic diagram of the architecture of our Synergetic Learning Systems. In this SLS, subsystems can have multiple structures and layers. The system is flexible and scalable and has complex interrelationships between subsystems.

4.1 Multi-Agent Systems

If the subsystems are peer models, and each subsystem is an agent, the Synergetic Learning Systems becomes a Multi-Agent System (MAS).

A MAS is a collection of multiple agents. Its goal is to transform large, complex systems into small, coordinated, and manageable systems that can easily communicate with each other. A MAS can therefore be understood as a specific application of the divide-and-conquer strategy.
Fig. 1 Schematic diagram of the architecture of our Synergetic Learning Systems.
The MAS is a coordination system in which the agents solve large-scale complex problems by coordinating with each other. The MAS is also an integrated system, which uses information integration technology to integrate the information from each subsystem and thereby complete the integration of a complex system. In a MAS, the agents communicate and coordinate with each other and solve problems in parallel, which effectively improves the ability to solve complex problems. Multi-Agent Systems are suitable for complex, open, distributed systems. They deal with tasks through the cooperation of agents. The key to realizing a MAS is the communication and coordination between these agents, that is, the synergetics. After the data are given, the process of building the MAS is the process of synergetic learning. We know that data is the manifestation and carrier of information. Therefore, a specific SLS relies on a given data set; that is, building it is a data-driven modeling process. As the system evolves, data-driven single-wheel drive will evolve into a two-wheel drive comprising data and models.

We believe that the MAS and swarm intelligence are highly similar, but the scope of swarm intelligence may be wider than that of the MAS. The SLS focuses on how each agent works with the others. Each subsystem can be a complex neural network system, and much more attention is paid to the system's integrity than in MAS or swarm intelligence systems.

One special example is that, if a hybrid expert systems architecture is used, a gate network can be used to coordinate the opinions of the various expert networks. The gate network can be designed to use simple voting or weighted voting mechanisms. In addition, the gate network can be connected to the various expert networks to coordinate their inputs and outputs according to the given task, which acts as the servo mechanism in synergetic learning.
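The gate-network idea can be sketched as a soft-voting combination of experts. In the following minimal sketch, the experts and the gate are untrained random linear maps, purely for illustration (in practice both parts would be trained); the softmax gate produces weights that sum to one, implementing a weighted vote:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three "expert" linear models and a softmax gate network.
x = rng.normal(size=4)                                 # input feature vector
experts = [rng.normal(size=(2, 4)) for _ in range(3)]  # 3 experts, 2 outputs each
gate_w = rng.normal(size=(3, 4))                       # gate network weights

def softmax(v):
    e = np.exp(v - v.max())                            # numerically stable softmax
    return e / e.sum()

gate = softmax(gate_w @ x)                             # one weight per expert
expert_outs = np.stack([W @ x for W in experts])       # shape (3, 2)
y = (gate[:, None] * expert_outs).sum(axis=0)          # weighted-vote combination

print(abs(gate.sum() - 1.0) < 1e-9)                    # the gate is a soft vote
```

Replacing the softmax with a hard argmax recovers simple voting; the weighted form is differentiable, so gate and experts can be trained jointly.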
The gate network controls the various expert networks; from the viewpoint of stacked generalization, we can regard the gate network module as a meta-learner.

4.2 Two-body system

Based on reductionism, a complex system consists of multiple simple systems, so we should go from the simple to the complex when tackling problems. A nonliving system usually obeys the second law of thermodynamics: the system always tends spontaneously toward equilibrium and disorder, and its entropy eventually reaches a maximum. The system spontaneously changes from order to disorder, while disorder does not spontaneously change to order, owing to the irreversibility of the system and the stability of the equilibrium state. A living system, however, is the opposite. Biological evolution and social development always become more ordered: from simple to complex and from lower to higher order. Such systems are capable of spontaneously forming ordered, stable structures. Since life also evolves from simple structures to complex forms, we can start constructing a system from simple structures and gradually evolve it into a complex system. The two-body system is the most fundamental SLS. In the AI field, there are many examples of systems with two subsystems. On the other hand, when dealing with many-body problems in statistical physics, we may adopt the mean-field approximation. With this approximation, a many-body problem can be simplified into a two-body problem. Although the accuracy of the approximation depends on the specific problem, it is a proven method for solving complicated problems effectively.
Any system can be divided into two subsystems. In the AI field there are many dual systems: for example, in computer graphics, where GPUs are widely used, image synthesis is the dual counterpart of image analysis. A Chinese-to-English translation system has a dual system for English-to-Chinese translation. These two systems can be considered as subsystems forming a larger system, which can be viewed as a dual-learning / Synergetic Learning System. Among probabilistic statistical models, the Generative Adversarial Network (GAN) and the Variational Auto-Encoder (VAE) can also be considered Synergetic Learning Systems with two subsystems.
The simplest two-model system is the autoencoder. Here we regard the encoder as system A and the decoder as system B, and the two parts combine into a simple SLS.
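Such a two-subsystem SLS can be sketched in a few lines: a linear encoder (system A) and decoder (system B) trained jointly by gradient descent on the shared reconstruction error. All dimensions and the learning rate are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Subsystem A (encoder) and subsystem B (decoder) as linear maps; the shared
# reconstruction loss couples them into a simple synergetic system.
X = rng.normal(size=(100, 8))          # data: 100 samples, 8 features
We = rng.normal(size=(8, 3)) * 0.1     # encoder weights (8 -> 3)
Wd = rng.normal(size=(3, 8)) * 0.1     # decoder weights (3 -> 8)
lr = 0.01

def loss(We, Wd):
    return np.mean((X @ We @ Wd - X) ** 2)

start = loss(We, Wd)
for _ in range(500):
    Z = X @ We                         # encoder output (code)
    R = Z @ Wd                         # decoder output (reconstruction)
    G = 2 * (R - X) / X.size           # gradient of the mean-squared error w.r.t. R
    gWd = Z.T @ G                      # gradient for the decoder
    gWe = X.T @ (G @ Wd.T)             # gradient for the encoder
    Wd -= lr * gWd                     # both subsystems are updated by the
    We -= lr * gWe                     # same loss: synergy via a shared objective
print(loss(We, Wd) < start)            # reconstruction error decreased
```

The two subsystems never exchange parameters directly; only the shared loss couples them, which is the "enslaving" relationship described below.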
In this simple SLS, the cooperation is sequential: the input of the decoder depends on the output of the encoder. We can interpret these two subsystems as being enslaved by the loss function. Minimizing the loss function (the reconstruction error) is one way to achieve synergetic learning.

The restricted Boltzmann machine can also be considered a simple SLS. A detailed discussion will take place in another work [14]; Part II of the SLS is primarily concerned with the interpretability of neural network systems based on statistical physics. For the marginal distributions of the joint distribution p(v, h), we may regard p(v) as one subsystem and p(h) as another subsystem. In fact, many energy-based models were called Boltzmann machines in the past. The original Boltzmann machine came in two types: models with and without latent variables. What is now called the Boltzmann machine is the model with latent variables [9].

5 Synergetic Learning Algorithms

Under the framework of the free-energy principle, the SLS can be trained with energy-based learning algorithms. The state of a physical system can be studied once its "action" is defined, and the maximum/minimum is obtained by the variational method based on the principle of least action [10]. In the SLS, we need to define the free energy of the system. Having adopted statistical-physics thinking, the mathematical tools for describing the model are probability and statistics. Hopfield neural networks were early energy-based models [28][29]. In an energy-based learning model, the negative variational free energy is also known as the Evidence Lower BOund (ELBO) [44][7][1]. To estimate the probability density function, we can adopt the Expectation-Maximization algorithm [9], the Markov Chain Monte Carlo (MCMC) algorithm [3], and variational inference algorithms [31][4].

Here we give some examples in the equilibrium state (thermal equilibrium):

5.1 Variational inference algorithm

In the paper by Yann LeCun et al.
titled "A Tutorial on Energy-Based Learning," the variational free energy is defined as follows [36]:

L_nll(W, Y^i, X^i) = E(W, Y^i, X^i) + F_β(W, Y, X^i),

where F_β is the free energy of the ensemble {E(W, y, X^i), y ∈ Y}:

F_β(W, Y, X^i) = (1/β) log ( ∫_{y∈Y} exp(−βE(W, y, X^i)) dy ).

In the paper titled "Energy-based Generative Adversarial Network" [53], the discriminator is described as an energy function (a negative evaluation function); that is, the smaller the function value, the truer the data. The auto-encoder
AE is used as the discriminator (energy function). The energy function is defined as the error function of the discriminator:

L_D(x, z) = D(x) + max(0, m − D(G(z))).

According to the Boltzmann distribution, p_i(Θ) = (1/Z) e^{−βE_i} = (1/Z) e^{−βL_D}, with the partition function Z(Θ) = Σ_i exp(−βL_D(x_i, z_i, Θ)). The free energy can then be expressed as:

F = −k_B T ln Z(Θ).

From these equations, we can see that the free energy can be calculated simply by finding the partition function. Bengio et al. turned the problem into one of estimating a probability distribution [32][33]. Therefore, the key issue is to estimate the probability density distribution function, and one family of algorithms for this is variational inference. Some introductions to the variational inference algorithm can be found in the literature [44][1].

In order to utilize the variational inference (variational Bayes) algorithm to solve the SLS learning problem, we need to define a certain environment; in other words, we must assume some conditions. Suppose we design a fully Bayesian model, in which all parameters are given prior distributions. The model has both parameters and latent variables. We use Z to represent the set of all latent variables and parameters, and X to represent the set of all observed variables. For example, we might have a set of N independent, identically distributed (i.i.d.) data points, with X = {x_1, ..., x_N} and Z = {z_1, ..., z_N}.

One of our models represents the joint distribution p(X, Z), and our goal is to find an approximation of the posterior distribution p(Z|X) and the model evidence p(X). In general, the form of the posterior probability is very complicated and difficult to obtain, so we hope to approximate p(Z|X) with a relatively simple and easily understood model q(Z|X), namely, p(Z|X) ≈ q(Z|X). The other model is described by q(X, Z), with q(Z) = ∫ q(X, Z) dX.

Mean-field approximation (factorized distributions): the local interactions between individuals in a system can produce relatively stable behavior at the macroscopic level; therefore, we can make independence hypotheses about the posterior. That is,

∀i,  p(Z|X) = p(Z_i|X) p(Z_{−i}|X),   q(Z) = Π_{i=1}^M q_i(Z_i).

Since the logarithmic evidence log p(X) is fixed with respect to q, in order to minimize the Kullback-Leibler (KL) divergence, only L(q) needs to be maximized. By selecting an appropriate q, L(q) can be easily calculated and evaluated. In this way, an approximate analytical expression for the posterior p(Z|X) and a lower bound on the log evidence can be obtained; the latter is also called the (negative) variational free energy:

L(q) = Σ_Z q(Z|X) log p(Z, X) − Σ_Z q(Z|X) log q(Z|X) = E_q[log p(Z, X)] + H(q).   (5)

In this equation, the first term on the right-hand side is defined as the energy, and the second term is the Shannon entropy. A more detailed discussion of how to solve this problem can be found in chapter 10 of Bishop's book [3].

5.2 Approximate Synergetic Learning Algorithms

As discussed above, we can consider minimizing a Helmholtz free energy; this is equivalent to maximizing the expected log likelihood under the model,

max E_q(z)[log p(x)].

Because the ELBO is equal to the negative variational free energy, maximizing the ELBO is the same as minimizing the variational free energy. Of course, we can also start by just minimizing the KL divergence between the models. Note that the SLS we discussed above consists of two subsystems (models, or agents): one model is expressed by p(x, z), and the other is described by q(x, z).
Therefore, implementing synergetic learning for this two-model SLS leads to the following optimization problem:
$$\min_{\Theta} \mathrm{KL}[\,q(x, z)\,\|\,p(x, z)\,], \quad (6)$$
where $\Theta$ is the parameter group we try to learn. In most cases this is intractable, but if we design the models carefully, it becomes tractable. The key point is that we can then use gradient-descent optimization or the pseudoinverse learning (PIL) algorithm to train the neural network model. Consequently, we developed the approximate synergetic learning (ASL) algorithm, based on our previous work [16][21][18], to tackle the complicated variational inference computations.

In our ASL algorithm, the models are designed as follows: $p(x)$ is a Gaussian mixture distribution, and $q(x)$ is a nonparametric kernel estimate:
$$p(x) = \sum_{k=1}^{K} \alpha_k \mathcal{N}(x \mid \mu_k, \Sigma_k), \quad (7)$$
$$q(x) = K(x, h), \quad (8)$$
where $h$ is a kernel parameter. The KL divergence is
$$\mathrm{KL}[\,q(x, z)\,\|\,p(x, z)\,] = \int q(x, z) \ln \frac{q(x, z)}{p(x, z)} \,\mathrm{d}x\,\mathrm{d}z = \int q(z|x)\, q(x) \ln \frac{q(z|x)\, q(x)}{p(x|z)\, p(z)} \,\mathrm{d}x\,\mathrm{d}z. \quad (9)$$
Let
$$q(z_k|x) = \frac{\alpha_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{p(x, \Theta)}, \qquad p(x, \Theta) = \sum_{k=1}^{K} \alpha_k \mathcal{N}(x \mid \mu_k, \Sigma_k). \quad (10)$$
With this definition, $q(z|x)$ is the posterior probability estimated in the E-step of the EM algorithm. This means
$$\int q(z|x)\, q(x) \ln p(x, \Theta) \,\mathrm{d}x\,\mathrm{d}z = \int q(x) \ln p(x, \Theta) \,\mathrm{d}x,$$
$$\int q(z|x)\, q(x) \ln q(x) \,\mathrm{d}x\,\mathrm{d}z = \int q(x) \ln q(x) \,\mathrm{d}x$$
(normalized in the hidden space, $\int q(z|x) \,\mathrm{d}z = 1$). Then
$$\mathrm{KL}[q\,\|\,p] = -\int q(x) \ln p(x, \Theta) \,\mathrm{d}x + \int q(x) \ln q(x) \,\mathrm{d}x. \quad (11)$$
The details of this work are discussed in Ref. [16].

With a data set $D = \{x_i\}_{i=1}^{N}$, we intend to cluster the data into several clusters. We now use a Gaussian kernel density for $q(x)$:
$$q(x) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{N}(x \mid x_i, h I_d). \quad (12)$$
The hyper-parameter $h$ plays the key role in the cluster-number selection problem: if it is estimated with a gradient-descent approach, it will eventually approach zero. By minimizing the KL divergence (free energy), we instead obtain a new equation for estimating the smoothing parameter $h$.
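The degeneracy of a likelihood-driven bandwidth estimate can be seen in a small one-dimensional sketch. The data are toy, and here $h$ denotes the kernel standard deviation (an illustrative convention): the training-sample score of the kernel density $q(x)$ keeps improving as $h$ shrinks, so gradient descent on the training likelihood collapses $h$ toward zero.

```python
import numpy as np

def mean_log_kde(x, h):
    """Mean log of q(x) = (1/N) sum_i N(x | x_i, h^2), evaluated at the samples."""
    diff = x[:, None] - x[None, :]                       # pairwise differences
    k = np.exp(-diff**2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return np.mean(np.log(k.mean(axis=1)))               # average of log q(x_j)

rng = np.random.default_rng(1)
x = rng.normal(size=50)

# Evaluating on the training samples themselves rewards ever-smaller bandwidths:
# the self-term (1/N) * N(0 | 0, h^2) diverges as h -> 0.
scores = [mean_log_kde(x, h) for h in (1.0, 0.1, 0.01, 0.001)]
assert scores[-1] == max(scores)     # the smallest h wins on the training data
assert scores[-1] > scores[0]
```

This is why $h$ is instead estimated from the minimized KL divergence, as derived next.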
Let
$$g(x, \Theta) = -\ln p(x, \Theta), \quad (13)$$
and use a Taylor expansion of $g(x, \Theta)$ at $x = x_i$. When $h$ is small, we can omit the higher-order terms and keep only the first-order term, which gives
$$h = \frac{N}{\sum_{j=1}^{N} [\,q(x_j) - 1\,] \ln q(x_j)\; J_r(x_i, \Theta)}, \quad (14)$$
where
$$J_r(x_i, \Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| \sum_{k=1}^{K} q(z_k|x_i)\,(x_i - m_k)^{T} \Sigma_k^{-1} \right\|^2, \quad (15)$$
and $z_k = k$ stands for the Gaussian component label. The details of this work are discussed in Ref. [21].

Given a data set $D = \{x_i, z_i\}_{i=1}^{N}$ for supervised learning, $z_i$ can be an output label or a sample drawn from a regressed function. The joint distribution $q(x, z)$ in this work is designed as
$$q(x, z) = \frac{1}{N} \sum_{x_i, z_i \in D} K_{h_x}(x - x_i)\, K_{h_z}(z - z_i), \quad (16)$$
where the most commonly used kernel density function is the Gaussian kernel,
$$K_h(r) = \mathcal{N}(r \mid 0, h I_d) = \frac{1}{(2\pi h)^{d/2}} \exp\left\{ -\frac{\|r\|^2}{2h} \right\}. \quad (17)$$
This model rests on a very strong assumption ($x$ and $z$ are statistically independent).

When the error is Gaussian [2],
$$t_k = h_k(x) + \epsilon_k, \quad (18)$$
$$p(\epsilon_k) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{\epsilon_k^2}{2\sigma^2} \right). \quad (19)$$
In our work [21], the following model is considered:
$$p(z \mid x, \Theta) = p(z \mid f(x, \Theta)), \quad (20)$$
where $f(x, \Theta)$ is a function of the input variable $x$ and the parameters $\Theta$. We further design $p(z \mid f(x, \Theta)) = \mathcal{N}(z \mid g(x, \Theta), \sigma^2 I_d)$, where $g(x, \Theta)$ is a neural network mapping function, for example a single-hidden-layer feedforward network, a deep neural network, or any other network architecture. Then
$$z = g(x, \Theta) + \epsilon, \qquad \|\epsilon\|^2 = \|z - g(x, \Theta)\|^2. \quad (21)$$
In our method, a Taylor-expansion approximation is used; this can be considered a "local quadratic approximation" (Eq. (17) in Ref. [21]). We derived the loss function as follows:
$$\mathrm{loss}(\Theta, h) = \frac{1}{2N\sigma^2} \sum_{i=1}^{N} \left\{ \|z_i - g(x_i, \Theta)\|^2 + h^2 \|g'(x_i, \Theta)\|^2 - h^2 \big\| [z_i - g(x_i, \Theta)]\, g''(x_i, \Theta) \big\| \right\}. \quad (22)$$
The first term is the traditional sum-square-error function, the second term is a Jacobian regularization term, and the third term is a Hessian regularization term. Here $h$ is the regularization parameter, which can be estimated with the following formula (Eq. (50) in [21], without considering the Hessian term), where we also assume that the prior $p(x)$ is a uniform distribution and regard it as independent of $h$:
$$h \approx \frac{d}{[1 + (d - 2)]} \cdot \frac{\sum_{i=1}^{N} \|z_i - g(x_i, \Theta)\|^2}{\sum_{i=1}^{N} \|g'(x_i, \Theta)\|^2}. \quad (23)$$
If we omit the second-order derivative term in Eq. (22), the loss function reduces to a first-order Tikhonov regularizer. With the generalized-linear-network assumption applied only to the Jacobian regularization term, $\sum_{i=1}^{N} \|g'(x_i, W)\|^2 = N \sum_{j=1}^{M} w_j^2$, the weight-decay regularizer is obtained. When we simply let $z$ be the reconstructed input vector $x$, the loss reduces to that of the contractive auto-encoder [30][42][41].

5.3 Numerical Method for the Reaction-Diffusion Equation

For non-equilibrium statistics, such as dissipative structures, we use the reaction-diffusion equation to study the dynamic process of the SLS. When studying a differential dynamical system, we are concerned with the properties (mainly global properties) of the system and its changes under perturbation. In a cooperative learning system, the reaction-diffusion equation can also be used to describe the evolution dynamics of the system.
If the SLS is designed as a differential dynamical system and attention is paid to the attractor sub-network [43][5], the reaction-diffusion equation can also be used to describe it. Therefore, the mathematical basis of artificial intelligence should also include ordinary and partial differential equations.
Example: Turing's reaction-diffusion equations [46]. Turing's reaction-diffusion equations are one of his revolutionary discoveries in natural science and provide a mechanism for pattern formation [40]:
$$\frac{\partial U}{\partial t} = D_u \nabla^2 U + f(U, V), \qquad \frac{\partial V}{\partial t} = D_v \nabla^2 V + g(U, V). \quad (24)$$
It may be that, inspired by Turing's discovery of the reaction-diffusion equations, Prigogine proposed the theory of dissipative systems, believing that "the energy exchange with the outside world is the fundamental reason for making the system orderly (contrary to the principle of entropy increase)," and founded a new discipline called non-equilibrium statistical mechanics [38].

Explanation: Here, we interpret the two substances in Turing's model as two types of information. The reaction of substances is analogous to the processes of information fusion and production, and the diffusion of substances corresponds to the process of information transmission. By modeling the SLS as an information-processing system, the so-called information-granular processing system, we will study general artificial intelligence systems on the basis of this concept. However, information does not consist only of photon-like "particles"; it also has "wave-particle duality." In our SLS theory, information particles are described by high-dimensional random variables, and the function describing information waves is the density distribution function.
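A minimal explicit finite-difference integration of Eq. (24) in one spatial dimension can serve as a sketch. The Gray-Scott-style kinetics $f = -UV^2 + F(1-U)$, $g = UV^2 - (F+k)V$ and all constants are illustrative assumptions, not taken from this paper:

```python
import numpy as np

def laplacian(a, dx):
    # Periodic second-order finite-difference Laplacian, nabla^2 a.
    return (np.roll(a, 1) + np.roll(a, -1) - 2 * a) / dx**2

def step(U, V, Du=2e-5, Dv=1e-5, F=0.04, k=0.06, dx=0.01, dt=1.0):
    # One explicit Euler step of
    #   dU/dt = Du nabla^2 U + f(U, V),   dV/dt = Dv nabla^2 V + g(U, V),
    # with Gray-Scott-style kinetics (an illustrative choice of f and g).
    UVV = U * V * V
    U_new = U + dt * (Du * laplacian(U, dx) - UVV + F * (1.0 - U))
    V_new = V + dt * (Dv * laplacian(V, dx) + UVV - (F + k) * V)
    return U_new, V_new

n = 200
U, V = np.ones(n), np.zeros(n)
U[90:110], V[90:110] = 0.5, 0.5      # a local perturbation seeds the dynamics

for _ in range(2000):
    U, V = step(U, V)

assert np.all(np.isfinite(U)) and np.all(np.isfinite(V))   # scheme stayed stable
```

The explicit scheme is stable here because $\Delta t\, D_u/\Delta x^2 = 0.2 < 0.5$; finer grids require smaller time steps or the implicit finite-element/finite-difference methods cited below [54][6].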
Conjecture: Turing's reaction-diffusion equations can be applied to pattern generation; can we also produce the topological structure of a neural network with Turing's equations? Our conjecture here is that there exist functions $f(U, V)$ and $g(U, V)$ such that a neural network structure can be automatically generated by Turing's equations, where $U$ and $V$ stand for two different graph networks, respectively. (More discussion of this conjecture is planned for Part III of the SLS.)

An information wave is different from an electromagnetic wave, which is the carrier of information, but the elliptic and parabolic equations of mathematical physics can be used to describe the process of information transmission. Methods for solving such equations depend on the complexity of the problem. At present, most studies of partial differential equations in mathematics use the finite element method [54] and the finite difference method [6] to obtain numerical solutions. In our earlier work, we used the heat diffusion equation to study the propagation of light beams in nonlinear media [12][22] and the dynamics of interference filters [15]. The diffusion equation is a parabolic semi-linear partial differential equation and can also be used to study diffuse optical tomography [39].

In practice, uncertainty problems are usually tackled with Bayesian-type methods. If we transform an uncertainty problem into a deterministic system learning problem, we can use the stochastic gradient descent algorithm or a pseudoinverse learning algorithm to optimize the SLS [17][19][20]. In our 2003 paper [21], we set one model as a parametric model and the other as a nonparametric model. After variational inference, a second-order approximation was adopted to turn the task into a deterministic system learning (optimization) problem. This method is the approximate synergetic learning algorithm presented in subsection 5.2.
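As a concrete illustration, the first-order (Tikhonov) form of the loss in Eq. (22), sum-square error plus an $h^2$-weighted Jacobian penalty, can be sketched on a toy one-parameter model. The tanh map, teacher data, and grid search are illustrative assumptions, not the actual networks of Ref. [21]:

```python
import numpy as np

def g(x, w):
    # Toy one-parameter "network": a scalar tanh map.
    return np.tanh(w * x)

def g_prime(x, w):
    # Derivative of g with respect to the input x (the Jacobian).
    return w * (1.0 - np.tanh(w * x) ** 2)

def asl_loss(w, x, z, h, sigma2=1.0):
    # First-order form of Eq. (22):
    #   loss = (1 / 2N sigma^2) sum_i [ ||z_i - g(x_i)||^2 + h^2 ||g'(x_i)||^2 ]
    n = len(x)
    err = np.sum((z - g(x, w)) ** 2)
    jac = np.sum(g_prime(x, w) ** 2)
    return (err + h**2 * jac) / (2.0 * n * sigma2)

rng = np.random.default_rng(2)
x = rng.normal(size=100)
z = 0.5 * x + 0.05 * rng.normal(size=100)    # noisy linear teacher

# Grid search over the single weight: the Jacobian penalty prefers a
# smaller weight, i.e. a smoother mapping.
grid = np.linspace(0.1, 3.0, 60)
best_w = {h: min(grid, key=lambda w: asl_loss(w, x, z, h)) for h in (0.0, 2.0)}
assert best_w[2.0] < best_w[0.0]   # regularization shrinks the optimal weight
```

With the generalized-linear simplification mentioned above, the Jacobian term collapses to a sum of squared weights, which is exactly the weight-decay regularizer.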
This paper briefly introduces the SLS's concept, architecture, and algorithms. Its main goals are to build a grand unified framework of the world model and to explore the road toward "intelligent mechanics."

However, by constructing the SLS, can we achieve "intelligent mechanics" and develop grand unified theories of artificial intelligence? We already know the significance of building a world model, but why study grand unified theories?

The world model described by Yann LeCun is the world model of artificial intelligence, but the world we live in is a physical world, and the intellectual activity of human beings belongs to the mental world. Therefore, the transition space when constructing an intelligence world model is the Cyber-Physical Model (CPM), which means that one of the vital cornerstones of artificial intelligence is physics, and physicists like to unify their theories. In these theories, complex phenomena are described by a small set of concepts, and the mathematical formulas that express these concepts can make very successful predictions.

In physics, grand unified theories, supersymmetry, and M-theory are not only very beautiful ideas but also deal with many problems that cannot be solved by the Standard Model, thus attracting many theoretical physicists. However, no matter how wonderful they are, they will eventually require extensive experimental verification. We need to remember that a good scientific theory must meet the following three conditions:

1. It must be able to replicate all successful predictions of existing scientific theories;
2. It should be able to explain the latest experimental and observational data that existing scientific theories cannot explain;
3. Most importantly, it should also make predictions that can be tested.

Maxwell's equations, General Relativity, and the Standard Model all conform to these three points. Therefore, a theory must not only explain the present but also predict the future.
Can our SLS meet these three points? We believe that the SLS basically meets them in the artificial intelligence world. In physics, a theory needs to make predictions that can be tested, but in the field of artificial intelligence, a theory ought to be subversive and innovative. The innovation behind our synergetic learning theory is based on the principle of least action, the variation of free energy, and the reaction-diffusion equation. These equations describe the process of system evolution. During this evolution, the system is considered a differential dynamical system, described mathematically by the reaction-diffusion equations. Therefore, the intelligent "thermodynamic system" can be seen as the processing of information particles. The main innovation is that in the SLS there are at least two subsystems: one subsystem is generated by the differential dynamical system, and the other is determined by the reduction model. We used the differential dynamical system to represent the generative model, and the reduction model corresponds to the disentangling model.

In systems science, a common phrase is "complex world, simple rules." One of Yann LeCun's questions is whether or not there is a simple rule behind learning. If there is such a simple rule, then what is it? At present, no one else has answered this question, but we have: the rule is the principle of least action.

Why is the principle of least action this simple rule? It is common sense that brain development is a Darwinian process of "evolution + selection." The evolution of the human brain is not only about the brain itself: it is the sum of human evolutionary results and the coevolution of Earth's entire biological community. The study of bio-intelligence has never focused on a single individual but, rather, on the evolution of all organisms in all populations in the history of the world, and it is a learning process with survival as the optimization goal. The objective function of natural selection is driven by the probability of survival, and nature always prefers the state with the least energy. The objective function in machine learning, by contrast, is set by human beings; it is a human choice and conforms to human laws.
Therefore, the principle of least action is the law we chose for the evolution of the artificial intelligence world model.

The novelty of the SLS theory we propose is that, whereas most other methods consider either cooperation or competition, we believe that cooperation and competition coexist among groups in a system that is harmonious and coexisting, and this system is a unity of opposites. During the process of evolution, the relationship among groups is not static but fluctuates from cooperation to competition and from competition to cooperation. In a Synergetic Learning System, competition is also a form of synergy.
In this work, a method for addressing difficult challenges in AI research fields is proposed. We believe that under a given environment (data, boundary conditions), the solution of the differential equation can be predicted along a defined evolutionary path, and the final effect is predictable and controllable. However, the uncertain part is that during the evolution of the system from simple toward complex, a phenomenon that requires attention is emergence. The term "emergent phenomena," as used by condensed matter physicists, refers to the complex behavior produced by the interaction of a large number of simple components. In life, emergence appears in the interactions between molecules and in how molecules combine to form a structure or perform a function. Living systems evolve, adapt, and change through interactions or information exchanges with other systems. Biological systems have feedback loops that make them difficult to analyze using standard differential equations. We do not yet know how to solve this problem. Ramin Golestanian, director of the Max Planck Institute for Dynamics and Self-Organization, said: "Physicists have studied many complex systems, but in terms of the complexity and the number of degrees of freedom, living systems belong to a completely different category." Therefore, the SLS is currently not considered to be a living system; in fact, it is an artificial intelligence system. How to integrate emergent phenomena into the differential dynamical equations during the evolution process is a future research direction.

Further work will focus on exploring the difficult problem of mechanism explanation and the problem of interpretable neural network models.
Further work will also address the challenge of topology design and explore the design of neural network topologies.

The second part (Part II) of the SLS explores an interpretable neural network model, addressing the challenge of explaining the mechanism of deep neural network models on the basis of statistical mechanics [14]. One interpretability scheme is to interpret information processing as a process in which information is transformed by a complex system. Based on the big bang theory, the multi-model SLS interpretation holds that the system is in the ground state at the beginning and, driven by fluctuations and forces from outside the system, diffuses to the current state through long-term evolution. In general evolutionary computation, the goal of evolution is unknown; in our SLS, the goal of evolution is to approach the minimum of the system's free energy. The clustering problem (unsupervised learning) can be explained as a process of system reduction: through a "Maxwell's demon," the information particles gradually agglomerate from a disordered to an ordered state.

The third part (Part III) of the SLS explores the topological architecture design of neural networks in view of the challenge of automatic machine learning. Based on the theory of system self-organization, theories and methods for the automatic organization and evolution of network structures will be developed. In systems science, the theory of system self-organization studies how a system automatically changes from a disordered to an ordered state, or from a low-level to a high-level ordered state, under certain conditions; one example is the theory of the laser. From the thermodynamic point of view, "self-organization" means that a system, through the exchange of matter, energy, and information with the outside world, constantly reduces its entropy and improves its degree of order.
From the point of view of statistical mechanics, "self-organization" refers to the spontaneous migration of a system from the most probable state toward less probable states. From the point of view of evolutionism, "self-organization" refers to a process in which a system, under the influence of "inheritance," "variation," and "survival of the fittest," constantly improves its organizational structure and operation mode so as to improve its adaptability to the environment.
Acknowledgement
The research work described in this paper was fully supported by the National Key Research and Development Program of China (No. 2018AAA0100203). Prof. Ping Guo and Qian Yin are the authors to whom all correspondence should be addressed.
References
1. Beal, M.J.: Variational Algorithms for Approximate Bayesian Inference. PhD thesis, University College London (2003)
2. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK (1996)
3. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer-Verlag, New York (2006). Chapter 10: Approximate Inference
4. Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. Journal of the American Statistical Association 112(518), 859–877 (2017). DOI 10.1080/01621459.2017.1285773
5. Domnisoru, C., Kinkhabwala, A.A., Tank, D.W.: Membrane potential dynamics of grid cells. Nature 495, 199–204 (2013)
6. Ferziger, J.H., Perić, M.: Computational Methods for Fluid Dynamics. Springer, Berlin, Heidelberg (2002)
7. Fox, C.W., Roberts, S.J.: A tutorial on variational Bayesian inference. Artificial Intelligence Review 38(2), 85–95 (2012). DOI 10.1007/s10462-011-9236-8
8. Friston, K.: The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11(2), 127–138 (2010). DOI 10.1038/nrn2787
9. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
10. Gray, C.G., Karl, G., Novikov, V.A.: Progress in classical and quantum variational principles. Reports on Progress in Physics 67(2), 159 (2004)
11. Guo, A.: How does the brain work as a whole? The Second China Systems Science Conference (2018). In Chinese
12. Guo, P.: Numerical solution of Gaussian beam propagation in nonlinear gradient refractive media. Laser Technology (5) (1990). In Chinese
13. Guo, P.: Synergetic learning systems (I): concept, architecture, and algorithms. Preprint, researchgate.net (2019). DOI 10.13140/RG.2.2.13681.12644. The Third China Systems Science Conference (CSSC2019), Changsha, May 18-19, 2019. In Chinese
14. Guo, P.: Synergetic learning systems (II): interpretable neural network model with statistical physics approach. Preprint, researchgate.net (2019). DOI 10.13140/RG.2.2.23969.66401. The Fifth National Statistical Physics & Complex Systems Conference (SPCSC 2019), Hefei, July 26-29, 2019
15. Guo, P., Awwal, A.A.S., Chen, C.L.P.: Dynamics of a coupled double-cavity optical interference filter. Journal of Optics (1), 167–174 (1999)
16. Guo, P., Chen, C.L.P., Lyu, M.R.: Cluster number selection for a small set of samples using the Bayesian Ying-Yang model. IEEE Trans. Neural Networks 13(3), 757–763 (2002). DOI 10.1109/TNN.2002.1000144
18. Guo, P., Jia, Y., Lyu, M.R.: A study of regularized Gaussian classifier in high-dimension small sample set case based on MDL principle with application to spectrum recognition. Pattern Recognition 41(9), 2842–2854 (2008). DOI 10.1016/j.patcog.2008.02.004
19. Guo, P., Lyu, M.R.: Pseudoinverse learning algorithm for feedforward neural networks. In: N.E. Mastorakis (ed.) Advances in Neural Networks and Applications, pp. 321–326. World Scientific and Engineering Society Press, Athens, Greece (2001)
20. Guo, P., Lyu, M.R.: A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data. Neurocomputing 56, 101–121 (2004). DOI 10.1016/S0925-2312(03)00385-0
21. Guo, P., Lyu, M.R., Chen, C.L.P.: Regularization parameter estimation for feedforward neural networks. IEEE Trans. Systems, Man, and Cybernetics, Part B 33(1), 35–44 (2003). DOI 10.1109/TSMCB.2003.808176
22. Guo, P., Sun, Y.G.: Gaussian beam propagation with nonlinear medium limiter. Acta Optica Sinica (12) (1990). In Chinese
23. Guo, P., Zhao, B.: Methodology for building synergetic learning systems. Preprint, researchgate.net (2019). DOI 10.13140/RG.2.2.10146.07368. The Third China Systems Science Conference (CSSC2019), Changsha, May 18-19, 2019. In Chinese
24. Haken, H.: The Mystery of Nature. ISBN 9787532736379 (2005)
25. Haken, H.: Information and Self-Organization. Sichuan Education Publishing House. ISBN 9787540853112 (2010)
26. Haykin, S.O.: Neural Networks and Learning Machines, 3rd edn. Pearson Higher Ed (2011)
27. Hinton, G.E., Zemel, R.S.: Autoencoders, minimum description length and Helmholtz free energy. In: J.D. Cowan, G. Tesauro, J. Alspector (eds.) Advances in Neural Information Processing Systems 6 (7th NIPS Conference, Denver, Colorado, USA, 1993), pp. 3–10. Morgan Kaufmann (1993). URL http://papers.nips.cc/paper/798-autoencoders-minimum-description-length-and-helmholtz-free-energy
28. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79(8), 2554–2558 (1982). DOI 10.1073/pnas.79.8.2554
29. Hopfield, J.J.: Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences 81(10), 3088–3092 (1984). DOI 10.1073/pnas.81.10.3088
30. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
31. Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. In: M.I. Jordan (ed.) Learning in Graphical Models, NATO ASI Series, vol. 89, pp. 105–161. Springer Netherlands (1998). DOI 10.1007/978-94-011-5014-9_5
32. Kim, T., Bengio, Y.: Deep directed generative models with energy-based probabilityestimation. CoRR abs/1606.03439 (2016). URL http://arxiv.org/abs/1606.03439
33. Kumar, R., Goyal, A., Courville, A.C., Bengio, Y.: Maximum entropy generators forenergy-based models. CoRR abs/1901.08508 (2019). URL http://arxiv.org/abs/1901.08508
34. LeCun, Y.: Learning world models: the next step towards AI. Keynote, the 27th IJCAI (2018)
35. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015)
36. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F.J.: A tutorial on energy-based learning. In: Predicting Structured Data. MIT Press (2006)
37. Marcus, G.: Deep learning: a critical appraisal. arXiv:1801.00631 [cs.AI] (2018). URL http://arxiv.org/abs/1801.00631
38. Nicolis, G., Prigogine, I.: Self-Organization in Non-Equilibrium Systems. Wiley, New York (1977)
39. Niu, H., Guo, P., Ji, L., Zhao, Q., Jiang, T.: Improving image quality of diffuse optical tomography with a projection-error-based adaptive regularization method. Optics Express 16(17), 12423–12434 (2008)
40. Pearson, J.E.: Complex patterns in a simple system. Science 261(5118), 189–192 (1993). DOI 10.1126/science.261.5118.189
41. Rifai, S., Mesnil, G., Vincent, P., Muller, X., Bengio, Y., Dauphin, Y.N., Glorot, X.: Higher order contractive auto-encoder. In: D. Gunopulos, T. Hofmann, D. Malerba, M. Vazirgiannis (eds.) Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part II, Lecture Notes in Computer Science, vol. 6912, pp. 645–660. Springer (2011). DOI 10.1007/978-3-642-23783-6_41
42. Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: L. Getoor, T. Scheffer (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pp. 833–840. Omnipress (2011). URL https://icml.cc/2011/papers/455_icmlpaper.pdf
43. Rolls, E.T., Loh, M., Deco, G., Winterer, G.: Computational models of schizophrenia and dopamine modulation in the prefrontal cortex. Nature Reviews Neuroscience 9, 696–709 (2008)
44. Šmídl, V., Quinn, A.: The Variational Bayes Method in Signal Processing. Springer (2006)
45. Tsien, H.S., Yu, J.Y., Dai, R.W.: A new field of science: an open complex giant system and its methodology. Chinese Nature (1990). In Chinese
46. Turing, A.M.: The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society of London, Series B 237(641), 37–72 (1952). DOI 10.1098/rstb.1952.0012
47. Waldrop, M.M.: News feature: What are the limits of deep learning? Proceedings of the National Academy of Sciences 116(4), 1074–1077 (2019)
48. Wikipedia: Reaction-diffusion system. wikipedia.org (2016). URL https://en.wikipedia.org/wiki/Reaction_diffusion_system
49. Willems, J.C.: Dissipative dynamical systems part I: general theory. Archive for Rational Mechanics and Analysis 45(5), 321–351 (1972). DOI 10.1007/BF00276493
50. Xin, X., Guo, P.: A survey on the past, present and development trend of the basic theory of artificial intelligence. ChinaXiv:201905.00013 (2019). URL https://doi.org/10.12074/201905.00013. In Chinese
51. Xu, Z.: Grasping the focus of next-generation information technology. People's Daily (2019). In Chinese
52. Ye, Q.X.: Introduction to the reaction-diffusion equation. Mathematics in Practice and Theory, pp. 48–56 (1984). In Chinese
53. Zhao, J.J., Mathieu, M., LeCun, Y.: Energy-based generative adversarial network. CoRR abs/1609.03126 (2016). URL http://arxiv.org/abs/1609.03126
54. Zienkiewicz, O.C., Taylor, R.L., Zhu, J.Z.: The Finite Element Method: Its Basis and Fundamentals. Butterworth-Heinemann, Oxford (2013). DOI 10.1016/B978-1-85617-633-0.00019-8