On the spatiotemporal behavior in biology-mimicking computing systems
János Végh, Kalimános BT, Hungary
Ádám J. Berki, University of Medicine, Pharmacy, Sciences and Technology of Targu Mures, Romania
Abstract
The payload performance of conventional computing systems, from single processors to supercomputers, has reached the limits that nature enables. Both the growing demand to cope with "big data" (based on, or assisted by, artificial intelligence) and the interest in understanding the operation of our brain more completely have stimulated efforts to build biology-mimicking computing systems from inexpensive conventional components, as well as different ("neuromorphic") computing systems. On one side, those systems require an unusually large number of processors, which introduces performance limitations and nonlinear scaling. On the other side, the neuronal operation drastically differs from conventional workloads. Conventional computing (including both its mathematical background and its physical implementation) is based on the assumption of instant interaction, while biological neuronal systems show a "spatiotemporal" behavior, although conduction time is typically ignored in computational models of neural network function. This difference alone makes imitating biological behavior in technical implementations hard. Besides, recent issues in computing have called attention to the fact that temporal behavior is a general feature of computing systems, too. Some of its effects in both biological and technical systems have already been noticed. Nevertheless, the handling of those issues is incomplete or improper. Introducing a temporal logic, based on the Minkowski transform, gives quantitative insight into the operation of both kinds of computing systems and furthermore provides a natural explanation of decades-old empirical phenomena. Without considering their temporal behavior correctly, neither an effective implementation nor a true imitation of biological neural systems is possible.
Preprint submitted to Neural Networks, 18 Sept 2020

Keywords: spatiotemporal behavior; Minkowski transform; non-instant interaction; imitating biological behavior; artificial neuron; performance of neural networks; brain simulation
1. Introduction
The appropriate handling of time is vital for the operation of both biological and technical computing systems. Initially, technical computing was modeled on biological computing. The technological possibilities, however, limited the true imitation of some features, and the extremely fast technological development in the electronics industry has strengthened the differences between the true and the imitated features in a stealthy way. The drastically different speed of evolution of those systems also significantly changed their relationship.

The tendency is to use an increasing number of conventional processors in technical systems and to introduce a new type of workload: artificial-intelligence-type computations. Both of these tendencies shed light on the importance of the timed cooperation of their computing elements. However, neither of those computing systems formulates the task of timing correctly. Technical computing assumes "instant interaction", corresponding to the standpoint of science a hundred years ago. Its use leads to massive energy wasting and extremely low payload computing performance when solving real-life computing tasks. Biological computing uses a description for the experienced "spatiotemporal" behavior, where "space" and "time" are handled mathematically as separate functions.

The need for modeling biological functionality in technical computing systems is more and more pressing. Imitating the real biological behavior, despite some initial successes on small systems comprising only a few neurons, is getting more challenging as the complexity grows. The existence of a limiting interaction speed in both kinds of computing systems enables us to introduce a correct mathematical handling in both fields and, in this way, forms the basis for mutual understanding of the needs and possibilities of the other field, for researchers of both biology and computing.

Given that Heraclitus, around 500 BC, stated that "no man ever steps in the same river twice", the idea of the space-time system is ancient. In studying biological neural systems, it has been known since the beginnings that if we repeat a measurement at a different location in the system, or at a different time at the same place of the system, the measured values are different, although the phenomenon under study is the same. The measurable parameters of an event change their value both in time and space, and we see the same event at a different time in a properly chosen place. This common experience is expressed with the wording that the systems show a "spatiotemporal" behavior [2, 3]. The phraseology used is closely related to the "space-time" coordinates introduced by Minkowski. Despite the evident resemblance, to the authors' best knowledge, no one has used the formalism of the Minkowski transform to describe the behavior of biological or technical computing systems. The philosophical discussion about "space and time in the brain" [4] mentions the Minkowski transform, but we discuss it from a completely different point of view: to describe the information delivery in computing networks, both biological and technical ones.

Classic science knows the temporal dependence of interactions (from our point of view: transferring information via physical interaction) only in the sense that if we move, for example, an electric charge, the frequency of the generated electromagnetic (EM) wave can be calculated. However, because of assuming instant interaction, the speed of the EM wave cannot: the instant interaction is achievable only with an infinitely large speed.
A similar interpretation was used in former studies describing neural behavior: the mathematical formalism used time (t) as an independent parameter that was not connected (through the interaction speed of the action under study) to the spatial coordinates. Because of this, similarly to the case of frequency and speed, one of the vital features of computing (including neural) networks remained out of sight of the research, resulting in an incomplete description of their behavior.

The fact that the speed of light is finite has been known since Galilei. Moreover, Einstein discovered that interactions, such as forces between objects having electric charge or gravitational mass, have a finite interaction speed (in other words, their interaction is not instant, although very fast). In his (implicit) interpretation, there exists a universal limiting speed for interactions, which even light (given that it is a propagating electromagnetic interaction) cannot exceed. (Einstein, in his classic paper "On the electrodynamics of moving bodies" [5], speaks about the fact that "light is always propagated in empty space with a definite velocity c", given that light represents the propagation of the electromagnetic interaction. In the abstract of the paper, however, he mentions that he speaks about "the phenomena of electrodynamics as well as of mechanics", i.e., gravity. The formalism, however, was available for electrodynamics only; for gravity, only a decade later.) The scientific truth about the existence of a finite interaction speed, in general, was recently confirmed by providing experimental evidence for the existence of gravitational waves. The experimental evidence also indirectly underpins that the mathematical background of the Minkowski transform is well established and correctly describes nature. Modern treatments of special relativity base it on the single postulate of Minkowski spacetime [6].

In our electronic devices, the EM waves propagate with a speed proportional to the speed of light. In the first computer [7, 8] (as well as in present computers), the interaction speed was in the 10^8 m/s range. The "processor size" was in the m range, and the timings (cycle length, access time, instruction execution time) were in the msec range. Under those conditions, the "spatiotemporal" behavior could not be discovered, and also theoretically it could be safely neglected. By today, thanks to the technological development, the processor has dwarfed a million-fold and the timing has become a million-fold faster. The interaction speed, however, did not change. Since the appearance of the first personal computers, the characteristic size of the physical components, such as the length of the buses connecting them, did not change significantly (unlike the distance of the computing gates, see Moore's observation).

Thanks to the increasing density and the increasing frequency, the speed of changing the electronic states in a computing system has more and more approached the limiting speed. It was recognized that the system's clock signals must be delayed [9]; that effort takes nearly half of the power consumption of the processor (and the same amount of energy is needed for cooling). Furthermore, only about 20% of the consumed (payload) power is used for computing [10]; the rest goes for transferring data from one place to another.
Despite these shortcomings, it was not suspected that the physical implementations of the electronic components have a temporal behavior [11]. However, it was the final reason for many experienced issues, from the payload performance limit of supercomputers [12] and brain simulation [13] to the weeks-long training times in deep learning [14, 15].

In contrast, in biological computing systems, the "spatiotemporal" behavior of dynamically interacting neurons is explicitly investigated. (Both assuming the non-instant nature of interactions and demonstrating their existence deserved a Nobel prize.) Their cm-range distances, their m/s-range interaction speed, and their msec-range timings (periodicity, spike length, etc.) prove that a proper description of biological networks is feasible only with a temporal logic. In this context, both in theoretical descriptions and in real-time simulations, the temporal behavior is a vital feature of biological systems. The "liquid state machine" model grasps the essential point of biological neural networks: their logical behavior cannot be adequately described without using both time and space coordinates. However, that model handles them separately; it does not connect those coordinates in the way proposed here. Because of this, that model cannot provide a full-featured description of the behavior of biological neural networks, and it also does not enable us to analyze the temporal behavior of their technical implementations.

The Minkowski transform was famous for its role in the quick and wide acceptance of Einstein's special relativity theory, providing an expressive and picturesque mathematical frame for it. The Minkowski transform, however, is self-contained, assuming only what is usually considered one of the essential consequences of special relativity: that a limiting speed exists.

For our paper, we only assume that a limiting speed (in both electronic and biological systems) exists, and that transferring information in the system needs time. In biology, the limiting speed (the conduction velocity) is modulated, but our statement holds for any single action: the spatial and temporal coordinates are connected through the corresponding limiting speed.
In our approach, Minkowski provided a mathematical method to describe information transfer phenomena in a world where the interaction speed is limited. For example, if we have our touching sense as the only source of information from the external world, we need to walk to the object, and this automatically limits the information propagation speed (both touching and being touched) to our walking speed. The case is the same when transferring information in computing systems: the space-time four-coordinates describe that world, with the difference that the interaction speed is other than the speed of light. (Real time, in our terminology, means that all computing events happen on the biologically correct time scale, instead of the computing time matching the biological time only on average. The paradoxical consequences of special relativity, such as time dilation and length contraction, are relevant for observers in different frames; the simplified discussion given here is sufficient for the case of one observer.)

We can proceed, following Minkowski, by merely introducing a fourth coordinate and, through the assumed limiting speed (without making further assumptions about the value and nature of that speed), transforming the time of propagation of an event (the interaction, i.e., the physical transfer of the information) into a distance within which the interaction can have an effect in the considered time duration. Notice the critical aspect that space and time not only have equal rank, but that they are connected through the interaction speed, and that all coordinates have the same dimension.

The present paper focuses on three major topics. In section 2, it is shown how the (inverse) Minkowski transform can be used to describe both computational and biological neural networks. The section introduces the idea that both processing time and data transfer time are naturally part of any kind of computing. (Receiving a neural spike, however, is a slightly special case: because of the integration, in some cases the "processing time" can vary.)

In section 3 we narrow the focus, especially to biology, and discuss the importance of time synchronization in collective oscillations, one of the essential biological functionalities of the brain. The analysis results in a trivial explanation of the commonly used principle "neurons that fire together wire together." The present paper does not want to discuss how different biological phenomena (such as pre-synaptic and post-synaptic processing, etc.) shall be included in the model.

In understanding the excessively complex operation of the brain, computer methods also have their role. Section 4 discusses some technical implementations and how their temporal behavior limits the intended faithful imitation of the operation of biological neural systems. (The effects of some inherited solutions, such as "grid time", are discussed in [15].) Again, it is out of the scope of the paper to analyze special-purpose artificial neuronal chips (partly because of the lack of sufficient (proprietary) technical information about their implementation). Instead, some common principles are discussed.

The importance of and the need for AI applications grows speedily, and the decades-old truth that "more is different" [16] remains out of sight.
2. The Minkowski transform
As suspected by many experts, the computing paradigm itself, "the implicit hardware/software contract" [17], is responsible for the experienced issues in computing:
"No current programming model is able to cope with this development [of processors], though, as they essentially still follow the classical von Neumann model" [18]. When thinking about "advances beyond 2020", however, the solution was expected from a "more efficient implementation of the von Neumann architecture" [19].

There are many analogies between science and computing [20], among others in how they handle time. Both classic science and classic computing assume instant (infinitely quick) interaction between their objects. That is, an event happening at any location can be instantly seen at all other locations. This assumption implies that the information inside a computing system is delivered instantly, and that it is always available when the processing unit is ready to process it. Mathematics assumes only logical dependence between its operands; otherwise it assumes that they are instantly available. Because of the "contract", engineering implementations must follow the same rules of the game, leading (among others) to assuming "weak scaling", which neglects the transfer time even in globe-wide networked distributed systems [15].
Time has no specific role in computer science (in the sense given above). In science, discovering that there exists an insurmountable interaction speed led to the birth of the modern scientific disciplines. Those new disciplines did not invalidate the corresponding classical ones. Instead, they drew the range of validity of the classical ones and described the world outside that range.
In computing, distances get defined during the fabrication of the components and the assembly of the system. When operating them, however, temporal characteristics are used. (The idea that processing the data does not happen instantly has, however, somehow slipped into computing theory, although it is handled only implicitly.) In biological systems, nature defines the neuronal distances, and in 'wet' neurobiology, signal timing, rather than axon length, is the right measure. In both kinds of systems, we need to find out how much later a component notices that an event (i.e., a piece of information, such as a spike, a clock signal, or the arrival of a network package) occurred in the system.

Computer science is based on logical functions. It assumes that their value does not depend on where and when the functions are evaluated. In other words, it is assumed that all events happen at the same time and in the same place.
This is true both for the timeless world of mathematics and for an infinitely small and infinitely fast, physically implemented computer. However, it becomes increasingly wrong for physically implemented computers as their physical size increases, and/or the data transfer time grows compared to the processing time, and/or the speed of changing their electronic states approaches the speed of light.
If we can change the interpretation of logical functions to another one, using space-time coordinates, we can take the new technological situation into account when producing, describing and using computing systems, including biological neural systems (and, importantly, the technical systems imitating them), while keeping the solid mathematical background of computer science. Furthermore, we can change the design principles correspondingly, to provide more effective systems (and to mimic "biological computing" more closely).

We suggest introducing a temporal logic (i.e., one where the value of a logical expression depends on where and when it is evaluated) into computing. (Formally, we only introduce bool function(Minkowski_coord x = (0,0,0,0)) instead of bool function(void).) The reverse of the Minkowski transform is proposed here: we need to use a special four-vector, where all coordinates are time values. The first three are the corresponding local coordinates (the distance, measured along the path of the signal, from the location of the event, divided by the speed of interaction; plus time contributions such as multiplexing or network hops), having time dimension, and the fourth coordinate is the time itself; that is, we introduce a four-dimensional time-space system. The resemblance to the Minkowski space is obvious, and the name difference signals the different aspects of utilization.

Figure 1: The computing operation in the time-space approach. The processing operators can be gates, processors, neurons or networked computers.

Figure 1 shows why the time must be considered explicitly in all kinds of computing. The figure shows (for visibility) a three-dimensional coordinate system: how an event behaves in a two-dimensional space plus time (the concept is easier to visualize with the number of spatial dimensions reduced from three to two). In the figure, the direction 'y' is not used, but it enables us to place observers at the same distance from the event without the need to locate them at the same point. The event happens at point (0,0,0); the observers are located on the 'x' axis; the vertical scale corresponds to the time.

2.2. Reproducing Einstein's hypothetical experiment
In the classic physical hypothetical experiment, we switch a light on in the origin, and the observer switches his light on when he notices that the first light was switched on. If we draw the growing circle (corresponding to the propagation of the light) against the vertical axis of the graph representing time, the result is a cone, known as the future light cone (in 2D space plus a time dimension). Both light sources have some "processing time" that passes between noticing the light (receiving the instruction or a synaptic input) and switching the light on (performing the instruction or emitting a spike).

The instruction is received at the origin, at the bottom of the green arrow. The light goes on at the head of the arrow (at the same location, but later), after the "processing time" T_p has passed. Following that, the light propagates in the two spatial dimensions as a circle around the axis 't'. Observers at a larger distance notice the light at a later time: a "transmission time" T_t is needed. The same spike arrives at different neurons (or a signal in a distributed system at different processors) at different times. If the "processing time" of the first event's light source were zero, the light would propagate along the gray surface at the origin. However, the light propagates along the blueish cone surface at the head of the green arrow because of the finite processing time.

A circle denotes the position of our observer on the axis 'x'. With zero "transmission time", the second gray conical surface (at the head of the green dotted arrow) would describe his light. However, his "processing time" can only begin when the observer notices the light at his position: when the dotted orange arrow hits the blueish surface. A crucial point is that a computing system (including biological or artificial neurons) cannot process its data input until all the data physically arrive at the input ports of the processing unit. (See the discussion below of how a high-speed bus changes the data transmission time and how fast tensor processing computes wrong feedback with uninitialized state variables.)

At that point begins the "processing time" of the second light source; the yellowish conical surface describes the propagation of the second light. The horizontal (green dotted) arrow describes the physical distance of the observer (as a time coordinate), and the vertical (orange dotted) arrow describes the time delay of the observer's light. It comprises two components: the transmission time T_t to the observer and its processing time T_p. The light cone of the observer starts at t = 2·T_p + T_t.

The red arrow represents the resulting apparent processing time T_A: the longer the red vector, the slower the system. As the vectors are in the same plane, $T_A = \sqrt{T_t^2 + (2 \cdot T_p + T_t)^2}$, that is, $T_A = T_p \cdot \sqrt{R^2 + (2+R)^2}$, with $R = T_t/T_p$. That is, the apparent processing time is a non-linear function of both of its component times and their ratio R. If more computing elements are involved, T_t denotes the longest transmission time. (A similar statement is valid if the T_p times are different.) Their effect is significant: if R = 1, the apparent execution time of performing the two computations is more than three times longer than the processing time (a small numeric sketch follows below).

Two more observers are located on the axis 'x', in the same position. For visibility, their timings are displayed at points '1' and '2', respectively. Their results illustrate the influence of the transmission speed (and/or the ratio R).
In their case, the transmission speed differs by a factor of two compared to the one displayed at point '0'; in this way, three different R = T_t/T_p ratios are displayed. Notice that at half transmission speed (the horizontal green arrow, representing the transfer time, is twice as long as that at the origin) the vector is considerably longer, while at double transmission speed the decrease of the apparent time is much less expressed.

The "liquid state machine" correctly grasps some essential aspects of brain operation, but it cannot provide a solid mathematical base for describing the details of the time dependence of neuronal operations, because its inhomogeneous coordinates are not connected; they are mathematically separable. In this way, having different conduction velocities, phase locking or learning (and especially those features in their technical implementations) remain out of sight of the model. Similarly, it is not easy to visualize different relations between neuronal state variables.
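The relation above can be checked with a small numeric sketch (Python; the values and variable names are illustrative, not taken from the paper's figures): the apparent processing time T_A is evaluated for several ratios R = T_t/T_p.

```python
from math import sqrt

def apparent_processing_time(T_p: float, T_t: float) -> float:
    """Apparent processing time in the two-observer experiment of Fig. 1:
    T_A = sqrt(T_t^2 + (2*T_p + T_t)^2)."""
    return sqrt(T_t ** 2 + (2.0 * T_p + T_t) ** 2)

T_p = 1.0                        # physical processing time (arbitrary units)
for R in (0.1, 0.5, 1.0, 2.0):   # R = T_t / T_p; T_t is the distance mapped to time
    T_t = R * T_p
    T_A = apparent_processing_time(T_p, T_t)
    print(f"R = {R:3.1f}  ->  T_A / T_p = {T_A / T_p:4.2f}")
# R = 1 gives T_A / T_p = sqrt(1 + 9) ~ 3.16: the apparent execution time is more
# than three times the physical processing time, as stated above.
```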
The inverse Minkowski transform works entirely in the temporal domain (as do researchers in 'wet' neurobiology), so representing the neural events in this system is quite natural. Notice that the handling depicted in Fig. 1 natively separates the time-dependent and distance-dependent time components (and, at the same time, unifies them). The processing happens at the same place and has a time duration, so the corresponding vectors are parallel with the time axis. Given that there is no instant interaction, a signal transferring information from one place to another is represented by a sloped vector; the slope of the vector depends on the interaction speed. ([12] discusses this phenomenon in detail.) The slope can have a proper meaning in technical computing, too, for example in the case of networked distributed systems. Nevertheless, in biology, it is a unique feature.
Given that the apparent processing time T_A, rather than the physical processing time T_p, defines the performance of the system, T_p and T_t must be concerted. Biology predefines the composition of the time contributions in a biological neuron; changing either of the two component times in their simulation drastically degrades the relevance of simulating their temporal behavior. Nature does not solve differential equations, as the computer needs to do, and the technical implementation must convert the "massively parallel" data transfer of biological systems into a sequential transfer. In this way, it can drastically degrade the speed (and even the sequence) of transferring information from one processing unit to another. That is, both using different mathematical solutions (as well as computing accelerators) and implementing the neuronal information transfer change the ratio between those component times in biology-mimicking systems. As a consequence, any technical imitation of a biological computing system has no relation to the real biological operation until it handles the biological time correctly. The apparent processing time T_A and the technical processing time T_p can be shockingly different when using a large number of technical neurons and transferring the information using conventional technical implementations.

The mutually blocking effect of computing and transfer operations is demonstrated in Fig. 2. The recent supercomputers Fugaku [21] and
Summit [22] provided their HPL performance for both 64-bit and 16-bit operands. The power consumption data [22] underpin the expectation that their payload performance should be four times higher when using four-times-shorter operands. However, their computing performance shows a more moderate enhancement because of the needed housekeeping: a factor of 3.01 for Summit and 3.42 for Fugaku. The presence of the non-payload activity (non-payload from the point of view of floating-point computation) acts in the same way as data transmission: it delays (and in this way blocks) the floating-point computation. Given that the inverse Minkowski transform enables us to handle them uniformly, the time delays due to data transmission (including physical delivery and data access time; in the spatial domain) and the time delays due to addressing, incrementing and branching (happening at the same place, in the time domain) combine in this simple way.

Figure 2 (panel: "HPL and HPCG timing, Summit and Fugaku implementation"): The degrading effect of the apparent transfer time T_A on the non-payload contribution, in defining HPL and HPCG efficiencies, for double and half precision floating operation modes. For visibility, a hypothetical efficiency ratio E_HPL/E_HPCG = 10 is assumed. The housekeeping (including transfer time and cache misses) dominates; the length of the operands has only marginal importance.
The slight difference in the contribution of the housekeeping (denoted by FP in the figure), however, results in a payload performance-limiting factor of about three between the two supercomputers. In terms of temporal operation, the higher FP contribution results in a higher apparent processing time, i.e., in lower performance, as discussed in detail in [12].

The temporal behavior also explains why different workloads (represented by the benchmarks HPCG and HPL, respectively) result in different payload performances: the "sparse" operation of the HPCG algorithm, and furthermore its need for more intense communication. Increasing the apparent transfer time degrades the system's payload performance. In the figure, for visibility, a hypothetical efficiency ratio of ten is assumed. The real (measured) efficiency ratio is about 100-250: the cache misses enormously increase the average data transfer time, which blocks the computation and in this way increases the apparent processing time. As estimated in [12], in the case of imitating neuromorphic operation on a conventional architecture, this ratio is expected to be above 1000. It was bitterly admitted that "artificial intelligence, ... it's the most disruptive workload from an I/O pattern perspective." Notice also that under AI workload, using half precision instead of double precision reduces the power consumption of the system but has only a marginal effect on its computing performance; see Fig. 2.
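Why four-times-shorter operands yield only about a three-fold gain can be sketched with a simple Amdahl-style estimate (the payload fractions below are assumed, illustrative values, not measured Summit or Fugaku data): only the payload arithmetic scales with the operand length, while the non-payload (housekeeping, transfer) time stays constant.

```python
def effective_speedup(payload_fraction: float, payload_speedup: float) -> float:
    """Amdahl-style estimate: only the payload part of the time is accelerated;
    the non-payload (housekeeping, transfer) part stays constant."""
    non_payload = 1.0 - payload_fraction
    return 1.0 / (non_payload + payload_fraction / payload_speedup)

# Going from FP64 to FP16 ideally speeds up the payload arithmetic 4x.
for payload_fraction in (0.90, 0.93, 0.95):
    s = effective_speedup(payload_fraction, 4.0)
    print(f"payload fraction {payload_fraction:.2f} -> overall speedup {s:.2f}x")
# With ~90-95% payload, the overall speedup is only about 3-3.5x, in the range
# of the 3.01x (Summit) and 3.42x (Fugaku) values quoted above.
```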
Given that the apparent processing time T_A defines the payload performance of the system, T_p and T_t must be properly concerted. Fig. 3 demonstrates how the apparent access time T_A of an on-chip cache memory changes if one changes the processing time T_p of the cache. In the figure, two different topologies and two different physical cache operating speeds are used. Two cores are in positions (-0.5,0,0) and (0.5,0,0), and two cache memories are at (0,0.5,0) and (0,1,0). The signal requesting access to the cache propagates along the dotted green vector (it changes both its time and position coordinates; recall that position coordinates are also mapped to time); the cache starts to operate only when the green dotted arrow hits its position. Until that time, the cache is idle, waiting. After its operating time (the vertical orange arrow), the result is delivered back to the requesting core. This time can also be projected back to the "position axes", and their sum (thin red arrow) can be calculated. Similarly, the requesting core is also "idle waiting" until the requested content arrives.

The physical delivery of the fetched value begins at the bottom of the lower thick green arrow, includes waiting (dashed thin green lines), and finishes at the head of the upper thick green vector; their distance defines the apparent cache access time, which, of course, is inversely proportional to the apparent cache access speed. Notice that the apparent processing time is a monotonic function of the physical processing speed, but because of the included "transmission times" due to the physical distance of the respective elements, the dependence is far from linear. The apparent cache speed increases either if the cache is physically closer to the requesting core or if the cache access time is shorter (or both). The apparent processing time (represented by the vertical green arrows) is only slightly affected by the physical speed of the cache memory (represented by the vertical orange arrows). This concrete example explains why the clever placement of its cache memories resulted in supercomputer Fugaku outperforming its predecessor by a factor of three. The transfer time T_t dominates the system's payload performance, so even decreasing T_p to zero could not lead to a significant performance enhancement.

Figure 3: Performance dependence of an on-chip cache memory, at different cache operating times, in the same topology. Cores at positions (-0.5,0) and (0.5,0) access on-chip cache memories at (0,0.5) and (0,1), respectively. Vertical orange arrows represent the physical cache operating time. The cache memories, from left to right, have physical access speed (on some arbitrary scale) 1 and 10, respectively. Vertical green arrows (from the bottom of the lower arrow to the top of the upper arrow) represent the apparent access time.
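The cache example of Fig. 3 can be sketched in the same temporal terms (the coordinates and operating times are the arbitrary units of the figure; this is an illustration of the argument, not a model of any concrete processor): the apparent access time is the round-trip transfer time plus the cache's own operating time, so speeding up the cache alone gives diminishing returns once the transfer term dominates.

```python
from math import hypot

def apparent_access_time(core_xy, cache_xy, cache_operating_time,
                         interaction_speed=1.0):
    """Round-trip transfer time between core and cache plus the cache's
    physical operating time (all in the arbitrary units of Fig. 3)."""
    distance = hypot(core_xy[0] - cache_xy[0], core_xy[1] - cache_xy[1])
    return 2.0 * distance / interaction_speed + cache_operating_time

core = (-0.5, 0.0)
for cache, name in (((0.0, 0.5), "near cache"), ((0.0, 1.0), "far cache")):
    for t_cache in (1.0, 0.1):   # physical cache access speed 1 and 10
        t = apparent_access_time(core, cache, t_cache)
        print(f"{name}, cache operating time {t_cache:>4}: apparent {t:5.2f}")
# Making the cache 10x faster shortens the apparent access time far less than
# 10x, while moving the cache closer helps regardless of the cache speed.
```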
Also, adding analog components to a digital processor has its price. Given that the digital processor cannot handle resources outside of its own world, one must call the OS for help. That help, however, is rather expensive in terms of execution time. The required context switching takes time on the order of executing 10^4 instructions [25, 26], which greatly increases the total execution time and makes the non-payload to payload ratio much worse. In terms of temporal analysis, the ratio R of the component times grows by orders of magnitude. In this way, the frequent context switches will dominate the payload performance of the system.

These cases seem to be very different. They, however, share at least the common feature that they change not only one parameter: they also change the non-payload to payload ratio that defines the system's payload efficiency.
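A rough sketch of the cost of calling the OS for a resource outside the processor's world (the instruction counts are assumed, order-of-magnitude values only): with context switches costing on the order of 10^4 instruction times per access, the non-payload part quickly dominates the payload efficiency.

```python
def payload_efficiency(payload_instr: float, context_switch_instr: float,
                       switches_per_operation: float = 2.0) -> float:
    """Fraction of the time spent on payload work when every access to a
    resource outside the processor's world costs OS context switches."""
    total = payload_instr + switches_per_operation * context_switch_instr
    return payload_instr / total

# Assumed values: a handful of payload instructions per neuron-like operation,
# ~1e4 instruction times per context switch (order of magnitude only).
for payload in (10, 100, 1000):
    eff = payload_efficiency(payload, 1e4)
    print(f"{payload:5d} payload instructions -> payload efficiency {eff:.4f}")
```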
3. Manifestations of the role of time in biology
The brain maintains different temporal characteristics of its neural events [27]. Moreover, we want to emphasize why adopting their timely behavior is necessary for authentic/successful technical imitations. The biology-mimicking computing principles seem to be cleared up to a level at which even some computing primitives could be proposed [28]. When considering the role of time in (all kinds of) biology-mimicking systems, some basic principles (and, consequently, those primitives) must be pinpointed or extended. As an example, the periodic oscillation (or, more precisely, the synchronized co-oscillation) is discussed, which – in addition to its role in the operation of the brain – has other roles in biology [29]. The phenomena seem to be well studied also by mathematical methods; see [30] and its cited references.

Given that the single neurons are embedded in oscillatory networks, they can (and shall) synchronize their temporal operation using their low-precision "local clock". Receiving a base frequency resets the "local clock" frequently, and the relative time can be accurate for a more extended operating period. Notice that neurons at different Minkowski distances "see" the local frequency at different Minkowski times, so they set their internal phase angle according to the Minkowski distance. That is, "at the same time" means different absolute times for different neurons, depending on their location: biology uses Minkowski four-coordinates. (Concerning Einstein's hypothetical experiment: the neuronal interactions have a speed in the range up to 10^2 m/s; this speed is the speed limit of receiving information by the "internal observers" (the neurons) only. The scientist, however, is an "external observer", having an information transfer speed in the 10^8 m/s range.)

The brain oscillates continuously at a vast range of frequencies (0.02-600 Hz) [31], maintaining its neurons' global function at various time scales. The frequency bands are generated by the cell assemblies' current behavior, representing their involvement in different computational processes. The phase has a unique role in stationary oscillations, as a kind of fifth coordinate. The phases of the oscillations at different frequencies are also relevant in keeping the operation of neurons synchronous. "A conduction delay of 5 ms could change interactions of two coupled oscillators at the upper end of the gamma frequency range (≈100 Hz) from constructive to destructive interference; delays smaller than 1 ms could change the phase by 36°, significantly affecting signal amplitude" [1].

The initially different phases are tuned to produce a sufficiently strong signal due to the proper superposition of the signals: the Minkowski time of the signals sent by the peer neurons is adjusted so that the signals arrive at the same Minkowski coordinate. The goal is to produce a maximum current signal amplitude (or integrated charge) at the target (at a given time-space coordinate!). Given that the contributing neurons have different time-space coordinates, their phases (the time coordinate) are indeed not necessarily the same. The spatial Minkowski coordinates are defined anatomically. However, as the neurons are located in a way that minimizes the needed energy, their Minkowski coordinates usually differ only marginally.
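The quoted delay figures can be reproduced with a one-line conversion from conduction delay to oscillation phase (a plain arithmetic illustration, using the upper-gamma frequency quoted above):

```python
def delay_to_phase_degrees(delay_s: float, frequency_hz: float) -> float:
    """Phase shift (degrees) that a conduction delay causes at a given
    oscillation frequency."""
    return 360.0 * delay_s * frequency_hz

for delay_ms in (5.0, 1.0, 0.5):
    phase = delay_to_phase_degrees(delay_ms * 1e-3, 100.0)  # ~upper gamma band
    print(f"{delay_ms:3.1f} ms delay at 100 Hz -> {phase:6.1f} degrees of phase")
# 5 ms is half a cycle (180 deg): constructive interference turns destructive;
# even a 1 ms delay shifts the phase by tens of degrees.
```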
Notice that when the phases of the synaptic inputs change, they affect when the neuron in question fires (due to the weighting, a different amount of charge is integrated with respect to time, and so the threshold potential is exceeded at a different time).

The spikes' conduction speed is determined by the axon's anatomical properties (diameter and the amount of myelin) and the type of the synapses (chemical or electrical). Neuronal communication means the transfer of information between distinct nodes. Although the neurons' impulses to muscles can travel at 100 m/s, the interaction speed between two neurons is found mainly in the 1 m/s range. Given that the brain's size is in the 10 cm range, the typical neural signal-delivery time falls in the tens-of-msec range, resulting in a maximum operating frequency of up to about 10 Hz, without synchronization. Hence, it is a challenge to deliver a spike from a distant neuron to its target, which also receives spikes from nearby peer neurons (a local assembly), at the right time. In more complex tasks, requiring even more extensive areas to cooperate, the problem to be solved becomes even more challenging.

The coordinated operation of neurons is brought about by synchrony, sustained by the oscillatory behavior of functionally interconnected networks. To gain efficacy, synchrony is established in brief temporal windows. However, their duration depends on membrane resistance, conductance, and other factors. In this periodically emerging time window, the response to the incoming stimuli is also influenced by the previously integrated inputs.

The brain uses temporal packages brought about by oscillatory cycles to overcome these difficulties, i.e., the information is transferred within well-defined time windows. To support this, the brain developed a unique synchronization method to enable the individual neurons (having different Minkowski coordinates) to cooperate in such a way. Given that the goal of their cooperation is that their collective effort can be synchronized on a remote set of neurons, interdependent but distinct cell assemblies are wired together by long-range neurons endowed with unique properties (thicker myelin sheaths), enabling the establishment of 0-lag synchrony between neuronal populations.

The first part of the task of synchronization is solved by issuing some fundamental frequencies (i.e., oscillations) to which the neurons can synchronize themselves. To transfer those "time base" frequency signals with the typical speed of the system would be too slow: it would not make it possible to distinguish which period of a higher-frequency signal a given value (sampled at a given Minkowski coordinate) belongs to. In biological systems, instead of the ionic current conduction, a pure electric conduction technology was developed: the Ranvier cells polarize each other's input with their output membrane (actually, the corresponding 'half' Ranvier cells sit in the sender and in the receiver neuron, respectively). However, this "charge cloning" works only if the connection line is not leaking, so these axons are thickly myelinated. The result is that, while the potential induced by polarization makes it look as if "the action potential jumps from one node to another" (generating local currents only), its speed is significantly (up to about 60 times) higher than that of the ionic conduction.
In other words, biology modulates the interaction speed in a way that permits signals from a distant neuron (at a larger space-only distance) to reach their targets in time, despite the fact that the signals they control arrive from neurons at a smaller space-only distance.

The receiving neuron could integrate this current in the same way as the input from the others: after the last "half Ranvier cell", the sender (and the path the signal traveled) is no longer known. However, it would be only one of the many (maybe thousands of) synaptic inputs, and because of the "voting", it would not have the ability to provide the needed "time base" signal. To reach the required goal, biology developed a method that also considers the "rate of rise" of the input synaptic current [32].
Exceeding that current threshold triggers spiking, independently of the state of the other synapses and of the action potential.
After spiking, the neuron is "reset", that is, its phase is set to zero. That means that, in addition to the voltage threshold for the neuron's action potential, one more threshold, for the synaptic currents, exists.

The first part of the discussion in this section provides a native explanation for the decades-old empirical rule: "neurons that fire together wire together." The oscillators must have the same time constant (RC value) to oscillate together. Using the self-frequency of the oscillators requires the minimum energy to operate their set; this is required to operate the brain with its outstanding efficiency. Given that the structures of the members of the assembly are anatomically identical, it is a plausible assumption that they have identical RC values.

To achieve the maximum effect on a remote neuron's synapses, the spikes emitted by the members of a neuronal assembly shall arrive at the same phase at the target neuron. In the assembly, the sender neurons can be "reset" by a "central clock", a base frequency of the brain. That is, the axons have a common source and a destination that branches to the individual neurons only at the very end of their path. Until that point, they usually "wire together." However, their Minkowski distances are slightly different, both from the source to one of the member neurons in the assembly, and from that neuron to the target neuron. The arrival of the synchronizing pulse (a base synchron spike, delivered using the "saltatoric" way of charge delivery), through its path, delivers the correct Minkowski distance for resetting the member neuron, and the neuron has no way (and no reason) to change it. The Minkowski distance for the second portion of the triggering can also be (slightly) different. To adjust their phase properly, to achieve the maximum contribution at the target, the participating member neurons must learn how to adjust it. As discussed, the same amount of synaptic current contributes differently to the action potential in the member neurons, as a function of the phase compared to that of the inputs from other neurons, so changing the weights (due to the feedback the member neurons receive) leads to a change in the discharge time of the spiking neuron. That is, the feedback is based on the phase at which the spike arrives at its destination. After receiving the feedback, the change is made in the time of firing at the sender. Again, to achieve a minimum of the consumed energy (a maximum of the superimposed charges) of this collective operation at the target, the post-synaptic spikes must arrive at the same local "phase" (relative to the spike produced by the target neuron). Given that their Minkowski coordinates differ only marginally, it looks as if the member neurons spike simultaneously, i.e., they "fire together."
Figure 4: A suggested modified RC circuit to consider saltatoric conduction
The case can best be understood from Figure 4, where a possible technical implementation of the saltatoric behavior is shown (although in this way only one synaptic input is modeled). A field-effect transistor is added to the usual resistor and capacitor that define the frequency of the oscillation in the circuit. As long as the synaptic input current does not exceed the critical value, there is no change in the simulated operation. When the input (synaptic) current, however, generates a voltage on the resistor that opens the transistor, the capacitor quickly discharges (through a very low limiting resistance), and a spike is produced. (In biology, a refractory period follows, during which the input charge and current are grounded; the technical neuron needs no relaxation, so the operating cycle can start immediately.) Anyhow, such a discharge synchronizes the "local clock" to the central time base, independently of its "local phase", and makes the operation of the neuron phase-locked to the time-base frequency (RC defines the frequency, although with low precision, and the saltatoric reset defines its phase, precisely). It is worth noticing that the network of neurons automatically uses the correct Minkowski time: neurons at different places do not necessarily have the same Minkowski-time value, although their functional grouping may cover this difference.
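A behavioral sketch of the circuit in Figure 4 may help: a plain leaky RC integrator extended with a synaptic-current threshold that plays the role of the FET (all parameter values are arbitrary, chosen only for illustration). A sufficiently large input current discharges the capacitor immediately, which both produces a spike and phase-locks the "local clock" to the synchronizing input, independently of the voltage threshold.

```python
def simulate_rc_neuron(input_current, dt=1e-4, R=1e7, C=1e-9,
                       v_threshold=0.02, i_threshold=2e-9):
    """Leaky RC integrator with an extra current threshold modeling the FET of
    Fig. 4: a sufficiently large synaptic current discharges the capacitor at
    once (the 'saltatoric' synchronizing reset), independently of the voltage
    threshold; otherwise the circuit integrates and leaks as a plain RC cell."""
    v, spikes = 0.0, []
    for step, i_syn in enumerate(input_current):
        if i_syn >= i_threshold:                 # 'FET opens': fast discharge + spike
            v = 0.0
            spikes.append(step * dt)
            continue
        v += dt * (i_syn / C - v / (R * C))      # ordinary leaky integration
        if v >= v_threshold:                     # ordinary voltage-threshold spike
            v = 0.0
            spikes.append(step * dt)
    return spikes

# A weak steady input plus one strong synchronizing pulse at 5 ms:
current = [0.5e-9] * 100                         # amperes, one value per 0.1 ms step
current[50] = 5e-9
print(simulate_rc_neuron(current))               # -> [0.005]: phase-locked reset
```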
From a simplified modeling point of view, learning means changing the weights of the input (synaptic) signals to optimize the operation for the given goal. Changing the weights, however, results in changes in the time of spiking. Given that the members of an assembly are anatomically identical, in this context only the phase is significant: efficiency requires that the spikes arrive at their destination at the same phase (in this context, "at the same time").
In models not considering the timing of spiking, the resulting 'phase' feedback is attributed to other input signals, and the effect is distributed among the other (non-temporal) parameters. Changing the weights of the neurons' signals, on the one side, means that the learning method attributes improper weights to all of its signals, while it does not consider their timing at all. On the other side, it also means that a portion of the generated synaptic input charge shall be grounded when only the weights are changed. If the learning method considers timing, the same operation of the neurons could be reached using less generated charge, i.e., using less energy.

Biology can modulate the conduction speed, at the price of making structural changes: the corresponding axons must be heavily myelinated. On the one side, making such a structural change is a prolonged process compared to synchronization. On the other side, when changing the timely behavior of neurons frequently, it may be less expensive (in terms of energy consumption) to accelerate the (produced) smaller amount of charge than to produce and deliver many more ions. This is why nature combines these two methods: the thickness of the myelin layer can be different and leads to different conduction speeds. The set of neurons used to solve a new task can be configured quickly via earthing (i.e., wasting) part of the produced charge, but when using this new experience over a longer time duration, it may be optimal to modify the structure via myelination, and in this way to reduce the energy needed for producing and transmitting ions in the long run. Also, "axon-specific adjustment of node of Ranvier length is potentially an energy-efficient and rapid mechanism for tuning the arrival time of information in the CNS" [33]: a model that is worth checking when studying short- and long-term learning. The self-reconfiguration ability of the brain is underpinned by mathematical modeling [34], too.

Anyhow, the arrival time of information plays a major role, but the technical neurons, representing neuronal states with voltage levels, miss all kinds of temporal information. Without handling the temporal information, vital for operating the biological systems correctly, it is impossible to produce accurate biology-mimicking simulations. Even the timestamps delivered in the spikes cannot include the phase: integrating in a time slot is not sensitive to the phase. In technical systems (including many-thread simulations on supercomputers), the "spatiotemporal" behavior is not considered at all.
In biology-mimicking systems (including the "liquid state machine"), the "spatiotemporal behavior" is handled in a way where time and place are separable. That is, they are not connected in the way proposed here.

Figure 5: Implementing neuronal communication in different technical approaches. A (the biological implementation): the parallel bus; B and C (the technical implementation): the shared serial bus, before and after reaching the communication "roofline" [35]. (The panels show an input layer, a hidden layer and an output layer, with per-neuron input, processing and output times, the total time, and the bus bandwidth; the time axis is not proportional.)
4. Technical implementations
The components of technical computing systems (including biology-mimicking neuromorphic ones) are connected through a set of wires, called a "bus". The bus is essentially the physical appearance of the "technical implementation" of communication, stemming from the Single Processor Approach [36], as illustrated in Fig. 5. The inset shows a simple neuromorphic use case: one input neuron and one output neuron communicate through a hidden layer comprising only two neurons. Fig. 5.A mostly shows the biological implementation: all neurons are directly wired to their partners, i.e., a system of "parallel buses" (axons) exists. Notice that the operating time also comprises two "non-payload" times (T_t): data input and data output, which coincide with the non-payload times of the other communication party. The diagram displays the logical and temporal dependencies of the neuronal functionality. The payload operation ("the computing") can only start after its data is delivered (by the, from this point of view, non-payload functionality: input-side communication), and the output communication can only begin when the computing has finished. Importantly, communication and calculation mutually block each other. Two important points that neuromorphic systems must mimic can be noticed immediately: i/ the communication time is an integral part of the total execution time, and ii/ the ability to communicate is a native functionality of the system. In such a parallel implementation, the performance of the system, measured as the resulting total time (processing + transmitting), scales linearly with increasing either the non-payload communication speed or the payload processing speed.

Fig. 5.B shows a technical implementation: a high-speed shared bus for communication. To the right of the grid, the activity that loads the bus at the given time is shown. A double arrow illustrates the communication bandwidth, the length of which is proportional to the number of packages the bus can deliver in a given time unit. We assume that the input neuron can send its information in a single message to the hidden layer; furthermore, the processing by the neurons in the hidden layer both starts and ends simultaneously. However, the neurons must compete for access to the bus, and only one of them can send its message immediately; the other(s) must wait until the bus gets released. The output neuron can only receive the message when the first neuron has completed its sending. Furthermore, the output neuron must first acquire the second message from the bus, and its processing can only begin after having both input arguments. This constraint results in sequential bus delays both during the non-payload processing in the hidden layer and during the payload processing in the output neuron. Adding one more neuron to the layer introduces one more delay.

At this point, two wrong solutions can be taken: either the second neuron must wait until the second input arrives (in biology, a spike also carries a synchronization signal and triggers its integration), or (in "technical neurons", using continuous levels rather than pulses, this synchronization facility is omitted) it changes its output continuously, as the inputs arrive and as its processing speed enables. In the latter case, however, until the second input arrives (and gets processed), the neuron provides an output signal differing from the one expected based on the mathematical dependence.
As discussed in detail in [11], this temporarily-possibly-wrong output signal is a known phenomenon in electronics, and such "glitches" are eliminated by using a "worst-case" delay for the output signal. However, including a serial bus in that computation would enormously prolong the needed "worst-case" delay.

Using the formalism introduced above, $T_t = 2 \cdot T_B + T_d + X$, i.e., the bus must be reached in time T_B (not only must the operand be delivered to the bus, but the sender must also wait for arbitration: the right to use the shared bus), twice, plus the physical delivery time T_d through the bus. The X denotes the "foreign contribution": if the bus is not dedicated to "neurons in this layer only", any other traffic also loads the bus; both messages from different layers and the general system messages may make processing slower (and add their contribution to faking the imitated biological effect).

Even if only one single neuron exists in the hidden layer, it must use the mechanisms of sharing the bus, case by case. The physical delivery to the bus takes more time than a transfer to a neighboring neuron (both the arbiter and the bus are in the cm distance range, meaning several nsec transfer times, while the direct transfer between connected gates may be in the psec range). If we have more neurons (such as a hidden layer) on the bus and they work in parallel, they must all wait for the bus. The high-speed bus is only very slightly loaded when only a couple of neurons are present. Its load increases linearly with the number of neurons in the hidden layer (or, maybe, with all neurons in the system). The temporal behavior of the bus, however, is different.

Under a biology-mimicking workload, the second neuron must wait for all its inputs originating in the hidden layer. If we have L neurons in the hidden layer, the transmission time of the neuron behind the hidden layer is $T_t = L \cdot 2 \cdot T_B + T_d + X$. This temporal behavior explains why "shallow networks with many neurons per layer ... scale worse than deep networks with less neurons" [37]: the physical bus delivery time T_d, as well as the processing time T_p, become marginal if the layer forces many arbitrations to reach the bus; the number of neurons in the hidden layer defines the transfer time (recall Figs. 1 and 2 for the consequences of increasing the transfer time). In deeper networks, the system sends its messages at different times in its different layers (and they may even have independent buses between the layers), although the shared bus persists in limiting the communication. Notice that there is no way to organize the message traffic: only one bus exists. (A small numeric sketch of this scaling is given later in this section.)

Figure 6: The operation of the sequential bus, in the time-space coordinate system. Near the axis t, the lack of vertical arrows signals "idle waiting" time.

Fig. 6 discusses, in terms of "temporal logic", the case depicted in the inset of Fig. 5 (where the same operation is discussed in conventional terms): why using high-speed buses for connecting modern computer components leads to very severe performance loss, especially when one attempts to imitate neuromorphic operation. The two neurons of the hidden layer are positioned at (-0.3,0) and (0.6,0). The bus is at position (0,0.5). The two neurons make their computation (green arrows at the positions of the neurons), then they want to tell their result to their fellow neurons. Unlike in biology, first they must get access to the shared bus (red arrows). The bus requests must reach the arbiter, which needs time, and so does the grant signal. The core at (-0.3,0) is closer to the bus, so its request is granted first. As soon as the grant signal reaches the requesting core, the bus operation is initiated, and the data starts to travel to the bus. As soon as it reaches the bus, it is forwarded at the high speed of the bus.
At that point, the other core's bus request is granted, and finally the computed result of the second neuron is bused.

At this point comes into the picture the role of the workload on the system: the two neurons in the hidden layer want to use the single shared bus, at the same time, for communication. As a consequence, the apparent processing time is several times higher than the physical processing time, and it increases linearly with the number of neurons in the hidden layer (and maybe also with the total number of neurons in the system, if a single high-speed bus is used).

The ratio of the time spent forwarding data on the high-speed bus gradually decreases as the system's size increases. In vast systems, especially when attempting to mimic a neuromorphic workload, the speed of the bus is getting marginal. Notice that the times shown in the figure are not proportional: the (temporal) distances between cores are in the several picoseconds range, while the bus (and the arbiter) are at a distance well above nanoseconds, so the actual temporal behavior (and the idle time stemming from it) is much worse than the figure suggests. "The idea of using the popular shared bus to implement the communication medium is no longer acceptable, mainly due to its high contention." [38] The extraordinary workload of AI makes it much harder to operate the systems.

When imitating biological processes, one needs to consider both the time at which the event can be "seen" in wet neurobiology (the biological time) and the time duration that the computer processor needs to deliver the result corresponding to the biological event (the computing time; the "wall-clock" time, reflecting computer operational details, is not considered here). The computing objects intending to imitate biological systems need to be aware of both time scales. As discussed above in connection with the serial bus, the technical implementation may introduce enormously low payload computing efficiency, and it considerably distorts the time relations between the computing and the data delivery times.

The passing of time is measured by counting some periodic events, such as clock periods in computing systems or spiking events in biological systems. In this event-based world, everything happening between those events happens "at the same time". However, the technical implementation (including measuring biological processes) may introduce another, unintended, granularity. The biological neurons perform analog integration; the technological implementations are prepared to perform "step-wise" digital integration. This step involves (mostly) losing the phase information. Furthermore, as detailed in [13], it introduces severe payload performance limits for neuronal operations.

During training, we start showing an input, and the system begins to work, using the initial values of its synaptic weights. Those weights may be randomized, may be set according to some presumption, or may correspond to the previous input data. The signals that the system sends are correct, but a receiver does not know the future: a signal must be physically delivered before it can be processed (even if the message envelope contains a time stamp). Before that time, the neurons (and their weights) work with not-yet-updated signals. The feedback should reflect the effect of that neuron, reflecting the change its output caused; without this, neurons receive feedback about "the effect of all fellow neurons, including me". Receiving a spike defines the beginning of the time of validity of the signal; "leaking" also defines its "expiration time".
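Before turning to the spiking-specific consequences, the shared-bus argument above can be made concrete with a small sketch (the timing values are assumed, illustrative orders of magnitude only): the transmission time seen behind a hidden layer of L neurons on one shared bus, T_t = L·2·T_B + T_d + X, grows linearly with L, while with the direct, parallel wiring of Fig. 5.A all messages travel concurrently.

```python
def shared_bus_transfer_time(L: int, T_B: float, T_d: float, X: float = 0.0) -> float:
    """Transmission time behind a hidden layer of L neurons on one shared bus:
    T_t = L * 2 * T_B + T_d + X (two bus/arbitration accesses per message,
    X is the 'foreign' traffic contribution)."""
    return L * 2.0 * T_B + T_d + X

def parallel_transfer_time(T_d_direct: float) -> float:
    """With dedicated point-to-point wiring, every message travels concurrently."""
    return T_d_direct

T_B = 10e-9    # assumed time to reach/arbitrate the bus (tens of ns range)
T_d = 1e-9     # assumed physical delivery time on the high-speed bus
for L in (2, 10, 100, 1000):
    t_bus = shared_bus_transfer_time(L, T_B, T_d)
    print(f"L = {L:4d}: shared bus {t_bus * 1e9:8.1f} ns,"
          f" parallel wiring {parallel_transfer_time(1e-9) * 1e9:3.1f} ns")
# The shared-bus time grows linearly with L, so the physical delivery time T_d
# (and the neurons' processing time) becomes marginal, as argued above.
```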
When using spiking networks, their temporal behavior is vital. In the example in [11], in the one-bit adder, the first AND gate has quite a short indefinite time, but the OR gate has a long one. Neuronal operations show a similar behavior concerning undefined states and weights, although they are more complex than a simple adder, and their operations take much longer. Essentially, their operation is an iteration in which the actors mostly use undefined input signals: initially they inevitably adjust their weights to false signals, and later to signals that arrive with a significant time delay. Not considering the temporal behavior leads to painfully slow and doubtful convergence.
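A toy iteration can illustrate the point (the update rule, rate, and delays are assumptions chosen only for illustration, not a model of any concrete network): the weight is corrected with an error signal computed several steps earlier, and the remaining error after a fixed number of steps grows with the feedback delay:

```python
# Toy illustration: a weight is corrected toward a target using an error signal
# computed 'delay' steps earlier. The dynamics and constants are assumptions
# for illustration only, not a model of any particular neural network.

def remaining_error(delay, steps=60, rate=0.4, target=1.0):
    w = 0.0
    history = [w]                                # past weight values, for stale feedback
    for t in range(steps):
        stale_w = history[max(0, t - delay)]     # feedback reflects an older state
        w += rate * (target - stale_w)           # correction based on stale information
        history.append(w)
    return abs(target - w)

for delay in (0, 2, 5, 10):
    print(f"feedback delay={delay:2d} steps  remaining error={remaining_error(delay):.4f}")
```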
The larger the system, the slower its convergence, and the higher the chance of "over-fitting". Computing the neuronal feedback in a faster way cannot help much, if at all. Delivering the feedback information also needs time and uses the same shared medium, with all its disadvantages. In biology, the "computing time" and the "communication time" are of the same order of magnitude; in the technical implementation, the communication time is very much longer than the computation. That is, the received feedback refers to a time (and to related state variables) that was valid a very long time ago. (It may also be worth re-discussing whether full or partial derivatives shall be used in "spatiotemporal" systems.) In excessive systems, to provide seemingly higher performance, some result/feedback events must be dropped because of their long queueing. Given that the feedback is computed from the results of the neuron that receives the feedback, the physical implementation of the computing system converts the logical dependence into a time dependence [11]. Because of this time sequence, feedback messages arrive at the neuron at a later physical time (even if at the same biological time, according to the time stamp they carry), so they stand at the end of the input message queue. Because of this, it can happen that messages "are dropped if the receiving process is busy over several delivery cycles" [39].
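The following sketch (queue capacity and the "busy" threshold are assumed values, not parameters of SpiNNaker or any other particular hardware) illustrates how a bounded input queue at a busy receiver loses feedback events:

```python
# Sketch of feedback loss through queueing: messages arrive every cycle, but the
# receiver processes them only occasionally; capacity and wait limits are assumptions.

from collections import deque

class NeuronInbox:
    def __init__(self, capacity=4, max_wait_cycles=3):
        self.queue = deque()
        self.capacity = capacity
        self.max_wait_cycles = max_wait_cycles   # "busy over several delivery cycles"
        self.dropped = 0

    def deliver(self, message, cycle):
        if len(self.queue) >= self.capacity:
            self.dropped += 1                    # receiver busy: feedback event lost
        else:
            self.queue.append((message, cycle))

    def process_one(self, cycle):
        if not self.queue:
            return None
        message, enqueued_at = self.queue.popleft()
        if cycle - enqueued_at > self.max_wait_cycles:
            self.dropped += 1                    # waited too long: treated as dropped
            return None
        return message                           # even then, it reflects an old state

inbox = NeuronInbox()
for cycle in range(10):
    inbox.deliver(f"feedback@{cycle}", cycle)    # feedback arrives every cycle...
    if cycle % 3 == 0:
        inbox.process_one(cycle)                 # ...but is processed only sometimes
print("dropped feedback events:", inbox.dropped)
```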
In vast systems, feedback in the learning process involves results based on undefined inputs; furthermore, the calculated (and maybe correct) feedback may be neglected. The statements above are underpinned by other experimental investigations, too. Introducing the spatio-temporal behavior of ANNs, even in its simple form, using separated (i.e., not connected in the way proposed here) time and space contributions to describe them, "factorizing the 3D convolutional filters into separate spatial and temporal components yields significant gains in accuracy" and efficiency of video analysis [40, 41]. Furthermore, another careful analysis also discovered that:
"Yet the task of training such networks remains a challenging optimization problem. Several related problems arise: very long training time (several weeks on modern computers, for some problems), the potential for over-fitting (whereby the learned function is too specific to the training data and generalizes poorly to unseen data), and more technically, the vanishing gradient problem." It is correctly stated that one of the reasons for these issues is the communication load, so it is a plausible attempt to reduce the number of communicating units (incidentally, the idea mimics the way biology works):
"The immediate effect of activating fewer units is that propagating information through the network will be faster, both at training and at test time." [42] However, confusing the biological and the computational times distorts their timing relations. This also means that the computed feedback, maybe based on undefined inputs, reaches the previous layer's neurons faster. A natural consequence (see their Fig. 5) is that "As $\lambda_s$ increases, the running time decreases, but so does performance." The role of time (mismatching) is confirmed directly, via investigations in the time domain: "The CNN models are more sensitive to low-frequency channels than high-frequency channels" [43]; the feedback can follow the changes only as a function of the speed of the changes compared to the speed of the feedback calculation.
5. Conclusions
To understand a neural network [44] means not only correct coding and using proper weights: considering the timing relations properly is at least as crucial, and the larger the system, the more crucial it is. The temporal behavior is the key to learning and development. Describing such systems mathematically and implementing them technically without considering their true time dependence means mimicking something different. Such approaches result in the situation that "any studies on processes like plasticity, learning, and development exhibited over hours and days of biological time are outside our reach" [39].

The temporal behavior is a crucial attribute of neuronal operation, both in biological and in technical computing systems. The technical implementations lack synchronization, so they confuse the biological and the computational time scales. When imitating biological computing operation in technical computing systems, the time relations may be drastically distorted, which leads to an unrealistic imitation of the biological behavior. In the first round, the proposed mathematical handling enables us to find the reasons for the inefficient/erroneous/slow operation of artificial neuronal systems; in the second round, it helps to prepare systems with much higher efficacy and (from a biological point of view) correct operation.
Acknowledgements
The authors thank Prof. Péter Somogyi for valuable comments on a previous version of the manuscript. Project no. 136496 has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the K funding scheme.
Conflict of interest statement
The authors declare that no competing interests exist.
References

[1] S. Pajevic, P. J. Basser, R. D. Fields, Role of myelin plasticity in oscillations and synchrony of neuronal activity, Neuroscience 276 (2014) 135–147. doi:10.1016/j.neuroscience.2013.11.007.

[2] W. Maass, T. Natschläger, H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14 (11) (2002) 2531–2560. doi:10.1162/089976602760407955.

[3] E. Iranmehr, S. B. Shouraki, M. M. Faraji, N. Bagheri, B. Linares-Barranco, Bio-inspired evolutionary model of spiking neural networks in ionic liquid space, Frontiers in Neuroscience 13 (2019) 1085. doi:10.3389/fnins.2019.01085.

[4] G. Buzsáki, The Brain from Inside Out, 1st Edition, Oxford University Press, 2019.

[5] A. Einstein, On the Electrodynamics of Moving Bodies, Annalen der Physik (in German) 10 (17) (1905) 891–921. doi:10.1002/andp.19053221004.

[6] A. Das, The special theory of relativity: a mathematical exposition, 1st Edition, Springer, 1993.

[7] J. P. Eckert, J. W. Mauchly, Automatic High-Speed Computing: A Progress Report on the EDVAC, Tech. Rep., Report of Work under Contract No. W-670-ORD-4926, Supplement No. 4, Moore School Library, University of Pennsylvania, Philadelphia (September 1945).

[8] M. D. Godfrey, D. F. Hendry, The Computer as von Neumann Planned It, IEEE Annals of the History of Computing 15 (1) (1993) 11–21.

[9] R. Waser (Ed.), Advanced Electronics Materials and Novel Devices, Nanoelectronics and Information Technology, Wiley, 2012.

[10] H. Simon, Why we need Exascale and why we won't get there by 2020, in: Exascale Radioastronomy Meeting, AASCTS2, 2014.

[11] J. Végh, Introducing Temporal Behavior to Computing Science, in: 2020 CSCE, Fundamentals of Computing Science, IEEE, 2020, Accepted FCS2930, in print. arXiv:2006.01128. URL https://arxiv.org/abs/2006.01128

[12] https://doi.org/10.1007/s11227-020-03210-4

[13] J. Végh, How Amdahl's Law limits performance of large artificial neural networks, Brain Informatics 6 (2019) 1–11. URL https://braininformatics.springeropen.com/articles/10.1186/s40708-019-0097-2/metrics

[14] J. Végh, How deep machine learning can be, in: A Closer Look at Convolutional Neural Networks, Nova, in press, 2020, pp. 141–169. URL https://arxiv.org/abs/2005.00872

[15] J. Végh, Which scaling rule applies to Artificial Neural Networks, in: Computational Science and Computational Intelligence (CSCE), The 22nd Int'l Conf on Artificial Intelligence (ICAI'20), IEEE, 2020, Accepted ICA2246, in print. arXiv:2005.08942. URL http://arxiv.org/abs/2005.08942

[16] P. W. Anderson, More Is Different, Science 177 (1972) 393–396. doi:10.1126/science.177.4047.393.

[17] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, K. Yelick, A View of the Parallel Computing Landscape, Comm. ACM 52 (10) (2009) 56–67.

[18] S(o)OS project, Resource-independent execution support on exa-scale systems (2010).

[19] Machine Intelligence Research Institute, Erik DeBenedictis on supercomputing (2014). URL https://intelligence.org/2014/04/03/erik-debenedictis/

[20] J. Végh, A. Tisan, The need for modern computing paradigm: Science applied to computing, in: International Conference on Computational Science and Computational Intelligence (CSCI), The 25th Int'l Conf on Parallel and Distributed Processing Techniques and Applications, IEEE, 2019, pp. 1523–1532. doi:10.1109/CSCI49370.2019.00283. URL http://arxiv.org/abs/1908.02651

[21] J. Dongarra, Report on the Fujitsu Fugaku System, Tech. Rep. ICL-UT-20-06, University of Tennessee, Department of Electrical Engineering and Computer Science (June 2020). URL http://bit.ly/fugaku-report

[22] A. Haidar, S. Tomov, J. Dongarra, N. J. Higham, Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed Up Mixed-precision Iterative Refinement Solvers, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC '18, IEEE Press, 2018, pp. 47:1–47:11.

[23] E. Chicca, G. Indiveri, A recipe for creating ideal hybrid memristive-CMOS neuromorphic processing systems, Applied Physics Letters 116 (12) (2020) 120501. doi:10.1063/1.5142089.

[24] Building brain-inspired computing, Nature Communications 10 (12) (2019) 4838. URL https://doi.org/10.1038/s41467-019-12521-x

[25] F. M. David, J. C. Carlyle, R. H. Campbell, Context Switch Overheads for Linux on ARM Platforms, in: Proceedings of the 2007 Workshop on Experimental Computer Science, ExpCS '07, ACM, New York, NY, USA, 2007. doi:10.1145/1281700.1281703.

[26] D. Tsafrir, The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops), in: Proceedings of the 2007 Workshop on Experimental Computer Science, ExpCS '07, ACM, New York, NY, USA, 2007, pp. 3–3.

[27] E. R. Kandel, J. H. Schwartz, T. M. Jessell, S. A. Siegelbaum, A. J. Hudspeth, Principles of Neural Science, 5th Edition, McGraw-Hill, 2013.

[28] J. D. Kendall, S. Kumar, The building blocks of a brain-inspired computer, Appl. Phys. Rev. 7 (2020) 011305. doi:10.1063/1.5129306.

[29] T. Williams, Phase coupling by synaptic spread in chains of coupled neuronal oscillators, Science 258 (5082) (1992) 662–665. doi:10.1126/science.1411575. URL https://science.sciencemag.org/content/258/5082/662

[30] L. M. Alonso, M. O. Magnasco, Complex spatiotemporal behavior and coherent excitations in critically-coupled chains of neural circuits, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (2018) 093102. doi:10.1063/1.5011766.

[31] G. Buzsáki, Rhythms of the Brain, 1st Edition, Oxford University Press, 2006.

[32] A. Losonczy, J. Magee, Integrative properties of radial oblique dendrites in hippocampal CA1 pyramidal neurons, Neuron 50 (2006) 291–307.

[33] I. L. Arancibia-Cárcamo, M. C. Ford, L. Cossell, K. Ishida, K. Tohyama, D. Attwell, Node of Ranvier length as a potential regulator of myelinated axon conduction speed, eLife (2017). doi:10.7554/eLife.23329. URL https://pubmed.ncbi.nlm.nih.gov/28130923/

[34] C. Kirst, C. D. Modes, M. O. Magnasco, Shifting attention to dynamics: Self-reconfiguration of neural networks, Current Opinion in Systems Biology 3 (2017) 132–140. doi:10.1016/j.coisb.2017.04.006.

[35] S. Williams, A. Waterman, D. Patterson, Roofline: An insightful visual performance model for multicore architectures, Commun. ACM 52 (4) (2009) 65–76.

[36] G. M. Amdahl, Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities, in: AFIPS Conference Proceedings, Vol. 30, 1967, pp. 483–485. doi:10.1145/1465482.1465560.

[37] J. Keuper, F.-J. Pfreundt, Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability, in: 2nd Workshop on Machine Learning in HPC Environments (MLHPC), IEEE, 2016, pp. 1469–1476. doi:10.1109/MLHPC.2016.006.

[38] L. de Macedo Mourelle, N. Nedjah, F. G. Pessanha, Reconfigurable and Adaptive Computing: Theory and Applications, CRC Press, 2016, Ch. 5: Interprocess Communication via Crossbar for Shared Memory Systems-on-chip.

[39] S. J. van Albada, A. G. Rowley, J. Senk, M. Hopkins, M. Schmidt, A. B. Stokes, D. R. Lester, M. Diesmann, S. B. Furber, Performance Comparison of the Digital Neuromorphic Hardware SpiNNaker and the Neural Network Simulation Software NEST for a Full-Scale Cortical Microcircuit Model, Frontiers in Neuroscience 12 (2018) 291.

[40] S. Xie, C. Sun, J. Huang, Z. Tu, K. Murphy, Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham, 2018, pp. 318–335.

[41] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, M. Paluri, A closer look at spatiotemporal convolutions for action recognition, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6450–6459.

[42] E. Bengio, P.-L. Bacon, J. Pineau, D. Precup, Conditional Computation in Neural Networks for faster models, in: ICLR'16, 2016. arXiv:1511.06297. URL https://arxiv.org/pdf/1511.06297