Algorithmic Complexities in Backpropagation and Tropical Neural Networks
Özgür Ceyhan [email protected]
January 5, 2021
Abstract
In this note, we propose a novel technique to reduce the algorithmic complexity of neural network training by using matrices of tropical real numbers instead of matrices of real numbers. Since tropical arithmetic replaces multiplication with addition, and addition with max, we theoretically achieve constant factors in the time complexity of the training phase that are better by orders of magnitude.

The fact that we can replace the field of real numbers with the tropical semiring of real numbers and yet achieve the same classification results via neural networks comes from deep results in topology and analysis, which we verify in this note. We then explore artificial neural networks in terms of tropical arithmetic and tropical algebraic geometry, and introduce multi-layered tropical neural networks as universal approximators. After giving a tropical reformulation of the backpropagation algorithm, we verify that its algorithmic complexity is substantially lower than that of the usual backpropagation, as tropical arithmetic is free of the complexity of ordinary multiplication.
1 Introduction

The main strategies for building capable and more efficient neural networks can be summarised as:

(I) developing and manufacturing more capable hardware;
(II) designing smaller and more robust versions of neural networks that realise the same tasks;
(III) reducing the computational complexity of learning algorithms without changing the structure of neural networks or hardware.

Approach (I) is an industrial design and manufacturing challenge. Approach (II) is essentially the subject of network pruning. This note focuses only on a theoretical approach to (III), based on tropical arithmetic and geometry.

The main training method for multi-layered neural networks is the backpropagation algorithm and its variations (see, for instance, [Ru16] for variants of backpropagation and their properties). Backpropagation is essentially a recursive gradient descent technique that works on large matrices. However, the sizes of the matrices determined by the initial parameters of neural networks are enormous: state-of-the-art applications may have very large numbers of neurons and, therefore, as many parameters as the pairings of these neurons; see [DBBNG, §1.2.4]. Many neural network implementations require adequately large computational resources when the scale of computation is that big; see [DBBNG] and references therein. Naturally, since the required computational resources are large, any reduction in the algorithmic complexity of elementary operations provides substantial advantages. To this end, we would like to eliminate the basic arithmetic operation of multiplication, which is algorithmically more complex than addition, subtraction and $\min/\max$, without changing the nature of the classification problem that we wish to realise with a neural network.

We propose to reach our goal by using tropical arithmetic and tropical geometry. While tropicalisation is a relatively new concept, its core idea has been used in engineering for decades in areas such as linear control theory and combinatorial optimisation. The effective use of tropicalisation in mathematics goes back to Viro in the 80's, where he constructed real algebraic varieties with prescribed topology to address Hilbert's 16th and related problems [Vi06]. Further studies of Viro's method revealed that the tropical semiring, an algebraic structure whose arithmetic is devoid of multiplication, is the key algebraic structure behind Viro's results [Vi01]. In this note we claim that tropical geometry also provides a suitable setup for constructing a backpropagation algorithm with substantially better complexity.

After briefly describing the sources of the algorithmic complexities in backpropagation techniques in §2, and introducing the basic notions of tropical arithmetic and tropical geometry in §3.1, we first define the tropical limit of the rectified linear unit (ReLU) in §3.2, and explore multilayered feedforward neural networks using this tropical unit in §3.3. We show that these tropical neural networks have the properties of a universal approximator, as do deep ReLU neural networks. The last section, §4, focuses on the topological realisation of classification problems via backpropagation based on the tropical semiring. In §4.1, we first reformulate the classification problem as the topological problem of exploring the connected components of the complement of the zero locus of a function approximated by a neural network.
Finally, we introduce the tropicalisation of the backpropagation algorithm in §4.2, which essentially approximates the original classification problem. We conclude that, while we achieve an algorithm which approximates the original problem with less algorithmic complexity, it does not require substantial recoding, only the replacement of linear algebra packages with their tropical versions.

This note essentially summarises the results presented by the author at the Séminaire "Fables Géométriques" at the University of Geneva on December 9, 2016 (unige.ch/math/tggroup/doku.php?id=fables). The paper "Tropical geometry of neural networks" by Zhang et al., which appeared in ICML 2018, was not available to the author when this paper was written. This note is kept as it was initially submitted and no new results have been incorporated.
2 Algorithmic complexities in backpropagation

Assume we have a multi-layered neural network with the weight matrix $\mathbf{W}^{(k)} = \big[ w^{(k)}_{ij} \big]$ connecting the layers $k-1$ and $k$. The backpropagation algorithm aims at minimising a predetermined error function $E$ by an iterative process of gradient descent. In this algorithm, one calculates the gradient matrices
$$(\nabla E)^{(k)} = \left[ \frac{\partial E}{\partial w^{(k)}_{ij}} \right], \qquad k = 1, \dots, l,$$
for the training data, and adjusts the weight matrix $\mathbf{W}^{(k)}$ by adding a correction term $\Delta \mathbf{W}^{(k)} = -\epsilon \, (\nabla E)^{(k)}$ in order to minimise the error function iteratively. The backpropagated error on the $k$-th layer is computed via an iterative matrix multiplication
$$\Delta \mathbf{W}^{(k)} = \mathbf{D}^{(k)} \mathbf{W}^{(k+1)} \cdots \mathbf{D}^{(l)} \mathbf{W}^{(l+1)} \mathbf{e} \tag{1}$$
where, for each layer $k$, the matrix $\mathbf{D}^{(k)}$ is the diagonal matrix composed of the derivatives of the activation function with respect to its arguments, and the vector $\mathbf{e}$ contains the derivatives of the output errors with respect to the arguments of the last layer. For details, see for instance [Ro96, §7.3.3] or [DBBNG, §6.5].

The algorithmic complexity of the backpropagation algorithm as presented in (1) has essentially two layers:

(i) the complexity of calculating the matrix product;
(ii) the complexity of the arithmetic operations involved in each step of the matrix multiplications.

As we mentioned in the introduction, one can design smaller and more robust neural networks performing the same task, but this requires pruning techniques; for various network pruning techniques, see for example [HMD15, LKDSG, TF97, YCS16]. The ordinary multiplication algorithm for matrices of size $n \times m$ and $m \times p$ has algorithmic complexity $O(nmp)$. Any simplification in the structure of the network decreases the algorithmic complexity, as the sizes of the resulting matrices decrease. Even though there are matrix multiplication algorithms with better asymptotic complexity (see for instance [Ga12]), they are generally impractical due to the large constant factors in their running times. It is also not clear whether these improved matrix multiplication algorithms are well suited to GPUs.

In this note, we propose to focus on the more subtle source of complexity, (ii): the arithmetic operations. In algebraic terms, we propose replacing the base field of real numbers and the corresponding arithmetic operations with the tropical semiring of reals and its own simpler arithmetic operations. Since we only swap the field of real numbers for the tropical semiring of real numbers, any reduction in the algorithmic complexity of arithmetic operations does not require any structural changes in the backpropagation algorithm; we discuss these aspects in §4.2. Thus, such a swap requires minimal adaptation of existing code bases, i.e., swapping the classical linear algebra package with a tropical linear algebra package.
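To make the chain of matrix products in (1) concrete, the following is a minimal sketch of how the backpropagated correction can be accumulated by repeated matrix-vector products, combining (1) with the update rule $\Delta \mathbf{W}^{(k)} = -\epsilon (\nabla E)^{(k)}$. The layer width, the random test data, and the function name are illustrative assumptions, not part of the algorithm's specification.

```python
import numpy as np

def backprop_correction(Ds, Ws, e, eps):
    """Evaluate -eps * D^(k) W^(k+1) ... D^(l) W^(l+1) e, the product of (1)
    scaled by the learning rate.

    Ds  -- diagonal matrices D^(k), ..., D^(l) of activation derivatives
    Ws  -- weight matrices W^(k+1), ..., W^(l+1)
    e   -- derivatives of the output errors at the last layer
    eps -- learning rate
    """
    delta = e
    # Collapse the product right-to-left, so each step is a matrix-vector product.
    for D, W in zip(reversed(Ds), reversed(Ws)):
        delta = D @ (W @ delta)
    return -eps * delta

# Minimal usage on a toy network with three-unit layers.
rng = np.random.default_rng(0)
Ds = [np.diag(rng.random(3)) for _ in range(2)]
Ws = [rng.standard_normal((3, 3)) for _ in range(2)]
e = rng.standard_normal(3)
print(backprop_correction(Ds, Ws, e, eps=0.1))
```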
Remark 2.2.1. The approximation of the activation function is also a notable source of computational complexity, as it plays a role in (1) through the entries of the matrices $\mathbf{D}^{(k)}$. Most activation functions are intentionally non-linear, and their evaluation requires a certain precision. However, tropicalisations of activation functions, and especially of their approximations, are computationally less complex, as they are realised in terms of piecewise linear functions; see §3.2 below.

min/max, addition and multiplication

There are substantial differences in the (time) complexities of the different elementary binary operations. The number of operations needed to sum two $n$-bit numbers via the schoolbook addition algorithm is $O(n)$. Similarly, the upper bound on the number of operations needed to decide the maximum (or the minimum) of two $n$-bit numbers is $O(n)$. More importantly, the average-case complexity of $\max$ (and $\min$) is $O(1)$.

By contrast, schoolbook multiplication for the same size is $O(n^2)$. While there are other multiplication algorithms with better algorithmic complexity, such as the Karatsuba algorithm of order $O(n^{\log_2 3})$ [Ka95], they all have complexity $O(n^\lambda)$ with $\lambda > 1$. It is important to note that these lower algorithmic complexities are achieved only asymptotically in most cases, and they are usually implemented effectively only for integers. Moreover, one may argue that the Kolmogorov complexities of these asymptotically better algorithms are not better, as they are significantly more difficult to implement.
The actual execution time of a specific piece of code depends on numerous factors such as the processor speed, the instruction set, disk speed, and the compiler used. An old rule of thumb in designing numerical experiments dictates avoiding multiplications and divisions in simulations in favour of additions and subtractions in order to improve the actual execution time. This may heuristically seem redundant on modern processors, as they have drastically closed the time-cost gap between addition and multiplication. However, one can still say the following about the algorithmic complexity of arithmetic operations (see, for instance, http://nicolas.limare.net/pro/notes/2014/12/12_arit_speed/ and https://lemire.me/blog/2010/07/19/is-multiplication-slower-than-addition/):

- Integer sums and $\max/\min$ take the same amount of time;
- Floating-point operations are slower than integer operations of the same size;
- Floating-point multiplications are slower than sums.

Energy consumption does not favour multiplications over summations either: the energy required for a multiplication is always considerably higher than for a summation [Hor14, pg. 32]. Even if we disregard the complexities of CPU designs, both the execution time and the energy consumption criteria suggest that reducing algorithmic complexity by eliminating multiplications as much as possible provides considerable benefits.
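As a sanity check of these claims on a particular machine, one can time the three elementary operations on large arrays. This is a rough benchmarking sketch; the absolute numbers depend entirely on the hardware and the NumPy build, so it probes the gap rather than measuring a universal constant.

```python
import timeit
import numpy as np

# Two large operand arrays; vectorised so that loop overhead does not dominate.
a = np.random.default_rng(1).random(10**7)
b = np.random.default_rng(2).random(10**7)

for name, stmt in [("add", "a + b"),
                   ("mul", "a * b"),
                   ("max", "np.maximum(a, b)")]:
    t = timeit.timeit(stmt, number=20, globals={"a": a, "b": b, "np": np})
    print(f"{name}: {t:.3f} s for 20 passes")
```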
3 Tropical arithmetic and tropical neural networks

In this section, we introduce the basic notions of tropical arithmetic and tropical geometry, then use them to introduce tropical neural networks as universal approximators.
3.1 The tropical semiring

The tropical semiring is the limit $\hbar \to +\infty$ of the family $S_\hbar := (\mathbb{T}, \oplus_\hbar, \otimes_\hbar)$, where $\mathbb{T} := \mathbb{R} \cup \{-\infty\}$, with the following arithmetic operations: for $a, b \in \mathbb{T}$,
$$a \oplus_\hbar b := \begin{cases} \log_\hbar(\hbar^a + \hbar^b) & \text{when } \hbar \in (e, +\infty), \\ \max\{a, b\} & \text{when } \hbar \to +\infty, \end{cases} \tag{2}$$
$$a \otimes_\hbar b := \log_\hbar(\hbar^a \cdot \hbar^b) = a + b. \tag{3}$$
The $S_\hbar$ form a semiring (a ring without additive inverses) and admit a semiring isomorphism $D_\hbar : (\mathbb{R}_{\geq 0}, +, \times) \to S_\hbar$, $x \mapsto \log_\hbar(x)$, for any finite value of $\hbar$; see [IMS, Vi01].

The family $S_\hbar$ in (2) relates the ordinary addition and multiplication operations on the set of real numbers to tropical arithmetic in the limit. This limiting process is also called the Maslov dequantization [IMS, Vi01]. (The parameter $\hbar$ is not just reminiscent of the Planck constant: the tropical limit $\hbar \to \infty$ is essentially the quasi-classical, i.e., zero temperature, limit of a certain model in quantum mechanics.) The tropical limit $S_\infty$ admits a tropical division; however, subtraction is impossible due to the idempotency of $\oplus_\infty$, i.e., $x \oplus_\infty x = \max\{x, x\} = x$. The role of the additive zero is played by $-\infty$, and the multiplicative unit becomes $0$.
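A minimal sketch of the family $S_\hbar$ follows; it shows the deformed sum $\oplus_\hbar$ converging to $\max$ as $\hbar$ grows, which is the Maslov dequantization in miniature. The function names are illustrative, and the finite-$\hbar$ branch will overflow for very large arguments, so treat it as a demonstration rather than a numerically robust implementation.

```python
import math

def oplus(a, b, hbar=None):
    """Tropical sum (2): log_hbar(hbar^a + hbar^b) for finite hbar, max{a, b} in the limit."""
    if hbar is None:  # the tropical limit hbar -> +infinity
        return max(a, b)
    return math.log(hbar**a + hbar**b, hbar)

def otimes(a, b, hbar=None):
    """Tropical product (3): log_hbar(hbar^a * hbar^b) = a + b for every hbar."""
    return a + b

# The deformed sum approaches max as hbar grows:
for hbar in (math.e, 10.0, 1e6):
    print(f"hbar = {hbar:g}: 1 (+) 2 = {oplus(1.0, 2.0, hbar):.4f}")
print("limit:", oplus(1.0, 2.0))  # max{1, 2} = 2
```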
A tropical polynomial is a tropical sum of tropical monomials, i.e., the image of a polynomial of the form
$$P(\mathbf{x}) = \sum_{(j_1, \dots, j_n) \in V} a_{j_1, \dots, j_n} \, x_1^{j_1} \cdots x_n^{j_n}, \qquad \mathbf{x} := (x_1, \dots, x_n),$$
which is evaluated in the following form in the tropical semiring:
$$P^{tr}(\mathbf{x}) := D_\infty(P(\mathbf{x})) = \bigoplus_{V} \big( a_{j_1, \dots, j_n} \otimes_\infty x_1^{j_1} \otimes_\infty \cdots \otimes_\infty x_n^{j_n} \big) = \max_{(j_1, \dots, j_n) \in V} \Big\{ a_{j_1, \dots, j_n} + \sum_k j_k x_k \Big\} \tag{4}$$
where $V \subset \mathbb{Z}^n$ is a finite set of points with non-negative coordinates and the coefficients $a$ are tropical numbers.

The zero set of a tropical polynomial $P^{tr}$ is the set of tropical vectors $\mathbf{x}$ for which either $P^{tr}(\mathbf{x}) = -\infty$, or there exists a pair $i \neq j$ in $V$ such that
$$a_{i_1, \dots, i_n} \otimes_\infty x_1^{i_1} \otimes_\infty \cdots \otimes_\infty x_n^{i_n} = a_{j_1, \dots, j_n} \otimes_\infty x_1^{j_1} \otimes_\infty \cdots \otimes_\infty x_n^{j_n}.$$
Therefore, one can picture the tropical zero set as the union of intersections of hyperplanes defined by the tropical monomials. In other words, the tropical zero set defined by such a polynomial is given by the corner locus, that is, where the tropical polynomial (4) fails to be locally affine-linear.
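Evaluating (4) and testing membership in the tropical zero set then amount to taking a maximum of affine forms and checking whether that maximum is attained at least twice. The following sketch does exactly this, with a hypothetical dictionary encoding of the exponent set $V$ and a tolerance parameter for the floating-point comparison.

```python
def eval_tropical_poly(coeffs, x):
    """Evaluate (4): the max over monomials of  a_J + sum_k J_k * x_k.

    coeffs -- dict mapping exponent tuples J in V to tropical coefficients a_J
    x      -- the point, a tuple of tropical numbers
    """
    return max(a + sum(j * xk for j, xk in zip(J, x)) for J, a in coeffs.items())

def on_tropical_zero_set(coeffs, x, tol=1e-9):
    """x is a tropical zero iff the max in (4) is attained by at least two monomials."""
    values = sorted((a + sum(j * xk for j, xk in zip(J, x))
                     for J, a in coeffs.items()), reverse=True)
    return len(values) >= 2 and abs(values[0] - values[1]) < tol

# The tropical line max{x, y, 0}: its zero set is the corner locus of the graph.
line = {(1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
print(eval_tropical_poly(line, (0.0, 0.0)))    # 0.0, attained by all three monomials
print(on_tropical_zero_set(line, (0.0, 0.0)))  # True  -- a corner point
print(on_tropical_zero_set(line, (1.0, 0.0)))  # False -- the max is attained only by x
```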
3.2 The tropical rectified linear unit

In order to describe the tropical degeneration of the rectified linear unit (ReLU)
$$\sigma(\mathbf{x}) = \max\Big( 0, \; b + \sum_{i=1}^{n} a_i x_i \Big), \qquad \mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^n, \tag{5}$$
we consider the log-log plot of this activation function as a family with respect to a parameter $\hbar \in [e, \infty)$. The transition to log paper corresponds to the change of coordinates
$$v_\hbar = \log_\hbar(y), \qquad u_i = \log_\hbar(x_i).$$
Then we simply obtain
$$v_\hbar = \log_\hbar\Big( \sum_{i=1}^{n} \hbar^{\alpha_i} \hbar^{u_i} + \hbar^{\beta} \Big), \qquad \text{where } b = \hbar^\beta \text{ and } a_i = \hbar^{\alpha_i}, \; i = 1, \dots, n.$$
In its domain, the tropical limit $\hbar \to \infty$ of this log-log graph of the ReLU becomes
$$\sigma^{tr} := \lim_{\hbar \to \infty} v_\hbar = \max\big\{ \beta, \max_{i=1,\dots,n} \{ \alpha_i + u_i \} \big\}. \tag{6}$$
In other words, the tropical degeneration of the ReLU is another ReLU in an appropriately defined domain; renaming the log-log coordinates back to $(x_i, a_i, b)$, we write $\sigma^{tr}(\mathbf{x}) = \max\{ b, \max_i \{ a_i + x_i \} \}$.

3.3 Tropical neural networks as universal approximators

Multi-layered neural networks are used for approximating an unknown function described by a sample of points in a large affine space $\mathbb{R}^n$, called a data set. The theoretical underpinnings of such approximations go as far back as Kolmogorov [Sp64, Cy89, H91]. We know that, for a given bounded and monotonically increasing continuous function $\sigma$ and a compact domain $\Omega \subseteq \mathbb{R}^n$, any continuous function $f : \Omega \to \mathbb{R}$ can be approximated by finite linear sums of the form
$$F(x) = \sum_{i=1}^{N} \beta_i \, \sigma\Big( b_i + \sum_{j=1}^{n} a_{ij} x_j \Big). \tag{7}$$
As a result, neural networks with various activation functions, such as the logistic function, arctan, tanh, SoftPlus, etc., are often regarded as universal approximators of continuous functions [Ro96, DBBNG]. We view neural networks as a concrete computational manifestation of this Universal Approximation Theorem, where $\sigma$ plays the role of the activation function in a neural network. Our focus in this paper is to develop better computational methods for achieving such approximations.

The rectified linear unit (ReLU) defined in (5) also lies in this class of functions that can be used for approximating continuous and piecewise smooth functions on any compact domain in $\mathbb{R}^n$. As we observed in (6), the ReLU can also be represented by the tropical unit on compact domains. Thus, we can simply state that the ReLU and the tropical unit $\sigma^{tr}$ in (6) are equivalent from the perspective of approximation theory:
Proposition 3.5.1. Multilayered feedforward neural networks using the tropical unit $\sigma^{tr}$ in (6) as the activation function can give arbitrarily close approximations to any continuous and piecewise smooth function on any compact domain in $\mathbb{R}^n$.

These approximations work as effectively as for deep ReLU networks; for details, see [PV17, Ya16].
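Since $\sigma^{tr}$ is just a maximum of affine forms, a tropical unit can be evaluated with comparisons and additions only. Below is a minimal sketch juxtaposing the tropical unit of (6) with the ordinary ReLU of (5); the two agree only after the log-log change of coordinates of §3.2, so the code compares their forms, not their values, and the function names are illustrative.

```python
def sigma_tr(x, a, b):
    """Tropical unit (6): max{ b, max_i { a_i + x_i } } -- no multiplications at all."""
    return max(b, max(ai + xi for ai, xi in zip(a, x)))

def relu(x, a, b):
    """Ordinary ReLU (5): max(0, b + sum_i a_i * x_i) -- one multiplication per input."""
    return max(0.0, b + sum(ai * xi for ai, xi in zip(a, x)))

x = (1.0, -2.0)
print(sigma_tr(x, a=(0.5, 0.0), b=0.0))  # max{0, 1.5, -2} = 1.5
print(relu(x, a=(0.5, 0.0), b=0.0))      # max(0, 0.5) = 0.5
```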
4 Topological realisation of classification problems via tropical backpropagation

4.1 Classification as a topological problem

Consider a smooth map $f : \mathbb{R}^n_+ \to \mathbb{R}$ defined over the cone $\mathbb{R}^n_+ = (0, \infty)^n$, and consider the zero locus $Z_f := \{ x \in \mathbb{R}^n_+ \mid f(x) = 0 \}$ of $f$. Let us define a map
$$F : \mathbb{R}^n_+ \setminus Z_f \to H_0(\mathbb{R}^n_+ \setminus Z_f)$$
from the complement of $Z_f$ to the set of connected components of $\mathbb{R}^n_+ \setminus Z_f$. The map $F$ simply sends each element $x \in \mathbb{R}^n_+ \setminus Z_f$ to its homology class $[x] \in H_0(\mathbb{R}^n_+ \setminus Z_f)$, i.e., the connected component of $\mathbb{R}^n_+ \setminus Z_f$ containing $x$. For the basics of (co)homology theory, see [BT95].

Now, given an arbitrary finite data set (or a finite set of compact subsets) $\Omega$ in $\mathbb{R}^n_+$, we can reformulate a classification problem
$$N : \Omega \to \{1, \dots, k\} \tag{8}$$
as the problem of finding a smooth map $f$ satisfying the following property: for $x, y \in \Omega$,
$$N(x) = N(y) \iff F(x) = F(y).$$
Clearly, this reformulation requires that $\mathbb{R}^n_+ \setminus Z_f$ has at least $k$ connected components, so that $F$ can realise the same classification problem (8).
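One way to make the map $F$ concrete in low dimensions is to discretise the cone, remove (an approximation of) the zero locus, and label the remaining grid points by connected component. The sketch below does this for $n = 2$, assuming SciPy's ndimage.label for the flood fill; the grid bounds, the resolution, and the example function $f(x, y) = xy - 1$ are all illustrative choices.

```python
import numpy as np
from scipy import ndimage

def component_labels(f, lo=0.1, hi=3.0, res=200):
    """Label grid points of (0, inf)^2 by the connected component of {f != 0}
    containing them -- a discrete stand-in for the map F of Section 4.1."""
    xs = np.linspace(lo, hi, res)
    X, Y = np.meshgrid(xs, xs)
    signs = np.sign(f(X, Y))
    pos, n_pos = ndimage.label(signs > 0)   # components on the positive side of Z_f
    neg, _ = ndimage.label(signs < 0)       # components on the negative side
    labels = pos.copy()
    labels[signs < 0] = neg[signs < 0] + n_pos
    return labels

# The hyperbola xy = 1 cuts the cone into two components, so two labels appear.
lab = component_labels(lambda x, y: x * y - 1.0)
print(np.unique(lab[lab > 0]))  # -> [1 2]
```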
4.2 Tropicalization of the backpropagation algorithm

Let $f$ be a function which realises the classification problem in (8), and let $\{ f_i \mid i = 1, \dots, \infty \}$ be a sequence of deep ReLU networks coming from the backpropagation algorithm in (1), with
$$\lim_{k \to \infty} f_k = f.$$
Each function $f_k$ is piecewise linear and continuous, as it is realised by a multilayered ReLU network; see Proposition 3.5.1.

The corner locus of each $f_k$ is mapped into the corner locus of the tropical limit $f^{tr}_k = \lim_{\hbar \to \infty} f_k$. In addition to the tropical images of these existing corner loci, the tropical limit of $f_k$ forms an additional corner locus, which is the tropical limit of the zero locus $Z_{f_k} = \{ f_k = 0 \} \subset \mathbb{R}^n_+$ under tropical degeneration. This new corner locus $Z^{tr}_{f_k}$ is the tropical zero set of $f^{tr}_k$ defined by $\lim_{\hbar \to \infty} f_k$.

Lemma 4.2.1. The zero locus $Z_{f_k}$ of $f_k$ is homeomorphic to the tropical set $Z^{tr}_{f_k}$.

This statement follows from the fact that the map
$$\mathrm{Log}_\hbar : \mathbb{R}^n_+ \to \mathbb{R}^n : (x_1, \dots, x_n) \mapsto (\log_\hbar(x_1), \dots, \log_\hbar(x_n)) \tag{9}$$
is a diffeomorphism for all $\hbar < \infty$. Then the image $\mathrm{Log}_\hbar(Z_{f_k})$ is homeomorphic to $Z_{f_k}$ for any finite $\hbar$. For the tropical limit $\hbar \to \infty$, we use the fact that $\lim_{\hbar \to \infty} \frac{d}{d\hbar} \mathrm{Log}_\hbar = 0$, so that $Z^{tr}_{f_k}$ is homeomorphic to $\mathrm{Log}_\hbar(Z_{f_k})$ for sufficiently large $\hbar$.

We note that this statement is in fact a special case of Viro's theorem [Vi06]. Viro observed that tropical degeneration preserves the topology of real algebraic varieties. He developed a method, known as Viro patchworking, that combinatorially constructs a real algebraic variety with any prescribed topology. We believe that Viro's method in its general form can also be used directly in the classification problems of (8).
Corollary 4.2.2. The classification $N : \Omega \to \{1, \dots, k\}$ in (8) can be realised by the tropical limit $f^{tr} = \lim_{k \to \infty} f^{tr}_k$, as the tropical set that $f^{tr}$ defines is homeomorphic to $Z_f$.

We now apply the coordinate change $\mathrm{Log}_\hbar$ in (9) to each step of the backpropagation algorithm given in (1) to define the tropical version of the backpropagation algorithm. This is done by taking the tropical image of each entry of the matrices, and then replacing matrix addition and multiplication by their respective tropical arithmetic operations $\oplus_\infty$ and $\otimes_\infty$ from (2) and (3). Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be $n \times m$ matrices. The tropical matrix sum $A \oplus_\infty B$ is obtained by evaluating the tropical sum of the corresponding entries,
$$(A \oplus_\infty B)_{ij} := a_{ij} \oplus_\infty b_{ij} = \max\{a_{ij}, b_{ij}\}.$$
The tropical product $A \otimes_\infty B$ of two matrices $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ and $B = [b_{ij}] \in \mathbb{R}^{n \times p}$ is given by the matrix $C = [c_{ij}] \in \mathbb{R}^{m \times p}$ with entries
$$c_{ij} := \bigoplus_{\infty,\,k} (a_{ik} \otimes_\infty b_{kj}) = \max_k \{a_{ik} + b_{kj}\}.$$
Using the error term (1) and the tropical linear algebra defined above, we formulate the tropical gradient descent iteration as follows:
$$\mathbf{W}^{(k)}_{new} = \mathbf{W}^{(k)} \oplus_\infty \Delta\mathbf{W}^{(k)} = \mathbf{W}^{(k)} \oplus_\infty -\epsilon \left( \mathbf{D}^{(k)} \otimes_\infty \mathbf{W}^{(k+1)} \otimes_\infty \cdots \otimes_\infty \mathbf{D}^{(l)} \otimes_\infty \mathbf{W}^{(l+1)} \otimes_\infty \mathbf{e} \right). \tag{10}$$
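A minimal max-plus linear algebra sketch follows, together with one tropical update step in the shape of (10). Shapes are assumed conformable, with the error vector $\mathbf{e}$ passed as a column so the tropical product applies throughout, and the function names are illustrative.

```python
import numpy as np

def trop_add(A, B):
    """Tropical matrix sum: the entrywise max."""
    return np.maximum(A, B)

def trop_mul(A, B):
    """Tropical matrix product: C_ij = max_k (A_ik + B_kj) -- additions and comparisons only."""
    # Broadcast so entry [i, k, j] holds A_ik + B_kj, then maximise over the shared index k.
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

def tropical_update(W_k, Ds, Ws, e, eps):
    """One tropical gradient-descent step in the shape of (10)."""
    delta = e
    for D, W in zip(reversed(Ds), reversed(Ws)):
        delta = trop_mul(D, trop_mul(W, delta))
    return trop_add(W_k, -eps * delta)

A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
print(trop_mul(A, B))  # [[1. 4.] [3. 2.]], e.g. C_00 = max(0+1, 1+0) = 1
```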
Corollary 4.2.3. The tropical backpropagation algorithm in (10) has lower algorithmic complexity than the vanilla backpropagation algorithm.
This statement follows from the construction of tropical backpropagation, which simply eliminates the cost of multiplications.
Remark 4.2.4
On a machine with a fixed register size (the number of bits available to represent a number), the improvement we get by swapping the ring of reals for the tropical semiring of reals only affects the constant factor of the complexity of matrix operations. On a classical machine with 128-bit registers, the complexity of the ordinary multiplication of two $n \times n$ matrices is $O(n^3)$ with a constant factor governed by the cost of 128-bit multiplications, while the same operation on a tropical machine with registers of the same size again takes $O(n^3)$ time, but with the much smaller constant factor of 128-bit additions and comparisons. Remark 4.2.5
Tropical linear algebra is efficient and lowers the algorithmic complexity of large matrix operations; it therefore fits well with backpropagation and may be used in other applications as well. However, there are also serious limitations: due to its idempotent nature, it does not admit matrix inversion.
5 Conclusions

In this note, we defined the tropical limit of the ReLU function, which is used in many neural network models. We also showed that multilayered feedforward neural networks using this tropical unit carry the properties of a universal approximator. With further analysis, we established that the topology of the zero loci of functions realised by multilayered neural networks does not change when their tropical limit is taken. This observation allowed us to tropicalize the backpropagation algorithm solving any classification problem. The tropical backpropagation algorithm is obtained from the classical backpropagation algorithm in (1) simply by replacing matrix addition and multiplication with their tropical analogues based on the tropical arithmetic operations $\oplus_\infty$ and $\otimes_\infty$ in (2) and (3).

As the tropical backpropagation algorithm works over the tropical semiring, it comes with considerable algorithmic advantages and almost no drawbacks:

• Tropicalization reduces the algorithmic complexity significantly by eliminating multiplications.
• The performance gained from tropicalization does not come at the cost of substantial changes to existing code, since it only requires swapping an ordinary linear algebra library for an appropriate tropical linear algebra library.
Acknowledgments
I wish to thank Grisha Mikhalkin, not only for introducing me to tropical geometry over the years, but also for inviting me to Geneva on various occasions and, together with Ilia Itenberg, patiently listening to my half-baked ideas. I also thank Yiğit Gündüç, who introduced me to the concepts of computational complexity decades earlier. I am grateful to Atabey Kaygun for being an unwearying friend and collaborator. The main idea of this note first appeared during a discussion with Atabey, and he read and commented on all versions of it, i.e., he is a secret author of this paper. That said, all mistakes are mine, and mine only.
References

[BT95] R. Bott, L.W. Tu, Differential Forms in Algebraic Topology. Graduate Texts in Mathematics, Springer, 1995.

[Cy89] G. Cybenko, Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems 2(4) (1989), 303–314.

[DBBNG] C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, R. Garcia, Incorporating Second-Order Functional Knowledge for Better Option Pricing. Advances in Neural Information Processing Systems 13, MIT Press, 2001, 472–478.

[HMD15] S. Han, H. Mao, W.J. Dally, Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. https://arxiv.org/abs/1510.00149

[H91] K. Hornik, Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4(2) (1991), 251–257.

[Hor14] M. Horowitz, 1.1 Computing's energy problem (and what we can do about it). IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.

[IMS] I. Itenberg, G. Mikhalkin, E. Shustin, Tropical Algebraic Geometry. Oberwolfach Seminars, Birkhäuser, 2nd edition, 2009.

[Ka95] A.A. Karatsuba, The Complexity of Computations. Proceedings of the Steklov Institute of Mathematics 211 (1995), 169–183.

[LLPS] M. Leshno, V.Ya. Lin, A. Pinkus, S. Schocken, Multilayer Feedforward Networks with a Nonpolynomial Activation Function Can Approximate Any Function. Neural Networks 6 (1993), 861–867.

[Ga12] F. Le Gall, Faster algorithms for rectangular matrix multiplication. Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2012), 514–523.

[LKDSG] H. Li, A. Kadav, I. Durdanovic, H. Samet, H.P. Graf, Pruning Filters for Efficient ConvNets. https://arxiv.org/abs/1608.08710

[PV17] P. Petersen, F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks. https://arxiv.org/abs/1709.05289v3

[Ro96] R. Rojas, Neural Networks: A Systematic Introduction. Springer, 1996.

[Ru16] S. Ruder, An overview of gradient descent optimization algorithms. https://arxiv.org/pdf/1609.04747.pdf

[Sp64] D. Sprecher, On the Structure of Continuous Functions of Several Variables. Transactions of the American Mathematical Society 115 (1964), 340–355.

[TF97] G. Thimm, E. Fiesler, Pruning of neural networks. http://publications.idiap.ch/downloads/reports/1997/rr97-03.pdf

[Vi06] O. Viro, Patchworking Real Algebraic Varieties. https://arxiv.org/pdf/math/0611382.pdf

[Vi01] O. Viro, Dequantization of Real Algebraic Geometry on a Logarithmic Paper. Proceedings of the 3rd European Congress of Mathematicians, Progress in Mathematics 201, Birkhäuser (2001), 135–146.

[Ya16] D. Yarotsky, Error bounds for approximations with deep ReLU networks. https://arxiv.org/abs/1610.01145

[YCS16] T.J. Yang, Y.H. Chen, V. Sze, Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. https://arxiv.org/abs/1611.05128