Maximum Entropy Principle underlying the dynamics of automobile sales
A. Hernando, D. Villuendas, M. Sulc, R. Hernando, R. Seoane, A. Plastino
Data Science Management and Market Intelligence, SEAT, S.A., Martorell, Spain
Social Thermodynamics Applied Research (SThAR SA), EPFL Innovation Park Bât. C, Lausanne, Switzerland
Departament FQA, Facultat de Física, Universitat de Barcelona, Barcelona, Spain
Centres Científics i Tecnològics (CCiT), Universitat de Barcelona, Barcelona, Spain
Instituto de Física La Plata-CCT-CONICET, Universidad Nacional de La Plata, La Plata, Argentina
Physics Department and IFISC-CSIC, University of the Balearic Islands, Palma de Mallorca, Spain
We analyze an exhaustive dataset of monthly new-car sales. The set covers 10 years of Spanish sales of more than 6 500 different car-model configurations, with a total of 10 million sold cars, from January 2007 to January 2017. We find that for model configurations with a monthly market share higher than 0.1%, sales become scalable, obeying Gibrat's law of proportional growth under logistic dynamics. Remarkably, the distribution of total sales follows the predictions of the Maximum Entropy Principle for systems subject to proportional growth in dynamical equilibrium. We also find that the associated dynamics are non-Markovian, i.e., the system has a decaying memory, or inertia, of about 5 years. Thus, car sales are predictable within a certain time period. We show that the main characteristics of the dynamics can be described via a construct based upon the Langevin equation. This construct encompasses the fundamental principles that any predictive model of car sales should obey.
The automobile industry is experiencing a deep economic and technological change. It is moving from (i) a fossil-fueled, private-ownership-based, manually driven industry to (ii) an electric, sharing-based, driverless one [1, 2]. Understanding the supply-and-demand dynamics of automobile sales could help achieve a smooth economic transition between (i) and (ii). Indeed, 88.1 million cars and light commercial vehicles were sold worldwide in 2016 [3], involving billions of dollars per year. Unplanned circumstances or forecast failures could generate immense losses in this market and in the sectors tied to it, as seen in past crises and economic transitions such as the one after 2007 [4].
Thanks to the rise of both Big Data and digital tools, we now have access to exhaustive databases that allow us to analyze socioeconomic systems using ideas from statistical physics [5–7]. Examples range from allometric laws in city populations [8–10], the evolution of firm sizes [11], transportation networks [12, 13] and human mobility [14], to the popularity of digital products [15], the structure of the Internet [16], and even the diffusion of memes [17]. In this context, we analyze here the sales of new commercial vehicles in Spain from January 2007 to January 2017. The corresponding exhaustive database is published by the Spanish Directorate-General of Traffic (DGT) and contains registrations of more than 6 500 different configurations (model + body shape + engine) and more than 10 million sold cars [18]. To analyze the evolution of car sales, we apply the procedure used in Refs. 11, 23–25 to the aggregated data of monthly registrations (or sales) per car configuration (model), as provided by Ref. [19]. The procedure is summarized as follows:
1. We first identify the main dynamical variables. In our case, the variable is the total (cumulative) number of cars sold of the $i$-th automobile model at time $t$, $x_i(t)$.
2. We compare $x_i(t)$ with its time derivative $\dot x_i(t)$ to look for a possible scaling rule in the underlying equation of motion, of the form
$$\dot x_i(t) = v_i(t)\,[x_i(t)]^{q}, \tag{1}$$
where $v_i(t)$ is the growth rate at time $t$ and $q$ is the exponent that parameterizes the dynamics. Due to the stochastic nature of the growth rates, it is convenient to compare $x_i$ with the variance of $\dot x_i$, computed over all those $i$'s with a similar value of $x$. Thus, we fit the variance to an expression of the form $\mathrm{Var}[\dot x] = T_q\,x^{2q}$, where $T_q$ measures the size of the fluctuations and might be called a "temperature" [27, 28]. If one finds contributions from several independent components of the form of Eq. (1) with different exponents, as in $\dot x_i(t) = \sum_q v_{q,i}(t)\,[x_i(t)]^{q}$, we define a temperature associated with each term from the fit $\mathrm{Var}[\dot x] = \sum_q T_q\,x^{2q}$.
3. Next, we independently analyze the density distribution of total car sales per model, $p(x)\,dx$. The principle of Maximum Entropy (MaxEnt) with dynamical information states that the natural variable for measuring the entropy is the one that linearizes the equation of motion, transforming the underlying symmetry into a translational one [22]. For an equation of motion such as Eq. (1), this is achieved via the Tsallis logarithm and its inverse function, the Tsallis exponential [20], defining the new variable $u = \log_q(x/\bar x)$, or equivalently $x = \bar x\exp_q(u)$, where $\bar x$ is any reference value (needed to keep the arguments dimensionless). After this transformation, the equation of motion becomes $\dot u_i(t) = \bar x^{\,q-1}\,v_i(t)$, which up to a constant factor is simply $v_i(t)$ and is indeed translationally invariant. If the system is in equilibrium, the most probable distribution is the one that maximizes the entropy under the system's observable constraints. If the form of the empirical distribution fits the form predicted by MaxEnt, we can obtain an independent quantitative measurement of $q$ directly from the equilibrium distribution.
4. If the value of $q$ obtained from the equation of motion and the value of $q$ measured from the fit of the distribution agree, we take our procedure to be correct, indicating that the system is in dynamical equilibrium and obeys the Maximum Entropy Principle.
5. As an additional test of our procedure, we also perform numerical experiments by defining microscopic equations of motion equivalent to those observed in the empirical system. We then analyze the evolution of the simulated system following the same steps as for the empirical one. Obtaining equivalent results further confirms that our procedure is correct.
As stated in step 1, our relevant variable is the total number of car sales $x_i$ for the $i$-th automobile model. We display in Fig. 1 a sample of trajectories over the ten years covered by our dataset, where a logistic profile (i.e., growth with saturation) is clearly visible. Saturation occurs when the popularity of a model drops because a newer version becomes available, or when a competitor advances into the same market niche. We also show in Fig. 1 the evolution of the number of different models sold per month during the ten years included in the dataset.
FIG. 1: Top: Evolution of car sales for a sample of car models in the database. Saturation (as described by the logistic equation) is observed. Middle: Evolution of the number of different models available per month during the ten years recorded in the database. Bottom: Monthly new-car sales in Spain, with a total of 10 062 193 sales during the last ten years.
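The change of variable in step 3 can be illustrated with a short sketch, assuming the standard definitions $\log_q(y) = (y^{1-q}-1)/(1-q)$ and $\exp_q(u) = [1+(1-q)u]^{1/(1-q)}$, which reduce to $\ln y$ and $e^{u}$ as $q \to 1$. Along the exact solution of Eq. (1) with a constant growth rate $v$, the variable $u = \log_q(x/\bar x)$ advances at the constant speed $v\,\bar x^{\,q-1}$, so power-law growth becomes a uniform translation in $u$. The function names and the reference value `x_ref` (standing in for $\bar x$) are illustrative choices, not code from this work.

```python
import math

# Tolerance used to treat q as exactly 1 (ordinary log/exp limit).
_Q_ONE_TOL = 1e-12


def log_q(y: float, q: float) -> float:
    """Tsallis q-logarithm, (y**(1-q) - 1) / (1-q); equals log(y) at q = 1."""
    if y <= 0.0:
        raise ValueError("log_q is defined for y > 0")
    if abs(q - 1.0) < _Q_ONE_TOL:
        return math.log(y)
    return (y ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u: float, q: float) -> float:
    """Tsallis q-exponential, the inverse of log_q on its range."""
    if abs(q - 1.0) < _Q_ONE_TOL:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        # u lies outside the range of log_q over y > 0
        raise ValueError("u is outside the range of log_q")
    return base ** (1.0 / (1.0 - q))


def u_trajectory(q: float, v: float, x0: float, x_ref: float, times):
    """u(t) = log_q(x(t)/x_ref) along the exact solution of dx/dt = v * x**q
    (Eq. (1) with a constant growth rate v)."""
    us = []
    for t in times:
        if abs(q - 1.0) < _Q_ONE_TOL:
            x_t = x0 * math.exp(v * t)
        else:
            # closed-form solution; valid while the bracket stays positive
            x_t = (x0 ** (1.0 - q) + (1.0 - q) * v * t) ** (1.0 / (1.0 - q))
        us.append(log_q(x_t / x_ref, q))
    return us


if __name__ == "__main__":
    q, v, x0, x_ref = 0.7, 0.3, 20.0, 10.0
    u = u_trajectory(q, v, x0, x_ref, [0.0, 1.0, 2.0, 3.0])
    # Equal increments: u moves at the constant speed v * x_ref**(q - 1).
    print([round(b - a, 6) for a, b in zip(u, u[1:])], round(v * x_ref ** (q - 1.0), 6))
```

Using the closed-form trajectory rather than a numerical integrator keeps the check exact, so any deviation from equal increments would point to the transformation itself rather than to discretization error.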
FIG. 2: Top: Empirical variance of the sales' growth, $\mathrm{Var}[\dot x]$, versus total car sales $x$ (gray points). We compare the moving average (red line) with the analytical fit to Eq. (1) (black line). We find three regimes with different exponents. Bottom: Same as top, but with the growth corrected by the logistic equation, $\mathrm{Var}[\dot x']$ (see text), which fits the form of Eq. (3).
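The variance fit of step 2 can be sketched for a single component as follows. If $\dot x = v\,x^{q}$ with a random rate $v$ of variance $T_q$, then $\mathrm{Var}[\dot x \mid x] = T_q\,x^{2q}$, so binning the pairs $(x, \dot x)$ logarithmically in $x$ and regressing the log-variance on $\log x$ gives a slope of $2q$. The helper names and the synthetic, log-uniformly distributed data below are hypothetical and only stand in for the pooled model-month pairs; a multi-component form such as $\sum_q T_q x^{2q}$ would instead call for a nonlinear least-squares fit.

```python
import math
import random


def binned_variance(xs, xdots, n_bins=10):
    """Bin (x, xdot) pairs into n_bins equal-width bins in log x and return
    [(mean log x, log of the sample variance of xdot)] for bins with >= 2 points."""
    lo, hi = math.log(min(xs)), math.log(max(xs))
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, xd in zip(xs, xdots):
        k = min(int((math.log(x) - lo) / width), n_bins - 1)  # largest x goes to the last bin
        bins[k].append((math.log(x), xd))
    points = []
    for b in bins:
        if len(b) < 2:
            continue
        mean_xd = sum(xd for _, xd in b) / len(b)
        var = sum((xd - mean_xd) ** 2 for _, xd in b) / (len(b) - 1)
        if var > 0.0:
            points.append((sum(lx for lx, _ in b) / len(b), math.log(var)))
    return points


def fit_scaling(points):
    """Ordinary least squares for log Var = log T_q + 2q log x; returns (q, T_q)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    slope = sxy / sxx
    return slope / 2.0, math.exp(my - slope * mx)


def synthetic_sample(q, temperature, n, seed=0):
    """Hypothetical data: x log-uniform on [10, 1e4] and xdot = v * x**q with
    v Gaussian (mean 0, variance `temperature`), so Var[xdot | x] = temperature * x**(2q)."""
    rng = random.Random(seed)
    xs, xdots = [], []
    for _ in range(n):
        x = math.exp(rng.uniform(math.log(10.0), math.log(1e4)))
        xs.append(x)
        xdots.append(rng.gauss(0.0, math.sqrt(temperature)) * x ** q)
    return xs, xdots


if __name__ == "__main__":
    xs, xdots = synthetic_sample(q=0.8, temperature=0.05, n=20000, seed=1)
    q_hat, t_hat = fit_scaling(binned_variance(xs, xdots))
    print(f"q ~ {q_hat:.3f}, T_q ~ {t_hat:.4f}")  # should be close to 0.8 and 0.05
```

Because each bin spans a finite range of $x$, the estimated $T_q$ carries a small upward bias (the bin variance averages $x^{2q}$), while the slope, and hence $q$, is essentially unaffected when the bins have equal width in $\log x$.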
When we compare $x$ with $\mathrm{Var}[\dot x]$ to fit Eq. (1), we find what looks like a three-component equation of motion, with an exponent i) for low sales of $q$ = 0.[…] ± […], […] $q$ = 0.[…] ± 0.01, and iii) for high sales of $q$ = 0.[…] ± 0.01. $R^2$ adopts a value of 0.[…] […] $q$ derived from the equilibrium distribution will not match these three values, as we will show later. However, looking at the high-sales region of Fig. 2 (arrow), we notice points with low variance. There, growth rates drop to such an extent that trajectories approach saturation, which would lead to an underestimation of the exponents. To gain deeper insight into the actual underlying dynamics, we need to correct each car model for the effects of saturation. To this aim, we reconsider the form of the logistic equation [15]
$$\dot x_i(t) = v_i(t)\,x_i(t)\left[1 - x_i(t)/X_i\right], \tag{2}$$
where $X_i$ is the final number of total car sales. If we first define $\dot x'_i(t) = \dot x_i(t)/[1 - x_i(t)/X_i]$, we should recover an expression of the form of Eq. (1), $\dot x'_i = v_i x_i$, with growth rates corrected for the effects of saturation. Comparing in this way $x$ with $\mathrm{Var}[\dot x']$, we obtain the curve displayed in Fig. 2, which is nicely described by a three-term function,
$$\mathrm{Var}[\dot x'] = T_1\,x^{2q} + T_{1/2}\,x + T_0,$$
with $q$ = 1.[…] ± […], $\log T_1$ = −[…] ± […], $\log T_{1/2}$ = 1.[…] ± 0.06, and $\log T_0$ = 5.[…] ± […]. This corresponds to an equation of motion with three components,
$$\dot x'_i(t) = v_{1,i}(t)\,x_i(t) + v_{1/2,i}(t)\sqrt{x_i(t)} + v_{0,i}(t). \tag{3}$$
These three components have already been observed for firm sizes [11] and city populations [24], and they are well understood. The first term, with $q = 1$, is Gibrat's law of proportional growth [21], which is expected for multiplicative processes. Such behavior indicates that the popularity of an automobile model grows as more cars are sold, following a rich-get-richer mechanism. The second term, with exponent $q = 1/2$, emerges from proportional growth as a side effect of the central limit theorem (or finite-size effects), as shown in Ref. 25. This component has been shown to be characterized by uncorrelated noise, so no extra physics is expected from it. Finally, the last term in Eq. (3) corresponds to linear forces ($q = 0$), independent of the value of $x$; it becomes relevant only for small numbers of sales.
In our procedure we focus only on the proportional regime, where the most interesting physics takes place. However, one may ask: above which critical number of sold cars can a trajectory be considered to lie in the proportional regime? A first estimate follows from the crossover between the first and second terms of Eq. (3): setting $q = 1$, the condition $T_1 x^2 = T_{1/2}\,x$ gives $x = T_{1/2}/T_1 = 129$ cars, the value at which both terms contribute equally. It is then natural to ask when the dynamics can be considered fully governed by proportional, or scalable, growth. We can combine our two previous fits to compare the market share, i.e., the relative number of cars sold per month, $\chi = \dot x/\sum_j \dot x_j$, with the relative contribution of the proportional term to the total growth, which we call scalability: $\kappa(x) = \sqrt{T_1\,x^{2q}/\mathrm{Var}[\dot x'_i]}$. As seen in Fig. 3, an automobile model reaches a scalability of $\kappa = 95\%$ when its monthly market share reaches $\chi = 0.1\%$.
FIG. 3: Scalability (defined as the contribution of the proportional term to the total growth in Eq. (3)) as a function of the monthly market share. At a market share of 0.005% the proportional term starts to dominate, reaching around 90% of the total contribution at 0.03% and 95% at 0.1% (dashed blue lines and shaded areas). The average maximum market share of a single model is around 2% (dashed gray line).
Another question regarding growth rates becomes legitimate at this point, one that is crucial for predictability: are car sales a non-Markovian process? In other words, are today's sales correlated with yesterday's, and thus predictable? Buying a car is an individual decision, made according to a plethora of considerations. We are looking at a complex decision process that takes time and relies on decisions taken previously by other individuals. Following our analysis of city populations in Refs. 10, 26, which unraveled the existence of a "memory" in cities, we define the system's putative memory as the time correlation $c(\tau) = \mathrm{Cor}[\dot x(t), \dot x(t+\tau)]$, where $\tau$ is the correlation time interval. We find an exponential-like decay, with slightly negative values after $\tau = 5$ years, as shown in Fig. 4. This in turn establishes the limits of predictability: no accurate prediction of sales can be made for longer than that time period.
FIG. 4: Left: Time correlation of the growth rates as a function of the time interval. A clear decay is found that becomes slightly negative after 5 years. Thus, predictability is possible, but for no longer than 5 years. Right: Empirical cumulative distribution of total sold cars per model (gray dots) compared with the MaxEnt prediction [see Eq. (8)] (black line).
We now continue with step 3 of the procedure outlined above. For maximizing the entropy $S$, it was shown in Ref. 15 that the only observable and objective constraints for systems with logistic growth are, due to the limited units of resources, the total available units (in our case, the total number of sold cars $X$) and the number of elements sharing these units (for us, the number of available automobile models $N$). Considering only those car models whose sales exceeded $x = 130$ units in the ten years (the limit of proportional growth derived above), we have $X = 9\,957\,537$ and $N = 3\,084$. Following Refs. 11, 23–25, we write our thermodynamic potential as
$$\Omega = -TS - \mu N + \Lambda X, \tag{4}$$
where $T$, $\mu$, and $\Lambda$ are the corresponding Lagrange multipliers for the general variational problem of the thermodynamic potential $\Omega(S, N, X)$.
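A minimal sketch of the memory estimate $c(\tau)$ introduced above, assuming each model contributes a growth series sampled on a regular monthly grid and that the lagged pairs $(\dot x(t), \dot x(t+\tau))$ of all models are pooled into a single Pearson correlation. As a sanity check it uses a hypothetical AR(1) process, $g_{t+1} = \phi\,g_t + \epsilon_t$, whose exact autocorrelation $\phi^{\tau}$ decays exponentially; it is a test signal only, not a model of the DGT data, and all names are illustrative.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def time_correlation(series, tau):
    """c(tau) = Cor[g(t), g(t + tau)], pooling the lagged pairs of every
    series (one growth series per model, regularly sampled in time)."""
    left, right = [], []
    for g in series:
        for t in range(len(g) - tau):
            left.append(g[t])
            right.append(g[t + tau])
    return pearson(left, right)


def ar1_series(phi, length, rng):
    """Hypothetical test signal: stationary AR(1), g[t+1] = phi*g[t] + noise,
    with unit-variance Gaussian noise; its exact autocorrelation is phi**tau."""
    g = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]  # start in the stationary law
    for _ in range(length - 1):
        g.append(phi * g[-1] + rng.gauss(0.0, 1.0))
    return g


if __name__ == "__main__":
    rng = random.Random(42)
    models = [ar1_series(0.8, 120, rng) for _ in range(200)]  # 200 series x 120 months
    for tau in (1, 3, 6, 12):
        print(tau, round(time_correlation(models, tau), 3), round(0.8 ** tau, 3))
```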
For a general equation of motion with arbitrary $q$, we assume that an underlying probability density $p(u)$ governs our process, where $u = \log_q(x/\bar x)$, $N = \int_0^\infty du\,p(u)$ and $X = \int_0^\infty du\,p(u)\exp_q(u)$. We explicitly cast the MaxEnt problem as a variational one in $p$:
$$0 = \delta_p\Omega = \delta_p\int_0^\infty du\,p(u)\left\{T\log[p(u)/N] - \mu + \Lambda\exp_q(u)\right\}, \tag{5}$$
where $\delta_p$ represents variations with respect to $p(u)$. We find the solution $p(u) = zN\exp[-\Lambda^*\exp_q(u)]$, which in terms of the variable $x$ is a power law with an exponential cutoff:
$$p(x) = zN\,\frac{\exp(-\Lambda^* x/\bar x)}{x^{q}}, \tag{6}$$
where $\Lambda^* = \Lambda/T$ and $z = \exp(\mu/T)$. The first constraint normalizes $p(u)$ to the number of car models $N$, and the second to the total car sales, yielding the equations of state:
$$z = \frac{(\Lambda^*/\bar x)^{1-q}}{\Gamma(1-q,\Lambda^*)}, \tag{7a}$$
$$\frac{\bar x N}{X} = \frac{\Lambda^*\,\Gamma(1-q,\Lambda^*)}{\Gamma(2-q,\Lambda^*)}. \tag{7b}$$
The cumulative distribution can be written as
$$\frac{P(x)}{N} = 1 - \frac{\Gamma(1-q,\Lambda^* x/\bar x)}{\Gamma(1-q,\Lambda^*)}. \tag{8}$$
Passing now to step 4, we show in Fig. 4 the comparison of the MaxEnt prediction with the empirical cumulative distribution of total car sales. We have considered only sizes larger than $x = 130$ and fit Eq. (8) via $\log\bar x$, $q$, and $\log\Lambda^*$ (logarithms are used for numerical stability). We find a remarkable fit, with $\log\bar x$ = 4.[…] ± […], $q$ = 1.[…] ± […], and $\log\Lambda^*$ = −[…] ± […] […] ($q$, $X$ and $N$) are 0.[…] […]. […] the dynamics of car sales are very close to equilibrium and the distribution of total sales obeys the Maximum Entropy Principle for scale-free systems.
FIG. 5: Left: $\Lambda^*$ as measured from the equilibrium distribution versus the value of $\Lambda/T$ used in the simulations. Each dot is an independent realization using the range of values for $\Lambda$ and $T_\eta$ described in the text. Colors represent realizations with the same value of $\Delta t$. Right: Cumulative distribution $P(x)/N$ versus $x$.
We finally proceed to step 5 by defining the equations needed to perform microscopic simulations of the empirical system. For writing these equations, we need the following considerations:
1. Geometric Brownian walkers are known to obey Gibrat's law and to reproduce states of dynamical equilibrium following entropic laws. However, Brownian walkers are Markovian. We therefore use instead a Langevin-like equation with a viscous term [27, 28], which is known to reproduce time correlations with exponential decay.
2. The constraints in the thermodynamic potential of Eq. (4) are introduced as forces in the Langevin equation, in analogy to what is done in molecular simulations.
3. In addition to proportional growth ($q = 1$), the model should include linear forces ($q = 0$) and finite-size fluctuations ($q = 1/2$).
Accordingly, we write
$$\begin{aligned} \dot v_{1,i}(t) &= \eta_i(t) - \Lambda\,x_i(t) - \gamma\,v_{1,i}(t),\\ \dot x_i(t) &= \left[v_{1,i}(t)\,x_i(t) + v_{1/2,i}(t)\sqrt{x_i(t)} + v_{0,i}(t)\right] \vee 0, \end{aligned} \tag{9}$$
where $\gamma$ is the damping, or viscosity, which controls the inertia, and the thermal bath obeys $\langle\eta_i(t)\,\eta_j(t')\rangle = 2T_\eta\gamma\,\delta_{ij}\,\delta(t-t')$, with $T_\eta$ its temperature. The other two sources of noise are white noise with $\langle v_{1/2,i}(t)\,v_{1/2,j}(t')\rangle = (2T_{1/2}/\Delta t)\,\delta_{ij}\,\delta(t-t')$ and $\langle v_{0,i}(t)\,v_{0,j}(t')\rangle = (2T_0/\Delta t)\,\delta_{ij}\,\delta(t-t')$, where $T_{1/2}$ and $T_0$ are their respective temperatures. Here, $\delta$ is the Dirac delta and $\Delta t$ is the discretized time interval used to solve the equations numerically. Finally, the symbol $\vee$ denotes the maximum of the quantities on its left and right, which prevents negative values of the growth $\dot x_i$.
Since our aim is to show that the entropic procedure properly predicts the equilibrium distribution in the proportional regime, we have simplified Eqs. (9) by removing the finite-size term ($q = 1/2$) and setting $\gamma = 1/\Delta t$. (A full exploration of the solutions of Eqs. (9) will be published elsewhere.)
We have solved the simplified version of Eqs. (9) for the ranges 0.[…] ≤ $T_\eta$ ≤ 500 and 10^(−[…]) ≤ $\Lambda$ ≤ […], with $N = 3000$, $T_0 = 100$, and time intervals in the range 10^(−[…]) ≤ $\Delta t$ ≤ 10^(−[…]), starting from an initial configuration in which every walker has $x(0) = 1$. Eventually, the growth rate of each walker becomes zero due to the force induced by the $\Lambda$ term, and the walker saturates. To keep the number of active walkers constant, we add a new one at $x = 1$ every time an older walker saturates. After a transient, the system reaches thermodynamic equilibrium.
We study scalable growth by selecting only walkers with $x_i(t) > 100$ and measuring the effective temperature as the variance of the growth rates at equilibrium, $T = \mathrm{Var}[\dot x/x]$. We find that the temperatures of the system and of the thermal bath obey $\log T$ = (0.[…] ± 0.06) + (0.[…] ± 0.02) $\log T_\eta$. The equilibrium distribution $p(x)\,dx$ fits the form of the MaxEnt prediction in Eq. (8): fitting via $q$ and $\Lambda^*$, we find $q$ = 1.[…] ± 0.01 and $\log\Lambda^*$ = (3.[…] ± 0.2) + $\log(\Lambda/T)$, independently of the other parameters $\Delta t$ and $T_0$ (see Fig. 5), confirming the correctness of our entropic procedure. We show in Fig. 5 the cumulative distribution for a simulation using the empirical temperature $T$ = exp(−[…].41) ($T_\eta$ = 0.[…]) and […] × 10^(−[…]), obtaining $\Lambda^*$ = 0.[…] […]
[…] Rev. Mod. Phys. […]
[20] […] Nonextensive Entropy: Interdisciplinary Applications, Oxford University Press, Oxford, 2004; C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World, Springer, New York, 2009.
[21] H. Rozenfeld et al., Proc. Nat. Acad. Sci. […], 18702 (2008).
[22] A. Hernando, A. Plastino, A. R. Plastino, Eur. Phys. J. B 85, 147 (2012).
[23] A. Hernando, A. Plastino, Eur. Phys. J. B 85, 293 (2012).
[24] A. Hernando, R. Hernando, A. Plastino, A. R. Plastino, J. R. Soc. Interface 10, 20120758 (2013).
[25] A. Hernando, A. Plastino, Phys. Rev. E 86, 066105 (2012).
[26] A. Hernando, R. Hernando, A. Plastino, J. R. Soc. Interface […]
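To evaluate the MaxEnt cumulative distribution of Eq. (8) one needs the upper incomplete gamma function $\Gamma(s, y) = \int_y^\infty t^{s-1}e^{-t}\,dt$ with $s = 1 - q$, which is negative when $q > 1$. The sketch below therefore integrates this definition directly with Simpson's rule after the substitution $t = e^{w}$, using only the Python standard library. It also returns the mean sales per model implied by the equation of state (7b), $X/N = \bar x\,\Gamma(2-q,\Lambda^*)/[\Lambda^*\,\Gamma(1-q,\Lambda^*)]$. Function names, quadrature settings and the parameter values in the example are hypothetical placeholders, not fitted values. Fitting $\log\bar x$, $q$ and $\log\Lambda^*$ to an empirical cumulative distribution, as in step 4, would wrap `maxent_cdf` in a nonlinear least-squares routine (not shown).

```python
import math


def upper_gamma(s: float, y: float, n: int = 4000, t_max: float = 60.0) -> float:
    """Upper incomplete gamma Gamma(s, y) = integral from y to infinity of
    t**(s - 1) * exp(-t) dt, for y > 0 and any real s (including s < 0).

    Uses composite Simpson's rule in w = ln t, where the integrand becomes
    exp(s * w) * exp(-exp(w)); the tail beyond max(t_max, 2y) is neglected.
    """
    if y <= 0.0:
        raise ValueError("y must be positive")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of sub-intervals
    a, b = math.log(y), math.log(max(t_max, 2.0 * y))
    h = (b - a) / n

    def f(w: float) -> float:
        t = math.exp(w)
        return t ** s * math.exp(-t)  # t**(s-1) * exp(-t) times dt/dw = t

    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def maxent_cdf(x: float, q: float, lam: float, x_bar: float) -> float:
    """Eq. (8): P(x)/N = 1 - Gamma(1-q, lam*x/x_bar) / Gamma(1-q, lam), for x >= x_bar."""
    if x < x_bar:
        return 0.0
    s = 1.0 - q
    return 1.0 - upper_gamma(s, lam * x / x_bar) / upper_gamma(s, lam)


def mean_sales_per_model(q: float, lam: float, x_bar: float) -> float:
    """X/N implied by Eq. (7b): x_bar * Gamma(2-q, lam) / (lam * Gamma(1-q, lam))."""
    return x_bar * upper_gamma(2.0 - q, lam) / (lam * upper_gamma(1.0 - q, lam))


if __name__ == "__main__":
    # Hypothetical placeholder parameters, NOT the fitted values of the paper.
    q, lam, x_bar = 1.3, 1e-3, 80.0
    for x in (100.0, 1e3, 1e4, 1e5):
        print(f"x = {x:>8.0f}   P(x)/N = {maxent_cdf(x, q, lam, x_bar):.4f}")
    print(f"X/N = {mean_sales_per_model(q, lam, x_bar):.1f}")
```

Integrating in $\ln t$ keeps the integrand smooth for small $y$, where $t^{s-1}$ is singular when $s < 1$; the recurrence $\Gamma(s+1, y) = s\,\Gamma(s, y) + y^{s}e^{-y}$ offers a convenient consistency check.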