Abstract

Recently a measure for Economic Complexity named ECI+ has been proposed by Albeaik et al. We like the ECI+ algorithm because it is mathematically identical to the Fitness algorithm, the measure for Economic Complexity we introduced in 2012. We demonstrate that the mathematical structure of ECI+ is strictly equivalent to that of Fitness (up to normalization and rescaling). We then show how the claims of Albeaik et al. about the ability of Fitness to describe the Economic Complexity of a country are incorrect. Finally, we hypothesize how the wrong results reported by these authors could have been obtained by not iterating the algorithm.

Why we like the ECI+ algorithm

Andrea Gabrielli, Matthieu Cristelli, Dario Mazzilli, Andrea Tacchella, Andrea Zaccaria, and Luciano Pietronero
Institute for Complex Systems, CNR
Department of Physics, Sapienza University of Rome
August 4, 2017

Let us call $X_{cp}$ the extensive export matrix giving, in a fixed year, the export, expressed in dollars, of product $p$ by country $c$. By definition,

$$X_c = \sum_p X_{cp}$$

is the total export of country $c$ in that year. Analogously, the quantity

$$X_p = \sum_c X_{cp}$$

gives the total amount (in dollars) of product $p$ exported in the same year by all countries. Finally, we call

$$X = \sum_{cp} X_{cp}$$

the total world export of the considered year.

Let us now recall the fundamental equations of the algorithm [1, 2] from which it is possible to compute the Fitness of countries and the Complexity of products from the COMTRADE data of international export. In this case the export matrix is binarized by using the Revealed Comparative Advantage (RCA) criterion. The RCA of a country $c$ on a product $p$ is defined as

$$\mathrm{RCA}_{cp} = \frac{X_{cp}/X_c}{X_p/X},$$

which can be read as the ratio between the share of product $p$ in the export basket of country $c$ and the share of the same product in the total world export (or, equivalently, as the ratio between the share of country $c$ in the total export of product $p$ and the share of the same country in the total world export). $\mathrm{RCA}_{cp}$ is generally considered to measure how "good" a country $c$ is at exporting (and therefore producing) a product $p$: if $\mathrm{RCA}_{cp} > 1$, country $c$ is on average better than the rest of the world at exporting $p$.
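The RCA definition above can be sketched in a few lines of NumPy. This is only an illustration, not code from the original analysis: the function name `rca` and the toy matrix `X_toy` are assumptions introduced here, and every country and every product is assumed to have a non-zero total export so that no division by zero occurs.

```python
import numpy as np

def rca(X):
    """Revealed Comparative Advantage of a countries x products export matrix X."""
    X = np.asarray(X, dtype=float)
    X_c = X.sum(axis=1, keepdims=True)  # total export of each country (column vector)
    X_p = X.sum(axis=0, keepdims=True)  # total world export of each product (row vector)
    X_tot = X.sum()                     # total world export
    return (X / X_c) / (X_p / X_tot)

# Hypothetical toy data: 3 countries (rows) x 3 products (columns), in dollars.
X_toy = np.array([[10.0, 0.0, 0.0],
                  [5.0, 5.0, 0.0],
                  [5.0, 5.0, 10.0]])
R = rca(X_toy)

# The second reading of RCA given in the text: share of the country in the
# export of the product, divided by the share of the country in world export.
R_alt = (X_toy / X_toy.sum(axis=0, keepdims=True)) / (
    X_toy.sum(axis=1, keepdims=True) / X_toy.sum())
```

Both readings of the ratio give the same matrix, since they are the same expression $X_{cp} X / (X_c X_p)$ with the factors grouped differently.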
Consequently, the criterion to introduce the binary export matrix $M_{cp}$ is simply the following: if $\mathrm{RCA}_{cp} > 1$ then $M_{cp} = 1$, while if $\mathrm{RCA}_{cp} \le 1$ then $M_{cp} = 0$. Through the matrix $M_{cp}$ we can define the fitness-complexity algorithm for the fitness $F_c$ of countries and the complexity $Q_p$ of products, respectively, as [1, 2]

$$
\begin{cases}
F_c^{(N+1)} = \sum_p M_{cp}\, Q_p^{(N)} \\[4pt]
Q_p^{(N+1)} = \dfrac{1}{\sum_c M_{cp}\, \dfrac{1}{F_c^{(N)}}}
\end{cases}
\tag{1}
$$

with the condition that, at each iteration, all the $F_c$'s and $Q_p$'s are normalized by dividing them by their mean values over all $c$ and all $p$, respectively, at the same iteration, in order to avoid possible divergences due to the hyperbolic nature of the second equation.

We now show in a few steps that the ECI+ and PCI+ formulas defined in [3] can be simply seen as the version of Eqs. (1) in which $M_{cp}$ is replaced by the extensive matrix $X_{cp}$ (a change that was already discussed in [2]). First, let us substitute the second equation of (1) at iteration $N$ into the first one at iteration $N+1$:

$$F_c^{(N+1)} = \sum_p \frac{M_{cp}}{\sum_{c'} M_{c'p}\, \dfrac{1}{F_{c'}^{(N-1)}}} \tag{2}$$

If we now replace $M_{cp}$ with $X_{cp}$ and rename $F_c^{(2N)} = X_c^N$, we get exactly Eq. (10) in [3]:

$$X_c^N = \sum_p \frac{X_{cp}}{\sum_{c'} X_{c'p}\, \dfrac{1}{X_{c'}^{N-1}}} \tag{3}$$

In order to rank countries, the authors of [3] propose a measure of competitiveness of countries, called ECI+, which, apart from the subtraction of an iteration-independent offset, is given by $\log X_c^\infty$, in strict analogy with what was done for instance in [4, 5].

Footnote: this country-dependent offset, $\log \sum_p X_{cp}/X_p$, can be obtained from the same formula (3) for $X_c^N$ with $X_c^{N-1} = 1$ for all $c$.

Analogously, if we (i) substitute the first equation of Eqs. (1) at iteration $N$ into the second one, (ii) replace $M_{cp}$ with $X_{cp}$ in the same equation, and (iii) rename $1/Q_p^{(2N)} = X_p^N$, we get exactly Eq. (13) of [3]:

$$X_p^N = \sum_c \frac{X_{cp}}{\sum_{p'} X_{cp'}\, \dfrac{1}{X_{p'}^{N-1}}} \tag{4}$$

The reciprocal algebraic relation between $Q_p^{(2N)}$ and $X_p^N$ is recovered in the definition of the metric called PCI+ in Eq. (16) in [3] as $-\log X_p^\infty = \log(1/X_p^\infty)$, apart from the addition of another iteration-independent offset, $\log X_p$.

Similarly to the fitness-complexity algorithm, both $X_c^N$ and $X_p^N$ are normalized at each iteration by dividing by an appropriate mean of their values over all countries and all products, respectively, in order to avoid divergences due to the non-linear hyperbolic nature of the algorithm. The authors chose the geometric mean for this purpose, probably taking into account the extensive nature of the matrix $X_{cp}$.

Given the equivalence of the algorithms, the claim reported in [3] that continuous data can be used in ECI+ but not in Fitness is indeed extravagant. In the second part of [3] it is argued that the same algorithm works well when it is named ECI+ but not when it is named Fitness.

The solution of the puzzle lies in the different input data used. Clearly, the Fitness algorithm can be used with continuous, discrete, intensive, or extensive data, depending on the objective of the analysis, as already discussed in [2]. Albeaik et al. mix different input data (extensive for ECI+ and intensive for Fitness), and this is used as erroneous evidence for an apparent difference between the algorithms. Moreover, they state that Fitness is strongly correlated with diversity, which is obvious given the explicit sum over the products that a country exports. This sum, however, is weighted by the complexity of the products, and this introduces residuals that are strongly informative. The diversity term is nevertheless important and cannot be disregarded, as it is a fundamental principle of Economic Complexity.
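The iteration of Eqs. (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the results in this paper: the function name `fitness_complexity`, its parameters, and the toy matrices are hypothetical, the normalization uses the arithmetic mean, and every country and product is assumed to have at least one non-zero entry. Nothing in Eqs. (1) requires the entries of the matrix to be 0 or 1, so the same function accepts a binary or a continuous matrix.

```python
import numpy as np

def fitness_complexity(M, n_iter=200, F0=None, Q0=None):
    """Iterate Eqs. (1) on a countries x products matrix M (binary or continuous)."""
    M = np.asarray(M, dtype=float)
    n_c, n_p = M.shape
    # Uniform initial conditions unless others are supplied.
    F = np.ones(n_c) if F0 is None else np.asarray(F0, dtype=float)
    Q = np.ones(n_p) if Q0 is None else np.asarray(Q0, dtype=float)
    for _ in range(n_iter):
        F_new = M @ Q                     # F_c = sum_p M_cp Q_p
        Q_new = 1.0 / (M.T @ (1.0 / F))   # Q_p = 1 / sum_c (M_cp / F_c)
        F = F_new / F_new.mean()          # normalize by the mean over countries
        Q = Q_new / Q_new.mean()          # normalize by the mean over products
    return F, Q

# Hypothetical toy binary matrix: 3 countries (rows) x 3 products (columns).
M_toy = np.array([[1, 1, 1],
                  [1, 1, 0],
                  [1, 0, 0]])
F1, Q1 = fitness_complexity(M_toy, n_iter=1)        # a single iteration
F200, Q200 = fitness_complexity(M_toy, n_iter=200)  # many iterations

# The same function applied to a hypothetical continuous (extensive) matrix.
X_ext = np.array([[10.0, 4.0, 2.0],
                  [5.0, 5.0, 0.0],
                  [5.0, 0.0, 0.0]])
F_ext, Q_ext = fitness_complexity(X_ext, n_iter=200)
```

With uniform initial conditions, the first equation of (1) gives $F_c^{(1)} = \sum_p M_{cp}$, so after a single iteration the fitness is simply proportional to the number of products a country exports; the weighting by product complexity only enters from the second iteration onwards.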
Being defined in exactly the same way, ECI+ also has a dependence on country size, which is trivially and explicitly removed by subtracting the term $\log\left(\sum_p X_{cp}/X_p\right)$, which has a 0.97 correlation with $X_c^\infty$.

Footnote: contrary to what is written in [3], the argument of this logarithm is not the "average share that the country represents in the export of a product", but the sum over products of the shares of country $c$ in the export of each product $p$.

Footnote: we do not understand why, in the definition $[\log X_p - \log X_p^\infty]$ of PCI+ given in [3], $X_p^\infty$ is dimensionless while $X_p$ is measured in dollars. Why should this metric change if we measure exports in euros instead of dollars?

The Fitness measure as reported in [3] shows an anomalous ranking, in sharp contrast with the established literature [2]. In order to investigate this puzzle we reconstructed the input data used in that paper to the best of our ability and ran both algorithms. The results are discussed in the following.

In [3] it is stated that "Fitness Complexity ranks many Southern European countries (such as Spain, Italy, and Portugal) at the top of the ranking, and also, provides very low complexity values for advanced East Asian and European economics, such as South Korea, Switzerland, Finland, Japan, and Singapore". First, it should be noted that country rankings obtained with Fitness have been published in [2] and were ignored by Albeaik et al., who report very different results that, to the best of our knowledge, only they have obtained. Indeed, the real Fitness shows that the top 5 countries by Fitness in 2010 were Germany, China, Italy, Japan and the United States; moreover, the 5 countries that are said to have "very low complexity values" are all in the top 20% of the ranking [2]. One should note that these datasets refer only to manufacturing, without taking services into account.

Moreover, Albeaik et al. state that "for the Fitness measure, the economy of Greece is ranked higher than that of Japan, Sweden, or China." This is again wrong and inconsistent with the results we published in [2], where Greece is ranked 34th, while Japan, Sweden, and China are 4th, 14th, and 2nd, respectively. We point out that, given the strong weight the Fitness measure gives to diversification, it is really hard to believe that a dataset exists in which China is ranked below Greece.

The strong differences between the rankings published in [2] and those reported in [3] cannot be explained by a difference in the starting dataset alone. We base the following analysis on the BACI dataset [6], which we filter following the prescriptions given in [3]. On this dataset we recompute Fitness and ECI+ up to convergence, following the prescriptions given in [7] (for reproducibility, the number of iterations performed is 200). Curiously enough, the Fitness rankings reported in [3] are extremely similar to those that one would obtain on the same dataset by iterating the algorithm just once, which appears totally unreasonable. If one iterates the Fitness algorithm for 200 steps, the rankings appear much more reasonable and very different from those reported in the ECI+ paper, as well as coherent with those reported in [2].

We also found it very strange that, in order to have Spain at the top of the Fitness ranking after 1 iteration, as reported by Albeaik et al., we have to set $F_c = k_c$ and $Q_p = k_p$ instead of the usual constant initial conditions used for Fitness ($F_c = 1\ \forall c$ and $Q_p = 1\ \forall p$). While the starting point of the iteration procedure becomes irrelevant when the algorithm is iterated up to convergence (and in this case Spain is never at the top), it obviously becomes more important if the number of iterations is reduced.

In order to understand which misconceptions led Albeaik et al. to such a Fitness ranking, we tried to reproduce their results by visually comparing our computations of Fitness and ECI+ with theirs. In particular, in Fig. 1 we reproduce the original comparison between ECI+ and Fitness as presented in [3] and compare it with the one recomputed by us. The upper panel is the original Fitness vs ECI+ graph taken from [3]; the center panel is our best reproduction of that plot, obtained by performing only one (1) iteration of Fitness with the initial conditions mentioned above. In the lower panel we present the comparison between ECI+ and the logarithm of the value of Fitness at convergence, obtained with 200 iterations.

Footnote: in order to reproduce the figures in [3] we standardized the Fitness. We point out that the correct procedure is to take its logarithm.

While the authors of [3] compute the logarithm of $X$ after iterating, they omit to compute the logarithm of Fitness, as is usually done in the literature [1, 2, 4, 5] in order to compare it with other macroeconomic intensive indicators. One can easily see that the best reproduction of the results presented by Albeaik et al. is obtained if the Fitness algorithm is iterated only once, which is clearly a mistake. On the contrary, if the same algorithm is iterated up to convergence, the two measures correlate more, given the mathematical equivalence of the algorithms. However, since the input matrices differ, some deviations are still present.

Conclusions

In summary, the paper of Albeaik et al. [3] does not introduce any new algorithm but just renames the Fitness one as "ECI+". In this respect one may also note that ECI+ has nothing to do with the original ECI [8]. A detailed discussion of the problems of ECI and of the reasons for introducing the Fitness can be found in [2]. On that occasion too, the authors of [8] took inspiration from our work and learned, without ever citing it, that the linear calculation of the ECI can be solved exactly by computing an eigenvector rather than with 18 iterations.

The numerical results reported in [3] for the Fitness are incorrect and even embarrassing in view of the mathematical equivalence of ECI+ and Fitness. Albeaik et al. [3] present a totally distorted view of the situation, from both a mathematical and a numerical point of view.

References

[1] A. Tacchella, M. Cristelli, G. Caldarelli, A. Gabrielli, L. Pietronero, Scientific Reports, 723 (2012).
[2] M. Cristelli, A. Gabrielli, A. Tacchella, G. Caldarelli, L. Pietronero, PLoS ONE, e7072 (2013).
[3] S. Albeaik, M. Kaltenberg, M. Alsaleh, C. A. Hidalgo, https://arxiv.org/abs/1707.05826v3 (2017).
[4] M. Cristelli, A. Tacchella, L. Pietronero, PLoS ONE (2), e0117174 (2015).
[5] E. Pugliese, G. L. Chiarotti, A. Zaccaria, L. Pietronero, PLoS ONE (1), e0168540 (2017).
[6] G. Gaulier and S. Zignago, "BACI: international trade database at the product-level (the 1994-2007 version)" (2010).
[7] E. Pugliese, A. Zaccaria, and L. Pietronero, The European Physical Journal Special Topics 225.10: 1893-1911 (2016).
[8] C. A. Hidalgo and R. Hausmann, PNAS 106.26: 10570-10575 (2009).

Figure 1: ECI+ versus Fitness, three panels. Panel titles as extracted: "Taken from [1]" (the panel the text identifies as the original plot from [3]); "Fitness 1 Iteration (Authors' Calculations)"; "Log Fitness 200 Iterations (Authors' Calculations)".