The rich still get richer: Empirical comparison of preferential attachment via linking statistics in Bitcoin and Ethereum
Dániel Kondor, Nikola Bulatovic, József Stéger, István Csabai, Gábor Vattay
Singapore-MIT Alliance for Research and Technology, Singapore
Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
∗ E-mail: [email protected]
February 25, 2021
Abstract
Bitcoin and Ethereum transactions present one of the largest real-world complex networks that are publicly available for study, including a detailed picture of their time evolution. As such, they have received a considerable amount of attention from the network science community, besides analysis from an economic or cryptographic perspective. Among these studies, in an analysis of an early instance of the Bitcoin network, we have shown the clear presence of the preferential attachment, or “rich-get-richer”, phenomenon. Now, we revisit this question using a recent version of the Bitcoin network that has grown almost 100-fold since our original analysis. Furthermore, we carry out a comparison with Ethereum, the second most important cryptocurrency. Our results show that preferential attachment continues to be a key factor in the evolution of both the Bitcoin and Ethereum transaction networks. To facilitate further analysis, we publish a recent version of both transaction networks, and an efficient software implementation that is able to evaluate the linking statistics necessary to learn about preferential attachment on networks with several hundred million edges.
Cryptocurrencies have presented a disruptive change for both economics and computer science. Over the past years, interest in cryptocurrencies has resulted in a huge amount of money being invested in them [1] and a growing amount of research on the diverse application possibilities of the underlying technologies, e.g. blockchain and decentralized trust [2, 3, 4]. At the same time, cryptocurrencies provide a unique opportunity as financial systems where the whole list of transactions is exposed, making it possible to study the dynamic interactions taking place in them [5, 6, 7, 8]. This way, cryptocurrencies present a unique perspective by providing the complete history of how novel, alternative financial systems evolve from their inception [9, 10]. Furthermore, the availability of high-volume data from cryptocurrencies has helped research that connects network information with economic analysis to gain momentum [11, 12, 13, 14].

Considering the list of transactions as an evolving network, cryptocurrencies present one of the largest real-world networks that can be analyzed by the scientific community, with several hundred million total edges. This can be of interest in itself, as it allows theories about evolving and time-varying networks to be tested on large scales with better statistical confidence. While there is significant interest in how cryptocurrencies work from a network science perspective [15, 16, 8], we still do not have a comprehensive understanding of the relevant processes that shape their network structure.

In the current study, we evaluate key network characteristics of the transaction networks of Bitcoin and Ethereum, the two most popular cryptocurrencies. We specifically look at network evolution and the dynamics of how nodes gain new transaction partners and gain or lose balance.
We build on our previous work [5] that focused only on the initial phase of Bitcoin and found that preferential attachment drives the evolution of the transaction network and the concentration of wealth. Considering the scale of Bitcoin and the many factors influencing transaction dynamics, it is remarkable how well power-law degree distributions and preferential attachment describe its evolution. In the current work, we extend our previous analysis to a significantly longer period of trading with multiple up- and downturns in the market for both Bitcoin and Ethereum; in the case of Bitcoin, this means an almost 100-fold growth in total network size. This allows us to test whether the main transaction dynamics found previously remain significant during a timeframe when cryptocurrencies gained several orders of magnitude in total investment and became a main market component instead of just a niche. We show that a process of preferential attachment continues to be a determining factor for both cryptocurrencies and is robust with regard to the time period analyzed and the method used to reconstruct the transaction network.

We download and process the transaction history of both Bitcoin and Ethereum and reconstruct the temporally evolving transaction network. Since the main components of the network are the transactions, which are instantaneous events, there are multiple possible choices for defining a network among the addresses. We show that the activity of addresses is characterized by fat-tailed distributions in terms of temporal extent, the number of transactions they participate in, and the number of addresses they come in contact with.
Most addresses are short lived, in line with the practice of users frequently generating new addresses to obtain increased privacy, while some addresses participate in an especially large number of transactions over an extended time range, giving rise to power-law degree distributions in the aggregated network.

We perform a more in-depth analysis of transaction dynamics, testing how preferential attachment can explain the broad degree distributions seen in the aggregated transaction networks. We evaluate statistics of new edge formation using the rank function methodology developed in our previous work [5] at different levels of temporal aggregation, also testing the robustness of the results. During our analysis, we perform an in-depth comparison between Bitcoin and Ethereum, focusing on comparing the transaction dynamics of regular addresses in the two systems and between addresses and smart contracts in Ethereum.

We adapted the Bitcoin Core client program (version 0.19) by adding functionality to write out data about transactions and blocks in a CSV format (source code of our modified client is available at https://github.com/dkondor/bitcoin/tree/0.19). We used this client to download and extract the blockchain on February 7, 2020. Our data includes 616,345 blocks with 500,663,153 transactions among 609,963,452 unique addresses in total.

We construct a network among addresses by creating a directed edge between each input and output address for each transaction, excluding self-edges. The resulting network has 3,648,627,182 unique edges that appear 4,834,306,446 times in total. Note that in Bitcoin, a transaction can have multiple input and output addresses and thus can result in the addition of multiple edges [6]. Also, transaction inputs must always include the full amount received by a previous transaction output; when spending less than this amount, the remainder (or “change”) is directed to one of the addresses of the spending user in a separate transaction output. This results in a large number of self-edges in practice, which we exclude as noted above.
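The edge construction described above can be illustrated with a minimal Python sketch. It is not the code used for the paper: the function names and the toy transactions are hypothetical, and counting each edge at most once per transaction is an assumption made here for simplicity.

```python
from itertools import product


def transaction_edges(inputs, outputs):
    """Directed (source, target) address pairs for one transaction.

    Self-edges (e.g. change returned to a sending address) are skipped.
    Assumption: each pair is counted once per transaction.
    """
    return {(src, dst)
            for src, dst in product(set(inputs), set(outputs))
            if src != dst}


def build_edge_counts(transactions):
    """Count how many transactions each directed edge appears in."""
    counts = {}
    for inputs, outputs in transactions:
        for edge in transaction_edges(inputs, outputs):
            counts[edge] = counts.get(edge, 0) + 1
    return counts


# Hypothetical toy data: (input addresses, output addresses) per transaction.
toy_transactions = [
    (["A", "B"], ["C", "A"]),  # yields A->C, B->C, B->A; A->A is dropped
    (["B"], ["C"]),            # B->C appears a second time
]
edge_counts = build_edge_counts(toy_transactions)
unique_edges = len(edge_counts)                 # number of distinct edges
total_appearances = sum(edge_counts.values())   # edges counted with repeats
```

The two totals correspond to the "unique edges" and "times in total" figures quoted in the text, here for the toy data only.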
We use the OpenEthereum client to synchronize with the blockchain and then use the Ethereum-ETL client to output the transaction history in CSV format. We extracted data on February 2, 2020; this includes the first 9.4 million blocks in the chain, with a total of 628,810,973 transactions among 68,429,208 unique addresses. Ethereum transactions are one-to-one: each transaction has only one input and one output address and thus can be directly mapped to a directed edge in a network among addresses. In contrast to Bitcoin, in Ethereum the balance of an address is recorded as an intrinsic property in the system; this way, spending is possible in any denomination and does not require the “change” mechanism used in Bitcoin.
In the usual picture of growing complex networks, edges are typically considered static entities that represent existing connections, which can be gained or lost over time. For transactions in cryptocurrencies, this picture is not accurate: since transactions are instantaneous events, the presence of an edge in our network indicates that at least one transaction took place between two addresses over the lifetime of the network. Given the timescales in our analysis, edges that correspond to transactions that happened a long time ago lose their relevance (e.g. if a user abandons a certain address, as is often the case). To account for this, we can use an alternative network definition, where edges have a finite “lifetime”: they are created when a transaction happens between two addresses, and are removed if a certain time passes without repeated transactions between the same pair of addresses. Removal of an edge also decreases the network degree of the associated nodes. This means that activity is gradually “forgotten”, at least for the purpose of our analysis.

In this case, the indegree of a node naturally represents the number of distinct transaction partners it had in a recent time interval. We can choose this time interval to correspond to a presumption of “memory” in the dynamics between addresses. In practice, we created networks where the lifetime of edges was limited to one day and to 30 days, besides the fully time-aggregated network.

2.3 Preferential attachment
Preferential attachment is a model of network evolution originally suggested by Barabási and Albert [17], based on models studied earlier in different contexts by Yule and Simon [18, 19]. The original model predicts a power-law degree distribution with an exponent of γ = 2; it was later generalized to yield networks with power-law degree distributions of arbitrary exponents [20]. Preferential attachment has been observed either directly or indirectly in many real-world complex networks in the past decades [21, 22, 23], including an early phase of Bitcoin [5].

In this paper, we focus on a model of nonlinear preferential attachment [24], represented in our case by the simple rule that the probability of a new link connecting to a target node with indegree k is proportional to $k^{\alpha}$. Note that we do not restrict this process to links from new nodes, as we expect a significant number of links to be created between already existing nodes, a departure from the original Barabási-Albert model [17]. Also, the choice is made among existing nodes, thus the total probability of connecting to any node with indegree k is

$$\Pi(k) \sim n(k)\, k^{\alpha} \qquad (1)$$

where n(k) is the number of nodes with indegree k in the network (i.e. the empirical degree distribution). In an evolving network, the degree distribution will change over time, making it difficult to compare probabilities of events that occur at different times with different network configurations. We overcome this problem by calculating the transformed rank of the target indegree for each linking event:

$$R \equiv \frac{\sum_{k=0}^{k_{\mathrm{target}}} n(k)\, k^{\alpha}}{\sum_{k=0}^{k_{\max}} n(k)\, k^{\alpha}} \qquad (2)$$

where $k_{\mathrm{target}}$ is the indegree of the node receiving the new link. If our assumption about the preferential attachment process and the exponent α holds true, then the empirical R values calculated for a set of linking events will be distributed uniformly over the [0, 1] interval [5]. Since the transformed rank values R are normalized this way, values from different time points (and thus different stages of the evolving network) can be analyzed together. Furthermore, by limiting the set of events considered to smaller time intervals, the role of the preferential attachment process in network evolution at different times can be easily compared.

In practice, we can calculate transformed ranks for any value of the exponent α. In this article, we compare several α values and identify the one that best fits a uniform distribution. Note that a hypothesis of no preferential attachment (i.e. a case where network degree does not affect the probability of attracting new transaction partners) can be readily represented in this framework by α = 0.

Evaluating the statistics of preferential attachment requires calculating the R value in Eq. (2) for each “event”, i.e. possibly multiple times for each transaction, based on the actual degree distribution in the network. Since the number of transactions is on the order of hundreds of millions for both networks, a direct summation over the degree distribution (which has a runtime complexity of O(N) for a network of N nodes) is not feasible. However, using a properly augmented binary search tree as the data structure to store the degree distribution along with partial sums of $k^{\alpha}$, we are able to calculate R values in O(log N) time, making it possible to evaluate the distribution of R values over hundreds of millions of events. We describe the necessary tools in the Supplementary Material, and we publish the source code of the efficient augmented binary search tree implementation used for this purpose online [25, 26].

Both Bitcoin and Ethereum have experienced a great amount of growth over their lifetime, including multiple “peaks”, where a sudden surge of interest resulted in large upticks of both exchange price and network activity.
Since early 2018, when cryptocurrencies gained unprecedented global attention, daily activity for both Bitcoin and Ethereum has stayed at an approximately constant rate, in contrast to previous periods of growth. This could be a consequence of getting close to the technical limits of transaction volume that the networks are able to handle (see the footnote below). Also, since the beginning of 2018, the total capitalization of cryptocurrencies (for simplicity, defined as the total value of coins in circulation based on the current exchange rate) has approached that of publicly traded stocks with the highest capitalization; this could limit further speculative investment in them.

Footnote: Both Bitcoin and Ethereum have hard limits on the amount of data, and thus the number of transactions, that can be included in blocks (Bitcoin directly limits the block size, while Ethereum limits the maximum gas amount to be used in blocks). Approaching this limit will result in transaction fees increasing (since miners will prefer to include transactions with higher fees). This functions as a natural feedback loop that discourages creating too many transactions and thus limits the network activity.

Figure 1: Timeline of activity in the Bitcoin network, measured by the number of nodes (addresses) and edges active each day on a linear (left) and logarithmic (right) scale. We see that the activity in Bitcoin experienced steady growth over several years after an initial surge of interest in 2011. In recent years, growth has tapered off, with activity stabilizing around a few million edges per day.

Figure 2: Timeline of activity in the Ethereum network, measured by the number of nodes (addresses) and edges active each day on a linear (left) and logarithmic (right) scale. Growth of activity here is characterized by two distinct phases: an approximately exponential growth phase in the first 2.[…]

Figure 3: Distribution of network indegrees (left) and address balances (right) for Bitcoin (axes: frequency vs. address indegree, and frequency vs. balance [BTC]; one curve per year). Indegrees are determined by the total number of distinct transaction partners over the lifetime of the network. Both of these distributions are fat-tailed and are robust over the period of almost ten years despite the size of the network increasing by multiple orders of magnitude. The black line in the left figure shows a power-law fit for the final distribution that has an exponent of 2.68. The fit was carried out with the plfit package [27], based on the algorithm of Clauset et al. [28].

Figure 4: Indegree distribution of regular addresses (left) and contract addresses (right) in Ethereum (axes: frequency vs. address indegree, and frequency vs. contract indegree; one curve per year from 2016 to 2020). These distributions are also fat-tailed and are well approximated by power laws, similarly to Bitcoin. Again, the time evolution is robust over a period of almost five years, during which the Ethereum network grew over 100-fold. Black lines show power-law fits for the final distribution, with exponents of −[…].54 and −[…].19 for addresses and contracts, respectively. Fits were carried out with the plfit package [27], based on the algorithm of Clauset et al. [28].

We perform a simple characterization of structure by looking at the degree distribution of the transaction networks. More specifically, we are interested in indegree distributions, since these can be interpreted as a measure of the capacity to attract interaction with external entities. Both networks are characterized by fat-tailed distributions over their lifetime that are well approximated by power laws (Figs. 3 and 4). The stability in the shape of these distributions is especially remarkable considering that the different stages of the networks depicted in Figs. 3 and 4 represent an over 100-fold increase in size (an over 10,000-fold increase in the case of Bitcoin when comparing very early instances with the latest ones).
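As a rough illustration of how a power-law exponent such as those quoted for Figs. 3 and 4 can be estimated, the sketch below uses the approximate maximum-likelihood estimator for integer-valued data given by Clauset et al. [28]. It is not the plfit package: the function name and toy data are hypothetical, the lower cutoff x_min is assumed to be known, and the full method of Clauset et al. additionally selects x_min by minimizing a Kolmogorov-Smirnov distance, which is omitted here.

```python
import math


def powerlaw_exponent_mle(values, x_min):
    """Approximate MLE of a power-law exponent for integer data.

    Uses gamma ~ 1 + n / sum(ln(x / (x_min - 0.5))) over the tail
    x >= x_min, with x_min >= 1 assumed to be given.
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical toy indegree sample; real inputs would be the indegrees
# of all addresses in the aggregated network.
toy_indegrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
gamma_hat = powerlaw_exponent_mle(toy_indegrees, x_min=1)
```

With only a handful of points the estimate is not meaningful; the sketch is meant to show the shape of the calculation, not to reproduce the fitted values.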
We test for the presence of preferential attachment by considering all transactions that add new links to the aggregated networks and calculating transformed ranks according to Eq. (2). In Figs. 5 and 7, we display the transformed ranks in order, i.e. as a function of their cumulative distribution function (CDF), for the Bitcoin and Ethereum transaction networks, and for the evolution of Bitcoin balances. In each case, a perfect fit with the model of nonlinear preferential attachment (i.e. Eq. (1)) would be a straight line, corresponding to the case where the transformed ranks are uniformly distributed in the [0, 1] interval. Finding the exponent that best describes the process means finding the case where a straight line best approximates the distribution of transformed rank values.

In most cases, a significant feature is that the distributions do not start from zero. This means that there is a large number of transactions that target newly created addresses, in contrast to the original nonlinear preferential attachment model, where the probability of an edge targeting a non-existent node (i.e. a node with a degree of zero) is zero. This is understandable given that users can freely create any number of addresses and are advised to often move their wealth to new addresses. Also, many service providers create unique addresses for their customers, which necessarily have zero degree at first. Given this, we might need to restrict the preferential attachment model to apply only to existing addresses, while we acknowledge that linking to new addresses is governed by more specific rules that are relevant to cryptocurrency system usage.

Given this observation, we focus only on nonzero transformed ranks when considering whether they can be fitted with a uniform distribution. Graphically, this corresponds to starting the lines that represent such uniform distributions (the black lines in Figs. 5 and 7) from the CDF value that corresponds to the first nonzero transformed rank.

In each case, we see strong evidence for the presence of a preferential attachment process. This is clear from the fact that ranks calculated under the α = 0 assumption always result in a much worse fit than α > 0 […] α presumed exponent in Figs. 6 and 8. Overall, exponents around α = 1 give the best fits, but there are some further interesting observations regarding typical values.

In the case of the Bitcoin transaction network, linear preferential attachment is the most plausible model for newly created edges, either from new or from existing nodes.
This is consistent with our earlier results [5] that were obtained for this network at a much earlier stage. For the case of repeated edges (i.e. repeated transactions on edges that appeared before), we see a slightly superlinear case, with α = 1.15 and α = 1.[…] α = 0.85 being the most plausible exponent. This is again consistent with our earlier results [5].

In the case of Ethereum, we separately analyze the case where edges connect to regular addresses (left column in Fig. 7; top row in Fig. 8) and the case where the target of an edge is a smart contract (right column in Fig. 7; bottom row in Fig. 8). For regular addresses, we see some evidence of superlinear preferential attachment (α = 1.15 being the most plausible exponent); nevertheless, a uniform distribution does not seem to be a very good fit in this case, as we see significant further features in the distribution of transformed ranks in Fig. 7. Still, we can say that a form of preferential attachment is important in this process, since the case of α = 0 gives a much worse agreement with the empirical distribution of transformed ranks than any other case. For smart contracts, the distributions fit better and suggest a slightly sublinear process, with α = 0.85 being the most plausible exponent, with the exception of the case where a newly created address initiates a transaction; in this case, α = 1 gives a better fit.

Figure 5: Testing for preferential attachment in Bitcoin. The four panels show the cumulative distribution of transformed ranks for four different types of events (panels: new edges from existing nodes; new edges from new nodes; transactions on existing edges; balances, weighted; axes: CDF vs. calculated rank; curves for a = 0.00, 0.25, 0.50, 0.70, 0.85, 1.00, 1.15, 1.30 and 1.50). All cases exhibit a clear sign of preferential attachment. At the same time, there is a clear case of transactions that target new nodes (i.e. nodes with zero degree). This is understandable given the nature of Bitcoin, where users are encouraged to frequently generate new addresses to enhance privacy. Black lines show the expected ideal (i.e. uniform) distribution. Kolmogorov-Smirnov differences from these distributions are shown in Fig. 6.

Figure 6: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Bitcoin, i.e. for the results displayed in Fig. 5 (panels: new edges from existing nodes; new edges from new nodes; existing edges; balances; axes: exponent vs. K-S difference).

Figure 7: Testing for preferential attachment in Ethereum. The left column shows edges where the target is a regular address, while the right column shows edges where the target is a smart contract (rows: new edges from existing nodes; new edges from new nodes; transactions on existing edges; axes: CDF vs. calculated rank; curves for a = 0.00, 0.25, 0.50, 0.70, 0.85, 1.00, 1.15, 1.30 and 1.50). Black lines show the expected ideal (i.e. uniform) distribution. Kolmogorov-Smirnov differences from these distributions are shown in Fig. 8.

Figure 8: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Ethereum, i.e. for the results displayed in Fig. 7 (columns: new edges from existing nodes; new edges from new nodes; transactions on existing edges; axes: exponent vs. K-S difference). Top row: results for transactions targeting addresses; bottom row: results for transactions targeting contracts.

We repeated the procedure of calculating the transformed ranks for variants of the transaction networks where edges are assumed to have limited lifetimes, i.e. one day or 30 days. This means that indegrees of nodes can decrease when edges are removed. Detailed results are shown in the Supplementary Material, in Figs. S1–S7. These results are highly consistent with what we have obtained for the fully time-aggregated network, also showing evidence of preferential attachment. Best-fitting exponents are very similar in all cases for Bitcoin, while for Ethereum addresses, we see slightly higher exponents for short time intervals, hinting at a preference for addresses that were already the target of high activity recently.
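The limited-lifetime network variant discussed above can be sketched as follows. This is a minimal illustration, not the implementation used for the paper: the class name, the toy events and the time units are hypothetical, and events are assumed to arrive in chronological order.

```python
from collections import defaultdict, deque


class ExpiringEdgeNetwork:
    """Directed network whose edges are removed after `lifetime` time units
    without a repeated transaction between the same pair of addresses."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.last_seen = {}               # edge -> time of latest transaction
        self.queue = deque()              # (time, edge), chronological order
        self.indegree = defaultdict(int)  # node -> number of active in-edges

    def _expire(self, now):
        # Drop every edge whose latest transaction is at least `lifetime` old.
        while self.queue and self.queue[0][0] + self.lifetime <= now:
            t, edge = self.queue.popleft()
            if self.last_seen.get(edge) == t:  # not refreshed since time t
                del self.last_seen[edge]
                self.indegree[edge[1]] -= 1

    def add_transaction(self, now, src, dst):
        self._expire(now)
        edge = (src, dst)
        if edge not in self.last_seen:    # new (or previously expired) edge
            self.indegree[dst] += 1
        self.last_seen[edge] = now
        self.queue.append((now, edge))


# Hypothetical toy events: (time, source, target), lifetime of 10 time units.
net = ExpiringEdgeNetwork(lifetime=10)
for t, src, dst in [(0, "A", "C"), (5, "B", "C"), (12, "A", "D"),
                    (14, "B", "C"), (20, "X", "Y")]:
    net.add_transaction(t, src, dst)
```

In this toy run, the edge A->C expires before time 12, while B->C is refreshed at time 14 and stays active, so the indegree of C reflects only its recent transaction partners.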
So far, we have evaluated statistics of preferential attachment in a time-aggregated fashion, i.e. we considered all transactions that happened over the lifetime of the networks when looking at the distribution of transformed ranks. To gain more insight into the process of network evolution, we evaluated the distribution of transformed ranks in shorter, half-year-long time intervals, and show the Kolmogorov-Smirnov distances as a function of exponents in Figs. 9, 10 and 11. We see that while the best fit is achieved around the typical value of exponents found previously (see Figs. 6 and 8), there is some noticeable variation, with some time periods showing slightly smaller or larger exponents as best fits. This hints that there might be important time-dependent processes shaping the evolution of the transaction networks beyond preferential attachment, as also evidenced by the deviations of the transformed rank distributions from a perfect fit.
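To make the procedure concrete, the following sketch computes transformed ranks for a stream of linking events and the Kolmogorov-Smirnov distance of the nonzero ranks from a uniform distribution, for several candidate exponents. It is an illustration under stated assumptions rather than the published implementation [25, 26]: a Fenwick (binary indexed) tree keyed by degree is swapped in for the augmented binary search tree, every event is assumed to add a new distinct in-edge to its target, nodes are assumed to enter the network with indegree 0 just before their first event, and all names and the toy data are hypothetical.

```python
from collections import Counter


class FenwickTree:
    """Prefix sums over indices 0..size-1 with O(log size) update and query."""

    def __init__(self, size):
        self.tree = [0.0] * (size + 1)

    def add(self, index, delta):
        i = index + 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, index):
        """Sum of entries 0..index (inclusive)."""
        i, total = index + 1, 0.0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def transformed_ranks(targets, alpha):
    """Transformed rank R of each event, where targets[i] receives a new in-edge."""
    if not targets:
        return []
    max_degree = max(Counter(targets).values())
    weight = [float(k) ** alpha for k in range(max_degree + 1)]  # k^alpha
    tree = FenwickTree(max_degree + 1)   # entry k holds n(k) * k^alpha
    degree, total, ranks = {}, 0.0, []
    for node in targets:
        if node not in degree:           # new node enters with indegree 0
            degree[node] = 0
            tree.add(0, weight[0])
            total += weight[0]
        k = degree[node]
        ranks.append(tree.prefix_sum(k) / total if total > 0 else 0.0)
        tree.add(k, -weight[k])          # move the node from degree k ...
        tree.add(k + 1, weight[k + 1])   # ... to degree k + 1
        total += weight[k + 1] - weight[k]
        degree[node] = k + 1
    return ranks


def ks_distance_from_uniform(ranks):
    """K-S distance between the nonzero ranks and the uniform law on [0, 1]."""
    sample = sorted(r for r in ranks if r > 0.0)
    n = len(sample)
    if n == 0:
        raise ValueError("no nonzero ranks")
    return max(max((i + 1) / n - r, r - i / n) for i, r in enumerate(sample))


# Hypothetical toy stream of target nodes, swept over candidate exponents.
toy_targets = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
ks_by_alpha = {alpha: ks_distance_from_uniform(transformed_ranks(toy_targets, alpha))
               for alpha in (0.0, 0.5, 1.0, 1.5)}
```

Repeating the sweep on events restricted to a given half-year window would give one curve of K-S distance versus exponent per window. Note that repeated floating-point additions and subtractions in the tree accumulate rounding error, which a production implementation would need to control.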
Our results confirm that preferential attachment is a key component shaping the evolution of cryptocurrency transaction networks, contributing to the heavy-tailed degree distributions that arise. This is true regardless of the time scale considered, as focusing only on the subnetworks of recent transaction partners results in very similar statistics of edge creation and activity. While our previous results showed the presence of preferential attachment in the early Bitcoin network, it is remarkable that the same dynamic is present over a much longer time period that involved an almost 100-fold growth in network size and several up- and downturns in the market.

The findings of preferential attachment and heavy-tailed degree distributions match well with other findings about networks that describe interactions between complex and self-organizing social, technological or economic phenomena. They are also consistent with the picture of cryptocurrency networks being made up of a few very large players interacting with regular users who have limited activity, especially when considered on the level of individual addresses.

Our work suggests several future directions for research. First, while we find that preferential attachment is consistently present in all of the studied networks over their lifetime, our results hint that the detailed dynamics of the process (as represented by the best-fitting exponent, and also the shape of the distribution of transformed ranks) change over time (see Figs. 9, 10 and 11). A more in-depth investigation of these changes could lead to new insights about different phases of cryptocurrency usage and how they are linked to structural properties of the transaction network.

Figure 9: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Bitcoin, for distributions disaggregated over time (panels: new edges from existing nodes; new edges from new nodes; transactions on existing edges; axes: exponent vs. K-S difference). Each line corresponds to a distribution that was compiled based on the events taking place in the six months prior to it.

Figure 10: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Ethereum, for transactions targeting regular addresses, distributions disaggregated over time (panels: new edges from existing nodes; new edges from new nodes; transactions on existing edges; axes: exponent vs. K-S difference). Each line corresponds to a distribution that was compiled based on the events taking place in the six months prior to it.

Figure 11: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Ethereum, for transactions targeting smart contracts, distributions disaggregated over time (panels: new edges from existing nodes; new edges from new nodes; transactions on existing edges; axes: exponent vs. K-S difference). Each line corresponds to a distribution that was compiled based on the events taking place in the six months prior to it.

Second, while the overall trend of preferential attachment is quite clear, there are systematic deviations from a perfect fit to the presumed form (Eq. (1)). It is an open question whether these could be explained by modifying the functional form or extending it to include readily available properties of nodes. Research in this direction could uncover more detailed driving forces of transaction network evolution and provide new, generalizable models of network growth [29].

Finally, depending on the availability of datasets, a comparison between cryptocurrencies and other types of economic or financial transaction networks could inform us about the generalizability of our findings and also help in better understanding the role cryptocurrencies play in the global economy [30, 9], a still widely debated subject. To facilitate further research, we publish the data and code used in the current work [31, 32, 25, 26].
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Author Contributions
DK, GV and IC contributed to the conception and design of the study. DK contributed software. NB and JS performed data collection and preprocessing. DK, NB and JS analyzed data. DK performed further data analysis and drafted the paper. All authors contributed to manuscript revision, and read and approved the submitted version.
Acknowledgements
This research was supported by the Hungarian Ministry of Innovation and Technology and the National Research, Development and Innovation Office within the Quantum Information National Laboratory of Hungary.

This research is supported by the Singapore Ministry of National Development and the National Research Foundation, Prime Minister’s Office, under the Singapore-MIT Alliance for Research and Technology (SMART) programme.
Data Availability Statement
The datasets for this study are made available as Refs. [31] and [32]. Code and scripts used to generate the main results of this paper are available as Ref. [26].
References
[1] Dirk G. Baur, Ki Hoon Hong, and Adrian D. Lee. Bitcoin: Medium of exchange or speculative assets? Journal of International Financial Markets, Institutions and Money, 2018. ISSN 10424431. https://doi.org/10.1016/j.intfin.2017.12.004
[2] Zibin Zheng, Shaoan Xie, Hong-Ning Dai, Xiangping Chen, and Huaimin Wang. Blockchain Challenges and Opportunities: A Survey. International Journal of Web and Grid Services, pages 1–24, 2016. http://inpluslab.sysu.edu.cn/files/blockchain/blockchain.pdf
[3] Joseph Bonneau, Andrew Miller, Jeremy Clark, Arvind Narayanan, Joshua A Kroll, and Edward W Felten. Research Perspectives and Challenges for Bitcoin and Cryptocurrencies. IEEE Symposium on Security and Privacy, pages 104–121, 2015. https://doi.org/10.1109/SP.2015.14
[4] István András Seres, László Gulyás, Dániel A. Nagy, and Péter Burcsi. Topological Analysis of Bitcoin’s Lightning Network. Mathematical Research for Blockchain Economy, pages 1–12, 2020. https://doi.org/10.1007/978-3-030-37110-4_1
[5] Dániel Kondor, Márton Pósfai, István Csabai, and Gábor Vattay. Do the rich get richer? An empirical analysis of the BitCoin transaction network. PLoS ONE, 9(2):e86197, 2014. https://doi.org/10.1371/journal.pone.0086197
[6] Silivanxay Phetsouvanh, Anwitaman Datta, and Frédérique Oggier. Analysis of multi-input multi-output transactions in the Bitcoin network. Concurrency Computation, (June), 2019. https://doi.org/10.1002/cpe.5629
[7] Frédérique Oggier, Anwitaman Datta, and Silivanxay Phetsouvanh. An ego network analysis of sextortionists. Social Network Analysis and Mining, 10(1), 2020. https://doi.org/10.1007/s13278-020-00650-x
[8] Jiajing Wu, Jieli Liu, Yijing Zhao, and Zibin Zheng. Analysis of Cryptocurrency Transactions from a Network Perspective: An Overview. arXiv preprint, 2020. http://arxiv.org/abs/2011.09318
[9] Stefan Seebacher and Maria Maleshkova. A Model-driven Approach for the Description of Blockchain Business Networks. Proceedings of the 51st Hawaii International Conference on System Sciences, 9, 2018. https://doi.org/10.24251/hicss.2018.442
[10] Matthew Francis Dixon, Cuneyt Gurcan Akcora, Yulia R Gel, and Murat Kantarcioglu. Blockchain Analytics for Intraday Financial Risk Modeling. Digital Finance, 1:67–89, 2019. https://doi.org/10.1007/s42521-019-00009-8
[11] Dániel Kondor, István Csabai, János Szüle, Márton Pósfai, and Gábor Vattay. Inferring the interplay between network structure and market effects in Bitcoin. New Journal of Physics, 16(12):125003, dec 2014. https://doi.org/10.1088/1367-2630/16/12/125003
[12] Cuneyt Gurcan, Matthew F Dixon, Yulia R Gel, and Murat Kantarcioglu. Bitcoin risk modeling with blockchain graphs. Economics Letters, 173:138–142, 2018. https://doi.org/10.1016/j.econlet.2018.07.039
[13] Cuneyt G Akcora, Asim Kumer Dey, Yulia R Gel, and Murat Kantarcioglu. Forecasting Bitcoin Price with Graph. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 765–776. Springer International Publishing, 2018. https://doi.org/10.1007/978-3-319-93040-4
[14] Marcell Tamás Kurbucz. Predicting the price of Bitcoin by the most frequent edges of its transaction network. Economics Letters, 184:108655, 2019. https://doi.org/10.1016/j.econlet.2019.108655
[15] Jiaqi Liang, Linjing Li, and Daniel Zeng. Evolutionary dynamics of cryptocurrency transaction networks: An empirical study. PLoS ONE, 13(8):e0202202, 2018. https://doi.org/10.1371/journal.pone.0202202
[16] Amir Pasha Motamed and Behnam Bahrak. Quantitative analysis of cryptocurrencies transaction graph. Applied Network Science, 4(1), 2019. https://doi.org/10.1007/s41109-019-0249-6
[17] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(October):509–512, 1999. https://doi.org/10.1126/science.286.5439.509
[18] G. Udny Yule. A Mathematical Theory of Evolution Based on the Conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character, 213:21–87, 1925.
[19] Herbert A Simon. On a class of skew distribution functions. Biometrika, 42(3–4):425–440, 1955. https://doi.org/10.1093/biomet/42.3-4.425
[20] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks with aging of sites. Physical Review E, 62:1842, 2000. https://doi.org/10.1103/PhysRevE.62.1842
[21] H Jeong, Z Néda, and Albert-László Barabási. Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 567, 2003. https://doi.org/10.1209/epl/i2003-00166-9
[22] J Kunegis, Marcel Blattner, and Christine Moser. Preferential attachment in online networks: Measurement and explanations. In WebSci ’13, pages 205–214, 2013. https://doi.org/10.1145/2464464.2464514 http://arxiv.org/abs/1303.6271
[23] Matjaz Perc. The Matthew effect in empirical data. Journal of The Royal Society Interface, 11(July):20140378, 2014. https://doi.org/10.1098/rsif.2014.0378
[24] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Physical Review Letters, 85(21):4629–4632, 2000. https://doi.org/10.1103/PhysRevLett.85.4629
[25] Dániel Kondor (2020). Generalized order-statistic tree implementation. https://github.com/dkondor/orbtree
[26] Dániel Kondor (2021). Order-statistic tree with example code for preferential attachment testing. https://github.com/dkondor/patest_new
[27] Tamas Nepusz (2020). plfit – fitting power-law distributions to empirical data. https://github.com/ntamas/plfit
[28] A Clauset, CR Shalizi, and MEJ Newman. Power-law distributions in empirical data. SIAM review, 51(4):661–703, 2009. https://doi.org/10.1137/070710111
[29] Luka Naglić and Lovro Šubelj. War pact model of shrinking networks. PLoS ONE, 14(10):e0223480, 2019. https://doi.org/10.1371/journal.pone.0223480
[30] Ken Alabi. Digital blockchain networks appear to be following Metcalfe’s Law. Electronic Commerce Research and Applications, 24:23–29, 2017. ISSN 1567-4223. https://doi.org/10.1016/j.elerap.2017.06.003
[31] Dániel Kondor, Márton Pósfai, István Csabai, and Gábor Vattay. Bitcoin transaction network, 2020. Dryad dataset. https://doi.org/10.5061/dryad.qz612jmcf
[32] Dániel Kondor, Nikola Bulatovic, József Stéger, István Csabai, and Gábor Vattay. Ethereum transaction network, 2021. Zenodo dataset. https://doi.org/10.5281/zenodo.4543269
[33] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to Algorithms, 3rd edition. MIT Press, 2009.
[34] Austern MH, Stroustrup B, Thorup M, Wilkinson J (2003). Untangling the balancing and searching of balanced binary search trees.
Software - Practice and Experience, (13), 1273–1298. https://doi.org/10.1002/spe.564
[35] Kondor D (2015). Empirical analysis of complex social and financial networks. PhD thesis, Eötvös Loránd University, Budapest, Hungary. https://doi.org/10.15476/ELTE.2015.083

Supplementary Material

Beside the analysis presented in the main text, we evaluated preferential attachment statistics in two additional cases, where we assume that each edge in the networks has a limited “lifetime”: it is erased after a given period of time if activity is not repeated on it. In practice, we repeated our main analysis with presumed lifetimes of one and 30 days. The former corresponds to a case where we assume that linking preference is related to the incoming transactions an address had on the previous day, while the latter assumes that transactions in the past month are considered.

Results are shown as the distribution of transformed ranks in Fig. S1 for Bitcoin and in Figs. S4 and S5 for Ethereum, while we show the Kolmogorov-Smirnov distances from uniform distributions as a function of the $\alpha$ exponent in Figs. S2 and S3 for Bitcoin and in Figs. S6 and S7 for Ethereum.

To evaluate statistics of preferential attachment, we need to calculate the following transformed rank for each edge linking event (either new edge creation or repeated transactions on the same edge):

$$R \equiv \frac{\sum_{k=0}^{k_{\mathrm{target}}} n(k)\, k^{\alpha}}{\sum_{k=0}^{k_{\max}} n(k)\, k^{\alpha}} \tag{3}$$

For a network with $N$ nodes, a naive implementation will have a runtime complexity of $O(N)$ for each event. If we have a total of $M$ events (with $M \sim 620$ million for Ethereum and $M \sim$ … for Bitcoin), the total runtime complexity will be $O(NM)$, assuming that updating node degrees is done in $O(1)$ time. Since performing this computation would be extremely slow even on modern hardware, we created an implementation based on an augmented red-black tree that has a total runtime complexity of $O(M \log N)$.

Specifically, our implementation can be considered a generalized order-statistic tree. An order-statistic tree is a binary search tree that allows calculating the rank of any element and finding an element with a given rank efficiently (in $O(\log N)$ time; for a formal introduction to binary search trees, see e.g. [33]). A generalization of order-statistic trees that allows the efficient calculation of the partial sum of any value associated with its elements can be obtained in a straightforward way, by storing such partial sums as additional data in a suitably augmented binary search tree [34]. A more complete treatment of augmented binary search trees and their usage for calculating transformed ranks was given as Appendix A1 in Ref. [35]; in the following, we provide a summary of key concepts.

For our particular use case, we need to calculate the sums of the $k^{\alpha}$ values. In practice, we start with an implementation of a standard red-black tree [33] that allows insertion and removal of nodes in $O(\log N)$ time. We use the network degrees ($k$) as keys, and store $n(k)$, i.e. the number of nodes with degree $k$, as the mapped value in each node of the tree. Furthermore, in each tree node we also store the partial sum of $n(k)\, k^{\alpha}$ over the subtree rooted at that node.
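To make Eq. (3) and the cost of its naive evaluation concrete, the following Python sketch computes the transformed rank directly from a table of degree counts $n(k)$, together with the Kolmogorov-Smirnov distance of a sample of ranks from the uniform distribution. It is an illustration only, not the published code of Ref. [26]: the function names and the `inclusive` switch (whether the term of the target degree itself enters the numerator, as the upper limit of Eq. (3) suggests) are assumptions made here.

```python
def transformed_rank_naive(counts, k_target, alpha, inclusive=True):
    """Evaluate an Eq. (3)-style ratio by a full pass over the degree table.

    counts: dict mapping degree k -> n(k), the number of nodes with degree k.
    The loop touches every distinct degree, so one call costs O(N) in the
    worst case; this is the cost the tree-based approach avoids.
    `inclusive` is an assumption of this sketch, see the text above.
    """
    numerator = 0.0
    denominator = 0.0
    for k, n in counts.items():
        weight = n * float(k) ** alpha  # n(k) * k^alpha
        denominator += weight
        if k < k_target or (inclusive and k == k_target):
            numerator += weight
    return numerator / denominator


def ks_distance_from_uniform(ranks):
    """Kolmogorov-Smirnov distance between the empirical CDF of `ranks`
    (values in [0, 1]) and the uniform distribution on [0, 1]."""
    xs = sorted(ranks)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        distance = max(distance, i / n - x, x - (i - 1) / n)
    return distance


if __name__ == "__main__":
    # Hypothetical toy degree table: three nodes of degree 1, one of degree 2,
    # one of degree 4.
    toy_counts = {1: 3, 2: 1, 4: 1}
    toy_ranks = [transformed_rank_naive(toy_counts, k, alpha=1.0) for k in (1, 2, 4)]
    print(toy_ranks, ks_distance_from_uniform(toy_ranks))
```

Repeating the first function for every event gives the $O(NM)$ behaviour discussed above; the second function is the kind of summary statistic that could be computed over all ranks for each candidate exponent.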
We ensure that the stored partial sums are updated on each operation of the tree; this can be achieved in $O(\log N)$ time, since each update only needs to be propagated upward until it reaches the root of the tree.

When the degree of a node in our transaction network changes, we find the node with the corresponding key in our red-black tree, decrease the stored count (or remove the tree node if the count would reach zero), and recursively update the stored partial sums. After this, we either add a new tree node with the new degree or increase the count if such a node already exists. Again, we take care to update partial sums.

When we need to calculate a transformed rank value for a target degree $k^*$, we first find the tree node with key $k^*$. Then we recursively accumulate the partial sums stored in the left subtrees of nodes on the path from the selected node up to the tree root, according to Algorithm S1. This gives us the numerator in Eq. (3), while the denominator is simply given by the partial sum stored in the tree root node. Again, this operation can be carried out in $O(\log N)$ time, since each level of the tree is visited at most once.

A general implementation of the augmented red-black tree used in the current work is available as Ref. [25]; a somewhat specialized version along with code and scripts calculating transformed rank statistics is available as Ref. [26].

[Figure S1 panels. Columns: one day edge lifetime; 30 day edge lifetime. Rows: Bitcoin, new edges from existing nodes; Bitcoin, new edges from new nodes; Bitcoin, transactions on existing edges. Axes: calculated rank, CDF (both from 0 to 1). Curves: α = 0.00, 0.25, 0.50, 0.70, 0.85, 1.00, 1.15, 1.30, 1.50.]
Figure S1: Testing for preferential attachment in Bitcoin, assuming a one day (left column) or a 30 day “lifetime” for edges (right column). Three rows show the cumulative distribution of transformed ranks in the case of three different types of events, all of which exhibit preferential attachment in a very similar fashion to the full-time aggregated networks. Black lines show the expected ideal (i.e. uniform) distribution. Kolmogorov-Smirnov differences from these distributions are shown in Figs. S2 and S3.

[Figure S2 panels: new edges from existing nodes; new edges from new nodes; existing edges. Axes: exponent, K-S difference.]
Figure S2: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Bitcoin, assuming a one day “lifetime” of edges, i.e. for results displayed in Fig. S1.

[Figure S3 panels: new edges from existing nodes; new edges from new nodes; existing edges. Axes: exponent, K-S difference.]
Figure S3: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Bitcoin, assuming a 30 day “lifetime” of edges, i.e. for results displayed in Fig. S1.

[Figure S4 panels: Ethereum, one day edge lifetime. Columns: addresses; contracts. Rows: new edges from existing nodes; new edges from new nodes; transactions on existing edges. Axes: calculated rank, CDF (both from 0 to 1). Curves: α = 0.00, 0.25, 0.50, 0.70, 0.85, 1.00, 1.15, 1.30, 1.50.]
Figure S4: Testing for preferential attachment in Ethereum, assuming a one day “lifetime” for edges. The left column shows edges where the target is a regular address, while the right column shows edges where the target is a smart contract. Black lines show the expected ideal (i.e. uniform) distribution. Kolmogorov-Smirnov differences from these distributions are shown in Fig. S6.

[Figure S5 panels: Ethereum, 30 day edge lifetime. Columns: addresses; contracts. Rows: new edges from existing nodes; new edges from new nodes; transactions on existing edges. Axes: calculated rank, CDF (both from 0 to 1). Curves: α = 0.00, 0.25, 0.50, 0.70, 0.85, 1.00, 1.15, 1.30, 1.50.]
Figure S5: Testing for preferential attachment in Ethereum, assuming a 30 day “lifetime” for edges. The left column shows edges where the target is a regular address, while the right column shows edges where the target is a smart contract. Black lines show the expected ideal (i.e. uniform) distribution. Kolmogorov-Smirnov differences from these distributions are shown in Fig. S7.

[Figure S6 panels: Ethereum, one day edge lifetime. In each row: new edges from existing nodes; new edges from new nodes; transactions on existing edges. Axes: exponent, K-S difference.]
Figure S6: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Ethereum, assuming a one day “lifetime” of edges, i.e. for results displayed in Fig. S4. Top row: results for transactions targeting addresses; bottom row: results for transactions targeting contracts.

[Figure S7 panels: Ethereum, 30 day edge lifetime. In each row: new edges from existing nodes; new edges from new nodes; transactions on existing edges. Axes: exponent (0 to 1.5), K-S difference (0 to 1).]
Figure S7: Kolmogorov-Smirnov differences from the presumed uniform distribution for the case of preferential attachment in Ethereum, assuming a 30 day “lifetime” of edges, i.e. for results displayed in Fig. S5. Top row: results for transactions targeting addresses; bottom row: results for transactions targeting contracts.

Algorithm S1
Algorithm calculating the transformed rank for one linking event.

T: red-black tree storing the degree distribution and partial sums:
    x: node of the tree corresponding to degree k_x, storing:
        n_x, the number of such network nodes
        S_x, the partial sum; S_x = S_y + S_z + n_x k_x^α if y and z are children of x
k: target degree of a transaction

x = T.find(k)                      ▷ find the tree node with the given degree
R = 0
y = x.left()
if y ≠ T.nil() then
    R = R + S_y
end if
while x ≠ T.root() do              ▷ Calculate the numerator of Eq. (3)
    p = x.parent()
    if x = p.right() then          ▷ p and its left subtree have smaller degrees
        R = R + n_p k_p^α
        y = p.left()
        if y ≠ T.nil() then
            R = R + S_y
        end if
    end if
    x = p
end while
x = T.root()                       ▷ Retrieve the denominator
R = R / S_x

Result: R, the transformed rank corresponding to a transaction targeting a node with degree k
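The tree updates and the rank query of Algorithm S1 can be sketched in Python as follows. This is a minimal illustration under stated simplifications, not the implementation published as Refs. [25, 26]: it uses a plain, unbalanced binary search tree instead of a red-black tree (so the $O(\log N)$ bound is not guaranteed), it keeps tree nodes whose count has dropped to zero, and all names (`DegreeTree`, `add`, `remove`, `change_degree`, `transformed_rank`) are hypothetical. As with the left-subtree sums of Algorithm S1, the numerator covers degrees strictly below the target.

```python
class _Node:
    """Tree node for one degree k: count n(k) and the subtree partial sum."""

    def __init__(self, k, parent=None):
        self.k = k          # key: network degree
        self.n = 0          # number of network nodes with this degree
        self.s = 0.0        # sum of n(k') * k'^alpha over this subtree
        self.left = None
        self.right = None
        self.parent = parent


class DegreeTree:
    """Unbalanced BST augmented with partial sums (balancing omitted)."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.root = None

    def _weight(self, node):
        return node.n * float(node.k) ** self.alpha

    def _find(self, k, create=False):
        node, parent = self.root, None
        while node is not None and node.k != k:
            parent = node
            node = node.left if k < node.k else node.right
        if node is None and create:
            node = _Node(k, parent)
            if parent is None:
                self.root = node
            elif k < parent.k:
                parent.left = node
            else:
                parent.right = node
        return node

    def _update_sums(self, node):
        # Recompute partial sums from the changed node up to the root.
        while node is not None:
            node.s = self._weight(node)
            if node.left is not None:
                node.s += node.left.s
            if node.right is not None:
                node.s += node.right.s
            node = node.parent

    def add(self, k, count=1):
        node = self._find(k, create=True)
        node.n += count
        self._update_sums(node)

    def remove(self, k, count=1):
        node = self._find(k)
        if node is None or node.n < count:
            raise KeyError(k)
        node.n -= count  # simplification: empty tree nodes are kept
        self._update_sums(node)

    def change_degree(self, old_k, new_k):
        self.remove(old_k)
        self.add(new_k)

    def transformed_rank(self, k):
        x = self._find(k)
        if x is None:
            raise KeyError(k)
        r = x.left.s if x.left is not None else 0.0
        while x.parent is not None:
            p = x.parent
            if x is p.right:
                # p and everything in its left subtree have smaller degrees.
                r += self._weight(p)
                if p.left is not None:
                    r += p.left.s
            x = p
        return r / x.s  # x is now the root; x.s is the denominator


if __name__ == "__main__":
    tree = DegreeTree(alpha=1.0)
    for degree in (3, 1, 2, 5):   # hypothetical toy degrees, one node each
        tree.add(degree)
    tree.change_degree(1, 2)      # one node gains an incoming link
    print(tree.transformed_rank(3))
```

Recomputing each partial sum from the children (rather than adding increments) is a design choice of this sketch that avoids accumulating floating-point drift over many updates; a production version would also need rebalancing and a guard for a zero denominator.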