A Blockchain Transaction Graph based Machine Learning Method for Bitcoin Price Prediction
Xiao Li and Weili Wu
Abstract—Bitcoin, as one of the most popular cryptocurrencies, has recently attracted much attention from investors. Bitcoin price prediction is consequently a rising academic topic that can provide valuable insights and suggestions. Existing bitcoin prediction works are mostly based on tedious feature engineering, which manually designs features or factors from multiple areas, including Bitcoin blockchain information, finance, and social media sentiment. This feature engineering not only requires much human effort, but the effectiveness of the intuitively designed features cannot be guaranteed. In this paper, we aim to mine the abundant patterns encoded in bitcoin transactions, and propose the k-order transaction graph to reveal patterns at different scopes. We propose a transaction graph based feature to automatically encode the patterns. A novel prediction method is proposed to accept the features and make price predictions, which can take advantage of particular patterns from different history periods. The results of comparison experiments demonstrate that the proposed method outperforms the most recent state-of-the-art methods.
Index Terms—Bitcoin, Blockchain, Machine Learning, Price Prediction, Transaction Graph
I. INTRODUCTION
The Bitcoin blockchain [1], as the very first application of blockchain, has been attracting more and more public attention from various areas. Bitcoin is the cryptocurrency traded on the Bitcoin blockchain, and it is issued as a reward to miners for successfully mining a block. Bitcoin shares characteristics with many other financial products, e.g., stocks, gold, and crude oil [2]. The main characteristic of bitcoin is its price volatility [3], [4]. Bitcoin was traded at 123.65 USD on September 30, 2013, reached its peak at 18984.77 USD on December 18, 2017, and then fell to 9096.78 USD on July 3, 2020. With the great opportunity of earning a fortune from such striking price differences, bitcoin is becoming a popular financial asset for investment [5]. Bitcoin's market capitalization in 2017 reached 300 billion US dollars, almost equal to that of Amazon in 2016 [6].
Corresponding Author: Xiao Li, Email: [email protected]. X. Li and W. Wu are with the Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75080 USA (e-mail: [email protected]; [email protected]). This work is in part supported by NSF grant 1747818 and 1907472.
Bitcoin price forecasting can provide valuable suggestions for investors deciding whether to buy more bitcoins to earn more, or to sell their bitcoins to avoid a further shrinking of their assets. Basic bitcoin price forecasting only aims to predict the price trend, and thus only suggests whether the price will rise or fall [6], [7]. The results of trend prediction present limited information to investors, who always desire more accurate and more informative suggestions on the price, so that they can further analyze how much the price change will impact their assets.
For example, if a trend prediction system suggests that the price will drop, investors may panic and sell all their bitcoins; however, if they know that the price will only drop slightly, they may choose to wait for the price to recover, which can avoid the loss of their assets. However, only a handful of works study accurate prediction of the bitcoin price [8], [9], [10], [11]. Therefore, in this paper, we study accurate bitcoin price prediction, which is much more critical and useful in practice.
To predict the price, machine learning models, either classification models or regression models, are popularly adopted. Well-designed features are therefore required to feed the machine learning models. Existing work has proposed various features to encode the latent properties behind the Bitcoin price from multiple aspects.
The most basic features of the blockchain are indexes reflecting the information of the blockchain, such as the mean degree of addresses, the number of new addresses, the total coin amount transferred in transactions, and so on [8]. Maesa et al. also analyze the features of the blockchain by constructing a full users graph [12]. Mallqui et al. [10] include international economic indicators to reflect the financial market, such as crude oil future prices, gold future prices, S&P500 futures, NASDAQ futures, and the DAX index, which are features from a financial perspective. CerdaR et al. [11] introduce a public opinion feature into bitcoin price prediction by mining the sentiment from social media such as Twitter. Yao et al. [7] attempt to represent the opinion feature from news articles. These public opinion features are based on the intuition that people will act according to how positive or negative the opinions delivered by news articles and social media are.
Existing work mostly defines features manually for the bitcoin price prediction task.
Although the created features cover many aspects, including the blockchain network, financial market information, and even public opinions, the feature engineering is tedious, and there are also many latent features that are hard to define explicitly and manually. On the other hand, if external factors beyond the Bitcoin blockchain, such as public opinions or the financial market, contribute to the price change of bitcoin, they will eventually be reflected by changes in the Bitcoin blockchain, because the external factors influence the actions of users, and the users' actions are in turn reflected by the transactions in the Bitcoin blockchain. In this paper, we argue that the Bitcoin blockchain encodes abundant information whose latent patterns can represent the features behind the bitcoin price.
We believe that the patterns of transactions are very expressive: they can directly reflect what is going on in the blockchain, and thus represent the financial status of the bitcoin market. As mentioned in [9], if a transaction has more input addresses than output addresses, then the transaction is gathering bitcoins, indicating that some users are buying bitcoins. On the other hand, if a transaction has fewer input addresses than output addresses, then the transaction is splitting bitcoins, indicating that some users are selling bitcoins. Therefore, the patterns of the transactions can represent very useful information that can hardly be captured by manual feature engineering.
To effectively mine the transaction patterns, we employ the transaction graph to represent the Bitcoin blockchain. Further, we propose the k-order transaction subgraph to encode the transaction patterns; different values of k capture different levels of patterns.
Finally, the pattern occurrence matrix is proposed to store the frequency of the patterns occurring in the blockchain, which can represent the features of a period of the blockchain.
The main contributions of the paper can be summarized as follows:
• We propose the k-order transaction subgraph, based on the transaction graph, to represent the transaction features of the Bitcoin blockchain.
• We propose a transaction graph based feature to encode the latent patterns behind the transactions, which is further fed into a novel machine learning based prediction method that can effectively learn the characteristics of different history periods.
• To the best of our knowledge, we are the first to utilize only Bitcoin blockchain transaction information to tackle the price prediction problem.
• We evaluate the proposed method on real bitcoin price historical data, and the results demonstrate its superiority compared to recent state-of-the-art methods.
The remainder of this paper is organized as follows: In Section II, recent related work is presented. Then we describe our proposed transaction graph based price prediction methodology in Section III. In Section IV, we evaluate the proposed method. Finally, the conclusion is made in Section V.
II. RELATED WORK
The key issue of the bitcoin price prediction or forecasting task is to discover and analyze the determinants of the bitcoin price. Since Ladislav Kristoufek [13] studied the connection between Bitcoin and search queries on Google Trends and Wikipedia, the study of determinants has developed rapidly. Kristoufek's results show a positive correlation between the price level of Bitcoin and the searched terms, and an asymmetry between the effects of search queries related to prices above and below a short-term trend.
Extending the analysis of Wikipedia and Google search queries, researchers also evaluate the influence of social media or public opinions [11], [7]. Balfagih and Keselj [14] extensively explore the relationship between bitcoin tweets and prices, utilizing different language modeling approaches such as tweet embedding and N-gram modeling. Polaski et al. [15] discover that bitcoin prices are primarily driven by the popularity of Bitcoin, the sentiment expressed in newspaper reports on cryptocurrency, and the total number of transactions. Mittal et al. [16] argue that Google Trends and tweet volume data have a relevant degree of correlation with the price of Bitcoin, while no significant relation with the sentiments of tweets is discovered. Piwowarski et al. [17] analyze the relevance of particular social media topics to the high volatility of the bitcoin price as it shifted through four distinct phases across 2017 to 2018.
Georgoula et al. [18] show that the Twitter sentiment ratio has a positive correlation with bitcoin prices, while the exchange rate between the USD and the euro has a negative relation with the bitcoin price. Ciaian et al. [19] are the first to study Bitcoin price formation by considering both traditional market features and digital-currency-specific factors from the economic aspect. Brandvold et al. [20] study the contributions of Bitcoin exchanges to price discovery, and demonstrate that Mt.Gox and BTC-e are the market leaders with the highest information share. Aggarwal et al. [21] attempt to compare the effects of determinants including bitcoin factors, social media, and the gold price. Pieters and Vivanco [22] study the difference in Bitcoin prices across 11 different markets, and show that standard financial regulations can have a non-negligible impact on the market for Bitcoin.
Both Georgoula et al. [18] and Kristoufek [23] study the difference between the long-term and short-term impact of the determinants on the bitcoin price. Kristoufek [23] stresses that both time and frequency are crucial factors for Bitcoin price dynamics since the bitcoin price evolves over time, and examines how the interconnections from various sources behave in both time and different frequencies. Bouoiyour et al. [24] adopt Empirical Mode Decomposition (EMD) for Bitcoin price analysis, and the results show that long-term fundamentals contribute most to bitcoin price variation, though intuitively the bitcoin market seems a short-term speculative bubble. Chen et al. [25] analyze the dependence structure between the price and its influence factors and, based on copula theory, find that the bitcoin price has different correlation structures with different influence factors.
Bitcoin blockchain structural information is also mined for discovering the features and determinants of bitcoin prices. Akcora et al. [9] propose a bitcoin graph model, upon which chainlets are proposed to represent graph structures in Bitcoin. A k-chainlet is a subgraph of a bitcoin graph model that contains exactly k transaction nodes. Akcora et al. [9] feed both the features derived from chainlets and heuristic features into a machine learning model for price prediction. In further work [8], Akcora et al. propose the occurrence matrix and the amount matrix to encode the topological features of chainlets.
In this paper, we also adopt the concept of the occurrence matrix to encode topological features. However, we design a totally different graph representation model to reveal the topological features of the Bitcoin blockchain.
The determinants can be considered as features behind the bitcoin price change; various machine learning methods can then be adopted to learn the patterns from the features and forecast the bitcoin price [6], [26], [27]. Felizardo et al. [28], [29] compare several popular machine learning methods adopted in the bitcoin price prediction task, such as Back-propagation Neural Network (BPNN), Autoregressive Integrated Moving Average model (ARIMA), Random Forest (RF), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and WaveNets. Wu et al. [30] propose a novel LSTM with AR(2) model that outperforms the conventional LSTM model. Hashish et al. [31] use Hidden Markov Models to tackle the volatility of cryptocurrencies and predict future movements with LSTM. Nguyen et al. [32] propose hybrid methods between ARIMA and machine learning to predict the next-day bitcoin price.
III. METHODOLOGY
A. Problem Definition
Definition 1: (Bitcoin Price Trend Prediction): Given a timestamp $t + h$, where $h \in \mathbb{N}^+$, and bitcoin historical data in the time period $[t - i, t]$, where $i \in \mathbb{N}^+$, let $P_t$ denote the price of bitcoin at timestamp $t$. The bitcoin price trend prediction problem is to predict the label $L_{t+h} \in \{-1, 1\}$, where $L_{t+h} = 1$ if $P_{t+h} > P_t$, and $L_{t+h} = -1$ otherwise.
Based on the above basic Bitcoin Price Trend Prediction task, we define and study the Bitcoin Price Prediction task in Definition 2.
Definition 2: (Bitcoin Price Prediction): Given a timestamp $t + h$, where $h \in \mathbb{N}^+$, and bitcoin historical data in the time period $[t - i, t]$, where $i \in \mathbb{N}^+$, let $P_t$ denote the price of bitcoin at timestamp $t$. The bitcoin price prediction problem is to predict the price at timestamp $t + h$, $P_{t+h}$.
To handle the Bitcoin Price Prediction task defined above, a classic machine learning framework is employed in this paper. First, feature vectors are obtained from the Bitcoin blockchain in the historical time period $[t - i, t]$. Then the feature vectors are fed into machine learning models to learn and predict the value of the future bitcoin price. Since price values are continuous, the price prediction problem can be considered a classic regression problem.
B. Method Overview
Figure 1 illustrates an overview of the whole method. We propose to establish independent models for different $h$, which means each model is trained only to predict a specific future time. For example, the model for $h = 1$ is trained only for predicting tomorrow's price, and the model for $h = 2$ is only for predicting the price on the day after tomorrow. For predicting the bitcoin price at time $t'$, $h'$ independent models are trained and produce $h'$ different values of the estimated price $\hat{p}_{t'}$; the integrator then generates the final estimated value based on the outputs of all the independent models.
Each independent machine learning model takes the features generated from transaction graphs as inputs, and is trained separately. The features, e.g. $v_1, \dots, v_{h'}$, are presented in later sections. Since the models are trained for different $h$, the time periods utilized for the models should also be different. $h$ can also be considered as the length of the history window, which is a hyperparameter to adjust how much historical information is considered in the features. The start point and the end point of the historical time period are adjusted based on the value of $h'$. In total, $h'$ different periods are utilized to learn the patterns for predicting the price on the same day, $\hat{p}_{t'}$.
The integrator can be a simple linear function:
$$\hat{p}_{t'} = \alpha_1 \hat{p}^{1}_{t'} + \alpha_2 \hat{p}^{2}_{t'} + \dots + \alpha_{h'} \hat{p}^{h'}_{t'} \tag{1}$$
where $\alpha_1 + \alpha_2 + \dots + \alpha_{h'} = 1$.
In this paper, we elaborately design the weights. Let $W_i = [\alpha_1, \alpha_2, \alpha_3, \dots, \alpha_i]$. Specially, if the history window size is 1, which indicates we only employ one model to make the prediction, $W_1 = [\alpha_1] = [1.0]$. As the history window size increases, for $i > 1$, $W_i$ is defined recursively by Equation 2:
$$\begin{aligned} W_{i+1}[k] &= W_i[k] \quad (k = 1, \dots, i-1) \\ W_{i+1}[i] &= W_i[i] \cdot r \\ W_{i+1}[i+1] &= W_i[i] \cdot (1 - r) \end{aligned} \tag{2}$$
where $r$ controls the speed of decay of the weights corresponding to further history. Equation 2 maintains the property that $\sum_{\alpha_j \in W_i} \alpha_j = 1$ for $i > 0$.
C. Transaction Graph
In order to mine the bitcoin blockchain information, we obtain the features by employing a transaction graph to represent the bitcoin blockchain.
There are similar concepts of transaction graph in the literature [8], [12]; here we define the transaction graph as follows.
Definition 3: (Transaction Graph): A transaction graph is a directed graph $G = (A, T, E)$, where $A$ is the set of addresses in the blockchain, $T$ is the set of transactions of the blockchain, and $E$ is the set of directed links either from $a_i \in A$ to $t_k \in T$, indicating that $a_i$ is one of the inputs of $t_k$, or from $t_k \in T$ to $a_j \in A$, indicating that $a_j$ is one of the outputs of $t_k$.
Figure 2 presents a simple example of a transaction graph, which contains 8 addresses and 4 transactions.
D. k-order transaction subgraph
The k-order transaction subgraph of a transaction $t_i$ is a graph $G^k_{t_i}$ that contains only $t_i$, the transactions that spend the outputs of $t_i$ in the next $k - 1$ steps, and the corresponding addresses connected to these transactions. The formal definition is given in Definition 4.
Definition 4: (k-order transaction subgraph): The k-order transaction subgraph of a transaction $t_i$ is a graph $G^k_{t_i} = (A^k, T^k, E^k)$, where $T^k = \{t_j \mid \exists a_n \in A^{k-1}, (a_n, t_j) \in E \text{ and } \exists (t_l, a_n) \in E^{k-1} \text{ for } t_l \in T^{k-1}\}$ and $A^k = \{a_n \mid a_n \in A^{k-1} \text{ or } (t_j, a_n) \in E \text{ where } t_j \in T^k\}$.
Specially, if $k = 1$, $G^1_{t_i} = (A^1, T^1, E^1)$, where $A^1 = \{a_n \mid (a_n, t_i) \in E \text{ or } (t_i, a_n) \in E\}$, $T^1 = \{t_i\}$ and $E^1 = \{(a_n, t_i) \text{ or } (t_i, a_n) \mid a_n \in A^1\}$.
Obviously, if $k = 1$, then the k-order transaction subgraph of $t_i$ contains only $t_i$ along with its input addresses and output addresses. As $k$ increases, the k-order transaction subgraph traces further along the bitcoin flow output by transaction $t_i$. Figures 3(a) and 3(c) show the 1-order and 2-order transaction subgraphs of the transaction $t_1$ in Figure 2, respectively. Figures 3(b) and 3(d) show the 1-order and 2-order transaction subgraphs of $t_2$, respectively.
Fig. 1: The overview of the whole method (transaction graphs over different history periods feed the models for $h = 1$, $h = 2$, ..., $h = h'$, whose outputs are combined by the integrator).
Fig. 2: A simple transaction graph (transactions t1 to t4, addresses a1 to a8).
Fig. 3: The 1-order and 2-order transaction subgraphs of $t_1$ and $t_2$ in Figure 2: (a) 1-order transaction subgraph of $t_1$, $G^1_{t_1}$; (b) 1-order transaction subgraph of $t_2$, $G^1_{t_2}$; (c) 2-order transaction subgraph of $t_1$, $G^2_{t_1}$; (d) 2-order transaction subgraph of $t_2$, $G^2_{t_2}$.
The k-order transaction subgraphs exhibit different patterns according to their structures. Here we distinguish patterns by the numbers of input and output addresses of the k-order transaction subgraphs.
The input addresses of a k-order transaction subgraph $G^k_{t_i}$ are the addresses that only give inputs to transactions in $G^k_{t_i}$ and do not accept any output from any transaction in $G^k_{t_i}$. Conversely, the output addresses of $G^k_{t_i}$ are the addresses that only accept outputs of transactions in $G^k_{t_i}$ and do not give input to any transaction in $G^k_{t_i}$. For clear notation, the input and output addresses are formally defined in Definition 5.
Definition 5: (Input and output addresses of the k-order transaction subgraph): The input and output addresses of the k-order transaction subgraph $G^k_{t_i}$ are $\mathcal{I}_{G^k_{t_i}}$ and $\mathcal{O}_{G^k_{t_i}}$, respectively, where $\mathcal{I}_{G^k_{t_i}} = \{a_n \mid \exists (a_n, t_j) \in E^k, t_j \in T^k \text{ and } \forall t_k \in T^k, (t_k, a_n) \notin E^k\}$ and $\mathcal{O}_{G^k_{t_i}} = \{a_n \mid \exists (t_k, a_n) \in E^k, t_k \in T^k \text{ and } \forall t_j \in T^k, (a_n, t_j) \notin E^k\}$.
In Figure 3(a), the addresses $a_1$ and $a_2$ are the input addresses of $G^1_{t_1}$, and the address $a_5$ is the output address of $G^1_{t_1}$. For higher orders of transaction subgraph, the input and output addresses may be more complicated. For example, in Figure 3(c), the input addresses of $G^2_{t_1}$ are $\{a_1, a_2\} = \mathcal{I}_{G^2_{t_1}}$, and the output addresses are $\{a_8\} = \mathcal{O}_{G^2_{t_1}}$. Please note that $a_5$ is neither an input nor an output address; the function of $a_5$ in $G^2_{t_1}$ is only to pass bitcoins on. Similarly, in Figure 3(d), the input addresses of $G^2_{t_2}$ are $\{a_3\} = \mathcal{I}_{G^2_{t_2}}$, while the output addresses of $G^2_{t_2}$ are $\{a_8\} = \mathcal{O}_{G^2_{t_2}}$.
Based on the concepts of $\mathcal{I}_{G^k_{t_i}}$ and $\mathcal{O}_{G^k_{t_i}}$, we can further define the pattern of a transaction subgraph. The pattern of a k-order transaction graph is denoted as $G^k_{(m,n)} = \{G^k_{t_i} \mid |\mathcal{I}_{G^k_{t_i}}| = m, |\mathcal{O}_{G^k_{t_i}}| = n\}$, where $m$ and $n$ are the numbers of input addresses and output addresses of $G^k_{t_i}$, respectively.
For a given transaction graph generated from the transaction information in the bitcoin blockchain during a specific period, we can obtain the k-order transaction subgraph $G^k_{t_i}$ of each transaction $t_i \in T$. The obtained transaction subgraphs may belong to different patterns. For the example in Figure 3, $G^1_{t_1}$ belongs to the pattern $G^1_{(2,1)}$, while $G^1_{t_2}$ belongs to the pattern $G^1_{(1,2)}$.
We believe these patterns contain valuable information revealing the characteristics of the transaction graph of the corresponding blockchain in a period. Besides, the patterns obtained under different orders $k$ can reveal different levels of latent information. The benefit of denoting the pattern by the numbers of input addresses and output addresses is that the patterns can be easily encoded into matrices, and can therefore be adopted as the features of the current transaction graph.
The two key pieces of information that can represent the features of a transaction graph are 1) what kinds of patterns occur in the transaction graph, and 2) how many times these patterns occur. We extend the concept of occurrence matrix in the literature [8] to the k-order occurrence matrix, denoted as $OC^k$, where the entry of $OC^k$ is $OC^k_{(m,n)} = |G^k_{(m,n)}|$.
Finally, we concatenate $OC^k$ for $k = 1, 2, 3, \dots, s$ as the feature of the transaction graph $G$ that represents the corresponding blockchain period.
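Definitions 4 and 5 can be made concrete with a short sketch. The code below is only an illustration, not the authors' implementation: the toy edge lists `INPUTS` and `OUTPUTS` are an assumed reconstruction of Figure 2 (read off the node labels of Figures 3 and 4), and the helper names `k_order_subgraph`, `io_addresses` and `pattern` are hypothetical.

```python
from collections import Counter

# Toy edges assumed from Figures 2-4 (hypothetical reconstruction, not the authors' data).
INPUTS = {"t1": {"a1", "a2"}, "t2": {"a3"}, "t3": {"a4", "a5"}, "t4": {"a6", "a7"}}
OUTPUTS = {"t1": {"a5"}, "t2": {"a6", "a7"}, "t3": {"a8"}, "t4": {"a8"}}


def k_order_subgraph(tx, k, inputs=INPUTS, outputs=OUTPUTS):
    """Return (transactions, edges) of the k-order subgraph of tx (Definition 4)."""
    txs = {tx}
    edges = {(a, tx) for a in inputs[tx]} | {(tx, a) for a in outputs[tx]}
    for _ in range(k - 1):
        # Addresses that received an output from a transaction already in the subgraph.
        funded = {dst for (src, dst) in edges if src in txs}
        # Transactions outside the subgraph that spend one of those outputs.
        new_txs = {t for t, ins in inputs.items() if t not in txs and ins & funded}
        if not new_txs:
            break
        for t in new_txs:
            edges |= {(a, t) for a in inputs[t] & funded}  # only inputs fed by the subgraph
            edges |= {(t, a) for a in outputs[t]}
        txs |= new_txs
    return txs, edges


def io_addresses(txs, edges):
    """Input and output address sets of a subgraph (Definition 5)."""
    givers = {src for (src, dst) in edges if dst in txs}     # addresses feeding a transaction
    receivers = {dst for (src, dst) in edges if src in txs}  # addresses fed by a transaction
    return givers - receivers, receivers - givers


def pattern(tx, k):
    """Pattern (m, n): numbers of input and output addresses of the k-order subgraph."""
    ins, outs = io_addresses(*k_order_subgraph(tx, k))
    return len(ins), len(outs)


# Pattern counts over all transactions for k = 1.
occurrences_k1 = Counter(pattern(t, 1) for t in INPUTS)
```

Under these assumed edges, `pattern("t1", 2)` treats $a_5$ as a pure transit address, matching the discussion of Figure 3(c). Counting patterns with a `Counter` is a design convenience of this sketch; the paper stores the same counts in the occurrence matrix.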
Now the Bitcoin Price Prediction problem in Definition 2 can be specified in detail: use the feature vector $v$, a concatenation of the $OC^k$ calculated on the transaction graph built from the bitcoin historical data in the time period $[t - i, t]$, to predict the bitcoin price at timestamp $t + h$, $P_{t+h}$. In this paper, we apply a trick to this prediction task: instead of directly fitting the relation between the feature vector $v$ and $P_{t+h}$, we fit $v$ to $P_{t+h} - P_t$. The reason is that the feature represents the new transaction data in the past period, not the cumulative transaction data, so predicting the price difference rather than the price itself is more reasonable.
E. Computation of Occurrence Matrix
The above sections give a comprehensive interpretation of the occurrence matrix. In this section, we propose an iterative procedure based on matrix multiplication to compute the occurrence matrix efficiently in practice.
Let $H \in \mathbb{R}^{|T| \times |T|}$ be the matrix denoting the input addresses of each transaction. The entry of $H$ is $H_{A_i, t_j} = 1$ if $A_i$ is the set of input addresses of transaction $t_j$, and $H_{A_i, t_j} = 0$ otherwise. Figure 4 shows the $H$ matrix of the transaction graph in Figure 2.
Fig. 4: The H matrix of the transaction graph in Figure 2 (rows {a1,a2}, {a3}, {a4,a5}, {a6,a7}; columns t1 to t4).
Let $P \in \mathbb{R}^{|A| \times |T|}$ be the matrix denoting the input relationship between each address $a_i$ and each transaction $t_j$: $P_{i,j} = 1$ if $a_i$ is one of the input addresses of transaction $t_j$, and $P_{i,j} = 0$ otherwise. Figure 5 shows the $P$ matrix of the transaction graph in Figure 2.
Fig. 5: The P matrix of the transaction graph in Figure 2 (rows a1 to a8; columns t1 to t4).
Then let $Q \in \mathbb{R}^{|T| \times |A|}$ be the matrix denoting the output relationship between each address $a_j$ and each transaction $t_i$: $Q_{i,j} = 1$ if $a_j$ is one of the output addresses of transaction $t_i$, and $Q_{i,j} = 0$ otherwise. Figure 6 shows the $Q$ matrix of the transaction graph in Figure 2.
Fig. 6: The Q matrix of the transaction graph in Figure 2 (rows t1 to t4; columns a1 to a8).
To calculate the k-order occurrence matrix $OC^k$, we first need to derive the transition matrix $M^k \in \mathbb{R}^{|T| \times |A|}$ for the k-order transaction graph, which is obtained through Equation 3.
$$M^k = H (QP)^{k-1} Q \tag{3}$$
The entry $M^k_{A_i, a_j} > 0$ if there is a flow from transaction $t_i$, whose input address set is $A_i$, to address $a_j$; otherwise, $M^k_{A_i, a_j} = 0$. In fact, $M^k_{A_i, a_j}$ counts the possible paths from transaction $t_i$ to address $a_j$ in the k-order transaction graph of transaction $t_i$.
Therefore $|A_i|$ is the number of input addresses of the k-order transaction graph of $t_i$, and $\sum_{a_j \in A} \mathbb{I}\{M^k_{A_i, a_j} > 0\}$ is the number of output addresses of the k-order transaction graph of $t_i$. Here $\mathbb{I}\{*\} = 1$ if the condition $*$ is satisfied, and $\mathbb{I}\{*\} = 0$ otherwise.
Now each entry of $OC^k$, $OC^k_{(m,n)}$, can be calculated from the k-order transition matrix $M^k$ through Equation 4.
$$OC^k_{(m,n)} = \sum_{A_i} \mathbb{I}\Big\{ |A_i| = m \;\&\; \sum_{a_j \in A} \mathbb{I}\{M^k_{A_i, a_j} > 0\} = n \Big\} \tag{4}$$
where $A_i$ is the set of input addresses of transaction $t_i$, namely $A_i = \{a_k \mid (a_k, t_i) \in E\}$.
$$OC^k_{(m,n)} = \begin{cases} \sum_{A_i} \mathbb{I}\big\{ |A_i| = m \;\&\; \sum_{a_j \in A} \mathbb{I}\{M^k_{A_i, a_j} > 0\} = n \big\}, & m < 20,\ n < 20, \\ \sum_{A_i} \mathbb{I}\big\{ |A_i| \geq 20 \;\&\; \sum_{a_j \in A} \mathbb{I}\{M^k_{A_i, a_j} > 0\} = n \big\}, & m = 20,\ n < 20, \\ \sum_{A_i} \mathbb{I}\big\{ |A_i| = m \;\&\; \sum_{a_j \in A} \mathbb{I}\{M^k_{A_i, a_j} > 0\} \geq 20 \big\}, & m < 20,\ n = 20, \\ \sum_{A_i} \mathbb{I}\big\{ |A_i| \geq 20 \;\&\; \sum_{a_j \in A} \mathbb{I}\{M^k_{A_i, a_j} > 0\} \geq 20 \big\}, & m = 20,\ n = 20. \end{cases} \tag{5}$$
For the simple example in Figure 2, if $k = 1$, the transition matrix $M^1$ is illustrated in Figure 7. Then the occurrence matrix $OC^1$ can be easily derived.
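The matrix procedure of Equations 3 to 5 can be sketched in plain Python on the toy example. This is a minimal illustration under stated assumptions, not the authors' code: the edge lists are an assumed reconstruction of Figure 2, the helper names (`matmul`, `transition_matrix`, `occurrence_matrix`) are hypothetical, and skipping rows with no reachable output address is an assumption made so that the toy values agree with the worked example in the text.

```python
CAP = 20  # input/output counts above 20 are clipped, following Equation 5

TXS = ["t1", "t2", "t3", "t4"]
ADDRS = ["a%d" % i for i in range(1, 9)]
# Toy edges assumed from Figures 2-4 (hypothetical reconstruction).
INPUTS = {"t1": {"a1", "a2"}, "t2": {"a3"}, "t3": {"a4", "a5"}, "t4": {"a6", "a7"}}
OUTPUTS = {"t1": {"a5"}, "t2": {"a6", "a7"}, "t3": {"a8"}, "t4": {"a8"}}


def matmul(x, y):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transition_matrix(k):
    """M^k = H (QP)^(k-1) Q as in Equation 3."""
    n_tx = len(TXS)
    h = [[1 if i == j else 0 for j in range(n_tx)] for i in range(n_tx)]  # |T| x |T|
    p = [[1 if a in INPUTS[t] else 0 for t in TXS] for a in ADDRS]        # |A| x |T|
    q = [[1 if a in OUTPUTS[t] else 0 for a in ADDRS] for t in TXS]       # |T| x |A|
    qp = matmul(q, p)
    m = h
    for _ in range(k - 1):
        m = matmul(m, qp)
    return matmul(m, q)


def occurrence_matrix(k):
    """(CAP+1) x (CAP+1) table of pattern counts; row and column 0 stay unused."""
    oc = [[0] * (CAP + 1) for _ in range(CAP + 1)]
    for t, row in zip(TXS, transition_matrix(k)):
        m = min(len(INPUTS[t]), CAP)                # number of input addresses, clipped
        n = min(sum(1 for v in row if v > 0), CAP)  # number of reachable outputs, clipped
        if m >= 1 and n >= 1:  # assumption: rows without inputs or reachable outputs are skipped
            oc[m][n] += 1
    return oc


oc1 = occurrence_matrix(1)
oc2 = occurrence_matrix(2)
```

A real blockchain period would call for sparse matrices; dense lists are used here only to keep the sketch dependency-free.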
Fig. 7: The 1-order transition matrix $M^1 = H(QP)^0 Q$ of the transaction graph in Figure 2 (rows {a1,a2}, {a3}, {a4,a5}, {a6,a7}; columns a1 to a8).
$OC^1_{(2,1)} = 3$, because $|\{a_1, a_2\}| = |\{a_4, a_5\}| = |\{a_6, a_7\}| = 2$, and $\sum_{a_j \in A} \mathbb{I}\{M^1_{\{a_1,a_2\}, a_j} > 0\} = \sum_{a_j \in A} \mathbb{I}\{M^1_{\{a_4,a_5\}, a_j} > 0\} = \sum_{a_j \in A} \mathbb{I}\{M^1_{\{a_6,a_7\}, a_j} > 0\} = 1$. $OC^1_{(1,2)} = 1$, because $|\{a_3\}| = 1$, and $\sum_{a_j \in A} \mathbb{I}\{M^1_{\{a_3\}, a_j} > 0\} = 2$. In addition, all the other entries of $OC^1$ are 0, since there is no other pattern among the 1-order transaction graphs.
If $k = 2$, the calculation of the transition matrix $M^2$ is illustrated in Figure 8. Then the occurrence matrix $OC^2$ can be calculated as follows. $OC^2_{(2,1)} = 1$, since $|\{a_1, a_2\}| = 2$ and $\sum_{a_j \in A} \mathbb{I}\{M^2_{\{a_1,a_2\}, a_j} > 0\} = 1$. $OC^2_{(1,1)} = 1$, since $|\{a_3\}| = 1$ and $\sum_{a_j \in A} \mathbb{I}\{M^2_{\{a_3\}, a_j} > 0\} = 1$. All the other entries of $OC^2$ are 0.
The dimensions of occurrence matrices may differ for different orders $k$ and different transactions. However, occurrence matrices of a unified size are required to form feature vectors of the same size, so that the features can be fed into machine learning based prediction models. According to the literature [9], nearly 97.57% of transactions have no more than 20 inputs and no more than 20 outputs. Therefore, for the remaining less than 3% of transactions, whose number of inputs or outputs is greater than 20, we manually set the number to 20. The k-order occurrence matrix $OC^k$ can now be defined as Equation 5.
IV. EXPERIMENTS
In this section, we present the evaluation of the proposed transaction graph based blockchain feature and price prediction method.
A. Data preparation
To conduct the bitcoin price prediction task, we collect the Bitcoin blockchain historical data and the bitcoin price historical data. The Bitcoin blockchain data is downloaded from the Google BigQuery public dataset crypto_bitcoin, whose data is exported using the bitcoin-etl tool. The bitcoin price data is collected from Coindesk.
We select two history periods for the experiments.
• Interval 1: From August 19th, 2013 to July 19th, 2016. The timestamps are divided daily. This period contains 1065 days; the first 80% of days are for training and the remaining 20% are for testing.
• Interval 2: From April 1st, 2013 to April 1st, 2017. The timestamps are divided daily. This period contains 1461 days; the first 70% of days are for training and the other 30% are for testing.
Interval 1 and interval 2 are identical to the datasets in the literature [10], which is compared as a baseline in the next sections. In this paper, we predict the bitcoin daily closing price during the above periods. The bitcoin prices of interval 1 and interval 2 are presented in Figure 9 and Figure 10, respectively. The bitcoin prices show high volatility, which indicates that the nature of bitcoin can hardly be discovered intuitively; therefore, manually designed features may be ineffective.
For the evaluation metric, we adopt the Mean Absolute Percentage Error (MAPE) to show the error between predicted prices and real prices. The MAPE is formally defined as Equation 6.
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|\hat{p}_i - p_i|}{p_i} \tag{6}$$
where $\hat{p}_i$ is the predicted bitcoin price, while $p_i$ is the real price.
B. Baselines and Comparison Results
1) Baselines:
We select the work of Mallqui et al. [10] as the baseline, where a similar bitcoin price prediction task is studied. Mallqui et al. propose various attributes of the Bitcoin blockchain as features, including history price, volume of trades, and even financial indicators from other financial areas such as the gold price and the Nasdaq price. Mallqui et al. utilize several machine learning methods to predict the bitcoin price based on the proposed features.
Specifically, among the machine learning methods in [10], the SVM model shows the best prediction performance; therefore we adopt the SVM prediction model as the baseline prediction model, denoted as Mallqui et al.-SVM.
Footnotes: Dataset ID is bigquery-public-data:crypto_bitcoin at https://cloud.google.com/bigquery
https://github.com/blockchain-etl/bitcoin-etl
Fig. 8: The 2-order transition matrix $M^2 = H(QP)^1 Q$ of the transaction graph in Figure 2.
Fig. 9: Bitcoin daily closing price of Interval 1 (axes: Dates, Prices).
Fig. 10: Bitcoin daily closing price of Interval 2 (axes: Dates, Prices).
2) Results:
To provide an intuitive understanding of the price prediction results, we visualize the best results on interval 1 and interval 2 in Figure 11 and Figure 12, respectively.
Fig. 11: Price prediction results on interval 1 (axis: Price; series: real, predict).
Fig. 12: Price prediction results on interval 2 (axis: Price; series: real, predict).
The numerical results in terms of MAPE are presented in Table I. The results are obtained by the SVM version of our method under $k = 2$ and r = 0 . The results show that when $h = 1$, that is, when we only train the model to predict the next day, our method achieves a lower MAPE than the baseline, which also trains only one single model. Since both the baseline and our method here utilize the SVM model, the results demonstrate the effectiveness of our proposed transaction graph based blockchain feature.
Further, when we train models for both $h = 1$ and $h = 2$ and integrate their outputs as the final results, the performance is even better. Therefore, the results demonstrate that our proposed prediction method, which considers and combines the different latent features under different historical periods, can boost the accuracy of the prediction.
TABLE I: MAPE of Baseline and Our Method on Two Time Periods
Methods \ Periods          interval1   interval2
Mallqui et al.-SVM [10]    1.91%       1.81%
Our (h=1)                  1.75%       1.754%
Our (h=2)                  2.58%       2.590%
Our (integrated)           1.69%       1.751%
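How the integrated row of Table I could be produced from per-model outputs can be sketched as follows. This is a minimal illustration of Equations 1, 2 and 6, not the authors' code: the function names `decay_weights`, `integrate` and `mape` are hypothetical, and the prices in the usage lines are made-up toy numbers rather than experimental data.

```python
def decay_weights(window, r):
    """Weights W_window from Equation 2: start at [1.0] and split the last weight by r."""
    weights = [1.0]
    for _ in range(window - 1):
        last = weights[-1]
        # Earlier entries are kept; the last entry is split into last*r and last*(1-r).
        weights = weights[:-1] + [last * r, last * (1.0 - r)]
    return weights


def integrate(predictions, weights):
    """Linear integrator of Equation 1: weighted sum of the per-model predictions."""
    return sum(a * p for a, p in zip(weights, predictions))


def mape(predicted, real):
    """Mean Absolute Percentage Error of Equation 6, returned as a fraction."""
    return sum(abs(p - q) / q for p, q in zip(predicted, real)) / len(real)


# Toy usage with made-up numbers (not results from the paper).
w = decay_weights(3, 0.5)                    # three models, decay rate r = 0.5
combined = integrate([100.0, 104.0, 108.0], w)
error = mape([110.0, 90.0], [100.0, 100.0])
```

Because each step only splits the last weight into two parts that sum to it, the weights always sum to 1, which is the property stated after Equation 2.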
C. Accuracy for future time $t'$
We study the prediction accuracy for different future timestamps $t'$. Intuitively, the further future is harder to predict, which indicates that the MAPE value will increase with the distance into the future (also called the horizon). We conduct experiments on the 2016 and 2017 Bitcoin blockchain respectively, and the results are illustrated in Figure 13. We set $k = 2$ and the history window size to 1.
The results are consistent with the intuition: the MAPE keeps rising as we try to predict further into the future. In addition, the results demonstrate that the speed of growth differs between periods. The error rises faster in 2017 than in 2016, which indicates that predicting the price in 2017 is much harder than in 2016. This reflects the fact that bitcoin prices fluctuated much more strikingly in 2017 than in 2016.
Fig. 13: MAPE of predicting price at different horizons on the 2016 and 2017 Bitcoin markets
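The horizon experiments above depend on how the target for each horizon $h$ is formed. A minimal sketch, assuming daily closing prices held in a plain Python list (the function name `horizon_targets` and the sample prices are hypothetical), of the price-difference target $P_{t+h} - P_t$ described in Section III-D together with the trend label of Definition 1:

```python
def horizon_targets(prices, h):
    """For every t with t + h in range, return (P_{t+h} - P_t, trend label)."""
    diffs = [prices[t + h] - prices[t] for t in range(len(prices) - h)]
    # Definition 1: label 1 if the price rises over the horizon, -1 otherwise.
    labels = [1 if d > 0 else -1 for d in diffs]
    return diffs, labels


def recover_prices(prices, diffs):
    """Turn price differences back into prices: P_{t+h} = P_t + difference."""
    return [p + d for p, d in zip(prices, diffs)]


# Toy usage with made-up closing prices (not data from the paper).
closing = [100.0, 102.0, 101.0, 105.0]
diffs_h1, labels_h1 = horizon_targets(closing, 1)
diffs_h2, labels_h2 = horizon_targets(closing, 2)
```

A regression model fitted on the differences would have its outputs mapped back to prices with `recover_prices` before computing the MAPE.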
D. Influence of the length of history window
The length of the history window indicates how much historical information is utilized for the prediction. Since our proposed method trains models for different history window sizes and integrates their results as the final result, the history window size can impact the performance of the whole framework. We compare the results obtained under different history window sizes in 2016 and 2017, and the results are illustrated in Figure 14 and Figure 15, respectively. We set r = 0 . and r = 0 . for 2016 and 2017 respectively, since this setting achieves the best performance in our tests.
Fig. 14: MAPE of predicting price on 2016 Bitcoin market with different length of history window (y-axis: MAPE)
Fig. 15: MAPE of predicting price on 2017 Bitcoin market with different length of history window
The results in Figure 14 and Figure 15 demonstrate that introducing more than one history time stamp (days in this paper) can significantly improve the prediction results. However, involving too much historical information increases the computational cost, since each history window length is assigned its own model to train. The results also show that the performance of the framework is relatively stable when the length of the history window is greater than or equal to 2. Therefore, in practice the framework does not need to consume much computation to reach its best performance.
V. CONCLUSIONS
In this paper, we proposed a transaction graph based machine learning method for bitcoin price prediction. The k-order transaction graphs of the transactions are proposed to reveal the transaction patterns in the Bitcoin blockchain. The occurrence matrix is then defined to encode the pattern information, which is further represented as the features of the Bitcoin blockchain. We also provide a mathematical formula for the iterative implementation of the features. The results of the comparison experiments show that our proposed method outperforms the most recent state-of-the-art method, and demonstrate the effectiveness of automatically learning the transaction patterns from multiple blockchain history periods.
REFERENCES
[1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2009.
[2] S. Vassiliadis, P. Papadopoulos, M. Rangoussi, T. Konieczny, and J. Gralewski, “Bitcoin value analysis based on cross-correlations,” Journal of Internet Banking and Commerce, vol. 22, no. S7, p. 1, 2017.
[3] H. A. Aalborg, P. Molnár, and J. E. de Vries, “What can explain the price, volatility and trading volume of bitcoin?” Finance Research Letters, vol. 29, pp. 255–265, 2019.
[4] M. Balcilar, E. Bouri, R. Gupta, and D. Roubaud, “Can volume predict bitcoin returns and volatility? a quantiles-based approach,” Economic Modelling, vol. 64, pp. 74–81, 2017.
[5] D. L. Yermack, “Is bitcoin a real currency? an economic appraisal,” Economics of Innovation eJournal, 2013.
[6] Z. Chen, C. Li, and W. Sun, “Bitcoin price prediction using machine learning: An approach to sample dimension engineering,” J. Comput. Appl. Math., vol. 365, 2020. [Online]. Available: https://doi.org/10.1016/j.cam.2019.112395
[7] W. Yao, K. Xu, and Q. Li, “Exploring the influence of news articles on bitcoin price with machine learning,” in . IEEE, 2019, pp. 1–6. [Online]. Available: https://doi.org/10.1109/ISCC47284.2019.8969596
[8] N. C. Abay, C. G. Akcora, Y. R. Gel, M. Kantarcioglu, U. D. Islambekov, Y. Tian, and B. M. Thuraisingham, “Chainnet: Learning on blockchain graphs with topological features,” in , J. Wang, K. Shim, and X. Wu, Eds. IEEE, 2019, pp. 946–951. [Online]. Available: https://doi.org/10.1109/ICDM.2019.00105
[9] C. G. Akcora, A. K. Dey, Y. R. Gel, and M. Kantarcioglu, “Forecasting bitcoin price with graph chainlets,” in Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III, ser. Lecture Notes in Computer Science, D. Q. Phung, V. S. Tseng, G. I. Webb, B. Ho, M. Ganji, and L. Rashidi, Eds., vol. 10939. Springer, 2018, pp. 765–776. [Online]. Available: https://doi.org/10.1007/978-3-319-93040-4_60
[10] D. C. A. Mallqui and R. A. S. Fernandes, “Predicting the direction, maximum, minimum and closing prices of daily bitcoin exchange rate using machine learning techniques,” Appl. Soft Comput., vol. 75, pp. 596–606, 2019. [Online]. Available: https://doi.org/10.1016/j.asoc.2018.11.038
[11] G. C. Cerda and J. L. Reutter, “Bitcoin price prediction through opinion mining,” in Companion of The 2019 World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, S. Amer-Yahia, M. Mahdian, A. Goel, G. Houben, K. Lerman, J. J. McAuley, R. Baeza-Yates, and L. Zia, Eds. ACM, 2019, pp. 755–762. [Online]. Available: https://doi.org/10.1145/3308560.3316454
[12] D. D. F. Maesa, A. Marino, and L. Ricci, “Uncovering the bitcoin blockchain: An analysis of the full users graph,” in . IEEE, 2016, pp. 537–546. [Online]. Available: https://doi.org/10.1109/DSAA.2016.52
[13] L. Kristoufek, “Bitcoin meets google trends and wikipedia: Quantifying the relationship between phenomena of the internet era,” Scientific reports, vol. 3, no. 1, pp. 1–7, 2013.
[14] A. M. Balfagih and V. Keselj, “Evaluating sentiment classifiers for bitcoin tweets in price prediction task,” in . IEEE, 2019, pp. 5499–5506. [Online]. Available: https://doi.org/10.1109/BigData47090.2019.9006140
[15] M. Polasik, A. I. Piotrowska, T. P. Wisniewski, R. Kotkowski, and G. Lightfoot, “Price fluctuations and the use of bitcoin: An empirical inquiry,” Int. J. Electron. Commer., vol. 20, no. 1, pp. 9–49, 2015. [Online]. Available: https://doi.org/10.1080/10864415.2016.1061413
[16] A. Mittal, V. Dhiman, A. Singh, and C. Prakash, “Short-term bitcoin price fluctuation prediction using social media and web search data,” in . IEEE, 2019, pp. 1–6. [Online]. Available: https://doi.org/10.1109/IC3.2019.8844899
[17] A. Burnie and E. Yilmaz, “An analysis of the change in discussions on social media with bitcoin price,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, B. Piwowarski, M. Chevalier, É. Gaussier, Y. Maarek, J. Nie, and F. Scholer, Eds. ACM, 2019, pp. 889–892. [Online]. Available: https://doi.org/10.1145/3331184.3331304
[18] I. Georgoula, D. Pournarakis, C. Bilanakos, D. N. Sotiropoulos, and G. M. Giaglis, “Using time-series and sentiment analysis to detect the determinants of bitcoin prices,” in . AISeL, 2015, p. 20. [Online]. Available: http://aisel.aisnet.org/mcis2015/20
[19] P. Ciaian, M. Rajcaniova, and d'Artis Kancs, “The economics of bitcoin price formation,” Applied Economics, vol. 48, no. 19, pp. 1799–1815, 2016. [Online]. Available: https://doi.org/10.1080/00036846.2015.1109038
[20] M. Brandvold, P. Molnár, K. Vagstad, and O. C. A. Valstad, “Price discovery on bitcoin exchanges,” Journal of International Financial Markets, Institutions and Money, vol. 36, pp. 18–35, 2015.
[21] A. Aggarwal, I. Gupta, N. Garg, and A. Goel, “Deep learning approach to determine the impact of socio economic factors on bitcoin price prediction,” in . IEEE, 2019, pp. 1–5. [Online]. Available: https://doi.org/10.1109/IC3.2019.8844928
[22] G. Pieters and S. Vivanco, “Financial regulations and price inconsistencies across bitcoin markets,” Inf. Econ. Policy, vol. 39, pp. 1–14, 2017. [Online]. Available: https://doi.org/10.1016/j.infoecopol.2017.02.002
[23] L. Kristoufek, “What are the main drivers of the bitcoin price? evidence from wavelet coherence analysis,” PloS one, vol. 10, no. 4, p. e0123923, 2015.
[24] J. Bouoiyour, R. Selmi, A. K. Tiwari, O. R. Olayeni et al., “What drives bitcoin price,” Economics Bulletin, vol. 36, no. 2, pp. 843–850, 2016.
[25] W. Chen, Z. Zheng, M. Ma, J. Wu, Y. Zhou, and J. Yao, “Dependence structure between bitcoin price and its influence factors,” IJCSE, vol. 21, no. 3, pp. 334–345, 2020. [Online]. Available: https://doi.org/10.1504/IJCSE.2020.106058
[26] S. Yogeshwaran, M. J. Kaur, and P. Maheshwari, “Project based learning: Predicting bitcoin prices using deep learning,” in IEEE Global Engineering Education Conference, EDUCON 2019, Dubai, United Arab Emirates, April 8-11, 2019, A. K. Ashmawy and S. Schreiter, Eds. IEEE, 2019, pp. 1449–1454. [Online]. Available: https://doi.org/10.1109/EDUCON.2019.8725091
[27] E. Sin and L. Wang, “Bitcoin price prediction using ensembles of neural networks,” in , Y. Liu, L. Zhao, G. Cai, G. Xiao, K. Li, and L. Wang, Eds. IEEE, 2017, pp. 666–671. [Online]. Available: https://doi.org/10.1109/FSKD.2017.8393351
[28] L. Felizardo, R. Oliveira, E. Del-Moral-Hernandez, and F. Cozman, “Comparative study of bitcoin price prediction using wavenets, recurrent neural networks and other machine learning methods,” in . IEEE, 2019, pp. 1–6. [Online]. Available: https://doi.org/10.1109/BESC48373.2019.8963009
[29] C. Chen, J. Chang, F. Lin, J. Hung, C. Lin, and Y. Wang, “Comparison of forcasting ability between backpropagation network and ARIMA in the prediction of bitcoin price,” in . IEEE, 2019, pp. 1–2. [Online]. Available: https://doi.org/10.1109/ISPACS48206.2019.8986297
[30] C. Wu, C. Lu, Y. Ma, and R. Lu, “A new forecasting framework for bitcoin price with LSTM,” in , H. Tong, Z. J. Li, F. Zhu, and J. Yu, Eds. IEEE, 2018, pp. 168–175. [Online]. Available: https://doi.org/10.1109/ICDMW.2018.00032
[31] I. A. Hashish, F. Forni, G. Andreotti, T. Facchinetti, and S. Darjani, “A hybrid model for bitcoin prices prediction using hidden markov models and optimized LSTM networks,” in . IEEE, 2019, pp. 721–728. [Online]. Available: https://doi.org/10.1109/ETFA.2019.8869094
[32] D. Nguyen and H. Le, “Predicting the price of bitcoin using hybrid ARIMA and machine learning,” in Future Data and Security Engineering - 6th International Conference, FDSE 2019, Nha Trang City, Vietnam, November 27-29, 2019, Proceedings, ser. Lecture Notes in Computer Science, T. K. Dang, J. Küng, M. Takizawa, and S. H. Bui, Eds., vol. 11814. Springer, 2019, pp. 696–704. [Online]. Available: https://doi.org/10.1007/978-3-030-35653-8_49
Xiao Li received his B.S. and M.S. degrees in Software Engineering from Dalian University of Technology, China, in 2016 and 2019, respectively. He is currently pursuing the Ph.D. degree with the Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA. His current research interests include data mining and Blockchain.