A first econometric analysis of the CRIX family
Shi Chen, Cathy Yi-Hsuan Chen, Wolfgang Karl Härdle, TM Lee, Bobby Ong

September 28, 2020

Shi Chen (corresponding author): Humboldt-Universität zu Berlin, C.A.S.E.-Center of Applied Statistics and Economics, Unter den Linden 6, 10099 Berlin, Germany. Email: [email protected]

Cathy Yi-Hsuan Chen: Humboldt-Universität zu Berlin, C.A.S.E.-Center of Applied Statistics and Economics, Unter den Linden 6, 10099 Berlin, Germany. Chung Hua University, Department of Finance, 707 Sec.2 WuFu Rd., Hsinchu, Taiwan. Email: [email protected]

Wolfgang Karl Härdle: Humboldt-Universität zu Berlin, C.A.S.E.-Center of Applied Statistics and Economics, Unter den Linden 6, 10099 Berlin, Germany. Email: [email protected] fellow in Sim Kee Boon Institute for Financial Economics, Singapore Management University, 90 Stamford Road, 6th Level, School of Economics, Singapore 178903.

TM Lee: CoinGecko, 101 Upper Cross Street, No. 05-16 People's Park Centre, Singapore 058357. Email: [email protected]

Bobby Ong: CoinGecko, 101 Upper Cross Street, No. 05-16 People's Park Centre, Singapore 058357. Email: [email protected]

arXiv [q-fin.ST]

Chapter 1
A first econometric analysis of the CRIX family
The CRIX (CRyptocurrency IndeX) has been constructed based on approximately 30 cryptos and captures high coverage of available market capitalisation. The CRIX index family covers a range of cryptos based on different liquidity rules and various model selection criteria. Details of ECRIX (Exact CRIX), EFCRIX (Exact Full CRIX) and also intraday CRIX movements may be found on the webpage of hu.berlin/crix.

In order to price contingent claims one needs to first understand the dynamics of these indices. Here we provide a first econometric analysis of the CRIX family within a time-series framework. The key steps of our analysis include model selection, estimation and testing. Linear dependence is removed by an ARIMA model; the diagnostic checking resulted in an ARIMA(2,0,2) model for the available sample period from Aug 1st, 2014 to April 6th, 2016. The model residuals showed the well-known phenomenon of volatility clustering. Therefore a further refinement led us to an ARIMA(2,0,2)-t …

The CRyptocurrency IndeX developed by Härdle and Trimborn (2015) aims to provide a market measure which consists of a selection of representative cryptos. The index fulfills the requirement of having a dynamic structure by relying on statistical time series techniques. Table 1.1 lists the 30 cryptocurrencies used in the construction of the CRIX index.

The Research Data Center supported by Collaborative Research Center (CRC) 649 provides access to the dataset. At the time of writing, Bitcoin's market capitalization as a percentage of CRIX total market capitalization is 83%.

No. | Cryptos | Symbol | Description
1 | Bitcoin | BTC | Bitcoin is the first cryptocurrency. It was created by the anonymous person(s) named Satoshi Nakamoto in 2009 and has a limited supply of 21 million coins. It uses the SHA-256 Proof-of-Work hashing algorithm.
2 | Ethereum | ETH | Ethereum is a Turing-complete cryptocurrency platform created by Vitalik Buterin. It raised US$18 million worth of bitcoins during a crowdsale of ether tokens in 2014. Ethereum allows for token creation and smart contracts to be written on top of the platform. The DAO (No.30) and DigixDAO (No.15) are two tokens created on the Ethereum platform that are also used in the construction of CRIX.
3 | Steem | STEEM | Steem is a social-media platform that rewards users for participation with tokens. Users can earn tokens by creating and curating content. The Steem whitepaper was co-authored by Daniel Larimer, who is also the founder of BitShares (No.16).
4 | Ripple | XRP | Ripple is a payment system created by Ripple Labs in San Francisco. It allows banks worldwide to transact with each other without the need for a central correspondent. Banks such as Santander and UniCredit have begun experimenting on the Ripple platform. It was one of the earliest altcoins in the market and is not a copy of Bitcoin's source code.
5 | Litecoin | LTC | Litecoin is branded the "silver to bitcoin's gold". It was created by Charles Lee, an ex-employee of Google and current employee of Coinbase. Charles modified Bitcoin's source code and made use of the Scrypt Proof-of-Work hashing algorithm. There is a total of 84 million litecoin with a block time of 2.5 minutes. The initial reward was 50 LTC per block, with rewards halving every 840,000 blocks.
6 | NEM | NEM | NEM, short for New Economy Movement, is a cryptocurrency platform launched in 2015 that is written from scratch on the Java platform. It provides many services on top of payments, such as messaging, asset making and a naming system.
7 | Dash | DASH | Dash (previously known as Darkcoin and XCoin) is a privacy-centric cryptocurrency. It anonymizes transactions using PrivateSend (previously known as DarkSend), a concept that extends the idea of CoinJoin. PrivateSend achieves obfuscation by combining bitcoin transactions with another person's transactions using common denominations of 0.1 DASH, 1 DASH, 10 DASH and 100 DASH.
8 | Maidsafecoin | MAID | MaidSafeCoin is the cryptocurrency for the SAFE (Secure Access For Everyone) network. The network aims to do away with third-party central servers in order to enable privacy and anonymity for Internet users. It allows users to earn tokens by sharing their computing resources (storage space, CPU, bandwidth) with the network. Maidsafecoin was released on the Omni Layer.
9 | Lisk | LSK | Lisk is a Javascript platform for the creation of decentralized applications (DApps) and sidechains. Javascript was chosen because it is the most popular programming language on Github. It was created by Olivier Beddows and Max Kordek, who were actively involved in the Crypti altcoin before this. Lisk conducted a crowdsale in early 2016 that raised about US$6.15 million.
10 | Dogecoin | DOGE | Dogecoin was created by Jackson Palmer and Billy Markus. It is based on the "doge", an Internet meme based on a Shiba Inu dog. Both founders created Dogecoin to be fun so that it can appeal to a larger group of people beyond the core Bitcoin audience. Dogecoin found a niche as a tipping platform on Twitter and Reddit. It was merge-mined with Litecoin (No.5) on 11 September 2014.
11 | NXT | NXT | NXT is the first 100% Proof-of-Stake cryptocurrency. It is a cryptocurrency platform that allows for the creation of tokens, messaging, a domain name system and a marketplace. There is a total of 1 billion coins created and it has a block time of 1 minute.
12 | Monero | XMR | Monero is another privacy-centric altcoin that aims to anonymize transactions. It is based on the Cryptonote protocol, which uses Ring Signatures to conceal sender identities. Many users, including the sender, will sign a transaction, thereby making it very difficult to trace the true sender of a transaction.
13 | Synereo | AMP | Synereo is a decentralized and distributed social network service. It conducted its crowdsale in March 2015 on the Omni Layer, where 18.5% of its tokens were sold.
14 | Emercoin | EMC | Emercoin provides a key-value storage system, which allows for a Domain Name System (DNS) for .coin, .emc, .lib and .bazar domain extensions. It is inspired by the Namecoin (No.26) DNS system, which uses the .bit domain extension. It uses a Proof-of-Work/Proof-of-Stake hashing algorithm and allows for a maximum name length of 512.
15 | DigixDAO | DGO | DigixDAO is a gold-backed token on the Ethereum (No.2) platform. Each token represents 1 gram of gold and each token is divisible to 0.001 gram. The tokens on the Ethereum platform are audited to ensure that the said amount of gold is held in reserves in Singapore.
16 | BitShares | BTS | BitShares is a cryptocurrency platform that allows for many features such as a decentralized asset exchange, user-issued assets, price-stable cryptocurrencies, stakeholder-approved project funding and transferable named accounts. It uses a Delegated Proof-of-Stake consensus algorithm.
17 | Factom | FCT | Factom allows businesses and governments to record data on the Bitcoin blockchain. It does this by hashing entries before adding them onto a list. The entries can be viewed but not modified, thus ensuring integrity of data records.
18 | Siacoin | SC | Sia is a decentralized cloud storage platform where users can rent storage space from each other. The data is encrypted into many pieces and uploaded to different hosts for storage.
19 | Stellar | STR | Stellar was created by Jed McCaleb, who was also the founder of Ripple (No.4) and Mt. Gox, the previously-largest bitcoin exchange which is now bankrupt. Stellar was created using a forked source code of Ripple. Stellar's mission is to expand financial access and literacy worldwide.
20 | Bytecoin | BCN | Bytecoin is a privacy-centric cryptocurrency and is the first cryptocurrency created with the CryptoNote protocol. Its codebase is not a fork of Bitcoin's.
21 | Peercoin | PPC | Peercoin (previously known as PPCoin) was created by Sunny King. It was the first implementation of Proof-of-Stake. It uses a hybrid Proof-of-Work/Proof-of-Stake system. Proof-of-Stake is more efficient as it does not require any mining equipment to create blocks. Block creation is done via holding stake in the coin and is therefore resistant to 51% mining attacks.
22 | Tether | USDT | Tether is backed 1-to-1 with traditional US Dollars in reserves, so 1 USDT = 1 USD. It consists of digital tokens formatted to work seamlessly on the Bitcoin blockchain. It exists as tokens on the Omni protocol.
23 | Counterparty | XCP | Counterparty is the first cryptocurrency to make use of Proof-of-Burn as a method to distribute tokens. Proof-of-Burn works by having users send bitcoins to an unspendable address, in this case: 1CounterpartyXXXXXXXXXXXXXXXUWLpVr. A total of 2,125 BTC were burnt in this manner, creating 2.6 million XCP tokens. The Proof-of-Burn method ensures that the Counterparty developers do not enjoy any privilege and allows for fair distribution of tokens. Counterparty is based on the Bitcoin platform and allows for creation of assets such as Storjcoin X (No.25).
24 | Agoras | AGRS | Agoras is an application and smart currency market built on the Tau-Chain to feature intelligent personal agents, a programming market, a computational power market, and a futuristic search engine.
25 | Storjcoin X | SJCX | Storjcoin X is used as a token to exchange cloud storage and bandwidth access. Users can obtain Storjcoin X by renting out resources to the network via DriveMiner and they will be able to rent space from other users by paying Storjcoin X using Metadisk. Storjcoin X is an asset created on the Counterparty platform (No.23).
26 | Namecoin | NMC | Namecoin is one of the earliest altcoins that has been adapted from Bitcoin's source code to allow for a different use case. It provides a decentralised key-value system that allows for the creation of an alternative Domain Name System that cannot be censored by governments. It uses the .bit domain extension. It was merge-mined with Bitcoin from September 2011.
27 | Ybcoin | YBC | Ybcoin is a cryptocurrency from China that was created in June 2013. It uses the Proof-of-Stake hashing algorithm.
28 | Nautiluscoin | NAUT | Nautiluscoin uses the DigiShield difficulty retargeting system to safeguard against multi-pool miners. It has a Nautiluscoin Stabilization Fund (NSF) to reduce price volatility.
29 | Fedoracoin | TIPS | Fedoracoin is based on the Tips Fedora Internet meme. Fedoracoin is also used as a tipping cryptocurrency.
30 | The DAO | DAO | The DAO, short for Distributed Autonomous Organization, ran one of the most successful crowdfunding campaigns when it raised over US$160 million. The DAO is a smart contract written on the Ethereum (No.2) platform. The DAO grants token holders voting rights to make decisions in the organization based on the proportion of tokens owned. In June 2016, a hack occurred resulting in the loss of about US$60 million. The Ethereum Foundation decided to reverse the hack by conducting a hardfork of the Ethereum platform.

Table 1.1: 30 cryptocurrencies used in construction of CRIX.
In the crypto market, the CRIX index was designed as a sample drawn from the pool of cryptos to represent the market performance of leading currencies. For an index to work as an investment benchmark, its stochastic behaviour has to be understood, so in this section we first focus on the stochastic properties of CRIX. Plots are often the first step in an exploratory analysis. Figure 1.1 shows the daily values from 01/08/2014 to 06/04/2016. We observe that the values of CRIX fell substantially until mid-2015; CRIX did poorly, perhaps as a result of the cooling-off of the cryptocurrency market. After a few months of moving up and down, the CRIX has, however, trended upwards since then, in what has been a better year for the crypto market. It is worthwhile to note that the CRIX index is largely driven by the crypto market, which makes it a good indicator of market performance.

Figure 1.1: CRIX Daily Price from Aug 1st, 2014 to April 6th, 2016. econ_crix

To find out the dynamics of CRIX, we first look more closely at stationary time series. A stationary time series is one whose stochastic properties such as mean, variance etc. are all constant over time. Most statistical forecasting methods are based on the stationarity assumption; however, the CRIX is far from stationary, as observed in Figure 1.1. Therefore we first need to transform the original data into a stationary time series through mathematical transformations. Such transformations include detrending, seasonal adjustment, etc.; the most general class of models among them is ARIMA fitting, which will be explained in section 1.2.

In practice, the difference between consecutive observations is generally computed to make a time series stationary. Such transformations help stabilize the mean by removing changes in the level of a time series, thereby removing trend and seasonality. Here the log returns of CRIX are computed for further analysis: we remove the unequal variances by taking the log of the data and take differences to get rid of the trend component. Figure 1.2 shows the time series plot of daily log returns of the CRIX index (henceforth CRIX returns), with mean -0.0004 and volatility 0.0325.

Figure 1.2: The log returns of CRIX index from Aug 2nd, 2014 to April 6th, 2016. econ_crix

We continue by investigating distributional properties. The histogram of CRIX returns is plotted in the left panel of Figure 1.3, compared with the normal density function plotted in blue. The right panel is the QQ plot of CRIX daily returns. We conclude that the CRIX returns are not normally distributed. Another approach widely used in density estimation is kernel density estimation. Furthermore, there are various methods to test whether a sample follows a specific distribution, for example the Kolmogorov-Smirnov test and the Shapiro-Wilk test.

Figure 1.3: Histogram and QQ plot of CRIX returns. econ_crix
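The log-difference transformation and the summary statistics discussed above can be sketched in a few lines. This is only an illustration: the price list is hypothetical (it is not CRIX data), and the helper names `log_returns` and `sample_moments` are introduced here for the example.

```python
import math

def log_returns(prices):
    """Daily log returns r_t = log(P_t) - log(P_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def sample_moments(x):
    """Mean, standard deviation, skewness and excess kurtosis of a sample."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    sd = math.sqrt(sum(d * d for d in dev) / (n - 1))
    m2 = sum(d * d for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return mean, sd, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical index levels, for illustration only (not CRIX data).
prices = [1000.0, 980.0, 1010.0, 1005.0, 940.0, 955.0, 960.0, 1020.0]
returns = log_returns(prices)
mean, sd, skew, excess_kurt = sample_moments(returns)
print(len(returns), round(mean, 5), round(sd, 5))
```

A positive excess kurtosis on real return data would point to the fat tails that the histogram and QQ plot suggest; a formal check would use one of the tests named above.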
The ARIMA($p, d, q$) model, with $p$ standing for the lag order of the autoregressive model, $d$ the degree of differencing and $q$ the lag order of the moving average model, is given by (for $d = 1$)

$$\Delta y_t = a_1 \Delta y_{t-1} + a_2 \Delta y_{t-2} + \ldots + a_p \Delta y_{t-p} + \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + \ldots + b_q \varepsilon_{t-q} \qquad (1.1)$$

or

$$a(L)\,\Delta y_t = b(L)\,\varepsilon_t \qquad (1.2)$$

where $\Delta y_t = y_t - y_{t-1}$ is the differenced series and can be replaced by higher order differencing $\Delta^d y_t$ if necessary. $L$ is the lag operator and $\varepsilon_t \sim \mathrm{N}(0, \sigma^2)$.

There are two approaches to identify and fit an appropriate ARIMA($p, d, q$) model. The first one is the Box-Jenkins procedure (subsection 1.2.1); the other is to select models by selection criteria like the Akaike information criterion (AIC) and the Bayesian or Schwarz information criterion (BIC), see subsection 1.2.2.

1.2.1 Box-Jenkins Procedure

The Box-Jenkins procedure comprises the following stages:

1. Identification of lag orders $p$, $d$ and $q$.
2. Parameter estimation
3. Diagnostic checking

A detailed illustration of each stage can be found in the textbook of Box et al. (2015).

In the first identification stage, one first needs to determine the degree of integration $d$. Figure 1.2 shows that the CRIX returns are generally stationary over time. In addition to the time plot, the sample autocorrelation function (ACF) is useful for identifying a non-stationary time series: for a stationary series the ACF values drop to zero relatively quickly, whereas for a non-stationary series they decay slowly. Furthermore, unit root tests determine more objectively whether differencing is required, for instance the augmented Dickey-Fuller (ADF) test and the KPSS test, see Dickey and Fuller (1981) and Kwiatkowski et al. (1992) for more technical details.

Given $d$, one identifies the lag orders $(p, q)$ by checking ACF plots to find the total correlation between different lags. In an MA($q$) context, there is no autocorrelation between $y_t$ and $y_{t-q-1}$, so the ACF dies out after lag $q$. A second insight is obtained from the partial autocorrelation function (PACF). For an AR($p$) process, when the effects of the lags $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$ are excluded, the autocorrelation between $y_t$ and $y_{t-p-1}$ is zero. Hence the PACF of a process with $p = 1$ drops to zero after lag 1.

We illustrate the discussion thus far by analyzing the daily log returns of CRIX introduced in subsection 1.1.2. The stationarity of the return series is tested by ADF (null hypothesis: unit root) and KPSS (null hypothesis: stationary) tests. The $p$-values are 0.01 for the ADF test and 0.1 for the KPSS test. Hence one concludes stationarity on the level $d = 0$.

The next step is to choose the lag orders $p$ and $q$ for the ARIMA model. The sample ACF and PACF are calculated and depicted in Figure 1.4, with blue dashed lines as 95% limits. The results suggest that the CRIX log returns are not random. The Ljung-Box test statistic for examining the null hypothesis of independence yields a $p$-value of 0.0017. Hence one rejects the null hypothesis and concludes that the CRIX return series has an autocorrelation structure.

Figure 1.4: The sample ACF and PACF plots of daily CRIX returns from Aug 2nd, 2014 to April 6th, 2016, with lags = 20. econ_arima

The ACF pattern in Figure 1.4 suggests the existence of strong autocorrelations at lags 2 and 8, and partial autocorrelations at lags 2, 6 and 8.
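As a rough sketch of the identification tools used here, the following computes the sample ACF, the usual approximate 95% limits of $\pm 1.96/\sqrt{n}$, and the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{m} \hat\rho_k^2/(n-k)$. The input is a hypothetical, strongly autocorrelated toy series, not the CRIX returns, and the function names are chosen for this example only.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic for the null of no autocorrelation up to max_lag."""
    n = len(x)
    rho = sample_acf(x, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))

# Hypothetical toy series with obvious serial dependence.
x = [1.0, -1.0] * 50
rho = sample_acf(x, 2)
bound = 1.96 / math.sqrt(len(x))   # approximate 95% limits for white noise
q_stat = ljung_box_q(x, 2)
print(rho, round(bound, 3), round(q_stat, 2))
```

Under the null, $Q$ is approximately chi-squared with `max_lag` degrees of freedom, so it would be compared with, for example, the 5% critical value of about 5.99 for two lags.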
These results suggest that the CRIX return series can be modeled by some ARIMA process, for example ARIMA(2,0,2).

The information criteria for a model $M$ are

$$AIC(M) = -2 \log L(M) + 2\,p(M) \qquad (1.3)$$

$$BIC(M) = -2 \log L(M) + p(M) \log n \qquad (1.4)$$

where $n$ is the number of observations, $p(M)$ is the number of parameters in model $M$ and $L(M)$ represents the likelihood function of the parameters evaluated at the Maximum Likelihood Estimate (MLE).

The first terms $-2 \log L(M)$ in equations (1.3) and (1.4) reflect the goodness of fit of the MLE, while the second terms stand for the model complexity. Therefore AIC and BIC can be viewed as measures that combine fit and complexity. The main difference between the two measures is that the BIC is asymptotically consistent while the AIC is not. Compared with BIC, AIC tends to overparameterize.

We start with ARIMA(1,0,1) as an example and fit the ARIMA(1,0,1) model derived from equation (1.1),

$$y_t = a_1 y_{t-1} + \varepsilon_t + b_1 \varepsilon_{t-1}$$

The estimated parameters are $\hat{a}_1 = 0.\ldots$ and $\hat{b}_1 = -\ldots$, where $y_t$ represents the CRIX returns.

In the third stage of the Box-Jenkins procedure one evaluates the validity of the estimated model. The results of diagnostic checking are reported in the three diagnostic plots of Figure 1.5. The upper panel shows the standardized residuals, the middle one the ACF of residuals and the lower panel the Ljung-Box test statistic for the null hypothesis of residual independence. One observes significant autocorrelations of the model residuals at lags 2, 3, 6 and 8, and low $p$-values of the Ljung-Box test statistic after lag 1. We therefore reject the null hypothesis of residual independence at these lags, hence the ARIMA(1,0,1) model is not enough to get rid of the serial dependence. More appropriate lag orders are needed for a better model fit.

Figure 1.5: Diagnostic checking result of ARIMA(1,0,1). Panels: Standardized Residuals; ACF of Residuals; p values for Ljung-Box statistic. econ_arima

Nevertheless, model diagnostic checking is often used together with model selection criteria. In practice, these two approaches complement each other. Based on the discussion of Figure 1.4 in subsection 1.2.2, we consider combinations of $(p, d, q)$ with $d \in \{0, 1\}$ and $p, q \in \{0, 1, 2, 3, 4, 5\}$. Calculating the AIC and BIC for each model yields the best six models listed in Table 1.2. In general, an ARIMA(2,0,2) model

$$y_t = c + a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} \qquad (1.5)$$

performs best. Its diagnostic plots are shown in Figure 1.6 and look very good; the large $p$-values of the Ljung-Box test statistic suggest an independence structure of the model residuals. Furthermore, the estimate of each element in equation (1.5) is reported in Table 1.3.

ARIMA model selected | AIC | BIC
ARIMA(2,0,0) | -2468.83 | -2451.15
ARIMA(2,0,2) | -2474.25 | -2447.73
ARIMA(2,0,3) | -2472.72 | -2441.78
ARIMA(4,0,2) | -2476.35 | -2440.99
ARIMA(2,1,1) | -2459.15 | -2441.47
ARIMA(2,1,3) | -2464.14 | -2437.62

Table 1.2: The ARIMA model selection with AIC and BIC. econ_arima

Coefficients | Estimate | Standard deviation
intercept $c$ | -0.0004 | 0.0012
$a_1$ | -0.6989 | 0.1124
$a_2$ | -0.7508 | 0.1191
$b_1$ | … | …
$b_2$ | … | …

Table 1.3: Estimation result of the ARIMA(2,0,2) model. econ_arima

Figure 1.6: Diagnostic checking result of ARIMA(2,0,2). Panels: Standardized Residuals; ACF of Residuals; p values for Ljung-Box statistic. econ_arima

With the identified ARIMA model and its estimated parameters, we predict the CRIX returns for the next 30 days under the ARIMA(2,0,2) model. The out-of-sample prediction result is shown in Figure 1.7. The 95% confidence bands are computed using the rule of thumb "prediction ± 2 × standard deviation".

Figure 1.7: CRIX returns and predicted values (x-axis: days, y-axis: log return). The confidence bands are red dashed lines. econ_arima
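The information criteria (1.3) and (1.4) are simple functions of the maximised log-likelihood, which the sketch below makes explicit. Two values are assumptions made for this illustration and are not stated in the text: a sample size of n = 614 daily returns, and k = 6 estimated parameters for ARIMA(2,0,2) (intercept, two AR terms, two MA terms and the innovation variance). The log-likelihood is backed out from the tabulated AIC.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)

# Assumed for illustration: n = 614 observations, k = 6 parameters.
n, k = 614, 6
tabulated_aic = -2474.25                      # ARIMA(2,0,2) row of Table 1.2
loglik = (2.0 * k - tabulated_aic) / 2.0      # implied log-likelihood
print(round(aic(loglik, k), 2), round(bic(loglik, k, n), 2))
```

Because BIC minus AIC equals k(log n - 2), the penalty gap grows with the number of parameters, which is why BIC tends to favour the smaller models in a table like Table 1.2 while AIC favours the larger ones.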
Homoskedasticity is a frequently used assumption in the framework of time series analysis, that is, the variance of the error terms is assumed to be constant through time, see Brooks (2014). Nevertheless we observe heteroskedasticity in many cases, when the variances of the data differ across periods.

In subsection 1.2.3 we have built an ARIMA model for the CRIX return series to model intertemporal dependence. Although the ACF of the model residuals has no significant lags, as evidenced by the large $p$-values for the Ljung-Box test in Figure 1.6, the time series plot of the residuals shows some clusters of volatility. To be more specific, we display the squared residuals of the selected ARIMA(2,0,2) model in Figure 1.8.

To incorporate the univariate heteroskedasticity, we first fit an ARCH (AutoRegressive Conditional Heteroskedasticity) model in subsection 1.3.1. In subsection 1.3.2, its generalization, the GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) model, provides an even more flexible volatility pattern. In addition, a variety of extensions of the standard GARCH model will be explored in subsection 1.3.3.

Figure 1.8: The squared ARIMA(2,0,2) residuals of CRIX returns. econ_vola
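Volatility clustering shows up as a series whose levels look uncorrelated while its squares are strongly autocorrelated. The following toy sketch illustrates the idea on a deterministic, hypothetical residual series that alternates between a calm and a turbulent regime; it is not the ARIMA(2,0,2) residual series, and `lag_autocorr` is a helper introduced for this example.

```python
import math

def lag_autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t - k] for t in range(k, n)) / sum(d * d for d in dev)

# Hypothetical residuals: sign pattern with no lag-1 correlation,
# scale switching between 1 (calm) and 3 (turbulent) every 50 observations.
signs = [1.0, 1.0, -1.0, -1.0]
resid = [signs[t % 4] * (1.0 if (t // 50) % 2 == 0 else 3.0) for t in range(400)]
squared = [e * e for e in resid]

bound = 1.96 / math.sqrt(len(resid))      # approximate 95% limit
rho_level = lag_autocorr(resid, 1)        # close to zero
rho_square = lag_autocorr(squared, 1)     # far outside the limit
print(round(rho_level, 4), round(rho_square, 4), round(bound, 4))
```

Applying the same comparison to real residuals, for example through the ACF of the squared series or a Ljung-Box test on the squares, is one common way to detect the conditional heteroskedasticity modelled in the following subsections.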
The ARCH($q$) model introduced by Engle (1982) is formulated as

$$\varepsilon_t = Z_t \sigma_t, \qquad Z_t \sim N(0, 1)$$
$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \ldots + \alpha_q \varepsilon_{t-q}^2 \qquad (1.6)$$

where $\varepsilon_t$ is the model residual and $\sigma_t^2$ is the variance of $\varepsilon_t$ conditional on the information available at time $t - 1$. It should be noted that the parameters should satisfy $\alpha_i > 0$, $\forall i = 1, \ldots, q$. The assumption $\sum_{i=1}^{q} \alpha_i < 1$ ensures that $\sigma_t^2$ is asymptotically stationary over time.

Based on the estimation results of subsection 1.2.3, we proceed to examine the heteroskedasticity effect observed in Figure 1.8. The model residual $\varepsilon_t$ in equation (1.5) is used to test for ARCH effects using the ARCH LM (Lagrange multiplier) test; the small $p$-value of 2.…e-16 leads us to reject its null hypothesis of no ARCH effects. Another approach is the Ljung-Box test for squared model residuals, see Tsay (2005). These two tests show similar results, as the small $p$-value of the Ljung-Box test statistic indicates a dependence structure of $\varepsilon_t^2$.

Figure 1.9: The ACF and PACF of squared residuals of ARIMA(2,0,2) model. econ_vola

To determine the lag orders of the ARCH model, we display the ACF and PACF of squared residuals in Figure 1.9. The autocorrelations display a cutoff after the first two lags, although some of the remaining lags are significant. The PACF plot in the right panel has a significant spike before lag 2. Therefore the lag order of the ARCH model should be at least 2.

We fit ARCH models to the residuals using candidate values of $q$ from 1 to 4, where all models are estimated by MLE based on the stochastic specification of equation (1.6). The results of the model comparison are contained in Table 1.4. The log likelihood and information criteria jointly select an ARCH(3) model, with the estimated parameters presented in Table 1.5. All the parameters except for the third one are significant at the 0.1% level.

Model | Log Likelihood | AIC | BIC
ARCH(1) | 1281.7 | -2567.4 | -2558.6
ARCH(2) | 1283.4 | -2560.8 | -2547.6
ARCH(3) | 1291.6 | -2575.2 | -2557.5
ARCH(4) | 1288.8 | -2567.5 | -2545.4

Table 1.4: Estimation result of ARIMA-ARCH models. econ_arch

Coefficients | Estimates | Standard deviation | Ljung-Box test statistic
$\omega$ | … | … | … ⋆
$\alpha_1$ | … | … | … ⋆
$\alpha_2$ | … | … | …
$\alpha_3$ | … | … | … ⋆

Table 1.5: Estimation result of ARIMA(2,0,2)-ARCH(3) model; ⋆ denotes significance at the 0.1% level. econ_arch

Bollerslev (1986) further extended the ARCH model by adding conditional heteroskedasticity moving average terms to equation (1.6). The GARCH model indicates that the current volatility depends on past volatilities $\sigma_{t-i}^2$ and on past squared model residuals $\varepsilon_{t-j}^2$.

The standard GARCH($p, q$) model is written as

$$\varepsilon_t = Z_t \sigma_t, \qquad Z_t \sim N(0, 1)$$
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 \qquad (1.7)$$

with the condition that

$$\omega > 0, \quad \alpha_j \geq 0, \quad \beta_i \geq 0, \quad \sum_{i=1}^{p} \beta_i + \sum_{j=1}^{q} \alpha_j < 1 \qquad (1.8)$$

A lag order of $p = q = 1$ is sufficient in most cases.

The comparison of different GARCH models is reported in Table 1.6, with lag orders up to $p = q = 2$. It shows that a GARCH(1,2) model performs slightly better than the other ones in terms of log likelihood and information criteria.

GARCH models | Log likelihood | AIC | BIC
GARCH(1,1) | 1305.355 | -4.239 | -4.210
GARCH(1,2) | 1309.363 | -4.249 | -4.213
GARCH(2,1) | 1305.142 | -4.235 | -4.199
GARCH(2,2) | 1309.363 | -4.245 | -4.202

Table 1.6: Comparison of GARCH models, orders up to $p = q = 2$. econ_garch

Using the GARCH(1,2) model as selected,

$$\sigma_t^2 = \omega + \beta_1 \sigma_{t-1}^2 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 \qquad (1.9)$$

we obtain the estimation results presented in Table 1.7. The conditions $\omega > 0$ … and $\alpha_1 + \beta_1 + \beta_2 = 0.\ldots < 1$ are satisfied. However, $\beta_1$ is not significant according to the Ljung-Box test statistic.

Coefficients | Estimates | Standard deviation | Ljung-Box test statistic
$\omega$ | … | … | … ∗
$\alpha_1$ | … | … | … ∗∗∗
$\beta_1$ | … | … | …
$\beta_2$ | … | … | … ∗∗∗

Table 1.7: Estimation result of ARIMA(2,0,2)-GARCH(1,2) model. ∗ represents a significance level of 5% and ∗∗∗ of 0.1%. econ_garch

As mentioned above, GARCH(1,1) is sufficient in most cases, so we proceed to fit the ARIMA model residuals with a GARCH(1,1) model and present the estimation result in Table 1.8. The GARCH(1,1) outperforms the ARCH(3) model in that all the estimated parameters are significant. The estimated parameters satisfy $\omega > 0$ … and $\alpha_1 + \beta_1 = 0.\ldots < 1$. Since $\sum_{i=1}^{p} \beta_i + \sum_{j=1}^{q} \alpha_j$ reveals the persistence of volatility, we know that the GARCH(1,1) is more persistent in volatility than the GARCH(1,2). Therefore, for simplicity, GARCH(1,1) is suggested for further analysis of the CRIX dynamics.

Coefficients | Estimates | Standard deviation | Ljung-Box test statistic
$\omega$ | … | … | … ∗
$\alpha_1$ | … | … | … ∗∗∗
$\beta_1$ | … | … | … ∗∗∗

Table 1.8: Estimation result of ARIMA(2,0,2)-GARCH(1,1) model. ∗ represents a significance level of 5% and ∗∗∗ of 0.1%. econ_garch

The model residuals of the ARMA-GARCH process are plotted in Figure 1.10. Figure 1.11 displays the ACF and PACF plots for the model residuals of the ARIMA(2,0,2)-GARCH(1,1) process. All the values are within the bands, which suggests that the model residuals have no dependence structure over different lags. Therefore the GARCH(1,1) model is sufficient to explain the heteroskedasticity effect discussed in subsection 1.3.1.

Figure 1.10: The ARIMA(2,0,2)-GARCH(1,1) residuals. econ_garch

Figure 1.11: The ACF and PACF plots for model residuals of ARIMA(2,0,2)-GARCH(1,1) process. econ_garch
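To make the GARCH(1,1) recursion and the stationarity condition concrete, the sketch below filters a short residual series through $\sigma_t^2 = \omega + \beta_1 \sigma_{t-1}^2 + \alpha_1 \varepsilon_{t-1}^2$. The parameter values and residuals are hypothetical and are not the estimates of the fitted CRIX model; starting the recursion at the unconditional variance $\omega/(1 - \alpha_1 - \beta_1)$ is one common convention, assumed here.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model.

    sigma2[t] = omega + beta * sigma2[t-1] + alpha * residuals[t-1]**2,
    initialised at the unconditional variance omega / (1 - alpha - beta).
    """
    persistence = alpha + beta
    if not (omega > 0 and alpha >= 0 and beta >= 0 and persistence < 1):
        raise ValueError("parameters violate the GARCH(1,1) conditions")
    sigma2 = [omega / (1.0 - persistence)]
    for eps in residuals[:-1]:
        sigma2.append(omega + beta * sigma2[-1] + alpha * eps * eps)
    return sigma2

# Hypothetical parameters and residuals, for illustration only.
omega, alpha, beta = 1e-5, 0.10, 0.85
residuals = [0.01, -0.02, 0.05, -0.01, 0.0, 0.0, 0.0]
sigma2 = garch11_variance(residuals, omega, alpha, beta)
print([round(v, 8) for v in sigma2])
```

The large shock of 0.05 raises the next conditional variance well above its long-run level, after which the variance decays at rate alpha + beta per step. The closer this sum is to one, the more persistent the volatility, which is the sense in which persistence is compared across models above.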
As we observed in Figure 1.2, the return series of CRIX exhibits leptokurtosis.We further check the QQ-plot in Figure 1.12, which suggests the fat tail of modelresiduals using ARIMA(2,0,2)-GARCH(1,1) process. The Kolmogorov distancebetween residuals of the selected model and normal distribution is reported inTable 1.9. With the small p -value of Kolmogorov-Smirnov test statistic, wereject the null hypothesis that the model residuals are drawn from the normaldistribution.We impose the assumption on the residuals with student distribution, that is,applying the non-normal assumption on Z t in equation (1.7). With Z t ∼ t ( d )to replace the normal assumption of Z t in GARCH model, the MLE is imple-mented for model estimation. The results for ARIMA- t -GARCH process are24 . . . . . . Lag S a m p l e A u t o c o rr e l a t i on − . − . . . . Lag S a m p l e P a r t i a l A u t o c o rr e l a t i on Figure 1.11: The ACF and PACF plots for model residuals of ARIMA(2,0,2)-GARCH(1,1) process. econ garch −3 −2 −1 0 1 2 3 − − − qnorm − QQ Plot Theoretical Quantiles S a m p l e Q uan t il e s Figure 1.12: The QQ plots of model residuals of ARIMA-GARCH process.econ garch25odel Kolmogorov distance P-valueARIMA-GARCH 0.495 2 . e − ω . e −
05 5 . e −
05 1 . α . e −
01 1 . e −
01 1 . (cid:5) β . e −
01 6 . e −
02 12 . ∗∗∗ ξ . e + 00 3 . e −
01 7 . ∗∗∗ Table 1.10: Estimation result of ARIMA(2,0,2)- t -GARCH(1,1) model. (cid:5) repre-sents significant level of 10% and ∗ ∗ ∗ of 0.1%. econ tgarchrepresented in Table 1.10. The shape parameter ξ controls the height and fat-tail of density function, therefore different shape of distribution function. It isobvious that the shape parameter is significantly from zero. The QQ plot inFigure 1.13 indicates that the residuals are quite close to student- t distribution.The ACF and PACF plots for ARIMA- t -GARCH is following in Figure 1.14,with all values stay inside the bounds. Hence the residuals and their varianceare uncorrelated.In addition to the property of leptokurtosis, leverage effect is commonly observedin practice. According to a large literature, such as Engle and Ng (1993), theleverage effect refers to the volatility of an asset tends to respond asymmetricallywith negative or positive shocks, declines in prices or returns are accompanied bylarger increase in volatility compared with the decrease of volatility associatedwith rising asset market. Although the introduced GARCH model successfullysolve the problem of volatility clustering, the σ t cannot capture the leverageeffect.To overcome this, the exponential GARCH (EGARCH) model with standard in-novations proposed by Nelson (1991) can be expressed in the following nonlinearform, 26 − − qstd − QQ Plot Theoretical Quantiles S a m p l e Q uan t il e s Figure 1.13: The QQ plot of t -GARCH(1,1) model. econ tgarch . . . . . . Lag
ACF of Squared Residuals − . − . . . . Lag
PACF of Squared Residuals
Figure 1.14: The ACF and PACF plots for model residuals of ARIMA(2,0,2)- t -GARCH(1,1) process. econ tgarch27oefficients Estimates Standard deviation Ljung-Box test statistic ω . e −
05 4 . e −
05 2 . ∗ α . e −
01 3 . e −
02 4 . ∗ β . e −
02 8 . e −
02 0 . φ . e −
01 8 . e −
02 7 . ∗ Table 1.11: Estimation result of ARIMA(2,0,2)- t -EGARCH(1,1) model. ∗ rep-resents significant level of 5% and ∗ ∗ ∗ of 0.1%. econ tgarch ε t = Z t σ t Z t ∼ N (0 , σ t ) = ω + p (cid:88) i =1 β i log( σ t − i ) + q (cid:88) j =1 g j ( Z t − j ) (1.10)where g j ( Z t ) = α j Z t + φ j ( | Z t − j | − E | Z t − j | ) with j = 1 , , . . . , q . When φ j = 0,we have the logarithmic GARCH (LGARCH) model from Geweke (1986) andPantula (1986). However LGARCH is not popular due to the high value of thefirst few ACF of ε .Based on the results shown in Figure 1.12, we fit a EGARCH(1,1) modelwith student t distributed innovation term. The estimation results using theARIMA(2,0,2)- t -EGARCH(1,1) model is reported in Table 1.11.The ACF and PACF of ARIMA- t -EGARCH residuals are plotted in Figure 1.15.The small values indicate independent structure of model residuals. We furthercheck the QQ plot in Figure 1.16, the model residuals fit better to student- t distribution compared with normal case of Figure 1.12.We compare the model performance of selected GARCH models in Table 1.12,where the log likelihood and information criteria select the t -GARCH(1,1) model.With the selected ARIMA(2,0,2)- t -GARCH(1,1) model, we conduct a 30-stepahead forecast. The forecast performance is plotted in Figure 1.17 with the 95%confidence bands marked in blue. 28 . . . . . . Lag S a m p l e A u t o c o rr e l a t i on − . − . . . . Lag S a m p l e P a r t i a l A u t o c o rr e l a t i on Figure 1.15: The ACF and PACF for model residuals of ARIMA- t -EGARCHprocess. econ tgarch −6 −4 −2 0 2 4 6 − − std − QQ Plot Theoretical Quantiles S a m p l e Q uan t il e s G A RCH m ode l : e G A RCH
Figure 1.16: The QQ plot of t-EGARCH(1,1) model (Theoretical Quantiles against Sample Quantiles). econ_tgarch

| GARCH models | Log likelihood | AIC | BIC |
|---|---|---|---|
| GARCH(1,1) | 1305.355 | -4.239 | -4.210 |
| t-GARCH(1,1) | 1309.363 | -4.249 | -4.213 |
| t-EGARCH(1,1) | 1305.142 | -4.235 | -4.199 |

Table 1.12: Comparison of the variants of GARCH model. econ_tgarch

Figure 1.17: The 30-step ahead forecast using ARIMA-t-GARCH process (panel title: Prediction with confidence intervals). econ_tgarch

1.4 Multivariate GARCH Model

While modelling the volatility of CRIX returns has been the main focus so far, understanding the co-movements of the different indices in the CRIX family is also of great importance. In this subsection we proceed to the MGARCH (multivariate GARCH) model, whose specification allows for a flexible dynamic structure. It provides a tool to analyze the volatility and co-volatility dynamics of asset returns in a portfolio.
Consider the error term $\varepsilon_t$ with $E(\varepsilon_t) = 0$ and conditional covariance matrix given by the $(d \times d)$ positive definite matrix $H_t$. We assume that

\[ \varepsilon_t = H_t^{1/2} \eta_t \qquad (1.11) \]

where $H_t^{1/2}$ can be obtained by Cholesky factorization of $H_t$, and $\eta_t$ is an iid innovation vector such that

\[ E(\eta_t) = 0, \qquad \operatorname{Var}(\eta_t) = E(\eta_t \eta_t^\top) = I_d \qquad (1.12) \]

where $I_d$ is the identity matrix of order $d$.

This defines the standard MGARCH framework; different specifications of $H_t$ yield various parametric formulations. The first MGARCH model, the VEC model proposed by Bollerslev et al. (1988), is a direct generalization of the univariate GARCH model. Let $\operatorname{vech}(\cdot)$ denote the operator that stacks the columns of the lower triangular part of its square argument matrix. The VEC model is formulated as

\[ \operatorname{vech}(H_t) = c + \sum_{j=1}^{q} A_j \operatorname{vech}(\varepsilon_{t-j} \varepsilon_{t-j}^\top) + \sum_{i=1}^{p} B_i \operatorname{vech}(H_{t-i}) \qquad (1.13) \]

where $A_j$ and $B_i$ are parameter matrices and $c$ is a vector of constant components.

However, it is difficult to ensure the positive definiteness of $H_t$ in the VEC model without strong assumptions on the parameters. Engle and Kroner (1995) proposed the BEKK specification (defined by Baba et al. (1990)), which easily imposes positive definiteness under weak assumptions. The form is given by

\[ H_t = C C^\top + \sum_{k=1}^{K} \sum_{j=1}^{q} A_{kj}^\top \varepsilon_{t-j} \varepsilon_{t-j}^\top A_{kj} + \sum_{k=1}^{K} \sum_{i=1}^{p} B_{ki}^\top H_{t-i} B_{ki} \qquad (1.14) \]

where $C$ is a lower triangular parameter matrix.

Other than the direct generalizations of GARCH models introduced above, nonlinear combinations of univariate GARCH models are more easily estimable. This kind of MGARCH model is based on the decomposition of the conditional covariance matrix into conditional standard deviations and correlations. The simplest is the Constant Conditional Correlation (CCC) model introduced by Bollerslev (1990).
The conditional correlation matrix of the CCC model is time invariant, and the conditional covariance matrix can be expressed as

\[ H_t = D_t P D_t \qquad (1.15) \]

where $D_t$ denotes the diagonal matrix with the conditional standard deviations along the diagonal. Therefore $\{D_t\}_{ii} = \sigma_{it}$, where each $\sigma_{it}^2$ follows a univariate GARCH model.

To overcome the limitation of a constant correlation, Engle (2002) proposed the Dynamic Conditional Correlation (DCC) model, which allows for a dynamic conditional correlation structure. Rather than assuming that the conditional correlation $\rho_{ij}$ between the $i$-th and $j$-th components is constant in $P$, it is now the $ij$-th element of the matrix $P_t$, which is defined by

\[ H_t = D_t P_t D_t \qquad (1.16) \]
\[ P_t = (I \odot Q_t)^{-1/2} Q_t (I \odot Q_t)^{-1/2} \]

with

\[ Q_t = (1 - a - b) S + a\, \varepsilon_{t-1} \varepsilon_{t-1}^\top + b\, Q_{t-1} \qquad (1.17) \]

where $a$ is a positive and $b$ a non-negative scalar such that $a + b < 1$, $S$ is the unconditional covariance matrix of $\varepsilon_t$, and $Q_0$ is positive definite.

Figure 1.18 presents the time path of the price series for each index of the CRIX family. As observed, the price processes differ slightly after October 2015; before that, the three indices follow a similar trend over time. This indicates that the ARIMA(2,0,2) model selected for the CRIX returns to remove the intertemporal dependence can be applied to ECRIX and EFCRIX as well; the model selection and estimation procedures are similar to those for CRIX. In this section, the ARIMA residuals of each index are used for the following analysis.

The DCC-GARCH(1,1) model is estimated by QMLE based on the stochastic process of equations (1.16) and (1.17). One of the assumptions is that the innovation term $\eta_t$ in equation (1.11) is iid. We check the standardized residuals of DCC-GARCH(1,1) in Figure 1.19, which display a white noise pattern to some extent.

Figure 1.18: The price process of CRIX (black), ECRIX (blue) and EFCRIX (red). econ_ccgar

Figure 1.19: The standardized residuals of DCC-GARCH model, with CRIX (upper), ECRIX (middle) and EFCRIX (lower). econ_ccgar

The estimation results are contained in Table 1.13.
| Index type | Coef. | Estimates | Std Error | t test | p-value |
|---|---|---|---|---|---|
| CRIX | µ | … | … | … | … |
| | ω | … | … | … | … |
| | α | … | … | … | … |
| | β | … | … | … | … |
| ECRIX | µ | … | … | … | … |
| | ω | … | … | … | … |
| | α | … | … | … | … |
| | β | … | … | … | … |
| EFCRIX | µ | … | … | … | … |
| | ω | … | … | … | … |
| | α | … | … | … | … |
| | β | … | … | … | … |
| DCC | a | … | … | … | … |
| | b | … | … | … | … |

… $\mu$ and the constant $\omega$ from equation (1.7). Each $\sigma_{it}^2$ is a univariate GARCH(1,1) model,

\[ \sigma_{CRIX,t}^2 = 0.\ldots\, \varepsilon_{CRIX,t-1}^2 + 0.\ldots\, \sigma_{CRIX,t-1}^2 \]
\[ \sigma_{ECRIX,t}^2 = 0.\ldots\, \varepsilon_{ECRIX,t-1}^2 + 0.\ldots\, \sigma_{ECRIX,t-1}^2 \]
\[ \sigma_{EFCRIX,t}^2 = 0.\ldots\, \varepsilon_{EFCRIX,t-1}^2 + 0.\ldots\, \sigma_{EFCRIX,t-1}^2 \]

The matrix $Q_t$ of equation (1.17) is

\[ Q_t = (1 - 0.\ldots - 0.\ldots)\, S + 0.\ldots\, \varepsilon_{t-1} \varepsilon_{t-1}^\top + 0.\ldots\, Q_{t-1} \]

with the unconditional covariance matrix $S$,

\[ S = \begin{pmatrix} \ldots.994 & 0.994 & 0.\ldots \\ \ldots.994 & 0.994 & 0.\ldots \\ \ldots.994 & 0.993 & 0.\ldots \end{pmatrix} \]

Based on the estimation of the DCC-GARCH(1,1) model, the estimated and realized volatilities are shown in Figure 1.20. The volatility clustering feature is seen graphically from the presence of sustained periods of high or low volatility: large changes tend to cluster together. In general, the DCC-GARCH(1,1) fit is satisfactory, as it captures almost all significant volatility changes.

Figure 1.21 presents the estimated correlation dynamics for each of the following pairs of series: CRIX v.s. ECRIX, CRIX v.s. EFCRIX and ECRIX v.s. EFCRIX. As expected, the three correlation dynamics are similar. To be more specific, the three indices are highly positively correlated during the whole sample period. As evidenced in Figure 1.18, the period after the third quarter of 2015 is characterized by relatively lower correlation between the three indices, which in turn explains the slight declines in the correlation dynamics.

To check the adequacy of the MGARCH model, we compare the ACF and PACF plots of the premodel squared residuals $\varepsilon_t^2$ and the DCC-GARCH(1,1) squared residuals. Figures 1.22 and 1.23 show that the GARCH effect is largely eliminated by the DCC-GARCH model. Most of the lags are within the 95% confidence bands marked in blue.
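The correlation dynamics discussed above come from the recursion in equations (1.16) and (1.17), which can be sketched as follows. This is a minimal illustration and not the estimation code behind the results: it assumes that $S$ is estimated by the sample second-moment matrix of the residuals, that the recursion is started at $Q_0 = S$, and that the scalars `a` and `b` are given. The function name `dcc_correlations`, the simulated residuals and the values of `a` and `b` are hypothetical.

```python
import numpy as np


def dcc_correlations(eps, a, b):
    """Conditional correlation matrices P_t from equations (1.16)-(1.17).

    eps is a (T, d) array of residuals; the result has shape (T, d, d).
    """
    eps = np.asarray(eps, dtype=float)
    n_obs, dim = eps.shape
    if not (a > 0 and b >= 0 and a + b < 1):
        raise ValueError("need a > 0, b >= 0 and a + b < 1")
    # Assumption: S is the sample second-moment matrix and Q_0 = S.
    S = eps.T @ eps / n_obs
    Q = S.copy()
    P = np.empty((n_obs, dim, dim))
    for t in range(n_obs):
        if t > 0:
            e = eps[t - 1][:, None]
            Q = (1.0 - a - b) * S + a * (e @ e.T) + b * Q
        # P_t = (I * Q_t)^(-1/2) Q_t (I * Q_t)^(-1/2): rescale to unit diagonal.
        inv_sd = 1.0 / np.sqrt(np.diag(Q))
        P[t] = Q * np.outer(inv_sd, inv_sd)
    return P


# Simulated, strongly correlated residuals for three hypothetical series.
rng = np.random.default_rng(1)
common = rng.standard_normal((300, 1))
eps = 0.9 * common + 0.3 * rng.standard_normal((300, 3))
P = dcc_correlations(eps, a=0.05, b=0.90)
```

The element `P[t, 0, 1]` then traces the time-varying correlation between the first two series, which corresponds to the kind of path shown in each panel of Figure 1.21. Rescaling $Q_t$ by its own diagonal is what guarantees that every $P_t$ has ones on the diagonal.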
Figure 1.20: The estimated volatility (black) and realized volatility (grey) using DCC-GARCH model, with CRIX (upper), ECRIX (middle) and EFCRIX (lower); horizontal axis: days. econ_ccgar

Figure 1.21: The dynamic correlation between three CRIX indices: CRIX, ECRIX and EFCRIX estimated by DCC-GARCH model (panels: CRIX v.s. ECRIX, CRIX v.s. EFCRIX, ECRIX v.s. EFCRIX). econ_ccgar

Figure 1.22: The comparison of ACF between premodel squared residuals and DCC squared residuals (panels for CRIX, ECRIX and EFCRIX).

Figure 1.23: The comparison of PACF between premodel squared residuals and DCC squared residuals (panels for CRIX, ECRIX and EFCRIX).

Figure 1.24: 100-step ahead forecasts of estimated volatility using DCC-GARCH(1,1) model (panels: crix, ecrix, efcrix; horizontal axis: days).

Moreover, we conduct a 100-step ahead forecast of the estimated volatility, as illustrated in Figure 1.24; the forecast behavior generally follows the estimated dynamics (black line).
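For a single index, the multi-step variance forecast of a GARCH(1,1) model has a simple closed recursion, sketched below. The sketch assumes the standard variance equation $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$, so that for $h \geq 2$ the $h$-step forecast is $\omega + (\alpha + \beta)$ times the $(h-1)$-step forecast. The function name `garch11_variance_forecast` and all numerical inputs are hypothetical and are not the estimates reported in this chapter.

```python
import numpy as np


def garch11_variance_forecast(omega, alpha, beta, last_eps, last_var, horizon):
    """h-step ahead conditional variance forecasts, h = 1, ..., horizon."""
    forecasts = np.empty(horizon)
    # One step ahead uses the last observed residual and variance.
    forecasts[0] = omega + alpha * last_eps**2 + beta * last_var
    for h in range(1, horizon):
        # Beyond one step, E[eps^2] is replaced by the variance forecast itself.
        forecasts[h] = omega + (alpha + beta) * forecasts[h - 1]
    return forecasts


# Hypothetical inputs, chosen only for illustration.
omega, alpha, beta = 1e-5, 0.10, 0.85
variance_path = garch11_variance_forecast(
    omega, alpha, beta, last_eps=0.02, last_var=4e-4, horizon=100
)
volatility_path = np.sqrt(variance_path)
long_run_variance = omega / (1.0 - alpha - beta)
```

When $\alpha + \beta < 1$ the forecasts decay geometrically towards the long-run variance $\omega / (1 - \alpha - \beta)$, which is why long-horizon volatility forecasts flatten out.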
Understanding the dynamics of asset returns is of great importance: it is the first step for practitioners to go further with the analysis of cryptocurrency markets, such as volatility modelling, option pricing and forecasting. The motivation behind identifying the most accurate econometric model and determining the parameters that capture economic behavior arises from the desire to produce a dynamic modelling procedure.

In general, it is difficult to model asset returns with basic time series models due to heavy tails, serial correlation across time periods and volatility clustering. Here we provide a detailed step-by-step econometric analysis using the data of the CRIX family: CRIX, ECRIX and EFCRIX. The time horizon of our data sample is from 01/08/2014 to 06/04/2016.

At first, an ARIMA model is implemented to remove the intertemporal dependence. The diagnostic checking stage helps to identify the most accurate econometric model. We then observe the well-known volatility clustering phenomenon in the estimated model residuals. Hence volatility models such as ARCH, GARCH and EGARCH are introduced to eliminate the effect of heteroskedasticity. Additionally, it is observed that the GARCH residuals show fat-tail properties. We therefore impose a Student-t distribution on the residuals; the t-GARCH(1,1) model is selected as the best fitting model for our data sample based on the log likelihood, AIC and BIC. Finally, a multivariate volatility model, DCC-GARCH(1,1), is fitted in order to show the volatility clustering and time-varying covariances between the three CRIX indices.

With the econometric model at hand, practitioners are better equipped to make financial decisions, especially in the context of pricing and hedging of derivative instruments.

Bibliography

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723.

Baba, Y., Engle, R., Kraft, D., and Kroner, K. (1990). Multivariate simultaneous generalized ARCH. Technical report, Working Paper, Department of Economics, University of California at San Diego.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

Bollerslev, T. (1990). Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. The Review of Economics and Statistics, pages 498–505.

Bollerslev, T., Engle, R. F., and Wooldridge, J. M. (1988). A capital asset pricing model with time-varying covariances. The Journal of Political Economy, pages 116–131.

Box, G. E., Jenkins, G. M., Reinsel, G. C., and Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control. John Wiley & Sons.

Brooks, C. (2014). Introductory Econometrics for Finance. Cambridge University Press.

Dickey, D. A. and Fuller, W. A. (1981). Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica: Journal of the Econometric Society, pages 1057–1072.

Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20(3):339–350.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society, pages 987–1007.

Engle, R. F. and Kroner, K. F. (1995). Multivariate simultaneous generalized ARCH. Econometric Theory, 11(01):122–150.

Engle, R. F. and Ng, V. K. (1993). Measuring and testing the impact of news on volatility. The Journal of Finance, 48(5):1749–1778.

Franke, J., Härdle, W. K., and Hafner, C. M. (2015). Statistics of Financial Markets: An Introduction. Springer.

Geweke, J. (1986). Modelling the persistence of conditional variances: A comment. Econometric Reviews, 5(1):57–61.

Hamilton, J. D. (1994). Time Series Analysis, volume 2. Princeton University Press, Princeton.

Härdle, W. K. and Trimborn, S. (2015). CRIX or evaluating blockchain based currencies. Technical report, SFB 649 Discussion Paper.

Kwiatkowski, D., Phillips, P. C., Schmidt, P., and Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54(1):159–178.

Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer Science & Business Media.

Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica: Journal of the Econometric Society, pages 347–370.

Pantula, S. G. (1986). Comment. Econometric Reviews, 5(1):71–74.

Rachev, S. T., Mittnik, S., Fabozzi, F. J., Focardi, S. M., and Jašić, T. (2007). Financial Econometrics: From Basics to Advanced Modeling Techniques, volume 150. John Wiley & Sons.

Schwarz, G. et al. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.

Tsay, R. S. (2005).