
Abstract


CRIX an Index for cryptocurrencies

Wolfgang Karl Härdle

September 22, 2020

The cryptocurrency market is unique on many levels: very volatile, with a frequently changing market structure and cryptocurrencies emerging and vanishing on a daily basis. Following its development became a difficult task with the success of cryptocurrencies (CCs) other than Bitcoin. For fiat currency markets, the IMF offers the index SDR and, prior to the EUR, the ECU existed, which was an index representing the development of European currencies. Index providers decide on a fixed number of index constituents which will represent the market segment. It is a challenge to fix a number and develop rules for the constituents in view of the market changes. In the frequently changing CC market, this challenge is even more severe. A method relying on the AIC is proposed to quickly react to market changes and therefore enable us to create an index, referred to as CRIX, for the cryptocurrency market. CRIX is chosen by model selection such that it represents the market well, enabling each interested party to study economic questions in this market and to invest into the market. The diversified nature of the CC market makes the inclusion of altcoins in the index product critical to improve tracking performance. We have shown that assigning optimal weights to altcoins helps to reduce the tracking errors of a CC portfolio, despite the fact that their market cap is much smaller relative to Bitcoin. The codes used here are available via .

JEL classification: C51, C52, G10

Financial support from the Deutsche Forschungsgemeinschaft via CRC 649 "Economic Risk", IRTG 1792 "High Dimensional Non Stationary Time Series", as well as the Czech Science Foundation under grant no. 19-28231X, the Yushan Scholar Program and the European Union's Horizon 2020 research and innovation program "FIN-TECH: A Financial supervision and Technology compliance training programme" under the grant agreement No 825215 (Topic: ICT-35-2018, Type of action: CSA), Humboldt-Universität zu Berlin, is gratefully acknowledged.
Department of Statistics & Applied Probability, National University of Singapore, Singapore and Humboldt-Universität zu Berlin, C.A.S.E. - Center for Applied Statistics and Economics, Spandauer Str. 1, 10178 Berlin, Germany, tel: +65 6516-1245, E-Mail: [email protected]

Humboldt-Universität zu Berlin, C.A.S.E. - Center for Applied Statistics and Economics, Spandauer Str. 1, 10178 Berlin, Germany and SKBI School of Business, Singapore Management University, 50 Stamford Road, Singapore 178899, tel: +49 (0)30 2093-5630, E-Mail: [email protected]

Keywords: Index construction, model selection, bitcoin, cryptocurrency, CRIX, altcoin

This is a post-peer-review, pre-copyedit version of an article published in the Journal of Empirical Finance, Vol. 49, 107-122 (2018). The final authenticated version is available online at: https://doi.org/10.1016/j.jempfin.2018.08.004

Introduction

More and more companies have started offering digital payment systems. Smartphones have evolved into digital wallets and telephone companies offer banking related services: a clear signal that we are about to enter the era of digital finance. In fact we are already acting inside a digital economy. The market for e-$x$ ($x$ = “finance,” “money,” “book,” you name it ...) has not only picked up enormous momentum but has become standard for driving innovative activities in the global economy. A few clicks at $y$ and payment at $z$ bring our purchase to location $w$. Own currencies for the digital market were therefore just a matter of time. Due to organizational difficulties, the idea of the Nobel Laureate Hayek, see hayek_denationalization_1990, of letting companies offer concurrent currencies seemed scarcely feasible for a long time, but the invention of the Blockchain has made it possible to bring his vision to life. Cryptocurrencies (CCs) have surfaced and opened up an angle towards this new level of economic interaction. Since the appearance of Bitcoin, several new CCs have spread through the Web and offered new ways of proliferation. Even states accept them as a legal payment method or part of economic interaction. E.g., the USA classifies CCs as commodities, kawa_bitcoin_2015, and lately Japan announced that it accepts them as a legal currency, econotimes_japans_2016.

Obviously, the crypto market is fanning out and shows clear signs of acceptance and deepening liquidity, so that a closer look at its general moves and dynamics is called for.

The transaction graph of Bitcoin (BTC), the Blockchain, has received much attention, see e.g. ron_quantitative_2013 and reid_analysis_2013. Even the economics of BTC has been studied, e.g. bolt_value_2016 and kristoufek_what_2015.

To the best of our knowledge, the development of the entire CC market has not been studied so far; only subsamples have been taken into account. wang_buzz_2017 studied the variations of 5 CCs. elendner_cross-section_2017 analyzed the top 10 CCs by market capitalization and found that their returns are weakly correlated with each other. Furthermore, a Principal Component (PC) Analysis, carried out in the same reference, showed that 7 out of 10 PCs were necessary to describe more than 90% of the variance. These findings indicate that the price evolutions of CCs are very different from each other. This brings us to the conclusion that BTC, even though it dominates the market in terms of its market capitalization, cannot lead the direction of the market. The movements of other CCs are important too when one analyzes the market. Having a closer look at the different CCs, it becomes obvious that they have different kinds of missions and technical aspects. Bitcoin pioneered as the token of the first decentralised, distributed ledger, giving rise to multiple interpretations of its nature and purpose: new type of currency, commodity (like gold), alternative asset or innovative technology. The currently second most important CC by market capitalization - Ethereum - was created with a particular goal in mind: to power the blockchain based Ethereum platform for company building (DAO) and smart contract implementation. This idea triggered an unprecedented interest as it allowed companies to enter the field without creating their own blockchain ecosystem. Newcomers could benefit from the existing supporters of the respective platform, which allowed faster entry, adoption and operation. Other CCs, like Ripple (XRP), are intended to fuel the transaction network bridging traditional markets (banks) and the crypto ecosystem. Ripple also became one of the first successful cases of a pre-emitted CC, abandoning the idea of decentralisation. Since the appearance of BTC many technological advancements took place. Some CCs are designed for faster (or even immediate) transactions, like Litecoin (LTC), some are more efficient energy-wise, like DASH. Many embraced different hashing algorithms, altering the mining process, like Monero. Long ASIC domination is being disrupted, Proof-of-Work is replaced by Proof-of-Stake, and new ways to motivate those providing computational power are introduced. Regardless of the type of CC, one witnesses a new kind of transaction network with a different approach to fees and handling of trust issues. The intended and actual usage can be interpreted as the business model of the different CCs and the participation in either CC can give advantages over others, white_market_2014.

In the first month of 2017, CCs other than BTC (altcoins) showed a strong gain in their market capitalization, reducing the dominance of BTC in the market. The finding of very different movements of CCs and the stronger position of alternative CCs in the market implies the necessity of a market index for the CC market for tracking the market movements. Comparing CCs against a market index answers economic questions like which business model is more successful than another one, gained recently compared to other CCs, drives the success of the market, or is more established. Comparing a CC market index against other market indices answers economic and financial questions like which market proxy is more volatile, has more tail risk, or attracts more investments. We construct CRIX, a market index (benchmark) which will enable each interested party to study the outlined economic questions, the performance of the CC market as a whole or of single CCs. Studying the stochastic dynamics of CRIX will allow a la limite to create ETFs or contingent claims.

Many index providers construct their indices with a fixed number of constituents, see e.g. ftse_ftse_2016, s&p_index_2014 and deutsche_boerse_ag_guide_2013.

If the respective index is intended to be a proxy for the performance of a market, this requires huge trust from economists and investors in the choice of the index constituents by the index provider. On the other hand, the CRSP index family, derived for the US market, crsp_crsp_2015, has no boundary on the number of index constituents. The number of constituents is reviewed daily and adjusted until the index members cover a predefined share of the market capitalization. Such a dynamic methodology is important in the market of CCs since the number of CCs changes daily. Additionally the market value of CCs often changes frequently, which increases the market volatility and therefore the need for considering such a CC for the representation of the market. Our intention is extending the idea behind the CRSP indices. Our first goal is constructing a methodology for CRIX which relies on model selection criteria to obtain a proxy for the market and to replace the trust problem with a statistical methodology. The resulting methodology is dynamic in the number of index constituents, like the CRSP indices. By this method only CCs which add informative value to the index are considered, which makes it representative. If more CCs than BTC are necessary to fulfill this requirement, they will be added. However, we are concerned with the dominance of BTC in an index solely relying on market capitalization. Thus we introduce a second weighting scheme based on weighting by trading volume. Due to the usage of trading volume, the respective index is constructed in terms of trading focus. If the market participants focus more on altcoins than on BTC, these receive a higher weight. On the other hand, if the market focus is truly on BTC, it will receive a high weight in either index. Our second goal, constructing an investable index, will be fulfilled by the methodology itself due to having a sparse index, only consisting of actively traded CCs in a market with low transaction costs. Note that due to the low transaction costs in the CC market, a dynamic methodology creates low additional costs. In addition to the methodology ensuring an investable index, the proposed trading volume weighting scheme further supports this goal.

Investing into an ETF composed of the constituents of CRIX implies some differences compared to traditional index investing. In the traditional setting only the constituents are reviewed and replaced on the review date, if necessary, according to the index rules. In dynamic index investing the constituents are also reviewed for their number. This requires the manager of the fund to buy and sell more assets on the review date. In a market with high transaction costs, this approach is more costly. But the market of CCs has very low transaction costs, thus this problem won't occur in this market.

To compute CRIX, the differences in the log returns of the market against a selection of possible indices are evaluated. The results show that the AIC works well to evaluate the differences. It penalizes the index for the number of constituents. For the calculation of the respective likelihoods, a non-parametric approach using the epanechnikov_non-parametric_1969 kernel is applied. The proof for the impact of the value of an asset in the market on the AIC method is given, thus a top-down approach is applied to select the assets for the benchmarks to choose from, where the sorting depends on either market cap or trading volume. The number of constituents is recalculated quarterly to ensure an up-to-date fit to the current market situation. With CRIX one may study the contingent claims and the stochastic nature of this index, chen_econometric_2017, or study the CC market characteristics against traditional markets, hardle_crix_2015.

This paper is structured as follows. Section 2 introduces the topic and reviews the basics of index construction. In Section 3 the method for dynamic index construction for CRIX is described and Section 4 introduces the remaining rules for CRIX. Section 5 describes further variants to create a CRIX family. Their performance is tested in Section 6. In Sections 7 and 8 the new method is applied to the German and Mexican stock markets to check the performance of the methodology against existing indices. The codes used to obtain the results in this paper are available via .

The basic idea of any price index is to weight the prices of its constituent goods by the quantities of the goods purchased or consumed. The Laspeyres index takes the value of a basket of $k$ assets and compares it against a base period:

$$P^L_{0t}(k) = \frac{\sum_{i=1}^{k} P_{it} Q_{i0}}{\sum_{i=1}^{k} P_{i0} Q_{i0}} \qquad (1)$$

with $P_{it}$ the price of asset $i$ at time $t$ and $Q_{i0}$ the quantity of asset $i$ at time 0 (the base period). For market indices, such as CRSP, S&P500 or DAX, the quantity $Q_{i0}$ is the number of shares of the asset $i$ in the base period. Multiplied with its corresponding price, the market capitalization results, hence the constituents of the index are weighted by their market capitalizations. These indices are often referred to as benchmarks for their respective market. We define the term benchmark:

Definition 1.

A benchmark is a measure which consists of a selection of CCs that represent the market.
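The Laspeyres formula (1) can be illustrated with a minimal sketch. The function name and the three-asset basket below are hypothetical and chosen only for illustration; they are not taken from the paper's code.

```python
def laspeyres_index(prices_t, prices_0, quantities_0):
    """Laspeyres index as in (1): value of the base-period basket at
    time-t prices, divided by its value at base-period prices."""
    numerator = sum(p * q for p, q in zip(prices_t, quantities_0))
    denominator = sum(p * q for p, q in zip(prices_0, quantities_0))
    return numerator / denominator


# Hypothetical basket of three assets (illustrative numbers only).
p0 = [100.0, 10.0, 1.0]      # base-period prices P_i0
q0 = [10.0, 100.0, 1000.0]   # base-period quantities Q_i0
pt = [110.0, 10.0, 0.5]      # prices at time t, P_it

index_t = laspeyres_index(pt, p0, q0)   # (1100 + 1000 + 500) / 3000
```

Because the quantities are frozen at the base period, the index moves only with prices, which is why a change of constituents needs the adjusted formula discussed next.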

But markets change. A company which was representative for market developments yesterday might no longer be important today. On top of that, companies can go bankrupt, a corporation can raise the number of its outstanding shares, or trading in it can become infrequent. All these situations must produce a change in the index structure, so that the market is still adequately represented. Hence companies have to drop out of the index and have to be replaced by others. The index rules determine in which cases such an event happens. The formula of Laspeyres (1) cannot handle such events entirely because a change of constituents will result in a change in the index value that is not due to price changes. Therefore, established price indices like DAX or S&P500, see deutsche_boerse_ag_guide_2013 and s&p_index_2014 respectively, and the newly founded index CRIX($k$), a CRyptocurrency IndeX, thecrix.de, use the adjusted formula of Laspeyres,

$$\mathrm{CRIX}_t(k, \beta) = \frac{\sum_{i=1}^{k} \beta_{i,t_l^-} P_{it} Q_{i,t_l^-}}{\mathrm{Divisor}(k)_{t_l^-}} \qquad (2)$$

with $P$, $Q$ and $i$ defined as before, $\beta_{i,t_l^-}$ the adjustment factor of asset $i$ found at time point $t_l^-$, where $l$ indicates that this is the $l$-th adjustment factor, and $t_l^-$ the last time point when $Q_{i,t_l^-}$, $\mathrm{Divisor}(k)_{t_l^-}$ and $\beta_{i,t_l^-}$ were updated. In the classical setting, $\beta_{i,t_l^-}$ is defined to be $\beta_{i,t_l^-} = 1$ for all $i$ and $l$. However, some indices use $\beta_{i,t_l^-}$ to achieve maximum weighting rules, e.g. deutsche_boerse_ag_guide_2013 and mexbol_prices_2013. The Divisor ensures that the index value of CRIX has a predefined value on the starting date. It is defined as

$$\mathrm{Divisor}(k, \beta)_0 = \frac{\sum_{i=1}^{k} \beta_{i0} P_{i0} Q_{i0}}{\text{starting value}}. \qquad (3)$$

The starting value could be any possible number, commonly 100, 1000 or 10000. It ensures that a positive or negative development from the base period will be revealed. Whenever changes to the structure of CRIX occur, the Divisor is adjusted in such a way that only price changes are reflected by the index. Defining $k_1$ and $k_2$ as numbers of constituents, it results

$$\frac{\sum_{i=1}^{k_1} \beta_{i,t_{l-1}^-} P_{i,t-1} Q_{i,t_{l-1}^-}}{\mathrm{Divisor}(k_1, \beta)_{t_{l-1}^-}} = \mathrm{CRIX}_{t-1}(k_1, \beta) = \mathrm{CRIX}_t(k_2, \beta) = \frac{\sum_{j=1}^{k_2} \beta_{j,t_l^-} P_{j,t} Q_{j,t_l^-}}{\mathrm{Divisor}(k_2, \beta)_{t_l^-}}. \qquad (4)$$

In indices like FTSE, S&P500 or DAX the number of index members is fixed, $k_1 = k_2$, see ftse_ftse_2016, s&p_index_2014 and deutsche_boerse_ag_guide_2013. As long as the goal behind these indices is the reflection of the price development of the selected assets, this is a straightforward approach. But, e.g., DAX is also meant to be an indicator for the development of the market as a whole, see jansen_deutsche_1992.

This automatically raises the question of whether the included assets and the weighting scheme represent the market. Since the constituents are chosen using a top-down approach, meaning that the biggest companies by market capitalization are included, the intuitive answer is yes. But it leaves a sour taste that additional assets may describe the market more appropriately. Furthermore, different weighting schemes provide another view on the market. One may object by referring to total market indices like the Wilshire 5000, S&P Total Market Index or CRSP U.S. Total Market Index, see wilshire_associates_wilshire_2015, s&p_dow_2015 and crsp_crsp_2015, that provide a full description. But financial practice has shown that smaller indices like DAX30 and S&P500 receive more attention in evaluating the movements of their corresponding markets, probably because they are easier to invest in due to the smaller number of constituents. It is therefore appealing to know which are the representative assets in a market and which smaller number of index constituents eases the handling of a tracking portfolio. Additionally, one may be concerned that an index would include illiquid and non-investable assets, which makes the management of a tracking portfolio even more difficult. Figure 1 shows that this is indeed a problem in the CC market. Some CCs have a fairly high market capitalization while their respective trading volume is very low. This is problematic, because an asset which is not frequently traded cannot add enough information to a market index to display market changes and is difficult to trade for an investor. Hence, one goal behind constructing CRIX is making it investable by concentrating on liquid CCs:

Definition 2.

Between investment portfolios with equal performance, the one with the fewest assets is preferable.
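The role of the Divisor in (2) to (4) can be sketched as follows: the divisor first fixes the starting value, and at a review date it is recomputed so that the new basket reproduces the last index value. This is a minimal sketch; the helper names and all numbers are hypothetical.

```python
def weighted_value(betas, prices, quantities):
    """Numerator of (2): sum over constituents of beta_i * P_i * Q_i."""
    return sum(b * p * q for b, p, q in zip(betas, prices, quantities))


def initial_divisor(betas, prices, quantities, starting_value=1000.0):
    """Divisor as in (3): base-period value divided by the starting value."""
    return weighted_value(betas, prices, quantities) / starting_value


def rebased_divisor(last_index_value, betas, prices, quantities):
    """Divisor after a change of constituents, chosen so that the new
    basket reproduces the last index value, cf. (4)."""
    return weighted_value(betas, prices, quantities) / last_index_value


# Hypothetical base period with k1 = 2 constituents and beta = 1.
old_betas, old_quantities = [1.0, 1.0], [100.0, 2000.0]
divisor_old = initial_divisor(old_betas, [500.0, 10.0], old_quantities)
index_start = weighted_value(old_betas, [500.0, 10.0], old_quantities) / divisor_old

# Prices move; index value just before the review.
index_before = weighted_value(old_betas, [550.0, 9.0], old_quantities) / divisor_old

# Review: a third asset enters (k2 = 3); the divisor absorbs the change.
new_betas, new_prices, new_quantities = [1.0] * 3, [550.0, 9.0, 2.0], [100.0, 2000.0, 5000.0]
divisor_new = rebased_divisor(index_before, new_betas, new_prices, new_quantities)
index_after = weighted_value(new_betas, new_prices, new_quantities) / divisor_new
```

With this adjustment the index value is continuous across the review date, so subsequent changes reflect price movements only.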

[Figure 1: Comparison of the log mean trading volume and log mean market capitalization, both measured in USD, for all CCs in the dataset over the time period 20140401 - 20170325. Axes: log mean market capitalization, log mean volume. VolMarketCapComparison]

We react to the goals and problems in two ways: First, these thoughts raise the question which value of $k$ is "optimal" for building an investable benchmark for the market. Additionally, especially young and innovative markets may change their structure over time. Therefore, a quantification of an accurate CC benchmark with a sparse number of constituents is asked for. Since the CC market shows a frequently changing market structure with a huge number of illiquid CCs, a time varying index selection structure is applied. The selection method described later omits illiquid CCs by construction, because only CCs which show changes in their return series can be selected to be added to CRIX by the method. Due to the low transaction costs in this market, a dynamic methodology is applicable since it does not raise the costs of restructuring a tracking portfolio too much. Secondly, we apply two kinds of weighting schemes, Table 1. We apply the classical setting to build a proper market index which is only flexible in terms of the dynamic constituents and tackles the illiquidity issue due to the applied selection method.
The liquidity weighting allows one to give a higher weight to CCs which are traded more relative to their market capitalization and therefore implicitly receive more financial attention. This weighting scheme boils (2) down to weighting the price development by their trading volume,

$$\mathrm{LCRIX}_t(k, \beta) = \frac{\sum_{i=1}^{k} \frac{Vol_{i,t_l^-}}{P_{i,t_l^-} Q_{i,t_l^-}} P_{it} Q_{i,t_l^-}}{\mathrm{Divisor}(k)_{t_l^-}} = \frac{\sum_{i=1}^{k} \frac{Vol_{i,t_l^-}}{P_{i,t_l^-}} P_{it}}{\mathrm{Divisor}(k)_{t_l^-}}. \qquad (5)$$

The latter is referred to as Liquidity CRIX (LCRIX). This approach has the potential to diminish the influence of e.g. Bitcoin more strongly than the market cap weighting, if the ratio of trading volume to market cap is higher for other CCs. In Section 6 we show that LCRIX has a better mean directional accuracy than CRIX and puts more weight on altcoins, Table 8, therefore tackling the issue of BTC dominance when the actual trading amount suggests a different result.

                      market cap weighting    liquidity weighting
$\beta_{i,t_l^-}$     $1$                     $\frac{Vol_{i,t_l^-}}{P_{i,t_l^-} Q_{i,t_l^-}}$

Table 1: Weighting schemes for derivation of CRIX
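The effect of the two schemes in Table 1 on the relative weights at a review date can be sketched as below. The two-asset market and the helper names are hypothetical; the point is only that with the liquidity adjustment factor each constituent's weight becomes proportional to its trading volume.

```python
def liquidity_betas(volumes, prices, quantities):
    """Adjustment factors of the liquidity weighting in Table 1:
    beta_i = Vol_i / (P_i * Q_i), all taken at the last update time."""
    return [v / (p * q) for v, p, q in zip(volumes, prices, quantities)]


def relative_weights(betas, prices, quantities):
    """Share of each constituent in the numerator of (2)."""
    values = [b * p * q for b, p, q in zip(betas, prices, quantities)]
    total = sum(values)
    return [v / total for v in values]


# Hypothetical market: one dominant coin and one altcoin with equal volume.
prices = [900.0, 1.0]
quantities = [1.0, 100.0]     # market caps: 900 and 100
volumes = [50.0, 50.0]        # trading volumes in the same currency

cap_weights = relative_weights([1.0, 1.0], prices, quantities)
liq_weights = relative_weights(liquidity_betas(volumes, prices, quantities),
                               prices, quantities)
```

In this example the dominant coin carries 90% of the market cap weighted index but only 50% of the liquidity weighted one, which mirrors the argument about BTC dominance.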

This section is dedicated to describing the composition rule which is used to find the number of index members, the spine of CRIX and LCRIX. Since CRIX will be a benchmark for the CC market, the dimension and evaluation of the market have to be defined:

Definition 3.

The total market (TM) consists of all CCs in the CC universe. Its value is the combined market value of the CCs.

To compare the TM with a benchmark candidate, it will be normalized by a Divisor,

$$\mathrm{TM}(K)_t = \frac{\sum_{i=1}^{K} P_{it} Q_{i,t_l^-}}{\mathrm{Divisor}(K)_{t_l^-}} \qquad (6)$$

with $K$ the number of all CCs in the CC universe. Note that no adjustment factor is used for $\mathrm{TM}(K)_t$. For the volume weighting, the TM is defined as LTM respectively,

$$\mathrm{LTM}(K)_t = \frac{\sum_{i=1}^{K} \frac{Vol_{i,t_l^-}}{P_{i,t_l^-} Q_{i,t_l^-}} P_{it} Q_{i,t_l^-}}{\mathrm{Divisor}(K)_{t_l^-}}. \qquad (7)$$

In the further explanations, the focus lies on the TM. However, when LCRIX is derived, it is optimized against LTM. The results can be easily extended to the case of LTM. Further define the log returns:

$$\varepsilon(K)^{TM}_t = \log\{\mathrm{TM}(K)_t\} - \log\{\mathrm{TM}(K)_{t-1}\} \qquad (8)$$

$$\varepsilon(k, \beta)^{CRIX}_t = \log\{\mathrm{CRIX}(k, \beta)_t\} - \log\{\mathrm{CRIX}(k, \beta)_{t-1}\}, \qquad (9)$$

where $\mathrm{CRIX}(k, \beta)_t$ is the CRIX with $k$ constituents at time point $t$.

The goal is to optimize $k$ and $\beta$ so that a sparse but accurate approximation in terms of

$$\min_{k,\beta} \|\varepsilon(k, \beta)\|^2 = \min_{k,\beta} \|\varepsilon(K)^{TM} - \varepsilon(k, \beta)^{CRIX}\|^2, \qquad (10)$$

is achieved, where $\varepsilon(k, \beta)$ is the difference in the log returns of $\mathrm{TM}(K)$ and $\mathrm{CRIX}(k, \beta)$. A squared loss function is chosen in (10), since it heavily penalizes deviations.

Since the value of $\mathrm{TM}(K)_t$ is unknown and not measurable due to a lack of information, the total market index will be defined and used as a proxy for the $\mathrm{TM}(K)$. The definition is inspired by total market indices like crsp_crsp_2015, s&p_dow_2015 and wilshire_associates_wilshire_2015. They use all stocks for which prices are available.

Definition 4.

The total market index (TMI) contains all CCs in the CC universe for which prices are available. The CCs are weighted by their market capitalization.
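The objective in (10), with the TMI of Definition 4 standing in for the TM, compares log returns of two index level series under a squared loss. A minimal sketch, assuming two hypothetical level series of equal length (the numbers are illustrative only):

```python
import math


def log_returns(levels):
    """Log returns as in (8) and (9): log(level_t) - log(level_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]


def return_differences(tmi_levels, crix_levels):
    """Difference of the log returns of the market proxy and a candidate index."""
    return [x - y for x, y in zip(log_returns(tmi_levels), log_returns(crix_levels))]


def squared_loss(tmi_levels, crix_levels):
    """Squared norm of the log-return differences, the objective in (10)."""
    return sum(d * d for d in return_differences(tmi_levels, crix_levels))


# Hypothetical index levels.
tmi = [1000.0, 1020.0, 990.0, 1005.0]
candidate = [1000.0, 1025.0, 985.0, 1004.0]
rescaled_tmi = [2.0 * level for level in tmi]   # same returns, different level

loss_candidate = squared_loss(tmi, candidate)
loss_rescaled = squared_loss(tmi, rescaled_tmi)
```

Because only log returns enter the loss, a candidate that differs from the proxy by a constant factor (for example through a different starting value) incurs no loss.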

This changes (6) to

$$\mathrm{TMI}_t(k_{max}) = \frac{\sum_{i=1}^{k_{max}} P_{it} Q_{i,t_l^-}}{\mathrm{Divisor}(k_{max})_{t_l^-}}$$

with $k_{max}$ the maximum number of CCs with available prices and (10) to

$$\min_{k,\beta} \|\hat{\varepsilon}(k, \beta)\|^2 = \min_{k,\beta} \|\varepsilon(k_{max})^{TMI} - \varepsilon(k, \beta)^{CRIX}\|^2 \qquad (11)$$

$$\text{s.t.:} \quad 1 \le k_1 \le k_u, \quad k = k_1 + s, \qquad (12)$$

$$k_u \in [1, k_{max}], \quad s \in [1, k_{max} - k_1],$$

$$\beta_{1 \times k} = (1, \dots, 1, \beta_{k_1+1}, \dots, \beta_{k_1+s})^\top, \quad \beta_{k_1+1}, \dots, \beta_{k_1+s} \in (-\infty, \infty),$$

where $\varepsilon(k_{max})^{TMI}$ are the log returns for TMI. In the derivation of LCRIX, the optimization is performed against LTMI and $\beta_{1 \times k} = (\beta_1, \dots, \beta_{k_1}, \beta_{k_1+1}, \dots, \beta_{k_1+s})^\top$ where $\beta_i = \frac{Vol_{i,t_l^-}}{P_{i,t_l^-} Q_{i,t_l^-}}$ for $i = 1, \dots, k_1$ and $\beta_{k_1+1}, \dots, \beta_{k_1+s} \in (-\infty, \infty)$.

Several constraints were introduced with (11). The parameters $\beta_{k_1+1}, \dots, \beta_{k_1+s}$ are included to evaluate if adding $s$ more assets to the index explains the difference between $\varepsilon(k_{max})^{TMI}$ and $\varepsilon(k, \beta)^{CRIX}$ better. The first $k_1$ assets won't be adjusted by a parameter, so no parameter estimation is necessary. This makes the first term a constant. The choice of $k_1$ is important since it defines the number of base CCs to be included in the index. The parameters of the next $s$ assets have to be estimated, so (2) becomes

$$\mathrm{CRIX}_t(k, \beta) = \frac{\sum_{i=1}^{k_1} P_{it} Q_{i,t_l^-} + \sum_{j=k_1+1}^{k_1+s} \beta_{j,t_l^-} P_{jt} Q_{j,t_l^-}}{\mathrm{Divisor}(k)_{t_l^-}}.$$

A number of criteria are applicable. Model selection (SC) criteria can be categorized by their property to be either asymptotically optimal or consistent in choosing the true model. In this context the following are investigated: Generalized Cross Validation (GC), Generalized Full Cross Validation (GFC), Mallows' $C_p$, Shibata (SH), Final Prediction Error (FPE) and Akaike Information Criterion (AIC), all asymptotically optimal criteria under the assumption of Gaussian distributed residuals. Since CRIX is supposed to be a benchmark model, all possible models under certain restrictions for the number of parameters are included in the test set,

$$\Theta_{SC} = \{\mathrm{CRIX}(k_1, \beta), \mathrm{CRIX}(k_2, \beta), \dots\}, \qquad (13)$$

where $k_1, k_2, \dots$ are predefined values and $SC \in \{\mathrm{GC}, \mathrm{GFC}, C_p, \mathrm{SH}, \mathrm{FPE}, \mathrm{AIC}\}$. Recall that the intention behind CRIX is to discover under a squared loss function the best model to describe the data (benchmark), which supports the choice of an asymptotically optimal criterion.

The GC criterion, see craven_smoothing_1978, is defined as

$$\mathrm{GC}\{\hat{\varepsilon}(k, \beta), s\} = \frac{T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}(k, \beta)_t^2}{(1 - T^{-1} s)^2} \qquad (14)$$

by assuming that $s < T$. One shall note that $s$ and not $k_1 + s$ defines the number of variables to penalize for, since $k_1$ parameters are set to be 1 and need not be estimated. According to arlot_survey_2010, the asymptotic optimality of GC was shown in several frameworks. The GFC, see droge_comments_1996,

$$\mathrm{GFC}\{\hat{\varepsilon}(k, \beta), s\} = T^{-1} \sum_{t=1}^{T} \hat{\varepsilon}(k, \beta)_t^2 \, (1 + T^{-1} s)^2 \qquad (15)$$

is an alteration.

A further score, SH,

$$\mathrm{SH}\{\hat{\varepsilon}(k, \beta), s\} = \frac{T + 2s}{T^2} \sum_{t=1}^{T} \hat{\varepsilon}(k, \beta)_t^2, \qquad (16)$$

was shown to be asymptotically optimal, shibata_optimal_1981, and asymptotically equivalent to Mallows' $C_p$ and AIC. mallows_comments_1973's $C_p$:

$$C_p\{\hat{\varepsilon}(k, \beta), s\} = \frac{\sum_{t=1}^{T} \hat{\varepsilon}(k, \beta)_t^2}{\hat{\sigma}(k, \beta)^2} - T + 2 \cdot s \qquad (17)$$

with $\hat{\sigma}(k, \beta)^2$ the variance of $\hat{\varepsilon}(k, \beta)$. $C_p\{\hat{\varepsilon}(k, \beta), s\}$ tends to choose models which overfit and is not consistent in selecting the true model, see mallick_bayesian_2013, woodroofe_model_1982 and nishii_asymptotic_1984. The FPE uses the formula

$$\mathrm{FPE}\{\hat{\varepsilon}(k, \beta), s\} = \frac{T + s}{(T - s) T} \sum_{t=1}^{T} \hat{\varepsilon}(k, \beta)_t^2, \qquad (18)$$

see akaike_statistical_1970.

So far, the discussed criteria depend on little data information. Just the squared residuals and, in the case of Mallows' $C_p$, the variance are taken into account. The AIC uses more information by depending on the maximum likelihood, derived by

$$L\{\hat{\varepsilon}(k, \beta)\} = \max_{\beta} \prod_{t=1}^{T} f\{\hat{\varepsilon}(k, \beta)_t\}, \qquad (19)$$

where $f$, in (21), represents the density of the $\hat{\varepsilon}(k, \beta)_t$ over all $t$.
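The criteria that depend only on the squared residuals, (14), (15), (16) and (18), can be sketched directly from their formulas. The residual vector below is hypothetical, and the function names are chosen for illustration only.

```python
def residual_sum_of_squares(residuals):
    return sum(e * e for e in residuals)


def gc(residuals, s):
    """Generalized Cross Validation, cf. (14); requires s < T."""
    T = len(residuals)
    return (residual_sum_of_squares(residuals) / T) / (1.0 - s / T) ** 2


def gfc(residuals, s):
    """Generalized Full Cross Validation, cf. (15)."""
    T = len(residuals)
    return (residual_sum_of_squares(residuals) / T) * (1.0 + s / T) ** 2


def sh(residuals, s):
    """Shibata's criterion, cf. (16)."""
    T = len(residuals)
    return (T + 2.0 * s) / T ** 2 * residual_sum_of_squares(residuals)


def fpe(residuals, s):
    """Final Prediction Error, cf. (18); requires s < T."""
    T = len(residuals)
    return (T + s) / ((T - s) * T) * residual_sum_of_squares(residuals)


# Hypothetical differences in log returns and one estimated parameter (s = 1).
eps_hat = [0.1, -0.2, 0.1, 0.0]
scores = {name: f(eps_hat, 1) for name, f in
          [("GC", gc), ("GFC", gfc), ("SH", sh), ("FPE", fpe)]}
```

For $s = 0$ all four scores reduce to the mean squared residual; they differ only in how strongly each additional estimated parameter is penalized.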
The AIC is defined to be

$$\mathrm{AIC}\{\hat{\varepsilon}(k, \beta), s\} = -2 \log L\{\hat{\varepsilon}(k, \beta)\} + s \cdot 2, \qquad (20)$$

akaike_information_1998. If the true model is of finite dimension, then the AIC is not consistent, compare hurvich_regression_1989. shibata_asymptotic_1983 showed the asymptotic efficiency of Mallows' $C_p$ and AIC under the assumption of an infinite number of regression variables or an increasing number of regression variables with the sample size. Due to the usage of the density in deriving the AIC, it uses more information about the dataset. Considering that (10) implies the criteria are derived under an expected squared loss function,

$$E(\|\varepsilon(k, \beta)\|^2) = \int_{-\infty}^{\infty} \|\varepsilon(k, \beta)\|^2 f\{\varepsilon(k, \beta)\} \, d\varepsilon(k, \beta), \qquad (21)$$

the density, $f$, can be estimated differently from the Gaussian distribution. Here, $f$ is estimated nonparametrically with an Epanechnikov kernel, since according to hardle_nonparametric_2004 the epanechnikov_non-parametric_1969 kernel shows a good balance between variance optimization and numerical performance. In nonparametric estimation with an Epanechnikov kernel, Epa, the estimator of $f$ is derived by

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \mathrm{Epa}\left(\frac{x - x_i}{h}\right), \qquad \mathrm{Epa}(u) = \frac{3}{4\sqrt{5}} \left(1 - \frac{u^2}{5}\right) I(|u| \le \sqrt{5}),$$

where $h$ is the bandwidth.

The bandwidth selection is performed with the plug-in selector by sheather_reliable_1991 and further described in wand_multivariate_1994. The plug-in selector is derived under the loss function Mean Integrated Squared Error, MISE. hall_kullback-leibler_1987 found that the Kullback-Leibler (KL) loss function for selecting the smoothing parameter of the kernel density is highly influenced by the tails of the distribution. devroye_nonparametric_1985 mention that the Mean Integrated Error (MIE) is more strongly affected than MISE by the tails of the distribution and kanazawa_hellinger_1993 claims that MIE shall be used if interest is in modeling the tails. kanazawa_hellinger_1993 finds that the use of a Kullback-Leibler loss function would put more weight on the tails compared to MISE.
Since this is not in our interest, the choice of the density smoothing parameter, $h$, is performed under MISE.

Due to the richer information basis of the AIC, we decide to use it as the selection criterion for CRIX. The choice is supported by an empirical analysis in Section 6.

To decide with the AIC which number $k$ should be used, a procedure was created which compares the squared difference between log returns of the TMI, see Definition 4, and several candidate indices,

$$\|\hat{\varepsilon}(k_j, \beta)\|^2 = \|\varepsilon(k_{max})^{TMI} - \varepsilon(k_j, \beta)^{CRIX}\|^2, \qquad (22)$$

where $\varepsilon(k_j, \beta)^{CRIX}$ is the log return of the CRIX version with $k_j$ constituents and $\hat{\varepsilon}(k_j, \beta)$ is the respective difference. The candidate indices, $\mathrm{CRIX}(k_j, \beta)$, have different numbers of constituents which fulfill $k_1 < k_2 < k_3 < \cdots$, where $k_j = k_1 + s(j - 1)$. The AIC is used to evaluate whether $s$ more assets add information to CRIX. If so, these assets are added to the intercept and the next $s$ assets are tested for. Assets with a higher market capitalization are expected to have a higher influence on the AIC, so the following theorem is formulated:

Theorem 1.

The rate of improvement of the AIC depends on the relative value of an asset in the market.
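The kernel-based AIC used in this section can be sketched as follows. This is a simplified illustration under stated assumptions: the kernel is written in its $\sqrt{5}$-scaled form, the bandwidth `h` is a fixed hypothetical value rather than the Sheather-Jones plug-in choice used in the paper, and the maximization over $\beta$ in (19) is not performed (the residuals are taken as given).

```python
import math

SQRT5 = math.sqrt(5.0)


def epa(u):
    """Epanechnikov kernel supported on [-sqrt(5), sqrt(5)]."""
    if abs(u) > SQRT5:
        return 0.0
    return 3.0 / (4.0 * SQRT5) * (1.0 - u * u / 5.0)


def kde(x, sample, h):
    """Kernel density estimate at x with bandwidth h."""
    return sum(epa((x - xi) / h) for xi in sample) / (len(sample) * h)


def aic(residuals, s, h):
    """AIC as in (20): -2 * log-likelihood + 2 * s, with the likelihood
    evaluated under the kernel density estimate of the residuals."""
    log_likelihood = sum(math.log(kde(e, residuals, h)) for e in residuals)
    return -2.0 * log_likelihood + 2.0 * s


# Hypothetical residuals (differences in log returns) and bandwidth.
eps_hat = [0.004, -0.010, 0.002, 0.007, -0.003, 0.001, -0.006, 0.009]
h = 0.005

aic_base = aic(eps_hat, 0, h)    # no estimated parameters
aic_plus = aic(eps_hat, 2, h)    # same residuals, two estimated parameters
```

Each residual contributes its own kernel mass to the estimate, so the density is strictly positive at every sample point and the logarithm is well defined. With identical residuals, two additional parameters raise the AIC by exactly the penalty, so a larger candidate index is only preferred when its residual density improves the likelihood by more than that.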

The proof of Theorem 1 is given in the Appendix, 11.1, under the assumption of normally distributed error terms. Therefore, we will follow the common practice to include the assets with the highest market capitalization in the index,

$$\arg\max_{i} \sum_{j=1}^{k} P_{j,i,t_l^-} Q_{j,i,t_l^-}, \quad i \in \{1, \dots, K\}. \qquad (23)$$

Thus, a top-down approach to decide about the number of index constituents is applied. For the sorting of the index constituents by highest market capitalization, just the closing data of the last day of a month are used. We chose to do so, since the next period's CRIX will just depend on $Q_{i,t_l^-}$, (2), and not on data which lie further in the past. This is in line with the methodology of e.g. the DAX. For LCRIX, the CCs with the highest trading volume are chosen respectively,

$$\arg\max_{i} \sum_{j=1}^{k} Vol_{j,i,t_l^-}, \quad i \in \{1, \dots, K\}. \qquad (24)$$

Since the differences between the $\mathrm{TMI}(k_{max})$ and $\mathrm{CRIX}(k_j, \beta)$ are caused over time by the missing time series in $\mathrm{CRIX}(k_j, \beta)$, the independence assumption of the $\hat{\varepsilon}(k_j, \beta)$ for all $j$ cannot be fulfilled by construction. But gyorfi_nonparametric_1989 give arguments that under certain conditions in case of nonparametric density estimation, the rate of convergence is essentially the same as for an independent sample. Summarizing the described procedure results in:

1. At time point $T + 1$, construct $\mathrm{TMI}(k_{max})$
2. Set $j = 2$
3. Construct $\mathrm{CRIX}(k_1, 1)$ and $\mathrm{CRIX}(k_j, \beta)$, $k_1 < k_2 < k_3 < \cdots$
4. Compute $\hat{\varepsilon}(k_j, \beta)$ and $\hat{\varepsilon}(k_1, 1)$
5. Estimate the density $f(\hat{\varepsilon}(k_1, 1))$ with KDE
6. Evaluate $\hat{\varepsilon}(k_j, \beta)$ with the KDE for $\hat{\varepsilon}(k_1, 1)$ and derive $\mathrm{AIC}\{\hat{\varepsilon}(k_j, \beta), k_j - k_1\}$ and $\mathrm{AIC}\{\hat{\varepsilon}(k_1, 1), 0\}$
7. If $j = (k_{max} - k_1)/k_1$: stop, else jump to 3. and $j = j + 1$

The next section describes the further index rules for CRIX.

The constituents of the indices are regularly checked so that the corresponding index always represents its asset universe well. It is common to do this on a quarterly basis. In case of CRIX this reallocation is much faster. In the past, coins have shown a very volatile behavior, not just in the manner of price volatility. In some weeks, many appear out of nothing in the market and many others vanish from the market even when they were very important before, e.g., Auroracoin. This calls for a faster reallocation of the market benchmark than on a quarterly basis. A monthly reallocation is chosen to make sure that CRIX catches the momentum of the CC market well. Therefore, on the last day of every month, the CCs which had the highest market capitalization on the last day in the last month will be checked and the first $k$ will be included in CRIX for the coming month. Accordingly, for LCRIX the ones with the highest trading volume are chosen.

Since a review of an index is commonly performed on a quarterly basis, the number of index members of CRIX will be checked on a quarterly basis too. The described procedure from Section 3 will be applied to the observations from the last three months on the last day of the third month after the markets closed. The number of index constituents, $k$, will be used for the next three months. Thus, CRIX corresponds to a monthly rebalanced portfolio whose number of constituents is reviewed quarterly.

It may happen that some data are missing for some of the analyzed time series. If an isolated missing value occurs alone in the dataset, meaning that the values before and after it are not missing, then Missing At Random (MAR) is assumed.
This assumption means that only observed information causes the missingness, horton_much_2007. The Last-Observation-Carried-Forward (LOCF) method is then applied to fill the gap for the application of the AIC. We did not choose a different approach since a regression or imputation method may alter the data in the wrong direction. By LOCF, no change is implied and the CC is not excluded. If two or more consecutive observations are missing, then the MAR assumption may be violated, therefore no method is applied. The corresponding time series is then excluded from the computation in the derivation period. If data are missing during the computation of the index values, the LOCF method is applied too. This is done to make the index insensitive to this CC at this time point. CRIX should mimic market changes, therefore an imputation or regression method for the missing data would distort the view on the market.

Before continuing, the described rules are summarized:
• Quarterly altering of the number of index constituents
• Monthly altering of the index constituents
• Model selection for index derivation with AIC
• Nonparametric estimation of the density
• Application of a top-down approach to select the assets for the subset analysis
• Application of LOCF if trading of an asset stops before next reallocation.
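The missing-data rules can be illustrated with a minimal sketch. The function names `fill_for_derivation` and `fill_for_index` are hypothetical, missing observations are represented by `None`, and treating a gap at the very start or end of the derivation window as non-isolated is an assumption that the text does not spell out.

```python
from typing import List, Optional, Sequence

Price = Optional[float]  # None marks a missing observation (assumed encoding)


def fill_for_derivation(series: Sequence[Price]) -> Optional[List[float]]:
    """Apply LOCF to isolated gaps; return None if the series must be excluded.

    A gap is isolated when the observations directly before and after it exist.
    Two or more consecutive gaps (or, by assumption, a gap at either edge)
    lead to exclusion from the derivation period.
    """
    filled: List[float] = []
    n = len(series)
    for t, value in enumerate(series):
        if value is not None:
            filled.append(value)
            continue
        isolated = (
            0 < t < n - 1
            and series[t - 1] is not None
            and series[t + 1] is not None
        )
        if not isolated:
            return None  # exclude the whole time series
        filled.append(series[t - 1])  # carry the last observation forward
    return filled


def fill_for_index(series: Sequence[Price]) -> List[Price]:
    """Plain LOCF for the index computation, e.g. when trading stops."""
    filled: List[Price] = []
    last: Price = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first observation exists
    return filled
```

For example, `fill_for_derivation([1.0, None, 3.0])` fills the single gap with the previous value, while a series with two consecutive gaps is dropped from the derivation period but still carried forward in the index computation by `fill_for_index`.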

Using the methods and rules described above, three indices will be proposed. These indices provide different views of the market.

1. CRIX/LCRIX:
The first and leading index is CRIX and, for volume weighting, LCRIX. When the best number of constituents is chosen, the candidate numbers are considered in steps of 5. It is common in the financial industry to construct market indices with a number of constituents which is evenly divisible by 5, see e.g. ftse_ftse_2016, s&p_index_2014, deutsche_boerse_ag_guide_2013. Therefore this selection is applied for CRIX($k$), $k = 5, 10, 15, \ldots$ with $k_1 = 5$. Since the global minimum for the selection criterion may involve many index constituents, but a sparse index is the goal, the search for the optimal model terminates at level $j$ whenever

\[ \mathrm{AIC}\{\hat{\varepsilon}(k_j, \beta), k_j - 1\} < \mathrm{AIC}\{\hat{\varepsilon}(k_{j-1}, \beta), k_{j-1} - 1\} \tag{25} \]

and $k_{j-1}$ index constituents are chosen. Therefore merely a local optimum will be achieved in most cases for $\Theta = \Theta_{AIC}$ in (13). But the choice is still asymptotically optimal by defining $\Theta = \{\Theta_{AIC} \mid k_i \le k_j \ \forall i\}$. In Section 6 it will be shown that the performance of the index is already very good.

2. ECRIX/LECRIX:
The second constructed index is called Exact CRIX (ECRIX) and Liquidity ECRIX respectively. It follows the above rules too, but the number of its constituents is chosen in steps of 1. Therefore the set of models contains CRIX($k$), $k = 1, 2, 3, \ldots$ with $k_1 = 1$ and stops when

\[ \mathrm{AIC}\{\hat{\varepsilon}(k_j, \beta), k_j - 1\} < \mathrm{AIC}\{\hat{\varepsilon}(k_{j-1}, \beta), k_{j-1} - 1\}. \tag{26} \]

3. EFCRIX/LEFCRIX:
Since the decision procedures for CRIX and ECRIX terminate when the AIC rises for the first time, Exact Full CRIX and Liquidity EFCRIX will be constructed to visualize whether the decision procedure works well for the covered indices. The intention is to have an index which may approach the TMI, but only if even small assets help improve the view on the total market: a benchmark for the benchmarks. It will be derived with the AIC procedure, compare Section 3. For $k = 1, 2, 3, \ldots$ with $k_1 = 1$ the decision rule is based on

\[ \min_{k_j, \beta} \mathrm{AIC}\{\hat{\varepsilon}(k_j, \beta), k_j - 1\} \tag{27} \]

for $\Theta = \Theta_{AIC}$ in (13). This index computes the AIC for every possible number of constituents and the number is chosen where the AIC becomes minimal.
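The difference between the two decision procedures (stop at the first rise of the AIC for CRIX and ECRIX, take the overall minimum for EFCRIX) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the AIC values are computed from the Gaussian form (34) in the Appendix instead of the kernel density estimate used in the main procedure, and the names `gaussian_aic`, `first_local_minimum` and `global_minimum` are hypothetical.

```python
import math
from typing import Dict, Sequence


def gaussian_aic(residuals: Sequence[float], s: int) -> float:
    """AIC under normal errors, T * log(RSS / T) + 2 * s, as in (34)."""
    T = len(residuals)
    rss = sum(e * e for e in residuals)
    return T * math.log(rss / T) + 2 * s


def first_local_minimum(aic_by_k: Dict[int, float]) -> int:
    """CRIX/ECRIX-style rule: stop once the AIC rises for the first time."""
    ks = sorted(aic_by_k)
    chosen = ks[0]
    for k in ks[1:]:
        if aic_by_k[k] > aic_by_k[chosen]:
            break  # AIC rose: keep the previous number of constituents
        chosen = k
    return chosen


def global_minimum(aic_by_k: Dict[int, float]) -> int:
    """EFCRIX-style rule: evaluate every candidate k and take the minimum."""
    # Iterating over sorted keys returns the smallest k in case of ties.
    return min(sorted(aic_by_k), key=lambda k: aic_by_k[k])


# Made-up AIC values for candidate sizes in steps of 5.
example = {5: 10.0, 10: 8.0, 15: 9.0, 20: 7.0}
local_k = first_local_minimum(example)   # search stops at the first rise
global_k = global_minimum(example)       # search covers all candidates
```

With the made-up values above, the local rule stops at 10 constituents because the AIC rises at 15, while the global rule continues to 20. This is the sparsity versus fit trade-off that motivates comparing CRIX and ECRIX with EFCRIX.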

The indices CRIX, ECRIX, EFCRIX with market cap weighting and LCRIX, LECRIX, LEFCRIX with volume weighting have been proposed to give insight into the CC market. Our RDC CC database covers data for over 1000 CCs, kindly provided by CoinGecko. The data used for the analysis cover daily closing data for prices, market volume and market capitalization in USD for each CC in the time period from 2014-04-01 to 2017-03-25. Crypto exchanges are open on the weekends, therefore data for weekend closing prices exist. Since CC exchanges do not finish trading after a certain time point every day, a time point which serves as a closing time has to be defined. CoinGecko used 12 am in the UTC time zone. One should note that missing data are observed in the dataset, therefore the last rules from Chapter 4 will come into play.

Figure 2: Performance of CRIX with AIC (x-axis: date, Aug 2014 to Jan 2017; y-axis: index value). CRIXindex, CRIXcode

Figure 2 shows the performance of CRIX, and Figure 6 the differences between CRIX and both ECRIX and EFCRIX. For the purpose of comparison, the indices were recalibrated on the recalculation dates since the index constituents change then. We do not provide each index plot individually since they perform almost equally. However, the AIC method gave very different numbers of constituents for the corresponding indices. The numbers of constituents are given in Table 4. For comparison, the numbers of constituents under the other discussed model selection criteria are provided too. The variance of $C_p$ was derived with a GARCH(1,1) model, bollerslev_generalized_1986. The corresponding information for ECRIX and EFCRIX is given in the same table, Table 4. Interestingly, the methodology of EFCRIX causes its number of constituents to reach a relatively stable value for each period. ECRIX mostly has far fewer constituents than CRIX and EFCRIX because this index only runs until a local optimum. Comparing the number of constituents for CRIX derived with AIC against the other criteria, one sees that GC, GFC and SH tend to choose at least as many constituents as AIC. Also, all three criteria suggest the same result. $C_p$ stops at the initial value for CRIX, ECRIX and EFCRIX. For CRIX, ECRIX and EFCRIX, AIC mostly chooses fewer constituents than all other criteria, except $C_p$, which terminates very early. For LCRIX, LECRIX and LEFCRIX mostly fewer constituents were chosen than for CRIX, ECRIX and EFCRIX, compare Table 5. Note that the AIC gave the sparsest result again.
           AIC       GC        GFC       SH        Cp         FPE
CRIX       0.4769    0.4883    0.3755    0.3598    1.9844     0.0042
ECRIX      11.0988   10.3673   10.3673   10.4667   79.3979    0.0048
EFCRIX     3.1394    0.0116    0.0049    0.0049    79.3979    0.0048
LCRIX      0.6417    0.1497    0.1217    0.1211    0.6638     0.0049
LECRIX     22.8782   16.7187   16.7187   16.7187   125.0620   0.0047
LEFCRIX    7.9158    0.0645    0.0126    0.0126    125.0620   0.0047
btc        79.3979   79.3979   79.3979   79.3979   79.3979    79.3979

Table 2: Comparison of CRIX, ECRIX, EFCRIX, derived under different penalizations, against TMI under mean of monthly Mean Squared Error, compared with btc

           AIC       GC        GFC       SH        Cp        FPE
CRIX       0.9896    0.9908    0.9918    0.9928    0.9835    1.0000
ECRIX      0.9576    0.9586    0.9586    0.9586    0.9133    1.0000
EFCRIX     0.9794    0.9990    1.0000    1.0000    0.9133    1.0000
LCRIX      0.9928    0.9949    0.9959    0.9959    0.9917    1.0000
LECRIX     0.9692    0.9700    0.9700    0.9700    0.9501    1.0000
LEFCRIX    0.9855    0.9979    1.0000    1.0000    0.9501    1.0000
btc        0.9133    0.9133    0.9133    0.9133    0.9133    0.9133

Table 3: Comparison of CRIX, ECRIX, EFCRIX, derived under different penalizations, against TMI under mean of monthly Mean Directional Accuracy, compared with btc

Figure 3: Performance of CRIX compared to BTC. Monthly MSE of CRIX with AIC and btc (x-axis: end of time period; y-axis: MSE).

The indices optimized until a local optimum are expected to perform worse against the TMI/LTMI than the globally optimized ones. Table 2 and Table 3 give the mean over the monthly Mean Squared Error (MSE) and Mean Directional Accuracy (MDA), defined as

\[ \mathrm{MSE}\{\mathrm{CRIX}(k)\} = \frac{1}{t_l^+ - t_l^-} \sum_{t = t_l^-}^{t_l^+} \{\mathrm{CRIX}(k)_t - \mathrm{TMI}(k_{max})_t\}^2 \tag{28} \]

\[ \mathrm{MDA}\{\mathrm{CRIX}(k)\} = \frac{1}{t_l^+ - t_l^-} \sum_{t = t_l^-}^{t_l^+} I\left[\operatorname{sign}\{\mathrm{TMI}(k_{max})_t - \mathrm{TMI}(k_{max})_{t-1}\} = \operatorname{sign}\{\mathrm{CRIX}(k)_t - \mathrm{CRIX}(k)_{t-1}\}\right] \tag{29} \]

where $t_l^-$ and $t_l^+$ are the beginning and end of the month respectively, $I(\cdot)$ is the indicator function and sign$(\cdot)$ gives the sign of the respective expression. Apparently CRIX performs best, which can be explained by its larger number of index constituents. CRIX, ECRIX and EFCRIX are close in terms of the MDA, but the MSE is much better for CRIX. Comparing all the model selection criteria, FPE has the best performance in terms of MSE and MDA, due to choosing high numbers of constituents. The trading volume weighted indices are close in terms of MSE and MDA to their corresponding market capitalization weighted indices. At the same time, the volume weighted indices mostly have fewer constituents.

CRIX was constructed with steps of five, which is common in practice, and performed best under AIC. For this case the number of constituents was the most stable, while achieving the best performance for MSE and MDA. Additionally, the analysis showed that it is indeed unnecessary from a practical viewpoint to choose the globally optimal AIC under steps of 1. Even a local optimum with a much more stable number of constituents is able to mimic the market movements very well in terms of the MDA and MSE. Furthermore, even for ECRIX more than one constituent was selected most of the time. This shows that Bitcoin, which currently clearly dominates the market in terms of market capitalization and trading volume, does not account for all the variance in the market.
Other CCs are important for the market movements too.

Based on the theoretical and empirical analysis, we decided to continue with the AIC. From the theoretical viewpoint, the AIC uses the most information about the data, since it relies on the density. From the empirical analysis, the AIC chooses far fewer constituents than GC, GFC, SH and FPE, while its performance in terms of MSE and MDA is close to the three outlined criteria. The better performance of GC, GFC, SH and FPE was achieved by overparametrization of the index. Therefore, CRIX will be derived with the AIC criterion.

Compared with BTC, CRIX tracks the market development better over time. Figure 3 shows the monthly MSE of CRIX with AIC and BTC. In 2016 CRIX tracked the market development much better than BTC, and in the beginning of 2017 even better due to the huge impact of the price gain of altcoins like Ethereum, Ripple and Dash. Their performance is visualized in Figure 4, clearly showing the better performance of CRIX in this time period, driven by price gains in altcoins. Due to the log scale and the high gains of altcoins, the difference between CRIX and BTC appears small, while in fact being considerable. Figure 4b shows the difference in the log returns of CRIX and BTC. One sees differences in their return series, which are particularly strong at the beginning of 2016 and in March 2017. Comparing the performance of CRIX and LCRIX against BTC, one observes an increasing spread between the indices, Figure 5. It indicates a lower weight of BTC in LCRIX, thus tackling the issue of dominance of BTC in CRIX by liquidity weighting. Looking at the actual differences in the log return series compared to CRIX, Figure 5b, stronger spikes are observed, showing that the difference in performance between CRIX and LCRIX is driven by the stronger weights on altcoins in LCRIX. Table 8 shows the actual weights given to BTC and altcoins in the respective indices. In the liquidity indices altcoins frequently receive a higher weight compared to the respective indices based on market capitalization weighting. Once the altcoins received even 52% of the weights in LCRIX. The results show that the market focus in terms of trading is stronger for altcoins than their market capitalization suggests, thus an index accounting for this, LCRIX, is called for. Simultaneously, the weighting scheme tackles the dominance of BTC in a market capitalization index.

Figure 4: Comparison of performance of CRIX, BTC, ETH, XRP and DASH. (a) Performance of rescaled log price series of CRIX, BTC, ETH, XRP and DASH (x-axis: date; y-axis: log10 price). (b) Difference in the log returns of CRIX and BTC (x-axis: date; y-axis: difference in log returns).

Figure 5: Comparison of performance of CRIX, BTC and LCRIX. (a) Performance of index series of CRIX, BTC and LCRIX (x-axis: date; y-axis: index performance). (b) Difference in the log returns of CRIX and LCRIX and CRIX and BTC (x-axis: date; y-axis: difference in log returns).

       CRIX                     | ECRIX                    | EFCRIX                   |
Period  AIC  GC GFC  SH  Cp FPE |  AIC  GC GFC  SH  Cp FPE |  AIC  GC GFC  SH  Cp FPE |  max
1         5  10  10  10  10  35 |    2   2   2   3   1  36 |    2   7  30  30   1  36 |   36
2        10  15  15  15   5 100 |    3   3   3   3   1  93 |    3  94  93  93   1  93 |  113
3         5  10  35  35   5 100 |    5   5   5   5   1  93 |    5  94  93  93   1  93 |  158
4        10  10  10  40   5  95 |    3   3   3   3   1  90 |    3  91  90  90   1  90 |  182
5        10  20  20  20   5 100 |    2   4   4   4   1  93 |   12  94  93  93   1  93 |  169
6        10  10  20  20   5 100 |    2   2   2   2   1  93 |    2  94  93  93   1  93 |  171
7         5  20  20  20   5 100 |    1   1   1   1   1  93 |   16  94  93  93   1  93 |  176
8        15  20  20  20   5  95 |    3   4   4   4   1  91 |    3  92  91  91   1  91 |  140
9        15   5   5   5   5 100 |    3   3   3   3   1  93 |    3  94  93  93   1  93 |  188
10       15  15  25  25   5 100 |    3   5   5   5   1  93 |    3  94  93  93   1  93 |  207
11       10  35  45  45   5 100 |    2   2   2   2   1  93 |    4  94  93  93   1  93 |  221

Table 4: Comparison of AIC, GC, GFC, SH, Cp and the FPE method for the selection of the number of index constituents for the CRIX, ECRIX and EFCRIX in the 11 periods

       LCRIX                    | LECRIX                   | LEFCRIX                  |
Period  AIC  GC GFC  SH  Cp FPE |  AIC  GC GFC  SH  Cp FPE |  AIC  GC GFC  SH  Cp FPE |  max
1         5  10  15  15   5  35 |    2   3   3   3   1  36 |    2   6  16  16   1  36 |   36
2         5  10  10  10   5 100 |    2   4   4   4   1  93 |    2  94  93  93   1  93 |  113
3         5   5  20  20   5 100 |    3   4   4   4   1  93 |    6  94  93  93   1  93 |  158
4        15  20  20  30   5  95 |    3   2   2   2   1  90 |    3  91  90  90   1  90 |  182
5         5   5   5   5   5 100 |    1   1   1   1   1  93 |   93  94  93  93   1  93 |  169
6         5   5   5   5   5 100 |    2   2   2   2   1  93 |    9  94  93  93   1  93 |  171
7        10  25  30  35   5 100 |    1   2   2   2   1  93 |    1  94  93  93   1  93 |  176
8        10  20  35  35   5  95 |    1   1   1   1   1  91 |    3  92  91  91   1  91 |  140
9         5  10  10  10   5 100 |    2   2   2   2   1  93 |    2  94  93  93   1  93 |  188
10       10  10  10  10   5 100 |    1   1   1   1   1  93 |    1  94  93  93   1  93 |  207
11        5  15  15  15   5 100 |    2   3   3   3   1  93 |    2  94  93  93   1  93 |  221

Table 5: Comparison of AIC, GC, GFC, SH, Cp and the FPE method for the selection of the number of index constituents for the LCRIX, LECRIX and LEFCRIX in the 11 periods

Figure 6: Realized difference between TMI and CRIX (solid), ECRIX (dashed), EFCRIX (dotdashed). Differences between indices derived with AIC and TMI (x-axis: date, Aug 2014 to Jan 2017; y-axis: difference between indices and TMI). CRIXfamdiff, CRIXcode

The CRIX methodology was derived with the idea of finding a method which allows mimicking young and fast changing markets appropriately. But well known major markets usually change their structure too. So the proposed methodology is tested on the German stock market, which has four major indices: DAX, MDAX, SDAX and TecDAX. The DAX is used to determine the overall market direction, jansen_deutsche_1992. Since it is chosen from the so called prime segment, it has some prior restrictions. It is interesting to see whether our methodology yields the DAX as an adequate benchmark for the total market. Since the indices are derived with a market cap weighting scheme, only this methodology is tested. Following Definition 4, all available stocks are defined as the TMI and our new method is applied to find an appropriate index. Again, the 7-step method from Section 3 was applied to find the number of constituents, but it starts at 30 members to check whether more constituents are necessary. The identification of $k$ and the reallocation of the included assets are performed quarterly, as for the DAX. To be in line with the DAX reallocation dates, the index calculation will start after the third Friday of September and the reallocation dates are the third Fridays of December, March, June and September, see deutsche_boerse_ag_guide_2013.

The data were fetched from Datastream for the period 2000-06-16 until 2015-12-18. All stocks which are German companies and are traded on XETRA are chosen. Any time series for which Datastream reported an error for either the price or the market capitalization data was excluded from the analysis. The index, computed with the new methodology, is called Flexible DAX (FDAX). One should note that the analysis starts three months after the starting point of the dataset due to the initialization period of FDAX.

Figure 7 shows the number of members of FDAX and DAX in the respective periods. Most of the time, the number of index constituents for FDAX is higher than the 30 members of DAX. Only around 2004-2005 is $k$ more frequently 30. Especially during the turmoil of the financial markets, starting from 2008/2009, the number of index constituents is much higher.
One might conjecture that a higher reported variability in one period should cause an increase in $k$ in the next period, since it was shown that the selection method depends on the variance, see Section 11. Figure 7 shows that this idea can partially be supported. The derivation of the conditional variance was performed with a GARCH(1,1) model, bollerslev_generalized_1986, and the daily results were summed up. In the extreme cases, $k$ increases in the next period, see 2001, 2002, 2006 and 2011.

The computation of the MSE and MDA, see Table 6, shows that FDAX is a more accurate benchmark for the total market than DAX. Since jansen_deutsche_1992 state that DAX may be used to analyze the movements of the total market, an MDA of 92 % is indeed good. But FDAX mimics the market even better, with an MDA of 96 %. Also the MSE for FDAX is much lower than the one of DAX. Therefore the methodology fulfilled its goal of finding a sparse, investable and accurate benchmark in terms of the MDA.

               MSE     MDA
FDAX vs. TMI   6.36    0.96
DAX vs. TMI    51.02   0.92

Table 6: Comparison of DAX with CRIX methodology (FDAX) and rescaled DAX against TMI

The Mexican stock market is represented by the IPC35, mexbol_prices_2013.

One of its rules is a readjustment of the weights to lower the effect of dominant stocks. In the CC market BTC is such a dominant asset. The CRIX methodology could help to circumvent arbitrary rules and develop an index to represent the market accurately.

Figure 7: Number of constituents of FDAX (solid), DAX (horizontal solid) and cumulated monthly variance of FDAX (dashed). FDAX/DAX index members and FDAX variance (x-axis: periods; y-axes: FDAX members, cumulated conditional variance). CRIXdaxmembersvar, CRIXcode

The data were fetched from Datastream for the period 1996-06-01 until 2015-05-29 and cover all Mexican companies listed in Datastream. The specifications of the methodology are the same as for the German stock market except for the recalculation date. In line with the methodology of the IPC35, the index is recalculated with the closing data of the last business days of August, November, February and May, therefore the recalculated index starts on the first business days of September, December, March and June. The TMI consists of all fetched companies. The choice of $k$ starts with 35 since this is the number of constituents of the IPC.

Again, the CRIX methodology works well. The MSE is very low compared to the one for the IPC35 and the MDA gives a much better performance too, see Table 7. We can conclude that the methodology helped to circumvent the usage of arbitrary rules for the weights in the rules of the indices and at the same time enhances the performance of the market index. Figure 8 shows the number of index members of the FIPC compared to the IPC. The methodology also suggests using more than 35 index members, the number of members of the IPC, half of the time.

Figure 8: Number of constituents of FIPC (solid) and IPC (dashed) in the respective periods. Comparison of IPC and FIPC index members (x-axis: periods; y-axis: FIPC/IPC members). CRIXipcmembers, CRIXcode

               MSE       MDA
FIPC vs. TMI   24.97     0.97
IPC vs. TMI    4743.50   0.91

Table 7: Comparison of IPC with CRIX methodology (FIPC) and rescaled IPC against TMI
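The MSE and MDA figures reported in Tables 2, 3, 6 and 7 follow definitions (28) and (29). A minimal sketch of both measures for a single month is given below. The function names are hypothetical, the MSE uses the normalisation $t_l^+ - t_l^-$ exactly as written in (28), i.e. one less than the number of observations, and the MDA compares only the day-to-day changes available inside the supplied window, which is an assumption about how the first day of the month is treated.

```python
from typing import Sequence


def _sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def _check(index: Sequence[float], tmi: Sequence[float]) -> None:
    if len(index) != len(tmi) or len(index) < 2:
        raise ValueError("need two series of equal length with >= 2 points")


def monthly_mse(index: Sequence[float], tmi: Sequence[float]) -> float:
    """Squared deviations from the TMI, normalised by t_l^+ - t_l^- as in (28)."""
    _check(index, tmi)
    squared = sum((c - m) ** 2 for c, m in zip(index, tmi))
    return squared / (len(index) - 1)


def monthly_mda(index: Sequence[float], tmi: Sequence[float]) -> float:
    """Share of days on which index and TMI move in the same direction, (29)."""
    _check(index, tmi)
    hits = sum(
        _sign(tmi[t] - tmi[t - 1]) == _sign(index[t] - index[t - 1])
        for t in range(1, len(index))
    )
    return hits / (len(index) - 1)
```

The reported table entries are means of such monthly values over the evaluation period; an MDA of 1 means that the index moved in the same direction as the TMI on every compared day.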

The movements of CCs are very different from each other, elendner_cross-section_2017. So studying the entire market of CCs requires an instrument which adequately captures and displays the market movements: an index. But index construction for CCs requires a new methodology to find the right number of index members. Innovative markets, like the one for CCs, change their structure frequently. The proposed methods were applied to construct a new family of indices, which are displayed and updated on a daily basis. The performance of the new indices was studied and it was shown that the dynamic AIC based methodology results in indices with stable properties. The results show that a market like the CC market, currently dominated by Bitcoin, still needs a representative index since Bitcoin does not account for all the variance in the market. The diversified nature of the CC market makes the inclusion of altcoins in the index product critical to improve tracking performance. We have shown that assigning optimal weights to altcoins helps to reduce the tracking errors of a CC portfolio, despite the fact that their market cap is much smaller relative to Bitcoin.

Besides the classical market capitalization weighting, a volume weighting scheme was proposed. The corresponding indices are sparser in terms of constituents while having a comparable performance, which gives support to this weighting scheme under the goals of the study. The AIC based method was also applied to the German stock market. The results yield a more accurate benchmark in terms of MDA. In applying the CRIX methodology to the Mexican stock market, which is dominated by Telmex, one finds high accuracy in terms of MSE and MDA.

We conclude that the CRIX technology enhances the construction of an index if the goal is to find a sparse, investable and accurate benchmark.

10 Acknowledgments

We would like to thank the editor and an anonymous referee for their valuable comments on this article. Our thanks extend to David Lee Kuo Chuen and Ernie G. S. Teo for their comments in several discussions. Financial support from the Deutsche Forschungsgemeinschaft via CRC 649 "Economic Risk" and IRTG 1792 "High Dimensional Non Stationary Time Series", Humboldt-Universität zu Berlin, is gratefully acknowledged.

Proof:

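Assume normally distributed error terms, (10) and (22): $\varepsilon(k, \beta) \sim N\{0, \sigma^2(k, \beta)\}$, $\hat{\varepsilon}(k, \beta) \sim N\{0, \hat{\sigma}^2(k, \beta)\}$. Then

\[ \log L\{\varepsilon(k, \beta)\} = -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log \sigma^2(k, \beta) - \frac{1}{2\sigma^2(k, \beta)} \sum_{t=1}^{T} \varepsilon(k, \beta)_t^2. \tag{30} \]

Denote $\mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\} = \sum_{t=1}^{T} \hat{\varepsilon}(k, \beta)_t^2$ and $\hat{\sigma}^2(k, \beta) = T^{-1} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\}$. Then

\begin{align}
\log L\{\hat{\varepsilon}(k, \beta)\} &= -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log\left[T^{-1} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\}\right] - \frac{1}{2 T^{-1} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\}} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\} \tag{31} \\
&= -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log\left[T^{-1} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\}\right] - \frac{T}{2} \tag{32} \\
&= -\frac{T}{2} \log\left[T^{-1} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\}\right] + C \tag{33}
\end{align}

with $C = -\frac{T}{2} \log(2\pi) - \frac{T}{2}$. Since $C$ does not depend on any model parameters, only on the data length $T$, this part of the equation can be omitted.

\begin{align}
\mathrm{AIC}\{\hat{\varepsilon}(k, \beta), s\} &= T \log\left[T^{-1} \mathrm{RSS}\{\hat{\varepsilon}(k, \beta)\}\right] + 2 \cdot s \tag{34} \\
&= T \log \hat{\sigma}^2(k, \beta) + 2 \cdot s \tag{35}
\end{align}

The enhancement in the fit to the Total Market Index (TMI) by adding more constituents, $s$, determines the degree of improvement of the likelihood.

With the linearity property of the expectation operator, assume without loss of generality $E\{\varepsilon(k_{max})^{TM}\} = E\{\varepsilon(k, \beta)^{CRIX}\} = 0$, $t \in \{1, \ldots, T\}$, $t_l^- = 0$ and $s = 1$. Then

\begin{align*}
\hat{\sigma}^2(k, \beta) &= \mathrm{Var}\{\hat{\varepsilon}(k, \beta)\} = \mathrm{Var}\{\varepsilon(k_{max})^{TM} - \varepsilon(k, \beta)^{CRIX}\} \\
&= \sum_{t=1}^{T} \left[ \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \left( \sum_{i=1}^{k} P_{i,t-1} Q_{i,0} + \beta P_{k+1,t-1} Q_{k+1,0} \right) \right\} - \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \left( \sum_{i=1}^{k} P_{i,t} Q_{i,0} + \beta P_{k+1,t} Q_{k+1,0} \right) \right\} \right]^2 \\
&= \sum_{t=1}^{T} \left[ \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \sum_{i=1}^{k} P_{i,t-1} Q_{i,0} + \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \, \beta P_{k+1,t-1} Q_{k+1,0} \right\} - \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \sum_{i=1}^{k} P_{i,t} Q_{i,0} + \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \, \beta P_{k+1,t} Q_{k+1,0} \right\} \right]^2
\end{align*}

Using the relation $\log(a + b) = \log(a) + \log(1 + \frac{b}{a})$, it results:

\begin{align}
&= \sum_{t=1}^{T} \left[ \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \sum_{i=1}^{k} P_{i,t-1} Q_{i,0} \right\} + \log\left\{ 1 + \frac{\sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \, \beta P_{k+1,t-1} Q_{k+1,0}}{\sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \sum_{i=1}^{k} P_{i,t-1} Q_{i,0}} \right\} \right. \nonumber \\
&\qquad \left. - \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \sum_{i=1}^{k} P_{i,t} Q_{i,0} \right\} - \log\left\{ 1 + \frac{\sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \, \beta P_{k+1,t} Q_{k+1,0}}{\sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \sum_{i=1}^{k} P_{i,t} Q_{i,0}} \right\} \right]^2 \nonumber \\
&= \sum_{t=1}^{T} \left[ \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \sum_{i=1}^{k} P_{i,t-1} Q_{i,0} \right\} - \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \sum_{i=1}^{k} P_{i,t} Q_{i,0} \right\} \right. \nonumber \\
&\qquad \left. + \left[ \log\left\{ 1 + \frac{\beta P_{k+1,t-1} Q_{k+1,0}}{\sum_{i=1}^{k} P_{i,t-1} Q_{i,0}} \right\} - \log\left\{ 1 + \frac{\beta P_{k+1,t} Q_{k+1,0}}{\sum_{i=1}^{k} P_{i,t} Q_{i,0}} \right\} \right] \right]^2 \tag{36}
\end{align}

Expanding the square and writing the terms which do not depend on $\beta$ as $A_t$ and the last part of (36) as $B_t$:

\begin{align*}
\hat{\sigma}^2(k, \beta) &= \sum_{t=1}^{T} A_t^2 + 2 \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \sum_{i=1}^{k} P_{i,t-1} Q_{i,0} \right\} B_t - 2 \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \sum_{i=1}^{k} P_{i,t} Q_{i,0} \right\} B_t + B_t^2 \\
&= \sum_{t=1}^{T} A_t^2 + 2 B_t \left[ \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t} Q_{i,0} \sum_{i=1}^{k} P_{i,t-1} Q_{i,0} \right\} - \log\left\{ \sum_{i=1}^{k_{max}} P_{i,t-1} Q_{i,0} \sum_{i=1}^{k} P_{i,t} Q_{i,0} \right\} \right] + B_t^2 \\
&= \sum_{t=1}^{T} A_t^2 + 2 B_t \left[ \varepsilon(k_{max})^{TM} - \varepsilon(k, 1)^{CRIX} \right] + B_t^2
\end{align*}

Since normally distributed error terms are assumed, note that $\beta = \frac{\mathrm{Cov}\{\hat{\varepsilon}(k, 1), \varepsilon_{k+1}\}}{\mathrm{Var}\{\varepsilon_{k+1}\}}$, where $\varepsilon_{k+1}$ is the log return of $P_{k+1,t} Q_{k+1,0}$. The change in the variance will depend on the additional variance which the new constituent can explain, see $\beta$.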
Furthermore, it depends on the value of $P_{k+1,t} Q_{k+1,0}$ relative to $\sum_{i=1}^{k} P_{i,t} Q_{i,0}$, (36), which is the summed market value of the constituents in the index. This implies that constituents with a higher market capitalization are more likely to be part of the index. ∎

This gives support to using the often applied top-down approach, which we use for the construction of CRIX too.

Table 8: Average monthly weights of BTC and altcoins in the respective periods in the indices. For each monthly period up to 2017/03, the table lists k, the BTC weight and the altcoin weight for CRIX, LCRIX, ECRIX, LECRIX, EFCRIX and LEFCRIX (table values not recoverable).