Estimation of time-varying kernel densities and chronology of the impact of COVID-19 on financial markets
Matthieu Garcin a,∗, Jules Klein b, Sana Laaribi b

July 20, 2020
Abstract
The time-varying kernel density estimation relies on two free parameters: the bandwidth and the discount factor. We propose to select these parameters so as to minimize a criterion consistent with the traditional requirements of the validation of a probability density forecast. These requirements are both the uniformity and the independence of the so-called probability integral transforms, which are the forecast time-varying cumulated distributions applied to the observations. We thus build a new numerical criterion incorporating both the uniformity and independence properties by means of an adapted Kolmogorov-Smirnov statistic. We apply this method to financial markets during the COVID-19 crisis. We determine the time-varying density of daily price returns of several stock indices and, using various divergence statistics, we are able to describe the chronology of the crisis as well as regional disparities. For instance, we observe a more limited impact of COVID-19 on financial markets in China, a strong impact in the US, and a slow recovery in Europe.
Keywords – bandwidth selection, divergence statistics, financial crisis, kernel density, probability integral transform
∗ Corresponding author: [email protected]. a Léonard de Vinci Pôle Universitaire, Research center, 92916 Paris La Défense, France. b ESILV, 92916 Paris La Défense, France. The authors would like to thank Brieuc-Marie Le Brigand for his valuable help in the implementation of some of the methods described in this paper.

1 Introduction

The knowledge of the distribution of price returns is overriding in finance. Indeed, forecasts and risk measures, such as the Value-at-Risk (VaR), the expected shortfall, or even the volatility, can be seen as scalars calculated from a probability density function (pdf). Practitioners appreciate these scalars for their simplicity, but the pdf contains relevant and more comprehensive information. An accurate description of the pdf is thus worthwhile in a financial perspective.

For this reason, pdfs in finance should not be limited to the popular Gaussian distribution. More realistic parametric distributions have thus been put forward [26], like the NIG distribution [12] or the alpha-stable one [37], among many others. Besides, the non-parametric alternative makes it possible to depict more accurately the real pdf, but it may be subject to overfitting if it does not include any regularization. For this purpose, Beran has proposed the minimum Hellinger distance approach, in which a non-parametric pdf has to be estimated first, before being approximated by a parametric distribution [4]. This approach finds some applications in finance [24]. Other semi-parametric approaches include the distortion of a parametric density in order to take into account higher-order empirical moments, using for example an Edgeworth expansion [22, 15], which also has some applications in finance [29]. Finally, non-parametric approaches, like the kernel density, include a smoothing parameter, called bandwidth, which is supposed to balance accuracy and statistical robustness [41, 38].
Selecting an appropriate bandwidth is a hard task, often left aside by practitioners in finance, but some statistical methods propose criteria that the bandwidth should minimize, like the asymptotic mean integrated square error (AMISE) [19].

In finance, non-parametric methods are not limited to the estimation of a pdf. We can cite for example their use to estimate the impact of market events, as with the non-parametric news impact curve in econometric volatility models [30, 11, 7, 14]. The rationale of such an approach is that linear impact models misestimate the reaction of markets to extreme events. Similarly, parametric models often do not describe accurately enough the tails of the pdf of price returns. Extreme events may also lead to other methodological choices in addition to the non-parametric approach. Indeed, like all the previous financial crises, the recent crisis provoked by the COVID-19 pandemic has highlighted that the occurrence of an extreme event may have a sustainable effect on the market, namely on the pdf of price returns. Bad economic news does not only result in one extreme daily price variation but can initiate a longer turmoil period. This alternation of market regimes incites us to introduce time-varying densities. Once again, several approaches are possible, depending on whether we consider a parametric pdf [1] or a non-parametric one [17]. We are particularly interested in the non-parametric approach, which offers the possibility to reach a higher accuracy. We stress the fact that applications of time-varying kernels in finance are not limited to estimating a pdf of price returns. They indeed include interest rate models [44] or the study of the dynamic pdf of correlation coefficients with a particular shape during tumble periods [23].

Time-varying kernel densities rely on the choice of two important free parameters: the bandwidth smooths the pdf at a given date, exactly as in the static approach, whereas the discount factor smooths the variations in time.
Harvey and Oryshchenko propose a maximum likelihood approach to select these free parameters [17]. However, the literature about density validation requires stronger properties, namely the fact that the cumulative distribution function (cdf) of price returns must form a set of iid uniform variables, known as the probability integral transforms (PITs) [9]. In the present paper, we thus propose a new selection rule for the bandwidth and the discount parameter, so that it is consistent with the validation rule of the pdf. The main challenge is then to build the criterion, that is to define a function of the bandwidth and of the discount factor that we intend to minimize. Indeed, the traditional approach for validating an estimated pdf consists in a series of statistical tests and graphical analyses, not in a sole numerical criterion. The criterion we propose relies on a Kolmogorov-Smirnov statistic, which we can replace with any statistic of distribution divergence. We also adapt this statistic so as to minimize the discrepancy of the series of the PITs, for which we need the independence. Our work is not the first one to stress the limitations of the maximum likelihood approach in the selection of the two free parameters. We can indeed cite an article following a least-square approach [35] and another one maximizing a uniformity criterion for the PITs by means of an artificial neural network [42]. Our method differs from the latter as it also takes into account the independence of the PITs and it requires only standard statistical tools, compared to artificial neural networks.

We apply our new method to several stock indices before and during the financial crisis induced by the COVID-19 in the US, in Europe, and in Asia. The question of the impact of the pandemic on stock markets is a hot topic. Several papers deal with this subject and stress the exceptional amplitude of the crisis [2, 3, 32].
We propose here a new outlook on this financial crisis, using the time-varying kernel densities to describe its chronology. We also study the significance of the daily kernel density with respect to the pdf in a steady market. This makes it possible to determine the interval of dates for which the distribution of price returns significantly indicates a financial crisis. In particular, we observe that the speed at which markets recover varies a lot among the regions considered.

The paper is organized as follows. In Section 2, we introduce the method for estimating a dynamic kernel density along with the selection rule for the bandwidth and the discount factor. In Section 3, we apply this method to stock markets during the COVID-19 crisis. Section 4 concludes.

2 Estimation of a time-varying density

In this section, we introduce a method to estimate a time-varying density. For this purpose, we recall how we can estimate a static non-parametric density as well as its dynamic adaptation. This method relies on the choice of two free parameters. The main innovation of this paper consists in basing this choice on a quantitative version of criteria usually devoted to the evaluation of forecast densities. The last subsection is about some divergence statistics between two densities. We will use these divergences in the empirical part of this paper to quantify the amplitude of the variations of the densities through time and to determine the significance of these variations.
2.1 The static kernel density

A widespread non-parametric method to estimate a pdf uses kernels. The kernel density is defined by:

$$\hat f(x) = \frac{1}{th} \sum_{i=1}^{t} K\left(\frac{x - X_i}{h}\right),$$

where $h > 0$ is the bandwidth and $K$ is a function following the same rules as a pdf, namely it is positive, integrable, and its integral is one [41, 38]. With these two properties, $\hat f$ also has the features of a density. In particular, when integrating $\hat f$, the substitution $y = (x - X_i)/h$ in each of the $t$ integrals in the sum clearly shows that we need to normalize the sum by $th$ in order to have $\int_{\mathbb{R}} \hat f(x)\,dx = 1$. The symmetry and the continuity of the kernel are also often desirable.

The rationale of the kernel density is to make a continuous generalization of a histogram. Indeed, in the histogram, we count the number of occurrences in given intervals. The thinner the intervals, the more accurate the density estimation. But very thin intervals lead to overfitting, with a very erratic estimated density. To avoid this, we prefer to smooth the histogram. A simple manner to do this consists in replacing the number of occurrences of observations in each thin interval by a criterion of proximity of each observation to the middle of this interval. This is how the kernel density works. The proximity function $K$ must thus reach its maximum in zero and it must decrease progressively when its argument gets away from zero. Thus, the impact of $X_t$ on the estimated pdf in $x$ is maximal for $x = X_t$ and it decreases progressively when $|x - X_t|$ becomes higher, until reaching zero, at least asymptotically. It means that the observation of $X_t$ will have no impact on the density $\hat f(x)$ if $X_t$ is by far greater or lower than $x$.

There exists a large literature on the choice of the kernel $K$ [34, 6]. Epanechnikov and Gaussian kernels are widespread, due to their simplicity. But it seems, according to the related literature, that the choice of the kernel is often less overriding than the choice of the bandwidth $h$.
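As a minimal illustrative sketch of this estimator (the Epanechnikov kernel is the one used in the empirical part of the paper; the function names and grid are our own choices):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_density(x, obs, h):
    """Static kernel density estimate at the points x, from the t
    observations obs, with bandwidth h:
    f_hat(x) = 1/(t*h) * sum_i K((x - X_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    obs = np.asarray(obs, dtype=float)
    u = (x[:, None] - obs[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (len(obs) * h)
```

By construction, the estimate is nonnegative and integrates to one, whatever the bandwidth; only its smoothness depends on $h$.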
Indeed, this parameter plays the role of a regularization parameter. In practice, we tune $h$ so as to balance the accuracy and the robustness of the estimation: the larger $h$, the wider each kernel and the larger the interval on which each observation has an impact. (In the empirical application, we use the Epanechnikov kernel.) We review in Section 2.4 some methods to select this bandwidth.

2.2 The time-varying kernel density

We can change the formulation of the kernel density in order to take into account its progressive evolution through time. We get this dynamic version of the kernel density by means of weights $w_{t,i}$:

$$\hat f_t(x) = \frac{1}{h} \sum_{i=1}^{t} w_{t,i}\, K\left(\frac{x - X_i}{h}\right), \qquad (1)$$

such that $\sum_{i=1}^{t} w_{t,i} = 1$ [17]. For a fixed $t$, if the weights $w_{t,i}$ increase with $i$, more recent observations will be overweighted and the update of the kernel density is consistent with the economic intuition. The exponential weighting is widespread in the statistical literature, as it can reduce the computation of a density update to a simple recursive formula, instead of the linear cost induced by the whole estimation of the density from scratch. We then express the weights by

$$w_{t,i} = \frac{1-\omega}{1-\omega^t}\,\omega^{t-i},$$

where $0 < \omega \leq 1$. With this setting for the weights, we note $\hat f^{h,\omega}_t$ the density introduced in equation (1). Then, when the duration of the estimation sample is large enough with respect to the speed of decay of the weights, a good approximation of $w_{t,i}$ is $(1-\omega)\,\omega^{t-i}$. The recursive formula that the dynamic kernel density follows is then:

$$\hat f^{h,\omega}_{t+1}(x) = \omega\, \hat f^{h,\omega}_t(x) + \frac{1-\omega}{h}\, K\left(\frac{x - X_{t+1}}{h}\right). \qquad (2)$$

The two free parameters of this dynamic non-parametric density are the bandwidth $h$ and the discount factor $\omega$. Starting at a given time $t$ from a density with an exponential weighting, such as in equation (1), we obtain the density at subsequent times iteratively by applying equation (2).

Along with the time-varying density, we can build the corresponding cdf. Integrating equation (1), the first estimated cdf, at time $t$, is:

$$\hat F^{h,\omega}_t(x) = \frac{1-\omega}{1-\omega^t} \sum_{i=1}^{t} \omega^{t-i}\, \mathcal{K}\left(\frac{x - X_i}{h}\right),$$

where $\mathcal{K}$ is the primitive of $K$ such that $\lim_{x\to\infty} \mathcal{K}(x) = 1$. Subsequently, we get the cdf at a time $t+1$ by means of the following iteration:

$$\hat F^{h,\omega}_{t+1}(x) = \omega\, \hat F^{h,\omega}_t(x) + (1-\omega)\, \mathcal{K}\left(\frac{x - X_{t+1}}{h}\right),$$

which is the primitive of equation (2).

Other approaches are possible for estimating a time-varying density. For example, we could have estimated static densities on successive intervals and have then smoothed the transition between the resulting densities. For parametric densities, this amounts to smoothing time-varying parameters, which is a well-known subject in statistics [13]. However, this approach is not very natural for non-parametric densities, as we need a big amount of data to estimate one static density.

2.3 Evaluation of the quality of the dynamic density

The purpose of a density forecast may vary a lot. In a financial perspective, one may need it to build risk measures, or to forecast an average price return or a most likely price return. In practice, an investment decision is to be made relying on this density.
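The recursive updates of the pdf of equation (2) and of its integrated version can be sketched as follows, on a fixed evaluation grid (the Epanechnikov kernel and its primitive are our illustrative choices; any kernel whose primitive tends to 1 works):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, zero outside [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def epanechnikov_primitive(u):
    """Primitive of the Epanechnikov kernel, with limit 1 at +infinity."""
    u = np.clip(u, -1.0, 1.0)
    return 0.25 * (2.0 + 3.0 * u - u**3)

def update(f, F, grid, x_new, h, omega):
    """One step of equation (2) and of its integrated counterpart:
    f_{t+1} = omega*f_t + (1-omega)/h * K((. - X_{t+1})/h),
    F_{t+1} = omega*F_t + (1-omega) * Kbar((. - X_{t+1})/h)."""
    u = (grid - x_new) / h
    return (omega * f + (1.0 - omega) / h * epanechnikov(u),
            omega * F + (1.0 - omega) * epanechnikov_primitive(u))
```

Each update costs O(size of the grid), to be compared with the O(t) cost of re-estimating the discounted density from scratch at every date.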
One must then evaluate the quality of the forecast with respect to a loss function corresponding to this decision. Unfortunately, there cannot exist any absolute ranking of density forecasts valid for all the possible loss functions [9]. We therefore have to make a choice, which is necessarily subject to discussion. In the econometric literature, we can find an evaluation of the density forecast by means of the likelihood of observations [17]. We think that this choice, as focusing on the body of the distribution, neglects the behaviour of the density in its tails. A more general perspective would incite us to choose a density forecast consistent with the real density, even in its tails. Such a forecast would be more relevant in finance for calculating a VaR or an expected shortfall. However, the real density is never observed. If we had a static density forecast, the evaluation of this forecast could simply consist in comparing it with the empirical density of all the observed price returns. But the forecast density is supposed to change at each time, so that we have to base our analysis on another invariant density. This is the purpose of the analysis of the PITs, introduced by Diebold, Gunther, and Tay [9] and widespread in the literature on the evaluation of density forecasts [16, 17, 20]. We now expose this method, which we will adapt in the next subsection from the evaluation of density forecasts to the selection of the bandwidth and of the discount factor.

We observe $T$ successive price returns: $X_1, \ldots, X_T$. We use the $t_0$ first to build a density estimation $\hat f^{h,\omega}_{t_0}$, using equation (1). This density includes a discount in order to depict more closely recent observations. We thus conceive $\hat f^{h,\omega}_{t_0}$ as a forecast of the true density $f_{t_0+1}$ of $X_{t_0+1}$, as well as we conceive $\hat f^{h,\omega}_t$, for any $t \geq t_0$, as a forecast of the true density $f_{t+1}$ of $X_{t+1}$. Of course, $f_t$ varies with $t$, and we only observe one random variable in this density, namely $X_t$.
However, we are able to build a density which does not depend on $t$ and which will therefore be very useful for evaluating the quality of the density forecast. This invariant distribution is the one of the PIT variables, which are defined by:

$$Z^{h,\omega}_t = \hat F^{h,\omega}_{t-1}(X_t).$$

Indeed, if our forecast is good, that is if $\forall t \geq t_0+1$, $\hat F^{h,\omega}_{t-1} = F_t$, where $F_t$ is the true cdf, then, whatever $t$, $Z^{h,\omega}_t$ follows a uniform distribution in the interval $[0,1]$: $Z^{h,\omega}_t \sim \mathcal{U}(0,1)$ [9]. In fact, this idea is quite old [33] and is even something with which any person simulating random variables following a given cdf is familiar. In addition to being uniform, the variables $Z^{h,\omega}_{t_0+1}, \ldots, Z^{h,\omega}_T$ must also be independent.

Thanks to the PITs, we have $T - t_0$ observations in the same uniform density. This makes it possible to evaluate the density forecast: we have to check that $Z^{h,\omega}_{t_0+1}, \ldots, Z^{h,\omega}_T$ are indeed iid and uniform in $[0,1]$.

2.4 Selection of the bandwidth and of the discount factor

The literature about bandwidth selection is very rich [41, 19, 38]. Beyond the rule-of-thumb which often leads the choice of the bandwidth $h$ among practitioners, we can cite more relevant methods of selection, such as the minimization of the AMISE [36], evaluated for instance with cross-validation or a plug-in technique. As exposed in the previous subsection, we will try to select $h$ in order to make the distribution of the PITs close to a uniform distribution, and the uniform case is trivial in the AMISE approach and makes this method ineffective. We can also cite the possibility to estimate a time-varying bandwidth, like in the literature about the online estimation of kernel densities [43, 21]. Our approach will be different from the online framework: our time-varying aspect is not about $h$ but about the density.

Our problem, in addition, is not only about selecting $h$. We have to select it jointly with the discount factor $\omega$. As already mentioned, we can base this selection on the maximization of a likelihood [17]. But we want to have an accurate description of the true density, not to make the best point forecast. This thus incites us to use PITs and to adapt the method of evaluation of density forecasts. We have two objectives regarding the PITs: the uniformity and the independence.

We first focus on the uniformity. According to Diebold, Gunther, and Tay, methods based on statistical tests, such as the Kolmogorov-Smirnov test, are not relevant because nonconstructive, insofar as they do not indicate why PITs are not uniform [9].
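Before turning to the selection criterion, the construction of the PIT series described above can be sketched as follows (a simplified implementation on a fixed grid, with linear interpolation of the cdf between grid points; the Epanechnikov kernel and the function names are our own choices):

```python
import numpy as np

def pit_series(returns, t0, h, omega, grid):
    """PITs Z_t = F_{t-1}(X_t) for t = t0+1, ..., T, where F is the
    exponentially weighted kernel cdf, updated recursively."""
    def kbar(u):  # primitive of the Epanechnikov kernel
        u = np.clip(u, -1.0, 1.0)
        return 0.25 * (2.0 + 3.0 * u - u**3)
    returns = np.asarray(returns, dtype=float)
    # initial discounted cdf built from the first t0 observations, with
    # weights w_{t0,i} = (1-omega)/(1-omega^t0) * omega^(t0-i)
    w = (1 - omega) / (1 - omega**t0) * omega**(t0 - np.arange(1, t0 + 1))
    F = sum(wi * kbar((grid - x) / h) for wi, x in zip(w, returns[:t0]))
    pits = []
    for x in returns[t0:]:
        pits.append(float(np.interp(x, grid, F)))  # Z_t = F_{t-1}(X_t)
        F = omega * F + (1 - omega) * kbar((grid - x) / h)
    return np.array(pits)
```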
They thus prefer a qualitative analysis using graphical tools such as a simple histogram. Besides this mainstream approach, some papers propose a statistical test assessing the uniformity of the PITs [5]. Our framework is in fact different, as we do not want to determine whether our density forecast is good or not. Instead, given a density model, we only want to select its best parameters, here $h$ and $\omega$. Maybe our forecast will be poor, even though the non-parametric approach makes this case unlikely, but we will have done the best with respect to the model used. We thus do not want to test the consistency of our PITs with a uniform distribution, but we select the parameters $h$ and $\omega$ minimizing some test statistic. We choose to minimize the Kolmogorov-Smirnov statistic, $k$, because it is widespread and easy to understand. This statistic is simply the maximum gap between the empirical cdf and the theoretical one, which, in our case, is uniform:

$$k\left(Z^{h,\omega}_{t_0+1}, \ldots, Z^{h,\omega}_T\right) = \max_{t_0+1 \leq s \leq T} \left| \frac{s - t_0}{T - t_0 + 1} - Z^{h,\omega}_{\pi(s;\, t_0+1, T)} \right|,$$

where $s \in [\![u, v]\!] \mapsto \pi(s; u, v)$ is a permutation of $[\![u, v]\!]$ defining the new order:

$$Z^{h,\omega}_{\pi(u;\, u,v)} \leq Z^{h,\omega}_{\pi(u+1;\, u,v)} \leq \ldots \leq Z^{h,\omega}_{\pi(v-1;\, u,v)} \leq Z^{h,\omega}_{\pi(v;\, u,v)}.$$

But the Kolmogorov-Smirnov statistic says nothing about the independence of the variables and the fact that the sampling is random [10]. This property is however crucial and its absence could lead to nonsense estimations [18, 8]. In the standard approach regarding the evaluation of density forecasts, the independence is assessed by graphical tools, such as a correlogram [9]. We would again like a more systematic approach. We thus use an additional criterion coming from the literature on the simulation of quasi-random variables. We indeed want our series of PITs to be a low-discrepancy sequence [27, 39, 28].
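In code, the statistic above reduces to sorting the PITs and taking the maximal gap to the uniform quantiles (a sketch, using the paper's normalization with $T - t_0 + 1$ in the denominator):

```python
import numpy as np

def ks_uniform(z):
    """Kolmogorov-Smirnov-type statistic of a PIT sample against the
    uniform distribution: max_r | r/(n+1) - Z_(r) |, where Z_(1) <= ...
    <= Z_(n) are the sorted PITs, following the formula of the paper."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    ranks = np.arange(1, n + 1)
    return float(np.max(np.abs(ranks / (n + 1) - z)))
```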
This is all the more important as we want to estimate a time-varying density of price returns in a regime-switching market. The rationale behind the discrepancy is that the uniformity must be a feature not only of the PITs in the interval $[t_0+1, T]$ but also of the PITs in any of its subintervals: the sequence must be equidistributed. This method will avoid almost static densities in which price returns are globally well distributed, but with mainly high PITs in a bullish regime and then mainly low PITs during a crisis period. The discrepancy criterion we propose to minimize is then the worst uniformity statistic over all the subintervals of $[t_0+1, T]$. However, the Kolmogorov-Smirnov statistic depends on the size of the sample. We thus consider instead a size-adapted version of this statistic. Indeed, for a sample of $n \to +\infty$ observations, $\sqrt{n} \times k$ has a limit distribution which does not depend on $n$ and that is used for the Kolmogorov-Smirnov statistical test [25]. We also choose a minimal size $\nu$ above which we consider that the asymptotic Kolmogorov distribution may be applied. (We may consider, for example, $\nu = 22$, so that we verify the uniformity for every one-month interval of daily price returns. It is the choice made in the empirical part of this paper.) Thanks to this size-adapted statistic, our discrepancy statistic in fact focuses on the subinterval of size higher than $\nu$ with the least uniform PITs:

$$d_\nu\left(Z^{h,\omega}_{t_0+1}, \ldots, Z^{h,\omega}_T\right) = \max_{\substack{t_0+1 \leq s \leq s' \leq T \\ s'-s+1 \geq \nu}} \sqrt{s'-s+1}\; k\left(Z^{h,\omega}_s, \ldots, Z^{h,\omega}_{s'}\right).$$

The unconstrained selection of the two free parameters then consists in minimizing this discrepancy:

$$(h^\star, \omega^\star) = \operatorname*{argmin}_{h>0,\; 0<\omega\leq 1}\; d_\nu\left(Z^{h,\omega}_{t_0+1}, \ldots, Z^{h,\omega}_T\right). \qquad (3)$$

We also propose a constrained version of this optimisation problem. Indeed, the above unconstrained problem may lead to a dynamic of densities far from the economic intuition, for example with very rough densities. In practice, the time-varying densities we have built with this method seem empirically robust, at first sight. But we are interested in defining some reasonable bounds for $h$ or $\omega$. In order to have a robust time-varying density, we want that an isolated observation does not change too much the density between two consecutive dates. To state things quantitatively, we want to limit the Kolmogorov-Smirnov statistic between two densities at consecutive dates. We propose $\nu^{-1}$ as an upper bound. This choice has the advantage not to introduce a new free parameter but to link it to the validation rule exposed above. The rationale of this bound is the following. With the daily $\nu^{-1}$ bound, the maximal change of the Kolmogorov-Smirnov statistic, which is 1, cannot be reached before the minimal validation horizon, $\nu$. In a new market regime, the density may be strongly transformed. The time needed to update the density and to depict this new regime will thus be roughly $\nu$ days if the bound for the daily variation of the Kolmogorov-Smirnov statistic is $\nu^{-1}$. Minimizing the discrepancy of the PITs for intervals smaller than $\nu$ days would then be irrelevant. This is the reason why there should be a consistency of the minimal validation horizon with the bound of the variation of the Kolmogorov-Smirnov statistic between two consecutive dates. The link between the $\nu^{-1}$ bound and the parameters is straightforward, as the update of the cdf leads to an increase of the cdf at one point of at most $1-\omega$.
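A direct, quadratic-time sketch of the discrepancy statistic and of a grid search for the two free parameters follows; the PIT series is passed as a callable so that the same routine handles the selection problem, and the constraint on $\omega$ mirrors the constrained problem (4) (all names, and the use of an exhaustive grid rather than a smarter optimizer, are our own simplifications):

```python
import numpy as np
from itertools import product

def ks_uniform(z):
    """Uniformity statistic of a PIT sample, as in the paper."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    return float(np.max(np.abs(np.arange(1, n + 1) / (n + 1) - z)))

def discrepancy(z, nu):
    """d_nu: worst size-adapted statistic sqrt(n) * k over every
    subinterval of the PIT series of length at least nu."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    worst = 0.0
    for u in range(0, T - nu + 1):
        for v in range(u + nu, T + 1):
            worst = max(worst, np.sqrt(v - u) * ks_uniform(z[u:v]))
    return worst

def select_parameters(pits_of, h_grid, omega_grid, nu=22):
    """Grid search: pits_of(h, omega) returns the PIT series; we
    minimize d_nu, restricting omega > 1 - 1/nu as in the constrained
    problem (4)."""
    best, best_val = None, np.inf
    for h, om in product(h_grid, omega_grid):
        if om <= 1.0 - 1.0 / nu:
            continue  # constraint linking omega to the horizon nu
        val = discrepancy(pits_of(h, om), nu)
        if val < best_val:
            best, best_val = (h, om), val
    return best, best_val
```

A PIT series that is globally well distributed but clustered in time (e.g. high values then low values) is heavily penalized by `discrepancy`, which is exactly the regime-switching pathology the criterion is designed to avoid.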
Therefore, we introduce a new bound for $\omega$ and the constrained problem is as follows:

$$(h^\star_c, \omega^\star_c) = \operatorname*{argmin}_{h>0,\; 1-\nu^{-1} < \omega \leq 1}\; d_\nu\left(Z^{h,\omega}_{t_0+1}, \ldots, Z^{h,\omega}_T\right). \qquad (4)$$

We could also want to set bounds to $h$ in order to secure the robustness of the density at one date instead of the robustness across time. Nevertheless, we consider that the robustness across time is enough. Indeed, as the density will not change very rapidly, each density will be a fairly good forecast for observations close in time.

2.5 Divergence statistics

In the method exposed above to select $h$ and $\omega$, we minimize the divergence between an empirical distribution and a uniform one. In particular, we use the Kolmogorov-Smirnov statistic to depict this divergence because of both its simplicity and its asymptotic behaviour. But other divergence metrics could replace the Kolmogorov-Smirnov statistic in this method.

We can also use various divergence statistics to quantify to which extent the estimated pdf is different from what it was at a reference date, and thus track the evolution of the pdf through time. This is what these various statistics are devoted to in the empirical part of this paper. In addition, thanks to simulations, we will determine confidence intervals for each of these divergences at each date, so that we will be able to assess whether the evolution of the pdf through time is significant or not. We now review three of these divergence statistics in addition to the Kolmogorov-Smirnov statistic. Some are based on a comparison of densities, and others on a comparison of cdfs or even of quantiles.

First, we recall the definition of the Kolmogorov-Smirnov statistic between the cdfs $\hat F_{t_0}$ and $\hat F_t$:

$$k\left(\hat F_{t_0}, \hat F_t\right) = \sup_{x \in \mathbb{R}} \left| \hat F_{t_0}(x) - \hat F_t(x) \right|.$$
Whereas the Kolmogorov-Smirnov statistic considers the maximal difference between two cdfs, the Hellinger distance is the cumulated difference between densities:

$$H\left(\hat f_{t_0}, \hat f_t\right) = \sqrt{\int_{\mathbb{R}} \left( \sqrt{\hat f_{t_0}(x)} - \sqrt{\hat f_t(x)} \right)^2 dx}.$$

The $p$-Wasserstein distance, given $p \geq 1$, is related to the optimal transportation theory [40]. It is the minimal cost to reconfigure one pdf into another one. The Kolmogorov-Smirnov statistic is the $L^\infty$ distance between the cdfs, whereas the Wasserstein metric is the $L^p$ distance between their quantiles. The $p$-Wasserstein distance is indeed defined by [31]:

$$W_p\left(\hat F_{t_0}, \hat F_t\right) = \left( \int_0^1 \left| \hat F^{-1}_{t_0}(\alpha) - \hat F^{-1}_t(\alpha) \right|^p d\alpha \right)^{1/p}.$$

In this paper, we focus on the case $p = 1$, for which the Wasserstein distance is also equal to the $L^1$ distance between cdfs [31]. It thus clearly generalizes the Kolmogorov-Smirnov statistic. We will see in the empirical part of this paper that this generalization may be more appropriate than the Kolmogorov-Smirnov statistic to assess the significance of the variations of the distribution. Indeed, the occurrence of several extreme observations may not impact significantly the Kolmogorov-Smirnov statistic, which mainly focuses on the body of the distribution, whereas the Wasserstein distance takes into account the whole distribution. On the other hand, since a uniform distribution is not subject to a dichotomy between body and tails, the Kolmogorov-Smirnov statistic seems appropriate for assessing the uniformity of the PITs.

As opposed to the other divergences, the Kullback-Leibler divergence is not strictly speaking a distance function, as it is not symmetric in the two densities. It is related to Shannon's entropy. It is defined by:

$$D_{KL}\left(\hat f_{t_0}\, \middle\|\, \hat f_t\right) = \int_{\mathbb{R}} \hat f_{t_0}(x) \log\left( \frac{\hat f_{t_0}(x)}{\hat f_t(x)} \right) dx.$$

All these divergences can be generalized easily if we work with a discrete grid instead of $\mathbb{R}$. They are also always positive, with a value of zero if $\hat f_{t_0} = \hat f_t$.

3 Application to the COVID-19 crisis

We have applied the above method to the estimation of time-varying densities of several stock indices.
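On a discrete grid, the four divergences can be computed as follows (a sketch assuming both densities are tabulated on the same regular grid; for $p = 1$ the quantile form of the Wasserstein distance is replaced by its equivalent $L^1$ distance between cdfs, and the Kullback-Leibler integrand is restricted to the common support):

```python
import numpy as np

def divergences(f0, f1, grid):
    """Kolmogorov-Smirnov, Hellinger, 1-Wasserstein, and Kullback-Leibler
    divergences between two densities tabulated on the same regular grid."""
    dx = grid[1] - grid[0]
    F0, F1 = np.cumsum(f0) * dx, np.cumsum(f1) * dx  # approximate cdfs
    ks = float(np.max(np.abs(F0 - F1)))
    hellinger = float(np.sqrt(np.sum((np.sqrt(f0) - np.sqrt(f1))**2) * dx))
    wasserstein1 = float(np.sum(np.abs(F0 - F1)) * dx)  # L1 distance of cdfs
    support = (f0 > 0) & (f1 > 0)
    kl = float(np.sum(f0[support] * np.log(f0[support] / f1[support])) * dx)
    return ks, hellinger, wasserstein1, kl
```

For two identical densities all four statistics vanish; for a location shift of a Gaussian density, the 1-Wasserstein distance recovers the size of the shift, which illustrates why it is sensitive to mass moved into the tails.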
We consider American indices (NASDAQ Composite, S&P 500, S&P 100), European indices (EURO STOXX 50, Euronext 100, DAX, CAC 40), and Asian indices (Nikkei 225, KOSPI, SSE 50), with a particular focus on the S&P 500, EURO STOXX 50, and the South-Korean KOSPI indices. We have used data from Yahoo Finance in the time interval from 04/17/2015 to 05/28/2020. The study period includes the economic crisis related to the COVID-19. In particular, we study the impact of the COVID-19 on three stock markets corresponding to economic areas with different crisis management regarding the pandemic. The questions we want to answer are about the significance of this impact and the characterization of a recovery after the peak of the crisis.

We have estimated daily a pdf of daily price returns from the date $t_0$ corresponding to November 1st, 2019. These densities include observations from 2015, exponentially weighted with an optimal discount factor depending on the index. We provide these optimal discount factors in Table 1, along with the optimal bandwidth, determined by equation (3), as well as the constrained version defined by equation (4).

                    h*       ω*      h*_c     ω*_c
NASDAQ Composite    0.0110   0.827   0.0101   0.955
S&P 500             0.0122   0.840   0.0121   0.955
S&P 100             0.0124   0.877   0.0124   0.955
EURO STOXX 50       0.0124   0.831   0.0124   0.960
Euronext 100        0.0110   0.883   0.0123   0.963
DAX                 0.0038   0.864   0.0014   0.955
CAC 40              0.0117   0.864   0.0119   0.960
Nikkei 225          0.0124   0.790   0.0165   0.955
KOSPI               0.0124   0.813   0.0286   0.955
SSE 50              0.0107   0.778   0.0002   0.974
Mean value          0.0110   0.838   0.0112   0.959
Median value        0.0122   0.840   0.0123   0.955

Table 1: Optimal bandwidth $h^\star$ and discount factor $\omega^\star$ for several stock indices for densities between November 2019 and May 2020. The constrained version is $h^\star_c$ and $\omega^\star_c$.

The optimal bandwidth is close to 0.01 for most of the indices. In the sequel, we work with median values of these parameters, denoted $h_m$ and $\omega_m$ (see Table 1).
We display the estimated pdf at four particular dates:

- before the crisis: on the 16th December 2019, in a period where the markets were steady,
- at the first turmoil in the markets, the 7th February 2020,
- at the peak of the pandemic, which occurs at a different date for each market,
- at the end of our sample, the 28th May 2020.

We determine the date of the peak of the pandemic as the date $t$ maximizing the Hellinger distance of the density $\hat f^{h_m,\omega_m}_t$ with respect to the estimated pdf in $t_0$, that is $\hat f^{h_m,\omega_m}_{t_0}$. This peak does not follow an epidemiological definition, since we only observe financial data. It corresponds to a maximal divergence with the steady state of the market.

[Figure 1: Estimated dynamic pdf of daily price returns for S&P 500 (top left), EURO STOXX 50 (top right), and KOSPI (bottom) indices.]

For the KOSPI and the EURO STOXX 50, the pdf before the crisis looks like a Gaussian distribution, with thin tails. Then the pdf slightly widens on the losses side. At the peak of the crisis, the pdf crushes, with very fat tails. After the peak, it tends to an asymmetric distribution with a negative skewness and slowly decreasing tails. The chronology is similar for the S&P 500, except that the pdf on the 7th February is similar to the one before the crisis. It may indicate a low responsiveness of the US market in front of the outbreak. Or it may denote a temporary lag in the impact of the COVID-19 on the US market, reflecting the lag in the spread of the outbreak in the region.

Displaying pdfs at several dates as in Figure 1 makes it possible to depict the chronology of the crisis. But it is limited, since displaying this density every day of our sample would make the figure unreadable. Therefore, instead of displaying each pdf, we display one statistic per day. This statistic must reflect the divergence of the pdf with respect to a steady state of markets.
We thus determine the Kolmogorov-Smirnov statistic, the Hellinger distance, the 1-Wasserstein distance, as well as the Kullback-Leibler divergence of the pdf each day with respect to the pdf in $t_0$. Results are displayed in Figure 2. Whatever the divergence statistic, we observe first a slight increase from 0 toward a low positive value until the beginning of the crisis, where the divergence sharply increases till the peak, where it begins to slowly decrease. This last phase corresponds to the slow recovery of the markets after the crisis.

[Figure 2: Daily evolution through time of four divergence statistics: the Kolmogorov-Smirnov statistic (top left), the Hellinger distance (top right), the Wasserstein distance (bottom left), and the Kullback-Leibler divergence (bottom right). The curves correspond to S&P 500 (black), EURO STOXX 50 (dark grey), and KOSPI (light grey) indices. The dotted lines are simulated confidence intervals, with confidence levels, from the bottom to the top: 95%, 99%, and 99.9%.]

The null hypothesis $H_0$ under which these confidence intervals are simulated is that all the price returns are iid Gaussian variables. For each statistic and each date, we represent the quantile estimated on simulations and corresponding to three confidence levels: 95%, 99%, and 99.9%. At a date $t$, for a given stock index, if the divergence of the current pdf with respect to the pdf in $t_0$ is above a particular curve of the confidence interval, we reject $H_0$ with the corresponding confidence level $p$. In other words, we consider the pdf in $t$ to be significantly different from the pdf in $t_0$ with a confidence level $p$.

Depending on the divergence considered, we are able to determine the peak of the impact as the date maximizing the statistic. We display in Table 2 the date of the peak as well as the value of the divergence at the peak, before the crisis, and late May, in the Hellinger approach. According to this table, the strongest impact is in the US, but the recovery seems faster there than in Europe. The smallest impact and the fastest recovery is by far in China.
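The simulated confidence bounds can be sketched as follows (a simplified version of the procedure: under $H_0$ we draw iid Gaussian samples, evaluate the chosen divergence statistic on each draw, and take empirical quantiles; the statistic passed in, the sample sizes, and the function name are all illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def simulated_bounds(statistic, n_obs, n_sims,
                     levels=(0.95, 0.99, 0.999), seed=0):
    """Quantiles, under H0 (iid Gaussian returns), of a divergence
    statistic computed on simulated samples; `statistic` maps a sample
    of n_obs returns to a scalar divergence value."""
    rng = np.random.default_rng(seed)
    vals = np.array([statistic(rng.standard_normal(n_obs))
                     for _ in range(n_sims)])
    return np.quantile(vals, levels)
```

An observed divergence above the 99.9% bound then leads to rejecting $H_0$, i.e. to declaring the pdf at that date significantly different from the steady-state one.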
The peak occurs between the 25th March (China and South Korea) and the 6th April (US and Japan), whatever the index, except for the DAX, whose peak is in May. We use the other divergences as a robustness check of these results. The conclusions are in fact similar: small impact and almost total recovery late May for the Chinese SSE 50 index, strongest impact on the US market, slowest recovery on the European market. We also observe some variations in the estimation of the peak date. The most surprising one is provided by the Kolmogorov-Smirnov statistic, according to which the peak is reached first in Europe on the 18th March, before continental Asia and the US on the 23rd March, and finally Japan on the 2nd April.

Index              H on 7th Feb.   H at the peak   Date of the peak   H on 28th May
NASDAQ Composite   0.111           0.531           2020-04-01         0.295
S&P 500            0.084           0.562           2020-04-06         0.363
S&P 100            0.089           0.562           2020-04-06         0.324
EURO STOXX 50      0.051           0.466           2020-03-27         0.398
Euronext 100       0.050           0.484           2020-03-27         0.385
DAX                0.061           0.458           2020-05-05         0.377
CAC 40             0.052           0.479           2020-03-27         0.389
Nikkei 225         0.095           0.477           2020-04-06         0.350
KOSPI              0.083           0.518           2020-03-25         0.294
SSE 50             0.070           0.381           2020-03-25         0.122

Table 2: Hellinger distance $H$ with respect to $t_0$. The peak corresponds to when the maximal Hellinger distance is reached. Dates are in 2020.

We stress the fact that the alternative chronology of the peaks is not the only particularity of the Kolmogorov-Smirnov statistic with respect to the three other divergence statistics we have implemented. For instance, the significance of the financial crisis in some regions is questionable according to this divergence statistic, as one can see in Figure 2. We can nevertheless explain this striking, and certainly dubious, conclusion. Indeed, when simulating two sets of iid random variables, we get two kernel densities, but the Kolmogorov-Smirnov statistic focuses on only one quantile, generally corresponding to where the cdf is the steepest, that is in the body of the distribution.
If we disrupt one of these two densities with a limited number of outliers, the Kolmogorov-Smirnov statistic may not change much, as this modification mainly impacts the tails and not the body of the distribution. On the contrary, the three other divergence statistics are less robust to outliers, as they are defined by integrals over the whole distribution. Their responsiveness to a crisis is thus higher. For this reason, we prefer them to the Kolmogorov-Smirnov statistic for assessing the significance of the variations of a dynamic pdf.

Conclusion
In this paper, we have introduced a new method to select the two free parameters of a dynamic kernel density estimation, namely the discount factor and the bandwidth. This method relies on the maximization of the accuracy of the daily pdf. This accuracy is to be understood in the sense of the literature about density forecast evaluation: the PITs of the new observations, expressed using the time-varying distribution, form a set of variables which must be iid uniform variables. We use the Kolmogorov-Smirnov statistic and a discrepancy statistic to build a quantitative criterion of accuracy of the pdf. It is this criterion that we try to maximize when selecting the bandwidth and the discount factor of our time-varying pdf.

We have applied this method to financial data. In particular, we represent the evolution of the pdf of daily price returns for several stock indices during the COVID-19 pandemic. We are thus able to expose an accurate chronology of the financial crisis. The impact of the pandemic on the Chinese market seems limited, whereas the strongest impact occurred in the US. The slowest recovery is in Europe, for which the pdf of daily returns is still significantly different from that of a steady market in late May 2020. On the contrary, the recovery of the Chinese and South-Korean markets is very rapid: according to several divergence statistics, late May 2020, they are not even significantly different from what they were before the crisis.
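The PIT-based validation described above can be sketched as follows, with a rolling-window KDE as a simplified stand-in for the paper's discounted time-varying estimator (the window size and function names are ours): each new observation is passed through the cdf estimated on the preceding returns, and the uniformity of the resulting PITs is then checked, for instance with a Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy.stats import gaussian_kde, kstest

def pit_series(returns, window=250):
    """Probability integral transforms: each observation evaluated through the
    cdf estimated on the previous `window` returns (rolling Gaussian KDE)."""
    pits = []
    for t in range(window, len(returns)):
        kde = gaussian_kde(returns[t - window:t])
        # cdf at the new observation, by integrating the kernel density
        pits.append(kde.integrate_box_1d(-np.inf, returns[t]))
    return np.array(pits)

def pit_uniformity_pvalue(pits):
    """Kolmogorov-Smirnov test of the PITs against the uniform law on [0, 1].
    A small p-value signals an inaccurate density forecast."""
    return kstest(pits, "uniform").pvalue
```

Under an accurate forecast, the PITs are iid uniform; the paper's criterion additionally penalizes their serial dependence, which this sketch omits.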
References

[1] Ammy-Driss, A., and Garcin, M. (2020), Efficiency of the financial markets during the COVID-19 crisis: time-varying parameters of fractional stable dynamics, Working paper
[2] Arias-Calluari, K., Alonso-Marroquin, F., Najafi, M.N., and Harré, M. (2020), Forecasting the effect of COVID-19 on the S&P500, arXiv preprint
[3] Baker, S.R., Bloom, N., Davis, S.J., Kost, K.J., Sammon, M.C., and Viratyosin, T. (2020), The unprecedented stock market impact of COVID-19, NBER working paper w26945
[4] Beran, R. (1977), Minimum Hellinger distance estimates for parametric models, Annals of statistics, 5, 3: 445-463
[5] Berkowitz, J. (2001), Testing density forecasts, with applications to risk management, Journal of business & economic statistics, 19, 4: 465-474
[6] Bouezmarni, T., and Rombouts, J.V. (2010), Nonparametric density estimation for multivariate bounded data, Journal of statistical planning and inference, 140, 1: 139-152
[7] Bühlmann, P., and McNeil, A.J. (2002), An algorithm for nonparametric GARCH modelling, Computational statistics & data analysis, 40, 4: 665-683
[8] Davis, M.H. (2016), Verification of internal risk measure estimates, Statistics & risk modeling, 33, 3-4: 67-93
[9] Diebold, F.X., Gunther, T.A., and Tay, A.S. (1998), Evaluating density forecasts, with applications to financial risk management, International economic review, 39: 863-883
[10] Diebold, F.X., Tay, A.S., and Wallis, K.F. (1999), Evaluating density forecasts of inflation: the survey of professional forecasters, in Engle, R.F., and White, H. (Eds.), Cointegration, causality, and forecasting: a Festschrift in honour of Clive W.J. Granger, Oxford university press, pp. 76-90
[11] Fan, J., and Yao, Q. (1998), Efficient estimation of conditional variance functions in stochastic regression, Biometrika, 85, 3: 645-660
[12] Forsberg, L., and Bollerslev, T. (2002), Bridging the gap between the distribution of realized (ECU) volatility and ARCH modelling (of the Euro): the GARCH-NIG model, Journal of applied econometrics, 17, 5: 535-548
[13] Garcin, M. (2017), Estimation of time-dependent Hurst exponents with variational smoothing and application to forecasting foreign exchange rates, Physica A: statistical mechanics and its applications, 483: 462-479
[14] Garcin, M., and Goulet, C. (2019), Non-parametric news impact curve: a variational approach, to appear in Soft computing
[15] Garcin, M., and Guégan, D. (2014), Probability density of the empirical wavelet coefficients of a noisy chaos, Physica D: nonlinear phenomena, 276: 28-47
[16] Gneiting, T., Balabdaoui, F., and Raftery, A.E. (2007), Probabilistic forecasts, calibration and sharpness, Journal of the royal statistical society: series B (statistical methodology), 69, 2: 243-268
[17] Harvey, A., and Oryshchenko, V. (2012), Kernel density estimation for time series data, International journal of forecasting, 28, 1: 3-14
[18] Holzmann, H., and Eulert, M. (2014), The role of the information set for forecasting – with applications to risk management, Annals of applied statistics, 8, 1: 595-621
[19] Jones, M.C., Marron, J.S., and Sheather, S.J. (1996), A brief survey of bandwidth selection for density estimation, Journal of the American statistical association, 91, 433: 401-407
[20] Ko, S.I., and Park, S.Y. (2013), Multivariate density forecast evaluation: a modified approach, International journal of forecasting, 29, 3: 431-441
[21] Kristan, M., Leonardis, A., and Skočaj, D. (2011), Multivariate online kernel density estimation with Gaussian kernels, Pattern recognition, 44, 10-11: 2630-2642
[22] Lacoume, J.-L., Amblard, P.-O., and Comon, P. (1997), Statistiques d'ordre supérieur pour le traitement du signal, Masson, Paris
[23] Li, Z., Liu, S., and Tian, M. (2014), Collective behavior of equity returns and market volatility, Journal of data science, 12, 3: 545-561
[24] Luong, A., and Bilodeau, C. (2017), Simulated minimum Hellinger distance estimation for some continuous financial and actuarial models, Open journal of statistics, 7, 4: 743-759
[25] Marsaglia, G., Tsang, W.W., and Wang, J. (2003), Evaluating Kolmogorov's distribution, Journal of statistical software, 8, 18: 1-4
[26] Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H., and Tanaka, T. (2012), Parametric return density estimation for reinforcement learning, arXiv preprint
[27] Niederreiter, H. (1988), Low-discrepancy and low-dispersion sequences, Journal of number theory, 30, 1: 51-70
[28] Niederreiter, H. (2017), Recent constructions of low-discrepancy sequences, Mathematics and computers in simulation, 135: 18-27
[29] Ñíguez, T.M., and Perote, J. (2017), Moments expansion densities for quantifying financial risk, North American journal of economics and finance, 42: 53-69
[30] Pagan, A.R., and Schwert, G.W. (1990), Alternative models for conditional stock volatility, Journal of econometrics, 45, 1-2: 267-290
[31] Panaretos, V.M., and Zemel, Y. (2019), Statistical aspects of Wasserstein distances, Annual review of statistics and its application, 6: 405-431
[32] Pavlyshenko, B.M. (2020), Regression approach for modeling COVID-19 spread and its impact on stock market, arXiv preprint
[33] Rosenblatt, M. (1952), Remarks on a multivariate transformation, Annals of mathematical statistics, 23, 3: 470-472
[34] Scaillet, O. (2004), Density estimation using inverse and reciprocal inverse Gaussian kernels, Nonparametric statistics, 16, 1-2: 217-226
[35] Semeyutin, A., and O'Neill, R. (2019), A brief survey on the choice of parameters for: "Kernel density estimation for time series data", North American journal of economics and finance, 50: 101038
[36] Silverman, B.W. (1986), Density estimation for statistics and data analysis, CRC press
[37] Tokat, Y., Rachev, S.T., and Schwartz, E.S. (2003), The stable non-Gaussian asset allocation: a comparison with the classical Gaussian approach, Journal of economic dynamics and control, 27, 6: 937-969
[38] Tsybakov, A.B. (2008), Introduction to nonparametric estimation, Springer science & business media
[39] Tuffin, B. (1996), On the use of low discrepancy sequences in Monte Carlo methods, Monte Carlo methods and applications, 2, 4: 295-320
[40] Villani, C. (2003), Topics in optimal transportation, American mathematical society, 58
[41] Wand, M.P., and Jones, M.C. (1994), Kernel smoothing, CRC press
[42] Wang, X., Tsokos, C.P., and Saghafi, A. (2018), Improved parameter estimation of time dependent kernel density by using artificial neural networks, Journal of finance and data science, 4, 3: 172-182
[43] Wegman, E.J., and Davies, H.I. (1979), Remarks on some recursive estimators of a probability density, Annals of statistics, 7, 2: 316-327
[44] Zhang, T., and Wu, W.B. (2015),