Pattern recognition in micro-trading behaviors before stock price jumps: A framework based on multivariate time series analysis

Abstract

Studying the micro-trading behaviors before stock price jumps is an important problem for financial regulations and investment decisions. In this study, we provide a new framework to study pre-jump trading behaviors based on multivariate time series analysis. Different from the existing literature, our methodology takes into account the temporal information embedded in the trading-related attributes and can better evaluate and compare the abnormality levels of different attributes. Moreover, it can explore the joint informativeness of the attributes as well as select a subset of highly informative but minimally redundant attributes to analyze the homogeneous and idiosyncratic patterns in the pre-jump trades of individual stocks. In addition, our analysis involves a set of technical indicators to describe micro-trading behaviors. To illustrate the viability of the proposed methodology, an application case is conducted based on the level-2 data of 189 constituent stocks of the China Security Index 300. The individual and joint informativeness levels of the attributes in predicting price jumps are evaluated and compared. To this end, our experiment provides a set of jump indicators that can represent the pre-jump trading behaviors in the Chinese stock market and have detected some stocks with extremely abnormal pre-jump trades.


Pattern recognition in trading behaviors before stock price jumps: new method based on multivariate time series classification

Ao Kong*

School of Finance, Nanjing University of Finance and Economics, Nanjing 210023, China

Robert Azencott

Department of Mathematics, University of Houston, TX 77204, USA

Hongliang Zhu

School of Management and Engineering, Nanjing University, Nanjing 210023, China

Abstract

This paper extends the work of Boudt and Petitjean (2014) and investigates the trading patterns before price jumps in the stock market based on a new multivariate time series classification technique. Different from Boudt and Petitjean (2014), our analysis scheme can explore the “time-series information” embedded in the trading-related attributes and provides a set of jump indicators for abnormal pattern recognition. In addition to the commonly used liquidity measures, our analysis also involves a set of technical indicators to describe the micro-trading behaviors. An empirical study is conducted on the level-2 data of the constituent stocks of the China Security Index 300. It is found that, among all the candidate attributes, several volume- and volatility-related attributes exhibit the most significant abnormality before price jumps. Though some of the abnormalities start just shortly before the occurrence of the jumps, some start much earlier. We also find that most of our attributes have low mutual dependencies with each other from the perspective of time-series analysis, which allows various perspectives to study the market trading behaviors. To this end, our experiment provides a set of jump indicators that can effectively detect the stocks with extremely abnormal trading behaviors before price jumps. More importantly, our study offers a new framework and potentially useful directions for trading-related pattern recognition problems using time series classification techniques.

Keywords: price jumps, high-frequency data, mutual information, multivariate time series analysis, pattern recognition

1. Introduction

Large and discontinuous changes, known as jumps, are essential components of stock price dynamics (Merton, 1976). With the perceived risk differing from that of small and regular price movements, they play an important role in risk measurement, option pricing, and portfolio allocation (Duffie and Pan, 2001; Jarrow and Rosenfeld, 1984; Kapadia and Zekhnini, 2019; Zhou et al., 2019). As a result, investigating the trading patterns before price jumps, as well as exploring jump indicators in the stock market, is of great importance in the financial field (Farmer et al., 2004; Lee and Mykland, 2008; Boudt and Petitjean, 2014). Even so, limited literature can be found on this topic, possibly because jumps, which occur instantly, often follow micro precursors hidden in the complex noise of price dynamics.

Preprint submitted to xxxx, November 11, 2020 (arXiv, q-fin.ST)

Jiang and Lo (2011) were the first to study the liquidity dynamics before price jumps, but they worked on the treasury market and concentrated on the effect of microeconomic news announcements. Regarding the stock market, Boudt and Petitjean (2014) were the first to investigate the characteristics of tick data before jumps in the U.S. market based on more than ten liquidity measures, and found that some measures, such as the trading volume, the number of trades, and the quoted and effective price spreads, exhibit significantly abnormal movements. Following Boudt and Petitjean (2014), Będowska-Sójka (2016) and Wan et al. (2017) then revealed some different liquidity dynamics before price jumps in the Polish and Chinese stock markets, based on tick and level-2 data respectively. Despite the different markets, the statistical methods in these studies are similar: after extracting the values of a liquidity measure within a prior window before intraday price jumps, a Mann-Whitney test is used to compare their median values, at each time point, with those on non-jumping days.
It is worth noting that the common Mann-Whitney test treats the liquidity series as a set of independent random variables, and thus neglects the information embedded in the whole series.

Our work is thus an extension of Boudt and Petitjean (2014), as well as of the subsequent works on liquidity pattern recognition in the stock market (Wan et al., 2017; Będowska-Sójka, 2016). Different from them, we investigate the liquidity dynamics based on a new technique for multivariate time series classification (Ircio et al., 2020). This technique takes into account the information and properties of the time series and provides a tool to select a highly informative combination of attributes for pattern recognition. Among the previous methods for multivariate time series analysis (Fang et al., 2015; Jovic and Jovic, 2017; Han et al., 2015; Yoon et al., 2005; He et al., 2019), we choose this technique because it does not transform the original time series into a different representation; thus, the loss of interpretability of the selected features is avoided. Apart from the liquidity measures, we also include a number of technical indicators to describe the intraday trading behaviors, an idea inspired by our previous work (Kong et al., 2020). Time series-based mutual information, a key element of the classification technique, is used to evaluate the abnormality embedded in the variables related to the occurrence of price jumps. Mutual information is well suited to analyzing non-linear dependency between random variables with high noise levels and does not require assumptions about their distributions (Shannon, 2001). The goal of our study is to select a generic set of jump indicators that can represent the abnormal patterns of trading behaviors before price jumps for all stocks, and to use them to identify the common and idiosyncratic patterns of the stocks in the market.
On top of these, our study aims to provide a new framework for this trading-related pattern recognition problem using time series analysis tools.

Our paper is also related to several papers that analyse the liquidity dynamics before firm-specific news (Ranaldo, 2008; Lakhal, 2008; Riordan et al., 2013). Although firm-specific news can force stock price jumps, conditioning the intraday analysis on these scenarios does not completely capture the trading behaviors of market participants before stock price jumps, since many jumps are not directly associated with public news (Joulin et al., 2008; Lahaye et al., 2011).

The main contributions of our paper are as follows. First, we propose a systematic framework based on a new multivariate time series classification technique to study the liquidity dynamics before stock price jumps. Compared with existing methods, our approach takes into account the trading information along the time scale when evaluating the information level of trading-related attributes, so that the duration and the level of the abnormality can be better explored and compared among attributes; in addition, our approach can evaluate the combined power of the candidate attributes in recognizing the abnormal patterns and can select a set of jump indicators for trading-related pattern analysis of individual stocks. Second, in addition to the commonly used liquidity measures, our experiment incorporates a set of technical indicators, which have low mutual dependencies with each other and with the liquidity measures from the perspective of time-series information. The whole set of candidate attributes thus allows the trading behaviors to be assessed from various perspectives. Third, based on our approach, we find that although most of the abnormalities appear just shortly before the occurrence of the jumps, some start much earlier. This complements the literature on the starting point of abnormal trading movements from the perspective of time series analysis.
Fourth, our study provides a generic set of jump indicators for abnormal pattern recognition, which can effectively detect the stocks that have extremely abnormal trading behaviors before price jumps.

The remainder of this paper is organized as follows. Section 2 provides the preliminaries of the new multivariate time series classification technique. Section 3 details our proposed experimental methodology. Section 4 shows the experimental results, while Section 5 concludes.

2. Preliminaries: Mutual information-based multivariate time series analysis

Consider $P$ random variables $TS^{(1)}, TS^{(2)}, \cdots, TS^{(P)}$ whose realizations are time series. Assume we have a set of samples $\{X_1, X_2, \cdots, X_N\}$, where $X_n = (ts_n^{(1)}, ts_n^{(2)}, \cdots, ts_n^{(P)})$, $n \in \{1, 2, \cdots, N\}$, and $ts_n^{(p)}$ is a realization of $TS^{(p)}$, $p \in \{1, 2, \cdots, P\}$. Let $C$ be a discrete classification variable, with associated labels $\{c_1, c_2, \cdots, c_N\}$ for the samples. $C$ can take values in a finite set, but in this paper only two class labels (1/0) are used.

2.1. Mutual information between time series

To estimate the mutual information between two time series $TS^{(p)}$ and $TS^{(q)}$, consider a set of "sliced" samples $\{\tilde{X}_1, \tilde{X}_2, \cdots, \tilde{X}_N\}$, where $\tilde{X}_n = (ts_n^{(p)}, ts_n^{(q)})$, $n \in \{1, 2, \cdots, N\}$, is a reduced sample of $X_n$, with $ts_n^{(p)}$ and $ts_n^{(q)}$ being realizations of $TS^{(p)}$ and $TS^{(q)}$.

Let $\xi(n)$ be the distance from a sample $\tilde{X}_n$ to its $k$th nearest neighbor, where the distance $Dist(\tilde{X}_m, \tilde{X}_n)$ between two samples $\tilde{X}_m = (ts_m^{(p)}, ts_m^{(q)})$ and $\tilde{X}_n = (ts_n^{(p)}, ts_n^{(q)})$ is defined as the maximum value along the two variables,

$$Dist(\tilde{X}_m, \tilde{X}_n) = \max\{dist(ts_m^{(p)}, ts_n^{(p)}),\ dist(ts_m^{(q)}, ts_n^{(q)})\},$$

where $dist(ts^{(p)}, ts^{(q)})$ is the distance between two time series $ts^{(p)}$ and $ts^{(q)}$. Then one can count the number $\nu_{ts_n^{(p)}}$ of time series $ts_m^{(p)}$, $m \in \{1, 2, \cdots, N\} - \{n\}$, whose distance from $ts_n^{(p)}$ is less than or equal to $\xi(n)$, i.e.,

$$\nu_{ts_n^{(p)}} = \left|\{m \mid m \in \{1, 2, \cdots, N\} - \{n\},\ dist(ts_m^{(p)}, ts_n^{(p)}) \le \xi(n)\}\right|.$$

Similarly, replacing $p$ by $q$, one can compute $\nu_{ts_n^{(q)}}$.

Following the method of Kraskov et al. (2004) and Ircio et al. (2020), the mutual information $I(TS^{(p)}; TS^{(q)})$ between two time series $TS^{(p)}$ and $TS^{(q)}$ is computed by

$$I(TS^{(p)}; TS^{(q)}) = \Psi(k) + \Psi(N) - \frac{1}{k} - \frac{1}{N}\sum_{n=1}^{N}\left(\Psi(\nu_{ts_n^{(p)}}) + \Psi(\nu_{ts_n^{(q)}})\right), \tag{1}$$

where $\Psi(x)$ is the digamma function, with $\Psi(x+1) = \Psi(x) + 1/x$ and $\Psi(1) = -\gamma \approx -0.5772$, $\gamma$ being the Euler-Mascheroni constant.

2.2. Mutual information between a time series and the classification variable

To estimate the mutual information between a time series $TS$ and the classification variable $C$, consider the realizations $\{ts_1, ts_2, \cdots, ts_N\}$ of $TS$ and their associated class labels $\{c_1, c_2, \cdots, c_N\}$.

Let $d(n)$ be the distance from a time series sample $ts_n$ to its $k$th nearest neighbor within the subset of samples belonging to class $c_n$. Then one can count the number $\nu_{ts_n}$ of time series $ts_m$, $m \in \{1, 2, \cdots, N\} - \{n\}$, whose distance from $ts_n$ is less than or equal to $d(n)$, i.e.,

$$\nu_{ts_n} = \left|\{m \mid m \in \{1, 2, \cdots, N\} - \{n\},\ dist(ts_m, ts_n) \le d(n)\}\right|.$$

Following the method of Ross (2014) and Ircio et al. (2020), the mutual information $I(TS; C)$ between the time series $TS$ and the classification variable $C$ is computed by

$$I(TS; C) = \Psi(k) + \Psi(N) - \frac{1}{N}\sum_{n=1}^{N}\left(\Psi(\nu_{c_n}) + \Psi(\nu_{ts_n})\right), \tag{2}$$

where $\Psi(x)$ is the digamma function as above and $\nu_{c_n}$ is the number of time series samples whose class label is $c_n$.

2.3. Feature selection

An optimal set of features should contain highly informative and minimally redundant features. In our study, the minimum-Redundancy Maximum-Relevancy (mRMR) feature selection algorithm combined with mutual information is used to select the optimal set of time series features (Ircio et al., 2020). The goal of mRMR is to select a subset of features that maximizes the score function

$$J(S) = \sum_{s \in S} I(TS^{(s)}; C) - \frac{1}{|S|}\sum_{s \in S}\sum_{\substack{q \in S \\ q \neq s}} I(TS^{(s)}; TS^{(q)}), \tag{3}$$

where $S$ is the index set of the selected features.
To achieve this goal, we apply a forward selection search strategy (Meyer et al., 2008): at the first step, the feature that has the largest mutual information with the class label, i.e., the one with the highest discriminating power, is selected; then, at each of the following steps, when a set $S'$ of features has already been selected, mRMR ranks all the features $TS^{(q)}$ that are not yet selected according to $I(TS^{(q)}; C) - \sum_{s \in S'} I(TS^{(s)}; TS^{(q)})$ and selects the top-ranked one. Meanwhile, the value of the score function is computed at each step, and the final set of features is the one at which the score function is maximized.
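The forward search can be illustrated with a minimal Python sketch. It assumes the mutual information values are already available as a relevance vector and a symmetric redundancy matrix, and it assumes that candidates are ranked by their relevance minus their summed mutual information with the features selected so far. The function name `mrmr_forward` and the toy numbers are hypothetical and serve only as an illustration.

```python
import numpy as np

def mrmr_forward(relevance, redundancy):
    """Greedy mRMR search on precomputed mutual information values.

    relevance[p]     -- estimate of I(TS^(p); C)
    redundancy[p, q] -- estimate of I(TS^(p); TS^(q)), assumed symmetric
    """
    relevance = np.asarray(relevance, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)

    def score(subset):
        # Score function (3): total relevance minus the summed pairwise
        # redundancy (every ordered pair s != q) divided by |S|.
        block = redundancy[np.ix_(subset, subset)]
        return relevance[subset].sum() - (block.sum() - np.trace(block)) / len(subset)

    order = [int(np.argmax(relevance))]          # step 1: most relevant feature
    scores = [score(order)]
    while len(order) < relevance.size:
        rest = [q for q in range(relevance.size) if q not in order]
        # Assumed ranking criterion: relevance minus redundancy with the
        # features selected so far.
        gain = [relevance[q] - redundancy[order, q].sum() for q in rest]
        order.append(rest[int(np.argmax(gain))])
        scores.append(score(order))
    best = int(np.argmax(scores)) + 1            # prefix that maximizes J
    return order[:best], order, scores


# Toy input: features 0 and 1 are nearly redundant, feature 2 is not.
rel = [0.6, 0.5, 0.4]
red = [[0.0, 0.9, 0.05],
       [0.9, 0.0, 0.05],
       [0.05, 0.05, 0.0]]
selected, order, scores = mrmr_forward(rel, red)
```

In this toy case the redundant feature 1 is ranked last, and the score is largest after two features, so only features 0 and 2 are kept.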

3. Proposed experimental methodology

3.2. Intraday jump detection

To explore trading patterns before price jumps, the first key step is to detect accurately the jump components in the stock price series. Early studies use parametric models, such as GARCH-jump or SV-jump models, to estimate jumps, which always involve uncertain pre-setting of model forms and complicated parameter estimation (Maheu and McCurdy, 2004; Eraker et al., 2003). Recent developments in jump detection using high-frequency data introduce nonparametric estimation of the jump component in asset prices, allowing more accurate identification of jumps at daily or even intraday levels (Barndorff-Nielsen and Shephard, 2006; Jiang and Oomen, 2008; Ait-Sahalia and Jacod, 2009; Corsi et al., 2010; Podolskij and Ziggel, 2010; Andersen et al., 2012; Lee and Mykland, 2008; Jiang and Zhu, 2017).

Our study concentrates on intraday jumps. The LM test, proposed by Lee and Mykland (2008), is the most widely used technique to detect intraday jumps. After a trading day is divided into $M$ intervals of $\Delta t$ minutes each, the LM test uses the following statistic to examine the presence of a jump in the $i$th interval of day $t$:

$$L_{t,i} = \frac{r_{t,i}}{\hat{\sigma}_{t,i}}, \tag{4}$$

where $r_{t,i}$ is the log return in this interval, and $\hat{\sigma}_{t,i}$ is the estimated instantaneous volatility, computed from the realized bipower variation of the returns in the previous $K-1$ intervals:

$$\hat{\sigma}_{t,i}^2 = \frac{1}{K-2}\sum_{j=i-K+2}^{i-1} |r_{t,j-1}|\,|r_{t,j}|. \tag{5}$$

Lee and Mykland (2008) derive the following rejection region, at a significance level of $\alpha$, for the null hypothesis that no jump is present in this interval:

$$\frac{|L_{t,i}| - C_{MT}}{S_{MT}} > -\log(-\log(1-\alpha)), \tag{6}$$

where

$$C_{MT} = \frac{\sqrt{2\log(MT)}}{c} - \frac{\log\pi + \log(\log(MT))}{2c\sqrt{2\log(MT)}}, \qquad S_{MT} = \frac{1}{c\sqrt{2\log(MT)}}, \tag{7}$$

$c = \sqrt{2/\pi}$ and $T$ is the total number of days.

The use of a 5-minute interval to detect intraday jumps is a consensus choice in the existing literature (Bollerslev and Todorov, 2011; Liu et al., 2015; Wan et al., 2017), representing a trade-off between maximizing statistical power and minimizing the effect of microstructure noise (Caporin et al., 2017). The Chinese stock exchanges are open from 9:30 am to 11:30 am and from 1:00 pm to 3:00 pm, a total of 4 hours per trading day, so a trading day is divided evenly into 48 5-minute intervals. The LM method is then used to detect the jumps within each interval. We choose a window of $K = 240$ previous intervals for the detection, following the suggestion of Lee and Mykland (2008) and Wan et al. (2017).

3.3. Candidate attributes

Table 1 lists the candidate attributes we use to describe the micro-trading behaviors, which fall into two categories: liquidity measures and technical indicators. This set of attributes was first used in our previous study, where they were processed into tabular data to fit traditional machine learners for jump prediction (Kong et al., 2020). In this paper, we use the time series of these attributes to study the micro-trading patterns. The formulas of these attributes can be found in Kong et al. (2020); their computation is summarized as follows.

We again base our computation on the sequence of 5-minute intervals in each trading day. The liquidity measures are computed for each stock within each interval, evaluating the market quality based on the level-2 information during that 5-minute period.
The technical indicators are computed at the end of each interval using the 5-minute sampled price and volume information, describing the dynamics of the market trend in the recent past. There are 18 technical indicators, 12 of which require a lagged-period parameter. The lagged period here is the number of lagged intervals, which is set to 5 and 20 to capture the trends within a shorter and a longer period. This gives 6 + 12 × 2 = 30 technical indicators. Together with the 10 liquidity measures, the trading behaviors within a trading day are thus described by 40 attribute series of length 48, computed interval by interval.
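The attribute count can be checked with a short Python sketch that enumerates the symbols of Table 1 with the two lag settings. The variable names are hypothetical; only the symbols, the lags 5 and 20, and the trading hours come from the text.

```python
# Symbols as listed in Table 1.
liquidity = ["r", "K", "V", "S", "TI", "DI", "QS", "ES", "RV", "R"]
lagged = ["PROC", "VROC", "MA", "EMA", "BIAS", "EBIAS",
          "OSCP", "EOSCP", "fK", "fD", "sD", "CCI"]      # need a lag q
unlagged = ["ADO", "TR", "PVT", "OBV", "NVI", "PVI"]     # no lag parameter
lags = (5, 20)

# Each lagged indicator is computed once per lag setting.
technical = [f"{name}({q})" for name in lagged for q in lags] + unlagged
attributes = liquidity + technical

# Two 2-hour sessions per day, sampled every 5 minutes.
intervals_per_day = (2 + 2) * 60 // 5

print(len(technical), len(attributes), intervals_per_day)
```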

Table 1: Attributes

| Liquidity measure | Symbol | Technical indicator | Symbol |
|---|---|---|---|
| Return | r | Price rate of change | PROC(q) |
| Number of trades | K | Volume rate of change | VROC(q) |
| Trading volume | V | Moving average of price | MA(q) |
| Trading size | S | Exponential moving average of price | EMA(q) |
| Trade imbalance | TI | Bias to MA | BIAS(q) |
| Depth imbalance | DI | Bias to EMA | EBIAS(q) |
| Quoted spread | QS | Price oscillator to MA | OSCP(q) |
| Effective spread | ES | Price oscillator to EMA | EOSCP(q) |
| Realized volatility | RV | Fast stochastic %K | fK(q) |
| Cumulative return | R | Fast stochastic %D | fD(q) |
| | | Slow stochastic %D | sD(q) |
| | | Commodity channel index | CCI(q) |
| | | Accumulation/Distribution oscillator | ADO |
| | | True range | TR |
| | | Price and volume trend | PVT |
| | | On balance volume | OBV |
| | | Negative volume index | NVI |
| | | Positive volume index | PVI |

In the technical indicators, q is the lagged-period parameter.

3.4. Attribute extraction

To explore the “abnormal” patterns of the attributes before price jumps, the main experiment of our study compares the dynamics of the attributes between the jumping and non-jumping samples. For the jumping samples, the attribute series are extracted within a 4-hour window before the occurrence of jumps. According to the literature, a 4-hour window is large enough to observe the abnormal patterns before price jumps (Boudt and Petitjean, 2014; Wan et al., 2017). For the non-jumping samples, the time series are extracted within steady days. The steady days are defined as the days without jumps within the prior or subsequent 5 days.

Some of the attributes are highly idiosyncratic (Podolskij and Ziggel, 2010; Boudt and Petitjean, 2014). To allow comparison across different stocks, days and intraday times, the liquidity measures need to be standardized; for a similar reason, we also standardize the technical indicators that are highly idiosyncratic. According to Boudt and Petitjean (2014) and Kong et al. (2020), there are two standardization methods: the first divides each coordinate of the time series by the median of the data at the same time of day over the previous 60 days and subtracts 1; the second subtracts the median of the data at the same time of day over the previous 60 days from each coordinate of the time series. In our study, 15 types of attributes are highly idiosyncratic: V, K, S, RV, MA, EMA, TR, PVT and OBV are standardized by the first method, while OI, DI, QS, ES, VROC, NVI and PVI, which are already ratios, are standardized by the second one.

Besides, to avoid missing information in the time series, we delete four types of samples from the dataset. First, we do not consider the samples in the first 60 days of each stock. Second, due to the 10% price change limit rule in the Chinese stock market, all sequential limit-ups or limit-downs except the first one are deleted. Third, any sample within a day after the suspension of a stock is deleted. Fourth, because of the circuit breaker mechanism implemented on the Chinese stock market from 2016/01/04 to 2016/01/07, during which the whole stock market halted several times, the samples between 2016/01/04 and 2016/01/08 for all stocks are deleted. In addition, dividends are subtracted from prices on dividend distribution dates, leading to large price changes; these are not considered in our study either.
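The two standardization methods can be sketched in Python as follows. This is a minimal illustration, assuming the raw attribute is stored as an array with one row per trading day and one column per 5-minute interval; the function name `standardize` and its arguments are hypothetical.

```python
import numpy as np

def standardize(x, window=60, method="ratio"):
    """Standardize an attribute by its same-time-of-day median.

    x has shape (n_days, n_intervals). The first `window` days have no
    complete history and are left as NaN, matching the removal of the
    first 60 days of each stock.
    """
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    for day in range(window, x.shape[0]):
        # Median at each intraday time over the previous `window` days.
        med = np.median(x[day - window:day], axis=0)
        if method == "ratio":            # first method: x / median - 1
            out[day] = x[day] / med - 1.0
        elif method == "difference":     # second method: x - median
            out[day] = x[day] - med
        else:
            raise ValueError("method must be 'ratio' or 'difference'")
    return out


# Toy attribute: the value on day d is d + 1 at every intraday time.
toy = np.tile(np.arange(1.0, 62.0)[:, None], (1, 2))
ratio_std = standardize(toy, method="ratio")
diff_std = standardize(toy, method="difference")
```

On day 61 of the toy series the previous 60-day median is 30.5, so the ratio method gives 61 / 30.5 - 1 = 1 and the difference method gives 30.5. A zero median would make the ratio method undefined, which this sketch does not guard against.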

3.5. Stock representation

After the above data processing, we obtain, for each stock and each attribute, one group of time series before the jumping times and one group on steady days. To better reveal the statistical pattern of each stock, we then take the median of the time series within each group, computed at each time coordinate. Figure 1 provides a general view of the stock representation methodology in this section.

Figure 1: General view of the stock representation methodology

Prior to price jumps, we then have 189 time series for each attribute. That is, before the occurrence of jumps, each stock is represented by a set of 40 time series. If we treat an attribute as a random variable, the 189 time series samples can be seen as its random realizations.

Similarly, on steady days, the median of the time series for each attribute is computed for each stock. Since the 4-hour information window under the jumping scenario can start at any time of a day, we should allow any starting point for the time series on steady days as well. Therefore, to perform a fair comparison, we randomize the order of each time series on steady days, one by one, and then compute the median values in each group. We then have 40 time series to represent each stock on steady days. That is, we have 189 “virtual” time series for each attribute, which can also be regarded as 189 realizations of each attribute on steady days.

For the following analysis, we label the jumping and non-jumping samples as 1 and 0 respectively. We thus have 378 binary labeled samples $\{(X_1, C_1), (X_2, C_2), \cdots, (X_{378}, C_{378})\}$, where $C_i \in \{0, 1\}$ is the label of each sample and $X_i = (ts_i^{(1)}, ts_i^{(2)}, \cdots, ts_i^{(40)})$, $i \in \{1, 2, \cdots, 378\}$, consists of the 40 attribute time series of length 48.

3.6. Selection of jump indicators

Using the method described in section 2.3, one can select a set of attributes, called jump indicators, that are highly related to the arrival of price jumps. The mutual information between each attribute and the label variable can be computed following section 2.2, while the mutual information between two time series is evaluated as outlined in section 2.1. These computations require the distance $dist(ts^{(p)}, ts^{(q)})$ between two time series.
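Because both estimators depend on the data only through pairwise distances, they can be sketched in Python on top of precomputed distance matrices. The sketch below is an illustration of equations (1) and (2), not the authors' implementation. It counts neighbors at a distance less than or equal to the threshold, as stated in the prose of section 2, and it uses random data with a Euclidean distance. The function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.special import digamma
from scipy.spatial.distance import cdist

def mi_between_series(dist_p, dist_q, k=3):
    """Eq. (1): dist_p and dist_q are N x N distance matrices of two attributes."""
    n = dist_p.shape[0]
    off = ~np.eye(n, dtype=bool)                       # excludes m == n
    joint = np.where(off, np.maximum(dist_p, dist_q), np.inf)
    xi = np.sort(joint, axis=1)[:, k - 1]              # k-th nearest neighbour
    nu_p = ((dist_p <= xi[:, None]) & off).sum(axis=1)
    nu_q = ((dist_q <= xi[:, None]) & off).sum(axis=1)
    return digamma(k) + digamma(n) - 1.0 / k - np.mean(digamma(nu_p) + digamma(nu_q))

def mi_with_label(dist, labels, k=3):
    """Eq. (2): dist is an N x N distance matrix, labels the class of each sample."""
    labels = np.asarray(labels)
    n = labels.size
    off = ~np.eye(n, dtype=bool)
    same = labels[:, None] == labels[None, :]
    within = np.where(off & same, dist, np.inf)
    d = np.sort(within, axis=1)[:, k - 1]              # k-th neighbour in own class
    nu_ts = ((dist <= d[:, None]) & off).sum(axis=1)   # neighbours in the full sample
    nu_c = same.sum(axis=1)                            # class size, sample included
    return digamma(k) + digamma(n) - np.mean(digamma(nu_c) + digamma(nu_ts))


# Toy data: 100 "steady" and 100 clearly shifted "jumping" series of length 48.
rng = np.random.default_rng(0)
series = np.vstack([rng.normal(size=(100, 48)),
                    rng.normal(size=(100, 48)) + 10.0])
labels = np.array([0] * 100 + [1] * 100)
D = cdist(series, series, metric="euclidean")
noise = rng.normal(size=(200, 48))
E = cdist(noise, noise, metric="euclidean")

mi_label = mi_with_label(D, labels, k=3)       # close to log(2) for separated classes
mi_self = mi_between_series(D, D, k=3)         # upper end of the estimator's range
mi_noise = mi_between_series(D, E, k=3)        # unrelated attribute
```

Each class needs more than k samples for the within-class neighbor search to be well defined. Replacing `metric="euclidean"` by `metric="chebyshev"` switches the distance without changing the estimators.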
There are several types of distances for time series in the literature, and we use three typical ones in our study: the Euclidean distance, the Chebychev distance, and the distance based on dynamic time warping.

Unlike dynamic time warping, both the Euclidean distance and the Chebychev distance treat the time series as vectors. The Euclidean distance is simply the L2-norm distance between two vectors, while the Chebychev distance takes the maximum coordinate difference between the two vectors. Dynamic time warping (DTW), on the other hand, allows the comparison between two time series with varying lengths and speeds (Berndt and Clifford, 1994). In our study, although the attribute series always have the same length, their differences in the speed of change should be considered. In general, DTW searches for an optimal match between two sequences such that the sum of the Euclidean distances between their matched points is minimized. The match has to comply with the following rules: first, every point in one sequence must be matched with one or more points of the other sequence; second, the first points of the two sequences must be matched with each other, and likewise the last points; third, the mapping of the points from one sequence to the other must be monotonically increasing, and vice versa. A detailed implementation of the DTW algorithm can be found in Berndt and Clifford (1994).

The estimation of mutual information from real data can be blurred by systematic errors resulting from the finite sample size (Steuer et al., 2002). To minimize this error, we correct the mutual information values by subtracting a zero-baseline. Random noise should ideally have zero mutual information with any random variable. So, as in Steuer et al. (2002), we randomly permute the realizations of the two variables and compute the mutual information based on the surrogate pairs.
This procedure is repeated many times (say 100), and the average of all the resulting mutual information values is used as the zero-baseline.

3.7. Clustering of stocks

While the dynamics of the trading behaviors before price jumps are described by a set of jump indicators for each stock, one might wonder how the pattern differs from stock to stock and whether some of the stocks share similar patterns before price jumps. Clustering the stocks according to their jump indicator patterns gives a fast answer to this question and also helps detect the stocks that have idiosyncratic trading patterns before price jumps.

Clustering is the task of partitioning the samples into several groups, where the samples in the same group are more similar to each other than to those in other groups. It is a widely used technique in pattern recognition and data mining problems. There are many types of clustering methods, such as connectivity-based clustering, centroid-based clustering, density-based clustering, and grid-based clustering. Unlike the other methods, connectivity-based clustering, also known as hierarchical clustering, does not provide a unique partitioning of the samples but a dendrogram, which shows how an extensive hierarchy of clusters merge with each other. The advantage of the method is that users can choose an appropriate set of clusters by choosing a cutoff of the inconsistency coefficient on the linkages of the dendrogram. In addition, one needs to choose the linkage criterion for computing the distance between clusters. In our study, the popular “unweighted average linkage” is used.

Clustering analysis can also easily be dominated by an attribute with an extremely large scale. To minimize the scale differences among jump indicators, each indicator series $(a_1, a_2, \cdots, a_T)$ is normalized before the clustering analysis by the min-max method

$$norma_t = \frac{a_t - minv}{maxv - minv},$$

where $minv$ and $maxv$ are the minimal and maximal values of $a_t$ taken over all stocks.
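The normalization and clustering steps can be sketched with SciPy's hierarchical clustering routines. This is a minimal illustration under several assumptions that the text does not fix: the minimum and maximum are taken per indicator over all stocks and all time coordinates, the normalized indicator series of a stock are concatenated into one vector, and stocks are compared with the Euclidean distance. The function name `cluster_stocks` and the cutoff value are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_stocks(series, cutoff=1.0):
    """Min-max normalize jump indicators and cluster stocks hierarchically.

    series has shape (n_stocks, n_indicators, T).
    """
    series = np.asarray(series, dtype=float)
    # Assumed reading of minv / maxv: per indicator, over all stocks and times.
    lo = series.min(axis=(0, 2), keepdims=True)
    hi = series.max(axis=(0, 2), keepdims=True)
    normed = (series - lo) / (hi - lo)
    flat = normed.reshape(series.shape[0], -1)         # one vector per stock
    # Unweighted average linkage (UPGMA) on Euclidean distances.
    Z = linkage(flat, method="average", metric="euclidean")
    # Cut the dendrogram by the inconsistency coefficient of its links.
    labels = fcluster(Z, t=cutoff, criterion="inconsistent")
    return normed, Z, labels


# Toy data: two groups of 5 stocks, 2 indicators, 48 intervals.
rng = np.random.default_rng(1)
toy = np.concatenate([rng.normal(0.0, 0.1, size=(5, 2, 48)),
                      rng.normal(5.0, 0.1, size=(5, 2, 48))])
normed, Z, labels = cluster_stocks(toy, cutoff=1.0)
two_groups = fcluster(Z, t=2, criterion="maxclust")    # alternative cut into 2 clusters
```

An indicator that is constant over all stocks would give a zero denominator in the normalization, which this sketch does not handle.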

4. Empirical results

Following the method in section 3.2, we detect all the intraday jumps in the 5-minute sampled price series of the 189 sample stocks. To be stricter on the detected jumps, we set a significance level of 1%. Table 2 presents the total number of jumps counted over all the stocks as well as their average sizes. The number of positive jumps is larger than that of negative jumps, while the average size of the negative jumps is larger. This can be explained by the fact that most of the players in the Chinese stock market are retail investors, who, given that the market does not allow short sales, tend more to chase rising prices than falling ones; meanwhile, when prices are falling, retail investors are more prone to sell in panic, resulting in larger negative jumps.

Table 2: Statistical description of detected jumps

| | Positive jumps | Negative jumps |
|---|---|---|
| Number | 20734 | 11597 |
| Average return | 0.0266 | -0.0363 |
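The detection step behind these counts can be sketched in Python. The code below is a minimal illustration of the LM test of section 3.2 for a single stock, assuming that the 5-minute log returns of all days are concatenated in time order and that the number of returns plays the role of $MT$. The absolute value of the statistic is used so that negative jumps are detected as well. The function name `lm_jump_test` and the simulated returns are hypothetical.

```python
import numpy as np

def lm_jump_test(returns, K=240, alpha=0.01, n_obs=None):
    """Lee-Mykland style jump test on a series of 5-minute log returns."""
    r = np.asarray(returns, dtype=float)
    n = r.size if n_obs is None else n_obs             # assumed to equal M * T
    c = np.sqrt(2.0 / np.pi)
    root = np.sqrt(2.0 * np.log(n))
    C = root / c - (np.log(np.pi) + np.log(np.log(n))) / (2.0 * c * root)
    S = 1.0 / (c * root)
    threshold = -np.log(-np.log(1.0 - alpha))

    # prod[j - 1] = |r_{j-1}| * |r_j|, the terms of the bipower variation.
    prod = np.abs(r[:-1]) * np.abs(r[1:])
    L = np.full(r.size, np.nan)                        # undefined without history
    for i in range(K - 1, r.size):
        # Terms j = i-K+2, ..., i-1 of the sum: K - 2 products in total.
        sigma = np.sqrt(prod[i - K + 1:i - 1].mean())
        L[i] = r[i] / sigma
    is_jump = (np.abs(L) - C) / S > threshold
    return L, is_jump


# Simulated returns with one positive and one negative jump inserted.
rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.001, size=2000)
r[1500] = 0.03
r[1800] = -0.03
L, is_jump = lm_jump_test(r, K=240, alpha=0.01)
```

Intervals without a full window of previous returns get a NaN statistic and are never flagged, which is one simple way to treat the start of the sample.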

To explore the abnormal trading behaviors of individual stocks before price jumps, we extract the attribute series within a 4-hour window, i.e., 48 intervals, before each intraday jump of each stock, as outlined in section 3.4. Prior to the occurrence of the jumps, each stock is then represented by the median values of the attributes, as outlined in section 3.5. To give a bird’s eye view of the dynamics of all the attributes before price jumps, we plot the median value of the 189 samples before price jumps, as in Boudt and Petitjean (2014) and Wan et al. (2017) (Figures 2-3).

Figure 2: Dynamics of candidate attributes before price jumps. The solid line is the median value. The shaded region represents the range between the 5% and 95% quantiles. The x-axis is the index of the intervals before price jumps. Panels include V, PROC(5), PROC(20), VROC(5), VROC(20), MA(5), MA(20), EMA(5), EMA(20), BIAS(5) and BIAS(20).

Figure 3: Dynamics of candidate attributes before price jumps. The solid line is the median value. The shaded region represents the range between the 5% and 95% quantiles. The x-axis is the index of the intervals before price jumps. Panels include fastpctD(20).

r, V, K, RV, PROC, VROC, BIAS, OSCP, TR, fastpctK, fastpctD, spctD and CCI exhibit a significantly abnormal surge just shortly before the occurrence of price jumps, while most of the time they remain comparatively stable. This might indicate that the abnormal movement only starts shortly before the occurrence of price jumps. The figures for the liquidity measures are very similar to those observed by Boudt and Petitjean (2014) and Wan et al. (2017), who also compared these dynamics with the days without jumps through the Mann-Whitney test. However, that conclusion is based on a point-by-point analysis, which does not take account of the characteristics of the whole time series; besides, it cannot compare the information levels among the attributes. We believe that further statistical analysis based on time series needs to be conducted to explore the abnormal dynamics of the candidate attributes, for which we use the mutual information-based technique as follows.

Mutual information between each attribute and the label variable, as defined in section 2.2, evaluates the information level of each attribute with respect to the arrival of price jumps. Table 3 gives the mutual information values of the 40 attributes based on the three types of distances and 1, 3, or 5 nearest neighbors during the computation. To minimize the systematic errors resulting from the finite-size issue, the zero-baseline has been subtracted as mentioned in section 3.6.

It is not meaningful to compare the mutual information values themselves across different distance or k parameter settings; only the relative rankings of the attributes can be compared. Here a two-sided Wilcoxon signed rank test is adopted to compare the ranks of the attributes between any two of the nine scenarios. At the 1% significance level, all the tests fail to reject the null hypothesis of zero median in the rank difference. This indicates that the choice of the distance or the k parameter in general has little influence on the comparison of the information levels of the attributes.

The boldfaced values in the table show that the trading volume V, the number of trades K, the realized volatility RV, the volume rate of change VROC and the commodity channel index CCI have very high information levels with respect to the arrival of price jumps. V, K and VROC are all volume-related attributes, the significant abnormality of which has also been observed in both Boudt and Pertitjean (2014) and Wan et al. (2017), indicating a demand for immediate execution before price jumps. RV and CCI are both volatility-related measures; on this point, Wan et al. (2017) observed that the volatility is significantly large before price jumps.

On the other hand, Boudt and Pertitjean (2014) and Wan et al. (2017) have also investigated the dynamics of other liquidity measures, such as QS, ES, OI and DI, before price jumps. According to the two studies, the abnormal movements of these measures are not very obvious. However, in our results, the information levels of OI and ES are moderate, while those of DI and QS are comparatively lower; nevertheless, the abnormality of trading behaviors embedded in these attributes cannot be ignored.

Table 3: Information level of candidate attributes

                     Euclidean           Chebychev           DTW
    Neighbors        1     3     5       1     3     5       1     3     5
 1  r                0.36  0.32  0.30    0.32  0.28  0.24    0.23  0.21  0.21
 2  V
10  R                0.47  0.43  0.41    0.46  0.41  0.37    0.57  0.55  0.55
11  PROC(5)          0.51  0.50  0.38    0.47  0.38  0.37    0.48  0.41  0.41
12  PROC(20)         0.50  0.43  0.37    0.43  0.45  0.40    0.52  0.52  0.52
13  VROC(5)
14  VROC(20)
15  MA(5)            0.37  0.25  0.17    0.36  0.28  0.19    0.58  0.45  0.35
16  MA(20)           0.31  0.22  0.16    0.34  0.26  0.20    0.57  0.40  0.34
17  EMA(5)           0.35  0.25  0.18    0.33  0.26  0.22    0.57  0.44  0.36
18  EMA(20)          0.35  0.21  0.15    0.34  0.25  0.21    0.54  0.40  0.32
19  BIAS(5)          0.51  0.44  0.42    0.46  0.43  0.40    0.53  0.52  0.51
20  BIAS(20)         0.58  0.54  0.51    0.57  0.51  0.50    0.64  0.62  0.60
21  EBIAS(5)         0.56  0.52  0.51    0.46  0.47  0.47    0.57  0.56  0.54
22  EBIAS(20)        0.58  0.52  0.51    0.55  0.49  0.47    0.62  0.60  0.60
23  OSCP(5)          0.42  0.47  0.41    0.49  0.42  0.39    0.42  0.43  0.43
24  OSCP(20)         0.51  0.47  0.44    0.53  0.48  0.46    0.58  0.55  0.54
25  EOSCP(5)         0.56  0.52  0.50    0.46  0.47  0.48    0.56  0.57  0.55
26  EOSCP(20)        0.55  0.50  0.49    0.52  0.47  0.46    0.59  0.58  0.58
27  ADO              0.14  0.09  0.09    0.10  0.08  0.09    0.09  0.11  0.12
28  TR               0.60  0.57  0.52    0.60  0.58  0.56    0.58  0.59  0.59
29  fastpctK(5)      0.47  0.42  0.40    0.45  0.43  0.44    0.51  0.54  0.51
30  fastpctK(20)     0.47  0.47  0.42    0.53  0.45  0.44    0.62  0.61  0.60
31  fastpctD(5)      0.55  0.52  0.49    0.51  0.47  0.45    0.66  0.63  0.62
32  fastpctD(20)     0.55  0.53  0.48    0.51  0.47  0.45    0.65  0.65  0.64
33  spctD(5)         0.54  0.51  0.47    0.47  0.45  0.42    0.66  0.64  0.63
34  spctD(20)        0.45  0.42  0.41    0.44  0.45  0.43    0.67  0.65  0.64
35  CCI(5)           0.54  0.50  0.49    0.48  0.45  0.46    0.61  0.59  0.58
36  CCI(20)
37  PVT              0.22  0.12  0.09    0.21  0.16  0.13    0.31  0.18  0.13
38  OBV              0.15  0.09  0.04    0.16  0.07  0.06    0.40  0.20  0.12
39  NVI              0.14  0.08  0.05    0.13  0.09  0.06    0.28  0.19  0.13
40  PVI              0.19  0.09  0.04    0.24  0.14  0.10    0.36  0.21  0.10

The attributes ranked within the top 10 lists under all types of distances are boldfaced.
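The rank comparison between two scenarios can be sketched as follows. This is an illustration only: it ranks a subset of eight attributes using their Table 3 values under two of the nine settings (Euclidean with 1 neighbor, DTW with 5 neighbors), so the ranks differ from those over all 40 attributes. The test below uses the normal approximation without tie or continuity correction, which is crude for so few samples; the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def wilcoxon_signed_rank(x, y):
    """Two-sided test of zero median in x - y; zero differences are dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Negating |d| makes the smallest absolute difference receive rank 1.
    abs_ranks = average_ranks([-abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(abs_ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))

# Attribute order: r, R, PROC(5), MA(5), BIAS(20), ADO, TR, PVT (values from Table 3).
mi_euclid1 = [0.36, 0.47, 0.51, 0.37, 0.58, 0.14, 0.60, 0.22]
mi_dtw5 = [0.21, 0.55, 0.41, 0.35, 0.60, 0.12, 0.59, 0.13]
ranks_euclid1 = average_ranks(mi_euclid1)
ranks_dtw5 = average_ranks(mi_dtw5)
w_plus, p_value = wilcoxon_signed_rank(ranks_euclid1, ranks_dtw5)
```

On this subset the two rankings differ only by swaps of adjacent positions, so the test gives no evidence of a systematic shift in ranks.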

It seems from Figures 2 and 3 that the 4-hour window is too long to detect the abnormal dynamics of the candidate attributes, as most of the abnormality occurs just shortly before the arrival of price jumps. To examine this conclusion, a further investigation based on time series analysis from the perspective of information theory is given as follows. Similar to Table 3, we compute the mutual information between each attribute and the class label variable, but shrink the prior 4-hour window (48 intervals) to 3 hours, 2 hours, 1 hour or 30 minutes (i.e., 36, 24, 12 or 6 intervals). That is, we evaluate the information level of each attribute within shorter periods right before price jumps.

There is no gold standard for choosing the type of distance or the k parameter in the k-nearest neighbors method (Ircio et al., 2020); but as verified in Table 3, the comparison among attributes does not alter very much with different choices of these parameters. To be concise, we only present in the main text the results based on the Euclidean distance and the 3-nearest neighbors method, as in Table 4.

According to the definition of the mutual information in sections 2.1 and 2.2, if the significant abnormality of an attribute occurs just shortly before the arrival of price jumps, ideally its information level within a shorter period of time should not differ much from that within the largest 4-hour window. One can observe from Table 4 that the information levels of most attributes, especially the highly informative ones, do not change very much as the prior window size shrinks from 48 intervals (4 hours) to 6 intervals (30 minutes). This indicates that many of the significant abnormalities indeed occur only shortly before the arrival of price jumps, which is consistent with the findings in the existing literature (Boudt and Pertitjean, 2014; Wan et al., 2017; Będowska-Sójka, 2016). But there are still some attributes that achieve much lower information levels within shorter prior windows, such as DI, ES, QS, RV, MA, EMA, PROC and TR. This implies that the abnormal movements of these attributes start much earlier and last longer, until the occurrence of price jumps.
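The shrinking-window comparison can be illustrated with a short sketch. The mutual information values are copied from Table 4 for a few attributes (48-interval and 6-interval windows); the helper names and the 0.9 retention threshold are hypothetical choices made only for this illustration, not a criterion used in the study.

```python
def last_window(series, n_intervals):
    """Keep only the final n_intervals observations before the jump."""
    return series[-n_intervals:]

def retained_fraction(mi_long, mi_short):
    """Share of the 48-interval information level still present in the short window."""
    return mi_short / mi_long

def early_starters(table, threshold=0.9):
    """Attributes whose information level drops noticeably in the short window."""
    return sorted(name for name, (mi_48, mi_6) in table.items()
                  if retained_fraction(mi_48, mi_6) < threshold)

# (MI with 48 prior intervals, MI with 6 prior intervals), from Table 4.
table4_subset = {
    "V": (0.67, 0.62),
    "VROC(20)": (0.68, 0.66),
    "RV": (0.67, 0.59),
    "DI": (0.22, 0.11),
    "QS": (0.22, 0.12),
    "MA(20)": (0.22, 0.05),
    "TR": (0.57, 0.48),
}
flagged = early_starters(table4_subset)
```

Under this threshold, V and VROC(20) keep more than 90% of their information level in the 30-minute window, whereas DI, QS, MA(20), RV and TR do not, which is the pattern described in the text for attributes whose abnormality starts earlier.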

Table 4: Information level of candidate attributes in different sized prior windows

    Prior intervals   -48    -36    -24    -12    -6
 1  r                 0.32   0.34   0.33   0.29   0.33
 2  V                 0.67   0.67   0.65   0.65   0.62
 3  K                 0.66   0.66   0.64   0.64   0.61
 4  S                 0.49   0.48   0.49   0.49   0.52
 5  OI                0.34   0.35   0.31   0.29   0.29
 6  DI                0.22   0.22   0.19   0.15   0.11
 7  QS                0.22   0.21   0.20   0.14   0.12
 8  ES                0.26   0.24   0.22   0.19   0.17
 9  RV                0.67   0.66   0.63   0.60   0.59
10  R                 0.43   0.39   0.38   0.40   0.39
11  PROC(5)           0.50   0.50   0.45   0.43   0.36
12  PROC(20)          0.43   0.46   0.40   0.41   0.38
13  VROC(5)           0.62   0.63   0.64   0.65   0.65
14  VROC(20)          0.68   0.67   0.66   0.65   0.66
15  MA(5)             0.25   0.23   0.19   0.10   0.11
16  MA(20)            0.22   0.11   0.09   0.10   0.05
17  EMA(5)            0.25   0.22   0.18   0.15   0.13
18  EMA(20)           0.21   0.12   0.11   0.07   0.08
19  BIAS(5)           0.44   0.47   0.51   0.51   0.48
20  BIAS(20)          0.54   0.53   0.53   0.51   0.49
21  EBIAS(5)          0.52   0.53   0.56   0.52   0.49
22  EBIAS(20)         0.52   0.54   0.56   0.54   0.54
23  OSCP(5)           0.47   0.47   0.49   0.43   0.39
24  OSCP(20)          0.47   0.47   0.44   0.41   0.44
25  EOSCP(5)          0.52   0.54   0.55   0.51   0.51
26  EOSCP(20)         0.50   0.50   0.53   0.50   0.49
27  ADO               0.09   0.07   0.08   0.07   0.08
28  TR                0.57   0.56   0.56   0.49   0.48
29  fastpctK(5)       0.42   0.45   0.48   0.48   0.45
30  fastpctK(20)      0.47   0.46   0.47   0.48   0.45
31  fastpctD(5)       0.52   0.52   0.53   0.51   0.46
32  fastpctD(20)      0.53   0.52   0.49   0.45   0.44
33  spctD(5)          0.51   0.50   0.47   0.43   0.41
34  spctD(20)         0.42   0.40   0.37   0.38   0.40
35  CCI(5)            0.50   0.53   0.56   0.53   0.51
36  CCI(20)           0.54   0.51   0.52   0.50   0.46
37  PVT               0.12   0.12   0.11   0.11   0.12
38  OBV               0.09   0.07   0.06   0.03   0.06
39  NVI               0.08   0.09   0.07   0.03   0.01
40  PVI               0.09   0.08   0.10   0.04   0.02

These mutual information values are computed based on the Euclidean distance and the 3-nearest neighbors method.

After evaluating the information levels of individual attributes, it is natural to consider the discriminating power of a combination of these attributes. In the context of pattern recognition and machine learning, a combination of attributes tends to achieve better results. But it is important to check whether there is high mutual dependency among the attributes. The mutual dependency can be evaluated by the mutual information outlined in section 2.1. Figure 4 shows the mutual dependency levels of pairs of candidates based on different distance and parameter settings. The nine heatmaps are very similar, and show that only a few of the attributes, such as the MA's and EMA's, are highly dependent, while most of the attributes have moderate or very low levels of mutual dependency.

Figure 4: Mutual dependency level between pairs of candidates based on different distances and the k-nearest neighbors method. Each block in the heatmaps represents the mutual dependency level of the two attributes whose indices are denoted on the x- and y-axes.

The next task is to select a combination of attributes, called jump indicators, that are highly related to the arrival of price jumps. Because the abnormality of some attributes starts very early, as concluded from Table 4, the feature selection technique presented in section 2.3 is performed on the set of attributes extracted within the prior 4-hour window. For comparison purposes, Table 5 exhibits the nine sets of selected jump indicators based on different types of distances and k values. It can be seen from the table that all sets of jump indicators include at least 70% of the candidate attributes, while the Chebychev and DTW distances sometimes select almost all of the candidate attributes. As many as 22 attributes are included in all nine sets, which is in line with our previous conclusion that these attributes generally have low mutual dependency with each other. It is worth noting that although some attributes, such as ADO, OBV or PVI, have very low information levels, they still play a role in the jump indicator set. Besides, even though the MA's and EMA's are highly dependent, they are simultaneously selected in most cases, indicating that complementary information can still be found in these attributes.
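The trade-off between high informativeness and low redundancy can be sketched with a greedy rule. This is not the procedure of section 2.3, which is not reproduced here; it assumes a generic relevance-minus-redundancy criterion. The relevance values are the Euclidean, 3-neighbor entries of Table 3, while the pairwise dependency values are made up for illustration, so the outcome need not mirror Table 5.

```python
def selection_score(name, selected, relevance, redundancy):
    """Relevance of a candidate minus its mean dependency on already selected attributes."""
    if not selected:
        return relevance[name]
    mean_redundancy = sum(redundancy[frozenset((name, s))] for s in selected) / len(selected)
    return relevance[name] - mean_redundancy

def greedy_select(relevance, redundancy):
    """Add the best-scoring attribute until no candidate has a positive score."""
    selected, remaining = [], sorted(relevance)
    while remaining:
        best = max(remaining,
                   key=lambda n: selection_score(n, selected, relevance, redundancy))
        if selection_score(best, selected, relevance, redundancy) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Relevance: mutual information with the jump label (Table 3, Euclidean, 3 neighbors).
relevance = {"TR": 0.57, "MA(5)": 0.25, "EMA(5)": 0.25, "ADO": 0.09}
names = sorted(relevance)
# Hypothetical pairwise dependency levels: low everywhere except between MA(5) and EMA(5).
redundancy = {frozenset((a, b)): 0.05 for a in names for b in names if a < b}
redundancy[frozenset(("MA(5)", "EMA(5)"))] = 0.90
chosen = greedy_select(relevance, redundancy)
```

With these made-up dependency levels, the weakly informative ADO is still selected because it overlaps little with the others, while one of the two moving averages is dropped as redundant. A lower assumed dependency between MA(5) and EMA(5) would let both in.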

Table 5: Selected jump indicators with different distances and k parameters

    Euclidean  Chebychev  DTW
    Neighbors  1 3 5  1 3 5  1 3 5
 1  r √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √
10  R √ √ √ √ √ √ √ √ √
11  PROC(5) √ √ √ √ √ √ √ √ √
12  PROC(20) √ √ √
13  VROC(5) √ √ √ √ √ √ √ √ √
14  VROC(20) √ √ √ √ √ √ √ √ √
15  MA(5) √ √ √ √ √ √
16  MA(20) √ √ √ √ √ √ √
17  EMA(5) √ √ √ √ √ √ √
18  EMA(20) √ √ √ √ √ √ √ √
19  BIAS(5) √ √ √ √ √
20  BIAS(20) √ √ √
21  EBIAS(5) √ √ √ √ √ √
22  EBIAS(20) √ √ √
23  OSCP(5) √ √ √ √ √ √ √ √
24  OSCP(20) √ √ √ √ √ √ √ √
25  EOSCP(5) √ √ √ √
26  EOSCP(20)
27  ADO √ √ √ √ √ √ √ √ √
28  TR √ √ √ √ √ √ √ √ √
29  fastpctK(5) √ √ √ √ √ √ √ √ √
30  fastpctK(20) √ √ √ √ √
31  fastpctD(5) √ √ √ √ √ √
32  fastpctD(20) √ √ √ √ √ √ √
33  spctD(5) √ √ √ √ √ √ √ √ √
34  spctD(20) √ √ √ √ √ √ √
35  CCI(5) √ √ √ √ √ √ √ √ √
36  CCI(20) √ √ √ √ √ √ √ √ √
37  PVT √ √ √ √ √ √ √ √ √
38  OBV √ √ √ √ √ √ √ √ √
39  NVI √ √ √ √ √ √ √ √ √
40  PVI √ √ √ √ √ √ √ √ √
    Total  28 28 29  33 35 37  33 37 38

The marker √ denotes the selected attributes in different cases.

To cluster the stocks according to their trading abnormality before price jumps, the 189 jumping samples (that is, the samples (X_i, C_i) with C_i = 1) obtained in section 3.5 are adopted, each of which corresponds to a stock. To have a more robust clustering result with respect to different distances, the 22 jump indicators which are covered by all the nine sets of Table 5 are used to represent each sample. Thus, each X_i is now represented by 22 time series of length 48. To minimize the effect of scale difference, each of the indicator series is normalized by the min-max method.

Figure 5 shows the hierarchical clustering results on the stocks based on the three types of distance. In the figures, the objects are linked by upside-down U-shaped lines. The height of the "U" represents the distance between the two clusters it links. If a link is approximately the same height as the links below it, it is said to exhibit a high level of consistency; on the contrary, it is said to be inconsistent with the links below it. In a cluster analysis, inconsistent links can provide the border of a natural division in a sample set. So from the following dendrograms, we can easily find the individual stocks that are very different from the majority.

Table 6: Most distinct stocks detected by the clustering analysis

Euclidean   Chebychev   DTW
19          157         19
20          164         20
157         186         157
164                     164
177                     177
186                     188

In the figures, the most distinct stocks are included in the cluster on the far right side of the trees. Comparing the height of the top link to those below it in each figure, we find that the clustering analysis based on the Chebychev distance recognizes a cluster that differs more from the majority than those found by the other two types of clustering. Table 6 shows the indices of the most distinct stocks detected by the three clustering analyses. Compared to the Chebychev distance, the Euclidean and DTW distances detect larger and very similar sets of distinct stocks. Nevertheless, the stocks detected by the Chebychev distance are included in the other two sets. It is likely that the Chebychev distance focuses on the maximal difference but ignores the small ones between time series; in this case, the Euclidean and DTW distances can capture more features of distinct stocks when performing the clustering, resulting in a larger set of distinct stocks.

Figure 5: Hierarchical clustering of the stocks. Panels: hierarchical clustering based on Euclidean distance, on Chebychev distance, and on DTW distance.

Figure 6 gives, as an example, the attribute patterns of a stock (indexed by 157) before its price jumps, and shows how they differ from those of most stocks. For parsimony, we only present the dynamic patterns of the eight attributes which have the highest discriminating power. In the eight plots, the solid lines represent the median attribute series of stock 157 before price jumps, while the shaded regions correspond to the range between the 5% and 95% quantiles of the median attribute series of the stocks that are not present in Table 6. Compared to most of the stocks, we notice that for this specific stock 157, the abnormal movements of the attributes V, K, BIAS(20) and CCI(20) start much earlier, about 2 hours (24 intervals) before the price jumps. RV, VROC(5), VROC(20) and TR predict the occurrence of the price jumps at a slower pace: they all reach very high levels, but only within about the prior 0.5 hours (6 intervals). Nevertheless, the strong abnormality of these attributes explains well why this stock is identified as a distinct stock by all three clustering analyses from the perspective of micro-trading behavior before price jumps.
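The remark above that the Chebychev distance focuses on the maximal difference, while the Euclidean and DTW distances also react to other features, can be illustrated on toy series. The sketch below uses univariate series and textbook definitions of the three distances; the study works with multivariate series, and its exact DTW variant is not specified in this section, so the `dtw` function is an assumption.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest pointwise difference; smaller differences are ignored."""
    return max(abs(x - y) for x, y in zip(a, b))

def dtw(a, b):
    """Classic dynamic time warping with absolute-difference cost, no window constraint."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]

# Hypothetical series of five intervals.
flat = [0, 0, 0, 0, 0]
shifted = [0.5, 0.5, 0.5, 0.5, 0.5]   # small difference at every interval
spike = [0, 0, 1, 0, 0]               # one large difference
late_spike = [0, 0, 0, 1, 0]          # same spike, one interval later
```

Here `chebyshev` ranks `spike` as farther from `flat` than `shifted` is, while `euclidean` ranks them the other way round because the small differences accumulate. `dtw` treats `spike` and `late_spike` as identical, since warping aligns the two surges.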

Figure 6: The trading patterns of a distinct stock (stock 157) before price jumps as an example. The solid line is the median time series of the corresponding attribute of the specific stock. The shaded region represents the range between the 5% and 95% quantiles of the median time series of the stocks that are not present in Table 6. The x-axis is the index of the intervals before price jumps.
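The clustering step described in this section (min-max normalization followed by hierarchical clustering, with an unusually tall top link marking a distinct stock) can be sketched as follows. The linkage rule is not stated in the text, so average linkage is an assumption, as are the function names and the four toy series; each toy stock is represented by a single series, whereas the study uses 22 series per stock.

```python
import math

def min_max(series):
    """Rescale a series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, link height)."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                height = sum(euclidean(points[p], points[q]) for p, q in pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges

# Hypothetical attribute series: stocks 0-2 surge only at the end, stock 3 surges early.
raw = [[1, 2, 3, 10], [1, 2, 4, 10], [2, 2, 3, 9], [1, 8, 9, 10]]
normalized = [min_max(s) for s in raw]
merges = average_linkage(normalized)
```

The last merge joins stock 3 to the cluster of the other three at a height several times that of the link below it, which is the kind of inconsistent link used in the text to single out distinct stocks.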

5. Discussion

Boudt and Pertitjean (2014) investigated the liquidity dynamics around price jumps in the U.S. stock market. Following that, Będowska-Sójka (2016) and Wan et al. (2017) performed similar experiments on the Polish and Chinese stock markets, respectively. In these works, the Mann-Whitney tests on the jumping and non-jumping samples all showed some level of abnormality in the liquidity measures related to the occurrence of price jumps. In this paper, we introduce a new method based on multivariate time series classification to better investigate the abnormal trading patterns before price jumps. This method can help to explore the "time-series" information in the candidate attributes used to describe the trading behaviors and to select a set of attributes, called jump indicators, to assist jump-related abnormal pattern recognition. In addition to the commonly used liquidity measures, technical indicators are further included in the candidate attributes to describe the micro-trading behaviors from a different perspective.

An empirical study is conducted on the level-2 data of the constituent stocks of the China Security Index 300 (CSI 300). After sample preparation, each stock is represented by a list of attribute series under two scenarios: before the intraday price jumps and on steady days. Evaluated by the mutual information, we found that some volume and volatility-related attributes, such as V, K, and RV, exhibit the most significant abnormality before price jumps, which is consistent with the findings in the existing literature (Boudt and Pertitjean, 2014; Wan et al., 2017). It is worth mentioning that the choice of distance and the k parameter in the mutual information computation in general has little influence on the comparison of the information levels of different attributes. By evaluating their mutual information values within shrinking windows, we also find that even though most of the attributes show abnormal movements just shortly before the occurrence of the price jumps, there are still some, such as QS, RV, MA, EMA and TR, that start to move abnormally much earlier. Besides, the mutual information between time series shows that most of the attributes have low mutual dependency with each other. Based on that, we then effectively select a set of jump indicators and detect the stocks that have extremely abnormal trading behaviors before price jumps using clustering analysis. We believe that the methodology we proposed as well as the empirical results obtained in our study supplement the existing studies and provide a new perspective to investigate the micro-trading behaviors before price jumps in the financial market.

In future research, we will focus on the common and idiosyncratic micro-trading patterns of individual stocks and investigate the predictability of their price jumps using the jump indicators we selected. Based on that, high-frequency portfolio allocation and risk management strategies related to the occurrence of stock price jumps can also be designed.