Unraveling S&P500 stock volatility and networks -- An encoding-and-decoding approach
Xiaodong Wang and Fushing Hsieh Department of Statistics, University of California, Davis.
Abstract
We extend the Hierarchical Factor Segmentation (HFS) algorithm to discover the multiple-volatility-state process hidden within each individual S&P500 stock's return time series. We then develop an associative measure to link stocks into directed networks at various scales of association. Such networks shed light on which stocks would likely stimulate, or even promote, if not cause, volatility in other linked stocks. Our computing endeavor starts by encoding events of large return on the original time axis, transforming the original return time series into a recurrence-time process on a discrete time axis. By adopting BIC and clustering analysis, we identify potential multiple volatility states, and then apply the extended HFS algorithm to the recurrence-time series to discover its underlying volatility state process. Our decoding approach compares favorably with Viterbi's in experiments involving both light- and heavy-tailed distributions. After mapping the volatility state process back to the original time axis, we decode and represent the dynamics of each stock. Association is measured through overlapping concurrent volatility states within a chosen window. Consequently, we establish data-driven associative networks for S&P500 stocks to discover their global dependency groupings with respect to various strengths of links.
To unravel the mystery of stock dynamics, financial researchers focus on stock returns or log returns. Consider a time series of the price of a stock, $X_t$. Black and Scholes [1] proposed in their seminal work to use stochastic processes in modeling stock prices. A geometric Brownian motion (GBM) was employed, implying that the log returns are assumed normally distributed,
$$\log \frac{X_t}{X_{t-1}} \sim N(\mu, \sigma^2).$$
Merton [2] extended Black and Scholes' model to involve time-dependent parameters affording serial correlation. Beyond the normal distribution, a broader category of models, the general Levy process or in particular the geometric Levy process model [11], is an appropriate alternative that accounts for stable distributions with heavy tails. Because of the independent-increments property of Brownian motion or a Levy process, returns over disjoint, equal-length time periods remain i.i.d., so their distribution is invariant to time scale and to time period. However, it is well known that the distribution of returns differs completely across volatility stages of a stock price. These models may fail to capture more extreme price movements [3], since the constant-volatility assumption is not satisfied.

In order to incorporate stochastic volatility in stock prices, a regime-switching or hidden-state process was proposed to govern the dynamics. That is, regime-switching models can represent a stock process that changes between a low-volatility state and a more unstable high-volatility regime, with each regime characterized by a different model or distribution. Under a Markov-chain assumption for the regime switching, the Hidden Markov Model (HMM) was implemented and appeals increasingly to researchers due to its mathematical tractability.

In the setting of a simple Markov assumption, parametric approaches and parameter estimation procedures have been discussed comprehensively. Hamilton [4] described AR- and ARCH-type models under a Markov regime-switching structure. Hardy [3] offered a Markov regime-switching lognormal model by assuming a different normal distribution within each state,
$$\log \frac{X_t}{X_{t-1}} \,\Big|\, s \sim N(\mu_s, \sigma_s^2),$$
where $s$ indicates the hidden state, $s = 1, 2, \ldots$. Fine et al. [5] developed the hierarchical HMM by putting additional sources of dependency into the model, so that every state is composed of substates with multiple levels of dependencies.

So far, a Markov process is still commonly assumed to govern the hidden states of a system and their evolution. Research on nonparametric alternatives is scarce due to their lack of tractability and the involvement of more parameters [12]. Fushing et al. [7, 8] developed a nonparametric decoding approach named Hierarchical Factor Segmentation (HFS). It was proposed to decode the hidden states underlying a lengthy discrete time series without assuming a Markov chain. After categorizing the time series with a certain threshold cut, a 0-1 binary process is obtained, with 1 indicating an event moment at which the value is above or below the threshold, and 0 marking a non-event time stamp. The idea of HFS is to segment the whole time series into stationary regimes and fit geometric distributions with different event intensities in the different regimes. With a choice of parameters, the discrete process is consequently partitioned into alternating sequences of high and low event-intensity segments.
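To make the regime-switching lognormal model concrete, here is a minimal simulation sketch; the transition matrix, drifts, and volatilities are illustrative assumptions, not values taken from any of the references:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-regime Markov-switching lognormal model:
# log(X_t / X_{t-1}) | s ~ N(mu_s, sigma_s^2), with s following a Markov chain.
A = np.array([[0.99, 0.01],          # assumed transition matrix
              [0.02, 0.98]])
mu = np.array([0.0005, -0.0005])     # assumed per-step drifts
sigma = np.array([0.01, 0.03])       # assumed low- vs high-volatility scales

n = 5000
s = np.empty(n, dtype=int)
s[0] = 0
for t in range(1, n):
    s[t] = rng.choice(2, p=A[s[t - 1]])   # regime evolves as a Markov chain

log_ret = rng.normal(mu[s], sigma[s])     # regime-dependent log returns
price = 100.0 * np.exp(np.cumsum(log_ret))
```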
Finally, AIC or BIC is implemented for model selection. However, HFS was designed only for decoding discrete processes with 2 hidden states embedded; continuous-distribution processes with multiple potential states, which are more realistic, remain untouched. In this paper, we propose an encoding-and-decoding approach to handle the hidden-state model in a nonparametric fashion. After applying a sequential encoding schedule, the time series is transformed into a series of binary processes. The HFS decoding schedule is then implemented, so the potential volatility states are discovered. We remark that our approach is totally data-driven, and it works without assuming an underlying Markov chain or any conditional distribution.

Our numerical experiments show that our HFS decoding is more favorable than Viterbi's in decoding hidden states even under HMM settings, particularly when the transition probability is low. In the end, tick-by-tick S&P500 data in 2006 are fully analysed, and a network is created with association measured through overlapping concurrent volatility states within a chosen window. The paper is organized as follows: in Section 2, we propose a new version of HFS, which can handle multiple-state decoding; in Section 3, we describe the nonparametric approach to modeling continuous-distribution processes; in Section 4, a real-data experiment on the S&P500 is carried out.
To investigate stock dynamics, we consider volatility as a temporal aggregation of rare events that have large absolute returns. Considering the recursion time, or waiting time, between successive extreme returns, Chang et al. [10] proved that the empirical distribution of recursion times converges in probability to a geometric distribution when the observations are i.i.d. or, more generally, have an exchangeable joint distribution. Under the framework of a regime-switching model, each regime can be considered a stationary time period with an exchangeable distribution. Motivated by this, a geometric-switching model is established: geometric distributions with different emission (intensity) probabilities are adopted to model the time series in different volatility periods, such as high-volatility and low-volatility ones. The underlying assumption we make here is the invariance of the recursion-time distribution within a regime; temporal correlations within a regime are outside our consideration. The assumption may hold when the duration of each regime is short enough, but that requires many more underlying regimes, and the volatility could then vary at extremely high frequency, which makes the model complex and intractable. Based on extensive permutation tests with 30-second, 1-minute, and 5-minute return scales, it is claimed in [9] that returns should only be considered exchangeable within a short time period, no longer than, for example, 20 minutes. On the other hand, we expect longer volatility periods to afford more samples so that our geometric distributions achieve a good fit.

The threshold used to define "large" returns, or rare events, plays an important role here. If its value is set very extreme, only extremely large returns get filtered out, so the decoding problem is relatively simple in terms of the number of hidden states: two-state decoding usually works well in differentiating extreme-volatility from non-extreme-volatility regimes. But useful information can get lost with the sacrifice of sample size. If, instead, the threshold is set very small, a complicated case with more hidden states and a higher switching frequency must be handled. There is an inevitable trade-off between the magnitude of the rare events and the sample size.

Specifically, a recursion process $C(t)$ is defined as
$$C(t) = \begin{cases} 1 & \text{if } \log \frac{X(t)}{X(t-1)} \le l \text{ or } \ge u, \\ 0 & \text{otherwise,} \end{cases}$$
where $l$ and $u$ are, for instance, the 0.1 and 0.9 quantiles of the log returns, respectively. Obviously, many generalizations are possible, including single-tail excursions: $l = 0$ gives upper-tail excursions and $u = 1$ gives lower-tail excursions.

One simple way to define a "large" return, or an event, is to choose a recursion line $\pi$ that heuristically makes the emission probabilities as different as possible. This heuristic difference can be expressed through the standard deviation of the estimated emission probabilities $\hat{p}$:
$$\pi^* = \arg\max_\pi \operatorname{std}(\hat{p}(\pi)).$$
A precise discussion of the choice of excursion line is given in the next section.
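The event coding $C(t)$ and the recurrence-time transform can be sketched as follows; the function names are ours, and the 0.1/0.9 quantile levels follow the example in the text:

```python
import numpy as np

def encode_events(log_returns, lower_q=0.1, upper_q=0.9):
    """0-1 event string: 1 where the log return falls in either tail
    beyond the chosen quantiles (a 'large' return), 0 otherwise."""
    l, u = np.quantile(log_returns, [lower_q, upper_q])
    return ((log_returns <= l) | (log_returns >= u)).astype(int)

def recurrence_times(events):
    """Waiting times between successive 1s of a 0-1 event string."""
    idx = np.flatnonzero(events)
    return np.diff(idx)

# usage sketch:
# C = encode_events(log_ret); R = recurrence_times(C)
```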
In this section, we generalize HFS to decode multiple state phases. Denote the number of internal states by $m$, $m \ge 2$; the algorithm is described as follows.

Alg. 1
HFS multiple-state decoding

1. Define events and code the whole process as a 0-1 digital string $\{C(t)\}_{t=1}^n$, with 1 for an event and 0 for a non-event point.
2. Calculate the event recurrence times of $\{C(t)\}_{t=1}^n$ and denote the sequence of recurrence times by $\{R(t)\}_t$.
3. Choose a sequence of thresholds $T = (T_1, T_2, \ldots, T_{m-1})$ with $T_1 < T_2 < \ldots < T_{m-1}$. Transform the $\{R(t)\}_t$ sequence into 0-1 digital strings $\{C_i(t)\}_t$, $i = 1, \ldots, m-1$, via the second-level coding scheme:
$$C_i(t) = \begin{cases} 1 & \text{if } R(t) \ge T_i, \\ 0 & \text{otherwise.} \end{cases}$$
4. For each $\{C_i(t)\}_t$, take code digit 1 as another new event and recalculate the event recurrence time sequence $\{R_i(t)\}_t$, $i = 1, \ldots, m-1$.
5. Choose another threshold value $T^*$ to transform $\{R_i(t)\}_t$ into another digital string, for $i = 1, \ldots, m-1$: if a recursion time $R_i(t) \ge T^*$, pick the corresponding time period in $C_i(t)$ and denote it as a segment $Seg_i$, $Seg_i \subset \{1, \ldots, n\}$.
6. The ultimate internal states are defined as $S_1 = Seg_1$, $S_2 = Seg_2 \setminus Seg_1$, ..., $S_{m-1} = Seg_{m-1} \setminus Seg_{m-2}$, and $S_m = \{1, \ldots, n\} \setminus Seg_{m-1}$.

The pair of partition parameters $(T_i, T^*)$ achieves an aggregating partition between high and low event-intensity segments for the recursion times $\{R(t)\}_t$, and the intensity level is controlled by $T_i$, $i = 1, \ldots, m-1$. If $T_i$ is taken at its maximum $T_{m-1}$, the partition is designed to separate an extreme-high-intensity segment from the other, relatively low-intensity segments. Decreasing the value of $T_i$ from $T_{m-1}$ to $T_1$ implements a series of partitions, so that multiple intensity levels of phases are achieved.

In the second level of recursion-time calculation, we have the recursion times $\{R_i(t)\}_t$ of $\{C_i(t)\}_t$. If $R_i(t)$ is above $T^*$, it reflects a long period of time in which no event happens in $C_i(t)$. We denote this waiting-time period (the time between the two events in $C_i(t)$) as a low-intensity phase $Seg_i$. So, for a given $T_i$, $T^*$ is set to decide which phases have relatively low intensity, and the rest are high-intensity phases. Notice that $Seg_i \subset Seg_j$ for $j > i$: if a recursion time $R(t)$ is greater than $T_j$, it is also greater than $T_i$, so $C_j$ has fewer events than $C_i$; after applying the same parameter $T^*$, the time period $Seg_j$ obtained with $T_j$ is wider and contains $Seg_i$.

Multiple volatility phases $S_i$, $i = 1, \ldots, m$, of the time series are computed by applying HFS. Assuming the joint distribution is exchangeable within each volatility phase, the distribution of $\{R(t)\}_{t \in S_i}$ converges to a geometric distribution with parameter $p_i$ as the number of recursions goes to infinity, where $p_i$ is the emission probability of an event of a large return. Maximum Likelihood Estimation (MLE) and the Method of Moments (MOM) give the same estimate,
$$\hat{p}_i = \frac{n_i}{\sum_{t \in S_i} R(t)}, \qquad i = 1, \ldots, m, \tag{1}$$
where $n_i$ denotes the number of recursion times falling in $S_i$. HFS thus advocates a way to partition the process into alternating segments with $m$ event intensities. With enough sample size, geometric distributions are appropriate to model the $m$ phases, with $\hat{p}_i$ estimating the event intensity in phase $i$.

There are $m$ parameters involved in total, $\theta = (T_1, \ldots, T_{m-1}, T^*)$. Model selection can be done via information criteria such as AIC and BIC.
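A rough sketch of steps 2-6 of Alg. 1 on the recurrence-time axis, together with the emission-probability estimate of equation (1); the segment boundary convention and the default state label are our own simplifications of the description above:

```python
import numpy as np

def hfs_states(R, T, T_star):
    """Multi-state HFS decoding of a recurrence-time series R.
    T: increasing thresholds (T_1 < ... < T_{m-1}); returns a label in
    {1, ..., m} for each index of R, following S_1 = Seg_1,
    S_i = Seg_i minus Seg_{i-1}, and S_m = complement of Seg_{m-1}."""
    n = len(R)
    m = len(T) + 1
    states = np.full(n, m, dtype=int)     # default: state m
    taken = np.zeros(n, dtype=bool)
    for i, Ti in enumerate(T, start=1):
        Ci = R >= Ti                      # second-level coding
        idx = np.flatnonzero(Ci)
        seg = np.zeros(n, dtype=bool)
        for a, b in zip(idx[:-1], idx[1:]):
            if b - a >= T_star:           # a long stretch without C_i-events
                seg[a + 1:b] = True       # ... becomes part of Seg_i
        new = seg & ~taken                # S_i = Seg_i minus earlier segments
        states[new] = i
        taken |= seg
    return states

def emission_probs(R, states, m):
    """Equation (1): geometric intensity estimate within each state."""
    return np.array([1.0 / R[states == i].mean() if np.any(states == i)
                     else np.nan for i in range(1, m + 1)])
```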
The resulting loss function can be written as
$$\operatorname{Loss}(S_i^\theta) = -\sum_{i=1}^m \Big[ \sum_{t \in S_i^\theta} C(t) \log \hat{p}_i + \sum_{t \in S_i^\theta} \big(1 - C(t)\big) \log \big(1 - \hat{p}_i\big) \Big] + kN, \tag{2}$$
where $C(t)$ is the 0-1 discrete process obtained after applying an event line; $m$ is the number of embedded phases or hidden states; $N$ is the total number of alternating segments, $N \ge m$; and $k$ is the coefficient of the penalty term. For instance, $k = 1$ gives the AIC model selection criterion, while $k = \log(n)$ gives the BIC criterion. Indeed, $k$ can be chosen based on empirical experiments; in this paper, we consistently use BIC in all experiments. The optimal parameters $\theta^*$ are tuned so that the loss achieves its minimum,
$$\theta^* = \arg\min_\theta \operatorname{Loss}(S_i^\theta). \tag{3}$$
Currently, a grid search is implemented to solve this non-convex problem; the time complexity is $O(nG^m)$, where $G$ is the number of grid points.

Experiments are done in two scenarios. In the first case, data are generated according to a Hidden Markov Model (HMM) with 2 hidden states, and the decoding error rate is calculated to empirically compare the HFS algorithm with the Viterbi path. In the second case, data are simulated under a regime-switching model with the Markov property violated; as an example of multiple-state decoding, 3 hidden states are embedded in the underlying model.

For the application of the Viterbi path, since the true transition probabilities $A$ and emission probabilities $B$ are unknown in reality, the EM algorithm of Baum-Welch type [6] is first implemented to estimate the parameters. Generally, the initial conditions are chosen randomly; prior information about the parameters helps if it is available. Here, the true emission probabilities are used as initial values, but with two kinds of transition-probability initials: one is the true transition matrix and the other is a random initial $A_0$. For convenience, we set $p_{12} = p_{21}$ in the transition probability matrix $A$ by which the data are generated, so $A$ is controlled by a single parameter,
$$A = \begin{pmatrix} 1 - p_{12} & p_{12} \\ p_{12} & 1 - p_{12} \end{pmatrix},$$
with emission probabilities $B = (p_1, p_2)$. The experiments are repeated 100 times, and the mean and standard deviation of the decoding error rate are reported in Table 1. The decoding errors improve when $p_{12}$ is lower, and HFS improves dramatically as $p_{12}$ decreases to its smallest value considered. The reason can be explained by the loss function of HFS: the BIC criterion favours a "simple" model with fewer alternating segments, and a long extension of each segment improves the goodness-of-fit of the geometric distribution. In this setting, the error rate of HFS gets slightly worse when the sample size is greater. In summary, without assuming any prior knowledge, HFS is competitive with Viterbi's even under Markov models, a situation unfair to HFS, and HFS is more favourable especially when the transition probability is low. Besides, the choice of initial values raises a big challenge to the stability of Viterbi's or Baum-Welch's procedures in real applications.
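For reference, the Viterbi baseline in Table 1 can be computed with the standard log-space dynamic program; this minimal version for 0-1 observations with Bernoulli emissions assumes $A$ and $B$ are given (in the experiments they come from Baum-Welch estimation):

```python
import numpy as np

def viterbi(obs, A, B, pi=(0.5, 0.5)):
    """Most likely state path for a 0-1 observation sequence.
    A[i, j] = P(s_t = j | s_{t-1} = i); B[s] = P(obs = 1 | state s)."""
    obs = np.asarray(obs, dtype=int)
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    logA = np.log(A)
    logB = np.log(np.column_stack([1.0 - B, B]))   # logB[s, o]
    n, m = len(obs), len(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = np.zeros((n, m), dtype=int)
    for t in range(1, n):
        cand = delta[:, None] + logA               # cand[i, j]: from i to j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + logB[:, obs[t]]
    path = np.empty(n, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n - 1, 0, -1):                  # backtrack
        path[t - 1] = back[t, path[t]]
    return path
```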
Table 1: Decoding error rate: mean (standard deviation) over 100 repetitions (na: not available).

                     |       p1=0.2, p2=0.1                      |       p1=0.1, p2=0.05
                     | Viterbi(A0)     Viterbi(Atrue)  HFS       | Viterbi(A0)     Viterbi(Atrue)  HFS
p12=0.005  N=2000    | 0.4358 (na)     0.2378 (0.114)  na (0.093)| 0.4550 (na)     0.3726 (0.110)  na (0.090)
           N=5000    | 0.3940 (0.054)  na (na)         na (0.077)| na (na)         0.3171 (0.097)  na (na)
p12=0.01   N=2000    | 0.4405 (na)     na (0.069)      0.3376 (0.109)| 0.4722 (na)  na (0.082)      0.3710 (0.089)
           N=5000    | 0.4504 (na)     na (0.058)      0.3584 (0.048)| 0.4889 (na)  0.4219 (0.076)  na (0.078)
p12=0.05   N=2000    | na (na)         0.4526 (0.045)  0.4518 (0.034)| 0.4718 (na)  0.4825 (0.037)  na (0.051)
           N=5000    | 0.4398 (na)     na (0.051)      0.4752 (0.029)| 0.4751 (0.021) 0.4832 (0.033) na (na)
p12=0.1    N=2000    | na (0.019)      0.4543 (0.034)  0.4728 (na)| na (0.025)      0.4778 (0.032)  0.4744 (na)
           N=5000    | na (na)         0.4650 (0.026)  0.4920 (0.025)| na (na)      0.4792 (0.031)  0.4853 (0.022)

In the second scenario, we generate data that violate the Markov property. The data (log returns) $\{Y(t)\}_{t=1}^{8000}$ are generated with 3 hidden states embedded and 8 alternating segments over time:
$$S(t) = \begin{cases} 1 & t \in [1,1000] \cup [4001,5000] \cup [7001,8000], \\ 2 & t \in [1001,2000] \cup [3001,4000] \cup [6001,7000], \\ 3 & t \in [2001,3000] \cup [5001,6000]. \end{cases}$$
The order of the hidden states alternates as 1, 2, 3, 2, 1, 3, 2, 1, and each segment extends for 1000 time units; see Fig. 1. Data points are simulated from a conditional distribution given the hidden state. A normal distribution with mean 0 and a heavy-tailed case, the Student-t distribution, are considered:
$$Y(t) \sim \begin{cases} N(0, \sigma_1^2) & S(t) = 1, \\ N(0, \sigma_2^2) & S(t) = 2, \\ N(0, \sigma_3^2) & S(t) = 3, \end{cases}$$
with standard deviations $\sigma_1 = 1$, $\sigma_2 = 2$, and $\sigma_3 = 3$, and
$$Y(t) \sim \begin{cases} t(df_1) & S(t) = 1, \\ t(df_2) & S(t) = 2, \\ t(df_3) & S(t) = 3, \end{cases}$$
with degrees of freedom $df_1 = 1$, $df_2 = 2$, and $df_3 = 5$ in the simulation.

The simulated raw data and the decoding results are shown in Fig. 1. The recursion events are defined by the formula for $C(t)$ above. In the normal-distribution setting, $|u| = |l| = 2$, approximately the 0.9 quantile of the data; in the t-distribution setting, a larger value $|u| = |l| = 3$ is used, approximately the 0.95 quantile.
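The simulated series can be generated along these lines (a sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

order = [1, 2, 3, 2, 1, 3, 2, 1]           # 8 segments of 1000 points each
S = np.repeat(order, 1000)                 # hidden state S(t), t = 1..8000

sigma = np.array([np.nan, 1.0, 2.0, 3.0])  # sigma[s] for s = 1, 2, 3
df = np.array([0, 1, 2, 5])                # df[s] for s = 1, 2, 3

Y_normal = rng.normal(0.0, sigma[S])       # normal scenario
Y_t = rng.standard_t(df[S])                # Student-t scenario
```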
Figure 1: Data are simulated from a conditional distribution given a hidden state: (A) normal distribution, (B) Student-t distribution. 8 underlying phases alternate over time, among which 3 kinds of hidden states are embedded. The red lines in (A) indicate the recursion event lines.

From Fig. 2, HFS gives a good result in decoding the time series when the Markov property is violated; errors only appear around the times when the phases change. Besides, we can also compare the estimated emission probabilities $\hat{p} = (\hat{p}_1, \hat{p}_2, \hat{p}_3)$ with the ground truth, both in the normal-distribution simulation and in the
t-distribution simulation. They are close to the true parameters, $p^*_i = 2\Psi(-2/\sigma_i)$ in the normal case and $p^*_i = 2F_{df_i}(-3)$ in the t case, where $\Psi$ and $F_{df}$ denote the standard normal and Student-t cumulative distribution functions.

Figure 2: Simulation with the normal distribution (A)(B) and the Student-t distribution (C)(D). (A), (C): recursion times; (B), (D): raw data with color-coded decoding states; red, yellow, and pink indicate the 3 different kinds of states.

Continuous observation hidden model
As described in Section 2, the choice of threshold in defining an event, or large returns, is of vital importance. The decoding result has a different meaning for each choice of threshold, and the threshold is usually set very high in order to catch the extreme movements. One natural question to ask is: if the threshold fluctuates, is the decoding result of the 0-1 processes still stable? The answer is positive. For example, if an event is defined as a variable falling below a threshold $\pi$, the intensity parameter of the geometric distribution is $p_s(\pi) = F_s(\pi)$ given a state index $s$. Assuming continuity of the true distribution $F_s$, $p_s$ is also continuous in $\pi$, so the emission probability of a hidden state does not fluctuate much if $\pi$ varies slightly. Indeed, our experiments show that our estimated emission probability is stable in $\pi$.

The idea for dealing with a stochastic process with continuous observations is described in Alg. 2. We first shift the recursion line from an extreme value toward 0 to discretize the time series into a sequence of 0-1 binary processes, then implement HFS for all processes and record the estimated emission probabilities. As a consequence, a vector of probabilities $p(\pi)$ over the different choices of $\pi$ is obtained for each time stamp. It actually gives an estimate of a cumulative distribution function (CDF), $\hat{F}(\pi) = \hat{p}(\pi)$. As a continuation of the simulated example with the t distribution, Fig. 3 shows the resulting series of empirical CDFs varying over time.

Alg. 2
Encoding-and-Decoding

For each threshold $\pi$ in $\Pi$:
1. Define events and code the whole process as a 0-1 digital string $\{C_\pi(t)\}_{t=1}^n$, with either the symmetric coding
$$C_\pi(t) = \begin{cases} 1 & \text{if } \log \frac{X(t)}{X(t-1)} \le -\pi \text{ or } \ge \pi, \\ 0 & \text{otherwise,} \end{cases}$$
or the asymmetric coding
$$C_\pi(t) = \begin{cases} 1 & \text{if } \log \frac{X(t)}{X(t-1)} \le \pi \text{ and } \pi < 0, \\ 1 & \text{if } \log \frac{X(t)}{X(t-1)} \ge \pi \text{ and } \pi > 0, \\ 0 & \text{otherwise.} \end{cases}$$
2. Repeat steps 2 to 6 of Alg. 1.
3. Estimate the emission probabilities $\hat{p}_\pi(t)$, $t = 1, \ldots, n$, by equation (1).
End For

Figure 3: For the t-distributed simulation, eCDFs for time points 1998 to 2003. Data from 1998 to 2000 follow the t distribution with df = 2; data from 2001 to 2003 follow the t distribution with df = 5.

It may look surprising that an eCDF can be returned based on only one observation at each time point. Indeed, the eCDF here is different: because of the limited data information, increasing the number of line cuts $\pi$ would not, by itself, give a good estimate of the distribution. The estimated probability here is also relative; it depends on how well HFS can separate the states, and at which intensity levels. If there exists a $\pi$ by which all the conditional distributions separate well, the decoding achieves a good result via HFS. However, if the recursion line is chosen inappropriately, for example, if $\pi$ is set close to 0, or the emission probability is close to 0.5, HFS fails to separate the distributions. That is why there is a big "jump" in the middle of the eCDFs.

In summary, the continuous decoding algorithm is applied by shifting the $\pi$ value from high to low to obtain a sequence of eCDFs, although only a few values of $\pi$ are meaningful in the sense of decoding. It effectively gathers the decoding results of several discrete processes, and the aggregated information sheds light on differentiating the embedded hidden states.

As the result of Alg. 2, a sequence of eCDFs is returned. The next question is how to summarize the information among all $n$ eCDFs, or how many types of states or distributions can represent the patterns of distribution switching. Deciding the number of hidden states is a realistic problem in hidden-state models, including HFS and HMM. Generally speaking, the more states are taken into consideration, the less tractable the model becomes; for example, a 2-state lognormal Markov model contains 6 parameters, while a 3-state model has 10. Researchers may start from the simplest model and then increase the number of states in search of a well-fitting model.

Rather than starting from the simplest model, our continuous-observation decoding provides a way to consider as many states as possible at first. For example, $n$ eCDFs are returned via our approach; after that, clustering analysis is implemented to combine states with comparable distributions. Thus we transform the problem into a clustering problem: how many clusters are embedded in the $n$ probability vectors, or eCDFs?

To visualize the clustering structure, a hierarchical clustering tree is applied to the probability vectors (eCDFs); see panel (A) in Fig. 4. We continue to use the simulated data with the t distribution mentioned in Section 2. The number of clusters can be chosen based on the tree heights of the dendrogram; 2 or 3 clusters may be embedded inside. Cutting the dendrogram into 3 clusters, we show the cluster indices on the original time line in panel (B). Surprisingly, the clusters represent the alternating hidden states very well.
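The Alg. 2 sweep can be sketched as follows, reusing the hypothetical `hfs_states` helper from the earlier sketch; the way each per-recurrence estimate is spread back to the original time axis is our own simplification:

```python
import numpy as np

def encoding_and_decoding(log_returns, lines, T, T_star):
    """For each recursion line pi (symmetric coding), run HFS and record
    the estimated emission probability at every time stamp.  Returns a
    (len(lines) x n) matrix whose column t is the eCDF-like profile
    hat{p}_pi(t) for time stamp t."""
    n = len(log_returns)
    P = np.zeros((len(lines), n))
    for k, pi in enumerate(lines):
        C = (log_returns <= -pi) | (log_returns >= pi)   # symmetric events
        idx = np.flatnonzero(C)
        if len(idx) < 2:
            continue                      # too few events at this line
        R = np.diff(idx)
        states = hfs_states(R, T, T_star)
        p_hat = {s: 1.0 / R[states == s].mean() for s in np.unique(states)}
        for j, (a, b) in enumerate(zip(idx[:-1], idx[1:])):
            P[k, a:b] = p_hat[states[j]]  # spread back to the original axis
    return P
```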
If 2 clusters are taken rather than 3, it also makes sense: "State2" and "State3", the two non-high-intensity states, are combined together in contrast to "State1". Taking the vector of mean probability values as a representative eCDF of each cluster, the 3 eCDFs from the 3 clusters are shown together with the true CDFs in panels (C)-(E). The mean value from each cluster gives a good estimate of the true distribution.

Figure 4: 3-state continuous-distribution decoding of the simulated data. (A) Hierarchical clustering tree; (B) cluster index switching over time; (C), (D), (E): median eCDFs versus true CDFs in clusters 1, 2, 3, respectively.

Without prior knowledge of the 3 states embedded in the time series, if we simply implement a 2-state or 4-state decoding schedule, the clustering result looks very similar, and a 3-cluster solution can still represent the hidden states well. The clustering result for 2-state decoding is shown in Fig. 5; there is a short period (between 1000 and 2000) in which "State2" is misclassified as "State3". The result of 4-state decoding, shown in Appendix Fig. 10, is accurate. In short, the clustering result is stable with respect to the prior choice of the number of hidden states.
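The clustering step can be sketched with standard SciPy tools; the Ward linkage choice is ours, and for a long series one may want to subsample time stamps first:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_ecdfs(P, n_clusters=3):
    """Cluster per-time-stamp probability vectors (columns of P) and
    return labels, one mean eCDF per cluster, and the linkage tree."""
    X = P.T                                        # one row per time stamp
    Z = linkage(X, method="ward")                  # hierarchical tree
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    reps = np.array([X[labels == c].mean(axis=0)
                     for c in range(1, n_clusters + 1)])
    return labels, reps, Z
```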
Figure 5: 2-state continuous-distribution decoding of the simulated data. (A) Hierarchical clustering tree; (B) cluster index switching over time; (C), (D), (E): median eCDFs versus true CDFs in clusters 1, 2, 3, respectively.

In this section, we explore tick-by-tick S&P500 stock data in year 2006 as an example of a real-data application. We focus on a return process on a market time scale, which is measured by market activity rather than by the real-time clock. The idea was first suggested by [14] and then worked through by [15]; one well-known example is the random-walk model suggesting that the variance of returns depends on the number of transactions. Following this idea, we apply the finest sampling rate, one observation per transaction, so our returns are indexed by transaction order.
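Converting raw ticks to transaction-time returns is then a one-liner; a sketch assuming the trade prices arrive as a pandas series:

```python
import numpy as np
import pandas as pd

def transaction_returns(prices: pd.Series) -> pd.Series:
    """Log returns sampled once per transaction; the resulting index is
    the transaction counter rather than the wall clock."""
    r = np.log(prices.to_numpy())
    return pd.Series(np.diff(r), index=np.arange(1, len(r)))
```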
We particularly focus on the tick-by-tick IBM stock series, month by month, as an example. Our continuous-observation decoding algorithm is implemented to discover the stock dynamics. As illustrated in Section 3, our approach is not sensitive to the choice of the number of hidden states, and the number can be further determined by clustering analysis. Considering the computational burden, we apply a 2-state decoding schedule in Alg. 2 and k-means for clustering the volatility phases. It turns out that there are 3 potential clusters embedded in the stochastic process.

Taking the mean of the eCDFs as a representative of each cluster, shown in Fig. 6, cluster 1 and cluster 3 present two types of extreme phases: the distribution of cluster 3, with a heavier tail, reflects a high-volatility phase compared with that of cluster 1, while cluster 1 indicates a phase with almost no volatility. As a phase in the middle stage, cluster 2 shows an asymmetric distribution with a heavy tail on the left but a light tail on the right, whereas the other two clusters look more symmetric. This shows that an asymmetric distribution can be embedded in real data, which is usually missed in financial research.

Figure 6: IBM stock in January 2006: the mean functions of the 3 clusters of eCDFs.

We can also present the volatility dynamics by showing the cluster index varying over market time. The stochastic dynamics of the IBM stock in January 2006 are shown in Fig. 7. Corresponding to the notion above, cluster 1 indicates a low-volatility phase, cluster 2 a median stage, and cluster 3 a high-volatility phase. Based on the intraday segments, it is obvious that volatility mostly appears at the beginning of the trading day, and usually shows up two or three times per day.
Figure 7: IBM stock in January 2006: a sequence of volatility stages changing over transactions.

S&P500 stock networks
Beyond detecting the volatility dynamics of a single stock, we further consider the interactions among all S&P500 stocks. In finance, measuring the dependency between stock time series has become a subject of research interest. Much work on stock relationships in financial markets is based on correlation. In order to cover non-linear dependency, Mutual Information (MI), as an alternative, quantifies the average Shannon entropy shared by two variables. Another non-parametric measurement, Transfer Entropy (TE) [17, 18], extending the predominant concept of Granger causality [16], measures the reduction of uncertainty, in terms of Shannon entropy, in forecasting the destination variable due to past values of the source variable:
$$TE_{X \to Y} = \sum P\big(Y_t, Y_{(t-L):(t-1)}, X_{(t-L):(t-1)}\big) \log \frac{P\big(Y_t \mid Y_{(t-L):(t-1)}, X_{(t-L):(t-1)}\big)}{P\big(Y_t \mid Y_{(t-L):(t-1)}\big)}.$$
Compared with the lead-lag relationship captured by TE, MI focuses on the current time; MI can be thought of as a special case of TE with $L = 0$ (no lag effects):
$$MI = \sum P(Y_t, X_t) \log \frac{P(Y_t \mid X_t)}{P(Y_t)}.$$
We define a windowed variant,
$$X^*_t = \max\big\{ X_{(t-\lfloor w/2 \rfloor):(t+\lfloor w/2 \rfloor)} \big\},$$
$$TE^*_{X \to Y} = \sum_{X^*_t} P(Y^*_t = 3, X^*_t) \log \frac{P(Y^*_t = 3 \mid X^*_t)}{P(Y^*_t = 3)},$$
where $w$ is the window length, say $w$ = 5 seconds, and $X^*$ and $Y^*$ are the processes after taking the windowed maximum. Compared with TE, this measurement takes the neighborhood of a time point (lagged and leading values) into account instead of only lag effects. The interpretation is how much the uncertainty that Y is in its volatility phase (state 3) is reduced by the values of X nearby; the higher the value, the bigger the influence of X on the volatility phases of Y.

The goal of this experiment is to explore how the 500 stocks affect each other in their volatility phases. We apply Alg. 1 with $l$ and $u$ set at extreme quantiles to catch the extreme volatility phases. Hidden states describe the level of volatility: 1 for non-volatility, 2 for a median stage, and 3 for volatility. Fig. 8 (A) shows that the two stocks MXIM and NTAP share a large intersection of volatility phases; especially when MXIM is in its volatility state (state 3), the price of NTAP has a high probability of being volatile as well.
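A plug-in estimate of the windowed measure TE* defined above can be sketched as follows; we assume the two state sequences are already aligned on a common discrete grid, so the window w is in grid units rather than seconds:

```python
import numpy as np

def window_max(states, w):
    """Centered sliding maximum of a state sequence (window width w)."""
    half = w // 2
    n = len(states)
    return np.array([states[max(0, t - half):t + half + 1].max()
                     for t in range(n)])

def te_star(x_states, y_states, w):
    """Plug-in TE* from X to Y: how much the event {Y* = 3} (Y in its
    volatility phase) is informed by nearby values of X."""
    xs = window_max(np.asarray(x_states), w)
    ys = window_max(np.asarray(y_states), w)
    y3 = ys == 3
    p_y3 = y3.mean()
    te = 0.0
    for v in np.unique(xs):
        mask = xs == v
        p_joint = (mask & y3).mean()        # P(Y* = 3, X* = v)
        if p_joint > 0:
            p_cond = p_joint / mask.mean()  # P(Y* = 3 | X* = v)
            te += p_joint * np.log(p_cond / p_y3)
    return te
```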
The value of the dependency TE* from MXIM to NTAP is 0.039, and 0.016 in reverse. The second example, in (B), shows that the volatility of TWX affects that of BRCM: TE* from TWX to BRCM is 0.036, and 0.026 in reverse.
Figure 8: Sequences of hidden states in real time: (A) MXIM vs. NTAP; (B) TWX vs. BRCM.

Once TE* is calculated for every pair of S&P500 stocks in 2006, a directed network is established with edges representing dependency associations. Filtering out weak connections with a large threshold yields a sparse network of highly associated stocks; see the January 2006 network in Fig. 9. It shows that MXIM (Maxim Integrated Products Inc.) would affect the volatility phases of BRCM (Broadcom Corporation), EMC (EMC Corporation), and financial companies like BAC, JPM, and AIG; EBAY has an influence on SPLS (Staples Inc.), NTAP (NetApp Inc.), and BRCM. Another finding is that the networks change rapidly from month to month: with the same filter threshold, the highly dependent companies in February are mostly different, as shown in Fig. 11 in the Appendix.
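Building the thresholded directed network from the pairwise TE* matrix is then straightforward; a sketch (the threshold value tau is a tuning choice, the 0.03 below is purely illustrative):

```python
def thresholded_network(TE, tickers, tau):
    """Directed edges i -> j wherever TE[i][j] (TE* from stock i to
    stock j) exceeds the filtering threshold tau."""
    k = len(tickers)
    return [(tickers[i], tickers[j], TE[i][j])
            for i in range(k) for j in range(k)
            if i != j and TE[i][j] >= tau]

# e.g. edges = thresholded_network(TE, tickers, tau=0.03)
```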
Starting from a definition of large returns, we first propose a nonparametric approach to segment a stock process into multiple levels of volatility phases, as an extension of Hierarchical Factor Segmentation (HFS). We then advocate another data-driven method, named encoding-and-decoding, for dealing with continuous-observation time series. By further applying clustering analysis, the embedded number of hidden states can be discovered, so the stock dynamics are well represented. Our numerical experiments show that our decoding approach is comparable to Viterbi's, and that it works for both light- and heavy-tailed distributions. As a real-data application, we study the S&P500 stocks and establish networks describing the dependency relations among them. Such networks shed light on which stocks would likely stimulate, or even promote, if not cause, volatility in other linked stocks. We discover that the communities within each scale-specific network tend to be diverse across multiple industries.
Figure 9: Thresholded stock network in January 2006.

References

[1] Black, F., and Scholes, M. S. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637-654.
[2] Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4(1), 141-183.
[3] Hardy, M. R. (2001). A regime-switching model of long-term stock returns. North American Actuarial Journal, 5(2), 41-53.
[4] Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2), 357-384.
[5] Fine, S., Singer, Y., and Tishby, N. (1998). The hierarchical hidden Markov model: analysis and applications. Machine Learning, 32, 41-62.
[6] Baum, L., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1), 164-171.
[7] Hsieh, F., Hwang, C. R., Lee, H. C., Lan, Y. C., and Horng, S. B. (2006). Testing and mapping non-stationarity in animal behavioral processes: a case study on an individual female bean weevil. Journal of Theoretical Biology, 238, 805-816.
[8] Hsieh, F., Chen, S. C., and Hwang, C. R. (2012). Discovering stock dynamics through multidimensional volatility phases. Quantitative Finance, 12, 213-230.
[9] Chang, L. B., Stuart, G., Hsieh, F., and Hwang, C. R. (2013). Invariance in the recurrence of large returns and the validation of models of price dynamics. Physical Review E, 88, 022116.
[10] Chang, L. B., Goswami, A., Hsieh, F., and Hwang, C. R. (2013). An invariance for the large-sample empirical distribution of waiting time between successive extremes. Bulletin of the Institute of Mathematics Academia Sinica, 8, 31-48.
[11] Eberlein, E. (2001). Application of generalized hyperbolic Lévy motions to finance. In: Barndorff-Nielsen, O. E., Resnick, S. I., Mikosch, T. (eds), Lévy Processes. Birkhäuser, Boston, MA.
[12] Tenyakov, A. (2014). Estimation of hidden Markov models and their applications in finance. Electronic Thesis and Dissertation Repository, 2348. https://ir.lib.uwo.ca/etd/2348.
[13] Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260-269.
[14] Mandelbrot, B., and Taylor, H. M. (1967). On the distribution of stock price differences. Operations Research, 15(6), 1057-1062.
[15] Clark, P. K. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41(1), 135-155.
[16] Barnett, L., Barrett, A. B., and Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23), 238701.
[17] Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461-464.
[18] Hlaváčková-Schindler, K., Palus, M., Vejmelka, M., and Bhattacharya, J. (2007). Causality detection based on information-theoretic approaches in time series analysis. Physics Reports, 441(1), 1-46.

Supplementary

Figure 10: 4-state continuous-distribution decoding of the simulated data. (A) Hierarchical clustering tree; (B) cluster index switching over time; (C), (D), (E): median eCDFs versus true CDFs in clusters 1, 2, 3, respectively.