Unraveling S&P500 stock volatility and networks -- An encoding-and-decoding approach
Xiaodong Wang and Fushing Hsieh Department of Statistics, University of California, Davis.
Abstract
We extend the Hierarchical Factor Segmentation (HFS) algorithm to discover the multiple-volatility-state process hidden within each individual S&P500 stock's return time series. We then develop an associative measure to link stocks into directed networks at various scales of association. Such networks shed light on which stocks would likely stimulate, or even promote, if not cause, volatility in other linked stocks. Our computing endeavor starts by encoding events of large return on the original time axis, transforming the original return time series into a recurrence-time process on a discrete time axis. By adopting BIC and clustering analysis, we identify potential multiple volatility states, and then apply the extended HFS algorithm to the recurrence-time series to discover its underlying volatility state process. Our decoding approach compares favorably with Viterbi's in experiments involving both light- and heavy-tailed distributions. After mapping the volatility state process back to the original time axis, we decode and represent the dynamics of each stock. Association is measured through overlapping concurrent volatility states within a chosen window. Consequently, we establish data-driven associative networks for S&P500 stocks to discover their global dependency groupings with respect to various strengths of links.
To unravel the mystery of stock dynamics, financial researchers focus on stock returns or log returns. Consider a time series of the price of a stock, $X_t$. Black and Scholes [1] proposed in their seminal work to use stochastic processes in modeling stock prices. A geometric Brownian motion (GBM) was employed, implying that the log returns are assumed normally distributed,
$$\log \frac{X_t}{X_{t-1}} \sim N(\mu, \sigma^2).$$
Merton [2] extended Black and Scholes' model to involve time-dependent parameters affording serial correlation. Beyond the normal distribution, a broader category of models, the general Levy process or in particular the geometric Levy process model [11], is an appropriate alternative that accounts for stable distributions with heavy tails. Because of the independent-increments property of Brownian motion or a Levy process, returns over disjoint, equal-length time periods remain i.i.d., so their distribution is invariant to time scale and to time period. However, it is well known that the distribution of returns differs completely across volatility stages of a stock price. These models may fail to capture more extreme price movements [3], since the constant-volatility assumption is not satisfied.

In order to incorporate stochastic volatility in stock prices, a regime-switching or hidden-state process was proposed to govern the dynamics. That is, regime-switching models can represent a stock process that changes between a low-volatility state and a more unstable high-volatility regime, with each regime characterized by a different model or distribution. Under a Markov-chain assumption for the regime switching, the Hidden Markov Model (HMM) was implemented and appeals increasingly to researchers due to its mathematical tractability.

In the setting of a simple Markov assumption, parametric approaches and parameter estimation procedures have been discussed comprehensively. Hamilton [4] described AR- and ARCH-type models under a Markov regime-switching structure. Hardy [3] offered a Markov regime-switching lognormal model by assuming a different normal distribution within each state,
$$\log \frac{X_t}{X_{t-1}} \,\Big|\, s \sim N(\mu_s, \sigma_s^2),$$
where $s$ indicates the hidden state, $s = 1, 2, \ldots$. Fine et al. [5] developed the hierarchical HMM by putting additional sources of dependency into the model, so that every state is composed of substates with multiple levels of dependencies.

So far, a Markov process is still commonly assumed to govern the hidden states of a system and their evolution. Research on nonparametric alternatives is scarce due to their lack of tractability and the involvement of more parameters [12]. Fushing et al. [7, 8] developed a nonparametric decoding approach named Hierarchical Factor Segmentation (HFS). It was proposed to decode the hidden states underlying a lengthy discrete time series without assuming a Markov chain. After categorizing the time series with a certain threshold cut, a 0-1 binary process is obtained, with 1 indicating an event moment at which the value is above or below the threshold, and 0 marking a non-event time stamp. The idea of HFS is to segment the whole time series into stationary regimes and fit geometric distributions with different event intensities in the different regimes. With a choice of parameters, the discrete process is consequently partitioned into alternating sequences of high and low event-intensity segments.
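To make the regime-switching lognormal model concrete, here is a minimal simulation sketch; the transition matrix, drifts, and volatilities are illustrative assumptions, not values taken from any of the references:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-regime Markov-switching lognormal model:
# log(X_t / X_{t-1}) | s ~ N(mu_s, sigma_s^2), with s following a Markov chain.
A = np.array([[0.99, 0.01],          # assumed transition matrix
              [0.02, 0.98]])
mu = np.array([0.0005, -0.0005])     # assumed per-step drifts
sigma = np.array([0.01, 0.03])       # assumed low- vs high-volatility scales

n = 5000
s = np.empty(n, dtype=int)
s[0] = 0
for t in range(1, n):
    s[t] = rng.choice(2, p=A[s[t - 1]])   # regime evolves as a Markov chain

log_ret = rng.normal(mu[s], sigma[s])     # regime-dependent log returns
price = 100.0 * np.exp(np.cumsum(log_ret))
```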
Finally, AIC or BIC is implemented for model selection. However, HFS was designed only for decoding discrete processes with 2 hidden states embedded; continuous-distribution processes with multiple potential states, which are more realistic, remain untouched. In this paper, we propose an encoding-and-decoding approach to handle the hidden-state model in a nonparametric fashion. After applying a sequential encoding schedule, the time series is transformed into a series of binary processes. The HFS decoding schedule is then implemented, so the potential volatility states are discovered. We remark that our approach is totally data-driven, and it works without assuming an underlying Markov chain or any conditional distribution.

Our numerical experiments show that our HFS decoding is more favorable than Viterbi's in decoding hidden states even under HMM settings, particularly when the transition probability is low. In the end, tick-by-tick S&P500 data in 2006 are fully analysed, and a network is created with association measured through overlapping concurrent volatility states within a chosen window. The paper is organized as follows: in Section 2, we propose a new version of HFS, which can handle multiple-state decoding; in Section 3, we describe the nonparametric approach to modeling continuous-distribution processes; in Section 4, a real-data experiment on the S&P500 is carried out.
To investigate stock dynamics, we consider volatility as a temporal aggregation of rare events that have large absolute returns. Considering the recursion time, or waiting time, between successive extreme returns, Chang et al. [10] proved that the empirical distribution of recursion times converges in probability to a geometric distribution when the observations are i.i.d. or, more generally, have an exchangeable joint distribution. Under the framework of a regime-switching model, each regime can be considered a stationary time period with an exchangeable distribution. Motivated by this, a geometric-switching model is established: geometric distributions with different emission (intensity) probabilities are adopted to model the time series in different volatility periods, such as high-volatility and low-volatility ones. The underlying assumption we make here is the invariance of the recursion-time distribution within a regime; temporal correlations within a regime are outside our consideration. The assumption may hold when the duration of each regime is short enough, but that requires many more underlying regimes, and the volatility could then vary at extremely high frequency, which makes the model complex and intractable. Based on extensive permutation tests with 30-second, 1-minute, and 5-minute return scales, it is claimed in [9] that returns should only be considered exchangeable within a short time period, no longer than, for example, 20 minutes. On the other hand, we expect longer volatility periods to afford more samples so that our geometric distributions achieve a good fit.

The threshold used to define "large" returns, or rare events, plays an important role here. If its value is set very extreme, only extremely large returns get filtered out, so the decoding problem is relatively simple in terms of the number of hidden states: two-state decoding usually works well in differentiating extreme-volatility from non-extreme-volatility regimes. But useful information can get lost with the sacrifice of sample size. If, instead, the threshold is set very small, a complicated case with more hidden states and a higher switching frequency must be handled. There is an inevitable trade-off between the magnitude of the rare events and the sample size.

Specifically, a recursion process $C(t)$ is defined as
$$C(t) = \begin{cases} 1 & \text{if } \log \frac{X(t)}{X(t-1)} \le l \text{ or } \ge u, \\ 0 & \text{otherwise,} \end{cases}$$
where $l$ and $u$ are, for instance, the 0.1 and 0.9 quantiles of the log returns, respectively. Obviously, many generalizations are possible, including single-tail excursions: $l = 0$ gives upper-tail excursions and $u = 1$ gives lower-tail excursions.

One simple way to define a "large" return, or an event, is to choose a recursion line $\pi$ that heuristically makes the emission probabilities as different as possible. This heuristic difference can be expressed through the standard deviation of the estimated emission probabilities $\hat{p}$:
$$\pi^* = \arg\max_\pi \operatorname{std}(\hat{p}(\pi)).$$
A precise discussion of the choice of excursion line is given in the next section.
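The event coding $C(t)$ and the recurrence-time transform can be sketched as follows; the function names are ours, and the 0.1/0.9 quantile levels follow the example in the text:

```python
import numpy as np

def encode_events(log_returns, lower_q=0.1, upper_q=0.9):
    """0-1 event string: 1 where the log return falls in either tail
    beyond the chosen quantiles (a 'large' return), 0 otherwise."""
    l, u = np.quantile(log_returns, [lower_q, upper_q])
    return ((log_returns <= l) | (log_returns >= u)).astype(int)

def recurrence_times(events):
    """Waiting times between successive 1s of a 0-1 event string."""
    idx = np.flatnonzero(events)
    return np.diff(idx)

# usage sketch:
# C = encode_events(log_ret); R = recurrence_times(C)
```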
In this section, we generalize HFS to decode multiple state phases. Denote the number of internal states by $m$, $m \ge 2$; the algorithm is described as follows.

Alg. 1
HFS multiple-state decoding

1. Define events and code the whole process as a 0-1 digital string $\{C(t)\}_{t=1}^n$, with 1 for an event and 0 for a non-event point.
2. Calculate the event recurrence times of $\{C(t)\}_{t=1}^n$ and denote the sequence of recurrence times by $\{R(t)\}_t$.
3. Choose a sequence of thresholds $T = (T_1, T_2, \ldots, T_{m-1})$ with $T_1 < T_2 < \ldots < T_{m-1}$. Transform the $\{R(t)\}_t$ sequence into 0-1 digital strings $\{C_i(t)\}_t$, $i = 1, \ldots, m-1$, via the second-level coding scheme:
$$C_i(t) = \begin{cases} 1 & \text{if } R(t) \ge T_i, \\ 0 & \text{otherwise.} \end{cases}$$
4. For each $\{C_i(t)\}_t$, take code digit 1 as another new event and recalculate the event recurrence time sequence $\{R_i(t)\}_t$, $i = 1, \ldots, m-1$.
5. Choose another threshold value $T^*$ to transform $\{R_i(t)\}_t$ into another digital string, for $i = 1, \ldots, m-1$: if a recursion time $R_i(t) \ge T^*$, pick the corresponding time period in $C_i(t)$ and denote it as a segment $Seg_i$, $Seg_i \subset \{1, \ldots, n\}$.
6. The ultimate internal states are defined as $S_1 = Seg_1$, $S_2 = Seg_2 \setminus Seg_1$, ..., $S_{m-1} = Seg_{m-1} \setminus Seg_{m-2}$, and $S_m = \{1, \ldots, n\} \setminus Seg_{m-1}$.

The pair of partition parameters $(T_i, T^*)$ achieves an aggregating partition between high and low event-intensity segments for the recursion times $\{R(t)\}_t$, and the intensity level is controlled by $T_i$, $i = 1, \ldots, m-1$. If $T_i$ is taken at its maximum $T_{m-1}$, the partition is designed to separate an extreme-high-intensity segment from the other, relatively low-intensity segments. Decreasing the value of $T_i$ from $T_{m-1}$ to $T_1$ implements a series of partitions, so that multiple intensity levels of phases are achieved.

In the second level of recursion-time calculation, we have the recursion times $\{R_i(t)\}_t$ of $\{C_i(t)\}_t$. If $R_i(t)$ is above $T^*$, it reflects a long period of time in which no event happens in $C_i(t)$. We denote this waiting-time period (the time between the two events in $C_i(t)$) as a low-intensity phase $Seg_i$. So, for a given $T_i$, $T^*$ is set to decide which phases have relatively low intensity, and the rest are high-intensity phases. Notice that $Seg_i \subset Seg_j$ for $j > i$: if a recursion time $R(t)$ is greater than $T_j$, it is also greater than $T_i$, so $C_j$ has fewer events than $C_i$; after applying the same parameter $T^*$, the time period $Seg_j$ obtained with $T_j$ is wider and contains $Seg_i$.

Multiple volatility phases $S_i$, $i = 1, \ldots, m$, of the time series are computed by applying HFS. Assuming the joint distribution is exchangeable within each volatility phase, the distribution of $\{R(t)\}_{t \in S_i}$ converges to a geometric distribution with parameter $p_i$ as the number of recursions goes to infinity, where $p_i$ is the emission probability of an event of a large return. Maximum Likelihood Estimation (MLE) and the Method of Moments (MOM) give the same estimate,
$$\hat{p}_i = \frac{n_i}{\sum_{t \in S_i} R(t)}, \qquad i = 1, \ldots, m, \tag{1}$$
where $n_i$ denotes the number of recursion times falling in $S_i$. HFS thus advocates a way to partition the process into alternating segments with $m$ event intensities. With enough sample size, geometric distributions are appropriate to model the $m$ phases, with $\hat{p}_i$ estimating the event intensity in phase $i$.

There are $m$ parameters involved in total, $\theta = (T_1, \ldots, T_{m-1}, T^*)$. Model selection can be done via information criteria such as AIC and BIC.
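A rough sketch of steps 2-6 of Alg. 1 on the recurrence-time axis, together with the emission-probability estimate of equation (1); the segment boundary convention and the default state label are our own simplifications of the description above:

```python
import numpy as np

def hfs_states(R, T, T_star):
    """Multi-state HFS decoding of a recurrence-time series R.
    T: increasing thresholds (T_1 < ... < T_{m-1}); returns a label in
    {1, ..., m} for each index of R, following S_1 = Seg_1,
    S_i = Seg_i minus Seg_{i-1}, and S_m = complement of Seg_{m-1}."""
    n = len(R)
    m = len(T) + 1
    states = np.full(n, m, dtype=int)     # default: state m
    taken = np.zeros(n, dtype=bool)
    for i, Ti in enumerate(T, start=1):
        Ci = R >= Ti                      # second-level coding
        idx = np.flatnonzero(Ci)
        seg = np.zeros(n, dtype=bool)
        for a, b in zip(idx[:-1], idx[1:]):
            if b - a >= T_star:           # a long stretch without C_i-events
                seg[a + 1:b] = True       # ... becomes part of Seg_i
        new = seg & ~taken                # S_i = Seg_i minus earlier segments
        states[new] = i
        taken |= seg
    return states

def emission_probs(R, states, m):
    """Equation (1): geometric intensity estimate within each state."""
    return np.array([1.0 / R[states == i].mean() if np.any(states == i)
                     else np.nan for i in range(1, m + 1)])
```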
The resulting loss function can be written as
$$\operatorname{Loss}(S_i^\theta) = -\sum_{i=1}^m \Big[ \sum_{t \in S_i^\theta} C(t) \log \hat{p}_i + \sum_{t \in S_i^\theta} \big(1 - C(t)\big) \log \big(1 - \hat{p}_i\big) \Big] + kN, \tag{2}$$
where $C(t)$ is the 0-1 discrete process obtained after applying an event line; $m$ is the number of embedded phases or hidden states; $N$ is the total number of alternating segments, $N \ge m$; and $k$ is the coefficient of the penalty term. For instance, $k = 1$ gives the AIC model selection criterion, while $k = \log(n)$ gives the BIC criterion. Indeed, $k$ can be chosen based on empirical experiments; in this paper, we consistently use BIC in all experiments. The optimal parameters $\theta^*$ are tuned so that the loss achieves its minimum,
$$\theta^* = \arg\min_\theta \operatorname{Loss}(S_i^\theta). \tag{3}$$
Currently, a grid search is implemented to solve this non-convex problem; the time complexity is $O(nG^m)$, where $G$ is the number of grid points.

Experiments are done in two scenarios. In the first case, data are generated according to a Hidden Markov Model (HMM) with 2 hidden states, and the decoding error rate is calculated to empirically compare the HFS algorithm with the Viterbi path. In the second case, data are simulated under a regime-switching model with the Markov property violated; as an example of multiple-state decoding, 3 hidden states are embedded in the underlying model.

For the application of the Viterbi path, since the true transition probabilities $A$ and emission probabilities $B$ are unknown in reality, the EM algorithm of Baum-Welch type [6] is first implemented to estimate the parameters. Generally, the initial conditions are chosen randomly; prior information about the parameters helps if it is available. Here, the true emission probabilities are used as initial values, but with two kinds of transition-probability initials: one is the true transition matrix and the other is a random initial $A_0$. For convenience, we set $p_{12} = p_{21}$ in the transition probability matrix $A$ by which the data are generated, so $A$ is controlled by a single parameter,
$$A = \begin{pmatrix} 1 - p_{12} & p_{12} \\ p_{12} & 1 - p_{12} \end{pmatrix},$$
with emission probabilities $B = (p_1, p_2)$. The experiments are repeated 100 times, and the mean and standard deviation of the decoding error rate are reported in Table 1. The decoding errors improve when $p_{12}$ is lower, and HFS improves dramatically as $p_{12}$ decreases to its smallest value considered. The reason can be explained by the loss function of HFS: the BIC criterion favours a "simple" model with fewer alternating segments, and a long extension of each segment improves the goodness-of-fit of the geometric distribution. In this setting, the error rate of HFS gets slightly worse when the sample size is greater. In summary, without assuming any prior knowledge, HFS is competitive with Viterbi's even under Markov models, a situation unfair to HFS, and HFS is more favourable especially when the transition probability is low. Besides, the choice of initial values raises a big challenge to the stability of Viterbi's or Baum-Welch's procedures in real applications.
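For reference, the Viterbi baseline in Table 1 can be computed with the standard log-space dynamic program; this minimal version for 0-1 observations with Bernoulli emissions assumes $A$ and $B$ are given (in the experiments they come from Baum-Welch estimation):

```python
import numpy as np

def viterbi(obs, A, B, pi=(0.5, 0.5)):
    """Most likely state path for a 0-1 observation sequence.
    A[i, j] = P(s_t = j | s_{t-1} = i); B[s] = P(obs = 1 | state s)."""
    obs = np.asarray(obs, dtype=int)
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    logA = np.log(A)
    logB = np.log(np.column_stack([1.0 - B, B]))   # logB[s, o]
    n, m = len(obs), len(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = np.zeros((n, m), dtype=int)
    for t in range(1, n):
        cand = delta[:, None] + logA               # cand[i, j]: from i to j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + logB[:, obs[t]]
    path = np.empty(n, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n - 1, 0, -1):                  # backtrack
        path[t - 1] = back[t, path[t]]
    return path
```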
Table 1: Decoding error rate: mean (standard deviation) over 100 repetitions (na: not available).

                     |       p1=0.2, p2=0.1                      |       p1=0.1, p2=0.05
                     | Viterbi(A0)     Viterbi(Atrue)  HFS       | Viterbi(A0)     Viterbi(Atrue)  HFS
p12=0.005  N=2000    | 0.4358 (na)     0.2378 (0.114)  na (0.093)| 0.4550 (na)     0.3726 (0.110)  na (0.090)
           N=5000    | 0.3940 (0.054)  na (na)         na (0.077)| na (na)         0.3171 (0.097)  na (na)
p12=0.01   N=2000    | 0.4405 (na)     na (0.069)      0.3376 (0.109)| 0.4722 (na)  na (0.082)      0.3710 (0.089)
           N=5000    | 0.4504 (na)     na (0.058)      0.3584 (0.048)| 0.4889 (na)  0.4219 (0.076)  na (0.078)
p12=0.05   N=2000    | na (na)         0.4526 (0.045)  0.4518 (0.034)| 0.4718 (na)  0.4825 (0.037)  na (0.051)
           N=5000    | 0.4398 (na)     na (0.051)      0.4752 (0.029)| 0.4751 (0.021) 0.4832 (0.033) na (na)
p12=0.1    N=2000    | na (0.019)      0.4543 (0.034)  0.4728 (na)| na (0.025)      0.4778 (0.032)  0.4744 (na)
           N=5000    | na (na)         0.4650 (0.026)  0.4920 (0.025)| na (na)      0.4792 (0.031)  0.4853 (0.022)

In the second scenario, we generate data that violate the Markov property. The data (log returns) $\{Y(t)\}_{t=1}^{8000}$ are generated with 3 hidden states embedded and 8 alternating segments over time:
$$S(t) = \begin{cases} 1 & t \in [1,1000] \cup [4001,5000] \cup [7001,8000], \\ 2 & t \in [1001,2000] \cup [3001,4000] \cup [6001,7000], \\ 3 & t \in [2001,3000] \cup [5001,6000]. \end{cases}$$
The order of the hidden states alternates as 1, 2, 3, 2, 1, 3, 2, 1, and each segment extends for 1000 time units; see Fig. 1. Data points are simulated from a conditional distribution given the hidden state. A normal distribution with mean 0 and a heavy-tailed case, the Student-t distribution, are considered:
$$Y(t) \sim \begin{cases} N(0, \sigma_1^2) & S(t) = 1, \\ N(0, \sigma_2^2) & S(t) = 2, \\ N(0, \sigma_3^2) & S(t) = 3, \end{cases}$$
with standard deviations $\sigma_1 = 1$, $\sigma_2 = 2$, and $\sigma_3 = 3$, and
$$Y(t) \sim \begin{cases} t(df_1) & S(t) = 1, \\ t(df_2) & S(t) = 2, \\ t(df_3) & S(t) = 3, \end{cases}$$
with degrees of freedom $df_1 = 1$, $df_2 = 2$, and $df_3 = 5$ in the simulation.

The simulated raw data and the decoding results are shown in Fig. 1. The recursion events are defined by the formula for $C(t)$ above. In the normal-distribution setting, $|u| = |l| = 2$, approximately the 0.9 quantile of the data; in the t-distribution setting, a larger value $|u| = |l| = 3$ is used, approximately the 0.95 quantile.
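The simulated series can be generated along these lines (a sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

order = [1, 2, 3, 2, 1, 3, 2, 1]           # 8 segments of 1000 points each
S = np.repeat(order, 1000)                 # hidden state S(t), t = 1..8000

sigma = np.array([np.nan, 1.0, 2.0, 3.0])  # sigma[s] for s = 1, 2, 3
df = np.array([0, 1, 2, 5])                # df[s] for s = 1, 2, 3

Y_normal = rng.normal(0.0, sigma[S])       # normal scenario
Y_t = rng.standard_t(df[S])                # Student-t scenario
```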
Figure 1: Data are simulated from a conditional distribution given a hidden state: (A) normal distribution, (B) Student-t distribution. 8 underlying phases alternate over time, among which 3 kinds of hidden states are embedded. The red lines in (A) indicate the recursion event lines.

From Fig. 2, HFS gives a good result in decoding the time series when the Markov property is violated; errors only appear around the times when the phases change. Besides, we can also compare the estimated emission probabilities $\hat{p} = (\hat{p}_1, \hat{p}_2, \hat{p}_3)$ with the ground truth, both in the normal-distribution simulation and in the
t-distribution simulation. They are close to the true parameters, $p^*_i = 2\Psi(-2/\sigma_i)$ in the normal case and $p^*_i = 2F_{df_i}(-3)$ in the t case, where $\Psi$ and $F_{df}$ denote the standard normal and Student-t cumulative distribution functions.

Figure 2: Simulation with the normal distribution (A)(B) and the Student-t distribution (C)(D). (A), (C): recursion times; (B), (D): raw data with color-coded decoding states; red, yellow, and pink indicate the 3 different kinds of states.

Continuous observation hidden model
As described in Section 2, the choice of threshold in defining an event, or large returns, is of vital importance. The decoding result has a different meaning for each choice of threshold, and the threshold is usually set very high in order to catch the extreme movements. One natural question to ask is: if the threshold fluctuates, is the decoding result of the 0-1 processes still stable? The answer is positive. For example, if an event is defined as a variable falling below a threshold $\pi$, the intensity parameter of the geometric distribution is $p_s(\pi) = F_s(\pi)$ given a state index $s$. Assuming continuity of the true distribution $F_s$, $p_s$ is also continuous in $\pi$, so the emission probability of a hidden state does not fluctuate much if $\pi$ varies slightly. Indeed, our experiments show that our estimated emission probability is stable in $\pi$.

The idea for dealing with a stochastic process with continuous observations is described in Alg. 2. We first shift the recursion line from an extreme value toward 0 to discretize the time series into a sequence of 0-1 binary processes, then implement HFS for all processes and record the estimated emission probabilities. As a consequence, a vector of probabilities $p(\pi)$ over the different choices of $\pi$ is obtained for each time stamp. It actually gives an estimate of a cumulative distribution function (CDF), $\hat{F}(\pi) = \hat{p}(\pi)$. As a continuation of the simulated example with the t distribution, Fig. 3 shows the resulting series of empirical CDFs varying over time.

Alg. 2
Encoding-and-Decoding

For each threshold $\pi$ in $\Pi$:
1. Define events and code the whole process as a 0-1 digital string $\{C_\pi(t)\}_{t=1}^n$, with either the symmetric coding
$$C_\pi(t) = \begin{cases} 1 & \text{if } \log \frac{X(t)}{X(t-1)} \le -\pi \text{ or } \ge \pi, \\ 0 & \text{otherwise,} \end{cases}$$
or the asymmetric coding
$$C_\pi(t) = \begin{cases} 1 & \text{if } \log \frac{X(t)}{X(t-1)} \le \pi \text{ and } \pi < 0, \\ 1 & \text{if } \log \frac{X(t)}{X(t-1)} \ge \pi \text{ and } \pi > 0, \\ 0 & \text{otherwise.} \end{cases}$$
2. Repeat steps 2 to 6 of Alg. 1.
3. Estimate the emission probabilities $\hat{p}_\pi(t)$, $t = 1, \ldots, n$, by equation (1).
End For

Figure 3: For the t-distributed simulation, eCDFs for time points 1998 to 2003. Data from 1998 to 2000 follow the t distribution with df = 2; data from 2001 to 2003 follow the t distribution with df = 5.

It may look surprising that an eCDF can be returned based on only one observation at each time point. Indeed, the eCDF here is different: because of the limited data information, increasing the number of line cuts $\pi$ would not, by itself, give a good estimate of the distribution. The estimated probability here is also relative; it depends on how well HFS can separate the states, and at which intensity levels. If there exists a $\pi$ by which all the conditional distributions separate well, the decoding achieves a good result via HFS. However, if the recursion line is chosen inappropriately, for example, if $\pi$ is set close to 0, or the emission probability is close to 0.5, HFS fails to separate the distributions. That is why there is a big "jump" in the middle of the eCDFs.

In summary, the continuous decoding algorithm is applied by shifting the $\pi$ value from high to low to obtain a sequence of eCDFs, although only a few values of $\pi$ are meaningful in the sense of decoding. It effectively gathers the decoding results of several discrete processes, and the aggregated information sheds light on differentiating the embedded hidden states.

As the result of Alg. 2, a sequence of eCDFs is returned. The next question is how to summarize the information among all $n$ eCDFs, or how many types of states or distributions can represent the patterns of distribution switching. Deciding the number of hidden states is a realistic problem in hidden-state models, including HFS and HMM. Generally speaking, the more states are taken into consideration, the less tractable the model becomes; for example, a 2-state lognormal Markov model contains 6 parameters, while a 3-state model has 10. Researchers may start from the simplest model and then increase the number of states in search of a well-fitting model.

Rather than starting from the simplest model, our continuous-observation decoding provides a way to consider as many states as possible at first. For example, $n$ eCDFs are returned via our approach; after that, clustering analysis is implemented to combine states with comparable distributions. Thus we transform the problem into a clustering problem: how many clusters are embedded in the $n$ probability vectors, or eCDFs?

To visualize the clustering structure, a hierarchical clustering tree is applied to the probability vectors (eCDFs); see panel (A) in Fig. 4. We continue to use the simulated data with the t distribution mentioned in Section 2. The number of clusters can be chosen based on the tree heights of the dendrogram; 2 or 3 clusters may be embedded inside. Cutting the dendrogram into 3 clusters, we show the cluster indices on the original time line in panel (B). Surprisingly, the clusters represent the alternating hidden states very well.
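The Alg. 2 sweep can be sketched as follows, reusing the hypothetical `hfs_states` helper from the earlier sketch; the way each per-recurrence estimate is spread back to the original time axis is our own simplification:

```python
import numpy as np

def encoding_and_decoding(log_returns, lines, T, T_star):
    """For each recursion line pi (symmetric coding), run HFS and record
    the estimated emission probability at every time stamp.  Returns a
    (len(lines) x n) matrix whose column t is the eCDF-like profile
    hat{p}_pi(t) for time stamp t."""
    n = len(log_returns)
    P = np.zeros((len(lines), n))
    for k, pi in enumerate(lines):
        C = (log_returns <= -pi) | (log_returns >= pi)   # symmetric events
        idx = np.flatnonzero(C)
        if len(idx) < 2:
            continue                      # too few events at this line
        R = np.diff(idx)
        states = hfs_states(R, T, T_star)
        p_hat = {s: 1.0 / R[states == s].mean() for s in np.unique(states)}
        for j, (a, b) in enumerate(zip(idx[:-1], idx[1:])):
            P[k, a:b] = p_hat[states[j]]  # spread back to the original axis
    return P
```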
If 2 clusters are taken rather than 3, it also makes sense: "State2" and "State3", the two non-high-intensity states, are combined together in contrast to "State1". Taking the vector of mean probability values as a representative eCDF of each cluster, the 3 eCDFs from the 3 clusters are shown together with the true CDFs in panels (C)-(E). The mean value from each cluster gives a good estimate of the true distribution.

Figure 4: 3-state continuous-distribution decoding of the simulated data. (A) Hierarchical clustering tree; (B) cluster index switching over time; (C), (D), (E): median eCDFs versus true CDFs in clusters 1, 2, 3, respectively.

Without prior knowledge of the 3 states embedded in the time series, if we simply implement a 2-state or 4-state decoding schedule, the clustering result looks very similar, and a 3-cluster solution can still represent the hidden states well. The clustering result for 2-state decoding is shown in Fig. 5; there is a short period (between 1000 and 2000) in which "State2" is misclassified as "State3". The result of 4-state decoding, shown in Appendix Fig. 10, is accurate. In short, the clustering result is stable with respect to the prior choice of the number of hidden states.
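The clustering step can be sketched with standard SciPy tools; the Ward linkage choice is ours, and for a long series one may want to subsample time stamps first:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_ecdfs(P, n_clusters=3):
    """Cluster per-time-stamp probability vectors (columns of P) and
    return labels, one mean eCDF per cluster, and the linkage tree."""
    X = P.T                                        # one row per time stamp
    Z = linkage(X, method="ward")                  # hierarchical tree
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    reps = np.array([X[labels == c].mean(axis=0)
                     for c in range(1, n_clusters + 1)])
    return labels, reps, Z
```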
Figure 5: 2-state continuous-distribution decoding of the simulated data. (A) Hierarchical clustering tree; (B) cluster index switching over time; (C), (D), (E): median eCDFs versus true CDFs in clusters 1, 2, 3, respectively.

In this section, we explore tick-by-tick S&P500 stock data in year 2006 as an example of a real-data application. We focus on a return process on a market time scale, which is measured by market activity rather than by the real-time clock. The idea was first suggested by [14] and then worked through by [15]; one well-known example is the random-walk model suggesting that the variance of returns depends on the number of transactions. Following this idea, we apply the finest sampling rate, one observation per transaction, so our returns are indexed by transaction order.
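Converting raw ticks to transaction-time returns is then a one-liner; a sketch assuming the trade prices arrive as a pandas series:

```python
import numpy as np
import pandas as pd

def transaction_returns(prices: pd.Series) -> pd.Series:
    """Log returns sampled once per transaction; the resulting index is
    the transaction counter rather than the wall clock."""
    r = np.log(prices.to_numpy())
    return pd.Series(np.diff(r), index=np.arange(1, len(r)))
```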
We particularly focus on the tick-by-tick IBM stock series, month by month, as an example. Our continuous-observation decoding algorithm is implemented to discover the stock dynamics. As illustrated in Section 3, our approach is not sensitive to the choice of the number of hidden states, and the number can be further determined by clustering analysis. Considering the computational burden, we apply a 2-state decoding schedule in Alg. 2 and k-means for clustering the volatility phases. It turns out that there are 3 potential clusters embedded in the stochastic process.

Taking the mean of the eCDFs as a representative of each cluster, shown in Fig. 6, cluster 1 and cluster 3 present two types of extreme phases: the distribution of cluster 3, with a heavier tail, reflects a high-volatility phase compared with that of cluster 1, while cluster 1 indicates a phase with almost no volatility. As a phase in the middle stage, cluster 2 shows an asymmetric distribution with a heavy tail on the left but a light tail on the right, whereas the other two clusters look more symmetric. This shows that an asymmetric distribution can be embedded in real data, which is usually missed in financial research.

Figure 6: IBM stock in January 2006: the mean functions of the 3 clusters of eCDFs.

We can also present the volatility dynamics by showing the cluster index varying over market time. The stochastic dynamics of the IBM stock in January 2006 are shown in Fig. 7. Corresponding to the notion above, cluster 1 indicates a low-volatility phase, cluster 2 a median stage, and cluster 3 a high-volatility phase. Based on the intraday segments, it is obvious that volatility mostly appears at the beginning of the trading day, and usually shows up two or three times per day.
Figure 7: IBM stock in January 2006: a sequence of volatility stages changing over transactions.

S&P500 stock networks
Beyond detecting the volatility dynamics of a single stock, we further consider the interactions among all S&P500 stocks. In finance, measuring the dependency between stock time series has become a subject of research interest. Much work on stock relationships in financial markets is based on correlation. In order to cover non-linear dependency, Mutual Information (MI), as an alternative, quantifies the average Shannon entropy shared by two variables. Another non-parametric measurement, Transfer Entropy (TE) [17, 18], extending the predominant concept of Granger causality [16], measures the reduction of uncertainty, in terms of Shannon entropy, in forecasting the destination variable due to past values of the source variable:
$$TE_{X \to Y} = \sum P\big(Y_t, Y_{(t-L):(t-1)}, X_{(t-L):(t-1)}\big) \log \frac{P\big(Y_t \mid Y_{(t-L):(t-1)}, X_{(t-L):(t-1)}\big)}{P\big(Y_t \mid Y_{(t-L):(t-1)}\big)}.$$
Compared with the lead-lag relationship captured by TE, MI focuses on the current time; MI can be thought of as a special case of TE with $L = 0$ (no lag effects):
$$MI = \sum P(Y_t, X_t) \log \frac{P(Y_t \mid X_t)}{P(Y_t)}.$$
We define a windowed variant,
$$X^*_t = \max\big\{ X_{(t-\lfloor w/2 \rfloor):(t+\lfloor w/2 \rfloor)} \big\},$$
$$TE^*_{X \to Y} = \sum_{X^*_t} P(Y^*_t = 3, X^*_t) \log \frac{P(Y^*_t = 3 \mid X^*_t)}{P(Y^*_t = 3)},$$
where $w$ is the window length, say $w$ = 5 seconds, and $X^*$ and $Y^*$ are the processes after taking the windowed maximum. Compared with TE, this measurement takes the neighborhood of a time point (lagged and leading values) into account instead of only lag effects. The interpretation is how much the uncertainty that Y is in its volatility phase (state 3) is reduced by the values of X nearby; the higher the value, the bigger the influence of X on the volatility phases of Y.

The goal of this experiment is to explore how the 500 stocks affect each other in their volatility phases. We apply Alg. 1 with $l$ and $u$ set at extreme quantiles to catch the extreme volatility phases. Hidden states describe the level of volatility: 1 for non-volatility, 2 for a median stage, and 3 for volatility. Fig. 8 (A) shows that the two stocks MXIM and NTAP share a large intersection of volatility phases; especially when MXIM is in its volatility state (state 3), the price of NTAP has a high probability of being volatile as well.
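A plug-in estimate of the windowed measure TE* defined above can be sketched as follows; we assume the two state sequences are already aligned on a common discrete grid, so the window w is in grid units rather than seconds:

```python
import numpy as np

def window_max(states, w):
    """Centered sliding maximum of a state sequence (window width w)."""
    half = w // 2
    n = len(states)
    return np.array([states[max(0, t - half):t + half + 1].max()
                     for t in range(n)])

def te_star(x_states, y_states, w):
    """Plug-in TE* from X to Y: how much the event {Y* = 3} (Y in its
    volatility phase) is informed by nearby values of X."""
    xs = window_max(np.asarray(x_states), w)
    ys = window_max(np.asarray(y_states), w)
    y3 = ys == 3
    p_y3 = y3.mean()
    te = 0.0
    for v in np.unique(xs):
        mask = xs == v
        p_joint = (mask & y3).mean()        # P(Y* = 3, X* = v)
        if p_joint > 0:
            p_cond = p_joint / mask.mean()  # P(Y* = 3 | X* = v)
            te += p_joint * np.log(p_cond / p_y3)
    return te
```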
The value of the dependency TE* from MXIM to NTAP is 0.039, and 0.016 in reverse. The second example, in (B), shows that the volatility of TWX affects that of BRCM: TE* from TWX to BRCM is 0.036, and 0.026 in reverse.
Figure 8: Sequences of hidden states in real time: (A) MXIM vs. NTAP; (B) TWX vs. BRCM.

Once TE* is calculated for every pair of S&P500 stocks in 2006, a directed network is established with edges representing dependency associations. Filtering out weak connections with a large threshold yields a sparse network of highly associated stocks; see the January 2006 network in Fig. 9. It shows that MXIM (Maxim Integrated Products Inc.) would affect the volatility phases of BRCM (Broadcom Corporation), EMC (EMC Corporation), and financial companies like BAC, JPM, and AIG; EBAY has an influence on SPLS (Staples Inc.), NTAP (NetApp Inc.), and BRCM. Another finding is that the networks change rapidly from month to month: with the same filter threshold, the highly dependent companies in February are mostly different, as shown in Fig. 11 in the Appendix.
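Building the thresholded directed network from the pairwise TE* matrix is then straightforward; a sketch (the threshold value tau is a tuning choice, the 0.03 below is purely illustrative):

```python
def thresholded_network(TE, tickers, tau):
    """Directed edges i -> j wherever TE[i][j] (TE* from stock i to
    stock j) exceeds the filtering threshold tau."""
    k = len(tickers)
    return [(tickers[i], tickers[j], TE[i][j])
            for i in range(k) for j in range(k)
            if i != j and TE[i][j] >= tau]

# e.g. edges = thresholded_network(TE, tickers, tau=0.03)
```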
Starting from a definition of large returns, we first propose a nonparametric approach to segment a stock process into multiple levels of volatility phases, as an extension of Hierarchical Factor Segmentation (HFS). We then advocate another data-driven method, named encoding-and-decoding, for dealing with continuous-observation time series. By further applying clustering analysis, the embedded number of hidden states can be discovered, so the stock dynamics are well represented. Our numerical experiments show that our decoding approach is comparable to Viterbi's, and that it works for both light- and heavy-tailed distributions. As a real-data application, we study the S&P500 stocks and establish networks describing the dependency relations among them. Such networks shed light on which stocks would likely stimulate, or even promote, if not cause, volatility in other linked stocks. We discover that the communities within each scale-specific network tend to be diverse across multiple industries.
Figure 9: Thresholded stock network in January 2006.

References

[1] Black, F., and Scholes, M. S. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637-654.
[2] Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4(1), 141-183.
[3] Hardy, M. R. (2001). A regime-switching model of long-term stock returns. North American Actuarial Journal, 5(2), 41-53.
[4] Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2), 357-384.
[5] Fine, S., Singer, Y., and Tishby, N. (1998). The hierarchical hidden Markov model: analysis and applications. Machine Learning, 32, 41-62.
[6] Baum, L., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1), 164-171.
[7] Hsieh, F., Hwang, C. R., Lee, H. C., Lan, Y. C., and Horng, S. B. (2006). Testing and mapping non-stationarity in animal behavioral processes: a case study on an individual female bean weevil. Journal of Theoretical Biology, 238, 805-816.
[8] Hsieh, F., Chen, S. C., and Hwang, C. R. (2012). Discovering stock dynamics through multidimensional volatility phases. Quantitative Finance, 12, 213-230.
[9] Chang, L. B., Stuart, G., Hsieh, F., and Hwang, C. R. (2013). Invariance in the recurrence of large returns and the validation of models of price dynamics. Physical Review E, 88, 022116.
[10] Chang, L. B., Goswami, A., Hsieh, F., and Hwang, C. R. (2013). An invariance for the large-sample empirical distribution of waiting time between successive extremes. Bulletin of the Institute of Mathematics Academia Sinica, 8, 31-48.
[11] Eberlein, E. (2001). Application of generalized hyperbolic Lévy motions to finance. In: Barndorff-Nielsen, O. E., Resnick, S. I., Mikosch, T. (eds), Lévy Processes. Birkhäuser, Boston, MA.
[12] Tenyakov, A. (2014). Estimation of hidden Markov models and their applications in finance. Electronic Thesis and Dissertation Repository, 2348. https://ir.lib.uwo.ca/etd/2348.
[13] Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260-269.
[14] Mandelbrot, B., and Taylor, H. M. (1967). On the distribution of stock price differences. Operations Research, 15(6), 1057-1062.
[15] Clark, P. K. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41(1), 135-155.
[16] Barnett, L., Barrett, A. B., and Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23), 238701.
[17] Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461-464.
[18] Hlaváčková-Schindler, K., Palus, M., Vejmelka, M., and Bhattacharya, J. (2007). Causality detection based on information-theoretic approaches in time series analysis. Physics Reports, 441(1), 1-46.

Supplementary

Figure 10: 4-state continuous-distribution decoding of the simulated data. (A) Hierarchical clustering tree; (B) cluster index switching over time; (C), (D), (E): median eCDFs versus true CDFs in clusters 1, 2, 3, respectively.