A micro-to-macro approach to returns, volumes and waiting times

Abstract

Fundamental variables in financial markets are not only price and return: trading volumes also play a very important role. Here we propose a new multivariate model that takes into account price returns, logarithmic variations of trading volumes and waiting times, the latter understood as the time interval between changes in trades, price, and volume of stocks. Our approach is based on a generalization of semi-Markov chains in which an endogenous index process is introduced. We also take into account the dependence structure between the above-mentioned variables by means of copulae. The proposed model is motivated by empirical evidence that is known in the financial literature and that is also confirmed in this work by analysing real data from the Italian stock market in the period August 2015 - August 2017. By using Monte Carlo simulations, we show that the model reproduces all this empirical evidence.


Guglielmo D'Amico · Filippo Petroni


Keywords high frequency data; semi-Markov; copula function.


G. D'Amico
Department of Pharmacy, Università 'G. d'Annunzio' di Chieti-Pescara, Italy

F. Petroni
Department of Management, Università Politecnica delle Marche, Ancona, Italy

arXiv [q-fin.ST]

In financial markets, high frequency data and modelling have acquired a dominant role owing to the relevant information carried by intra-day observations. Nowadays, ever more sophisticated models and ideas can be advanced and tested on real market data, thanks to the huge amount of information that can be stored and processed on modern computers.

A large part of the effort in market microstructure studies has been devoted to understanding, mimicking and predicting the basic empirical regularities observed in the most important financial variables. Special attention has been dedicated to the relation between financial volumes and returns. The majority of the works in this area can be classified within the so-called econometric framework, sometimes also referred to as the macro-to-micro approach. The cornerstone of this approach is to consider the observed price as a collateral effect of an unobservable volatility process to which a noise process transformation is applied, see e.g. [5]. This line of research has flourished during the last two decades, and considerable attention has been dedicated to the problem of the irregular spacing in time of observations when dealing with high frequency financial data; the seminal work by [24] and the recent review by [4] provide a wide overview of the subject. Econometricians rapidly turned their attention to multivariate models of logarithmic price returns, volumes and durations (waiting times), the latter understood as the time interval between changes in trades, price, and volume of stocks, see e.g. [27,32,22,39].

Another strand of literature relies on the modelling of directly observable quantities, the so-called micro-to-macro approach, which is philosophically in contrast with the econometric approach. This framework has a long tradition that has its roots in the paper by [1] and also embraces lattice based models, including the popular binomial and trinomial models (see e.g. [9] and [7]).
The micro-to-macro approach has undergone a revival in recent years, mainly due to the work of econophysicists who introduced the Continuous Time Random Walk (CTRW) apparatus into the modelling of financial returns, see [31,40,41]. The evolution equation of the CTRW was formulated and it was shown that it can capture non-Markovian effects. Sometimes the non-Markovian behaviour of stocks has been accommodated by considering a latent Markov process acting as a switching process, as done in [10]. In any case, a viable solution to the non-Markovian problem is given by semi-Markov based models. Semi-Markov processes are the equivalent of CTRWs with a non-independent space-time dynamic. They appeared in the fifties in the probability field through the independent contributions of [29] and [46]. They have been successfully investigated and applied in connection with a very wide range of problems including reliability theory, queuing, stochastic systems and DNA analysis, see e.g. [28], [2], [43,44] and [20,21].

Financial modelling is no exception and has witnessed the progressive abandonment of the Markovian property in favour of the semi-Markov one. Examples come from credit rating modelling (e.g. [13,50,51]), high-frequency financial data ([48,25,49]), financial time series ([8,33]) and pricing problems ([14,45,47]).

However, only recently has it been recognized that semi-Markov processes (CTRW included) are also unable to reproduce accurately the statistical properties of high-frequency financial data, and a more general solution has been advanced in a series of papers where the concept of weighted-indexed semi-Markov chains (WISMC) has appeared, see [15] and [17]. The WISMC model represents a generalization of ordinary semi-Markov processes and has proved particularly useful in reproducing long-term dependence in stock returns, among other stylized facts of statistical finance.
WISMC models were extended to multivariate settings using different strategies and applied to measure the risk of financial portfolios, see [18] and [19]. In the meantime they were also successfully applied to the modelling of financial volumes by [12].

So far, models proposed in the micro-to-macro literature have not advanced a unifying approach in which returns, volumes and waiting times are jointly modelled so as to reproduce their known empirical regularities. The contribution of this paper is to present a modelling framework in which these three variables are handled simultaneously in a satisfactory and flexible way. In particular, we first conduct a detailed explorative data analysis with the aim of better understanding the empirical relationships among the financial variables considered. The data analysed are from four Italian stocks for the period August 2015 - August 2017, observed at 1 minute frequency. The WISMC model is presented for the first time in discrete time with a general state space and is considered as a model for the log-returns and also for the log-volume returns, with different kernels. To achieve the objective of a multivariate model of prices, volumes and waiting times (triplet process), specific data-driven assumptions are advanced, and the dependence structure between the price and volume return processes, conditional on the waiting time process, is embodied by a copula function on the joint distribution of the modulus of returns and the modulus of volume returns, which exhibit a general dependence structure. The dynamic of the multivariate model is completely characterized by the determination of the kernel of the triplet process. In general, this allows the computation of any financial statistic that can be written as a functional of the kernel of the triplet process.
The model is used to compute linear and nonlinear measures of dependence and the joint first passage time distribution function of price and volumes, and it shows the ability to reproduce the probability density functions of both variables as well as their cross- and auto-correlation functions.

The paper is organized as follows. Section 2 provides a statistical analysis of financial data with a particular focus on the relationships among price returns, volume returns and waiting times. Section 3 sets out the marginal models of price and volumes and the multivariate extension by means of copula functions. In this section the kernel of the triplet process is studied under some assumptions that are justified by the data. The section also presents the computation of some financial functions of broad interest. Section 4 illustrates the results of the application to real data and demonstrates the accuracy of the model in reproducing the main empirical regularities observed in financial markets. Section 5 summarizes our contribution and results. All proofs are deferred to the Appendix.


Code   Name
TIT    Telecom
ISP    Intesa San Paolo
TEN    Tenaris
F      FCA Group

Table 1 Stocks used in the application and their symbols

Fig. 1 Time series of prices and volumes for stock TIT (panels: Price and Volume; time axis from Aug15 to Aug17).

Empirical research on price changes, financial volumes and waiting times has identified some characteristics often called the stylized facts [26,36,34,35,3,38,37]. In this section we conduct an explorative data analysis with the aim of better understanding the empirical relations among the financial variables considered. This analysis will inspire the main assumptions on which our model is built in the next section.

The data used are quotes of Italian stocks for the period August 2015 - August 2017 (2 full years) at 1 minute frequency. Every minute, the last price and the cumulated volume (number of transactions) are recorded. For each stock the database is composed of about 2.[...] ∗ 10^[...] volumes and prices. The list of stocks analysed and their symbols are reported in Table 1. From now onward we will use only the codes in the table to identify each stock.

The analysed stocks are chosen to represent different market sectors. According to the Global Industry Classification Standard, F is in the industrial sector, ISP is one of the largest banks in Italy (financial sector), TIT is in the telecommunication sector and TEN is in the energy sector. In Figure 1 we show an example, for the stock TIT, of the time series of price $S(t)$ and trading volumes $V(t)$ in the analysed period.

As a first step we analyse the time series and look for the most important statistical features. From prices we build a time series of the price returns defined as $r(t) = \log(S(t)/S(t-1))$, and from volumes a time series of the volume returns defined as $v(t) = \log(V(t)/V(t-1))$, where $t$ is the time variable at one minute frequency. To be sure to use only variations over a one minute period, we exclude from the analysis the variation of both variables from the closing of the stock market on day $d$ to the re-opening on the next trading day $d+1$ (we recall that the stock market is open from 9:00 to 17:30 on weekdays).

In Table 2 we summarize the descriptive statistics of price returns $r(t)$, while in Table 3 we summarize the descriptive statistics of volume returns $v(t)$. To better visualize the distributions of both time series we show in Figures 2 and 3 the histograms of $r(t)$ and $v(t)$, respectively, and we also compare them with the best Gaussian fit.

Table 2 Descriptive statistics of the price returns r(t). Columns: Mean, Median, Standard deviation, Skewness, Kurtosis; rows: TIT, ISP, TEN, F.

Table 3 Descriptive statistics of volume returns v(t). Columns: Mean, Median, Standard deviation, Skewness, Kurtosis; rows: TIT, ISP, TEN, F. Standard deviations: 1.58 (TIT), 1.53 (ISP), 1.31 (TEN), 1.51 (F).

Fig. 2 Histogram of r(t) compared with a Gaussian fit (panels: TIT, ISP, TEN, F).

Fig. 3 Histogram of v(t) compared with a Gaussian fit (panels: TIT, ISP, TEN, F).

We performed a Jarque-Bera test that rejected the Gaussian distribution for both $r(t)$ and $v(t)$ at the 1% significance level.

One of the most important statistical features of both time series is that their absolute values are long range correlated. We show this in Figures 4 and 5. We also found that there is essentially zero correlation between $r(t)$ and $v(t)$, while a non-zero correlation between $|r(t)|$ and $v(t)$ is present. This result is shown in Table 4, which reports all possible combinations of correlation between $r(t)$ and $v(t)$ and their absolute values, together with the p-values that give the statistical significance of a non-zero correlation.

Fig. 4 Sample autocorrelation of |r(t)| (panels: TIT, ISP, TEN, F).

Stock  ρ(r(t),v(t))  p-value  ρ(|r(t)|,v(t))  p-value  ρ(r(t),|v(t)|)  p-value  ρ(|r(t)|,|v(t)|)  p-value
TIT    0.0089        0        0.086           0        -0.0032         0.023    0.020             0
ISP    -0.010        0        0.086           0        -0.0036         0.0095   -0.019            0
TEN    -0.0091       0        0.086           0        0.00055         0.69     0.040             0
F      -0.012        0        0.11            0        -0.0041         0.0037   -0.027            0

Table 4 Cross correlation between price and volume returns.
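The construction of $r(t)$ and $v(t)$ and the four correlations of Table 4 can be sketched in a few lines. This is only an illustrative sketch: the function names, the `day_ids` convention for dropping overnight variations, and the toy numbers are assumptions made here, and the p-values of Table 4 are not computed.

```python
import math

def log_returns(series, day_ids):
    """x(t) -> log(x(t) / x(t - 1)), skipping pairs that straddle two trading days."""
    return [math.log(series[k] / series[k - 1])
            for k in range(1, len(series))
            if day_ids[k] == day_ids[k - 1]]

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_table(r, v):
    """The four correlations reported per stock in Table 4 (without p-values)."""
    abs_r = [abs(a) for a in r]
    abs_v = [abs(b) for b in v]
    return {
        "rho(r, v)": pearson(r, v),
        "rho(|r|, v)": pearson(abs_r, v),
        "rho(r, |v|)": pearson(r, abs_v),
        "rho(|r|, |v|)": pearson(abs_r, abs_v),
    }

# Toy data, invented for illustration only (two trading days, three minutes each).
prices = [1.00, 1.01, 1.00, 1.02, 1.02, 1.03]
volumes = [100.0, 150.0, 120.0, 90.0, 130.0, 110.0]
days = [0, 0, 0, 1, 1, 1]

r = log_returns(prices, days)
v = log_returns(volumes, days)
print(correlation_table(r, v))
```

Because both series are filtered with the same `day_ids`, the returned lists stay aligned minute by minute, which is what the cross correlations require.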

Fig. 5 Sample autocorrelation of |v(t)| (panels: TIT, ISP, TEN, F).

Given these properties, a good model should be able to take all of them into account. Another property that we found quite interesting, and that should be included in a model, is the following: for both time series ($r(t)$ and $v(t)$) we found a dependence on the waiting time, which is defined as the time it takes for returns to change their values. This can be seen in Figures 6 and 7, where we have plotted the time it takes to jump from a specific value of $r(t)$ ($v(t)$) to any other value. It can be noticed that waiting times depend to some extent on the values of $r(t)$ ($v(t)$). This empirical evidence was already observed for price returns in [31], and we now highlight it for volume returns as well.

Fig. 6 Waiting time as a function of r(t).

Fig. 7 Waiting time as a function of v(t).

This result is confirmed by contingency tables, where we can see the dependence between $r(t)$, $v(t)$ and the waiting time $T$. The contingency tables, shown in Table 5, have been obtained by a discretization of both $r(t)$ and $v(t)$ into 5 states.

r(t)          -∞ : -0.13%   -0.13% : -0.05%   -0.05% : 0.05%   0.05% : 0.13%   0.13% : +∞
T = [...] : ∞  45 (565,0)    70 (1053,1)       4217 (1212,5)    66 (1068,2)     53 (552,2)

[second panel: v(t) versus T]

Table 5 Contingency table for F. In the first table we tested the dependence between r(t) and T, while in the second table we tested the dependence between v(t) and T. The numbers in brackets are obtained under the independence hypothesis. The χ² test rejects the null hypothesis of independence.

It can be easily noticed from Table 5 that the number of transitions depends on $T$ (for $T = 1$ there are many more transitions than for $T = 2$ or $T = 3$). More specifically, in brackets we give the number of transitions that we would expect for independent processes, where the probability of finding a given number of transitions is simply given by the product of the frequencies of having each variable in that given state. From Table 5 it is clear that the independence hypothesis does not hold for either process. We obtained similar results for all other stocks which, for reasons of space, are not shown here. From the above tables and from all the results obtained in this section, we can say that a good real-world model of price and volumes should take into account all the stylized facts detected above, which we can summarize in a list:

- distributions of price returns and volume returns are not Gaussian;
- the absolute values of price returns are long range correlated;
- the absolute values of volume returns are long range correlated;
- price returns and volume returns are uncorrelated while a non-zero correlation between $|r(t)|$ and $v(t)$ is present;
- $r(t)$ and the waiting times influence each other;
- $v(t)$ and the waiting times influence each other.

The majority of them were already known and extensively documented in the financial literature; here they have been confirmed in our dataset.
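The bracketed numbers of Table 5 are expected counts under independence, obtained from the product of the marginal frequencies. A minimal sketch of that computation and of the resulting χ² statistic follows; the function names and the 2x2 toy counts are assumptions made for illustration and are not data from the paper.

```python
def expected_counts(table):
    """Counts expected under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Pearson chi-square statistic comparing observed and expected counts."""
    expected = expected_counts(table)
    return sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))

# Toy contingency table (rows: waiting-time classes, columns: return states).
observed = [[10, 20],
            [30, 40]]
print(expected_counts(observed))
print(chi_square_statistic(observed))
```

The statistic is then compared with a χ² quantile with (rows - 1)(columns - 1) degrees of freedom; a table whose rows are proportional to each other gives a statistic of zero.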
A very interesting summary of those empirical regularities and of their financial implications is given in [6] and in the references therein. The empirical evidence in the list is the cornerstone on which the model presented in the next section is built.

In this section we first present the WISMC model that is used as a marginal model for both the price and volume return processes. Subsequently, we extend the mathematical model to a multivariate setting by considering a dependence structure between price, volumes and waiting times (durations) using a copula function.

3.1 Weighted-Indexed Semi-Markov Chains

Here, we introduce the discrete-time WISMC model with Borel phase space in relation to the financial problem we are interested in.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space endowed with a filtration $\mathbb{F} := (\mathcal{F}_n)_{n \in \mathbb{N}}$ where all upcoming random variables are defined.

Let $S(t)$ be the price of a financial asset at time $t \in \mathbb{N}$. The time varying log return, defined as $\log(S(t)/S(t-1))$, changes value at the random times $\{T^J_n\}_{n \in \mathbb{N}}$, the so-called jump-times of the asset price process.

In correspondence with the times $\{T^J_n\}_{n \in \mathbb{N}}$, the logarithmic return process assumes different values denoted by $\{J_n\}_{n \in \mathbb{N}}$, and along any waiting time $X_n := T^J_{n+1} - T^J_n$ it remains constant. Thus, $J_n$ is the value of the logarithmic change in price at its n-th transition.

Let us assume that at the current time, say $t = 0$, we have at our disposal a set of past data consisting of two vectors of observations collecting the last $m + 1$ visited states of the log-return process and the corresponding transition times, respectively, i.e.

$$\mathbf{J}_{-m}^{0} = (J_{-m}, J_{-m+1}, \ldots, J_0), \qquad \mathbf{T}_{-m}^{0} = (T^J_{-m}, T^J_{-m+1}, \ldots, T^J_0).$$
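The passage from a one-minute series to the pairs $(J_n, T^J_n)$ and the waiting times $X_n$ amounts to a run-length encoding of the series. The sketch below assumes that the returns have already been discretized into a finite set of states, so that consecutive equal values can be compared exactly; the function name and the toy series are illustrative assumptions.

```python
def jump_chain(values):
    """Split a piecewise-constant series into states J_n, jump times T_n and waits X_n.

    values[t] is the (already discretized) log-return observed at time t = 0, 1, ...
    A jump time is any t at which the value differs from the one at t - 1.
    """
    states = [values[0]]
    times = [0]
    for t in range(1, len(values)):
        if values[t] != states[-1]:
            states.append(values[t])
            times.append(t)
    # X_n = T_{n+1} - T_n; the last sojourn is still open, so it has no X_n yet.
    waits = [times[k + 1] - times[k] for k in range(len(times) - 1)]
    return states, times, waits

# Toy discretized return series, invented for illustration.
series = [0, 0, 1, 1, 1, -1, 0, 0]
states, times, waits = jump_chain(series)
print(states, times, waits)
```

The same function can be applied unchanged to a discretized volume-return series to obtain the jump times of the volume process.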
Consider also an index process:

$$I^J_n(\lambda) := \sum_{r=0}^{m+n-1} \; \sum_{a=T^J_{n-1-r}}^{T^J_{n-r}-1} f^{\lambda}(J_{n-1-r}, T^J_n, a) + f^{\lambda}(J_n, T^J_n, T^J_n), \tag{1}$$

where $f^{\lambda} : \mathbb{R} \times \mathbb{N} \times \mathbb{N} \to \mathbb{R}$ is a bounded function.

The process $I^J_n(\lambda)$ can be interpreted as an accumulated reward process with the function $f^{\lambda}$ as a measure of the weighted rate of reward per unit time. The parameter $\lambda$ is a memory parameter that should be calibrated on the data. A specific calibration procedure will be discussed in the application (Section 4). It should also be remarked that the index process considered in this paper is slightly more general than those considered in previous research articles, because we added the term $f^{\lambda}(J_n, T^J_n, T^J_n)$, which adds to the index process the score deriving from observing the current log-return state $J_n$ at the present time $T^J_n$.

Let us introduce the counting process $N^J(t) := \max\{n \in \mathbb{N} : T^J_n \le t\}$ and then the notion of weighted-indexed semi-Markov chains.

Definition 1

The process $Z^J(t) := J_{N^J(t)}$ is said to be a weighted-indexed semi-Markov chain with phase space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ if, for all $i, x, j \in \mathbb{R}$ and all $t \in \mathbb{N}$, there exists a function $q^J = q^J(i, x; j, t)$, called the indexed semi-Markov kernel, such that for all $n \in \mathbb{N}$ the following equality holds true:

$$\begin{aligned}
&\mathbb{P}[J_{n+1} \le j,\; T^J_{n+1} - T^J_n = t \mid \sigma(J_h, T^J_h, I^J_h(\lambda),\, h \le n),\; J_n = i,\; I^J_n(\lambda) = x] \\
&\quad = \mathbb{P}[J_{n+1} \le j,\; T^J_{n+1} - T^J_n = t \mid J_n = i,\; I^J_n(\lambda) = x] =: q^J(i, x; j, t).
\end{aligned} \tag{2}$$

Remark 1

Relation (2) asserts that the knowledge of the values of the variables $J_n$, $I^J_n(\lambda)$ is sufficient to give the conditional distribution of the pair $(J_{n+1}, T^J_{n+1} - T^J_n)$, whatever the values of the past variables might be. Therefore, to assess the probability of the next value of the log-return process and of the time at which the process is going to change state, we need only the knowledge of the last state of the log-return and the last value of the index process.

Remark 2

The function $Q^J(i, x; j, t) := \sum_{s \le t} q^J(i, x; j, s)$ satisfies the following properties:
a) $Q^J(i, x; j, \cdot)$ is a nondecreasing discrete real function such that $Q^J(i, x; j, 0) = 0$.
b) $p^J(\cdot, x; \cdot) := Q^J(\cdot, x; \cdot, \infty)$ is a Markov transition probability function from $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ to itself.

Remark 3

If the indexed semi-Markov kernel is constant in $x$, i.e. for any fixed triple $(i, j, t)$ and for all $y \ne x$

$$q^J(i, x; j, t) = q^J(i, y; j, t),$$

then it degenerates into a semi-Markov kernel and the WISMC model becomes equivalent to the classical semi-Markov chain model, see e.g. [30] and [11].

The triplet $\{J_n, T^J_n, I^J_n(\lambda)\}$ describes the system at any jump time $T^J_n$. However, it is also important to describe the system at any time $t$, which can be a jump time ($t = T^J_n$) or not ($t \ne T^J_n$). The random process $Z^J(t) := J_{N^J(t)}$ introduced in Definition 1 marks the log-return at any time $t$, while the backward recurrence time process $B^J(t) := t - T^J_{N^J(t)}$ denotes the time elapsed since the last transition. In our model this information is not sufficient to completely characterize the status of the system, because we also need to know the value of the index process. To this end we extend the definition of the index process to any time $t \in \mathbb{N}$ as follows:

$$I^J(\lambda; t) = \sum_{r=0}^{m+N^J(t)-1+\theta} \; \sum_{a=T^J_{N^J(t)+\theta-1-r}}^{(t \wedge T^J_{N^J(t)+\theta-r})-1} f^{\lambda}(J_{N^J(t)+\theta-1-r}, t, a) + f^{\lambda}(J_{N^J(t)}, t, t), \tag{3}$$

where $\theta = 1_{\{t > T^J_{N^J(t)}\}}$. If $t = T^J_n$, that is $t$ is a jump time, then $\theta = 0$ and $N^J(t) = n$, so that (3) reduces to (1) and $I^J(\lambda; t) = I^J_n(\lambda)$.

The following definition and result, which reduce the complexity of the model, are important for practical application of the WISMC model.

Definition 2 [Shift operator] Let $(i, t)_{-m}^{n} = \{(i_\alpha, t_\alpha),\ \alpha = -m, \ldots, n\}$ be a sequence of states and corresponding transition times, i.e. $i_\alpha \in \mathbb{R}$, $t_\alpha \in \mathbb{Z}$, $t_\alpha < t_{\alpha+1}$. Let us denote by $\Theta_{-m}^{n} = \{(i_\alpha, t_\alpha),\ \alpha = -m, \ldots, 0, \ldots, n,\ i_\alpha \in \mathbb{R},\ t_\alpha \in \mathbb{Z}\}$; then we define the shift operator $\circ : \Theta_{-m}^{n+1} \to \Theta_{-m-1}^{n}$ by

$$\circ\big((i, t)_{-m}^{n+1}\big) = (s, k)_{-m-1}^{n},$$

where $s_\alpha = i_{\alpha+1}$, $k_\alpha = t_{\alpha+1} - t_{n+1}$, $\alpha = -m-1, \ldots, 0, \ldots, n$.

From an intuitive point of view, the shift operator, when applied to a trajectory $(i, t)_{-m}^{n+1}$, gives back a new trajectory where the sequence of visited states is the same as in the input trajectory, with the difference that transition times are translated $t_{n+1}$ time units backward and the number of transitions is set one unit backward.

The following assumption concerning the score function $f^{\lambda}$ will be needed in the rest of the article:

A1: $\forall i \in \mathbb{R},\ t \in \mathbb{N},\ a \in \mathbb{N}$, $f^{\lambda}(i, t, a) = f^{\lambda}(i, t - a)$.

Lemma 1

For a WISMC with score function $f^{\lambda}$ that satisfies assumption A1, for a fixed arbitrary state $j$ and time $t$ and $(i, t)_{-m}^{n+1} \in \Theta_{-m}^{n+1}$, we have:

$$\begin{aligned}
&\mathbb{P}[J_{n+2} \le j,\; T^J_{n+2} - T^J_{n+1} = t \mid (J, T^J)_{-m}^{n+1} = (i, t)_{-m}^{n+1}] \\
&\quad = \mathbb{P}[J_{n+1} \le j,\; T^J_{n+1} - T^J_n = t \mid (J, T^J)_{-m-1}^{n} = \circ((i, t)_{-m}^{n+1})].
\end{aligned} \tag{4}$$

Proof

See the appendix.

The result presented in Lemma 1 focuses on a class of score functions that make probability (4) independent of $n$. Accordingly, the WISMC inherits a homogeneity property that is particularly useful for the applications of the model. Throughout this article, we are going to consider homogeneous WISMC only.

In this research we also consider financial volume as an important variable worth investigating. The WISMC model was also applied to the modelling of financial volumes in a recent article by [12] and proved able to reproduce several statistical properties of volumes at high-frequency scales. In order to distinguish between the WISMC model for returns and that for volumes, we introduce additional notation for the volume model. Precisely, if $V(t)$ is the volume of a financial asset at time $t \in \mathbb{N}$, the time varying log volume, defined as $\log(V(t)/V(t-1))$, changes value at the random times $\{T^V_n\}_{n \in \mathbb{N}}$, the so-called jump-times of the asset volume process. In correspondence with the times $\{T^V_n\}_{n \in \mathbb{N}}$, the logarithmic volume process assumes different values denoted by $\{V_n\}_{n \in \mathbb{N}}$. We introduce the index process for the volume by replacing in formula (1) the variables $J_n$, $T^J_n$ and $\lambda$ with $V_n$, $T^V_n$ and $\gamma$, respectively. The semi-Markov kernel for the volume process will be denoted by $q^V = q^V(i, x; j, t)$, and the WISMC process for the volume variable is defined by $Z^V(t) := V_{N^V(t)}$, with $N^V(t) := \max\{n \in \mathbb{N} : T^V_n \le t\}$.

3.2 The multivariate model

In this section we extend the WISMC model to a multivariate setting in such a way that it is able to describe jointly the time evolution of the three variables considered: log-returns, log-volumes and waiting times.
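Both marginal models rest on the index process of formula (1), with memory parameter $\lambda$ for returns and $\gamma$ for volumes, so a small sketch of that double sum may be a useful reference before the joint construction. The score function used in the example, $f^{\lambda}(i, t, a) = \lambda^{t-a}\, i^2$, is an assumption chosen here only because it satisfies A1 (it depends on $t$ and $a$ through $t - a$); it is not claimed to be the score function calibrated in the paper.

```python
def index_process(states, times, score):
    """Index process of formula (1) evaluated at the last jump time.

    states[k], times[k] hold the visited states and jump times from the oldest
    available transition (index -m in the text) up to the current one (index n).
    score(i, t, a) plays the role of f^lambda(i, t, a).
    """
    last = len(states) - 1
    t_n = times[last]
    # Extra term of (1): the score of the current state at the present time.
    total = score(states[last], t_n, t_n)
    # Every completed sojourn k contributes one term per unit of time spent in it.
    for k in range(last):
        for a in range(times[k], times[k + 1]):
            total += score(states[k], t_n, a)
    return total

def make_exponential_score(lam):
    """Hypothetical score lam ** (t - a) * i ** 2, a function of t - a as A1 requires."""
    return lambda i, t, a: lam ** (t - a) * i ** 2

# Toy trajectory: state 1 held on times 0 and 1, then state 2 entered at time 2.
value = index_process([1, 2], [0, 2], make_exponential_score(0.5))
print(value)
```

For volumes the same function is called with the volume states, the volume jump times and a score built from $\gamma$ instead of $\lambda$.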
The extension is obtained by advancing a series of assumptions that allow us to merge the WISMC kernel of the log-return process and that of the log-volumes into a new kernel, which is completely characterized in this section.

The first step in the joint modelling of returns, volumes and durations is to synchronize the time events of the returns and volumes. To do so, let us start from the two sequences

$$(J_n, T^J_n)_{n \in \mathbb{N}}, \qquad (V_n, T^V_n)_{n \in \mathbb{N}}. \tag{5}$$

They mark the values and points in time where log-returns and log-volumes change states, respectively. First, we define a new sequence of transition times:

$$\{\tilde{T}_n\} = \{T^J_n\} \cup \{T^V_n\}, \quad \text{with } \tilde{T}_0 = T^J_0 = T^V_0 = 0. \tag{6}$$

Relation (6) means that we consider the union of the sets of transition times of returns and volumes, and the resulting ordered sequence of times is denoted by $\{\tilde{T}_n\}_{n \in \mathbb{N}}$. Intuitively, $\tilde{T}_1$ is the first time when a change in the returns or in the volumes occurred, $\tilde{T}_2$ the time of the second change of state of either of the two processes $J_n$ and $V_n$, and so on. The corresponding inter-arrival times are denoted by

$$\tilde{X}_n = \tilde{T}_{n+1} - \tilde{T}_n.$$

Furthermore, we define the corresponding values of the returns and volumes at each time of the random sequence $\tilde{T}_n$ according to the following relations:

$$\tilde{J}_n = J_s, \ \text{if } s = \max\{h \in \mathbb{N} : T^J_h \le \tilde{T}_n\}, \qquad \tilde{V}_n = V_s, \ \text{if } s = \max\{h \in \mathbb{N} : T^V_h \le \tilde{T}_n\}.$$

Thus, we end up with three variables $(\tilde{J}_n, \tilde{V}_n, \tilde{T}_n)$ that denote the synchronized sequences of log-returns, log-volumes and transition times.
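The synchronization step can be sketched as a merge of the two jump-time sequences followed by a lookup of the state each process occupies at every merged time. The function name and the toy sequences are assumptions for illustration; both time lists are assumed to be strictly increasing and to start at 0, as in relation (6).

```python
import bisect

def synchronize(ret_states, ret_times, vol_states, vol_times):
    """Build the synchronized triplets (J~_n, V~_n, T~_n) and the waits X~_n."""
    # Relation (6): ordered union of the two sets of transition times.
    merged = sorted(set(ret_times) | set(vol_times))
    triplets = []
    for t in merged:
        # s = max{h : T_h <= t}: bisect_right counts the jump times <= t.
        j = ret_states[bisect.bisect_right(ret_times, t) - 1]
        v = vol_states[bisect.bisect_right(vol_times, t) - 1]
        triplets.append((j, v, t))
    waits = [merged[k + 1] - merged[k] for k in range(len(merged) - 1)]
    return triplets, waits

# Toy jump chains, invented for illustration.
triplets, waits = synchronize(
    ret_states=[0.1, -0.2, 0.3], ret_times=[0, 2, 5],
    vol_states=[1.0, -1.0, 0.5, 2.0], vol_times=[0, 3, 5, 6],
)
print(triplets)
print(waits)
```

At a merged time where only one of the two processes jumps, the other simply keeps the value it had at its own last transition, which is exactly what the `max` in the definitions of $\tilde{J}_n$ and $\tilde{V}_n$ expresses.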
In order to advance a joint model for this three-variate process we need to specify some properties concerning their interdependence and dynamics.

Suppose the following conditional independence relation, namely assumption A2, holds true:

$$\begin{aligned}
&\mathbb{P}[\tilde{J}_{n+1} \le j,\; \tilde{V}_{n+1} \le a,\; \tilde{X}_n = t \mid \sigma(\tilde{J}_h, \tilde{V}_h, \tilde{T}_h,\, h \le n),\; \tilde{J}_n = i,\; \tilde{V}_n = v,\; \tilde{I}^J_n = x,\; \tilde{I}^V_n = w, \\
&\qquad \tilde{T}_n = s,\; \tilde{T}_n - T^J_{N^J(s)} = b^J,\; \tilde{T}_n - T^V_{N^V(s)} = b^V] \\
&\quad = \mathbb{P}[\tilde{J}_{n+1} \le j,\; \tilde{V}_{n+1} \le a,\; \tilde{X}_n = t \mid \tilde{J}_n = i,\; \tilde{V}_n = v,\; \tilde{I}^J_n = x,\; \tilde{I}^V_n = w, \\
&\qquad \tilde{T}_n = s,\; \tilde{T}_n - T^J_{N^J(s)} = b^J,\; \tilde{T}_n - T^V_{N^V(s)} = b^V].
\end{aligned} \tag{7}$$

Assumption A2 is a quasi-Markovian-type hypothesis which asserts that the knowledge of the last values of the synchronized variables ($\tilde{J}_n = i$, $\tilde{V}_n = v$, $\tilde{T}_n = s$), together with the corresponding values of the index processes ($\tilde{I}^J_n = x$, $\tilde{I}^V_n = w$) and of the time elapsed since the last transition of both log-return and log-volume ($\tilde{T}_n - T^J_{N^J(s)} = b^J$, $\tilde{T}_n - T^V_{N^V(s)} = b^V$), suffices to give the conditional distribution of the triplet $(\tilde{J}_{n+1}, \tilde{V}_{n+1}, \tilde{X}_n)$, whatever the values of the past variables might be.

The following information sets are introduced for notational convenience:

$$\begin{aligned}
\mathcal{A}^J_{n,s} &:= \{\tilde{J}_n = i,\; \tilde{I}^J_n = x,\; \tilde{T}_n = s,\; \tilde{T}_n - T^J_{N^J(s)} = b^J\}, \\
\mathcal{A}^V_{n,s} &:= \{\tilde{V}_n = v,\; \tilde{I}^V_n = w,\; \tilde{T}_n = s,\; \tilde{T}_n - T^V_{N^V(s)} = b^V\}, \\
\mathcal{A}^{JV}_{n,s} &:= \mathcal{A}^J_{n,s} \cup \mathcal{A}^V_{n,s}, \\
\mathcal{A}^{JVT}_{n,s} &:= \mathcal{A}^{JV}_{n,s} \cup \{\tilde{X}_n = t\}.
\end{aligned}$$

Probability (7) is important enough to merit a formal definition:

Definition 3

Let $(\tilde{J}_n, \tilde{V}_n, \tilde{T}_n)$ be the synchronized triplet process of log-return, log-volume and transition times. The function $q^{JV} = q^{JV}(\mathcal{A}^{JV}_{n,s}; j, a, t)$, with $i, v, x, w, j, a \in \mathbb{R}$ and $b^J, b^V, t \in \mathbb{N}$, defined by

$$q^{JV}(\mathcal{A}^{JV}_{n,s}; j, a, t) := \mathbb{P}[\tilde{J}_{n+1} \le j,\; \tilde{V}_{n+1} \le a,\; \tilde{X}_n = t \mid \mathcal{A}^{JV}_{n,s}] \tag{8}$$

is called the kernel of the triplet process.

The kernel of the triplet process can be factorized into the product of the conditional joint distribution of log-returns and log-volumes and the conditional distribution of inter-arrival times, i.e.

$$\mathbb{P}[\tilde{J}_{n+1} \le j,\; \tilde{V}_{n+1} \le a \mid \mathcal{A}^{JVT}_{n,s}] \cdot \mathbb{P}[\tilde{X}_n = t \mid \mathcal{A}^{JV}_{n,s}]. \tag{9}$$

Our next main task is to give a representation of this kernel such that the dynamic of the joint process $(\tilde{J}_n, \tilde{V}_n, \tilde{T}_n)$ can be completely characterized. We shall now consider reasonable data-driven assumptions that permit this computation.

Assumption A3: synchronized waiting time distribution.

We assume that the probability distribution of the inter-arrival time is independent of the current time $s$; this avoids the use of time non-homogeneous probabilistic structures for the process. Moreover, we assume that the waiting time distribution does not explicitly depend on the time elapsed by log-return and log-volume in their current states, but includes past information through the index processes of returns and volumes. In formula,

$$\mathbb{P}[\tilde{X}_n = t \mid \mathcal{A}^{JV}_{n,s}] = \mathbb{P}[\tilde{X}_n = t \mid \tilde{J}_n = i,\; \tilde{V}_n = v,\; \tilde{I}^J_n = x,\; \tilde{I}^V_n = w]. \tag{10}$$

Assumption A3 implies that the distributional properties of the waiting times in our model can differ according to price and volume movements ($\tilde{J}_n$ and $\tilde{V}_n$ values) as well as with their past behaviour measured by the index processes $\tilde{I}^J_n$ and $\tilde{I}^V_n$.

Denote the conditional distribution function of $\tilde{X}_n$ by

$$\tilde{H}_{i,v}(x, w; t) := \mathbb{P}[\tilde{X}_n \le t \mid \tilde{J}_n = i,\; \tilde{V}_n = v,\; \tilde{I}^J_n = x,\; \tilde{I}^V_n = w], \tag{11}$$

and the corresponding probability mass function by

$$\mathbb{P}[\tilde{X}_n = t \mid \tilde{J}_n = i,\; \tilde{V}_n = v,\; \tilde{T}_n = s,\; \tilde{I}^J_n = x,\; \tilde{I}^V_n = w] = \tilde{H}_{i,v}(x, w; t) - \tilde{H}_{i,v}(x, w; t - 1) =: \tilde{h}_{i,v}(x, w; t). \tag{12}$$

The independence of the cdf of waiting times from the number of transitions $n$ and from the time of the last transition $s$ is assumed in order to avoid unnecessary complications that would have made the model inhomogeneous in time.

The knowledge of the kernel of the triplet process also requires the specification of the conditional joint probability distribution of log-returns and log-volumes. In this respect we propose the following

Assumption A4: the conditional joint distribution of the modulus of log-returns and log-volumes is given by

$$\mathbb{P}[|\tilde{J}_{n+1}| \le j,\; |\tilde{V}_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] = C\big(F^{|J|}(i, x, t + b^J; j),\; F^{|V|}(v, w, t + b^V; a)\big), \tag{13}$$

where $C$ is a copula function and the marginal distributions $F^{|J|}(i, x, t + b^J; j)$ and $F^{|V|}(v, w, t + b^V; a)$ are given by

$$\begin{aligned}
F^{|J|} &= \mathbb{P}[|J_{N^J(s)+1}| \le j \mid T^J_{N^J(s)+1} - s = t,\; J_{N^J(s)} = i,\; I^J_{N^J(s)} = x,\; s - T^J_{N^J(s)} = b^J], \\
F^{|V|} &= \mathbb{P}[|V_{N^V(s)+1}| \le a \mid T^V_{N^V(s)+1} - s = t,\; V_{N^V(s)} = v,\; I^V_{N^V(s)} = w,\; s - T^V_{N^V(s)} = b^V].
\end{aligned}$$

Assumption A4 is motivated by the data analysis carried out in Section 2; specifically, in Table 4 we have shown that the two processes are dependent on each other. Essentially, this assumption allows us to consider a dependence structure between the moduli of the log-returns and log-volumes that is managed through the use of any copula function. The copula maps the two marginal distributions $F^{|J|}$ and $F^{|V|}$ into a joint probability distribution function. The quantity $F^{|J|}$ expresses the probability of getting a modulus of log-return less than or equal to $j$ conditionally on the last value of the variable, the corresponding index process, the waiting time length and the duration in the last visited state.
The same interpretation can be given to the quantity $F_{|V|}$, with the only exception that it relates to the modulus of the log-volume process.

The quantity $F_{|J|}$ can be evaluated as follows:

$$\begin{aligned}
F_{|J|}(i,x,t+b_J;j) &= P[|J_{N^J(s)+1}| \le j \mid T^J_{N^J(s)+1} - s = t,\ J_{N^J(s)} = i,\ I^J_{N^J(s)} = x,\ s - T^J_{N^J(s)} = b_J] \\
&= P[|J_{N^J(s)+1}| \le j \mid T^J_{N^J(s)+1} - T^J_{N^J(s)} + T^J_{N^J(s)} - s = t,\ J_{N^J(s)} = i,\ I^J_{N^J(s)} = x,\ T^J_{N^J(s)} = s - b_J] \\
&= P[|J_{N^J(s)+1}| \le j \mid T^J_{N^J(s)+1} - T^J_{N^J(s)} = t + b_J,\ J_{N^J(s)} = i,\ I^J_{N^J(s)} = x] \\
&= P[|J_{n+1}| \le j \mid T_{n+1} - T_n = t + b_J,\ J_n = i,\ I^J_n = x] \\
&= \frac{P[|J_{n+1}| \le j,\ T_{n+1} - T_n = t + b_J \mid J_n = i,\ I^J_n = x]}{P[T_{n+1} - T_n = t + b_J \mid J_n = i,\ I^J_n = x]} \\
&= \frac{q^J(i,x;j,t+b_J) - q^J(i,x;-j,t+b_J)}{H^J_i(x;t+b_J) - H^J_i(x;t+b_J-1)}.
\end{aligned}$$

Similar computations give

$$F_{|V|}(v,w,t+b_V;a) = \frac{q^V(v,w;a,t+b_V) - q^V(v,w;-a,t+b_V)}{H^V_v(w;t+b_V) - H^V_v(w;t+b_V-1)}.$$

By means of assumptions A3 and A4 we obtain information on the joint distribution of the modulus of log-returns and the modulus of log-volumes. Nonetheless, we are interested in recovering the exact values (with signs) of these two variables. This is motivated by the empirical observation that, although $\{|\tilde J_n|\}$ and $\{|\tilde V_n|\}$ are significantly correlated, $\{\tilde J_n\}$ and $\{\tilde V_n\}$ are uncorrelated.

To reach this objective we advance a final assumption.

A5: For each $n \in \mathbb{N}$, $\tilde J_n$ and $\tilde V_n$ satisfy the following relations:

$$\tilde J_n = |\tilde J_n| \cdot \eta^J_n; \qquad \tilde V_n = |\tilde V_n| \cdot \eta^V_n;$$

where $\eta^J_n$ and $\eta^V_n$ are two sequences of i.i.d. random variables with pmf

$$\eta^J_n \sim \begin{cases} +1 & \text{with probability } p_J \\ -1 & \text{with probability } 1 - p_J \end{cases} \qquad \eta^V_n \sim \begin{cases} +1 & \text{with probability } p_V \\ -1 & \text{with probability } 1 - p_V \end{cases}$$

This assumption allows us to recover the values of the variables from the knowledge of their moduli. Indeed, the variables $\eta^J_n$ and $\eta^V_n$ provide the sign of the variation.
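The interplay between assumptions A4 and A5 can be illustrated with a minimal Python sketch. It is not the estimation code used in the paper: the unit exponential marginals for the moduli, the Gaussian copula parameter `rho` and the function names `simulate` and `pearson` are assumptions made only for illustration. The sketch draws dependent moduli through a Gaussian copula, attaches independent random signs as in A5, and compares the correlation of the moduli with that of the signed variables.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, rho, p_j, p_v, seed=0):
    """Draw n pairs of moduli from a Gaussian copula, then attach A5 signs."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    mod_j, mod_v, signed_j, signed_v = [], [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Copula step: correlated uniforms (clamped away from 1 for the log).
        u1 = min(phi(z1), 1.0 - 1e-12)
        u2 = min(phi(z2), 1.0 - 1e-12)
        # Hypothetical marginals: unit exponential for both moduli.
        abs_j = -math.log1p(-u1)
        abs_v = -math.log1p(-u2)
        # Assumption A5: independent signs with probabilities p_j and p_v.
        eta_j = 1 if rng.random() < p_j else -1
        eta_v = 1 if rng.random() < p_v else -1
        mod_j.append(abs_j)
        mod_v.append(abs_v)
        signed_j.append(eta_j * abs_j)
        signed_v.append(eta_v * abs_v)
    return mod_j, mod_v, signed_j, signed_v


if __name__ == "__main__":
    mj, mv, j, v = simulate(20000, rho=0.7, p_j=0.5, p_v=0.5, seed=1)
    print("corr(|J|, |V|):", round(pearson(mj, mv), 3))
    print("corr(J, V):    ", round(pearson(j, v), 3))
```

With symmetric signs ($p_J = p_V = 1/2$) the moduli remain clearly correlated while the signed variables are approximately uncorrelated, which is the stylized fact that A5 is designed to reproduce.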
The parameters $p_J$ and $p_V$ need to be estimated from the data.

The next theorem characterizes the kernel of the triplet process.

Theorem 1. Under assumptions A1-A5, for all $s \in \mathbb{N}$ the kernel of the triplet process $(\tilde J_n, \tilde V_n, \tilde T_n)$, defined in formula (8), is given by:

(i) for $j \ge 0$, $a \ge 0$,

$$q^{JV}(\mathcal{A}^{JV}_{n,s}; j,a,t) = \tilde h_{i,v}(x,w;t) \cdot \Big[ 1 + p_J p_V \Big( 1 - F_{|J|}(i,x,t+b_J;j) - F_{|V|}(v,w,t+b_V;a) + C\big(F_{|J|}(i,x,t+b_J;j), F_{|V|}(v,w,t+b_V;a)\big) \Big) - p_V \big( 1 - F_{|V|}(v,w,t+b_V;a) \big) - p_J \big( 1 - F_{|J|}(i,x,t+b_J;j) \big) \Big], \tag{14}$$

(ii) for $j < 0$, $a < 0$,

$$q^{JV}(\mathcal{A}^{JV}_{n,s}; j,a,t) = \tilde h_{i,v}(x,w;t) \cdot (1-p_J)(1-p_V) \Big[ 1 - F_{|J|}(i,x,t+b_J;-j) - F_{|V|}(v,w,t+b_V;-a) + C\big(F_{|J|}(i,x,t+b_J;-j), F_{|V|}(v,w,t+b_V;-a)\big) \Big], \tag{15}$$

(iii) for $j < 0$, $a > 0$,

$$q^{JV}(\mathcal{A}^{JV}_{n,s}; j,a,t) = \tilde h_{i,v}(x,w;t) \cdot \Big[ (1-p_J) \cdot \Big[ F_{|V|}(v,w,t+b_V;a) - C\big(F_{|J|}(i,x,t+b_J;-j), F_{|V|}(v,w,t+b_V;a)\big) \Big] + (1-p_J)(1-p_V) \cdot \Big[ 1 - F_{|J|}(i,x,t+b_J;-j) - F_{|V|}(v,w,t+b_V;a) + C\big(F_{|J|}(i,x,t+b_J;-j), F_{|V|}(v,w,t+b_V;a)\big) \Big] \Big], \tag{16}$$

(iv) for $j > 0$, $a < 0$,

$$q^{JV}(\mathcal{A}^{JV}_{n,s}; j,a,t) = \tilde h_{i,v}(x,w;t) \cdot \Big[ (1-p_V) \cdot \Big[ F_{|J|}(i,x,t+b_J;j) - C\big(F_{|J|}(i,x,t+b_J;j), F_{|V|}(v,w,t+b_V;-a)\big) \Big] + (1-p_J)(1-p_V) \cdot \Big[ 1 - F_{|J|}(i,x,t+b_J;j) - F_{|V|}(v,w,t+b_V;-a) + C\big(F_{|J|}(i,x,t+b_J;j), F_{|V|}(v,w,t+b_V;-a)\big) \Big] \Big]. \tag{17}$$

Proof

See the appendix.

3.3 Financial functions

In this subsection we show how to compute financial functions of specific interest using the characterization of the kernel of the triplet process given in Theorem 1. Results are confined to marginal distributions of log-returns and log-volumes, correlation structures and joint first passage time distributions. In general, given the kernel, it is possible to compute any functional of the kernel.
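As a hedged illustration of how the four cases of Theorem 1 can be evaluated, the Python sketch below computes the conditional joint cdf $P[\tilde J_{n+1} \le j, \tilde V_{n+1} \le a]$ (the kernel without the factor $\tilde h_{i,v}(x,w;t)$). The expressions are obtained directly from A5 by inclusion-exclusion. The Clayton copula is chosen only because it has a closed form; the names `clayton` and `joint_cdf`, the parameter `theta` and the callables `F_j`, `F_v` (cdfs of the moduli, with the conditioning arguments suppressed) are assumptions of this sketch, not objects defined in the paper.

```python
import math


def clayton(u, v, theta=2.0):
    """Clayton copula C(u, v); closed form, used here only as an example."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_cdf(j, a, F_j, F_v, p_j, p_v, copula=clayton):
    """P[J <= j, V <= a] for signed variables built as in assumption A5.

    F_j and F_v are callables returning the cdf of the moduli |J| and |V|.
    """
    fj, fv = F_j(abs(j)), F_v(abs(a))
    c = copula(fj, fv)
    both_above = 1.0 - fj - fv + c  # P[|J| > |j|, |V| > |a|]
    if j >= 0 and a >= 0:  # case (i)
        return 1.0 - p_j * (1.0 - fj) - p_v * (1.0 - fv) + p_j * p_v * both_above
    if j < 0 and a < 0:  # case (ii)
        return (1.0 - p_j) * (1.0 - p_v) * both_above
    if j < 0:  # case (iii): negative return threshold, non-negative volume threshold
        return (1.0 - p_j) * ((fv - c) + (1.0 - p_v) * both_above)
    # case (iv): non-negative return threshold, negative volume threshold
    return (1.0 - p_v) * ((fj - c) + (1.0 - p_j) * both_above)


if __name__ == "__main__":
    F = lambda x: 1.0 - math.exp(-x)  # hypothetical cdf of the moduli
    print(joint_cdf(1.0, 2.0, F, F, p_j=0.5, p_v=0.5))
    print(joint_cdf(-1.0, 2.0, F, F, p_j=0.5, p_v=0.5))
```

Letting the volume threshold grow large recovers the marginal of the return, $1 - p_J(1 - F_{|J|}(j))$ for $j \ge 0$, and setting $p_J = p_V = 1$ reduces case (i) to the copula itself, which gives two quick sanity checks on the expressions.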

The first question we are interested in is the determination of the marginal distributions of log-returns and log-volumes. Since the dependence structure has been introduced on the moduli of these variables, the marginal distributions we are looking for do not coincide with those used in the copula, i.e. with $F_{|J|}$ and $F_{|V|}$.

Let us consider the problem of finding the marginal distribution of the return process. We proceed by integrating out the volume variable and summing over the duration, which gives $\sum_{t \ge 0} q^{JV}(\mathcal{A}^{JV}_{n,s}; j, \infty, t)$. Thus, for $j > 0$, since $F_{|V|}(v,w,t+b_V;\infty) = 1$ and $C\big(F_{|J|}(i,x,t+b_J;j), F_{|V|}(v,w,t+b_V;\infty)\big) = F_{|J|}(i,x,t+b_J;j)$, we obtain the following sequence of equalities:

$$\begin{aligned}
\sum_{t \ge 0} q^{JV}(\mathcal{A}^{JV}_{n,s}; j, \infty, t) &= \sum_{t \ge 0} \tilde h_{i,v}(x,w;t) \cdot \Big[ 1 + p_J p_V \Big( 1 - F_{|J|}(i,x,t+b_J;j) - 1 + F_{|J|}(i,x,t+b_J;j) \Big) - p_V (1 - 1) - p_J \big( 1 - F_{|J|}(i,x,t+b_J;j) \big) \Big] \\
&= \sum_{t \ge 0} \tilde h_{i,v}(x,w;t) \cdot \Big[ 1 - p_J \big( 1 - F_{|J|}(i,x,t+b_J;j) \big) \Big] \\
&= 1 - p_J \cdot \Big( 1 - \sum_{t \ge 0} \tilde h_{i,v}(x,w;t) \cdot F_{|J|}(i,x,t+b_J;j) \Big),
\end{aligned}$$

where the last step uses $\sum_{t \ge 0} \tilde h_{i,v}(x,w;t) = 1$. This marginal distribution expresses the probability of observing, at the next transition, executed at any future time $t$, a return not greater than $j$. Symmetric arguments give the marginal distribution of volumes, which is

$$1 - p_V \cdot \Big( 1 - \sum_{t \ge 0} \tilde h_{i,v}(x,w;t) \cdot F_{|V|}(v,w,t+b_V;a) \Big).$$

The kernel of the triplet process (8) completely describes the dependence structure between returns, volumes and waiting times. Nevertheless, it is relevant to measure this dependence using classical indicators of linear and nonlinear dependence.

The most widely studied measure of linear dependence is the correlation coefficient.
Let $\rho_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|)$ be the correlation coefficient between the modulus of returns and the modulus of volumes at the next transition, unconditionally on the time when the next transition will happen, i.e.

$$\rho_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|) = \frac{\mathrm{Cov}_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}{\sigma_{\mathcal{A}^{J}_{n,s}}(|\tilde J_{n+1}|) \cdot \sigma_{\mathcal{A}^{V}_{n,s}}(|\tilde V_{n+1}|)}. \tag{18}$$

We can calculate the correlation coefficient by using the joint probability density function of $(|\tilde J_{n+1}|, |\tilde V_{n+1}|)$ conditional on the information set $\mathcal{A}^{JV}_{n,s}$. For every $j, a \ge 0$, one gets:

$$\begin{aligned}
F_{(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}(j,a) &:= P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JV}_{n,s}] \\
&= \sum_{t \ge 0} P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] \cdot P[\tilde X_n = t \mid \mathcal{A}^{JV}_{n,s}] \\
&= \sum_{t \ge 0} \tilde h_{i,v}(x,w;t) \cdot C\big(F_{|J|}(i,x,t+b_J;j), F_{|V|}(v,w,t+b_V;a)\big).
\end{aligned} \tag{19}$$

Consequently, the density can be obtained by differentiating the cumulative distribution function, i.e.

$$f_{(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}(j,a) = \frac{\partial^2}{\partial j\, \partial a} F_{(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}(j,a) = \sum_{t \ge 0} \tilde h_{i,v}(x,w;t) \cdot \frac{\partial^2}{\partial j\, \partial a} C\big(F_{|J|}(i,x,t+b_J;j), F_{|V|}(v,w,t+b_V;a)\big). \tag{20}$$

Accordingly we get

$$\mathrm{Cov}_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|) = \int_0^\infty \!\! \int_0^\infty j \cdot a \cdot f_{(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}(j,a)\, dj\, da - \int_0^\infty j f_{|\tilde J_{n+1}|}(j)\, dj \cdot \int_0^\infty a f_{|\tilde V_{n+1}|}(a)\, da. \tag{21}$$

This allows the recovery of $\rho_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|)$ once the standard deviations $\sigma_{\mathcal{A}^{J}_{n,s}}(|\tilde J_{n+1}|)$ and $\sigma_{\mathcal{A}^{V}_{n,s}}(|\tilde V_{n+1}|)$ are known. They can be obtained from the univariate densities $f_{|\tilde J_{n+1}|}(\cdot)$ and $f_{|\tilde V_{n+1}|}(\cdot)$, which in turn can be obtained by integration of the joint density.

It is also interesting to compute the covariance between the modulus of log-returns and the log-volumes at the next transition, i.e.

$$\begin{aligned}
\mathrm{Cov}_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, \tilde V_{n+1}) &= E_{\mathcal{A}^{JV}_{n,s}}[|\tilde J_{n+1}| \cdot \tilde V_{n+1}] - E_{\mathcal{A}^{J}_{n,s}}[|\tilde J_{n+1}|] \cdot E_{\mathcal{A}^{V}_{n,s}}[\tilde V_{n+1}] \\
&= E_{\mathcal{A}^{JV}_{n,s}}[|\tilde J_{n+1}| \cdot \eta^V_{n+1} \cdot |\tilde V_{n+1}|] - E_{\mathcal{A}^{J}_{n,s}}[|\tilde J_{n+1}|] \cdot E_{\mathcal{A}^{V}_{n,s}}[\eta^V_{n+1} \cdot |\tilde V_{n+1}|] \\
&= E_{\mathcal{A}^{V}_{n,s}}[\eta^V_{n+1}] \cdot \mathrm{Cov}_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|).
\end{aligned} \tag{22}$$

Note that if $E_{\mathcal{A}^{V}_{n,s}}[\eta^V_{n+1}] = 0$, that is if $p_V = 1/2$, then the modulus of log-returns and the log-volumes are uncorrelated at the next transition.

One may also be interested in nonlinear measures of dependence between random variables.
Mutual information, which goes back to [42], possesses properties that make it a suitable measure of nonlinear dependence, see e.g. [23]. It is simple to express the mutual information within our model:

$$MI_{\mathcal{A}^{JV}_{n,s}}(|\tilde J_{n+1}|, |\tilde V_{n+1}|) = \int_0^\infty \!\! \int_0^\infty f_{(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}(j,a) \log \frac{f_{(|\tilde J_{n+1}|, |\tilde V_{n+1}|)}(j,a)}{f_{|\tilde J_{n+1}|}(j) \cdot f_{|\tilde V_{n+1}|}(a)}\, dj\, da, \tag{23}$$

where the densities are given in formula (20).

The first passage time distribution has attracted a lot of attention in finance. It has been considered under different assumptions about the stochastic processes that describe the asset behaviour. It has been investigated for log-returns described by Ornstein-Uhlenbeck processes (see e.g. [52]) and more recently for generalized semi-Markov models in [15,17,16]. We shall now derive the first passage time distribution for our multivariate model.

Let $\tilde M^J_t(\tau)$ be the accumulation factor of the return process in the multivariate model from time $t$ to $t + \tau$. Formally, the accumulation factor can be defined as follows:

$$\tilde M^J_t(\tau) = e^{\sum_{r=0}^{\tau-1} \tilde Z^J(t+r)}.$$

A similar definition applies to the volume process, i.e.

$$\tilde M^V_t(\tau) = e^{\sum_{r=0}^{\tau-1} \tilde Z^V(t+r)}.$$

For $\rho \in \mathbb{R}_+$ and $\psi \in \mathbb{R}_+$, denote the joint first passage time by

$$\Gamma(\rho;\psi) := \min\{\tau \ge 0 : \{\tilde M^J_0(\tau) \ge \rho\} \cup \{\tilde M^V_0(\tau) \ge \psi\}\}. \tag{24}$$

Thus, $\Gamma(\rho;\psi)$ is the first time at which at least one accumulation factor reaches or exceeds its own threshold. Denote the corresponding conditional survival function by

$$R_{(\rho;\psi)}\big((i,v,t)^0_{-m}, u; t\big) = P[\Gamma(\rho;\psi) > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m},\ \tilde B(u) = u],$$

where $\tilde B(u) = u - \tilde T_{\tilde N(u)}$.

The definition of the shift operator given in Definition 2 can be easily extended to triplet sequences $(i,v,t)^n_{-m}$.

Definition 4

Let $(i,v,t)^n_{-m} = \{(i_\alpha, v_\alpha, t_\alpha),\ \alpha = -m, \ldots, n\}$ be a sequence of returns, volumes and corresponding transition times. Denote by

$$\Phi^n_{-m} = \{(i_\alpha, v_\alpha, t_\alpha),\ \alpha = -m, \ldots, 0, \ldots, n,\ i_\alpha \in \mathbb{R},\ v_\alpha \in \mathbb{R},\ t_\alpha \in \mathbb{Z},\ t_\alpha < t_{\alpha+1}\}.$$

Then we define the shift operator $\circ : \Phi^{n+1}_{-m} \to \Phi^n_{-m-1}$ by

$$\circ\big((i,v,t)^{n+1}_{-m}\big) = (s,y,k)^n_{-m-1},$$

where $s_\alpha = i_{\alpha+1}$, $y_\alpha = v_{\alpha+1}$, $k_\alpha = t_{\alpha+1} - t_{n+1}$, $\alpha = -m-1, \ldots, 0, \ldots, n$.

We formulate and prove a theorem which provides an equation for the joint first passage time distribution.

Theorem 2

Let $f^\lambda$ and $g^\gamma$ be the score functions of the index processes relative to the return and volume processes, respectively. For $i_0 \ge 0$ and $v_0 \ge 0$, it results that

$$\begin{aligned}
R_{(\rho;\psi)}\big((i,v,t)^0_{-m}, u; t\big) &= 1_{\{e^{i_0 t} < \rho\}} 1_{\{e^{v_0 t} < \psi\}} \frac{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; t)}{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; u)} \\
&\quad + \sum_{t_1 = u+1}^{t} \int_{-\infty}^{+\infty} \!\! \int_{-\infty}^{+\infty} di_1\, dv_1\, \frac{1_{\{e^{i_0 t_1} < \rho\}} 1_{\{e^{v_0 t_1} < \psi\}}}{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; u)} \cdot \frac{\partial^2 q^{JV}(i_0, v_0, \alpha_0, \beta_0; i_1, v_1, t_1)}{\partial i_1\, \partial v_1} \cdot R_{\left(\frac{\rho}{e^{i_0 t_1}}; \frac{\psi}{e^{v_0 t_1}}\right)}\big(\circ((i,v,t)^1_{-m}),\, t - t_1\big),
\end{aligned} \tag{25}$$

where

$$\alpha_0 = \sum_{r=0}^{m-1} \sum_{a = t_{-r-1}}^{t_{-r}-1} f^\lambda(i_{-r-1}, -a) + f^\lambda(i_0, 0), \qquad \beta_0 = \sum_{r=0}^{m-1} \sum_{a = t_{-r-1}}^{t_{-r}-1} g^\gamma(v_{-r-1}, -a) + g^\gamma(v_0, 0). \tag{26}$$

For $i_0 < 0$, replace $e^{i_0 t}$ with $e^{i_0}$ everywhere in formula (25). For $v_0 < 0$, replace $e^{v_0 t}$ with $e^{v_0}$ everywhere in formula (25).

Proof. See the appendix.
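The definition of the joint first passage time in (24) can be made concrete with a short Python sketch. It evaluates $\Gamma(\rho;\psi)$ on one given trajectory of log-returns and log-volumes, which is the elementary step of any Monte Carlo estimate of the survival function. The function name `joint_first_passage_time` and the convention of returning `None` when no threshold is reached on the available horizon are assumptions of this sketch.

```python
import math


def joint_first_passage_time(z_j, z_v, rho, psi):
    """Smallest tau >= 0 with M^J(tau) >= rho or M^V(tau) >= psi.

    z_j[r] and z_v[r] are the log-return and log-volume at time r, so that
    M(tau) = exp(sum of z[r] for r < tau) and M(0) = 1.
    Returns None if neither threshold is reached on the given horizon.
    """
    log_rho, log_psi = math.log(rho), math.log(psi)
    cum_j = cum_v = 0.0  # logarithms of the two accumulation factors
    for tau in range(len(z_j) + 1):
        if cum_j >= log_rho or cum_v >= log_psi:
            return tau
        if tau < len(z_j):
            cum_j += z_j[tau]
            cum_v += z_v[tau]
    return None


if __name__ == "__main__":
    returns = [0.01, 0.01, 0.01, 0.01, 0.01]
    volumes = [0.0, 0.0, 0.0, 0.0, 0.0]
    # The return accumulation factor first reaches 1.025 after three steps.
    print(joint_first_passage_time(returns, volumes, rho=1.025, psi=2.0))
```

Working with cumulative sums of logarithms instead of the accumulation factors themselves avoids repeated exponentiation and leaves the comparison with the thresholds unchanged, since the exponential is monotone.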

To verify the validity of the model described above, we applied it to the database introduced in Section 2. Following [16, ?] we use, as the definition of the function $f^\lambda$ in (1), an exponentially weighted moving average (EWMA) of the squares of $J_n$, which has the following expression:

$$f^\lambda(J_{n-1-k}, T^J_n, a) = \frac{\lambda^{T^J_n - a} J^2_{n-1-k}}{\sum_{k=0}^{m+n-1} \sum_{a = T^J_{n-1-k}}^{T^J_{n-k}-1} \lambda^{T^J_n - a} + 1} = \frac{\lambda^{T^J_n - a} J^2_{n-1-k}}{\sum_{a = T^J_{-m}}^{T^J_n} \lambda^{T^J_n - a}}. \tag{27}$$

A similar choice is made for the volume return process, leading to

$$g^\gamma(V_{n-1-k}, T^V_n, a) = \frac{\gamma^{T^V_n - a} V^2_{n-1-k}}{\sum_{k=0}^{m+n-1} \sum_{a = T^V_{n-1-k}}^{T^V_{n-k}-1} \gamma^{T^V_n - a} + 1} = \frac{\gamma^{T^V_n - a} V^2_{n-1-k}}{\sum_{a = T^V_{-m}}^{T^V_n} \gamma^{T^V_n - a}}. \tag{28}$$

We remark that the functional form of $f^\lambda$ is not obtained through any optimization procedure. One can probably find functional forms that perform better according to some performance measure. Our choice is motivated by its simplicity and by the fact that it gives very good results. Moreover, it is justified by the empirical evidence that the dynamics of price and volume returns depend on the volatility regime.

The next step in the application is finding the optimal parameters to be used in the model. We followed the same procedure described in [19], summarized in the following subsection.

4.1 Parameters optimization

We describe here the whole procedure to set and optimize the parameters used in the univariate models.

Table 6 Parameters used in the application to real data.

          r(t)                        v(t)
Stock     s    λ    MAPE (%)          s    γ    MAPE (%)
TIT
ISP
TEN
F

1. The first step to set the WISMC model is, by using the descriptive statistics of the dataset, to fix a number of states $s$ and a value for the weight parameter $\lambda$;
2. Build the trajectory $(J_n, T^J_n)$ implied by the choice of $s$ and $\lambda$;
3. Estimate the weighted-indexed semi-Markov kernel $q^J$ by applying the empirical estimators to the trajectory obtained at the previous step;
4. Perform Monte Carlo simulation to build synthetic time series;
5. Estimate the autocorrelation function (ACF) of the synthetic time series, $\Sigma(\tau; s, \lambda)$. Note that this ACF depends on the number of states $s$ and on the value of the weight parameter $\lambda$;
6. Compare the real ACF, $\Sigma(\tau)$, with the synthetic one, $\Sigma(\tau; s, \lambda)$, by computing the Mean Absolute Percentage Error (MAPE) between them. The MAPE depends on the number of states and on the value of the weight parameter, and is therefore denoted by $MAPE(s, \lambda)$;
7. Change the number of states and the parameter $\lambda$, restart from point 2 and repeat all points.

At the end of the whole process, choose the number of states $s^*$ and the parameter $\lambda^*$ that best represent the dataset by minimizing $MAPE(s, \lambda)$, i.e.

$$(s^*, \lambda^*) = \operatorname{argmin}_{(s,\lambda)} \{MAPE(s, \lambda)\}.$$

Notice that the algorithm can stop whenever an increase in the number of states does not decrease the MAPE by more than a given threshold $\epsilon$.

This procedure should be repeated for all stocks in the portfolio and also for the variable $v(t)$. Once all the parameters for the two univariate models are optimized, a copula is used to build the bivariate model.

4.2 Results

Here we show some of the results obtained and a comparison with real data. Using the optimization procedure described above we found the optimal parameters, which are summarized in Table 6 for the four stocks.

The dependence between the two real processes $v(t)$ and $r(t)$ is kept in the model by using a copula function.
We tested different copulas, such as Gaussian, Student's t, Gumbel and Clayton, finding almost no differences in the results. This is mainly due to the fact that 1-minute price returns are almost discrete and vary in a small range; hence, in this dataset there is no tail effect. To keep the application as simple as possible we decided to use a Gaussian copula, which has only one parameter. We simulated, using the estimated kernels and a Gaussian copula, the joint process $|r(t)|$ and $v(t)$ and obtained $r(t)$ by using the relation described in Assumption A5. The results are trajectories with the same time length as the real data for both variables $v(t)$ and $r(t)$.

In Table 7 we show the cross-correlation between the synthetic $v(t)$ and $r(t)$ and their absolute values, for all combinations, as done in Table 4. The table shows good agreement with what was found for real data.

Table 7 Cross correlation between price and volume returns for simulated data.

Stock   ρ(r(t),v(t))   p-value   ρ(|r(t)|,v(t))   p-value   ρ(r(t),|v(t)|)   p-value   ρ(|r(t)|,|v(t)|)   p-value
TIT     0.0001         0         0.091            0         -0.0042          0.36      0.0039             0
ISP     -0.00390       0         0.083            0         -0.0025          0.053     -0.0086            0
TEN     -0.0046        0         0.081            0         0.0011           0.46      0.022              0
F       -0.0075        0         0.12             0         -0.0029          0.025     -0.0092            0

The model is also used to compare the first passage time distribution (fptd) of the joint processes $v(t)$ and $r(t)$. From the synthetic variables $r(t)$ and $v(t)$ we build the price and volume variables in the following way: each discrete state of $r(t)$ ($v(t)$) is associated with a range of variability of the continuous real $r(t)$ ($v(t)$); inside this range a continuous value is chosen by drawing a random number from a uniform distribution and then inverting the empirical distribution of the real continuous price (volume) return. Once $r(t)$ ($v(t)$) are transformed back into continuous values, prices $S(t)$ (volumes $V(t)$) are obtained by

$$S(t) = S_0 \times e^{\sum_{k=1}^{t} r(k)} \qquad \Big( V(t) = V_0 \times e^{\sum_{k=1}^{t} v(k)} \Big).$$

Synthetic prices and volumes are then used to build the distribution of the time at which a given threshold is first crossed.

To verify whether the price fptd depends on volume values, we estimated the fptd as a function of the initial condition on the discretized volume returns. In this way, for each initial $v(t)$ value, we obtain a different fptd.
At the same time, we verified whether the proposed model also keeps this dependence structure. In Figure 8 we show the results and the comparison with real data (for two of the given stocks). We fixed a price increment threshold at 0. [...] $S(t)$ and $V(t)$ cross a given threshold. Again, the price increment threshold is set at 0. [...]

[Figure 8: panels for TIT and ISP, each showing the first passage time distribution for real data and for simulated data as a function of the volume states.]

Fig. 8 First passage time distribution of real data compared with synthetic data.
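The estimation of a price fptd conditional on the initial volume state, as displayed in Figure 8, can be sketched in Python as follows. This is only an illustration of the idea and not the code used for the figures: the function name `fptd_by_volume_state`, the relative `threshold` and the truncation horizon `max_lag` are hypothetical choices, and the threshold value actually used in the application is not reproduced here.

```python
from collections import Counter, defaultdict


def fptd_by_volume_state(prices, volume_states, threshold, max_lag):
    """Empirical first passage time distribution grouped by initial volume state.

    For every start time t0, the first lag tau in 1..max_lag such that
    prices[t0 + tau] >= prices[t0] * (1 + threshold) is recorded under
    volume_states[t0]. Starts that never cross within max_lag still count
    in the normalisation, so each distribution may sum to less than one.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for t0 in range(len(prices) - max_lag):
        state = volume_states[t0]
        totals[state] += 1
        target = prices[t0] * (1.0 + threshold)
        for tau in range(1, max_lag + 1):
            if prices[t0 + tau] >= target:
                counts[state][tau] += 1
                break
    return {
        state: {tau: c / totals[state] for tau, c in sorted(counts[state].items())}
        for state in totals
    }


if __name__ == "__main__":
    prices = [100, 100, 102, 100, 100, 103, 100, 100]
    states = [0, 1, 0, 1, 0, 1, 0, 1]  # discretized volume return at each time
    print(fptd_by_volume_state(prices, states, threshold=0.01, max_lag=2))
```

Applying the same function to real and to simulated trajectories gives two families of distributions, one per volume state, that can be compared panel by panel.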

Overall, the model is able to capture the statistical features of real data examined above while keeping the dependencies between price, volumes and waiting times. Furthermore, we found very good agreement between real data and the model also for the first passage time distribution.

[Figure 9: panels TIT, ISP, TEN and F; x-axis: time lag (minutes); y-axis: probability; series: real data and simulation.]

Fig. 9 First passage time distribution of real data compared with synthetic data.

In this work we have advanced a new stochastic model, based on a Weighted-Indexed Semi-Markov Chain, for modelling price, volumes and waiting times in high frequency finance. After showing, by analyzing real data, the empirical evidence that supports the use of a multivariate model, we defined the probabilistic structure of the model and gave a detailed mathematical implementation. Furthermore, mathematical expressions for covariance and first passage time distributions are given. In the last part we show, by using Monte Carlo simulations, that the model has the same statistical features as real data. In fact, the proposed model is able to reproduce the autocorrelation functions, the dependence between price and volume and the first passage time distributions. Further developments include the use of the model in portfolio optimization, the development of risk measures and volatility forecasting.

Proof (of Lemma 1)

Given the information set $(J,T)^{n+1}_{-m} = (i,t)^{n+1}_{-m}$, we can compute the value of the index process at the $(n+1)$-th transition through formula (1) and assumption A1:

$$\begin{aligned}
I^J_{n+1}(\lambda) &= \sum_{r=0}^{m+n+1-1} \sum_{a = T_{n+1-1-r}}^{T_{n+1-r}-1} f^\lambda(J_{n+1-1-r}, T_{n+1} - a) + f^\lambda(J_{n+1}, T_{n+1} - T_{n+1}) \\
&= \sum_{r=0}^{m+n} \sum_{a = t_{n-r}}^{t_{n+1-r}-1} f^\lambda(i_{n-r}, t_{n+1} - a) + f^\lambda(i_{n+1}, 0).
\end{aligned} \tag{29}$$

For simplicity of notation, denote this value by $x$, i.e. $I^J_{n+1}(\lambda) = x$. Thus,

$$P[J_{n+2} \le j, T_{n+2} - T_{n+1} = t \mid (J,T)^{n+1}_{-m} = (i,t)^{n+1}_{-m}] = P[J_{n+2} \le j, T_{n+2} - T_{n+1} = t \mid J_{n+1} = i_{n+1}, I^J_{n+1}(\lambda) = x] = q(i_{n+1}, x; j, t). \tag{30}$$

Let us now consider the probability $P[J_{n+1} \le j, T_{n+1} - T_n = t \mid (J,T)^n_{-m-1} = \circ((i,t)^{n+1}_{-m})]$ and apply the definition of the shift operator to obtain

$$\circ((i,t)^{n+1}_{-m}) = (s,k)^n_{-m-1}, \tag{31}$$

and in turn

$$\begin{aligned}
P[J_{n+1} \le j, T_{n+1} - T_n = t \mid (J,T)^n_{-m-1} = \circ((i,t)^{n+1}_{-m})] &= P[J_{n+1} \le j, T_{n+1} - T_n = t \mid (J,T)^n_{-m-1} = (s,k)^n_{-m-1}] \\
&= P[J_{n+1} \le j, T_{n+1} - T_n = t \mid J_n = s_n, I^J_n(\lambda) = b],
\end{aligned} \tag{32}$$

where

$$b = \sum_{r=0}^{m+n} \sum_{a = k_{n-1-r}}^{k_{n-r}-1} f^\lambda(s_{n-1-r}, k_n - a) + f^\lambda(s_n, 0). \tag{33}$$

Since $s_{n-1-r} = i_{n-r}$ and $k_{n-1-r} = t_{n-r} - t_{n+1}$, it follows that

$$b = \sum_{r=0}^{m+n} \sum_{a = t_{n-r} - t_{n+1}}^{t_{n-r+1} - t_{n+1} - 1} f^\lambda(i_{n-r}, -a) + f^\lambda(i_{n+1}, 0).$$

A change of variable $y = a + t_{n+1}$ gives

$$b = \sum_{r=0}^{m+n} \sum_{y = t_{n-r}}^{t_{n-r+1}-1} f^\lambda(i_{n-r}, t_{n+1} - y) + f^\lambda(i_{n+1}, 0) = x.$$

Accordingly we get

$$P[J_{n+1} \le j, T_{n+1} - T_n = t \mid (J,T)^n_{-m-1} = \circ((i,t)^{n+1}_{-m})] = P[J_{n+1} \le j, T_{n+1} - T_n = t \mid J_n = i_{n+1}, I^J_n(\lambda) = x] = q(i_{n+1}, x; j, t),$$

which completes the proof.

Proof (of Theorem 1)

The kernel of the triplet process has been represented in formula (9) as follows:

$$q^{JV}(\mathcal{A}^{JV}_{n,s}; j,a,t) = P[\tilde J_{n+1} \le j, \tilde V_{n+1} \le a \mid \mathcal{A}^{JVT}_{n,s}] \cdot P[\tilde X_n = t \mid \mathcal{A}^{JV}_{n,s}],$$

and from assumption A3 we get $\tilde h_{i,v}(x,w;t) = P[\tilde X_n = t \mid \mathcal{A}^{JV}_{n,s}]$.

Thus, it remains to evaluate the conditional joint distribution of log-return and log-volume. Let us consider the case $j \ge 0$ and $a \ge 0$, and write $F_{|J|}(j)$ and $F_{|V|}(a)$ to denote in compact form the marginal distributions of the copula.

Let us consider the following representation:

$$\begin{aligned}
P[\tilde J_{n+1} \le j, \tilde V_{n+1} \le a \mid \mathcal{A}^{JVT}_{n,s}] &= P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] \\
&\quad + P[|\tilde J_{n+1}| > j, \eta^J_{n+1} = -1, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] \\
&\quad + P[|\tilde J_{n+1}| > j, \eta^J_{n+1} = -1, |\tilde V_{n+1}| > a, \eta^V_{n+1} = -1 \mid \mathcal{A}^{JVT}_{n,s}] \\
&\quad + P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| > a, \eta^V_{n+1} = -1 \mid \mathcal{A}^{JVT}_{n,s}].
\end{aligned} \tag{34}$$

Let us compute each of the four addends in (34). From assumption A4 we know that

$$P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] = C\big(F_{|J|}(j), F_{|V|}(a)\big). \tag{35}$$

Next consider $P[|\tilde J_{n+1}| > j, \eta^J_{n+1} = -1, |\tilde V_{n+1}| > a, \eta^V_{n+1} = -1 \mid \mathcal{A}^{JVT}_{n,s}]$. From assumption A5 this probability is equal to

$$\begin{aligned}
& P[\eta^J_{n+1} = -1] \cdot P[\eta^V_{n+1} = -1] \cdot P[|\tilde J_{n+1}| > j, |\tilde V_{n+1}| > a \mid \mathcal{A}^{JVT}_{n,s}] \\
&= (1-p_J)(1-p_V) \big\{ 1 - P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] - P[|\tilde J_{n+1}| > j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] - P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| > a \mid \mathcal{A}^{JVT}_{n,s}] \big\} \\
&= (1-p_J)(1-p_V) \big\{ 1 - C\big(F_{|J|}(j), F_{|V|}(a)\big) - \big( P[|\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] - P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] \big) - \big( P[|\tilde J_{n+1}| \le j \mid \mathcal{A}^{JVT}_{n,s}] - P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] \big) \big\} \\
&= (1-p_J)(1-p_V) \big\{ 1 - C\big(F_{|J|}(j), F_{|V|}(a)\big) - F_{|V|}(a) + C\big(F_{|J|}(j), F_{|V|}(a)\big) - F_{|J|}(j) + C\big(F_{|J|}(j), F_{|V|}(a)\big) \big\} \\
&= (1-p_J)(1-p_V) \big[ 1 - F_{|J|}(j) - F_{|V|}(a) + C\big(F_{|J|}(j), F_{|V|}(a)\big) \big].
\end{aligned} \tag{36}$$

Then, we compute $P[|\tilde J_{n+1}| > j, \eta^J_{n+1} = -1, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}]$. Applying again assumptions A4 and A5 we get

$$\begin{aligned}
P[\eta^J_{n+1} = -1] \cdot P[|\tilde J_{n+1}| > j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] &= (1-p_J) \big[ P[|\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] - P[|\tilde J_{n+1}| \le j, |\tilde V_{n+1}| \le a \mid \mathcal{A}^{JVT}_{n,s}] \big] \\
&= (1-p_J) \big[ F_{|V|}(a) - C\big(F_{|J|}(j), F_{|V|}(a)\big) \big].
\end{aligned} \tag{37}$$

Analogous computations give

$$P[|\tilde J_{n+1}| \le j, \eta^V_{n+1} = -1, |\tilde V_{n+1}| > a \mid \mathcal{A}^{JVT}_{n,s}] = (1-p_V) \big[ F_{|J|}(j) - C\big(F_{|J|}(j), F_{|V|}(a)\big) \big]. \tag{38}$$

A substitution of (35), (36), (37) and (38) into (34) and some algebraic manipulations produce

$$P[\tilde J_{n+1} \le j, \tilde V_{n+1} \le a \mid \mathcal{A}^{JVT}_{n,s}] = 1 + p_J p_V \cdot \Big( 1 - F_{|J|}(j) - F_{|V|}(a) + C\big(F_{|J|}(j), F_{|V|}(a)\big) \Big) - p_V \big(1 - F_{|V|}(a)\big) - p_J \big(1 - F_{|J|}(j)\big). \tag{39}$$

A multiplication of (39) by $\tilde h_{i,v}(x,w;t)$ concludes the proof for case (i). The remaining cases (ii)-(iv) can be proved by similar arguments.

Proof (of Theorem 2)

$$\begin{aligned}
R_{(\rho;\psi)}\big((i,v,t)^0_{-m}, u; t\big) &= P[\Gamma(\rho;\psi) > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \\
&= P[\Gamma(\rho;\psi) > t, \tilde T_1 > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \qquad (40) \\
&\quad + P[\Gamma(\rho;\psi) > t, \tilde T_1 \le t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u]. \qquad (41)
\end{aligned}$$

By the definition of conditional probability, (40) can be written as

$$P[\Gamma(\rho;\psi) > t \mid \tilde T_1 > t, (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \times P[\tilde T_1 > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u]. \tag{42}$$

Note that by definition $\tilde X_0 := \tilde T_1 - \tilde T_0$, but since $\tilde T_0 = t_0 = 0$ we can replace $\tilde T_1$ with the corresponding sojourn time $\tilde X_0$. Also note that the event $\{\tilde B(u) = u\}$ is equivalent to the event $\{\tilde T_{N(u)} = 0, \tilde T_{N(u)+1} > u\}$. The latter equality between events means that at least one of the return and volume processes made its last transition at time $t_0 = 0$ and the other process made its last transition at some earlier time. Let $b_J$ and $b_V$ generically denote the times since the last transition given by the backward recurrence time processes, i.e.

$$\tilde T_0 - T^J = b_J, \qquad \tilde T_0 - T^V = b_V.$$

Besides, note that the information set $(\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}$ generates a value of the index process of returns equal to

$$\tilde I^J_0 = \sum_{r=0}^{m-1} \sum_{a = t_{-r-1}}^{t_{-r}-1} f^\lambda(i_{-r-1}, -a) + f^\lambda(i_0, 0) =: \alpha_0, \tag{43}$$

and of the index process of volumes equal to

$$\tilde I^V_0 = \sum_{r=0}^{m-1} \sum_{a = t_{-r-1}}^{t_{-r}-1} g^\gamma(v_{-r-1}, -a) + g^\gamma(v_0, 0) =: \beta_0. \tag{44}$$

Thus, by virtue of assumption A2, the probability (42) becomes equal to

$$P[\Gamma(\rho;\psi) > t \mid \tilde X_0 > t, \tilde J_0 = i_0, \tilde V_0 = v_0, \tilde I^J_0 = \alpha_0, \tilde I^V_0 = \beta_0, \tilde T_1 > u, \tilde T_0 - T^J = b_J, \tilde T_0 - T^V = b_V] \times P[\tilde X_0 > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u]. \tag{45}$$

According to assumption A3, we have

$$\begin{aligned}
P[\tilde X_0 > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] &= P[\tilde X_0 > t \mid \tilde X_0 > u, \tilde J_0 = i_0, \tilde V_0 = v_0, \tilde I^J_0 = \alpha_0, \tilde I^V_0 = \beta_0] \\
&= \frac{P[\tilde X_0 > t \mid \tilde J_0 = i_0, \tilde V_0 = v_0, \tilde I^J_0 = \alpha_0, \tilde I^V_0 = \beta_0]}{P[\tilde X_0 > u \mid \tilde J_0 = i_0, \tilde V_0 = v_0, \tilde I^J_0 = \alpha_0, \tilde I^V_0 = \beta_0]} = \frac{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; t)}{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; u)}.
\end{aligned} \tag{46}$$

By the definition of the joint first passage time we have that

$$\begin{aligned}
& P[\Gamma(\rho;\psi) > t \mid \tilde X_0 > t, \tilde J_0 = i_0, \tilde V_0 = v_0, \tilde I^J_0 = \alpha_0, \tilde I^V_0 = \beta_0, \tilde T_1 > u, \tilde T_0 - T^J = b_J, \tilde T_0 - T^V = b_V] \\
&= P[\min\{\tau \ge 0 : \{\tilde M^J_0(\tau) \ge \rho\} \cup \{\tilde M^V_0(\tau) \ge \psi\}\} > t \mid \mathcal{A}^{JV}_{0,0}, \tilde T_1 > u],
\end{aligned} \tag{47}$$

where $\mathcal{A}^{JV}_{0,0} = \{\tilde J_0 = i_0, \tilde V_0 = v_0, \tilde I^J_0 = \alpha_0, \tilde I^V_0 = \beta_0, \tilde T_0 = 0, \tilde T_0 - T^J = b_J, \tilde T_0 - T^V = b_V\}$.

Since $i_0 \ge 0$, $v_0 \ge 0$ and $\tilde T_1 > t$, the processes $\tilde M^J_0(\tau)$ and $\tilde M^V_0(\tau)$ are both non-decreasing with respect to the variable $\tau$. Accordingly,

$$\max_{\tau \in \{0,1,\ldots,t\}} \{\tilde M^J_0(\tau)\} = \tilde M^J_0(t) = e^{\sum_{r=0}^{t-1} \tilde Z^J(r)} = e^{i_0 t},$$

and analogously $\max_{\tau \in \{0,1,\ldots,t\}} \{\tilde M^V_0(\tau)\} = e^{v_0 t}$. Thus, formula (47) becomes

$$P[e^{i_0 t} < \rho, e^{v_0 t} < \psi \mid \mathcal{A}^{JV}_{0,0}, \tilde T_1 > u] = 1_{\{e^{i_0 t} < \rho\}} 1_{\{e^{v_0 t} < \psi\}}. \tag{48}$$

A substitution of (48) and (46) in (45) gives:

$$P[\Gamma(\rho;\psi) > t, \tilde T_1 > t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] = 1_{\{e^{i_0 t} < \rho\}} 1_{\{e^{v_0 t} < \psi\}} \frac{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; t)}{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; u)}. \tag{49}$$

It remains to compute probability (41). By the law of total probability and by the definition of conditional probability we have the following chain of equalities:

$$\begin{aligned}
& P[\Gamma(\rho;\psi) > t, \tilde T_1 \le t \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \\
&= \sum_{t_1=1}^{t} \int_{-\infty}^{+\infty} \!\! \int_{-\infty}^{+\infty} P[\Gamma(\rho;\psi) > t, \tilde T_1 = t_1, \tilde J_1 \in (i_1, i_1 + di_1), \tilde V_1 \in (v_1, v_1 + dv_1) \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \\
&= \sum_{t_1=1}^{t} \int_{-\infty}^{+\infty} \!\! \int_{-\infty}^{+\infty} P[\Gamma(\rho;\psi) > t \mid \tilde T_1 = t_1, \tilde J_1 = i_1, \tilde V_1 = v_1, (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \\
&\qquad \times P[\tilde T_1 = t_1, \tilde J_1 \in (i_1, i_1 + di_1), \tilde V_1 \in (v_1, v_1 + dv_1) \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u].
\end{aligned}$$

Let us start by computing the following probability:

$$\begin{aligned}
& P[\tilde T_1 = t_1, \tilde J_1 \in (i_1, i_1 + di_1), \tilde V_1 \in (v_1, v_1 + dv_1) \mid (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \\
&= P[\tilde T_1 = t_1, \tilde J_1 \in (i_1, i_1 + di_1), \tilde V_1 \in (v_1, v_1 + dv_1) \mid \mathcal{A}^{JV}_{0,0}, \tilde T_1 > u] \\
&= \frac{P[u < \tilde T_1 = t_1, \tilde J_1 \in (i_1, i_1 + di_1), \tilde V_1 \in (v_1, v_1 + dv_1) \mid \mathcal{A}^{JV}_{0,0}]}{P[\tilde T_1 > u \mid \mathcal{A}^{JV}_{0,0}]} \\
&= \frac{1_{\{t_1 > u\}} \dfrac{\partial^2 q^{JV}(i_0, v_0, \alpha_0, \beta_0; i_1, v_1, t_1)}{\partial i_1\, \partial v_1}\, di_1\, dv_1}{1 - \tilde H_{i_0,v_0}(\alpha_0, \beta_0; u)}.
\end{aligned} \tag{50}$$

It remains to compute

$$\begin{aligned}
& P[\Gamma(\rho;\psi) > t \mid \tilde T_1 = t_1, \tilde J_1 = i_1, \tilde V_1 = v_1, (\tilde J, \tilde V, \tilde T)^0_{-m} = (i,v,t)^0_{-m}, \tilde B(u) = u] \\
&= P\Big[\max_{s \in \{0,1,\ldots,t\}} \{\tilde M^J_0(s)\} < \rho, \max_{s \in \{0,1,\ldots,t\}} \{\tilde M^V_0(s)\} < \psi \,\Big|\, (\tilde J, \tilde V, \tilde T)^1_{-m} = (i,v,t)^1_{-m}, \tilde B(u) = u\Big].
\end{aligned} \tag{51}$$

Now observe that, since $\tilde T_1 = t_1$, we have

$$\max_{s \in \{0,1,\ldots,t\}} \{\tilde M^J_0(s)\} = \max\Big\{ \max_{s \in \{0,1,\ldots,t_1\}} \{\tilde M^J_0(s)\}, \max_{s \in \{1,\ldots,t-t_1\}} \{\tilde M^J_0(t_1 + s)\} \Big\},$$

and, because $i_0 \ge 0$, $\max_{s \in \{0,1,\ldots,t_1\}} \{\tilde M^J_0(s)\} = \tilde M^J_0(t_1) = e^{t_1 i_0}$. Accordingly we can deduce that

$$\max_{s \in \{0,1,\ldots,t\}} \{\tilde M^J_0(s)\} = \max\Big\{ e^{t_1 i_0}, \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{t_1 i_0} e^{\sum_{r=0}^{s-1} \tilde Z^J(t_1 + r)} \big\} \Big\}.$$

Similarly we have

$$\max_{s \in \{0,1,\ldots,t\}} \{\tilde M^V_0(s)\} = \max\Big\{ e^{t_1 v_0}, \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{t_1 v_0} e^{\sum_{r=0}^{s-1} \tilde Z^V(t_1 + r)} \big\} \Big\}.$$

Thus, by substitution, the probability (51) becomes

$$\begin{aligned}
& P\Big[ \max\Big\{ e^{t_1 i_0}, \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{t_1 i_0} e^{\sum_{r=0}^{s-1} \tilde Z^J(t_1+r)} \big\} \Big\} < \rho,\ \max\Big\{ e^{t_1 v_0}, \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{t_1 v_0} e^{\sum_{r=0}^{s-1} \tilde Z^V(t_1+r)} \big\} \Big\} < \psi \,\Big|\, (\tilde J, \tilde V, \tilde T)^1_{-m} = (i,v,t)^1_{-m}, \tilde B(u) = u \Big] \\
&= 1_{\{e^{t_1 i_0} < \rho\}} 1_{\{e^{t_1 v_0} < \psi\}} \cdot P\Big[ \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{t_1 i_0} e^{\sum_{r=0}^{s-1} \tilde Z^J(t_1+r)} \big\} < \rho,\ \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{t_1 v_0} e^{\sum_{r=0}^{s-1} \tilde Z^V(t_1+r)} \big\} < \psi \,\Big|\, (\tilde J, \tilde V, \tilde T)^1_{-m} = (i,v,t)^1_{-m}, \tilde B(u) = u \Big] \\
&= 1_{\{e^{t_1 i_0} < \rho\}} 1_{\{e^{t_1 v_0} < \psi\}} \cdot P\Big[ \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{\sum_{r=0}^{s-1} \tilde Z^J(t_1+r)} \big\} < \frac{\rho}{e^{t_1 i_0}},\ \max_{s \in \{1,\ldots,t-t_1\}} \big\{ e^{\sum_{r=0}^{s-1} \tilde Z^V(t_1+r)} \big\} < \frac{\psi}{e^{t_1 v_0}} \,\Big|\, (\tilde J, \tilde V, \tilde T)^1_{-m} = (i,v,t)^1_{-m}, \tilde B(u) = u \Big] \\
&= 1_{\{e^{t_1 i_0} < \rho\}} 1_{\{e^{t_1 v_0} < \psi\}} \cdot P\Big[ \Gamma\Big( \frac{\rho}{e^{t_1 i_0}}; \frac{\psi}{e^{t_1 v_0}} \Big) > t - t_1 \,\Big|\, (\tilde J, \tilde V, \tilde T)^1_{-m} = (i,v,t)^1_{-m}, \tilde B(u) = u \Big].
\end{aligned}$$

The latter probability, making use of Definition 4, can be expressed as

$$\begin{aligned}
& 1_{\{e^{t_1 i_0} < \rho\}} 1_{\{e^{t_1 v_0} < \psi\}} \cdot P\Big[ \Gamma\Big( \frac{\rho}{e^{t_1 i_0}}; \frac{\psi}{e^{t_1 v_0}} \Big) > t - t_1 \,\Big|\, (\tilde J, \tilde V, \tilde T)^0_{-m-1} = \circ\big((i,v,t)^1_{-m}\big), \tilde B(u) = u \Big] \\
&= 1_{\{e^{t_1 i_0} < \rho\}} 1_{\{e^{t_1 v_0} < \psi\}} \cdot R_{\left(\frac{\rho}{e^{i_0 t_1}}; \frac{\psi}{e^{v_0 t_1}}\right)}\big(\circ((i,v,t)^1_{-m}),\, t - t_1\big).
\end{aligned} \tag{52}$$

A substitution of (52) in (51), and then of the obtained quantity in (41) together with (50), concludes the proof.

References

1. Bachelier, L.: Théorie de la spéculation. Annales scientifiques de l'École normale supérieure, 21–86 (1900)
2. Barbu, V., Limnios, N.: Semi-Markov Chains and Hidden Semi-Markov Models toward Applications: Their Use in Reliability and DNA Analysis. Springer, Lecture Notes in Statistics (2008)
3. Baviera, R., Pasquini, M., Serva, M., Vergni, D., Vulpiani, A.: Correlations and multi-affinity in high frequency financial datasets. Physica A: Statistical Mechanics and its Applications (3), 551–557 (2001)
4. Bhogal, S.K., Thekke Variyam, R.: Conditional duration models for high-frequency data: A review on recent developments. Journal of Economic Surveys (1), 252–273 (2019)
5. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics (3), 307–327 (1986)
6. Bouchaud, J.: An introduction to statistical finance. Physica A: Statistical Mechanics and its Applications (3), 238–251 (2002)
7. Boyle, P.: Option valuation using a three-jump process. International Options Journal, 7–12 (1986)
8. Bulla, J., Bulla, I.: Stylized facts of financial time series and hidden semi-Markov models. Computational Statistics and Data Analysis, 2192–2209 (2006)
9. Cox, J.C., Ross, S.A., Rubinstein, M.: Option pricing: A simplified approach. Journal of Financial Economics, 229–263 (1979)
10. Curato, G., Lillo, F.: Modeling the coupled return-spread high frequency dynamics of large tick assets. Journal of Statistical Mechanics: Theory and Experiment P01028 (2015)
11. D'Amico, G.: Stochastic dividend discount model: risk and return. Markov Processes and Related Fields (5), 349–376 (2017)
12. D'Amico, G., Gismondi, F., Petroni, F.: A new approach to the modeling of financial volumes. In: S. Silvestrov et al. (eds), Stochastic Processes and Applications, vol. 271, pp. 363–373. Springer Proceedings in Mathematics & Statistics (2018)
13. D'Amico, G., Janssen, J., Manca, R.: Homogeneous semi-Markov reliability models for credit risk management. Decisions in Economics and Finance (2), 79–93 (2005)
14. D'Amico, G., Janssen, J., Manca, R.: European and American options: The semi-Markov case. Physica A: Statistical Mechanics and its Applications, 3181–3194 (2009)
15. D'Amico, G., Petroni, F.: A semi-Markov model with memory for price changes. Journal of Statistical Mechanics: Theory and Experiment (12), P12009 (2011)
16. D'Amico, G., Petroni, F.: A semi-Markov model for price returns. Physica A: Statistical Mechanics and its Applications, 4867–4876 (2012)
17. D'Amico, G., Petroni, F.: Weighted-indexed semi-Markov models for modeling financial returns. Journal of Statistical Mechanics: Theory and Experiment (07), P07015 (2012)
18. D'Amico, G., Petroni, F.: Multivariate high-frequency financial data via semi-Markov processes. Markov Processes and Related Fields, 415–434 (2014)
19. D'Amico, G., Petroni, F.: Copula based multivariate semi-Markov models with applications in high-frequency finance. European Journal of Operational Research, 765–777 (2018)
20. Dassios, A., Wu, S.: Perturbed Brownian motion and its application to Parisian option pricing. Finance and Stochastics (3), 473–494 (2010)
21. Dassios, A., Zhang, Y.Y.: The joint distribution of Parisian and hitting times of Brownian motion with application to Parisian option pricing. Finance and Stochastics (3), 773–804 (2016)
22. De Jong, F., Rindi, B.: The Microstructure of Financial Markets. Cambridge University Press (2009)
23. Dionisio, A., Menezes, R., Mendes, D.: Mutual information: a measure of dependency for nonlinear time series. Physica A: Statistical Mechanics and its Applications, 326–329 (2004)
24. Engle, R.: The econometrics of ultra-high-frequency data. Econometrica (1), 1–22 (2000)
25. Fodra, P., Pham, H.: High frequency trading and asymptotics for small risk aversion in a Markov renewal model. SIAM Journal on Financial Mathematics (1), 656–684 (2015)
26. Guillaume, D.M., Dacorogna, M.M., Davé, R.R., Müller, U.A., Olsen, R.B., Pictet, O.V.: From the bird's eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange markets. Finance and Stochastics (2), 95–129 (1997)
27. Jain, P., Joh, G.: The dependence between hourly prices and trading volume. Journal of Financial and Quantitative Analysis, 269–283 (1988)
28. Janssen, J., Manca, R.: Applied semi-Markov processes. Springer Science & Business Media (2006)
29. Levy, P.: Processus semi-markoviens. In: Proceedings of the International Congress of Mathematics, vol. 271, pp. 416–426 (1956)
30. Limnios, N., Oprisan, G.: Semi-Markov Processes and Reliability. Birkhauser Boston (2001)
31. Mainardi, F., Raberto, M., Gorenflo, R., Scalas, E.: Fractional calculus and continuous-time finance II: the waiting-time distribution. Physica A: Statistical Mechanics and its Applications (3), 468–481 (2000)
32. Manganelli, S.: Duration, volume and volatility impact of trades. Journal of Financial Markets (4), 377–399 (2005)
33. Nystrup, P., Madsen, H., Lindström, E.: Stylised facts of financial time series and hidden Markov models in continuous time. Quantitative Finance (9), 1531–1541 (2015)
34. Pasquini, M., Serva, M.: Multiscale behaviour of volatility autocorrelations in a financial market. Economics Letters (3), 275–279 (1999)
35. Pasquini, M., Serva, M.: Multiscaling and clustering of volatility. Physica A: Statistical Mechanics and its Applications (1), 140–147 (1999)
36. Pasquini, M., Serva, M.: Clustering of volatility as a multiscale phenomenon. The European Physical Journal B - Condensed Matter and Complex Systems (1), 195–201 (2000)
37. Petroni, F., Serva, M.: Spot foreign exchange market and time series. The European Physical Journal B - Condensed Matter and Complex Systems (4), 495–500 (2003)
38. Petroni, F., Serva, M.: Observability of market daily volatility. Physica A: Statistical Mechanics and its Applications, 838–842 (2016)
39.
Podobnik, B., Horvatic, D., Petersen, A., Stanley, H.: Cross-correlations between volumechange and price change. PNAS (52), 22079–22084 (2009)40. Raberto, M., Scalas, E., Mainardi, F.: Waiting-times and returns in high-frequencyﬁnancial data: an empirical study. Physica A: Statistical Mechanics and its Applications (1), 749 – 755 (2002). Horizons in Complex Systems41. Repetowicz, P., Richmond, P.: Modeling share price evolution as a continuous timerandom walk (ctrw) with non-independent price changes and waiting times. Physica A:Statistical Mechanics and its Applications (1), 108 – 111 (2004). Applications ofPhysics in Financial Analysis 4 (APFA4)42. Shannon, C.E.: A mathematical theory of communication. Bell system technical journal (3), 379–423 (1948)43. Silvestrov, D., Silvestrov., S.: Nonlinearly perturbed semi-Markov processes. Springer(2017)44. Silvestrov, D., Silvestrov, S.: Asymptotic expansions for power-exponential moments ofhitting times for nonlinearly perturbed semi-markov processes. Theory of Probabilityand Mathematical Statistics , 183–200 (2018)45. Silvestrov, D., Stenberg, F.: A pricing process with stochastic volatility controlled bya semi-markov process. Communications in Statistics - Theory and Methods (3),591–608 (2004)46. Smith, W.L.: Regenerative stochastic processes. Proceedings of the Royal Society SerieA 232 , 6–31 (1955)47. Swishchuk, A.: Modeling and Pricing of Swaps for Financial and Energy Markets withStochastic Volatilities. World Scientiﬁc (2013)48. Swishchuk, A., Islam, M.S.: The geometric markov renewal processes with applicationto ﬁnance. Stochastic Analysis and Applications (4), 684–705 (2011)49. Swishchuk, A., Vadori, N.: A semi-markovian modeling of limit order markets. SIAMJournal on Financial Mathematics (1), 240–273 (2017) micro-to-macro approach to returns, volumes and waiting times 3350. Vasileiou, A., Vassiliou, P.C.G.: An inhomogeneous semi-markov model for the termstructure of credit risk spreads. 
Advances in Applied Probability (1), 171198 (2006)51. Vassiliou, P.C.: Fuzzy semi-markov migration process in credit risk. Fuzzy Sets andSystems , 39 – 58 (2013). Theme: Fuzzy random variables52. Yi, C.: On the ﬁrst passage time distribution of an ornsteinuhlenbeck process. Quanti-tative Finance10