Infinitely Stochastic Micro Forecasting
Matúš Maciak, Ostap Okhrin, and Michal Pešta*
Charles University, Prague, Czech Republic, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics
Technische Universität Dresden, Germany, Chair of Statistics, Institute of Economics and Transport, School of Transportation
September 20, 2019
Abstract
Forecasting costs is now a front burner in empirical economics. We propose an unconventional tool for stochastic prediction of future expenses based on the individual (micro) developments of recorded events. Consider a firm, enterprise, institution, or state, which possesses knowledge about particular historical events. For each event, there is a series of several related subevents: payments or losses spread over time, which all together lead to an infinitely stochastic process. The issue, however, is that some events that have already occurred need not be reported yet. The aim lies in forecasting future subevent flows coming from already reported events, from events that have occurred but are not reported, and from events that have not yet occurred. Our methodology is illustrated on quantitative risk assessment; however, it can be applied to other areas such as startups, epidemics, war damages, advertising and commercials, digital payments, or drug prescription, as manifested in the paper. As a theoretical contribution, inference for infinitely stochastic processes is developed. In particular, a non-homogeneous Poisson process with non-homogeneous Poisson processes as marks is used, which includes, for instance, the Cox process as a special case.
Keywords: stochastic prediction, infinitely stochastic process, marked process, time-varying models, dynamic panel data, resampling, risk valuation

* Corresponding author; Address: Sokolovská 49/83, 18675 Prague 8, Czech Republic; Email: [email protected]

Human as well as monetary losses and uncertainty about their extent are among the main sources of risk. A probabilistic prediction of future monetary losses lies on the cutting edge of quantitative risk assessment, for instance, valuation of operational risk in banking or reserving risk in insurance, while the study of human losses is of particular interest in conflict resolution and epidemics modeling. We propose a general prediction methodology, which together with the underlying stochastic procedures is applicable to various areas, as demonstrated by case examples later on.

Let us define the general structure of our model on the basis of a financial example. The event's 'lifetime' can be described as follows: The i-th loss occurs at the occurrence time, which is denoted by T_i. Such a loss is often reported (e.g., to a financial company) not immediately after the event but, for various reasons, after the reporting delay (waiting time) W_i, which is the time difference between the occurrence epoch (event time) and the observation epoch (reporting time). Furthermore, Z_i = T_i + W_i stands for the i-th reporting (notification) time. The contemplated cash flows are visualized in Figure 1, which elucidates the whole framework behind the loss reporting process together with the time developments of the losses. Our observation history for the reported losses is a time interval [0, a], where a is the present time.
The main aim is to predict the losses that are going to be reported in the future time horizon (a, b], and simultaneously to predict the development within the time interval (a, b] of the losses that have already occurred before time a but are not settled yet. Some of them are already incurred (i.e., occurred within [0, a]) but will be reported only after time point a. The observed loss data are truncated in the way that we observe only the reported losses, i.e., Z_i ≤ a. Without loss of generality, we assume that the reporting times are chronologically ordered such that Z_i ≤ Z_j for i < j. After some internal process is carried out, the company pays N_i(t) payments till time point t, or N_i(∞) payments in order to fully settle the i-th loss. The amount of the k-th payment within the i-th loss, paid at time U_{i,k}, is represented by X_{i,k}, for k = 1, ..., N_i(a). The time window from the reporting time Z_i up to the last observed (available) time a for the i-th loss has length V_i, i.e., V_i = a − Z_i. One can think of the reporting times Z_i as the arrival times of the counting process {M(t)}_{t≥0} and the payment times U_{i,k} as the arrival times of the counting processes {N_i(t)}_{t≥0} for i ∈ N. Assuming that there are i =
1, ..., M(a) losses already reported, we observe a collection {T_i, Z_i, {U_{i,j}}_{j=1}^{N_i(a)}, {X_{i,j}}_{j=1}^{N_i(a)}}_{i=1}^{M(a)} or, alternatively and equivalently, {Z_i, W_i, {N_i(t)}_{t∈[0,a]}, {X_{i,j}}_{j=1}^{N_i(a)}}_{i=1}^{M(a)}.

Figure 1: Scheme of the event occurrence process and the event development processes.

The proposed class of models—infinitely stochastic processes—is a very rich and general class that nests, for example, doubly stochastic (Cox) processes. Our approach and results are motivated in the context of several applications taken from the empirical economics literature.
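The observed collection {T_i, Z_i, {U_{i,j}}, {X_{i,j}}} can be made concrete with a small simulation. The sketch below is ours, not the paper's: the homogeneous rates, the uniform reporting delay, and the lognormal payment amounts are illustrative placeholders for the non-homogeneous, time-varying objects introduced later.

```python
import random

random.seed(7)
a = 10.0  # present time: the observation window is [0, a]

def arrivals(rate, t0, t1):
    """Arrival times of a homogeneous Poisson process on (t0, t1]."""
    times, t = [], t0
    while True:
        t += random.expovariate(rate)
        if t > t1:
            return times
        times.append(t)

claims = []
for Z in arrivals(rate=2.0, t0=0.0, t1=a):            # reporting times Z_i <= a
    W = random.uniform(0.0, Z)                        # reporting delay W_i (illustrative)
    T = Z - W                                         # occurrence time T_i = Z_i - W_i
    U = arrivals(rate=0.8, t0=Z, t1=a)                # payment times U_{i,k} in (Z_i, a]
    X = [random.lognormvariate(0.0, 1.0) for _ in U]  # payment amounts X_{i,k}
    claims.append({"T": T, "Z": Z, "U": U, "X": X})

# M(a) = number of reported claims; N_i(a) = number of observed payments of claim i
print(len(claims), sum(len(c["U"]) for c in claims))
```

Replacing `arrivals` with a non-homogeneous sampler and the placeholder distributions with the time-varying parametric ones of Section 3 yields the structure of the full model.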
Case 1: Operational risk
Banks and other financial institutions have to face operational risk covering fraud, system failures, security, privacy protection, terrorism, legal risks, employee compensation claims, and physical (e.g., infrastructure shutdown) or environmental risks. As pointed out in Chernobai et al. (2007), some large banks prefer to use their own formal definition of operational risk. For instance, Deutsche Bank (2017) defines operational risk as "the risk of loss resulting from inadequate or failed internal processes, people and systems or from external events, and includes legal risk." Recent developments for operational risk prediction—comprehensively summarized by Benito and López-Martín (2018)—reveal that challenges like truncated data and non-homogeneous processes have to be handled in operational risk modeling. The empirical literature, for instance Cohen (2018), suggests that operational risk capital models can be based on the loss distribution approach. Generally, a loss i corresponding to operational risk occurs at T_i but is internally reported later, at Z_i. Consequently, compensations need to be carried out by the bank to the affected side. For the loss i, the compensations X_{i,k} are going to be paid at the times U_{i,k}. The bank is then required (e.g., by the Third Basel Accord) to quantify the future distribution of losses (measured through their compensations) belonging to operational risk.

Case 2: War damages
Modeling of the evolution of national and international conflicts is a long-standing strand of research (Gleditsch et al., 2014), although data collection is a very challenging task, see Arnold (2019) and Cressey (2008). Based on comprehensive databases such as COPDAB (Azar, 1980), MIDLOC (Braithwaite, 2010), or PRIO (Hallberg, 2012), the main approaches still remain the classic econometric linear ones with a list of exogenous factors, or those based on hazard models, cf. Collier et al. (2004), Schrodt (2014), Clauset (2018), Ward et al. (2010), and Harrison and Wolf (2012). As the list of current approaches was criticized by Schrodt (2014), namely "garbage can models that ignore the effect of collinearity", "complex models without understanding the underlying assumptions" or "linear statistical monoculture", we believe that our model can make a useful step forward in conflict prediction. Using the proposed model, one considers each point T_i in the M process as the beginning of the tension between regions/countries. The followed-up Z_i is the official beginning of the conflict, through the official notice or the first armed intrusion. This point (as the mark) starts the spread of the armed conflict over a series of battles k at time points U_{i,k} that take X_{i,k} lives. The main assumptions of the process are fully in line with the nature of the war.

Case 3: Epidemics
Proper modeling and prediction of the spread of epidemics is a very important strand of literature, in particular in view of the recent H1N1 and Ebola epidemics. Classical models arising from modeling online diffusions are those based on the Susceptible-Infected-Recovered model introduced by Kermack and McKendrick (1927) and Linda and Allen (2008), often extended to the stochastic case as in Bobashev et al. (2007), Yan (2008), and others. More recently, these models were linked to Hawkes processes, in which the spread of the disease in one population has been investigated, see Rizoiu et al. (2018). Considering several populations (neighbor regions, countries, flight connections, etc.), the proposed model is a natural flexible extension, in which each point T_i of the process is the infection of an individual in region i. The delay W_i is thus the incubation period, after which the process of infecting individuals in population i starts, with individuals being infected at the points U_{i,k}. The values X_{i,k} may be considered as the severity of the illness, possibly converted to monetary quantities.

Case 4: Drug prescription
Health care expenditures have become one of the most serious issues of modern society, and prescribed medicaments seem to form one of the fastest growing components of health insurance expenses. Managed care organizations encourage physicians to be more cost-conscious, see, e.g., Miller and Lufi (1994). They use financial incentives to induce physicians to reduce expenses while maintaining the quality of medical care.
General practitioners (GPs) comprise a significant part of the health care system and strongly influence the total amount of insurance money spent. Therefore, the prescription behaviour of GPs is of utmost importance. It has been studied from the point of view of pharmaceutical firms in several studies, see Gönül et al. (2001), Manchanda and Chintangunta (2004), and other references therein; or from the point of view of health insurance companies, e.g., Hudecová et al. (2017). Furthermore, prescription patterns and the influencing factors have been analyzed in Ekedahl et al. (1995), Rokstad et al. (1997), Watkins et al. (2003), or Caldbick et al. (2015). One may think of the spread of a disease/illness as the occurrence time T_i of an event and, correspondingly, a visit to the GP as the reporting time Z_i. Expenses for the prescribed drugs—sometimes more than one medical examination by the GP is needed (at times U_{i,k})—are then the event payments X_{i,k}. After all, the responsible health care financing organization is interested to know the future expenditures for medicaments prescribed by the GPs within a predetermined time horizon.

Case 5: Startups
A recent entrepreneurship boom prompts another straightforward application of the proposed micro forecasting method. Many well-known multinational companies leading the global market these days have begun their business as small and locally based startups with only very limited human, social, and financial capital. These are, however, crucial factors for the future startup performance and its ability to survive (Bosma et al., 2004). Especially the last one—the financial capital—turns out to play the most significant role for establishing an entrepreneur on the global market (Colombo and Grilli, 2008). The initial financial capital is, however, usually not sufficient to start operations at the desired scale, as the credit constraints for bank loans are too strict (Holtz-Eakin et al., 1994). Therefore, the startup team looks for additional external sources of equity capital (such as external support, collaboration, fund raising, etc.), which is credited later over time. When modeling the overall impact of the startup in terms of its frontier production, this additional capital should also be considered (Aigner et al., 1977). Thus, the future cash flow is of main interest. Using our terminology, a new entrepreneur i starts its business after some waiting time W_i, and additional capital amounts X_{i,k} arrive at the times U_{i,k}. The whole scenario can analogously be adapted, for instance, for modeling the human capital, social investments, or business performance of the startups.

Case 6: Advertising and commercials
In recent years, television and internet advertising have become increasingly tailored to individuals. Television commercials simply rely on so-called contextual advertising, where ads are chosen based upon the broadcast contents (Liaukonyte et al., 2015). More sophisticated ad placement techniques use online behavioral advertising or targeted advertising, which are typical for internet commercials and are based on the browsing history, online activities, or web searches (Goldfarb and Tucker, 2011).
Shopping behavior and shopping patterns have recently been analyzed by economists in order to better understand the process through which consumers search for their preferred options (Burda et al., 2012; Xiao, 2018). Let us concentrate on the perspective of a company selling a specific product over the Internet. A moment when some ad is placed on the Internet by another company running some website can be thought of as the occurrence time T_i. The owner of the website is directly or indirectly paid for the advertisement by the product-selling company. Then, the reporting time Z_i naturally corresponds to the time when the ad is displayed or when it is recognized by some user (for example, the first click on it). If the user proceeds with a purchase in the online store advertised by the ad, the payment time and the payment amount represent the mark of an event development process. The product-selling company can consequently predict its future income based on the advertising and, thus, judge the efficiency of its commercial product placement.

Case 7: Apple Pay
Digital payment platforms are multi-sided and layered modular artifacts that primarily mediate payment transactions between payers and payees (Kazan, 2015). Apple Pay as a payment method has become increasingly popular in everyday life during recent years. As remarked by Liu and Mattila (2019), it simply boosts satisfaction through elevated coolness in a successful encounter. Liu et al. (2015) examined recent changes in the payment sector in financial services, specifically related to mobile payments that enable new channels for consumer payments for goods and services purchases, and other forms of economic exchange. Although Apple Pay serves merely as a proxy and mediator between cardholders' (card issuer) and merchants' (acquirer) bank accounts, a card issuer (e.g., a bank) is obliged to pay a portion of the payment for such a service. Therefore, from the card issuer's perspective, it is of interest to determine the amount of charges for the Apple Pay service during a future time period. Here, the date of issuing a payment card can be considered as the occurrence time T_i of an event, the date of registering for Apple Pay is the reporting time Z_i, and the corresponding purchases are the payment amounts X_{i,k} realized at payment times U_{i,k}.
Case 8: Actuarial claims reserving

An insurance company needs to predict future claims with corresponding payments and, additionally, future payments coming from claims that have already occurred but are not necessarily reported yet; this is required by the current regulatory framework for insurance supervision (e.g., the Solvency II Directive or the Swiss Solvency Test). The lifetime of a claim can be characterized by the following variables that drive the claim process: A claim i (i.e., a loss) occurs at the accident time T_i; however, the insurance company is notified with some delay (heavy injuries that did not allow the insured person to report the claim, vehicle damage light enough to let the insured person postpone a report, etc.) at the reporting time Z_i; the corresponding claim payments X_{i,k} are going to be paid by the insurance company to the insured at the payment times U_{i,k}. Finally, the distribution of cumulative payments within a predetermined time window has to be predicted in order to settle the required claim reserves (e.g., Value at Risk or Expected Shortfall at 99.5%). Below, we concentrate in more detail on the claims reserving task and exemplify the proposed methodology through analyses of two insurance lines of business in order to demonstrate the practical efficiency of our prediction method.

This paper is structured as follows: Next section introduces the data and the main stochastic objects we intend to model. Section 3 provides the assumptions and theory for occurrence and reporting times, number of payments, reporting delays, and payment amounts in separate subsections. Section 4 contains a practical application to the actuarial data. Afterwards, our conclusion follows. The proofs of our theoretical results are put in the Appendix.
A classical actuarial problem called claims reserving is elaborated from an emerging perspective. In contrast to the traditional claims reserving techniques based on aggregated information from historical data, our approach relies on granular, individual claim-by-claim data and contributes to an increase in the prediction's precision.
Claims (loss) reserving in insurance determines a sufficient amount of money, which needs to be put aside from the premium, to cover future claim (loss) payments. The main issue is to estimate/predict these claims reserves, which should be held by the insurer in order to meet all future claims arising from policies currently in force and policies written in the past. Claims reserving is a classical problem in non-life insurance, sometimes also called general insurance (in the UK) or property and casualty insurance (in the USA). A non-life insurance policy is a contract between the insurer and the insured. The insurer receives a deterministic amount of money, known as the premium, from the insured in order to provide financial coverage against well-specified randomly occurring events. If such an event (claim) occurs, the insurer is obliged to pay, in respect of the claim, a claim amount, also known as a loss amount. In layman's terms, if an accident happens to an insured person, he or she goes to the insurance company to request a claim payment. The insurance company pays this claim amount from the loss reserves. In many cases several payments are performed for a single accident, for instance, due to further health problems, hidden damages of the car that were not visible at the first inspection, etc.

Claims reserving methods based on aggregated data from so-called run-off triangles are predominantly used to calculate the claims reserves, see England and Verrall (2002) or Wüthrich and Merz (2008) for an overview. Such models are not based on the particular claims or accidents, but rather on the aggregated overall payments through some predefined period, typically one year. These conventional reserving techniques have a series of disadvantages: loss of information from the policy and the claim's development due to the aggregation; a usually small number of observations in the aggregated data; only few observations for recent accident years; various assumptions of independence, which can sometimes be unrealistic or at least questionable; and sensitivity to the most recent paid claims, see Hudecová and Pešta (2013) or Pešta and Okhrin (2014) for some recent developments. In order to overcome the above mentioned deficiencies or imperfections, micro (granular) loss reserving methods for individual claim-by-claim data need to be derived. Moreover, estimation of the whole distribution of the total future payments is a crucial part of the risk valuation process.
To estimate the distribution of reserves means to predict future cash flows and their uncertainty. On top of that, this is becoming compulsory by law due to the introduction of new supervisory guidelines.

The loss reserving approaches based on individual/micro-level/granular/claim-by-claim data do not represent the mainstream in the reserving field. First attempts within the reserving framework at incorporating the claim information for reporting delays used a Bayesian approach (Jewell, 1989, 1990) or an empirical-Bayes approach (Weisberg et al., 1984). A substantial branch of the individual loss reserving methods, based on a position dependent marked Poisson process, involves the work of Arjas (1989), Norberg (1993, 1999), and Haastrup and Arjas (1996). A Markov model for granular loss reserving was proposed by Hesselager (1994). Including claims features to specify the model components within the setup of the marked point processes was revisited by Larsen (2007). An empirical investigation by Antonio and Plat (2014) indicates that individual reserving provides better accuracy compared to some selected aggregated models. A discrete time formulation instead of the continuous time point process description was suggested by Godecharle and Antonio (2015) and Pigeon et al. (2014). Besides that, Zhao et al. (2009) and Zhao and Zhou (2010) proposed semiparametric techniques from survival analysis. Several case studies of the individual reserving approaches can be found in Taylor et al. (2008). Machine learning techniques in individual claims reserving were elaborated by Wüthrich (2016). Furthermore, Verrall and Wüthrich (2016) pointed out a gain in using the individual methods by employing non-stationarity. Cox processes were utilized by Badescu et al. (2016).

Practical loss prediction techniques often forfeit diagnostics of the theoretical models' assumptions.
This is overcome by our approach, where we deal with nonlinear continuous time Markov environments (Hansen and Scheinkman, 2009). Generally, times of events together with accompanying measures can be analyzed as marked point processes, e.g., in the case of ultra-high-frequency data, see Engle (2000). Stochastic methods for modeling the total claim amount via marked Poisson cluster models have recently been proposed by Basrak et al. (2018). There, the marks can take values only in a finite-dimensional space and have a common distribution. Our approach allows for point processes as marks, which is very suitable for practical applications with an unrestricted number of payments, and enables time-varying distributions for, e.g., payment dates, reporting delays, or payment amounts.
This paper contributes to the literature by aiming at using all the available information in the data. The proposed model, motivated by the data, controls for dependencies (between different payment amounts, between payment amounts and reporting delays, between reporting delay and accident date, etc.) in a simple and natural way. We assume a marked non-stationary Poisson process for the time ordered reporting dates; a flexibly parametrized conditional distribution of the reporting delay and payment sizes given the accident date; and a non-homogeneous Poisson process in the role of a process' mark for the number of payments. All the models in this paper are supported by asymptotic theory and are combined in an omnibus model. Application to real data strongly outperforms classical models in both point and interval forecasts of the reserved losses. To the best of our knowledge, this is the first time that all the possible cross and temporal dependencies of the claim data are taken into account.
Nowadays, modern databases and computer facilities provide a foundation for loss reserving based on individual data. There is no more reason to rely on reserving techniques using aggregated data only. We possess a unique database from the Guarantee Fund of the Czech Insurers' Bureau for car insurance, which consists of claims developments from the beginning of 2004 up to the end of 2016. Each record in the data set contains:
• Claim ID (if one claim is associated with more payments, each payment has a separate entry);
• Type of claim, which can be either bodily injury or material damage;
• Accident time (occurrence);
• Reporting time (notification);
• Date of payment, when the payment is credited to the client's bank account;
• Amount of payment.
All in all, we have 4450 claims comprising 10820 payments for bodily injuries and 30545 claims distributed into 35642 payments for material damages within the investigated time interval. For back-testing purposes, we only use the data up to the end of 2015 to construct the prediction. The data from 2016 are employed only for comparison with the obtained results.
The size of the claims reserves protects the insurance company against future losses. Bearing this practical issue in mind, we propose a series of theoretical models describing each component of the claim's chain. We assume that the reporting dates Z_i follow a non-homogeneous Poisson process with a parametric intensity function; the reporting delays W_i follow a time-varying continuous parametric distribution conditional on the reporting dates; the payment dates for each claim i are represented by the arrival times of a non-homogeneous Poisson process N_i(t), where N_i is a mark of M; and the payment amounts X_{i,j} are modeled similarly to the reporting delays, via a time-varying parametric conditional distribution. All the models are then brought together under one umbrella in the empirical study.

Recently, Giesecke and Schwenkler (2018) discussed marked point processes with applications in finance and economics to model the timing of defaults, corporate bankruptcies, market transactions, unemployment spells, births, and mortgage delinquencies. They developed likelihood estimators for the parameters of a marked point process and incompletely observed explanatory factors that influence the arrival intensity and mark distribution, although they presumed only finite dimensional marks. We go beyond the models investigated therein by assuming infinite dimensional marks via a different stochastic framework. We establish an approximation to the likelihood and analyze the convergence and large-sample properties of the associated estimators. Numerical results illustrate the behavior of our estimators.

Recall that our primary practical goal is to model and, consequently, to simulate the distribution of the sum of future payments within the time period (a, b]. Besides that, the secondary practical goal is to back-predict claims that have already occurred but are still not reported.
As a theoretical by-product, we investigate marked non-homogeneous Poisson processes with infinite dimensional marks, which has not been done yet.

For the insurance company, it is not so relevant when the accident has happened, but rather when it has been reported, as on that date the whole procedure of the claim payments starts. We proceed to the assumptions on the reporting dates Z_i that are needed for showing existence, uniqueness, and asymptotic properties of the proposed estimators. Taking into account that different claims are supposed to be unrelated, it is natural to assume that the time differences {Z_i − Z_{i−1}}_{i∈N} are independent (Z_0 ≡ 0). Hence, the Z_i can be viewed as the arrival times of a counting process with independent increments. A reasonable and parsimonious representative would be a non-homogeneous Poisson process.

Assumption M1. The time ordered reporting times {Z_i}_{i∈N} are arrival times of a non-homogeneous Poisson process {M(t)}_{t≥0} with a parametric intensity ψ(t; ρ) > 0 such that M(t) = ∑_{i=1}^{∞} 1{Z_i ≤ t}, ρ ∈ R ⊆ R^q, and R is an open convex set.

The reporting epochs {Z_i}_{i∈N} are reversely determined by the counts {M(t)}_{t≥0} such that Z_i = inf{t ≥ 0 : M(t) ≥ i}. The intensity ψ(t; ρ) can be considered as a risk exposure for an accident being reported (not occurring) at time t. One may still argue that it would be more convenient to assume that the occurrence (accident) times, and not the reporting times, form the arrival times of some non-homogeneous Poisson process. This is indeed in concordance with the parametric time-varying conditional density f_W for the reporting delay W_i (given Z_i = z) defined later on, which results in the fact that {T_i}_{i∈N} are arrival times of another non-homogeneous Poisson process having the intensity

  µ(t; ρ, ϑ) = ∫_R ψ(z; ρ) f_W{z − t; w(z, ϑ)} dz,   (1)

because of the displacement theorem (Kingman, 1993, p. 61).
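The assumption that the reporting times form a non-homogeneous Poisson process can be illustrated by simulating the process directly. A standard way to generate such arrival times is Lewis–Shedler thinning; the exponential intensity below is our illustrative choice, not one prescribed by the paper.

```python
import math
import random

def thin_nhpp(psi, psi_max, t_end, rng):
    """Lewis-Shedler thinning: arrival times of a non-homogeneous Poisson
    process with intensity psi(t) <= psi_max on (0, t_end]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(psi_max)         # candidate from a rate-psi_max process
        if t > t_end:
            return times
        if rng.random() <= psi(t) / psi_max:  # keep candidate with prob psi(t)/psi_max
            times.append(t)

rng = random.Random(1)
rho = (5.0, 0.1)                               # illustrative parameter vector rho
psi = lambda t: rho[0] * math.exp(rho[1] * t)  # intensity psi(t; rho), increasing in t
Z = thin_nhpp(psi, psi_max=psi(10.0), t_end=10.0, rng=rng)
print(len(Z))  # M(10); its expectation is Psi(10; rho) = 50(e - 1), roughly 86
```

Thinning only needs an upper bound psi_max on the intensity over the window, which makes it a convenient generic sampler for any bounded parametric ψ.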
Thus, µ(t; ρ, ϑ) is just a risk exposure for an accident occurring at time t.

We should emphasize that, similarly to the statistical inference derived below for the non-homogeneous Poisson process, many other authors have dealt with consistent estimation of the process intensity. To mention at least some of them, we refer to Konecny (1987), Schoenberg (2005), Waagepetersen (2007), Waagepetersen and Guan (2009), Coeurjolly and Møller (2014), and Prokešová et al. (2017), although sometimes in a more general setup. There are two main reasons why we derive consistency and asymptotic normality of the intensity estimator in a different fashion: First, it is of practical interest to require simple assumptions, which are easily verifiable and allow for a huge class of parametric intensities. Second, our theoretical results and the ways of proving them serve as an intermediate product for developing suitable statistical inference for the marked non-homogeneous Poisson process with marks being non-homogeneous Poisson processes (discussed in Subsection 3.2).

Since we consider a fully parametric approach, it is firstly necessary to estimate the unknown parameter ρ. We employ the maximum likelihood (ML) approach for the arrival times. The unconditional likelihood in the case of M(t), when the last observable (deterministic) time is t, has the form

  L(ρ; Z_1, ..., Z_{M(t)}, t) = exp{−Ψ(t; ρ)} ∏_{i=1}^{M(t)} ψ(Z_i; ρ),   0 < Z_1 < ... < Z_{M(t)} < t,   (2)

where Ψ(t; ρ) := ∫_0^t ψ(z; ρ) dz = E M(t) is the cumulative intensity function. Maximizing the log-likelihood function

  ℓ(ρ; Z, t) = ∑_{i=1}^{M(t)} log ψ(Z_i; ρ) − ∫_0^t ψ(z; ρ) dz   (3)

with respect to ρ provides an ML estimator ρ̂. The true value ρ_0 ∈ R of the unknown parameter ρ is supposed to uniquely maximize E_{ρ_0} ℓ(ρ; Z, t).
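For intuition about the estimator obtained by maximizing the log-likelihood (3), consider the simplest homogeneous special case ψ(t; ρ) = ρ, where (3) reduces to ℓ(ρ) = M(t) log ρ − ρt and the maximizer is available in closed form as ρ̂ = M(t)/t. The simulation below is our sketch of this special case, not the paper's empirical model.

```python
import math
import random

rng = random.Random(42)
rho0, t_end = 3.0, 200.0           # true intensity rho_0 and horizon t

# arrival times Z_i of a homogeneous Poisson process with intensity rho0 on (0, t_end]
Z, t = [], 0.0
while True:
    t += rng.expovariate(rho0)
    if t > t_end:
        break
    Z.append(t)

def loglik(rho):
    """Log-likelihood (3) for the constant intensity psi(t; rho) = rho:
    sum_i log psi(Z_i; rho) - int_0^t psi(z; rho) dz = M(t) log rho - rho * t."""
    return len(Z) * math.log(rho) - rho * t_end

rho_hat = len(Z) / t_end           # closed-form maximizer of loglik
print(round(rho_hat, 2))           # close to rho0 = 3.0 for a long horizon
```

For a genuinely non-homogeneous ψ, the maximizer of (3) has no closed form in general, and a numerical optimizer would replace the last two lines.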
The uniqueness of ρ_0 is essential for identifiability and, consequently, for consistency and asymptotic normality of the ML estimator (White, 1982). These properties are proved later on. However, one has to realize that we are dealing with not independent and not identically distributed (n.i.n.i.d.) random variables (i.e., arrival times). Let us define

  h(Z_i; ρ, t) := Ψ(t; ρ)/M(t) − log ψ(Z_i; ρ),

which means that

  ρ̂ = arg min_{ρ∈R} ∑_{i=1}^{M(t)} h(Z_i; ρ, t).   (4)

Assume the following to hold with respect to the functions h(Z_i; ρ, t) in order to obtain a sensible estimator (i.e., consistent and asymptotically normal).

Assumption M2. h(z; ρ, t) is convex in ρ ∈ R for all 0 < z < t.

The convexity of h from Assumption M2 guarantees the uniqueness of the estimator ρ̂ ≡ ρ̂(t). For simplicity of further notation, let us denote [·][·]^⊤ ≡ [·]^{⊗2}, ∂_ρ ≡ (∂/∂ρ)[·]|_{ρ=ρ_0}, ∂²_ρ ≡ (∂²/∂ρ∂ρ^⊤)[·]|_{ρ=ρ_0}, ∂_{ρ,i} ≡ (∂/∂ρ_i)[·]|_{ρ=ρ_0}, and ∂²_{ρ,i,j} ≡ (∂²/∂ρ_i∂ρ_j)[·]|_{ρ=ρ_0} for ρ = [ρ_1, ..., ρ_q]^⊤. The symbol 0 stands for a zero vector and I means an identity matrix of suitable dimension. Furthermore, for t >
0, let us define an information matrix I ( t ; ρ ) : = (cid:82) t { ∂ ρ ψ ( z ; ρ ) } ⊗ ψ ( z ; ρ ) d z and a matrix K ( t , ρ ) : = √ ψ ( t ; ρ ) (cid:20) ∂ ρ ψ ( t ; ρ ) − { ∂ ρ ψ ( t ; ρ ) } ⊗ ψ ( t ; ρ ) (cid:21) . Assumption M . ∂ ∂ ρ ∂ ρ (cid:62) ψ ( · ; ρ ) : R + → R q × q is continuous for all ρ ∈ R and there exist Lebesgue-integrable functions m i and m i , j such that (cid:12)(cid:12)(cid:12)(cid:12) ∂∂ρ i ψ ( t ; ρ ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ m i ( t ) and (cid:12)(cid:12)(cid:12)(cid:12) ∂ ∂ρ i ∂ρ j ψ ( t ; ρ ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ m i , j ( t ) for all ρ ∈ R , almost every t > , and i , j =
1, . . . , q. The definition of I ( t ; ρ ) , the uniqueness of the true value ρ ∈ R , and the differentiabilityof ψ ( t ; · ) from Assumption M I ( t ; ρ ) is positive definite. Moreover, the integrablemajorants from Assumption M ψ allow to interchange theintegral and the derivative of ψ . Assumption M . As t → ∞ ,(i) M ( t ) I − ( t , ρ ) converges in probability to a positive semidefinite matrix;(ii) (cid:82) t (cid:8) I − ( t , ρ ) K ( z , ρ ) I − ( t , ρ ) (cid:9) dz → . To check whether Assumption M I − ( t , ρ ) E M ( t ) = I − ( t , ρ ) Ψ ( t ; ρ ) converges to a positive semidefinite matrix,which may also be a zero matrix; and Var (cid:8) (cid:0) I − ( t , ρ ) (cid:1) i , j M ( t ) (cid:9) = (cid:0) I − ( t , ρ ) (cid:1) i , j Ψ ( t ; ρ ) → t → ∞ for all i , j =
1, . . . , q . By the Cauchy-Schwarz inequality and equations (16)–(18) from the14 . Maciak, O. Okhrin, and M. Peˇsta proof of the consequent Theorem 1, Assumption M j , k , (cid:96) , m =
1, . . . , q holds (cid:16) I − ( t , ρ ) (cid:17) j , k (cid:16) I − ( t , ρ ) (cid:17) (cid:96) , m × (cid:90) t ψ ( z ; ρ ) (cid:26) ∂ ρ , k , (cid:96) ψ ( z ; ρ ) − ∂ ρ , k ψ ( z ; ρ ) ∂ ρ , (cid:96) ψ ( z ; ρ ) ψ ( z ; ρ ) (cid:27) d z → t → ∞ .At first sight, technical Assumptions M M ψ and on the amount of information about the parameter ρ contained in the process M . Basically, the intensity ψ has to be sufficiently smooth and ade-quately regular with respect to the information matrix function I . In practice, we do not allowfor too ‘wild’ and too quickly changing behavior of the process of reporting times. Theorem 1 (Consistency I) . Under Assumptions M – M , I ( t , ρ ) ( (cid:98) ρ − ρ ) = − I − ( t , ρ ) M ( t ) ∑ i = ∂ ρ h ( Z i ; ρ , t ) + o P ( ) , t → ∞ .Let us discretize the ‘continuous’ time t ∈ R + for the process { M ( t ) } t ≥ in a way that oneobserves M only at all discrete time points a ∈ N . This is indeed in concordance with thenature of our practical problem, where we evaluate the number of reported claims at the endof the calendar year represented by a discrete value of a .Additionally, the next Lindeberg condition can extend the assertion of Theorem 1.
Assumption M5. $\lim_{a\to\infty} \sum_{i=1}^{a} \mathsf{E}\left(\mathbf{d}^\top \mathbf{Y}_i\right)^2 \mathbb{1}\{|\mathbf{d}^\top \mathbf{Y}_i| \ge \varepsilon\|\mathbf{d}\|\} = 0$ for all $\mathbf{d}\in\mathbb{R}^q$ and $\varepsilon>0$, where
$$\mathbf{Y}_i := I^{-1/2}(a,\rho_0) \int_{i-1}^{i} \{\partial_\rho \log\psi(z;\rho_0)\}\left(\mathrm{d}M(z) - \psi(z;\rho_0)\,\mathrm{d}z\right).$$

For practical verification purposes, one can assume a version of the Lyapunov condition instead of the Lindeberg one: for instance, for all $\mathbf{d}\in\mathbb{R}^q$, there exists $\delta>0$ such that $\lim_{a\to\infty} \sum_{i=1}^{a} \mathsf{E}\left|\mathbf{d}^\top \mathbf{Y}_i\right|^{2+\delta} = 0$. On the one hand, the Lyapunov condition is more restrictive than the Lindeberg one; on the other hand, it is easier to verify.
Corollary 2 (Asymptotic normality I). Under Assumptions M2–M5,
$$I^{1/2}(a,\rho_0)(\hat\rho - \rho_0) \xrightarrow[a\to\infty]{\mathsf{D}} \mathsf{N}_q(\mathbf{0}, \mathbf{I}).$$

Let us consider the cases of a constant and an exponential intensity function.

Example 1. Intensity $\psi(z;\rho) = \rho$, which corresponds to a homogeneous Poisson process. Here, $\mathcal{R} = (0,\infty)$ and the cumulative intensity is $\Psi(t,\rho) = \rho t$. The log-likelihood function is $\ell(\rho; Z, t) = M(t)\log\rho - \rho t$. The ML estimator is $\hat\rho = M(t)/t$, the information number becomes $I(t;\rho) = t/\rho$, and $K(t;\rho) = -\rho^{-3/2}$. Assumption M4 is easily verifiable. For the Lyapunov condition, the choice of $\delta = 1$ leads to
$$\sum_{i=1}^{a} (\rho a)^{-3/2}\, \mathsf{E}\left|\int_{i-1}^{i} \mathrm{d}M(z) - \int_{i-1}^{i} \rho\,\mathrm{d}z\right|^{3} \xrightarrow[a\to\infty]{} 0,$$
because the interarrival times $\{Z_i - Z_{i-1}\}_i$ have the exponential distribution with parameter $\rho$ (i.e., with expectation $1/\rho$), so the third absolute moments of the centered unit-interval increments are bounded and the whole sum is of order $a^{-1/2}$. Hence, $\sqrt{a/\rho}\,(\hat\rho - \rho) \xrightarrow[a\to\infty]{\mathsf{D}} \mathsf{N}(0,1)$.

Example 2. Intensity $\psi(z;\rho) = \exp\{\rho_1 + \rho_2 z\}$. Here, $\mathcal{R} = (0,\infty)\times(0,\infty)$. The log-likelihood function is $\ell(\rho; Z, t) = \rho_1 M(t) + \rho_2 \sum_{i=1}^{M(t)} Z_i - e^{\rho_1}\left(e^{\rho_2 t} - 1\right)/\rho_2$. The ML estimator of the parameter $\rho_2$ can be obtained as a solution of $\sum_{i=1}^{M(t)} Z_i + M(t)/\hat\rho_2 - t M(t)/\left(1 - e^{-\hat\rho_2 t}\right) = 0$, and the ML estimator of the parameter $\rho_1$ comes from $\hat\rho_1 = \log\left\{\hat\rho_2 M(t)/\left(e^{\hat\rho_2 t} - 1\right)\right\}$. Consequently,
$$I(t;\rho) = \begin{bmatrix} e^{\rho_1}\left(e^{\rho_2 t} - 1\right)/\rho_2 & e^{\rho_1}\left\{e^{\rho_2 t}(\rho_2 t - 1) + 1\right\}/\rho_2^2 \\[2pt] e^{\rho_1}\left\{e^{\rho_2 t}(\rho_2 t - 1) + 1\right\}/\rho_2^2 & e^{\rho_1}\left[e^{\rho_2 t}\left\{\rho_2 t(\rho_2 t - 2) + 2\right\} - 2\right]/\rho_2^3 \end{bmatrix},$$
which can easily be proved to be positive definite for all $t>0$ and any $\rho\in\mathcal{R}$. Assumption M4 together with the Lyapunov condition can be checked as well. Hence, $I^{1/2}(a;\rho_0)(\hat\rho - \rho_0) \xrightarrow[a\to\infty]{\mathsf{D}} \mathsf{N}_2(\mathbf{0}, \mathbf{I})$.

An example of the intensity function directly used in the consequent practical analysis of our data for modeling the reporting times of bodily injury claims, see also Figure 2 (left panel, green line), is given below.
Example 3.
Intensity $\psi(z;\rho) = \exp\{\rho_1 + \rho_2 \log z + \rho_3 \cos(2\pi z/\rho_5) + \rho_4 \sin(2\pi z/\rho_5)\}$. The above formulated assumptions are satisfied for a particular open convex $\mathcal{R}\subseteq\mathbb{R}^5$. The defined entities are not presented here due to their voluminous forms.

Besides that, the next example is used in the data analysis for the reporting times of material damage claims, cf. Figure 2 (right panel, green line).
Example 4.
Intensity $\psi(z;\rho) = \exp\{\rho_1 + \rho_2 z + \rho_3 z^2 + \rho_4 \cos(2\pi z/\rho_6) + \rho_5 \sin(2\pi z/\rho_6)\}$. The required assumptions are again satisfied, but the above defined entities are not presented here due to their complicated and voluminous forms.

The suitability of Examples 3 and 4 for the practical analysis is illustrated in Figure 2, where the observed and fitted cumulative intensities (corresponding to the theoretical cumulative intensity $\Psi(t;\rho)$) are compared. The deviations between them are minor. Let us recall that our estimation of the reporting dates' intensity is based on the data up to the end of year 2015. Extrapolation of the estimated cumulative intensity for the 'future' year 2016 also nicely mimics the known reality from year 2016 (not used for estimation). The estimated underlying intensity is depicted as well.

Figure 2: Number of reported claims (bodily injury—left; material damage—right)—empirical (observed) cumulative intensity in blue, estimated cumulative intensity in red, and estimated intensity in green (prediction uses only data up to the end of 2015).

Finally, it is natural to characterize the number of newly arriving claims with respect to the intensity of the process $M$.

Proposition 3 (Infinite number of renewals). If Assumption M1 holds and $\lim_{t\to\infty}\Psi(t;\rho_0) = \infty$, then $\mathsf{P}\{\lim_{t\to\infty} M(t) = \infty\} = 1$.

This proposition reveals that a divergent cumulative intensity $\Psi(t;\rho_0)$ assures that new claims are still being reported with probability one.

Let us recall that the number of payments corresponding to the $i$th claim up to time point $t$ is denoted by $N_i(t)$. So, we possess panels of count processes $\{N_i(t)\}_{t>0}$ for $i = 1,\dots,M(t)$ that can be jointly represented as $\{\mathbf{N}(t)\}_{t>0}$. Thus, the claim notifications together with the claim payments can be viewed as a marked Poisson process with Poisson processes as marks, $\{\{M(t)\}_{t\ge0}, \{\mathbf{N}(t)\}_{t\ge0}\}$.

In practice, we observe the counting process $\{N_i(t)\}_{t>0}$ through $\{U_{i,1},\dots,U_{i,N_i(t)}\}$ for $i = 1,\dots,M(t)$, where the number of payments for the $i$th claim is denoted by $N_i(t)$ and $U_{i,k}$ is the time of the $k$th payment within the $i$th claim for $k = 1,\dots,N_i(t)$. Moreover, the amount of the $k$th payment for the $i$th claim paid at time $U_{i,k}$ is represented by $X_{i,k}$, which is going to be modeled in Subsection 3.4.

Assumption N1. The ordered payment times $\{U_{i,1}, U_{i,2},\dots\}$ of the $i$th loss are arrival times of a non-homogeneous Poisson process $\{N_i(t)\}_{t\ge0}$. The processes $\{N_i(t)\}_{t\ge0}$, $i = 1,2,\dots$ are independent, having parametric intensities $\lambda(t, Z_i;\theta)$ such that $N_i(t) = \sum_{k=1}^{\infty} \mathbb{1}\{U_{i,k}\le t\}$, $\theta\in\mathcal{P}\subseteq\mathbb{R}^p$, and $\mathcal{P}$ is an open convex set.

Since $U_{i,k}\ge Z_i$ (i.e., the payment times come after the reporting time), the corresponding intensity $\lambda$ has to be constant zero up to the reporting date $Z_i$. Alternatively, one can think of a 'restarted' process $\tilde N_{Z_i}(\tau) = \sum_{k=1}^{\infty}\mathbb{1}\{U_{i,k} - Z_i \le \tau\}$ with an 'internal' time $\tau$ of the claim $i$ after its reporting time $Z_i$. Note that the processes $\{N_i(t)\}_{t\ge0}$, $i = 1,2,\dots$ do not have to be identically distributed, because of the possibly different effect of the reporting date. Here, the intensities $\lambda(t, Z_i;\theta)$ can be considered as payment frequencies. However, the common parameter $\theta$ is assumed to be shared by the different intensity functions $\lambda$. In contrast to Lawless (1987), we do not assume a specific product form of the intensity $\lambda$ and, moreover, we allow the processes $N_i$ to depend on the process $M$ via the reporting times $Z_i$ as the marks' locations. To the best of our knowledge, we are not aware of any previous work dealing with inference for the marked non-homogeneous Poisson process with marks being non-homogeneous Poisson processes.

In order to estimate the unknown parameter $\theta$, we again use the ML approach for the arrival times, which can be considered as an extension of the case of a single realization of the Poisson process. Such a framework can be extended to several independent non-homogeneous Poisson processes, where the likelihood is as follows:
$$L\{\theta; \mathbf{N}(t), M(t)\} = \prod_{i=1}^{M(t)}\left[\left\{\prod_{k=1}^{N_i(t)} \lambda(U_{i,k}, Z_i;\theta)\right\} \exp\left\{-\int_{Z_i}^{t} \lambda(\tau, Z_i;\theta)\,\mathrm{d}\tau\right\}\right],$$
where $\mathbf{Z} = (Z_1,\dots,Z_{M(t)})^\top$ can also be viewed as the covariates (regressors) of the intensity $\lambda$, having corresponding realizations $\mathbf{z}$. One should bear in mind that the whole information about the process $\{M(t)\}_{t\ge0}$ is included in the sequence $\{Z_1, Z_2,\dots\}$. Furthermore, the intensity function may be decomposed, for instance, as
$$\lambda(t, Z_i;\theta) = \lambda_0(t - Z_i;\nu)\exp\{f(Z_i;\eta)\} \qquad (5)$$
such that $\theta = (\nu^\top, \eta^\top)^\top$, $\lambda_0(\tau;\nu)$ is a baseline intensity function with $\lambda_0(\tau;\nu) = 0$ for $\tau<0$, and $f(Z_i;\eta)$ is a parametric covariate function introducing the effects of the covariates $Z_i$. To obtain the ML estimator of $\theta$, one has to maximize the log-likelihood function
$$\ell\{\theta; \mathbf{N}(t), M(t)\} = \sum_{i=1}^{M(t)}\left\{\sum_{k=1}^{N_i(t)} \log\lambda(U_{i,k}, Z_i;\theta) - \int_{Z_i}^{t} \lambda(\tau, Z_i;\theta)\,\mathrm{d}\tau\right\}. \qquad (6)$$
The true value $\theta_0\in\mathcal{P}$ of the unknown vector parameter $\theta$ is supposed to uniquely maximize $\mathsf{E}_{\theta_0}\,\ell\{\theta; \mathbf{N}(t), M(t)\}$. For $\mathbf{U}_i := (U_{i,1},\dots,U_{i,N_i(t)})^\top$, let us define
$$g_i(\mathbf{U}_i;\theta,t) := \int_{Z_i}^{t} \lambda(\tau, Z_i;\theta)\,\mathrm{d}\tau - \sum_{k=1}^{N_i(t)} \log\lambda(U_{i,k}, Z_i;\theta),$$
which means that
$$\hat\theta = \arg\min_{\theta\in\mathcal{P}} \sum_{i=1}^{M(t)} g_i(\mathbf{U}_i;\theta,t). \qquad (7)$$
Assume the following to hold with respect to the functions $g_i(\mathbf{U}_i;\theta,t)$ in order to obtain consistent and asymptotically normal estimators. The next assumption, being analogous to Assumption M2, secures the existence of the unique solution.
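The log-likelihood (6) of the payment processes sums, claim by claim, the log-intensities at the observed payment times minus the cumulative intensity over the exposure window $(Z_i, t]$. A minimal Python sketch (the function names are ours, purely illustrative), verified against the case of a constant payment intensity $\theta$ after reporting:

```python
import math

def marks_log_likelihood(claims, t, lam, cum_lam):
    """Log-likelihood (6): for each reported claim (with reporting time z and
    payment times u), add the log-intensities lam(u, z) at the payment times
    and subtract the cumulative intensity cum_lam(t, z) over (z, t]."""
    ll = 0.0
    for z, payments in claims:
        ll += sum(math.log(lam(u, z)) for u in payments)
        ll -= cum_lam(t, z)
    return ll

# Sanity check with a constant intensity theta after reporting (cf. Example 5
# below): the log-likelihood reduces to (log theta) sum_i N_i - theta sum_i (t - Z_i).
theta, t = 1.5, 10.0
claims = [(1.0, [2.0, 4.5]), (3.0, [7.1])]
ll = marks_log_likelihood(claims, t, lambda u, z: theta, lambda s, z: theta * (s - z))
assert abs(ll - (3 * math.log(theta) - theta * ((t - 1.0) + (t - 3.0)))) < 1e-12
```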
Assumption N2. $g_i(\mathbf{u};\theta,t)$ are convex in $\theta\in\mathcal{P}$ for all $0 < u_1 < \dots < u_n < t$ and $n\in\mathbb{N}$.

At first sight, the further N-assumptions might be considered as copies of the M-assumptions mutatis mutandis. There is, however, an additional layer of randomness present for the marks $N_i$. For instance, deterministic integrals become stochastic ones. Furthermore, the assumptions regarding the process $M$ are not simply replaced by assumptions on the processes $N_i$: several additional assumptions regarding the marks $N_i$ are added to some assumptions for the original underlying process $M$. Therefore, one can neither simplify nor unify the M- and N-assumptions. The next assumption, being similar to Assumption M3, additionally involves the reporting times $Z_i$.

Assumption N3. $\frac{\partial^2}{\partial\theta\,\partial\theta^\top}\lambda(\cdot, Z_i;\theta): \mathbb{R}_+\to\mathbb{R}^{p\times p}$ are continuous for all $\theta\in\mathcal{P}$ and there exist Lebesgue-integrable functions $m_{i,j}$ and $m_{i,j,k}$ such that $\left|\frac{\partial}{\partial\theta_j}\lambda(t, Z_i;\theta)\right| \le m_{i,j}(t)$ and $\left|\frac{\partial^2}{\partial\theta_j\,\partial\theta_k}\lambda(t, Z_i;\theta)\right| \le m_{i,j,k}(t)$ almost surely, for all $\theta\in\mathcal{P}$, $i\in\mathbb{N}$, almost every $t>0$, and $j,k = 1,\dots,p$.

Let us define a cumulative intensity $\Lambda(t, Z_i;\theta) := \int_{Z_i}^{t}\lambda(\tau, Z_i;\theta)\,\mathrm{d}\tau$, an information matrix $J(t;\theta) := \mathsf{E}\sum_{i=1}^{M(t)} J_i(t;\theta)$, where $J_i(t;\theta) := \int_{Z_i}^{t} \{\partial_\theta\lambda(\tau, Z_i;\theta)\}^{\otimes 2}/\lambda(\tau, Z_i;\theta)\,\mathrm{d}\tau$, and a matrix
$$L_i(t,\theta) := \frac{1}{\sqrt{\lambda(t, Z_i;\theta)}}\left[\partial^2_\theta\lambda(t, Z_i;\theta) - \frac{\{\partial_\theta\lambda(t, Z_i;\theta)\}^{\otimes 2}}{\lambda(t, Z_i;\theta)}\right]$$
for $t>0$. The forthcoming assumption differs from the analogous Assumption M4 by the additional averaging over the reporting times $Z_i$.

Assumption N4. As $t\to\infty$,
(i) $M(t)\,J^{-1}(t,\theta_0)$ converges in probability to a positive semidefinite matrix;
(ii) $\mathsf{E}\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\left\{J^{-1}(t,\theta_0)\,L_i(\tau,\theta_0)\,J^{-1}(t,\theta_0)\right\}\mathrm{d}\tau \to \mathbf{0}$.

Analogous discussions as after Assumptions M3 and M4 could be provided for N3 and N4. Briefly and informally, the intensity functions $\lambda$ representing the behavior of the payment times' processes $N_i$ are supposed to be sufficiently smooth and adequately regular with respect to the amount of information about the parameter $\theta$ contained in the $N_i$'s.

Theorem 4 (Consistency II). Under Assumptions M1 and N2–N4,
$$J^{1/2}(t,\theta_0)\left(\hat\theta - \theta_0\right) = -J^{-1/2}(t,\theta_0)\sum_{i=1}^{M(t)}\partial_\theta g_i(\mathbf{U}_i;\theta_0,t) + o_P(1), \quad t\to\infty.$$

Again, let us discretize the 'continuous' time $t\in\mathbb{R}_+$ for the processes $\{N_i(t)\}_{t\ge0}$ in such a way that one observes $N_i$ only at discrete time points $a\in\mathbb{N}$, e.g., the status at closed calendar years. The next Lindeberg condition is used to extend the assertion of Theorem 4 in order to derive asymptotic normality of $\hat\theta$.

Assumption N5. $\lim_{a\to\infty}\sum_{j=1}^{a}\mathsf{E}\left(\mathbf{d}^\top\mathbf{Y}_j\right)^2\mathbb{1}\{|\mathbf{d}^\top\mathbf{Y}_j|\ge\varepsilon\|\mathbf{d}\|\} = 0$ for all $\mathbf{d}\in\mathbb{R}^p$ and $\varepsilon>0$, where
$$\mathbf{Y}_j := J^{-1/2}(a,\theta_0)\int_{j-1}^{j}\int_{z}^{a}\{\partial_\theta\log\lambda(\tau,z;\theta_0)\}\left(\mathrm{d}\tilde N_z(\tau - z) - \lambda(\tau,z;\theta_0)\,\mathrm{d}\tau\right)\mathrm{d}M(z).$$

The Lyapunov condition can be assumed here as well: for instance, for all $\mathbf{d}\in\mathbb{R}^p$, there exists $\delta>0$ such that $\lim_{a\to\infty}\sum_{j=1}^{a}\mathsf{E}\left|\mathbf{d}^\top\mathbf{Y}_j\right|^{2+\delta} = 0$.

Corollary 5 (Asymptotic normality II). Under Assumptions M1 and N2–N5,
$$J^{1/2}(a,\theta_0)\left(\hat\theta - \theta_0\right)\xrightarrow[a\to\infty]{\mathsf{D}}\mathsf{N}_p(\mathbf{0},\mathbf{I}).$$

The simplest situation is that each $\{N_i(t)\}_{t\ge0}$ is a homogeneous Poisson process after the reporting time $Z_i$, having a constant intensity $\theta>0$ common to all claims $i$.

Example 5.
Intensity $\lambda(\tau, Z_i;\theta) = \theta\,\mathbb{1}\{\tau\ge Z_i\}$ with $\mathcal{P} = (0,\infty)$. The log-likelihood function is $\ell\{\theta; \mathbf{N}(t), M(t)\} = (\log\theta)\sum_{i=1}^{M(t)} N_i(t) - \theta\sum_{i=1}^{M(t)}(t - Z_i)$. The ML estimator becomes
$$\hat\theta = \frac{\sum_{i=1}^{M(t)} N_i(t)}{t\,M(t) - \sum_{i=1}^{M(t)} Z_i}$$
and the information number is $J(t;\theta) = \left\{t\,\Psi(t;\rho_0) - \int_0^t z\,\psi(z;\rho_0)\,\mathrm{d}z\right\}/\theta$. Assumption N4 together with the Lyapunov condition can be checked as well. Hence,
$$\sqrt{\frac{a\,\Psi(a;\rho_0) - \int_0^a z\,\psi(z;\rho_0)\,\mathrm{d}z}{\theta_0}}\left(\hat\theta - \theta_0\right)\xrightarrow[a\to\infty]{\mathsf{D}}\mathsf{N}(0,1).$$
Moreover, if the underlying intensity of the Poisson process $M$ is $\psi(t;\rho) = \rho$ for $t\ge0$ (i.e., a homogeneous Poisson process), then
$$\sqrt{\frac{\rho\,a^2}{2\,\theta_0}}\left(\hat\theta - \theta_0\right)\xrightarrow[a\to\infty]{\mathsf{D}}\mathsf{N}(0,1).$$

Another example contains a baseline intensity, which is motivated by Crowder et al. (1991, Subsection 8.5, p. 166).
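A quick simulation sketch of Example 5 in Python (the setup and all names are ours; for simplicity the reporting process is taken homogeneous, which is only one admissible choice): the estimator is the total payment count divided by the total exposed time.

```python
import random

random.seed(7)

def poisson_arrivals(rate, start, end):
    """Arrival times of a homogeneous Poisson process with the given rate on (start, end]."""
    out, u = [], start
    while True:
        u += random.expovariate(rate)
        if u >= end:
            return out
        out.append(u)

# Hypothetical setup: reporting times Z_i arrive at rate psi_rho on (0, a];
# after Z_i, claim i pays at the constant rate theta_true (Example 5).
psi_rho, theta_true, a = 1.0, 1.5, 200.0
Z = poisson_arrivals(psi_rho, 0.0, a)
N = [len(poisson_arrivals(theta_true, z, a)) for z in Z]

# ML estimator of Example 5: total payment count over total exposed time.
theta_hat = sum(N) / (a * len(Z) - sum(Z))
print(theta_hat)
```

With the seed fixed, the estimate lands within the asymptotic error band around the true value 1.5.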
Example 6.
Intensity $\lambda(\tau, Z_i;\nu,\eta) = \nu_1\nu_2(\tau - Z_i)^{\nu_2 - 1}\exp\{\eta Z_i\}\,\mathbb{1}\{\tau\ge Z_i\}$. Here, $\mathcal{P} = (0,\infty)^2\times\mathbb{R}$. The log-likelihood becomes
$$\ell\{\nu,\eta; \mathbf{N}(t), M(t)\} = \sum_{i=1}^{M(t)}\left\{N_i(t)\log(\nu_1\nu_2) + (\nu_2 - 1)\sum_{k=1}^{N_i(t)}\log(U_{i,k} - Z_i) + \eta Z_i N_i(t) - \nu_1(t - Z_i)^{\nu_2}\exp\{\eta Z_i\}\right\}.$$
The ML estimator has to be computed numerically. The above formulated assumptions are satisfied for the particular open convex $\mathcal{P}\subseteq\mathbb{R}^3$. The defined entities are not presented here due to their complicated forms.

The next example of the intensity function is directly used in the consequent practical analysis of our data for modeling the payment times of bodily injury as well as material damage claims.
Example 7. $\lambda(\tau, Z_i;\nu,\eta) = \exp\{\nu_1 + \nu_2(\tau - Z_i) + \eta_1\cos(2\pi Z_i/\eta_3) + \eta_2\sin(2\pi Z_i/\eta_3)\}$. The defined entities are again not presented here due to their voluminous forms.

Since the reporting delays correspond to different claims from different accidents, independence between the reporting delays $W_i$ for different contracts is assumed. However, the distribution of the reporting delays is allowed to change with respect to the reporting time $Z_i$. The reporting delays seem to become shorter and shorter, which can be explained by the possibility to report an accident over the internet and even by a denser net of the insurance company's branches. So, given $Z_i$, $W_i$ has a parametric conditional density $f_W\{\cdot; w(Z_i,\vartheta)\}$, where $\vartheta\in\mathbb{R}^r$. Note that $\{W_i\}_{i\in\mathbb{N}}$ are not identically distributed, which allows, for instance, to assume time-varying distributions for the $W_i$ through the function $w(\cdot,\vartheta)$. Using similar arguments as in Hjort and Pollard (2011), one has consistency and asymptotic normality for the ML estimator $\hat\vartheta$.

A variety of parametric distributions are suitable for the analysis, particularly those with shapes similar to the ones provided by the log-normal, Weibull, or Gamma distributions. All of them have similar performance, whereas the importance lies in the time-varying parameters. Although in the rest of the study we concentrate purely on the log-normal distribution, let us briefly recall all three mentioned densities:
$$f_{\mathrm{Gam}}(x; c, d) = \frac{1}{d^{c}\,\Gamma(c)}\, x^{c-1}\exp\left(-\frac{x}{d}\right), \quad x\ge0,\ c>0,\ d>0,$$
$$f_{\mathrm{Wei}}(x; c, d) = \frac{c}{d^{c}}\, x^{c-1}\exp\left\{-\left(\frac{x}{d}\right)^{c}\right\}, \quad x\ge0,\ c>0,\ d>0,$$
$$f_{\mathrm{LN}}(x; c, d) = \frac{1}{\sqrt{2\pi}\, x\, d}\exp\left\{-\frac{(\log x - c)^2}{2 d^2}\right\}, \quad x>0,\ c\in\mathbb{R},\ d>0. \qquad (8)$$
For the Gamma and Weibull distributions, the parameters $c$ and $d$ are called the shape and scale, respectively; for the log-normal distribution, $c$ and $d$ are the mean and standard deviation of the distribution on the log scale, respectively.

Taking into account the dependency between the accident date $Z_i$ and the reporting delay $W_i$ and, additionally, allowing for possible seasonal behavior, we consider truncated Fourier series for the parameters of the conditional distributions with $\vartheta_1 = (\alpha_c, \beta_c, \delta_{c,1}, \xi_{c,1}, \gamma_{c,1},\dots,\delta_{c,L_1}, \xi_{c,L_1}, \gamma_{c,L_1})^\top$ and $\vartheta_2 = (\alpha_d, \beta_d, \delta_{d,1}, \xi_{d,1}, \gamma_{d,1},\dots,\delta_{d,L_2}, \xi_{d,L_2}, \gamma_{d,L_2})^\top$ in the form of
$$c(z,\vartheta_1) = \alpha_c + z\beta_c + \sum_{\ell=1}^{L_1}\left\{\delta_{c,\ell}\cos\left(\frac{\xi_{c,\ell}\cdot 2\pi\cdot z}{52\cdot 7}\right) + \gamma_{c,\ell}\sin\left(\frac{\xi_{c,\ell}\cdot 2\pi\cdot z}{52\cdot 7}\right)\right\}, \qquad (9)$$
$$d(z,\vartheta_2) = \alpha_d + z\beta_d + \sum_{\ell=1}^{L_2}\left\{\delta_{d,\ell}\cos\left(\frac{\xi_{d,\ell}\cdot 2\pi\cdot z}{52\cdot 7}\right) + \gamma_{d,\ell}\sin\left(\frac{\xi_{d,\ell}\cdot 2\pi\cdot z}{52\cdot 7}\right)\right\}, \qquad (10)$$
where 52 is the number of weeks in one year and 7 is the number of days in one week. Later on, for considering models with increasing flexibility, we discuss constant models with $L_1 = L_2 = 0$ and $\beta_c = \beta_d = 0$, linear models with $L_1 = L_2 = 0$, and models with one ($L_1 = L_2 = 1$) or two ($L_1 = L_2 = 2$) seasonality patterns. The ML estimator of $\vartheta = (\vartheta_1^\top, \vartheta_2^\top)^\top$ is obtained as
$$\hat\vartheta = \arg\max_{\vartheta}\sum_{i=1}^{M(t)}\log f_W\{W_i; c(Z_i,\vartheta_1), d(Z_i,\vartheta_2)\}, \qquad (11)$$
$$\text{s.t.}\ \ c(Z_i,\vartheta_1) > 0,\ d(Z_i,\vartheta_2) > 0, \quad \text{for all } i\in\{1,\dots,M(t)\}. \qquad (12)$$
The condition $c(Z_i,\vartheta_1) > 0$ is required only when $f_W$ is the Weibull or Gamma density. As initial values in the iterative maximization procedure, we took the parameter values that best fit the piecewise constant weekly averaged values in the least squares sense.

Figure 3 shows the estimation results for $f_W$ being the log-normal density. The left panel corresponds to the bodily injury claims, the right one to the material damage claims, and the bottom panels show the number of accidents as a function of the reporting date. The two upper panels depict the location parameters $c(z,\hat\vartheta_1)$ and the two middle panels the scale parameters $d(z,\hat\vartheta_2)$ over the weeks of the reporting date. Different colors represent different complexities of the models used: cyan is a constant unconditional model with $L_1 = L_2 = 0$ and $\beta_c = \beta_d = 0$; pink uses only a linear temporal dependency with $L_1 = L_2 = 0$; green and blue add one ($L_1 = L_2 = 1$) and two ($L_1 = L_2 = 2$) levels of seasonality, respectively. The most flexible density with $L_1 = L_2 = 2$ nicely mimics the temporal dynamics of the $W_i$.

In order to validate the results from the fitted model, Figure 4 compares the observed and predicted quarterly averaged reporting (waiting) delays in days for the bodily injury claims as well as for the material damage claims.

Figure 3: Weekly estimates (solid grey) and conditional temporal models: constant (cyan dashed), linear trend (pink dotted), linear trend and one period (green dashed), and linear trend with two periods (blue solid) for the shape (top panels) and scale (middle panels) of the log-normal distribution of the reporting delay $W_i$. The extrapolated periods are depicted in red and the numbers of accidents are in black.
Figure 4: Quarterly averaged (with respect to the reporting date) reporting (waiting) delays in days—observed in blue and predicted in orange for the period 2015–2016 (prediction uses only data up to the end of 2015).
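The time-varying parameter curves (9)–(10) are ordinary truncated Fourier series with a yearly period of $52\cdot7$ days. A small Python sketch (function and argument names are ours, purely illustrative):

```python
import math

def fourier_param(z, alpha, beta, terms):
    """Time-varying distribution parameter in the form of (9)-(10):
    intercept + linear trend + truncated Fourier series with a yearly period
    (z measured in days, 52 * 7 days per year).
    `terms` is a list of (delta, xi, gamma) triplets, one per harmonic."""
    year = 52 * 7
    s = alpha + beta * z
    for delta, xi, gamma in terms:
        s += delta * math.cos(xi * 2 * math.pi * z / year)
        s += gamma * math.sin(xi * 2 * math.pi * z / year)
    return s

# The constant model (L = 0, beta = 0) reduces to the intercept:
assert fourier_param(123.0, 1.7, 0.0, []) == 1.7
# One seasonal pattern (L = 1, integer frequency) repeats after 364 days:
c = lambda z: fourier_param(z, 0.5, 0.0, [(0.2, 1.0, -0.1)])
assert abs(c(10.0) - c(10.0 + 364)) < 1e-9
```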
Denote the $j$th payment amount for the $i$th claim by $X_{i,j}$, where $j = 1,\dots,N_i(t)$. The $X_{i,j}$ are independent over all $j$ as well as all $i$. This independence assumption can easily be relaxed, and any other time series model (e.g., an autoregressive model) can be used instead; our empirical findings, however, imply independence. Given $Z_i$, $X_{i,j}$ has a parametric conditional density $f_X\{\cdot; v(Z_i,\varsigma)\}$, where $\varsigma\in\mathbb{R}^s$ and the function $v(\cdot,\varsigma)$ introduces the time-varying effects of the $Z_i$. Moreover, the reporting delays $W_i$ are also supposed to be independent of the payment amounts $X_{i,j}$, and all the payment amounts from the same claim are independent among each other. These assumptions are based on a preliminary empirical analysis of the pairwise relationships between the waiting time ($W_i$) and the first ($X_{i,1}$), second ($X_{i,2}$), third ($X_{i,3}$), and fourth ($X_{i,4}$) claim payment amounts shown in Figure 5. For this plot, the data are transformed by the estimated cumulative distribution functions (cdf) $\hat F_X$ and $\hat F_W$ obtained from the plugged-in densities $\hat f_X(\cdot) \equiv f_X\{\cdot, v(Z_i,\hat\varsigma)\}$ and $\hat f_W(\cdot) \equiv f_W\{\cdot, w(Z_i,\hat\vartheta)\}$, respectively. Further, the data are transformed via the quantile function of the standard normal distribution. If the distributional assumptions are correct and the payments and delays are independent, the bivariate kernel density estimates and scatterplots of the transformed data should be suggestive of circular shapes, as is clearly visible in Figure 5.

Figure 5: Pairwise relationship between the reporting delay and the claim payment amounts (bodily injury claims—left; material damage claims—right). Subfigures below the diagonal show scatter plots of the transformed reporting delays/payment amounts $\Phi^{-1}\{\hat F_j(\cdot)\}$ versus $\Phi^{-1}\{\hat F_k(\cdot)\}$, where $j,k\in\{W, X_1, X_2, X_3, X_4\}$, $j\ne k$, $\hat F_j$ is the corresponding estimated cdf, and $\Phi$ is the cdf of the standard normal distribution. Subfigures above the diagonal display contour plots of $\Phi^{-1}\{\hat F_j(\cdot)\}$ against $\Phi^{-1}\{\hat F_k(\cdot)\}$.

Using similar arguments as in Hjort and Pollard (2011), one can prove consistency and asymptotic normality for the ML estimator $\hat\varsigma$. The procedure for modeling the claim payments closely resembles the procedure for modeling the reporting delays in Subsection 3.3. For the modeling of the payment amounts, we also considered more complex models, where previous payments were included as exogenous variables; this, however, did not bring any improvements.

Figure 6 presents the time-varying parameters of the log-normal distribution of the bodily injury as well as material damage claims for the first payment in yellow (fully flexible model with two seasonal periods) and in grey (separately for each week). The other curves present the parameters for all the payment amounts pooled together. As there is not much difference and the results for the pooled models seemed to be more stable, we concentrate in the following only on the pooled ones, namely the most flexible model with two periods and a linear trend.

To numerically illustrate the performance of our method, we use two data sets—bodily injury and material damage claims (cf. motivation and data description in Section 2.3).
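The diagnostic transformation behind Figure 5 maps each observation through its estimated cdf and then through the standard normal quantile function $\Phi^{-1}$. A sketch in Python (illustrative names; a known exponential cdf stands in for an estimated $\hat F_W$):

```python
import math
from statistics import NormalDist

def normal_scores(values, cdf):
    """Transform observations by their (estimated) cdf and then by the standard
    normal quantile function, as in the independence diagnostics of Figure 5."""
    nd = NormalDist()
    # clip to avoid +/- infinity at cdf values of exactly 0 or 1
    return [nd.inv_cdf(min(max(cdf(v), 1e-12), 1.0 - 1e-12)) for v in values]

# Illustration with a known cdf (unit-rate exponential, F(w) = 1 - exp(-w));
# in the paper the estimated cdfs would be plugged in instead.
scores = normal_scores([0.1, 0.7, 2.3], lambda w: 1.0 - math.exp(-w))
print(scores)
```

Plotting such scores for two variables against each other should give a roughly circular cloud under independence and correct marginals.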
Let us recall that the data from the last available year 2016 are used only for back-testing and comparison with the predicted results. Furthermore, our 'micro' (granular, claim-by-claim) approach is also compared with a traditional standard actuarial technique—the bootstrap chain ladder (England and Verrall, 1999)—in combination with a linear extrapolation of the reported claims in the next year. This 'macro' approach is based on aggregation of the data and, hence, disregards the information about the policy and the claim's development. We refer to it from now on as the aggregated method.

Figure 6: Weekly estimates (only $X_t$ in solid grey) and conditional temporal models: constant (cyan dashed), linear trend (pink dotted), linear trend and one period (green dashed), and linear trend with two periods (blue solid for $X_{i,t}$ and yellow dot-dashed for $X_t$) for the shape (top panels) and scale (middle panels) of the log-normal distribution of the $X_{i,t}$. The extrapolated periods are depicted in red and the numbers of accidents are in black.

Firstly, the previously described estimation procedures (Section 3) provide the parameter estimates of our omnibus model. Secondly, a Monte Carlo prediction technique is involved in order to generate (simulate) the future claims' developments—in essence, a prediction of the distribution of the total payments in year 2016, for which we also possess the real paid claim amounts.

All the estimates are obtained through the ML approach, which guarantees a proper stochastic inference. For the case of densities, it is widely known that under some regularity conditions the ML estimators are consistent and asymptotically normal. For the case of intensities, we have proved consistency and asymptotic normality of these ML estimators.
Consequently, one can plug the estimated parameters into the parametric forms of the densities and intensities present in our micro model in order to obtain the predicted (fitted) intensities of the reporting and payment dates and the densities of the reporting delays and payment amounts. These are then used for the simulation of the future payments (dates and amounts). In particular, the intensity function for modeling the reporting times of the bodily injury claims comes from Example 3 and, in the case of the material damage claims, from Example 4. The reporting delays and the claim payments are modeled as in relations (8)–(12). The intensity functions for the payment times of the bodily injury as well as material damage claims come from Example 7.
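For the log-normal choice of the payment-amount density, generating an amount given fitted log-scale parameters is a one-liner with the Python standard library. The parameter values below are illustrative only, not the fitted ones from our analysis:

```python
import random

random.seed(3)

def draw_lognormal_amount(c, d):
    """Draw a payment amount X from the log-normal density f_LN of (8),
    with log-scale mean c and log-scale standard deviation d."""
    return random.lognormvariate(c, d)

# Illustrative (hypothetical) parameter values:
amounts = [draw_lognormal_amount(8.0, 1.2) for _ in range(5)]
print(amounts)
```

In the full model, $c$ and $d$ would be evaluated at the claim's reporting time via the fitted curves (9)–(10).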
Basically, for each claim reported up to the future time point $b$ from Figure 1 (e.g., the end of the next calendar year), one needs to simulate payment dates and the corresponding payment amounts, which are summed within each simulation run. These sums of payments give us the simulated (empirical) predictive distribution of the total future payments. Hence, for the next year (in a general future time window $(a, b]$), we need to simulate the new payment dates for all already reported claims as well as the payment dates corresponding to the incurred but not reported claims. Consequently, it is requisite to generate a corresponding payment amount for every payment time within the time interval $(a, b]$.

Let us realize that we need to generate many realizations of the non-homogeneous Poisson process in each Monte Carlo simulation run. To simulate the non-homogeneous Poisson process, we rely on the thinning algorithm by Lewis and Shedler (1979). The main reason for choosing this way of generating an enormous number of realizations of the non-homogeneous Poisson process is that it can be applied to any rate function without the necessity of numerical integration or simulation of Poisson variables.

Our primary target is to predict the distribution of the total payment amounts within the future time period. Such a prediction is obtained through Procedure 1. The predicted distribution of the forthcoming payment amounts within the next year is graphically displayed in Figure 7. Here, our micro approach is compared with the traditional method based on data aggregation. Moreover, the true cumulative amount of the one-year-ahead payments is depicted in order to judge the point prediction's precision. There are two general and, from a practical point of view, very important findings with respect to the prediction of the future total payment amounts.
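The Lewis–Shedler thinning step can be sketched as follows (all names are ours, and the seasonal rate is a hypothetical stand-in in the spirit of the examples above): candidate points are drawn from a homogeneous process at a dominating rate and each is accepted with probability proportional to the target intensity.

```python
import math
import random

random.seed(1)

def thinning(rate, rate_max, a, b):
    """Lewis-Shedler thinning: arrival times of a non-homogeneous Poisson
    process with intensity `rate` on (a, b], given a bound rate(t) <= rate_max.
    Candidates come from a homogeneous process with intensity rate_max;
    each candidate t is kept with probability rate(t) / rate_max."""
    out, t = [], a
    while True:
        t += random.expovariate(rate_max)
        if t >= b:
            return out
        if random.random() < rate(t) / rate_max:
            out.append(t)

# Hypothetical seasonal intensity (illustration only, not the fitted one):
rate = lambda t: math.exp(0.5 + 0.3 * math.sin(2 * math.pi * t))
rate_max = math.exp(0.8)
arrivals = thinning(rate, rate_max, 0.0, 1000.0)
print(len(arrivals))
```

No numerical integration of the rate is needed, which is exactly why the method scales to the enormous number of realizations required by the Monte Carlo runs.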
First, our claim-by-claim based method is more precise in point prediction of the 'unknown' true value compared to the traditional technique based on aggregated data, and this holds for both lines of business. Second, our micro approach provides a less volatile predicted distribution, e.g., in terms of the coefficient of variation.
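The claim-by-claim simulation scheme described above can be sketched as follows; this is a deliberately simplified toy version with constant placeholder rates and a hypothetical lognormal payment density, whereas the actual procedure uses the fitted non-homogeneous intensities and claim-specific densities.

```python
import math
import random

def poisson(mean, rng):
    """Knuth-style inversion sampling of a Poisson count."""
    n, p, threshold = 0, 1.0, math.exp(-mean)
    while True:
        p *= rng.random()
        if p < threshold:
            return n
        n += 1

def simulate_total_payments(S, n_reported, new_claim_rate, pay_rate,
                            pay_amount, horizon, rng=None):
    """Simplified sketch of the Monte Carlo loop: sum simulated
    payments of old and newly reported claims in each run."""
    rng = rng or random.Random(1)
    totals = []
    for _ in range(S):
        # claims paying in (a, a + horizon]: already reported ones
        # plus a Poisson number of newly reported ones
        n_claims = n_reported + poisson(new_claim_rate * horizon, rng)
        total = 0.0
        for _ in range(n_claims):
            for _ in range(poisson(pay_rate * horizon, rng)):
                total += pay_amount(rng)
        totals.append(total)
    return totals  # the simulated (empirical) predictive distribution

dist = simulate_total_payments(
    S=1000, n_reported=20, new_claim_rate=5.0, pay_rate=2.0,
    pay_amount=lambda rng: rng.lognormvariate(0.0, 1.0), horizon=1.0)
```

Each entry of `dist` plays the role of one simulated total payment, so the list itself is the empirical predictive distribution of the total future payments.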
Our secondary practical target is to back-predict the accident dates of the claims, which are truncated due to the reporting delay. We are indeed not aware of the so-called incurred but not reported claims, and the insurance company needs to back-predict these claims, which have already occurred but have not been reported yet. This can be achieved via Procedure 2. The counts of the back-predicted accident dates for the latest year as well as the counts of the predicted accident dates for the next year are visualized in Figure 8. It is of utmost importance for the insurance company to have information regarding the accidents which have not been reported yet. Figure 8 reveals triplets of bars that nicely accommodate the problem of truncated data in our setup.
Procedure 1 Prediction of the distribution of the future payments

Input: Collection of observations {T_i, Z_i, {U_{i,j}}_{j=1}^{N_i(a)}, {X_{i,j}}_{j=1}^{N_i(a)}}_{i=1}^{M(a)} and the number of Monte Carlo simulation runs S.
Output: Simulated predictive distribution of the total future payments in (a, b], i.e., the empirical distribution where probability mass 1/S concentrates at each of (1)P(a, b), ..., (S)P(a, b).

Obtain the ML estimator ρ̂ as in (4) for the parametric intensity of the reporting dates.
Compute the ML estimator θ̂ as in (7) for the parametric intensities of the payment dates.
Calculate the ML estimator ς̂ in the same manner as in (11)–(12) for the parametric densities of the payment amounts.
for s = 1, ..., S do   // repeat in order to obtain the empirical distribution
  Generate a realization of the non-homogeneous Poisson process {(s)M(t)}_{t≥0} with intensity ψ(t; ρ̂) for the future time window (a, b] as the arrival times {(s)Z_{M(a)+1}, ..., (s)Z_{(s)M(b)}} representing the future reporting dates.
  for i = 1, ..., M(a) do   // payments for the (old) already reported claims
    Generate a realization of the non-homogeneous Poisson process {(s)N_i(t)}_{t≥0} with intensity λ(t, Z_i; θ̂) for the time window (a, b] as the arrival times {(s)U_{i,N_i(a)+1}, ..., (s)U_{i,(s)N_i(b)}} representing the future payment dates.
    Generate the payment amounts {(s)X_{i,N_i(a)+1}, ..., (s)X_{i,(s)N_i(b)}} independently from the density f_X{·; v(Z_i, ς̂)}.
  end for
  for i = M(a) + 1, ..., (s)M(b) do   // payments for the (future) newly reported claims
    Generate a realization of the non-homogeneous Poisson process {(s)N_i(t)}_{t≥0} with intensity λ(t, (s)Z_i; θ̂) for the time window (a, b] as the arrival times {(s)U_{i,N_i(a)+1}, ..., (s)U_{i,(s)N_i(b)}} representing the future payment dates.
    Generate the payment amounts {(s)X_{i,N_i(a)+1}, ..., (s)X_{i,(s)N_i(b)}} independently from the density f_X{·; v((s)Z_i, ς̂)}.
  end for
  Calculate the total future payments (s)P(a, b) = Σ_{i=1}^{(s)M(b)} Σ_{j=N_i(a)+1}^{(s)N_i(b)} (s)X_{i,j}.
end for

Procedure 2 Estimation of the intensity of the accident dates
Input: Observations {T_i, Z_i}_{i=1}^{M(a)}.
Output: Fitted intensity μ̂ for the underlying Poisson process of the accident dates.

Obtain the ML estimator ρ̂ as in (4) for the parametric intensity of the reporting dates.
Calculate the ML estimator ϑ̂ as in (11)–(12) for the parametric densities of the reporting delays.
Get the estimator of the intensity μ as in (1), i.e., by plugging in the corresponding estimates ρ̂ and ϑ̂ and performing numerical integration: μ̂(t) ≡ μ(t; ρ̂, ϑ̂) = ∫_R ψ(u; ρ̂) f_W{t; w(u, ϑ̂)} du.

The middle horizontal bar in each triplet corresponds to the known number of accident dates based on the database up to year 2015. The left bar in the triplet stands again for the known number of accident dates, although coming from the database up to year 2016. Therefore, the left bar has to be higher than the middle one, because additional claims can be reported within the calendar year 2016 (and they can occur in 2016 or even in previous years).
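The plug-in step of Procedure 2 amounts to a one-dimensional numerical integral for each evaluation point t. A minimal sketch follows, with a hypothetical constant reporting intensity and a hypothetical exponential delay density standing in for the fitted ψ(·; ρ̂) and f_W; the trapezoidal rule is just one reasonable quadrature choice.

```python
import math

def accident_intensity(t, psi_hat, f_w, grid_lo, grid_hi, n=2000):
    """Trapezoidal approximation of mu_hat(t) = integral of
    psi_hat(u) * f_w(t, u) over u, i.e., the plug-in estimator of the
    accident-date intensity in Procedure 2."""
    h = (grid_hi - grid_lo) / n
    us = [grid_lo + k * h for k in range(n + 1)]
    vals = [psi_hat(u) * f_w(t, u) for u in us]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# hypothetical fitted components: constant reporting intensity and an
# exponential reporting-delay density supported on delays u - t >= 0
psi_hat = lambda u: 40.0
f_w = lambda t, u: math.exp(-(u - t)) if u >= t else 0.0
mu_at_2 = accident_intensity(2.0, psi_hat, f_w, grid_lo=0.0, grid_hi=30.0)
```

With these placeholder components the exact integral at t = 2 is approximately 40, so the quadrature can be sanity-checked against a closed form before being applied to the fitted intensities.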
[Figure 7 here: predictive distributions of future payments — left panel "Distribution of payments (Bodily claims)", right panel "Distribution of payments (Material claims)"; horizontal axis: Method, vertical axis: Future payments (Millions).]

Figure 7: Prediction of the distribution of the forthcoming payments for the next year (primary aim)—traditional (aggregated) method in blue, micro (granular) method in red. The bold solid horizontal line corresponds to the median of the predictive distribution. The height of the grey vertical bar corresponds to the mean of the predictive distribution. Colored solid horizontal whiskers represent the 0.5th and the 99.5th percentiles of the predictive distribution. The dashed horizontal line stands for the real (true) sum of payments.

The right bar in the triplet represents our prediction. It is supposed to be slightly higher even than the left bar, because additional accidents can occur before the end of year 2016 that are not going to be reported till the end of year 2016.
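The summary statistics reported in Figure 7 (median, mean, extreme percentiles, coefficient of variation) are plain functionals of the simulated empirical distribution. A small sketch with a hypothetical sample of simulated totals:

```python
import statistics

def predictive_summary(totals):
    """Median, mean, 0.5th/99.5th percentiles, and coefficient of
    variation of a simulated predictive distribution."""
    xs = sorted(totals)
    def quantile(q):
        # simple order-statistic quantile (nearest-rank style)
        idx = min(len(xs) - 1, max(0, round(q * (len(xs) - 1))))
        return xs[idx]
    mean = statistics.fmean(xs)
    return {
        "median": statistics.median(xs),
        "mean": mean,
        "p0.5": quantile(0.005),
        "p99.5": quantile(0.995),
        "coef_var": statistics.stdev(xs) / mean,
    }

# hypothetical simulated totals standing in for (1)P(a,b), ..., (S)P(a,b)
summary = predictive_summary([float(k) for k in range(1, 1001)])
```

A smaller `coef_var` for the micro method than for the aggregated one is exactly the "less volatile predicted distribution" finding discussed above.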
Micro forecasting is, in general, a stochastic prediction method for future losses/costs relying on the individual developments of the recorded historical events. Our prediction approach is capable of modeling the probabilistic behavior of the future losses' occurrences, the occurrences of the incurred but not reported losses, the lengths of the reporting delays, and the frequency and severity of the loss payments in time. This is indeed sufficient for the prediction of the future cash flows for a predetermined time horizon.
[Figure 8 here: monthly counts of accident dates from Jan 2015 to Oct 2016 — top panel "Counts of accident dates (Bodily)", bottom panel "Counts of accident dates (Material)".]
Figure 8: Predicted truncated accident dates (secondary aim)—triplets of bars represent: observed counts of the accident dates from the database up to the end of 2016 (in yellow/light blue); observed counts of the accident dates from the database up to the end of 2015 (in orange/darker blue); and predicted counts of the accident dates based on the data till the end of 2015 (in red/violet).

We employ the micro prediction technique in claims reserving. To meet all future insurance claims arising from policies, it is requisite to quantify the outstanding loss liabilities. Here, a utility for the solvency of the insurance company is developed. And, clearly, valuation of the reserving risk in insurance is not the only area of empirical economics where the proposed methodology can be applied, as documented by several case examples.

Quantifying reserving risk in non-life insurance inadvertently leads to a theoretical framework of the marked non-homogeneous Poisson process with non-homogeneous Poisson processes as marks. It can be viewed as an infinitely stochastic Poisson process and, consequently, a proper statistical inference relying on simple and verifiable assumptions is derived.
Acknowledgements
The research of Matúš Maciak and Michal Pešta was supported by the Czech Science Foundation project GAČR No. 18-01781Y.
A Proofs
Proof of Theorem 1.
Let us choose $t > 0$. With respect to Assumption M2, consider the convex function
$$H_t(s) := \sum_{i=1}^{M(t)} \big[ h\{Z_i; \rho + \mathcal{I}^{-1/2}(t,\rho)\, s, t\} - h(Z_i;\rho,t) \big]$$
in $s \in \mathbb{R}^q$. It is minimized by $\mathcal{I}^{1/2}(t,\rho)(\widehat\rho - \rho)$. The Taylor series expansion gives
$$H_t(s) = s^{\top} \underbrace{\mathcal{I}^{-1/2}(t,\rho) \sum_{i=1}^{M(t)} \partial_{\rho} h(Z_i;\rho,t)}_{=:U(t)} + \frac{1}{2}\, s^{\top} \underbrace{\mathcal{I}^{-1/2}(t,\rho) \Big\{ \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big\} \mathcal{I}^{-1/2}(t,\rho)}_{=:V(t)}\, s + r_t(s) \tag{13}$$
almost surely, where $r_t(s) = M(t)\, o\{s^{\top} \mathcal{I}^{-1}(t,\rho)\, s\} \to 0$ as $t \to \infty$, because of Assumption M2. Assumption M1 yields a neighborhood of $\rho$, denoted by $\mathcal{U}_t(\rho)$, such that for all $\rho \in \mathcal{U}_t(\rho)$, it holds that $\int_0^t |\partial_{\rho,j}\psi(z;\rho)|\,\mathrm{d}z < \infty$ for all $j = 1,\dots,q$, and one can interchange derivative and integral, i.e., $\partial_{\rho} \int_0^t \psi(z;\rho)\,\mathrm{d}z = \int_0^t \partial_{\rho}\psi(z;\rho)\,\mathrm{d}z$.

Let us realize that the sequence $\{Z_i\}_{i\in\mathbb{N}}$ forms the arrival times of the Poisson counting process $\{M(t)\}_{t\ge 0}$ and, hence,
$$\sum_{i=1}^{M(t)} \frac{\partial_{\rho}\psi(Z_i;\rho)}{\psi(Z_i;\rho)} = \int_0^t \frac{\partial_{\rho}\psi(z;\rho)}{\psi(z;\rho)}\,\mathrm{d}M(z).$$
Recall that $h(Z_i;\rho,t) = M^{-1}(t)\Psi(t;\rho) - \log\psi(Z_i;\rho)$. Since $\partial_{\rho}\psi(\cdot;\rho)/\psi(\cdot;\rho)$ is continuous, we obtain
$$\mathbb{E}\,U(t) = \mathcal{I}^{-1/2}(t,\rho)\, \mathbb{E}\left\{ \partial_{\rho}\int_0^t \psi(z;\rho)\,\mathrm{d}z - \int_0^t \frac{\partial_{\rho}\psi(z;\rho)}{\psi(z;\rho)}\,\mathrm{d}M(z) \right\} = \mathcal{I}^{-1/2}(t,\rho) \left\{ \partial_{\rho}\int_0^t \psi(z;\rho)\,\mathrm{d}z - \int_0^t \frac{\partial_{\rho}\psi(z;\rho)}{\psi(z;\rho)}\,\psi(z;\rho)\,\mathrm{d}z \right\} = 0.$$
One can apply the Itô isometry for jump processes:
$$\operatorname{Var} U(t) = \mathbb{E}\{U(t)\}^{\otimes 2} = \mathcal{I}^{-1/2}(t,\rho) \left[ \int_0^t \frac{\{\partial_{\rho}\psi(z;\rho)\}^{\otimes 2}}{\psi^{2}(z;\rho)}\,\psi(z;\rho)\,\mathrm{d}z \right] \mathcal{I}^{-1/2}(t,\rho) = \mathcal{I}^{-1/2}(t,\rho)\,\mathcal{I}(t;\rho)\,\mathcal{I}^{-1/2}(t,\rho) = \mathbf{I}, \tag{14}$$
due to Assumption M3. Moreover,
$$\mathbb{E}\,V(t) = \mathcal{I}^{-1/2}(t,\rho) \left[ \partial_{\rho}^{2}\int_0^t \psi(z;\rho)\,\mathrm{d}z - \int_0^t \left( \frac{\partial_{\rho}^{2}\psi(z;\rho)}{\psi(z;\rho)} - \frac{\{\partial_{\rho}\psi(z;\rho)\}^{\otimes 2}}{\psi^{2}(z;\rho)} \right) \psi(z;\rho)\,\mathrm{d}z \right] \mathcal{I}^{-1/2}(t,\rho) = \mathcal{I}^{-1/2}(t,\rho) \int_0^t \frac{\{\partial_{\rho}\psi(z;\rho)\}^{\otimes 2}}{\psi(z;\rho)}\,\mathrm{d}z\; \mathcal{I}^{-1/2}(t,\rho) = \mathbf{I}, \tag{15}$$
because of Assumption M3. Furthermore, for every $s \in \mathbb{R}^q$, it holds that
$$\operatorname{Var}\{s^{\top} V(t)\, s\} = s^{\top}\operatorname{Var}\{V(t)s\}\,s = s^{\top}\big[ \mathbb{E}\{V(t)s\}^{\otimes 2} - \{\mathbb{E}\,V(t)s\}^{\otimes 2} \big] s.$$
The $(j,k)$-element of $\mathbb{E}\{V(t)s\}^{\otimes 2} \equiv \mathbb{E}\{V(t)\,ss^{\top} V(t)\}$ has the form
$$\mathbb{E}\sum_{\ell=1}^{q}\sum_{m=1}^{q} s_{\ell}s_{m}\,(V(t))_{j,\ell}(V(t))_{m,k} = \mathbb{E}\sum_{\ell=1}^{q}\sum_{m=1}^{q} s_{\ell}s_{m} \sum_{\tilde\ell=1}^{q}\sum_{\breve\ell=1}^{q} \kappa_{j,\tilde\ell}(t) \Big( \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big)_{\tilde\ell,\breve\ell} \kappa_{\breve\ell,\ell}(t) \sum_{\tilde m=1}^{q}\sum_{\breve m=1}^{q} \kappa_{m,\tilde m}(t) \Big( \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big)_{\tilde m,\breve m} \kappa_{\breve m,k}(t), \tag{16}$$
where $ss^{\top} = (s_{\ell}s_{m})_{\ell,m=1}^{q,q}$ and $\mathcal{I}^{-1/2}(t,\rho) =: (\kappa_{\ell,m}(t))_{\ell,m=1}^{q,q}$. Expanding the product and using the Itô isometry for jump processes again, we calculate
$$\mathbb{E}\Big( \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big)_{j,\ell} \Big( \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big)_{m,k} = \int_0^t \frac{1}{\psi(z;\rho)} \left\{ \partial_{\rho,j,\ell}^{2}\psi(z;\rho) - \frac{\partial_{\rho,j}\psi(z;\rho)\,\partial_{\rho,\ell}\psi(z;\rho)}{\psi(z;\rho)} \right\} \left\{ \partial_{\rho,m,k}^{2}\psi(z;\rho) - \frac{\partial_{\rho,m}\psi(z;\rho)\,\partial_{\rho,k}\psi(z;\rho)}{\psi(z;\rho)} \right\} \mathrm{d}z + \int_0^t \frac{\partial_{\rho,j}\psi(z;\rho)\,\partial_{\rho,\ell}\psi(z;\rho)}{\psi(z;\rho)}\,\mathrm{d}z \int_0^t \frac{\partial_{\rho,m}\psi(z;\rho)\,\partial_{\rho,k}\psi(z;\rho)}{\psi(z;\rho)}\,\mathrm{d}z. \tag{17}$$
Moreover,
$$\mathbb{E}\Big( \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big)_{j,\ell} \mathbb{E}\Big( \sum_{i=1}^{M(t)} \partial_{\rho}^{2} h(Z_i;\rho,t) \Big)_{m,k} = \int_0^t \frac{\partial_{\rho,j}\psi(z;\rho)\,\partial_{\rho,\ell}\psi(z;\rho)}{\psi(z;\rho)}\,\mathrm{d}z \int_0^t \frac{\partial_{\rho,m}\psi(z;\rho)\,\partial_{\rho,k}\psi(z;\rho)}{\psi(z;\rho)}\,\mathrm{d}z. \tag{18}$$
Thus,
$$\operatorname{tr}\operatorname{Var}\{V(t)s\} = \operatorname{tr}\big[ \mathbb{E}\{V(t)s\}^{\otimes 2} - \{\mathbb{E}\,V(t)s\}^{\otimes 2} \big] = \operatorname{tr} \int_0^t \mathcal{I}^{-1/2}(t,\rho) K(z,\rho) \mathcal{I}^{-1/2}(t,\rho)\, ss^{\top} \mathcal{I}^{-1/2}(t,\rho) K(z,\rho) \mathcal{I}^{-1/2}(t,\rho)\,\mathrm{d}z = s^{\top} \left[ \int_0^t \big\{ \mathcal{I}^{-1/2}(t,\rho) K(z,\rho) \mathcal{I}^{-1/2}(t,\rho) \big\}^{2}\,\mathrm{d}z \right] s \to 0, \quad t\to\infty,$$
so that $\operatorname{Var}\{s^{\top} V(t)\, s\} \to 0$ as $t\to\infty$, because of Assumption M. □

Proof of Corollary 2.
Let us choose $a \in \mathbb{N}$. One can observe that
$$\mathcal{I}^{-1/2}(a,\rho) \sum_{i=1}^{M(a)} \partial_{\rho} h(Z_i;\rho,a) = -\sum_{i=1}^{a} \int_{i-1}^{i} \mathcal{I}^{-1/2}(a,\rho)\,\{\partial_{\rho}\log\psi(z;\rho)\}\,\big(\mathrm{d}M(z) - \psi(z;\rho)\,\mathrm{d}z\big) =: -\sum_{i=1}^{a} Y_i$$
is a sum of independent random vectors. Since
$$\mathbb{E}\,Y_i = \mathbb{E}\int_{i-1}^{i} \mathcal{I}^{-1/2}(a,\rho)\,\{\partial_{\rho}\log\psi(z;\rho)\}\,\big(\mathrm{d}M(z) - \psi(z;\rho)\,\mathrm{d}z\big) = 0$$
and
$$\mathbb{E}\,Y_i^{\otimes 2} = \mathbb{E}\left[ \int_{i-1}^{i} \mathcal{I}^{-1/2}(a,\rho)\,\{\partial_{\rho}\log\psi(z;\rho)\}\,\big(\mathrm{d}M(z) - \psi(z;\rho)\,\mathrm{d}z\big) \right]^{\otimes 2} = \mathcal{I}^{-1/2}(a,\rho) \int_{i-1}^{i} \frac{\{\partial_{\rho}\psi(z;\rho)\}^{\otimes 2}}{\psi^{2}(z;\rho)}\,\psi(z;\rho)\,\mathrm{d}z\; \mathcal{I}^{-1/2}(a,\rho),$$
we have
$$\operatorname{Var}\sum_{i=1}^{a} Y_i = \operatorname{Var}\Big\{ \mathcal{I}^{-1/2}(a,\rho) \sum_{i=1}^{M(a)} \partial_{\rho} h(Z_i;\rho,a) \Big\} = \mathcal{I}^{-1/2}(a,\rho) \int_0^a \frac{\{\partial_{\rho}\psi(z;\rho)\}^{\otimes 2}}{\psi(z;\rho)}\,\mathrm{d}z\; \mathcal{I}^{-1/2}(a,\rho) = \mathcal{I}^{-1/2}(a,\rho)\,\mathcal{I}(a,\rho)\,\mathcal{I}^{-1/2}(a,\rho) = \mathbf{I}.$$
The Lindeberg condition in Assumption M allows us to apply the central limit theorem for the independent, but not necessarily identically distributed, $Y_i$'s. Thus,
$$\mathcal{I}^{-1/2}(a,\rho) \sum_{i=1}^{M(a)} \partial_{\rho} h(Z_i;\rho,a) \xrightarrow[a\to\infty]{\mathcal{D}} \mathcal{N}_q(0,\mathbf{I}).$$
Hence, the desired convergence in distribution follows from the asymptotic representation in Theorem 1. □

Proof of Proposition 3.
For the non-homogeneous Poisson process $\{M(t)\}_{t\ge 0}$, it holds that $M(t) = \widetilde{M}(\Psi(t;\rho))$, where $\{\widetilde{M}(t)\}_{t\ge 0}$ is a standard Poisson process (i.e., a homogeneous Poisson process with intensity equal to one). Suppose that the $\widetilde{Z}_n$'s are the arrival times of the standard Poisson process $\{\widetilde{M}(t)\}_{t\ge 0}$. Since
$$\mathbb{P}\Big\{ \lim_{t\to\infty} M(t) < \infty \Big\} = \mathbb{P}\Big[ \lim_{t\to\infty} \widetilde{M}\{\Psi(t;\rho)\} < \infty \Big] = \mathbb{P}\big( \widetilde{Z}_n - \widetilde{Z}_{n-1} = \infty \text{ for some } n \big) = \mathbb{P}\Big[ \bigcup_{n=1}^{\infty} \big\{ \widetilde{Z}_n - \widetilde{Z}_{n-1} = \infty \big\} \Big] \le \sum_{n=1}^{\infty} \mathbb{P}\big( \widetilde{Z}_n - \widetilde{Z}_{n-1} = \infty \big) = 0,$$
we get $M(t) \to \infty$ as $t\to\infty$ with probability one. □

Proof of Theorem 4.
Let us choose $t > 0$. With respect to Assumption N2, consider the convex function
$$G_t(s) := \sum_{i=1}^{M(t)} \big[ g_i\{U_i; \theta + \mathcal{J}^{-1/2}(t,\theta)\, s, t\} - g_i(U_i;\theta,t) \big]$$
in $s\in\mathbb{R}^p$. It is minimized by $\mathcal{J}^{1/2}(t,\theta)(\widehat\theta - \theta)$. The Taylor series expansion gives
$$G_t(s) = s^{\top}\underbrace{\mathcal{J}^{-1/2}(t,\theta) \sum_{i=1}^{M(t)} \partial_{\theta} g_i(U_i;\theta,t)}_{=:U(t)} + \frac{1}{2}\, s^{\top}\underbrace{\mathcal{J}^{-1/2}(t,\theta)\Big\{ \sum_{i=1}^{M(t)} \partial_{\theta}^{2} g_i(U_i;\theta,t) \Big\}\mathcal{J}^{-1/2}(t,\theta)}_{=:V(t)}\, s + R_t(s) \tag{19}$$
almost surely, where $R_t(s) = M(t)\, o\{s^{\top}\mathcal{J}^{-1}(t,\theta)\, s\} \to 0$ as $t\to\infty$, because of Assumption N2. Assumption N1 yields a neighborhood of $\theta$, denoted by $\mathcal{V}_t(\theta)$, such that for all $\theta \in \mathcal{V}_t(\theta)$, it holds that $\int_{Z_i}^{t} |\partial_{\theta,j}\lambda(\tau,Z_i;\theta)|\,\mathrm{d}\tau < \infty$ almost surely for all $j=1,\dots,p$, and one can interchange derivative and integral, i.e., $\partial_{\theta}\int_{Z_i}^{t}\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau = \int_{Z_i}^{t}\partial_{\theta}\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau$.

Let us realize that for every $i\in\mathbb{N}$ the sequence $\{U_{i,k}\}_{k\in\mathbb{N}}$ forms the arrival times of the Poisson counting process $\{N_i(t)\}_{t\ge 0}$ and, hence,
$$\sum_{k=1}^{N_i(t)} \frac{\partial_{\theta}\lambda(U_{i,k},Z_i;\theta)}{\lambda(U_{i,k},Z_i;\theta)} = \int_{Z_i}^{t} \frac{\partial_{\theta}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\,\mathrm{d}N_i(\tau). \tag{20}$$
In the sequel, we use conditioning on $\{M(z)\}_{z\in[0,t]}$ (i.e., the information contained in the process $M$ up to time $t$), which corresponds to conditioning on $Z_1,\dots,Z_{M(t)}$. Recall that $g_i(U_i;\theta,t) = \int_{Z_i}^{t}\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau - \sum_{k=1}^{N_i(t)}\log\lambda(U_{i,k},Z_i;\theta)$. Since $\partial_{\theta}\lambda(\cdot,Z_i;\theta)/\lambda(\cdot,Z_i;\theta)$ is continuous, we obtain
$$\mathbb{E}\big\{U(t)\,\big|\,\{M(z)\}_{z\in[0,t]}\big\} = \mathcal{J}^{-1/2}(t,\theta)\sum_{i=1}^{M(t)} \left\{ \partial_{\theta}\int_{Z_i}^{t}\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau - \int_{Z_i}^{t}\frac{\partial_{\theta}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\,\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau \right\} = 0.$$
One can apply the Itô isometry for jump processes in a similar way as in (14) and utilize that the processes $N_i$ are independent:
$$\operatorname{Var}\big\{U(t)\,\big|\,\{M(z)\}_{z\in[0,t]}\big\} = \mathcal{J}^{-1/2}(t,\theta)\sum_{i=1}^{M(t)} \left\{ \int_{Z_i}^{t} \frac{\{\partial_{\theta}\lambda(\tau,Z_i;\theta)\}^{\otimes 2}}{\lambda^{2}(\tau,Z_i;\theta)}\,\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau \right\} \mathcal{J}^{-1/2}(t,\theta),$$
due to Assumption N3. Then, we get $\mathbb{E}\,U(t) = \mathbb{E}[\mathbb{E}\{U(t)\,|\,\{M(z)\}_{z\in[0,t]}\}] = 0$ and
$$\operatorname{Var} U(t) = \operatorname{Var}\big[\mathbb{E}\{U(t)\,|\,\{M(z)\}_{z\in[0,t]}\}\big] + \mathbb{E}\big[\operatorname{Var}\{U(t)\,|\,\{M(z)\}_{z\in[0,t]}\}\big] = \mathcal{J}^{-1/2}(t,\theta)\,\mathbb{E}\sum_{i=1}^{M(t)}\mathcal{J}_i(t;\theta)\;\mathcal{J}^{-1/2}(t,\theta) = \mathbf{I}.$$
Moreover, analogously as in (15),
$$\mathbb{E}\,V(t) = \mathbb{E}\big[\mathbb{E}\{V(t)\,|\,\{M(z)\}_{z\in[0,t]}\}\big] = \mathcal{J}^{-1/2}(t,\theta)\,\mathbb{E}\left[ \sum_{i=1}^{M(t)} \int_{Z_i}^{t} \frac{\{\partial_{\theta}\lambda(\tau,Z_i;\theta)\}^{\otimes 2}}{\lambda(\tau,Z_i;\theta)}\,\mathrm{d}\tau \right] \mathcal{J}^{-1/2}(t,\theta) = \mathcal{J}^{-1/2}(t,\theta)\,\mathcal{J}(t;\theta)\,\mathcal{J}^{-1/2}(t,\theta) = \mathbf{I},$$
because of Assumption N3. Furthermore, for every $s\in\mathbb{R}^p$, it holds that
$$\operatorname{Var}\{s^{\top}V(t)\, s\} = s^{\top}\operatorname{Var}\{V(t)s\}\,s = s^{\top}\big[\mathbb{E}\{V(t)s\}^{\otimes 2} - \{\mathbb{E}\,V(t)s\}^{\otimes 2}\big]s.$$
The $(j,k)$-element of $\mathbb{E}\{V(t)s\}^{\otimes 2} \equiv \mathbb{E}\{V(t)\,ss^{\top}V(t)\}$ has the form
$$\mathbb{E}\sum_{\ell=1}^{p}\sum_{m=1}^{p} s_{\ell}s_{m}\,(V(t))_{j,\ell}(V(t))_{m,k} = \mathbb{E}\sum_{\ell=1}^{p}\sum_{m=1}^{p} s_{\ell}s_{m} \sum_{\tilde\ell=1}^{p}\sum_{\breve\ell=1}^{p} \kappa_{j,\tilde\ell}(t)\Big(\sum_{i=1}^{M(t)}\partial_{\theta}^{2} g_i(U_i;\theta,t)\Big)_{\tilde\ell,\breve\ell}\kappa_{\breve\ell,\ell}(t) \sum_{\tilde m=1}^{p}\sum_{\breve m=1}^{p}\kappa_{m,\tilde m}(t)\Big(\sum_{i=1}^{M(t)}\partial_{\theta}^{2} g_i(U_i;\theta,t)\Big)_{\tilde m,\breve m}\kappa_{\breve m,k}(t), \tag{21}$$
where $ss^{\top} = (s_{\ell}s_{m})_{\ell,m=1}^{p,p}$ and $\mathcal{J}^{-1/2}(t,\theta) =: (\kappa_{\ell,m}(t))_{\ell,m=1}^{p,p}$. In a similar fashion as in (17)–(18), together with the independence of the $N_i$'s, we get
$$\mathbb{E}\left\{\Big(\sum_{i=1}^{M(t)}\partial_{\theta}^{2} g_i(U_i;\theta,t)\Big)_{j,\ell}\Big(\sum_{i=1}^{M(t)}\partial_{\theta}^{2} g_i(U_i;\theta,t)\Big)_{m,k}\,\middle|\,\{M(z)\}_{z\in[0,t]}\right\} = \sum_{i=1}^{M(t)}\int_{Z_i}^{t} \frac{1}{\lambda(\tau,Z_i;\theta)}\left\{\partial_{\theta,j,\ell}^{2}\lambda(\tau,Z_i;\theta) - \frac{\partial_{\theta,j}\lambda(\tau,Z_i;\theta)\,\partial_{\theta,\ell}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\right\}\left\{\partial_{\theta,m,k}^{2}\lambda(\tau,Z_i;\theta) - \frac{\partial_{\theta,m}\lambda(\tau,Z_i;\theta)\,\partial_{\theta,k}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\right\}\mathrm{d}\tau + \left\{\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\frac{\partial_{\theta,j}\lambda(\tau,Z_i;\theta)\,\partial_{\theta,\ell}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\,\mathrm{d}\tau\right\}\left\{\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\frac{\partial_{\theta,m}\lambda(\tau,Z_i;\theta)\,\partial_{\theta,k}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\,\mathrm{d}\tau\right\} \tag{22}$$
and
$$\mathbb{E}\left\{\Big(\sum_{i=1}^{M(t)}\partial_{\theta}^{2} g_i(U_i;\theta,t)\Big)_{j,\ell}\,\middle|\,\{M(z)\}_{z\in[0,t]}\right\}\mathbb{E}\left\{\Big(\sum_{i=1}^{M(t)}\partial_{\theta}^{2} g_i(U_i;\theta,t)\Big)_{m,k}\,\middle|\,\{M(z)\}_{z\in[0,t]}\right\} = \left\{\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\frac{\partial_{\theta,j}\lambda(\tau,Z_i;\theta)\,\partial_{\theta,\ell}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\,\mathrm{d}\tau\right\}\left\{\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\frac{\partial_{\theta,m}\lambda(\tau,Z_i;\theta)\,\partial_{\theta,k}\lambda(\tau,Z_i;\theta)}{\lambda(\tau,Z_i;\theta)}\,\mathrm{d}\tau\right\}. \tag{23}$$
Thus,
$$\operatorname{tr}\operatorname{Var}\{V(t)s\} = \operatorname{tr}\mathbb{E}\big[\operatorname{Var}\{V(t)s\,|\,\{M(z)\}_{z\in[0,t]}\}\big] + \operatorname{tr}\operatorname{Var}\big[\mathbb{E}\{V(t)s\,|\,\{M(z)\}_{z\in[0,t]}\}\big] = \operatorname{tr}\mathbb{E}\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\mathcal{J}^{-1/2}(t,\theta)L_i(\tau,\theta)\mathcal{J}^{-1/2}(t,\theta)\, ss^{\top} \mathcal{J}^{-1/2}(t,\theta)L_i(\tau,\theta)\mathcal{J}^{-1/2}(t,\theta)\,\mathrm{d}\tau = s^{\top}\left[\mathbb{E}\sum_{i=1}^{M(t)}\int_{Z_i}^{t}\big\{\mathcal{J}^{-1/2}(t,\theta)L_i(\tau,\theta)\mathcal{J}^{-1/2}(t,\theta)\big\}^{2}\,\mathrm{d}\tau\right]s \to 0, \quad t\to\infty,$$
so that $\operatorname{Var}\{s^{\top}V(t)\, s\}\to 0$ as $t\to\infty$, because of Assumption N. □

Proof of Corollary 5.
Let us choose $a\in\mathbb{N}$. Recall that $\widetilde{N}_z(\tau) = \sum_{k=1}^{\infty}\mathbb{1}\{U_{i,k} - z \le \tau\}$ is the restarted process of $N_i$. One can observe that
$$\mathcal{J}^{-1/2}(a,\theta)\sum_{i=1}^{M(a)}\partial_{\theta} g_i(U_i;\theta,a) = \sum_{i=1}^{M(a)}\mathcal{J}^{-1/2}(a,\theta)\,\partial_{\theta}\Big\{\int_{Z_i}^{a}\lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau - \sum_{k=1}^{N_i(a)}\log\lambda(U_{i,k},Z_i;\theta)\Big\} = -\sum_{i=1}^{M(a)}\mathcal{J}^{-1/2}(a,\theta)\left[\int_{Z_i}^{a}\{\partial_{\theta}\log\lambda(\tau,Z_i;\theta)\}\,\big(\mathrm{d}N_i(\tau) - \lambda(\tau,Z_i;\theta)\,\mathrm{d}\tau\big)\right] = -\sum_{j=1}^{a}\int_{j-1}^{j}\mathcal{J}^{-1/2}(a,\theta)\left[\int_{z}^{a}\{\partial_{\theta}\log\lambda(\tau,z;\theta)\}\,\big(\mathrm{d}\widetilde{N}_z(\tau - z) - \lambda(\tau,z;\theta)\,\mathrm{d}\tau\big)\right]\mathrm{d}M(z) =: -\sum_{i=1}^{a} Y_i$$
is a sum of independent random vectors. Since
$$\mathbb{E}\,Y_i = \mathbb{E}\int_{i-1}^{i}\mathcal{J}^{-1/2}(a,\theta)\left[\int_{z}^{a}\{\partial_{\theta}\log\lambda(\tau,z;\theta)\}\,\big(\mathrm{d}\widetilde{N}_z(\tau-z) - \lambda(\tau,z;\theta)\,\mathrm{d}\tau\big)\right]\mathrm{d}M(z) = 0,$$
we have
$$\operatorname{Var}\sum_{i=1}^{a} Y_i = \operatorname{Var}\Big\{\mathcal{J}^{-1/2}(a,\theta)\sum_{i=1}^{M(a)}\partial_{\theta} g_i(U_i;\theta,a)\Big\} = \mathcal{J}^{-1/2}(a,\theta)\,\mathcal{J}(a,\theta)\,\mathcal{J}^{-1/2}(a,\theta) = \mathbf{I}.$$
The Lindeberg condition in Assumption N allows us to apply the central limit theorem for the independent, but not necessarily identically distributed, $Y_i$'s. Then,
$$\mathcal{J}^{-1/2}(a,\theta)\sum_{i=1}^{M(a)}\partial_{\theta} g_i(U_i;\theta,a) \xrightarrow[a\to\infty]{\mathcal{D}} \mathcal{N}_p(0,\mathbf{I}).$$
To conclude, the desired convergence in distribution follows from the asymptotic representation in Theorem 4. □

References
Aigner, D., Knox-Lovell, C., and Schmidt, P. (1977). Formulation and estimation of stochasticfrontier production function models.
J. Econometrics , 6(1):21–37.Antonio, K. and Plat, R. (2014). Micro-level stochastic loss reserving for general insurance.
Scand. Actuar. J. , 2014(7):649–669.Arjas, E. (1989). The claims reserving problem in non-life insurance: Some structural ideas.
ASTIN Bull. , 19(2):139–152.Arnold, C. (2019). Death, statistics and a disaster zone: The struggle to count the dead afterHurricane Maria.
Nature , 566(7742):22–25.Azar, E. E. (1980). The conflict and peace databank (COPDAB) project.
J. Conflict Resolut. ,24(1):143–152. 42 . Maciak, O. Okhrin, and M. Peˇsta
Badescu, A. L., Lin, X. S., and Tang, D. (2016). A marked Cox model for the number of IBNRclaims: Theory.
Insur. Math. Econ. , 69(1):29–37.Basrak, B., Wintenberger, O., and ˇZugec, P. (2018). On total claim amount for marked Poissoncluster models. https://hal.archives-ouvertes.fr/hal-01788339 .Benito, S. and L ´opez-Mart´ın, C. (2018). A review of the state of the art in quantifying opera-tional risk.
J. Oper. Risk , 13(4):89–129.Billingsley, P. (2008).
Probability and Measure . Wiley, New York, NY, 3rd edition.Bobashev, G., Goedecke, D., Yu, F., and Epstein, J. (2007). A hybric epidemic model: Combiningadvantages of agent-based and equation-based approaches. In
Proceedings – 2007 WinterSimulation Conference. IEEE , pages 1532–1537.Bosma, N., Van Praag, M., Thurik, R., and De Witt, G. (2004). The value of human and socialcapital investments for the business performance of startups.
Small Bus. Econ. , 23(3):227–236.Braithwaite, A. (2010). MIDLOC: Introducing the militarized interstate dispute locationdataset.
J. Peace Res. , 47(1):91–98.Burda, M., Harding, M., and Hausman, J. (2012). A Poisson mixture model of discrete choice.
J. Econometrics , 166(2):184–203.Caldbick, S., Wu, X., Lynch, T., Al-Khatib, N., Andkhoie, M., and Farag, M. (2015). The finan-cial burden of out of pocket prescription drug expenses in canada.
Int. J. Health Econ. Ma. ,15(3):329–338.Chernobai, A. S., Rachev, S. T., and Fabozzi, F. J. (2007).
Operational Risk: A guide to Basel IICapital Requirements, Models and Analysis . Wiley finance, New York, NY.Clauset, A. (2018). Trends and fluctuations in the severity of interstate wars.
Science Advances ,4:1–9.Coeurjolly, J.-F. and Møller, J. (2014). Variational approach for spatial point process intensityestimation.
Bernoulli , 20(3):1097–1125.Cohen, R. D. (2018). An operational risk capital model based on the loss distribution approach.
J. Oper. Risk , 13(2):69–81. 43 nfinitely Stochastic Micro Forecasting
Collier, P., Hoeffler, A., and Söderbom, M. (2004). On the duration of civil war. J. Peace Res., 41(3):253–273.
Colombo, M. and Grilli, L. (2008). Start-up size: The role of external financing. Econ. Lett., 88(1):243–250.
Cressey, D. (2008). War survey points to millions more dead. Nature News. doi:10.1038/news.2008.901.
Crowder, M. J., Kimber, A. C., Smith, R. L., and Sweeting, T. J. (1991). Statistical Analysis of Reliability Data. Chapman and Hall, Malta, 1st edition.
Ekedahl, A., Andersson, S. I., Hovelius, B., Mölstad, S., Liedholm, H., and Melander, A. (1995). Drug prescription attitudes and behaviour of general practitioners. Eur. J. Clinical Pharmacol., 47(5):381–387.
England, P. and Verrall, R. (2002). Stochastic claims reserving in general insurance (with discussion). British Actuarial Journal, 8(3):443–518.
England, P. D. and Verrall, R. J. (1999). Analytic and bootstrap estimates of prediction errors in claims reserving. Insur. Math. Econ., 25(3):281–293.
Engle, R. F. (2000). The econometrics of ultra-high-frequency data. Econometrica, 68(1):1–22.
Giesecke, K. and Schwenkler, G. (2018). Filtered likelihood for point processes. J. Econometrics, 204(1):33–53.
Gleditsch, K. S., Metternich, N. W., and Ruggeri, A. (2014). Data and progress in peace and conflict research. J. Peace Res., 51(2):301–314.
Godecharle, E. and Antonio, K. (2015). Reserving by conditioning on markers of individual claims: A case study using historical simulation. North American Actuarial Journal, 19(4):273–288.
Goldfarb, A. and Tucker, C. (2011). Online display advertising: Targeting and obtrusiveness. Market. Sci., 30(3):389–404.
Gönül, F. F., Carter, F., Petrova, E., and Srinivasan, K. (2001). Promotion of prescription drugs and its impact on physicians' choice behavior. J. Marketing, 65:79–90.
Haastrup, S. and Arjas, E. (1996). Claims reserving in continuous time: A nonparametric Bayesian approach. ASTIN Bull., 26(2):139–164.
Hallberg, J. D. (2012). PRIO conflict site 1989–2008: A geo-referenced dataset on armed conflict. Conflict Manag. Peace, 29(2):219–232.
Hansen, L. P. and Scheinkman, J. A. (2009). Long term risk: An operator approach. Econometrica, 77(1):177–234.
Harrison, M. and Wolf, N. (2012). The frequency of wars. Econ. Hist. Rev., 65(3):1055–1076.
Hesselager, O. (1994). A Markov model for loss reserving. ASTIN Bull., 24(2):183–193.
Hjort, N. L. and Pollard, D. (2011). Asymptotics for minimisers of convex processes. https://arxiv.org/abs/1107.3806.
Holtz-Eakin, D., Joulfaian, D., and Rosen, H. (1994). Entrepreneurial decisions and liquidity constraints. Rand J. Econ., 25:334–347.
Hudecová, Š. and Pešta, M. (2013). Modeling dependencies in claims reserving with GEE. Insur. Math. Econ., 53(3):786–794.
Hudecová, Š., Pešta, M., and Hlubinka, D. (2017). Modelling prescription behaviour of general practitioners. Math. Slovaca, 67(1):1–17.
Jewell, W. (1989). Predicting IBNYR events and delays, part I: Continuous time. ASTIN Bull., 19:25–56.
Jewell, W. (1990). Predicting IBNYR events and delays, part II: Discrete time. ASTIN Bull., 20:93–111.
Kazan, E. (2015). The innovative capabilities of digital payment platforms: A comparative study of Apple Pay & Google Wallet. In ICMB 2015 Proceedings, Paper 4, http://aisel.aisnet.org/icmb2015/4.
Kermack, W. and McKendrick, A. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 115(772):700–721.
Kingman, J. F. C. (1993). Poisson Processes. Oxford University Press, New York, NY.
Konecny, F. (1987). The asymptotic properties of maximum likelihood estimators for marked Poisson processes with a cyclic intensity measure. Metrika, 34:143–155.
Larsen, C. (2007). An individual claims reserving model. ASTIN Bull., 37(1):113–132.
Lawless, J. F. (1987). Regression methods for Poisson process data. J. Am. Stat. Assoc., 82(399):808–815.
Lewis, P. A. W. and Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics, 26(3):403–413.
Liaukonyte, J., Teixeira, T., and Wilbur, K. C. (2015). Television advertising and online shopping. Market. Sci., 34(3):311–330.
Allen, L. J. S. (2008). An introduction to stochastic epidemic models. In Mathematical Epidemiology, chapter 3, pages 81–120. Springer-Verlag.
Liu, J., Kauffman, R. J., and Ma, D. (2015). Competition, cooperation, and regulation: Understanding the evolution of the mobile payments technology ecosystem. Electron. Commer. R. A., 14(5):372–391.
Liu, S. Q. and Mattila, A. S. (2019). Apple Pay: Coolness and embarrassment in the service encounter. Int. J. Hosp. Manag., 78:268–275.
Manchanda, P. and Chintagunta, P. K. (2004). Responsiveness of physician prescription behavior to salesforce effort: An individual level analysis. Market. Lett., 15:129–145.
Miller, R. H. and Luft, H. S. (1994). Managed care plan performance since 1980: A literature analysis. JAMA – J. Am. Med. Assoc., 271(19):1512–1519.
Norberg, R. (1993). Prediction of outstanding liabilities in non-life insurance. ASTIN Bull., 23(1):95–115.
Norberg, R. (1999). Prediction of outstanding liabilities II. Model variations and extensions. ASTIN Bull., 29(1):5–25.
Pešta, M. and Okhrin, O. (2014). Conditional least squares and copulae in claims reserving for a single line of business. Insur. Math. Econ., 56(1):28–37.
Pigeon, M., Antonio, K., and Denuit, M. (2014). Individual loss reserving using paid-incurred data. Insur. Math. Econ., 58:121–131.
Prokešová, M., Dvořák, J., and Jensen, E. B. V. (2017). Two-step estimation procedures for inhomogeneous shot-noise Cox processes. Ann. I. Stat. Math., 69(3):513–542.
Rizoiu, M.-A., Mishra, S., Kong, Q., Carman, M., and Xie, L. (2018). SIR-Hawkes: Linking epidemic models and Hawkes processes to model diffusions in finite populations. Technical report, arXiv:1711.01679v3.
Rokstad, K., Straand, J., and Fugelli, P. (1997). General practitioners' drug prescribing practice and diagnoses for prescribing: The Møre & Romsdal prescription study. J. Clin. Epidemiol., 50(4):485–494.
Schoenberg, F. P. (2005). Consistent parametric estimation of the intensity of a spatial-temporal point process. J. Stat. Plan. Infer., 128(1):79–93.
Schrodt, P. (2014). Seven deadly sins of contemporary quantitative political analysis. J. Peace Res., 51(2):287–300.
Taylor, G., McGuire, G., and Sullivan, J. (2008). Individual claim loss reserving conditioned by case estimates. Annals of Actuarial Science, 3(1–2):215–256.
Verrall, R. J. and Wüthrich, M. V. (2016). Understanding reporting delay in general insurance. Risks, 4(3):25.
Waagepetersen, R. and Guan, Y. (2009). Two-step estimation for inhomogeneous spatial point processes. J. Roy. Stat. Soc. B. Met., 71(3):685–702.
Waagepetersen, R. P. (2007). An estimating function approach to inference for inhomogeneous Neyman–Scott processes. Biometrics, 63(1):252–258.
Ward, M. D., Greenhill, B. D., and Bakke, K. M. (2010). The perils of policy by p-value: Predicting civil conflict. J. Peace Res., 45(5):363–375.
Watkins, C., Harvey, I., Carthy, P., Moore, L., Robinson, E., and Brawn, R. (2003). Attitudes and behaviour of general practitioners and their prescribing costs: A national cross sectional survey. Qual. and Saf. Health Care, 12:29–34.
Weisberg, H. I., Tomberlin, T. J., and Chatterjee, S. (1984). Predicting insurance losses under cross-classification: A comparison of alternative approaches. J. Bus. Econ. Stat., 2(2):170–178.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–25.
Wüthrich, M. (2016). Machine learning in individual claims reserving. Swiss Finance Institute Research Paper No. 16–67. https://ssrn.com/abstract=2867897.
Wüthrich, M. and Merz, M. (2008). Stochastic Claims Reserving Methods in Insurance. Wiley Finance Series. John Wiley & Sons.
Xiao, R. (2018). Identification and estimation of incomplete information games with multiple equilibria. J. Econometrics, 203(2):328–343.
Yan, P. (2008). Distribution theory, stochastic processes and infectious disease modelling. In Mathematical Epidemiology, chapter 10, pages 229–293. Springer-Verlag.
Zhao, X. and Zhou, X. (2010). Applying copula models to individual claim loss reserving methods. Insur. Math. Econ., 46:290–299.
Zhao, X., Zhou, X., and Wang, J. (2009). Semiparametric model for prediction of individual claim loss reserving.